Quantum Computation and Quantum Information
10th Anniversary Edition
One of the most cited books in physics of all time, Quantum Computation and Quantum
Information remains the best textbook in this exciting field of science. This 10th
Anniversary Edition includes a new Introduction and Afterword from the authors
setting the work in context.
This comprehensive textbook describes such remarkable effects as fast quantum
algorithms, quantum teleportation, quantum cryptography, and quantum
error-correction. Quantum mechanics and computer science are introduced, before
moving on to describe what a quantum computer is, how it can be used to solve problems
faster than “classical” computers, and its real-world implementation. It concludes with
an in-depth treatment of quantum information.
Containing a wealth of figures and exercises, this well-known textbook is ideal for
courses on the subject, and will interest beginning graduate students and researchers in
physics, computer science, mathematics, and electrical engineering.
MICHAEL NIELSEN was educated at the University of Queensland, and as a Fulbright
Scholar at the University of New Mexico. He worked at Los Alamos National
Laboratory, as the Richard Chace Tolman Fellow at Caltech, was Foundation Professor
of Quantum Information Science and a Federation Fellow at the University of
Queensland, and a Senior Faculty Member at the Perimeter Institute for Theoretical
Physics. He left Perimeter Institute to write a book about open science and now lives in
Toronto.
ISAAC CHUANG is a Professor at the Massachusetts Institute of Technology, jointly
appointed in Electrical Engineering & Computer Science, and in Physics. He leads the
quanta research group at the Center for Ultracold Atoms, in the MIT Research
Laboratory of Electronics, which seeks to understand and create information technology
and intelligence from the fundamental building blocks of physical systems, atoms, and
molecules.
In praise of the book 10 years after publication
Ten years after its initial publication, “Mike and Ike” (as it’s affectionately called) remains the quantum
computing textbook to which all others are compared. No other book in the field matches its scope:
from experimental implementation to complexity classes, from the philosophical justifications for the
Church-Turing Thesis to the nitty-gritty of bra/ket manipulation. A dog-eared copy sits on my desk;
the section on trace distance and fidelity alone has been worth many times the price of the book to me.
Scott Aaronson, Massachusetts Institute of Technology
Quantum information processing has become a huge interdisciplinary field at the intersection of both,
theoretical and experimental quantum physics, computer science, mathematics, quantum engineering
and, more recently, even quantum metrology. The book by Michael Nielsen and Isaac Chuang was
seminal in many ways: it paved the way for a broader, yet deep understanding of the underlying
science, it introduced a common language now widely used by a growing community and it became
the standard book in the field for a whole decade. In spite of the fast progress in the field, even after
10 years the book provides the basic introduction into the field for students and scholars alike and
the 10th anniversary edition will remain a bestseller for a long time to come. The foundations of
quantum computation and quantum information processing are excellently laid out in this book and
it also provides an overview over some experimental techniques that have become the testing ground
for quantum information processing during the last decade. In view of the rapid progress of the field
the book will continue to be extremely valuable for all entering this highly interdisciplinary research
area and it will always provide the reference for those who grew up with it. This is an excellent book,
well written, highly commendable, and in fact imperative for everybody in the field.
Rainer Blatt, Universität Innsbruck
My well-perused copy of Nielsen and Chuang is, as always, close at hand as I write this. It appears
that the material that Mike and Ike chose to cover, which was a lot, has turned out to be a large portion
of what will become the eternal verities of this still-young field. When another researcher asks me to
give her a clear explanation of some important point of quantum information science, I breathe a sigh
of relief when I recall that it is in this book – my job is easy, I just send her there.
David DiVincenzo, IBM T. J. Watson Research Center
If there is anything you want to know, or remind yourself, about quantum information science, then
look no further than this comprehensive compendium by Ike and Mike. Whether you are an expert, a
student or a casual reader, tap into this treasure chest of useful and well presented information.
Artur Ekert, Mathematical Institute, University of Oxford
Nearly every child who has read Harry Potter believes that if you just say the right thing or do the
right thing, you can coerce matter to do something fantastic. But what adult would believe it? Until
quantum computation and quantum information came along in the early 1990s, nearly none. The
quantum computer is the Philosopher’s Stone of our century, and Nielsen and Chuang is our basic
book of incantations. Ten years have passed since its publication, and it is as basic to the field as it
ever was. Matter will do wonderful things if asked to, but we must first understand its language. No
book written since (there was no before) does the job of teaching the language of quantum theory’s
possibilities like Nielsen and Chuang’s.
Chris Fuchs, Perimeter Institute for Theoretical Physics
Nielsen and Chuang is the bible of the quantum information field. It appeared 10 years ago, yet even
though the field has changed enormously in these 10 years - the book still covers most of the important
concepts of the field.
Lov Grover, Bell Labs
Quantum Computation and Quantum Information, commonly referred to as “Mike and Ike,” continues
to be a most valuable resource for background information on quantum information processing. As a
mathematically-impaired experimentalist, I particularly appreciate the fact that armed with a modest
background in quantum mechanics, it is possible to pick up at any point in the book and readily grasp
the basic ideas being discussed. To me, it is still “the” book on the subject.
David Wineland, National Institute of Standards and Technology, Boulder, Colorado
Endorsements for the original publication
Chuang and Nielsen have produced the first comprehensive study of quantum computation. To
develop a robust understanding of this subject one must integrate many ideas whose origins are
variously within physics, computer science, or mathematics. Until this text, putting together the
essential material, much less mastering it, has been a challenge. Our Universe has intrinsic capabilities and limitations on the processing of information. What these are will ultimately determine
the course of technology and shape our efforts to find a fundamental physical theory. This book is
an excellent way for any scientist or graduate student – in any of the related fields – to enter the
discussion.
Michael Freedman, Fields Medalist, Microsoft
Nielsen and Chuang’s new text is remarkably thorough and up-to-date, covering many aspects
of this rapidly evolving field from a physics perspective, complementing the computer science
perspective of Gruska’s 1999 text. The authors have succeeded in producing a self-contained book
accessible to anyone with a good undergraduate grounding in math, computer science or physical
sciences. An independent student could spend an enjoyable year reading this book and emerge ready
to tackle the current literature and do serious research. To streamline the exposition, footnotes have
been gathered into short but lively History and Further Reading sections at the end of each chapter.
Charles H Bennett, IBM
This is an excellent book. The field is already too big to cover completely in one book, but Nielsen
and Chuang have made a good selection of topics, and explain the topics they have chosen very
well.
Peter Shor, Massachusetts Institute of Technology
Quantum Computation and Quantum Information
10th Anniversary Edition
Michael A. Nielsen & Isaac L. Chuang
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore,
São Paulo, Delhi, Dubai, Tokyo, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9781107002173
© M. Nielsen and I. Chuang 2010
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2000
Reprinted 2002, 2003, 2004, 2007, 2009
10th Anniversary edition published 2010
Printed in the United Kingdom at the University Press, Cambridge
A catalog record for this publication is available from the British Library
ISBN 978-1-107-00217-3 Hardback
Cambridge University Press has no responsibility for the persistence or
accuracy of URLs for external or third-party internet websites referred to in
this publication, and does not guarantee that any content on such websites is,
or will remain, accurate or appropriate.
To our parents,
and our teachers
Contents
Introduction to the Tenth Anniversary Edition
Afterword to the Tenth Anniversary Edition
Preface
Acknowledgements
Nomenclature and notation
Part I Fundamental concepts
1 Introduction and overview
1.1 Global perspectives
1.1.1 History of quantum computation and quantum information
1.1.2 Future directions
1.2 Quantum bits
1.2.1 Multiple qubits
1.3 Quantum computation
1.3.1 Single qubit gates
1.3.2 Multiple qubit gates
1.3.3 Measurements in bases other than the computational basis
1.3.4 Quantum circuits
1.3.5 Qubit copying circuit?
1.3.6 Example: Bell states
1.3.7 Example: quantum teleportation
1.4 Quantum algorithms
1.4.1 Classical computations on a quantum computer
1.4.2 Quantum parallelism
1.4.3 Deutsch’s algorithm
1.4.4 The Deutsch–Jozsa algorithm
1.4.5 Quantum algorithms summarized
1.5 Experimental quantum information processing
1.5.1 The Stern–Gerlach experiment
1.5.2 Prospects for practical quantum information processing
1.6 Quantum information
1.6.1 Quantum information theory: example problems
1.6.2 Quantum information in a wider context
2 Introduction to quantum mechanics
2.1 Linear algebra
2.1.1 Bases and linear independence
2.1.2 Linear operators and matrices
2.1.3 The Pauli matrices
2.1.4 Inner products
2.1.5 Eigenvectors and eigenvalues
2.1.6 Adjoints and Hermitian operators
2.1.7 Tensor products
2.1.8 Operator functions
2.1.9 The commutator and anti-commutator
2.1.10 The polar and singular value decompositions
2.2 The postulates of quantum mechanics
2.2.1 State space
2.2.2 Evolution
2.2.3 Quantum measurement
2.2.4 Distinguishing quantum states
2.2.5 Projective measurements
2.2.6 POVM measurements
2.2.7 Phase
2.2.8 Composite systems
2.2.9 Quantum mechanics: a global view
2.3 Application: superdense coding
2.4 The density operator
2.4.1 Ensembles of quantum states
2.4.2 General properties of the density operator
2.4.3 The reduced density operator
2.5 The Schmidt decomposition and purifications
2.6 EPR and the Bell inequality
3 Introduction to computer science
3.1 Models for computation
3.1.1 Turing machines
3.1.2 Circuits
3.2 The analysis of computational problems
3.2.1 How to quantify computational resources
3.2.2 Computational complexity
3.2.3 Decision problems and the complexity classes P and NP
3.2.4 A plethora of complexity classes
3.2.5 Energy and computation
3.3 Perspectives on computer science
Part II Quantum computation
4 Quantum circuits
4.1 Quantum algorithms
4.2 Single qubit operations
4.3 Controlled operations
4.4 Measurement
4.5 Universal quantum gates
4.5.1 Two-level unitary gates are universal
4.5.2 Single qubit and CNOT gates are universal
4.5.3 A discrete set of universal operations
4.5.4 Approximating arbitrary unitary gates is generically hard
4.5.5 Quantum computational complexity
4.6 Summary of the quantum circuit model of computation
4.7 Simulation of quantum systems
4.7.1 Simulation in action
4.7.2 The quantum simulation algorithm
4.7.3 An illustrative example
4.7.4 Perspectives on quantum simulation
5 The quantum Fourier transform and its applications
5.1 The quantum Fourier transform
5.2 Phase estimation
5.2.1 Performance and requirements
5.3 Applications: order-finding and factoring
5.3.1 Application: order-finding
5.3.2 Application: factoring
5.4 General applications of the quantum Fourier transform
5.4.1 Period-finding
5.4.2 Discrete logarithms
5.4.3 The hidden subgroup problem
5.4.4 Other quantum algorithms?
6 Quantum search algorithms
6.1 The quantum search algorithm
6.1.1 The oracle
6.1.2 The procedure
6.1.3 Geometric visualization
6.1.4 Performance
6.2 Quantum search as a quantum simulation
6.3 Quantum counting
6.4 Speeding up the solution of NP-complete problems
6.5 Quantum search of an unstructured database
6.6 Optimality of the search algorithm
6.7 Black box algorithm limits
7 Quantum computers: physical realization
7.1 Guiding principles
7.2 Conditions for quantum computation
7.2.1 Representation of quantum information
7.2.2 Performance of unitary transformations
7.2.3 Preparation of fiducial initial states
7.2.4 Measurement of output result
7.3 Harmonic oscillator quantum computer
7.3.1 Physical apparatus
7.3.2 The Hamiltonian
7.3.3 Quantum computation
7.3.4 Drawbacks
7.4 Optical photon quantum computer
7.4.1 Physical apparatus
7.4.2 Quantum computation
7.4.3 Drawbacks
7.5 Optical cavity quantum electrodynamics
7.5.1 Physical apparatus
7.5.2 The Hamiltonian
7.5.3 Single-photon single-atom absorption and refraction
7.5.4 Quantum computation
7.6 Ion traps
7.6.1 Physical apparatus
7.6.2 The Hamiltonian
7.6.3 Quantum computation
7.6.4 Experiment
7.7 Nuclear magnetic resonance
7.7.1 Physical apparatus
7.7.2 The Hamiltonian
7.7.3 Quantum computation
7.7.4 Experiment
7.8 Other implementation schemes
Part III Quantum information
8 Quantum noise and quantum operations
8.1 Classical noise and Markov processes
8.2 Quantum operations
8.2.1 Overview
8.2.2 Environments and quantum operations
8.2.3 Operator-sum representation
8.2.4 Axiomatic approach to quantum operations
8.3 Examples of quantum noise and quantum operations
8.3.1 Trace and partial trace
8.3.2 Geometric picture of single qubit quantum operations
8.3.3 Bit flip and phase flip channels
8.3.4 Depolarizing channel
8.3.5 Amplitude damping
8.3.6 Phase damping
8.4 Applications of quantum operations
8.4.1 Master equations
8.4.2 Quantum process tomography
8.5 Limitations of the quantum operations formalism
9 Distance measures for quantum information
9.1 Distance measures for classical information
9.2 How close are two quantum states?
9.2.1 Trace distance
9.2.2 Fidelity
9.2.3 Relationships between distance measures
9.3 How well does a quantum channel preserve information?
10 Quantum error-correction
10.1 Introduction
10.1.1 The three qubit bit flip code
10.1.2 Three qubit phase flip code
10.2 The Shor code
10.3 Theory of quantum error-correction
10.3.1 Discretization of the errors
10.3.2 Independent error models
10.3.3 Degenerate codes
10.3.4 The quantum Hamming bound
10.4 Constructing quantum codes
10.4.1 Classical linear codes
10.4.2 Calderbank–Shor–Steane codes
10.5 Stabilizer codes
10.5.1 The stabilizer formalism
10.5.2 Unitary gates and the stabilizer formalism
10.5.3 Measurement in the stabilizer formalism
10.5.4 The Gottesman–Knill theorem
10.5.5 Stabilizer code constructions
10.5.6 Examples
10.5.7 Standard form for a stabilizer code
10.5.8 Quantum circuits for encoding, decoding, and correction
10.6 Fault-tolerant quantum computation
10.6.1 Fault-tolerance: the big picture
10.6.2 Fault-tolerant quantum logic
10.6.3 Fault-tolerant measurement
10.6.4 Elements of resilient quantum computation
11 Entropy and information
11.1 Shannon entropy
11.2 Basic properties of entropy
11.2.1 The binary entropy
11.2.2 The relative entropy
11.2.3 Conditional entropy and mutual information
11.2.4 The data processing inequality
11.3 Von Neumann entropy
11.3.1 Quantum relative entropy
11.3.2 Basic properties of entropy
11.3.3 Measurements and entropy
11.3.4 Subadditivity
11.3.5 Concavity of the entropy
11.3.6 The entropy of a mixture of quantum states
11.4 Strong subadditivity
11.4.1 Proof of strong subadditivity
11.4.2 Strong subadditivity: elementary applications
12 Quantum information theory
12.1 Distinguishing quantum states and the accessible information
12.1.1 The Holevo bound
12.1.2 Example applications of the Holevo bound
12.2 Data compression
12.2.1 Shannon’s noiseless channel coding theorem
12.2.2 Schumacher’s quantum noiseless channel coding theorem
12.3 Classical information over noisy quantum channels
12.3.1 Communication over noisy classical channels
12.3.2 Communication over noisy quantum channels
12.4 Quantum information over noisy quantum channels
12.4.1 Entropy exchange and the quantum Fano inequality
12.4.2 The quantum data processing inequality
12.4.3 Quantum Singleton bound
12.4.4 Quantum error-correction, refrigeration and Maxwell’s demon
12.5 Entanglement as a physical resource
12.5.1 Transforming bi-partite pure state entanglement
12.5.2 Entanglement distillation and dilution
12.5.3 Entanglement distillation and quantum error-correction
12.6 Quantum cryptography
12.6.1 Private key cryptography
12.6.2 Privacy amplification and information reconciliation
12.6.3 Quantum key distribution
12.6.4 Privacy and coherent information
12.6.5 The security of quantum key distribution
Appendices
Appendix 1: Notes on basic probability theory
Appendix 2: Group theory
A2.1 Basic definitions
A2.1.1 Generators
A2.1.2 Cyclic groups
A2.1.3 Cosets
A2.2 Representations
A2.2.1 Equivalence and reducibility
A2.2.2 Orthogonality
A2.2.3 The regular representation
A2.3 Fourier transforms
Appendix 3: The Solovay–Kitaev theorem
Appendix 4: Number theory
A4.1 Fundamentals
A4.2 Modular arithmetic and Euclid’s algorithm
A4.3 Reduction of factoring to order-finding
A4.4 Continued fractions
Appendix 5: Public key cryptography and the RSA cryptosystem
Appendix 6: Proof of Lieb’s theorem
Bibliography
Index
Introduction to the Tenth Anniversary Edition
Quantum mechanics has the curious distinction of being simultaneously the most successful and the most mysterious of our scientific theories. It was developed in fits and
starts over a remarkable period from 1900 to the 1920s, maturing into its current form in
the late 1920s. In the decades following the 1920s, physicists had great success applying
quantum mechanics to understand the fundamental particles and forces of nature, culminating in the development of the standard model of particle physics. Over the same
period, physicists had equally great success in applying quantum mechanics to understand
an astonishing range of phenomena in our world, from polymers to semiconductors, from
superfluids to superconductors. But, while these developments profoundly advanced our
understanding of the natural world, they did only a little to improve our understanding
of quantum mechanics.
This began to change in the 1970s and 1980s, when a few pioneers were inspired to
ask whether some of the fundamental questions of computer science and information
theory could be applied to the study of quantum systems. Instead of looking at quantum
systems purely as phenomena to be explained as they are found in nature, they looked at
them as systems that can be designed. This seems a small change in perspective, but the
implications are profound. No longer is the quantum world taken merely as presented,
but instead it can be created. The result was a new perspective that inspired both a
resurgence of interest in the fundamentals of quantum mechanics, and also many new
questions combining physics, computer science, and information theory. These include
questions such as: what are the fundamental physical limitations on the space and time
required to construct a quantum state? How much time and space are required for a given
dynamical operation? What makes quantum systems difficult to understand and simulate
by conventional classical means?
Writing this book in the late 1990s, we were fortunate to be writing at a time when
these and other fundamental questions had just crystallized out. Ten years later it is
clear such questions offer a sustained force encouraging a broad research program at the
foundations of physics and computer science. Quantum information science is here to
stay. Although the theoretical foundations of the field remain similar to what we discussed
10 years ago, detailed knowledge in many areas has greatly progressed. Originally, this book
served as a comprehensive overview of the field, bringing readers near to the forefront
of research. Today, the book provides a basic foundation for understanding the field,
appropriate either for someone who desires a broad perspective on quantum information
science, or an entryway for further investigation of the latest research literature. Of course,
many fundamental challenges remain, and meeting those challenges promises to stimulate
exciting and unexpected links among many disparate parts of physics, computer science,
and information theory. We look forward to the decades ahead!
– Michael A. Nielsen and Isaac L. Chuang, March, 2010.
Afterword to the Tenth Anniversary Edition
An enormous amount has happened in quantum information science in the 10 years since
the first edition of this book, and in this afterword we cannot summarize even a tiny
fraction of that work. But a few especially striking developments merit comment, and may
perhaps whet your appetite for more.
Perhaps the most impressive progress has been in the area of experimental implementation. While we are still many years from building large-scale quantum computers, much
progress has been made. Superconducting circuits have been used to implement simple
two-qubit quantum algorithms, and three-qubit systems are nearly within reach. Qubits
based on nuclear spins and single photons have been used, respectively, to demonstrate
proof-of-principle for simple forms of quantum error correction and quantum simulation.
But the most impressive progress of all has been made with trapped ion systems, which
have been used to implement many two- and three-qubit algorithms and algorithmic
building blocks, including the quantum search algorithm and the quantum Fourier transform. Trapped ions have also been used to demonstrate basic quantum communication
primitives, including quantum error correction and quantum teleportation.
A second area of progress has been in understanding what physical resources are
required to quantum compute. Perhaps the most intriguing breakthrough here has been the
discovery that quantum computation can be done via measurement alone. For many years,
the conventional wisdom was that coherent superposition-preserving unitary dynamics
was an essential part of the power of quantum computers. This conventional wisdom
was blown away by the realization that quantum computation can be done without any
unitary dynamics at all. Instead, in some new models of quantum computation, quantum
measurements alone can be used to do arbitrary quantum computations. The only coherent
resource in these models is quantum memory, i.e., the ability to store quantum information.
An especially interesting example of these models is the one-way quantum computer, or
cluster-state computer. To quantum compute in the cluster-state model requires only
that the experimenter have possession of a fixed universal state known as the cluster state.
With a cluster state in hand, quantum computation can be implemented simply by doing
a sequence of single-qubit measurements, with the particular computation done being
determined by which qubits are measured, when they are measured, and how they are
measured. This is remarkable: you’re given a fixed quantum state, and then quantum
compute by “looking” at the individual qubits in appropriate ways.
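As a rough illustration of computing by measurement (a sketch that is not taken from the book), the Python fragment below simulates the smallest building block commonly used in such schemes: an input qubit is entangled with a fresh qubit in the state |+> by a controlled-Z gate, and the input qubit is then measured in the X basis. Under these assumptions the second qubit is left in the state X^m H |psi>, where m is the measurement outcome, so a Hadamard gate has been applied using only entanglement prepared in advance and a single-qubit measurement. The function names `one_way_step` and `expected_output` are hypothetical, and a real cluster-state computation chains many such steps with adaptively chosen measurement bases.

```python
import math

def one_way_step(psi, m):
    """Hypothetical helper: state of qubit 2 after qubit 1 (in state psi) is
    entangled with |+> by CZ and then measured in the X basis with outcome m."""
    a, b = psi
    s = 1 / math.sqrt(2)
    # Joint amplitudes state[q1][q2] for |psi> (x) |+>.
    state = [[a * s, a * s], [b * s, b * s]]
    # Controlled-Z flips the sign of the |11> amplitude only.
    state[1][1] = -state[1][1]
    # Project qubit 1 onto <+| (m = 0) or <-| (m = 1).
    sign = 1 if m == 0 else -1
    out = [s * (state[0][k] + sign * state[1][k]) for k in range(2)]
    # Renormalize the post-measurement state of qubit 2.
    norm = math.sqrt(sum(abs(x) ** 2 for x in out))
    return [x / norm for x in out]

def expected_output(psi, m):
    """Reference result X^m H |psi>, computed directly from the gate matrices."""
    a, b = psi
    s = 1 / math.sqrt(2)
    h_psi = [s * (a + b), s * (a - b)]         # Hadamard applied to psi
    return h_psi if m == 0 else h_psi[::-1]    # Pauli X swaps the two amplitudes
```

Calling `one_way_step([0.6, 0.8j], m)` for m = 0 and m = 1 and comparing with `expected_output` shows the outcome-dependent Pauli correction that the experimenter has to keep track of.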
A third area of progress has been in classically simulating quantum systems. Feynman’s
pioneering 1982 paper on quantum computing was motivated in part by the observation
that quantum systems often seem hard to simulate on conventional classical computers.
Of course, at the time there was only a limited understanding of how difficult it is
to simulate different quantum systems on ordinary classical computers. But in the 1990s
and, especially, in the 2000s, we have learned much about which quantum systems are easy
to simulate, and which are hard. Ingenious algorithms have been developed to classically
simulate many quantum systems that were formerly thought to be hard to simulate, in
particular, many quantum systems in one spatial dimension, and certain two-dimensional
quantum systems. These classical algorithms have been made possible by the development
of insightful classical descriptions that capture in a compact way much or all of the essential
physics of the system in question. At the same time, we have learned that some systems
that formerly seemed simple are surprisingly complex. For example, it has long been
known that quantum systems based on a certain type of optical component – what are
called linear optical systems – are easily simulated classically. So it was surprising when it
was discovered that adding two seemingly innocuous components – single-photon sources
and photodetectors – gave linear optics the full power of quantum computation. These
and similar investigations have deepened our understanding of which quantum systems
are easy to simulate, which quantum systems are hard to simulate, and why.
A fourth area of progress has been a greatly deepened understanding of quantum
communication channels. A beautiful and complete theory has been developed of how
entangled quantum states can assist classical communication over quantum channels. A
plethora of different quantum protocols for communication have been organized into
a comprehensive family (headed by “mother” and “father” protocols), unifying much
of our understanding of the different types of communication possible with quantum
information. A sign of the progress is the disproof of one of the key unsolved conjectures
reported in this book (p. 554), namely, that the communication capacity of a quantum
channel with product states is equal to the unconstrained capacity (i.e., the capacity with
any entangled state allowed as input). But, despite the progress, much remains beyond
our understanding. Only very recently, for example, it was discovered, to considerable
surprise, that two quantum channels, each with zero quantum capacity, can have a positive
quantum capacity when used together; the analogous result, with classical capacities over
classical channels, is known to be impossible.
One of the main motivations for work in quantum information science is the prospect of
fast quantum algorithms to solve important computational problems. Here, the progress
over the past decade has been mixed. Despite great ingenuity and effort, the chief algorithmic insights stand as they were 10 years ago. There has been considerable technical
progress, but we do not yet understand what exactly it is that makes quantum computers powerful, or on what class of problems they can be expected to outperform classical
computers.
What is exciting, though, is that ideas from quantum computation have been used
to prove a variety of theorems about classical computation. These have included, for
example, results about the difficulty of finding certain hidden vectors in a discrete lattice
of points. The striking feature is that these proofs, utilizing ideas of quantum computation,
are sometimes considerably simpler and more elegant than prior, classical proofs. Thus,
an awareness has grown that quantum computation may be a more natural model of
computation than the classical model, and perhaps fundamental results may be more
easily revealed through the ideas of quantum computation.
Preface
This book provides an introduction to the main ideas and techniques of the field of
quantum computation and quantum information. The rapid rate of progress in this field
and its cross-disciplinary nature have made it difficult for newcomers to obtain a broad
overview of the most important techniques and results of the field.
Our purpose in this book is therefore twofold. First, we introduce the background
material in computer science, mathematics and physics necessary to understand quantum computation and quantum information. This is done at a level comprehensible to
readers with a background at least the equal of a beginning graduate student in one or
more of these three disciplines; the most important requirements are a certain level of
mathematical maturity, and the desire to learn about quantum computation and quantum
information. The second purpose of the book is to develop in detail the central results of
quantum computation and quantum information. With thorough study the reader should
develop a working understanding of the fundamental tools and results of this exciting
field, either as part of their general education, or as a prelude to independent research in
quantum computation and quantum information.
Structure of the book
The basic structure of the book is depicted in Figure 1. The book is divided into three
parts. The general strategy is to proceed from the concrete to the more abstract whenever
possible. Thus we study quantum computation before quantum information; specific
quantum error-correcting codes before the more general results of quantum information
theory; and throughout the book try to introduce examples before developing general
theory.
Part I provides a broad overview of the main ideas and results of the field of quantum computation and quantum information, and develops the background material in
computer science, mathematics and physics necessary to understand quantum computation and quantum information in depth. Chapter 1 is an introductory chapter which
outlines the historical development and fundamental concepts of the field, highlighting
some important open problems along the way. The material has been structured so as
to be accessible even without a background in computer science or physics. The background material needed for a more detailed understanding is developed in Chapters 2
and 3, which treat in depth the fundamental notions of quantum mechanics and computer science, respectively. You may elect to concentrate more or less heavily on different
chapters of Part I, depending upon your background, returning later as necessary to fill
any gaps in your knowledge of the fundamentals of quantum mechanics and computer
science.
Figure 1. Structure of the book. (Part I, Fundamental Concepts: Introduction and Overview; Quantum Mechanics; Computer Science. Part II, Quantum Computation: Quantum Circuits; Quantum Fourier Transform; Quantum Search; Physical Realizations. Part III, Quantum Information: Noise and Quantum Operations; Distance Measures; Quantum Error-Correction; Entropy; Quantum Information Theory.)
Part II describes quantum computation in detail. Chapter 4 describes the fundamental elements needed to perform quantum computation, and presents many elementary
operations which may be used to develop more sophisticated applications of quantum
computation. Chapters 5 and 6 describe the quantum Fourier transform and the quantum
search algorithm, the two fundamental quantum algorithms presently known. Chapter 5
also explains how the quantum Fourier transform may be used to solve the factoring and
discrete logarithm problems, and the importance of these results to cryptography. Chapter 7 describes general design principles and criteria for good physical implementations of
quantum computers, using as examples several realizations which have been successfully
demonstrated in the laboratory.
Part III is about quantum information: what it is, how information is represented and
communicated using quantum states, and how to describe and deal with the corruption of
quantum and classical information. Chapter 8 describes the properties of quantum noise
which are needed to understand real-world quantum information processing, and the
quantum operations formalism, a powerful mathematical tool for understanding quantum noise. Chapter 9 describes distance measures for quantum information which allow
us to make quantitatively precise what it means to say that two items of quantum information are similar. Chapter 10 explains quantum error-correcting codes, which may be
used to protect quantum computations against the effect of noise. An important result in
this chapter is the threshold theorem, which shows that for realistic noise models, noise
is in principle not a serious impediment to quantum computation. Chapter 11 introduces
the fundamental information-theoretic concept of entropy, explaining many properties of
entropy in both classical and quantum information theory. Finally, Chapter 12 discusses
the information carrying properties of quantum states and quantum communication channels, detailing many of the strange and interesting properties such systems can have for
the transmission of information both classical and quantum, and for the transmission of
secret information.
A large number of exercises and problems appear throughout the book. Exercises are
intended to solidify understanding of basic material and appear within the main body of
the text. With few exceptions these should be easily solved with a few minutes' work.
Problems appear at the end of each chapter, and are intended to introduce you to new
and interesting material for which there was not enough space in the main text. Often the
problems are in multiple parts, intended to develop a particular line of thought in some
depth. A few of the problems were unsolved as the book went to press. When this is the
case it is noted in the statement of the problem. Each chapter concludes with a summary
of the main results of the chapter, and with a ‘History and further reading’ section that
charts the development of the main ideas in the chapter, giving citations and references
for the whole chapter, as well as providing recommendations for further reading.
The front matter of the book contains a detailed Table of Contents, which we encourage
you to browse. There is also a guide to nomenclature and notation to assist you as you
read.
The end matter of the book contains six appendices, a bibliography, and an index.
Appendix 1 reviews some basic definitions, notations, and results in elementary probability theory. This material is assumed to be familiar to readers, and is included for ease
of reference. Similarly, Appendix 2 reviews some elementary concepts from group theory,
and is included mainly for convenience. Appendix 3 contains a proof of the Solovay–
Kitaev theorem, an important result for quantum computation, which shows that a finite
set of quantum gates can be used to quickly approximate an arbitrary quantum gate.
Appendix 4 reviews the elementary material on number theory needed to understand
the quantum algorithms for factoring and discrete logarithm, and the RSA cryptosystem,
which is itself reviewed in Appendix 5. Appendix 6 contains a proof of Lieb’s theorem,
one of the most important results in quantum computation and quantum information,
and a precursor to important entropy inequalities such as the celebrated strong subadditivity inequality. The proofs of the Solovay–Kitaev theorem and Lieb’s theorem are
lengthy enough that we felt they justified a treatment apart from the main text.
The bibliography contains a listing of all reference materials cited in the text of the
book. Our apologies to any researcher whose work we have inadvertently omitted from
citation.
The field of quantum computation and quantum information has grown so rapidly in
recent years that we have not been able to cover all topics in as much depth as we would
have liked. Three topics deserve special mention. The first is the subject of entanglement
measures. As we explain in the book, entanglement is a key element in effects such as
quantum teleportation, fast quantum algorithms, and quantum error-correction. It is,
in short, a resource of great utility in quantum computation and quantum information.
There is a thriving research community currently fleshing out the notion of entanglement
as a new type of physical resource, finding principles which govern its manipulation and
utilization. We felt that these investigations, while enormously promising, are not yet
complete enough to warrant the more extensive coverage we have given to other subjects
in this book, and we restrict ourselves to a brief taste in Chapter 12. Similarly, the subject of distributed quantum computation (sometimes known as quantum communication
complexity) is an enormously promising subject under such active development that we
have not given it a treatment for fear of being obsolete before publication of the book.
The implementation of quantum information processing machines has also developed
into a fascinating and rich area, and we limit ourselves to but a single chapter on this
subject. Clearly, much more can be said about physical implementations, but this would
begin to involve many more areas of physics, chemistry, and engineering, which we do
not have room for here.
How to use this book
This book may be used in a wide variety of ways. It can be used as the basis for a variety
of courses, from short lecture courses on a specific topic in quantum computation and
quantum information, through to full-year classes covering the entire field. It can be
used for independent study by people who would like to learn just a little about quantum
computation and quantum information, or by people who would like to be brought up to
the research frontier. It is also intended to act as a reference work for current researchers
in the field. We hope that it will be found especially valuable as an introduction for
researchers new to the field.
Note to the independent reader
The book is designed to be accessible to the independent reader. A large number of exercises are peppered throughout the text, which can be used as self-tests for understanding
of the material in the main text. The Table of Contents and end of chapter summaries
should enable you to quickly determine which chapters you wish to study in most depth.
The dependency diagram, Figure 1, will help you determine in what order material in
the book may be covered.
Note to the teacher
This book covers a diverse range of topics, and can therefore be used as the basis for a
wide variety of courses.
A one-semester course on quantum computation could be based upon a selection of
material from Chapters 1 through 3, depending on the background of the class, followed
by Chapter 4 on quantum circuits, Chapters 5 and 6 on quantum algorithms, and a
selection from Chapter 7 on physical implementations, and Chapters 8 through 10 to
understand quantum error-correction, with an especial focus on Chapter 10.
A one-semester course on quantum information could be based upon a selection of
material from Chapters 1 through 3, depending on the background of the class, followed
by Chapters 8 through 10 on quantum error-correction, and then Chapters 11 and 12
on quantum entropy and quantum information theory, respectively.
A full year class could cover all material in the book, with time for additional readings
selected from the ‘History and further reading’ section of several chapters. Quantum computation and quantum information also lend themselves ideally to independent research
projects for students.
Aside from classes on quantum computation and quantum information, there is another
way we hope the book will be used, which is as the text for an introductory class in quantum mechanics for physics students. Conventional introductions to quantum mechanics
rely heavily on the mathematical machinery of partial differential equations. We believe
this often obscures the fundamental ideas. Quantum computation and quantum information offers an excellent conceptual laboratory for understanding the basic concepts and
unique aspects of quantum mechanics, without the use of heavy mathematical machinery.
Such a class would focus on the introduction to quantum mechanics in Chapter 2, basic
material on quantum circuits in Chapter 4, a selection of material on quantum algorithms
from Chapters 5 and 6, Chapter 7 on physical implementations of quantum computation,
and then almost any selection of material from Part III of the book, depending upon
taste.
Note to the student
We have written the book to be as self-contained as possible. The main exception is that
occasionally we have omitted arguments that one really needs to work through oneself
to believe; these are usually given as exercises. Let us suggest that you should at least
attempt all the exercises as you work through the book. With few exceptions the exercises
can be worked out in a few minutes. If you are having a lot of difficulty with many of
the exercises it may be a sign that you need to go back and pick up one or more key
concepts.
Further reading
As already noted, each chapter concludes with a ‘History and further reading’ section.
There are also a few broad-ranging references that might be of interest to readers.
Preskill’s[Pre98b] superb lecture notes approach quantum computation and quantum information from a somewhat different point of view than this book. Good overview articles on
specific subjects include (in order of their appearance in this book): Aharonov’s review of
quantum computation[Aha99b], Kitaev’s review of algorithms and error-correction[Kit97b],
Mosca’s thesis on quantum algorithms[Mos99], Fuchs’ thesis[Fuc96] on distinguishability
and distance measures in quantum information, Gottesman’s thesis on quantum error-correction[Got97], Preskill’s review of quantum error-correction[Pre97], Nielsen’s thesis on
quantum information theory[Nie98], and the reviews of quantum information theory by
Bennett and Shor[BS98] and by Bennett and DiVincenzo[BD00]. Other useful references
include Gruska’s book[Gru99], and the collection of review articles edited by Lo, Spiller,
and Popescu[LSP98].
Errors
Any lengthy document contains errors and omissions, and this book is surely no exception
to the rule. If you find any errors or have other comments to make about the book,
please email them to: qci@squint.org. As errata are found, we will add them to a list
maintained at the book web site: http://www.squint.org/qci/.
Acknowledgements
A few people have decisively influenced how we think about quantum computation and
quantum information. For many enjoyable discussions which have helped us shape and
refine our views, MAN thanks Carl Caves, Chris Fuchs, Gerard Milburn, John Preskill
and Ben Schumacher, and ILC thanks Tom Cover, Umesh Vazirani, Yoshi Yamamoto,
and Bernie Yurke.
An enormous number of people have helped in the construction of this book, both
directly and indirectly. A partial list includes Dorit Aharonov, Andris Ambainis, Nabil
Amer, Howard Barnum, Dave Beckman, Harry Buhrman, the Caltech Quantum Optics
Foosballers, Andrew Childs, Fred Chong, Richard Cleve, John Conway, John Cortese,
Michael DeShazo, Ronald de Wolf, David DiVincenzo, Steven van Enk, Henry Everitt,
Ron Fagin, Mike Freedman, Michael Gagen, Neil Gershenfeld, Daniel Gottesman, Jim
Harris, Alexander Holevo, Andrew Huibers, Julia Kempe, Alesha Kitaev, Manny Knill,
Shing Kong, Raymond Laflamme, Andrew Landahl, Ron Legere, Debbie Leung, Daniel
Lidar, Elliott Lieb, Theresa Lynn, Hideo Mabuchi, Yu Manin, Mike Mosca, Alex Pines,
Sridhar Rajagopalan, Bill Risk, Beth Ruskai, Sara Schneider, Robert Schrader, Peter
Shor, Sheri Stoll, Volker Strassen, Armin Uhlmann, Lieven Vandersypen, Anne Verhulst, Debby Wallach, Mike Westmoreland, Dave Wineland, Howard Wiseman, John
Yard, Xinlan Zhou, and Wojtek Zurek.
Thanks to the folks at Cambridge University Press for their help turning this book
from an idea into reality. Our especial thanks go to our thoughtful and enthusiastic
editor Simon Capelin, who shepherded this project along for more than three years, and
to Margaret Patterson, for her timely and thorough copy-editing of the manuscript.
Parts of this book were completed while MAN was a Tolman Prize Fellow at the
California Institute of Technology, a member of the T-6 Theoretical Astrophysics Group
at the Los Alamos National Laboratory, and a member of the University of New Mexico
Center for Advanced Studies, and while ILC was a Research Staff Member at the IBM
Almaden Research Center, a consulting Assistant Professor of Electrical Engineering
at Stanford University, a visiting researcher at the University of California Berkeley
Department of Computer Science, a member of the Los Alamos National Laboratory T-6
Theoretical Astrophysics Group, and a visiting researcher at the University of California
Santa Barbara Institute for Theoretical Physics. We also appreciate the warmth and
hospitality of the Aspen Center for Physics, where the final page proofs of this book were
finished.
MAN and ILC gratefully acknowledge support from DARPA under the NMRQC
research initiative and the QUIC Institute administered by the Army Research Office.
We also thank the National Science Foundation, the National Security Agency, the Office
of Naval Research, and IBM for their generous support.
Nomenclature and notation
There are several items of nomenclature and notation which have two or more meanings in
common use in the field of quantum computation and quantum information. To prevent
confusion from arising, this section collects many of the more frequently used of these
items, together with the conventions that will be adhered to in this book.
Linear algebra and quantum mechanics
All vector spaces are assumed to be finite dimensional, unless otherwise noted. In many
instances this restriction is unnecessary, or can be removed with some additional technical
work, but making the restriction globally makes the presentation more easily comprehensible, and doesn’t detract much from many of the intended applications of the results.
A positive operator A is one for which $\langle\psi|A|\psi\rangle \geq 0$ for all $|\psi\rangle$. A positive definite
operator A is one for which $\langle\psi|A|\psi\rangle > 0$ for all $|\psi\rangle \neq 0$. The support of an operator
is defined to be the vector space orthogonal to its kernel. For a Hermitian operator, this
means the vector space spanned by eigenvectors of the operator with non-zero eigenvalues.
The notation U (and often but not always V ) will generically be used to denote a unitary
operator or matrix. H is usually used to denote a quantum logic gate, the Hadamard
gate, and sometimes to denote the Hamiltonian for a quantum system, with the meaning
clear from context.
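The definitions of positive operator, positive definite operator, and support can be checked numerically in the simplest case. The following is a minimal sketch, assuming the operator is a 2×2 Hermitian matrix stored as nested Python lists; the function names (`eigenvalues_2x2_hermitian`, `expectation`, `is_positive`, `is_positive_definite`, `support_dimension`) are our own illustrative choices and do not appear elsewhere in the book. It uses the fact that a Hermitian matrix is positive exactly when its eigenvalues are non-negative.

```python
import math

def eigenvalues_2x2_hermitian(a):
    """Return the two real eigenvalues (ascending) of a 2x2 Hermitian matrix."""
    p = complex(a[0][0]).real
    r = complex(a[1][1]).real
    q = complex(a[0][1])
    mean = (p + r) / 2
    radius = math.sqrt(((p - r) / 2) ** 2 + abs(q) ** 2)
    return (mean - radius, mean + radius)

def expectation(a, psi):
    """Compute <psi|A|psi> for a 2x2 matrix a and a 2-component vector psi."""
    a_psi = [sum(a[i][j] * psi[j] for j in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * a_psi[i] for i in range(2))

def is_positive(a, tol=1e-12):
    """Positive: <psi|A|psi> >= 0 for all psi, i.e. smallest eigenvalue >= 0."""
    return eigenvalues_2x2_hermitian(a)[0] >= -tol

def is_positive_definite(a, tol=1e-12):
    """Positive definite: <psi|A|psi> > 0 for all non-zero psi."""
    return eigenvalues_2x2_hermitian(a)[0] > tol

def support_dimension(a, tol=1e-12):
    """Dimension of the support: number of non-zero eigenvalues (Hermitian case)."""
    return sum(1 for ev in eigenvalues_2x2_hermitian(a) if abs(ev) > tol)

# Illustrative operators (hypothetical names chosen for this sketch).
IDENTITY = [[1, 0], [0, 1]]
PROJECTOR_0 = [[1, 0], [0, 0]]   # projector onto the vector (1, 0)
PAULI_Z = [[1, 0], [0, -1]]
```

With these definitions, `PROJECTOR_0` is positive but not positive definite and has a one-dimensional support, while `PAULI_Z` is not positive because `expectation(PAULI_Z, [0, 1])` is negative.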
Vectors will sometimes be written in column format, as for example,
\[ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \tag{0.1} \]
and sometimes for readability in the format (1, 2). The latter should be understood as
shorthand for a column vector. For two-level quantum systems used as qubits, we shall
usually identify the state $|0\rangle$ with the vector (1, 0), and similarly $|1\rangle$ with (0, 1). We also
define the Pauli sigma matrices in the conventional way – see ‘Frequently used quantum
gates and circuit symbols’, below. Most significantly, the convention for the Pauli sigma
z matrix is that $\sigma_z|0\rangle = |0\rangle$ and $\sigma_z|1\rangle = -|1\rangle$, which is the reverse of what some physicists
(but usually not computer scientists or mathematicians) intuitively expect. The origin
of this dissonance is that the +1 eigenstate of $\sigma_z$ is often identified by physicists with a
so-called ‘excited state’, and it seems natural to many to identify this with $|1\rangle$, rather than
with $|0\rangle$ as is done in this book. Our choice is made in order to be consistent with the
usual indexing of matrix elements in linear algebra, which makes it natural to identify the
first column of $\sigma_z$ with the action of $\sigma_z$ on $|0\rangle$, and the second column with the action
on $|1\rangle$. This choice is also in use throughout the quantum computation and quantum
information community. In addition to the conventional notations $\sigma_x$, $\sigma_y$ and $\sigma_z$ for the
Pauli sigma matrices, it will also be convenient to use the notations $\sigma_1$, $\sigma_2$, $\sigma_3$ for these
three matrices, and to define $\sigma_0$ as the 2×2 identity matrix. Most often, however, we use
the notations I, X, Y and Z for $\sigma_0$, $\sigma_1$, $\sigma_2$ and $\sigma_3$, respectively.
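The column convention just described can be made concrete with a short sketch. It assumes matrices are stored as nested Python lists and state vectors as flat lists; the helper names `apply` and `column`, and the list `SIGMA`, are hypothetical and introduced only for illustration.

```python
# Pauli matrices in the convention of this section.
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

# sigma_0, sigma_1, sigma_2, sigma_3 correspond to I, X, Y, Z.
SIGMA = [I, X, Y, Z]

# Computational basis states as column vectors written in (a, b) shorthand.
KET0 = [1, 0]
KET1 = [0, 1]

def apply(m, v):
    """Matrix-vector product m|v>."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def column(m, j):
    """Return column j of matrix m as a list."""
    return [row[j] for row in m]

# Z leaves |0> alone and flips the sign of |1>.
z_on_0 = apply(Z, KET0)   # equals KET0
z_on_1 = apply(Z, KET1)   # equals minus KET1

# The first column of Z is the action of Z on |0>, the second its action on |1>.
first_column_matches = column(Z, 0) == z_on_0
second_column_matches = column(Z, 1) == z_on_1
```

The same check works for any of the four matrices, since column j of a matrix is always its action on the j-th basis vector.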
Information theory and probability
As befits good information theorists, logarithms are always taken to base two, unless
otherwise noted. We use log(x) to denote logarithms to base 2, and ln(x) on those rare
occasions when we wish to take a natural logarithm. The term probability distribution
is used to refer to a finite set of real numbers, $p_x$, such that $p_x \geq 0$ and $\sum_x p_x = 1$. The
relative entropy of a positive operator A with respect to a positive operator B is defined
by $S(A\|B) \equiv \mathrm{tr}(A \log A) - \mathrm{tr}(A \log B)$.
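As a hedged illustration of these conventions, the sketch below evaluates the relative entropy in the special case where A and B are diagonal in the same basis, so that the traces reduce to sums over the diagonal entries. The function names are hypothetical, and the handling of zero entries (treating 0 log 0 as 0, and returning infinity when B vanishes where A does not) is a common convention that we assume here rather than one stated in this section.

```python
import math

def is_probability_distribution(p, tol=1e-12):
    """True if p is a finite list of reals with p_x >= 0 and sum_x p_x = 1."""
    return all(px >= 0 for px in p) and abs(sum(p) - 1) <= tol

def relative_entropy_diagonal(p, q):
    """S(A||B) = tr(A log A) - tr(A log B) for A = diag(p), B = diag(q).

    Logarithms are taken to base two. Assumed conventions: terms with
    p_x = 0 contribute nothing, and the result is infinite if q_x = 0
    for some x with p_x > 0.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total += px * (math.log2(px) - math.log2(qx))
    return total

uniform = [0.5, 0.5]
certain = [1.0, 0.0]
```

For example, `relative_entropy_diagonal(certain, uniform)` evaluates to one bit, while `relative_entropy_diagonal(uniform, certain)` is infinite under the assumed convention.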
Miscellanea
⊕ denotes modulo two addition. Throughout this book ‘z’ is pronounced ‘zed’.
Frequently used quantum gates and circuit symbols
Certain schematic symbols are often used to denote unitary transforms which are useful in
the design of quantum circuits. For the reader’s convenience, many of these are gathered
together below. The rows and columns of the unitary transforms are labeled from left to
right and top to bottom as 00 . . . 0, 00 . . . 1 to 11 . . . 1 with the bottom-most wire being
the least significant bit. Note that $e^{i\pi/4}$ is the square root of i, so that the π/8 gate is the
square root of the phase gate, which itself is the square root of the Pauli-Z gate.
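The chain of square roots noted above can be verified numerically. This is a minimal sketch using only the Python standard library; the names `matmul2`, `close`, `PHASE` and `PI_OVER_8` are our own labels for the 2×2 matrices of the phase and π/8 gates.

```python
import cmath

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to rounding error."""
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))

Z = [[1, 0], [0, -1]]
PHASE = [[1, 0], [0, 1j]]
PI_OVER_8 = [[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]]

# e^{i pi/4} squared is i (up to floating-point rounding).
root_of_i_ok = abs(cmath.exp(1j * cmath.pi / 4) ** 2 - 1j) <= 1e-12

# (pi/8 gate)^2 = phase gate, and (phase gate)^2 = Pauli-Z.
pi8_squared_is_phase = close(matmul2(PI_OVER_8, PI_OVER_8), PHASE)
phase_squared_is_z = close(matmul2(PHASE, PHASE), Z)
```

A tolerance is used because the exponential is computed in floating point, so the equalities hold only up to rounding error.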
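Hadamard: $\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
Pauli-X: $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
Pauli-Y: $\begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}$
Pauli-Z: $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$
Phase: $\begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}$
π/8: $\begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{bmatrix}$
controlled-NOT: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$
swap: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$
controlled-Z: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}$
controlled-phase: $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & i \end{bmatrix}$
Toffoli: $\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}$
Fredkin (controlled-swap): $\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$
measurement: Projection onto $|0\rangle$ and $|1\rangle$
qubit: wire carrying a single qubit (time goes left to right)
classical bit: wire carrying a single classical bit
n qubits: wire carrying n qubits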
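The labeling convention for rows and columns (00 . . . 0, 00 . . . 1 to 11 . . . 1, with the bottom-most wire as the least significant bit) determines the matrices of the controlled-NOT, swap, Toffoli and Fredkin gates once their action on classical bit strings is fixed. The sketch below builds those matrices from that action. It assumes the control wires are the top-most ones, and the function names (`permutation_matrix`, `cnot`, `swap`, `toffoli`, `fredkin`) are hypothetical helpers rather than notation used elsewhere in the book.

```python
from itertools import product

def permutation_matrix(n, f):
    """Matrix of the n-qubit gate sending basis state |bits> to |f(bits)>.

    bits is a tuple ordered top wire first, so the bottom wire is the
    least significant bit of the row and column labels.
    """
    basis = list(product((0, 1), repeat=n))        # 00...0, 00...1, ..., 11...1
    index = {bits: k for k, bits in enumerate(basis)}
    dim = len(basis)
    m = [[0] * dim for _ in range(dim)]
    for col, bits in enumerate(basis):
        m[index[tuple(f(bits))]][col] = 1          # column = image of basis state
    return m

def cnot(bits):
    control, target = bits
    return (control, target ^ control)             # ^ is modulo two addition

def swap(bits):
    a, b = bits
    return (b, a)

def toffoli(bits):
    c1, c2, target = bits
    return (c1, c2, target ^ (c1 & c2))

def fredkin(bits):
    control, a, b = bits
    return (control, b, a) if control else (control, a, b)

CNOT = permutation_matrix(2, cnot)
SWAP = permutation_matrix(2, swap)
TOFFOLI = permutation_matrix(3, toffoli)
FREDKIN = permutation_matrix(3, fredkin)
```

Building each column as the image of a basis state mirrors the indexing convention used for the Pauli matrices, and makes it easy to confirm the listed matrices entry by entry.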
I Fundamental concepts
1 Introduction and overview
Science offers the boldest metaphysics of the age. It is a thoroughly human
construct, driven by the faith that if we dream, press to discover, explain, and
dream again, thereby plunging repeatedly into new terrain, the world will somehow come clearer and we will grasp the true strangeness of the universe. And
the strangeness will all prove to be connected, and make sense.
– Edward O. Wilson
Information is physical.
– Rolf Landauer
What are the fundamental concepts of quantum computation and quantum information?
How did these concepts develop? To what uses may they be put? How will they be presented in this book? The purpose of this introductory chapter is to answer these questions
by developing in broad brushstrokes a picture of the field of quantum computation and
quantum information. The intent is to communicate a basic understanding of the central
concepts of the field, to give perspective on how they have been developed, and to help you
decide how to approach the rest of the book.
Our story begins in Section 1.1 with an account of the historical context in which
quantum computation and quantum information has developed. Each remaining section
in the chapter gives a brief introduction to one or more fundamental concepts from the
field: quantum bits (Section 1.2), quantum computers, quantum gates and quantum circuits (Section 1.3), quantum algorithms (Section 1.4), experimental quantum information
processing (Section 1.5), and quantum information and communication (Section 1.6).
Along the way, illustrative and easily accessible developments such as quantum teleportation and some simple quantum algorithms are given, using the basic mathematics
taught in this chapter. The presentation is self-contained, and designed to be accessible
even without a background in computer science or physics. As we move along, we give
pointers to more in-depth discussions in later chapters, where references and suggestions
for further reading may also be found.
If as you read you’re finding the going rough, skip on to a spot where you feel more
comfortable. At points we haven’t been able to avoid using a little technical lingo which
won’t be completely explained until later in the book. Simply accept it for now, and come
back later when you understand all the terminology in more detail. The emphasis in this
first chapter is on the big picture, with the details to be filled in later.
1.1 Global perspectives
Quantum computation and quantum information is the study of the information processing tasks that can be accomplished using quantum mechanical systems. Sounds pretty
simple and obvious, doesn’t it? Like many simple but profound ideas it was a long time
before anybody thought of doing information processing using quantum mechanical systems. To see why this is the case, we must go back in time and look in turn at each
of the fields which have contributed fundamental ideas to quantum computation and
quantum information – quantum mechanics, computer science, information theory, and
cryptography. As we take our short historical tour of these fields, think of yourself first
as a physicist, then as a computer scientist, then as an information theorist, and finally
as a cryptographer, in order to get some feel for the disparate perspectives which have
come together in quantum computation and quantum information.
1.1.1 History of quantum computation and quantum information
Our story begins at the turn of the twentieth century when an unheralded revolution was
underway in science. A series of crises had arisen in physics. The problem was that the
theories of physics at that time (now dubbed classical physics) were predicting absurdities
such as the existence of an ‘ultraviolet catastrophe’ involving infinite energies, or electrons
spiraling inexorably into the atomic nucleus. At first such problems were resolved with
the addition of ad hoc hypotheses to classical physics, but as a better understanding
of atoms and radiation was gained these attempted explanations became more and more
convoluted. The crisis came to a head in the early 1920s after a quarter century of turmoil,
and resulted in the creation of the modern theory of quantum mechanics. Quantum
mechanics has been an indispensable part of science ever since, and has been applied
with enormous success to everything under and inside the Sun, including the structure
of the atom, nuclear fusion in stars, superconductors, the structure of DNA, and the
elementary particles of Nature.
What is quantum mechanics? Quantum mechanics is a mathematical framework or set
of rules for the construction of physical theories. For example, there is a physical theory
known as quantum electrodynamics which describes with fantastic accuracy the interaction of atoms and light. Quantum electrodynamics is built up within the framework of
quantum mechanics, but it contains specific rules not determined by quantum mechanics.
The relationship of quantum mechanics to specific physical theories like quantum electrodynamics is rather like the relationship of a computer’s operating system to specific
applications software – the operating system sets certain basic parameters and modes of
operation, but leaves open how specific tasks are accomplished by the applications.
The rules of quantum mechanics are simple but even experts find them counterintuitive, and the earliest antecedents of quantum computation and quantum information
may be found in the long-standing desire of physicists to better understand quantum
mechanics. The best known critic of quantum mechanics, Albert Einstein, went to his
grave unreconciled with the theory he helped invent. Generations of physicists since have
wrestled with quantum mechanics in an effort to make its predictions more palatable.
One of the goals of quantum computation and quantum information is to develop tools
which sharpen our intuition about quantum mechanics, and make its predictions more
transparent to human minds.
For example, in the early 1980s, interest arose in whether it might be possible to use
quantum effects to signal faster than light – a big no-no according to Einstein’s theory of
relativity. The resolution of this problem turns out to hinge on whether it is possible to
clone an unknown quantum state, that is, construct a copy of a quantum state. If cloning
were possible, then it would be possible to signal faster than light using quantum effects.
However, cloning – so easy to accomplish with classical information (consider the words
in front of you, and where they came from!) – turns out not to be possible in general in
quantum mechanics. This no-cloning theorem, discovered in the early 1980s, is one of
the earliest results of quantum computation and quantum information. Many refinements
of the no-cloning theorem have since been developed, and we now have conceptual tools
which allow us to understand how well a (necessarily imperfect) quantum cloning device
might work. These tools, in turn, have been applied to understand other aspects of
quantum mechanics.
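One way to get a feel for why cloning fails is to try a candidate copying device on a few inputs. The sketch below is an illustration under our own assumptions, not a proof of the no-cloning theorem: it takes a controlled-NOT gate acting on an input qubit and a blank qubit in the state |0⟩ as the candidate copier, and the names `attempt_clone`, `kron`, `apply` and `close` are hypothetical helpers. The device copies the two basis states correctly, yet fails on an equal superposition of them.

```python
import math

# Controlled-NOT with the first (top) qubit as control.
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def kron(u, v):
    """Tensor product of two state vectors."""
    return [a * b for a in u for b in v]

def apply(m, v):
    """Matrix-vector product."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def close(u, v, tol=1e-12):
    """Entrywise comparison of two vectors up to rounding error."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))

def attempt_clone(psi):
    """Feed |psi>|0> into the candidate copier and return the two-qubit output."""
    return apply(CNOT, kron(psi, [1, 0]))

KET0 = [1, 0]
KET1 = [0, 1]
PLUS = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # equal superposition of |0> and |1>

copies_ket0 = close(attempt_clone(KET0), kron(KET0, KET0))   # True
copies_ket1 = close(attempt_clone(KET1), kron(KET1, KET1))   # True
copies_plus = close(attempt_clone(PLUS), kron(PLUS, PLUS))   # False
```

The failure on the superposition reflects linearity: once the device's action on the two basis states is fixed, its output on their superposition is fixed as well, and that output is not two independent copies of the input.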
A related historical strand contributing to the development of quantum computation
and quantum information is the interest, dating to the 1970s, in obtaining complete control over single quantum systems. Applications of quantum mechanics prior to the 1970s
typically involved a gross level of control over a bulk sample containing an enormous
number of quantum mechanical systems, none of them directly accessible. For example,
superconductivity has a superb quantum mechanical explanation. However, because a superconductor involves a huge (compared to the atomic scale) sample of conducting metal,
we can only probe a few aspects of its quantum mechanical nature, with the individual
quantum systems constituting the superconductor remaining inaccessible. Systems such
as particle accelerators do allow limited access to individual quantum systems, but again
provide little control over the constituent systems.
Since the 1970s many techniques for controlling single quantum systems have been
developed. For example, methods have been developed for trapping a single atom in an
‘atom trap’, isolating it from the rest of the world and allowing us to probe many different
aspects of its behavior with incredible precision. The scanning tunneling microscope
has been used to move single atoms around, creating designer arrays of atoms at will.
Electronic devices whose operation involves the transfer of only single electrons have
been demonstrated.
Why all this effort to attain complete control over single quantum systems? Setting
aside the many technological reasons and concentrating on pure science, the principal
answer is that researchers have done this on a hunch. Often the most profound insights
in science come when we develop a method for probing a new regime of Nature. For
example, the invention of radio astronomy in the 1930s and 1940s led to a spectacular
sequence of discoveries, including the galactic core of the Milky Way galaxy, pulsars, and
quasars. Low temperature physics has achieved its amazing successes by finding ways to
lower the temperatures of different systems. In a similar way, by obtaining complete
control over single quantum systems, we are exploring untouched regimes of Nature in
the hope of discovering new and unexpected phenomena. We are just now taking our first
steps along these lines, and already a few interesting surprises have been discovered in
this regime. What else shall we discover as we obtain more complete control over single
quantum systems, and extend it to more complex systems?
Quantum computation and quantum information fit naturally into this program. They
provide a useful series of challenges at varied levels of difficulty for people devising
methods to better manipulate single quantum systems, and stimulate the development of
new experimental techniques and provide guidance as to the most interesting directions
in which to take experiment. Conversely, the ability to control single quantum systems
is essential if we are to harness the power of quantum mechanics for applications to
quantum computation and quantum information.
Despite this intense interest, efforts to build quantum information processing systems
have resulted in modest success to date. Small quantum computers, capable of doing
dozens of operations on a few quantum bits (or qubits), represent the state of the art in
quantum computation. Experimental prototypes for doing quantum cryptography – a
way of communicating in secret across long distances – have been demonstrated, and are
even at the level where they may be useful for some real-world applications. However, it
remains a great challenge to physicists and engineers of the future to develop techniques
for making large-scale quantum information processing a reality.
Let us turn our attention from quantum mechanics to another of the great intellectual
triumphs of the twentieth century, computer science. The origins of computer science
are lost in the depths of history. For example, cuneiform tablets indicate that by the time
of Hammurabi (circa 1750 B.C.) the Babylonians had developed some fairly sophisticated
algorithmic ideas, and it is likely that many of those ideas date to even earlier times.
The modern incarnation of computer science was announced by the great mathematician Alan Turing in a remarkable 1936 paper. Turing developed in detail an abstract
notion of what we would now call a programmable computer, a model for computation
now known as the Turing machine, in his honor. Turing showed that there is a Universal
Turing Machine that can be used to simulate any other Turing machine. Furthermore,
he claimed that the Universal Turing Machine completely captures what it means to perform a task by algorithmic means. That is, if an algorithm can be performed on any piece
of hardware (say, a modern personal computer), then there is an equivalent algorithm
for a Universal Turing Machine which performs exactly the same task as the algorithm
running on the personal computer. This assertion, known as the Church–Turing thesis
in honor of Turing and another pioneer of computer science, Alonzo Church, asserts the
equivalence between the physical concept of what class of algorithms can be performed
on some physical device with the rigorous mathematical concept of a Universal Turing
Machine. The broad acceptance of this thesis laid the foundation for the development of
a rich theory of computer science.
Not long after Turing’s paper, the first computers constructed from electronic components were developed. John von Neumann developed a simple theoretical model for
how to put together in a practical fashion all the components necessary for a computer
to be fully as capable as a Universal Turing Machine. Hardware development truly took
off, though, in 1947, when John Bardeen, Walter Brattain, and Will Shockley developed
the transistor. Computer hardware has grown in power at an amazing pace ever since, so
much so that the growth was codified by Gordon Moore in 1965 in what has come to be
known as Moore’s law, which states that computer power will double for constant cost
roughly once every two years.
Amazingly enough, Moore’s law has approximately held true in the decades since
the 1960s. Nevertheless, most observers expect that this dream run will end some time
during the first two decades of the twenty-first century. Conventional approaches to
the fabrication of computer technology are beginning to run up against fundamental
difficulties of size. Quantum effects are beginning to interfere in the functioning of
electronic devices as they are made smaller and smaller.
One possible solution to the problem posed by the eventual failure of Moore’s law
is to move to a different computing paradigm. One such paradigm is provided by the
theory of quantum computation, which is based on the idea of using quantum mechanics
to perform computations, instead of classical physics. It turns out that while an ordinary
computer can be used to simulate a quantum computer, it appears to be impossible to
perform the simulation in an efficient fashion. Thus quantum computers offer an essential
speed advantage over classical computers. This speed advantage is so significant that many
researchers believe that no conceivable amount of progress in classical computation would
be able to overcome the gap between the power of a classical computer and the power of
a quantum computer.
What do we mean by ‘efficient’ versus ‘inefficient’ simulations of a quantum computer?
Many of the key notions needed to answer this question were actually invented before
the notion of a quantum computer had even arisen. In particular, the idea of efficient
and inefficient algorithms was made mathematically precise by the field of computational
complexity. Roughly speaking, an efficient algorithm is one which runs in time polynomial
in the size of the problem solved. In contrast, an inefficient algorithm requires superpolynomial (typically exponential) time. What was noticed in the late 1960s and early
1970s was that it seemed as though the Turing machine model of computation was at
least as powerful as any other model of computation, in the sense that a problem which
could be solved efficiently in some model of computation could also be solved efficiently
in the Turing machine model, by using the Turing machine to simulate the other model
of computation. This observation was codified into a strengthened version of the Church–
Turing thesis:
Any algorithmic process can be simulated efficiently using a Turing machine.
The key strengthening in the strong Church–Turing thesis is the word efficiently. If
the strong Church–Turing thesis is correct, then it implies that no matter what type of
machine we use to perform our algorithms, that machine can be simulated efficiently
using a standard Turing machine. This is an important strengthening, as it implies that
for the purposes of analyzing whether a given computational task can be accomplished
efficiently, we may restrict ourselves to the analysis of the Turing machine model of
computation.
One class of challenges to the strong Church–Turing thesis comes from the field of
analog computation. In the years since Turing, many different teams of researchers have
noticed that certain types of analog computers can efficiently solve problems believed to
have no efficient solution on a Turing machine. At first glance these analog computers
appear to violate the strong form of the Church–Turing thesis. Unfortunately for analog
computation, it turns out that when realistic assumptions about the presence of noise in
analog computers are made, their power disappears in all known instances; they cannot
efficiently solve problems which are not efficiently solvable on a Turing machine. This
lesson – that the effects of realistic noise must be taken into account in evaluating the
efficiency of a computational model – was one of the great early challenges of quantum
computation and quantum information, a challenge successfully met by the development
of a theory of quantum error-correcting codes and fault-tolerant quantum computation.
Thus, unlike analog computation, quantum computation can in principle tolerate a finite
amount of noise and still retain its computational advantages.
The first major challenge to the strong Church–Turing thesis arose in the mid 1970s,
when Robert Solovay and Volker Strassen showed that it is possible to test whether an integer is prime or composite using a randomized algorithm. That is, the Solovay–Strassen
test for primality used randomness as an essential part of the algorithm. The algorithm
did not determine whether a given integer was prime or composite with certainty. Instead,
the algorithm could determine that a number was probably prime or else composite with
certainty. By repeating the Solovay–Strassen test a few times it is possible to determine
with near certainty whether a number is prime or composite. The Solovay–Strassen test
was of especial significance at the time it was proposed as no deterministic test for primality was then known, nor is one known at the time of this writing. Thus, it seemed as
though computers with access to a random number generator would be able to efficiently
perform computational tasks with no efficient solution on a conventional deterministic
Turing machine. This discovery inspired a search for other randomized algorithms which
has paid off handsomely, with the field blossoming into a thriving area of research.
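To make the idea of a randomized primality test concrete, here is a minimal Python sketch in the spirit of the Solovay–Strassen test. It is an illustration under stated assumptions rather than a reproduction of the original paper: the function names `jacobi` and `solovay_strassen`, the default of 20 rounds, and the use of Python's `random` module are our own choices. Each round picks a random base and compares Euler's criterion against the Jacobi symbol; a mismatch proves the number composite, while agreement in every round means only that the number is probably prime.

```python
import random


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by the standard reciprocity loop."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def solovay_strassen(n, rounds=20, rng=random):
    """Return False if n is certainly composite, True if n is probably prime."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)      # random base with 2 <= a <= n - 2
        j = jacobi(a, n)
        # For a prime n, a^((n-1)/2) is congruent to (a/n) modulo n.
        if j == 0 or pow(a, (n - 1) // 2, n) != j % n:
            return False                 # a witnesses that n is composite
    return True                          # no witness found: probably prime
```

For a composite number each round finds a witness with probability at least one half, so the chance of a wrong "probably prime" answer after 20 independent rounds is at most 2^-20. A prime number is never reported as composite.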
Randomized algorithms pose a challenge to the strong Church–Turing thesis, suggesting that there are efficiently soluble problems which, nevertheless, cannot be efficiently
solved on a deterministic Turing machine. This challenge appears to be easily resolved
by a simple modification of the strong Church–Turing thesis:
Any algorithmic process can be simulated efficiently using a
probabilistic Turing machine.
This ad hoc modification of the strong Church–Turing thesis should leave you feeling
rather queasy. Might it not turn out at some later date that yet another model of computation allows one to efficiently solve problems that are not efficiently soluble within Turing’s
model of computation? Is there any way we can find a single model of computation which
is guaranteed to be able to efficiently simulate any other model of computation?
Motivated by this question, in 1985 David Deutsch asked whether the laws of physics
could be used to derive an even stronger version of the Church–Turing thesis. Instead of
adopting ad hoc hypotheses, Deutsch looked to physical theory to provide a foundation
for the Church–Turing thesis that would be as secure as the status of that physical theory.
In particular, Deutsch attempted to define a computational device that would be capable
of efficiently simulating an arbitrary physical system. Because the laws of physics are
ultimately quantum mechanical, Deutsch was naturally led to consider computing devices
based upon the principles of quantum mechanics. These devices, quantum analogues of
the machines defined forty-nine years earlier by Turing, led ultimately to the modern
conception of a quantum computer used in this book.
At the time of writing it is not clear whether Deutsch’s notion of a Universal Quantum Computer is sufficient to efficiently simulate an arbitrary physical system. Proving
or refuting this conjecture is one of the great open problems of the field of quantum
computation and quantum information. It is possible, for example, that some effect of
quantum field theory or an even more esoteric effect based in string theory, quantum
gravity or some other physical theory may take us beyond Deutsch’s Universal Quantum Computer, giving us a still more powerful model for computation. At this stage, we
simply don’t know.
What Deutsch’s model of a quantum computer did enable was a challenge to the strong
form of the Church–Turing thesis. Deutsch asked whether it is possible for a quantum
computer to efficiently solve computational problems which have no efficient solution on
a classical computer, even a probabilistic Turing machine. He then constructed a simple
example suggesting that, indeed, quantum computers might have computational powers
exceeding those of classical computers.
This remarkable first step taken by Deutsch was improved in the subsequent decade
by many people, culminating in Peter Shor’s 1994 demonstration that two enormously
important problems – the problem of finding the prime factors of an integer, and the
so-called ‘discrete logarithm’ problem – could be solved efficiently on a quantum computer.
This attracted widespread interest because these two problems were and still are widely
believed to have no efficient solution on a classical computer. Shor’s results are a powerful indication that quantum computers are more powerful than Turing machines, even
probabilistic Turing machines. Further evidence for the power of quantum computers
came in 1995 when Lov Grover showed that another important problem – the problem of
conducting a search through some unstructured search space – could also be sped up on
a quantum computer. While Grover’s algorithm did not provide as spectacular a speedup as Shor’s algorithms, the widespread applicability of search-based methodologies has
excited considerable interest in Grover’s algorithm.
At about the same time as Shor’s and Grover’s algorithms were discovered, many
people were developing an idea Richard Feynman had suggested in 1982. Feynman had
pointed out that there seemed to be essential difficulties in simulating quantum mechanical systems on classical computers, and suggested that building computers based on
the principles of quantum mechanics would allow us to avoid those difficulties. In the
1990s several teams of researchers began fleshing this idea out, showing that it is indeed
possible to use quantum computers to efficiently simulate systems that have no known
efficient simulation on a classical computer. It is likely that one of the major applications
of quantum computers in the future will be performing simulations of quantum mechanical systems too difficult to simulate on a classical computer, a problem with profound
scientific and technological implications.
What other problems can quantum computers solve more quickly than classical computers? The short answer is that we don’t know. Coming up with good quantum algorithms seems to be hard. A pessimist might think that’s because there’s nothing quantum
computers are good for other than the applications already discovered! We take a different view. Algorithm design for quantum computers is hard because designers face two
difficult problems not faced in the construction of algorithms for classical computers.
First, our human intuition is rooted in the classical world. If we use that intuition as an
aid to the construction of algorithms, then the algorithmic ideas we come up with will
be classical ideas. To design good quantum algorithms one must ‘turn off’ one’s classical
intuition for at least part of the design process, using truly quantum effects to achieve
the desired algorithmic end. Second, to be truly interesting it is not enough to design an
algorithm that is merely quantum mechanical. The algorithm must be better than any
existing classical algorithm! Thus, it is possible that one may find an algorithm which
makes use of truly quantum aspects of quantum mechanics, that is nevertheless not of
widespread interest because classical algorithms with comparable performance characteristics exist. The combination of these two problems makes the construction of new
quantum algorithms a challenging problem for the future.
Even more broadly, we can ask if there are any generalizations we can make about the
power of quantum computers versus classical computers. What is it that makes quantum
computers more powerful than classical computers – assuming that this is indeed the
case? What class of problems can be solved efficiently on a quantum computer, and how
does that class compare to the class of problems that can be solved efficiently on a classical
computer? One of the most exciting things about quantum computation and quantum
information is how little is known about the answers to these questions! It is a great
challenge for the future to understand these questions better.
Having come up to the frontier of quantum computation, let’s switch to the history
of another strand of thought contributing to quantum computation and quantum information: information theory. At the same time computer science was exploding in the
1940s, another revolution was taking place in our understanding of communication. In
1948 Claude Shannon published a remarkable pair of papers laying the foundations for
the modern theory of information and communication.
Perhaps the key step taken by Shannon was to mathematically define the concept of
information. In many mathematical sciences there is considerable flexibility in the choice
of fundamental definitions. Try thinking naively for a few minutes about the following
question: how would you go about mathematically defining the notion of an information
source? Several different answers to this problem have found widespread use; however,
the definition Shannon came up with seems to be far and away the most fruitful in
terms of increased understanding, leading to a plethora of deep results and a theory
with a rich structure which seems to accurately reflect many (though not all) real-world
communications problems.
Shannon was interested in two key questions related to the communication of information over a communications channel. First, what resources are required to send
information over a communications channel? For example, telephone companies need
to know how much information they can reliably transmit over a given telephone cable.
Second, can information be transmitted in such a way that it is protected against noise
in the communications channel?
Shannon answered these two questions by proving the two fundamental theorems of
information theory. The first, Shannon’s noiseless channel coding theorem, quantifies
the physical resources required to store the output from an information source. Shannon’s second fundamental theorem, the noisy channel coding theorem, quantifies how
much information it is possible to reliably transmit through a noisy communications
channel. To achieve reliable transmission in the presence of noise, Shannon showed that
error-correcting codes could be used to protect the information being sent. Shannon’s
noisy channel coding theorem gives an upper limit on the protection afforded by error-correcting codes. Unfortunately, Shannon’s theorem does not explicitly give a practically
useful set of error-correcting codes to achieve that limit. From the time of Shannon’s papers until today, researchers have constructed more and better classes of error-correcting
codes in their attempts to come closer to the limit set by Shannon’s theorem. A sophisticated theory of error-correcting codes now exists offering the user a plethora of choices
in their quest to design a good error-correcting code. Such codes are used in a multitude
of places including, for example, compact disc players, computer modems, and satellite
communications systems.
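As a small illustration of how an error-correcting code protects information against noise, the following Python sketch implements the three-bit repetition code with majority-vote decoding. This code is our own choice of example, probably the simplest one available; it is not one of the codes used in the applications just mentioned and it falls well short of the limit set by Shannon's theorem. The function names `encode` and `decode` are hypothetical.

```python
def encode(bits):
    """Repeat every bit three times, e.g. [1, 0] -> [1, 1, 1, 0, 0, 0]."""
    return [b for b in bits for _ in range(3)]


def decode(received):
    """Recover each bit by majority vote over its block of three."""
    blocks = [received[i:i + 3] for i in range(0, len(received), 3)]
    return [1 if sum(block) >= 2 else 0 for block in blocks]


message = [1, 0, 1, 1]
sent = encode(message)

# Model the noise by hand: flip one bit in the first block and one in the third.
corrupted = list(sent)
corrupted[1] ^= 1
corrupted[8] ^= 1

recovered = decode(corrupted)            # equals message despite the two flips
```

The code corrects any single flipped bit within a block, at the price of sending three times as many bits. Two flips in the same block defeat it, which is one reason more sophisticated codes are used in practice.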
Quantum information theory has followed with similar developments. In 1995, Ben
Schumacher provided an analogue to Shannon’s noiseless coding theorem, and in the
process defined the ‘quantum bit’ or ‘qubit’ as a tangible physical resource. However,
no analogue to Shannon’s noisy channel coding theorem is yet known for quantum information. Nevertheless, in analogy to their classical counterparts, a theory of quantum
error-correction has been developed which, as already mentioned, allows quantum computers to compute effectively in the presence of noise, and also allows communication
over noisy quantum channels to take place reliably.
Indeed, classical ideas of error-correction have proved to be enormously important
in developing and understanding quantum error-correcting codes. In 1996, two groups
working independently, Robert Calderbank and Peter Shor, and Andrew Steane, discovered
an important class of quantum codes now known as CSS codes after their initials.
This work has since been subsumed by the stabilizer codes, independently discovered by
Robert Calderbank, Eric Rains, Peter Shor and Neil Sloane, and by Daniel Gottesman.
By building upon the basic ideas of classical linear coding theory, these discoveries greatly
facilitated a rapid understanding of quantum error-correcting codes and their application
to quantum computation and quantum information.
The theory of quantum error-correcting codes was developed to protect quantum states
against noise. What about transmitting ordinary classical information using a quantum
channel? How efficiently can this be done? A few surprises have been discovered in this
arena. In 1992 Charles Bennett and Stephen Wiesner explained how to transmit two
classical bits of information, while only transmitting one quantum bit from sender to
receiver, a result dubbed superdense coding.
Even more interesting are the results in distributed quantum computation. Imagine
you have two computers networked, trying to solve a particular problem. How much
communication is required to solve the problem? Recently it has been shown that quantum computers can require exponentially less communication to solve certain problems
than would be required if the networked computers were classical! Unfortunately, as yet
these problems are not especially important in a practical setting, and suffer from some
undesirable technical restrictions. A major challenge for the future of quantum computation and quantum information is to find problems of real-world importance for which
distributed quantum computation offers a substantial advantage over distributed classical
computation.
Let’s return to information theory proper. The study of information theory begins with
the properties of a single communications channel. In applications we often do not deal
with a single communications channel, but rather with networks of many channels. The
subject of networked information theory deals with the information carrying properties
of such networks of communications channels, and has been developed into a rich and
intricate subject.
By contrast, the study of networked quantum information theory is very much in its
infancy. Even for very basic questions we know little about the information carrying abilities of networks of quantum channels. Several rather striking preliminary results have
been found in the past few years; however, no unifying theory of networked information
theory exists for quantum channels. One example of networked quantum information
theory should suffice to convince you of the value such a general theory would have.
Imagine that we are attempting to send quantum information from Alice to Bob through
a noisy quantum channel. If that channel has zero capacity for quantum information,
then it is impossible to reliably send any information from Alice to Bob. Imagine instead
that we consider two copies of the channel, operating in synchrony. Intuitively it is clear
(and can be rigorously justified) that such a channel also has zero capacity to send quantum information. However, if we instead reverse the direction of one of the channels, as
illustrated in Figure 1.1, it turns out that sometimes we can obtain a non-zero capacity
for the transmission of information from Alice to Bob! Counter-intuitive properties like
this illustrate the strange nature of quantum information. Better understanding the information carrying properties of networks of quantum channels is a major open problem
of quantum computation and quantum information.
Figure 1.1. Classically, if we have two very noisy channels of zero capacity running side by side, then the combined
channel has zero capacity to send information. Not surprisingly, if we reverse the direction of one of the channels,
we still have zero capacity to send information. Quantum mechanically, reversing one of the zero capacity channels
can actually allow us to send information!
Let’s switch fields one last time, moving to the venerable old art and science of cryptography. Broadly speaking, cryptography is the problem of doing communication or
computation involving two or more parties who may not trust one another. The best
known cryptographic problem is the transmission of secret messages. Suppose two parties
wish to communicate in secret. For example, you may wish to give your credit card number to a merchant in exchange for goods, hopefully without any malevolent third party
intercepting your credit card number. The way this is done is to use a cryptographic
protocol. We’ll describe in detail how cryptographic protocols work later in the book, but
for now it will suffice to make a few simple distinctions. The most important distinction
is between private key cryptosystems and public key cryptosystems.
The way a private key cryptosystem works is that two parties, ‘Alice’ and ‘Bob’, wish
to communicate by sharing a private key, which only they know. The exact form of the
key doesn’t matter at this point – think of a string of zeroes and ones. The point is that
this key is used by Alice to encrypt the information she wishes to send to Bob. After
Alice encrypts she sends the encrypted information to Bob, who must now recover the
original information. Exactly how Alice encrypts the message depends upon the private
key, so that to recover the original message Bob needs to know the private key, in order
to undo the transformation Alice applied.
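A minimal sketch may help fix the idea. We assume, purely for illustration, that the private key is a random string of bytes at least as long as the message and that Alice encrypts by combining message and key bitwise with XOR; the text above does not commit to any particular scheme, and the helper name `xor_bytes` is hypothetical. Because XOR with the same key is its own inverse, Bob recovers the message by applying exactly the same transformation.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Combine data with key bitwise; applying it twice restores data."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"meet at noon"

# The private key: random bytes known only to Alice and Bob.
key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, key)     # Alice encrypts
recovered = xor_bytes(ciphertext, key)   # Bob decrypts with the same key
```

Without the key the ciphertext alone tells an eavesdropper nothing useful about the message, provided the key is truly random and never reused. The difficulty, of course, is getting the same key to both parties in the first place.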
Unfortunately, private key cryptosystems have some severe problems in many contexts.
The most basic problem is how to distribute the keys. In many ways, the key distribution
problem is just as difficult as the original problem of communicating in private – a
malevolent third party may be eavesdropping on the key distribution, and then use the
intercepted key to decrypt some of the message transmission.
One of the earliest discoveries in quantum computation and quantum information was
that quantum mechanics can be used to do key distribution in such a way that Alice and
Bob’s security cannot be compromised. This procedure is known as quantum cryptography or quantum key distribution. The basic idea is to exploit the quantum mechanical
principle that observation in general disturbs the system being observed. Thus, if there is
an eavesdropper listening in as Alice and Bob attempt to transmit their key, the presence
of the eavesdropper will be visible as a disturbance of the communications channel Alice
and Bob are using to establish the key. Alice and Bob can then throw out the key bits
established while the eavesdropper was listening in, and start over. The first quantum
cryptographic ideas were proposed by Stephen Wiesner in the late 1960s, but unfortunately
were not accepted for publication! In 1984 Charles Bennett and Gilles Brassard,
building on Wiesner’s earlier work, proposed a protocol using quantum mechanics to
distribute keys between Alice and Bob, without any possibility of a compromise. Since
then numerous quantum cryptographic protocols have been proposed, and experimental
prototypes developed. At the time of this writing, the experimental prototypes are nearing
the stage where they may be useful in limited-scale real-world applications.
The second major type of cryptosystem is the public key cryptosystem. Public key
cryptosystems don’t rely on Alice and Bob sharing a secret key in advance. Instead, Bob
simply publishes a ‘public key’, which is made available to the general public. Alice
can make use of this public key to encrypt a message which she sends to Bob. What
is interesting is that a third party cannot use Bob’s public key to decrypt the message!
Strictly speaking, we shouldn’t say cannot. Rather, the encryption transformation is
chosen in a very clever and non-trivial way so that it is extremely difficult (though not
impossible) to invert, given only knowledge of the public key. To make inversion easy, Bob
has a secret key matched to his public key, which together enable him to easily perform
the decryption. This secret key is not known to anybody other than Bob, who can therefore
be confident that only he can read the contents of Alice’s transmission, to the extent that
it is unlikely that anybody else has the computational power to invert the encryption,
given only the public key. Public key cryptosystems solve the key distribution problem
by making it unnecessary for Alice and Bob to share a private key before communicating.
Rather remarkably, public key cryptography did not achieve widespread use until the
mid-1970s, when it was proposed independently by Whitfield Diffie and Martin Hellman,
and by Ralph Merkle, revolutionizing the field of cryptography. A little later, Ronald
Rivest, Adi Shamir, and Leonard Adleman developed the RSA cryptosystem, which
at the time of writing is the most widely deployed public key cryptosystem, believed to
offer a fine balance of security and practical usability. In 1997 it was disclosed that these
ideas – public key cryptography, the Diffie–Hellman and RSA cryptosystems – were
actually invented in the late 1960s and early 1970s by researchers working at the British
intelligence agency GCHQ.
The key to the security of public key cryptosystems is that it should be difficult to
invert the encryption stage if only the public key is available. For example, it turns out
that inverting the encryption stage of RSA is a problem closely related to factoring.
Much of the presumed security of RSA comes from the belief that factoring is a problem
hard to solve on a classical computer. However, Shor’s fast algorithm for factoring on
a quantum computer could be used to break RSA! Similarly, there are other public key
cryptosystems which can be broken if a fast algorithm for solving the discrete logarithm
problem – like Shor’s quantum algorithm for discrete logarithm – were known. This
practical application of quantum computers to the breaking of cryptographic codes has
excited much of the interest in quantum computation and quantum information.
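The link between factoring and the security of RSA can be seen in a toy example. The Python sketch below uses deliberately tiny primes (61 and 53) and "textbook" RSA without any of the padding used in real systems; these parameters and the helper names `make_keys` and `factor_by_trial_division` are our own illustrative assumptions. Anyone who can factor the public modulus can recompute Bob's secret exponent, and trial division stands in here for the fast quantum factoring algorithm.

```python
from math import isqrt


def make_keys(p, q, e):
    """Return the public key (n, e) and the secret exponent d."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)                  # modular inverse (Python 3.8+)
    return (n, e), d


def factor_by_trial_division(n):
    """Find a factorization n = f * g; feasible only because n is tiny."""
    for f in range(2, isqrt(n) + 1):
        if n % f == 0:
            return f, n // f
    raise ValueError("n is prime")


# Bob publishes (n, e) and keeps d secret.
public_key, secret_d = make_keys(61, 53, 17)
n, e = public_key

# Alice encrypts with the public key; Bob decrypts with his secret key.
message = 65
ciphertext = pow(message, e, n)
decrypted = pow(ciphertext, secret_d, n)

# An eavesdropper who can factor n rebuilds the secret exponent.
p, q = factor_by_trial_division(n)
recovered_d = pow(e, -1, (p - 1) * (q - 1))
stolen = pow(ciphertext, recovered_d, n)
```

For moduli of the size used in practice, trial division and every other known classical method take an impractically long time, which is precisely the belief on which the presumed security rests.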
We have been looking at the historical antecedents for quantum computation and
quantum information. Of course, as the field has grown and matured, it has sprouted
its own subfields of research, whose antecedents lie mainly within quantum computation
and quantum information.
Perhaps the most striking of these is the study of quantum entanglement. Entanglement is a uniquely quantum mechanical resource that plays a key role in many of the
most interesting applications of quantum computation and quantum information; entanglement is iron to the classical world’s bronze age. In recent years there has been a
tremendous effort trying to better understand the properties of entanglement considered
as a fundamental resource of Nature, of comparable importance to energy, information,
entropy, or any other fundamental resource. Although there is as yet no complete theory
of entanglement, some progress has been made in understanding this strange property of
quantum mechanics. It is hoped by many researchers that further study of the properties
of entanglement will yield insights that facilitate the development of new applications in
quantum computation and quantum information.
1.1.2 Future directions
We’ve looked at some of the history and present status of quantum computation and
quantum information. What of the future? What can quantum computation and quantum information offer to science, to technology, and to humanity? What benefits does
quantum computation and quantum information confer upon its parent fields of computer
science, information theory, and physics? What are the key open problems of quantum
computation and quantum information? We will make a few very brief remarks about
these overarching questions before moving on to more detailed investigations.
Quantum computation and quantum information has taught us to think physically
about computation, and we have discovered that this approach yields many new and
exciting capabilities for information processing and communication. Computer scientists
and information theorists have been gifted with a new and rich paradigm for exploration. Indeed, in the broadest terms we have learned that any physical theory, not just
quantum mechanics, may be used as the basis for a theory of information processing
and communication. The fruits of these explorations may one day result in information
processing devices with capabilities far beyond today’s computing and communications
systems, with concomitant benefits and drawbacks for society as a whole.
Quantum computation and quantum information certainly offer challenges aplenty
to physicists, but it is perhaps a little subtle what quantum computation and quantum
information offers to physics in the long term. We believe that just as we have learned to
think physically about computation, we can also learn to think computationally about
physics. Whereas physics has traditionally been a discipline focused on understanding
‘elementary’ objects and simple systems, many interesting aspects of Nature arise only
when things become larger and more complicated. Chemistry and engineering deal with
such complexity to some extent, but most often in a rather ad hoc fashion. One of
the messages of quantum computation and information is that new tools are available
for traversing the gulf between the small and the relatively complex: computation and
algorithms provide systematic means for constructing and understanding such systems.
Applying ideas from these fields is already beginning to yield new insights into physics.
It is our hope that this perspective will blossom in years to come into a fruitful way of
understanding all aspects of physics.
We’ve briefly examined some of the key motivations and ideas underlying quantum
computation and quantum information. Over the remaining sections of this chapter we
give a more technical but still accessible introduction to these motivations and ideas, with
the hope of giving you a bird’s-eye view of the field as it is presently poised.
1.2 Quantum bits
The bit is the fundamental concept of classical computation and classical information.
Quantum computation and quantum information are built upon an analogous concept,
the quantum bit, or qubit for short. In this section we introduce the properties of single
and multiple qubits, comparing and contrasting their properties to those of classical bits.
What is a qubit? We’re going to describe qubits as mathematical objects with certain
specific properties. ‘But hang on’, you say, ‘I thought qubits were physical objects.’ It’s
true that qubits, like bits, are realized as actual physical systems, and in Section 1.5 and
Chapter 7 we describe in detail how this connection between the abstract mathematical
point of view and real systems is made. However, for the most part we treat qubits as
abstract mathematical objects. The beauty of treating qubits as abstract entities is that it
gives us the freedom to construct a general theory of quantum computation and quantum
information which does not depend upon a specific system for its realization.
What then is a qubit? Just as a classical bit has a state – either 0 or 1 – a qubit also
has a state. Two possible states for a qubit are the states |0⟩ and |1⟩, which as you might
guess correspond to the states 0 and 1 for a classical bit. Notation like ‘| ⟩’ is called the
Dirac notation, and we’ll be seeing it often, as it’s the standard notation for states in
quantum mechanics. The difference between bits and qubits is that a qubit can be in a
state other than |0⟩ or |1⟩. It is also possible to form linear combinations of states, often
called superpositions:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle. \tag{1.1}$$
The numbers α and β are complex numbers, although for many purposes not much is
lost by thinking of them as real numbers. Put another way, the state of a qubit is a vector
in a two-dimensional complex vector space. The special states |0⟩ and |1⟩ are known as
computational basis states, and form an orthonormal basis for this vector space.
We can examine a bit to determine whether it is in the state 0 or 1. For example,
computers do this all the time when they retrieve the contents of their memory. Rather
remarkably, we cannot examine a qubit to determine its quantum state, that is, the
values of α and β. Instead, quantum mechanics tells us that we can only acquire much
more restricted information about the quantum state. When we measure a qubit we get
either the result 0, with probability |α|², or the result 1, with probability |β|². Naturally,
|α|² + |β|² = 1, since the probabilities must sum to one. Geometrically, we can interpret
this as the condition that the qubit’s state be normalized to length 1. Thus, in general a
qubit’s state is a unit vector in a two-dimensional complex vector space.
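The measurement rule just stated can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `measurement_probabilities` and the example amplitudes 0.6 and 0.8i are ours, chosen for convenience, and are not notation from the text.

```python
import math
from typing import Tuple

def measurement_probabilities(alpha: complex, beta: complex) -> Tuple[float, float]:
    """Return (P(0), P(1)) for a qubit in the state alpha|0> + beta|1>."""
    p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
    # The probabilities must sum to one, i.e. the state must be a unit vector.
    if not math.isclose(p0 + p1, 1.0, abs_tol=1e-9):
        raise ValueError("state is not normalized")
    return p0, p1

# Hypothetical example state: alpha = 0.6, beta = 0.8i.
p0, p1 = measurement_probabilities(0.6, 0.8j)
```

Note that the function returns only the two probabilities; nothing in a single measurement outcome reveals α and β themselves, which is the point made in the paragraph above.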
This dichotomy between the unobservable state of a qubit and the observations we
can make lies at the heart of quantum computation and quantum information. In most
of our abstract models of the world, there is a direct correspondence between elements
of the abstraction and the real world, just as an architect’s plans for a building are in
correspondence with the final building. The lack of this direct correspondence in quantum
mechanics makes it difficult to intuit the behavior of quantum systems; however, there
is an indirect correspondence, for qubit states can be manipulated and transformed in
ways which lead to measurement outcomes which depend distinctly on the different
properties of the state. Thus, these quantum states have real, experimentally verifiable
consequences, which we shall see are essential to the power of quantum computation and
quantum information.
The ability of a qubit to be in a superposition state runs counter to our ‘common sense’
understanding of the physical world around us. A classical bit is like a coin: either heads
or tails up. For imperfect coins, there may be intermediate states like having it balanced
on an edge, but those can be disregarded in the ideal case. By contrast, a qubit can exist
in a continuum of states between |0⟩ and |1⟩ – until it is observed. Let us emphasize
again that when a qubit is measured, it only ever gives ‘0’ or ‘1’ as the measurement
result – probabilistically. For example, a qubit can be in the state
$$\frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle, \tag{1.2}$$
which, when measured, gives the result 0 fifty percent (|1/√2|²) of the time, and the
result 1 fifty percent of the time. We will return often to this state, which is sometimes
denoted |+⟩.
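To get a feel for what ‘fifty percent of the time’ means in practice, one can simulate repeated measurements of many identically prepared copies of this state. The sketch below uses a classical pseudo-random generator as a stand-in for quantum randomness; the helper name `measure`, the seed, and the number of shots are illustrative assumptions.

```python
import math
import random

def measure(alpha, beta, rng):
    """Simulate one computational-basis measurement of alpha|0> + beta|1>."""
    # Outcome 0 occurs with probability |alpha|^2, otherwise the outcome is 1.
    return 0 if rng.random() < abs(alpha) ** 2 else 1

rng = random.Random(0)      # fixed seed so the run is repeatable
amp = 1 / math.sqrt(2)      # both amplitudes of the state in Equation (1.2)
shots = 10_000
zeros = sum(1 for _ in range(shots) if measure(amp, amp, rng) == 0)
fraction_zero = zeros / shots   # should come out close to 0.5
```

Each individual shot yields only a single bit; it is the statistics over many shots that approach one half.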
Despite this strangeness, qubits are decidedly real, their existence and behavior extensively validated by experiments (discussed in Section 1.5 and Chapter 7), and many
different physical systems can be used to realize qubits. To get a concrete feel for how a
qubit can be realized it may be helpful to list some of the ways this realization may occur:
as the two different polarizations of a photon; as the alignment of a nuclear spin in a
uniform magnetic field; as two states of an electron orbiting a single atom such as shown
in Figure 1.2. In the atom model, the electron can exist in either the so-called ‘ground’
or ‘excited’ states, which we’ll call |0⟩ and |1⟩, respectively. By shining light on the atom,
with appropriate energy and for an appropriate length of time, it is possible to move
the electron from the |0⟩ state to the |1⟩ state and vice versa. But more interestingly, by
reducing the time we shine the light, an electron initially in the state |0⟩ can be moved
‘halfway’ between |0⟩ and |1⟩, into the |+⟩ state.
Figure 1.2. Qubit represented by two electronic levels in an atom.
Naturally, a great deal of attention has been given to the ‘meaning’ or ‘interpretation’
that might be attached to superposition states, and to the inherently probabilistic nature of
observations on quantum systems. However, by and large, we shall not concern ourselves
with such discussions in this book. Instead, our intent will be to develop mathematical
and conceptual pictures which are predictive.
One picture useful in thinking about qubits is the following geometric representation.
Because |α|² + |β|² = 1, we may rewrite Equation (1.1) as
$$|\psi\rangle = e^{i\gamma}\left(\cos\frac{\theta}{2}|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}|1\rangle\right), \tag{1.3}$$
where θ, ϕ and γ are real numbers. In Chapter 2 we will see that we can ignore the factor
of e^{iγ} out the front, because it has no observable effects, and for that reason we can
effectively write
$$|\psi\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}|1\rangle. \tag{1.4}$$
The numbers θ and ϕ define a point on the unit three-dimensional sphere, as shown in
Figure 1.3. This sphere is often called the Bloch sphere; it provides a useful means of
visualizing the state of a single qubit, and often serves as an excellent testbed for ideas
about quantum computation and quantum information. Many of the operations on single
qubits which we describe later in this chapter are neatly described within the Bloch sphere
picture. However, it must be kept in mind that this intuition is limited because there is
no simple generalization of the Bloch sphere known for multiple qubits.
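As a rough illustration of Equation (1.4), the following Python sketch recovers the angles θ and ϕ from a normalized pair of amplitudes and converts them to Cartesian coordinates on the sphere. The function names are ours, and the coordinate convention (sin θ cos ϕ, sin θ sin ϕ, cos θ), with |0⟩ at the north pole, is the usual spherical-coordinate convention and is assumed here rather than derived in the text.

```python
import cmath
import math

def bloch_angles(alpha, beta):
    """Return (theta, phi) with the state equal, up to a global phase, to
    cos(theta/2)|0> + exp(i*phi) sin(theta/2)|1>. Assumes a normalized state."""
    theta = 2 * math.acos(min(1.0, abs(alpha)))
    if abs(alpha) == 0 or abs(beta) == 0:
        phi = 0.0  # phi is arbitrary at the poles; pick zero by convention
    else:
        # Only the relative phase survives once the global phase is dropped.
        phi = (cmath.phase(beta) - cmath.phase(alpha)) % (2 * math.pi)
    return theta, phi

def bloch_vector(theta, phi):
    """Cartesian point on the unit sphere for the angles (theta, phi)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

s = 1 / math.sqrt(2)
plus_point = bloch_vector(*bloch_angles(s, s))   # the |+> state, on the x axis
one_point = bloch_vector(*bloch_angles(0, 1))    # the |1> state, south pole
```

Two states that differ only by the overall factor e^{iγ} map to the same point, which is why the global phase can be dropped in this picture.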
Figure 1.3. Bloch sphere representation of a qubit.
How much information is represented by a qubit? Paradoxically, there are an infinite
number of points on the unit sphere, so that in principle one could store an entire text
of Shakespeare in the infinite binary expansion of θ. However, this conclusion turns
out to be misleading, because of the behavior of a qubit when observed. Recall that
measurement of a qubit will give only either 0 or 1. Furthermore, measurement changes
the state of a qubit, collapsing it from its superposition of |0⟩ and |1⟩ to the specific state
consistent with the measurement result. For example, if measurement of |+⟩ gives 0,
then the post-measurement state of the qubit will be |0⟩. Why does this type of collapse
occur? Nobody knows. As discussed in Chapter 2, this behavior is simply one of the
fundamental postulates of quantum mechanics. What is relevant for our purposes is that
from a single measurement one obtains only a single bit of information about the state of
the qubit, thus resolving the apparent paradox. It turns out that only if infinitely many
identically prepared qubits were measured would one be able to determine α and β for
a qubit in the state given in Equation (1.1).
But an even more interesting question to ask might be: how much information is
represented by a qubit if we do not measure it? This is a trick question, because how
can one quantify information if it cannot be measured? Nevertheless, there is something
conceptually important here, because when Nature evolves a closed quantum system of
qubits, not performing any ‘measurements’, she apparently does keep track of all the
continuous variables describing the state, like α and β. In a sense, in the state of a qubit,
Nature conceals a great deal of ‘hidden information’. And even more interestingly, we will
see shortly that the potential amount of this extra ‘information’ grows exponentially with
the number of qubits. Understanding this hidden quantum information is a question
that we grapple with for much of this book, and which lies at the heart of what makes
quantum mechanics a powerful tool for information processing.
1.2.1 Multiple qubits
Hilbert space is a big place.
– Carlton Caves
Suppose we have two qubits. If these were two classical bits, then there would be four
possible states, 00, 01, 10, and 11. Correspondingly, a two qubit system has four computational basis states denoted |00⟩, |01⟩, |10⟩, |11⟩. A pair of qubits can also exist in
superpositions of these four states, so the quantum state of two qubits involves associating
a complex coefficient – sometimes called an amplitude – with each computational basis
state, such that the state vector describing the two qubits is
$$|\psi\rangle = \alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle. \tag{1.5}$$
Similar to the case for a single qubit, the measurement result x (= 00, 01, 10 or 11) occurs
with probability |α_x|², with the state of the qubits after the measurement being |x⟩. The
condition that probabilities sum to one is therefore expressed by the normalization
condition that ∑_{x∈{0,1}²} |α_x|² = 1, where the notation ‘{0, 1}²’ means ‘the set of strings
of length two with each letter being either zero or one’. For a two qubit system, we could
measure just a subset of the qubits, say the first qubit, and you can probably guess how
this works: measuring the first qubit alone gives 0 with probability |α00|² + |α01|², leaving
the post-measurement state
$$|\psi'\rangle = \frac{\alpha_{00}|00\rangle + \alpha_{01}|01\rangle}{\sqrt{|\alpha_{00}|^2 + |\alpha_{01}|^2}}. \tag{1.6}$$
Note how the post-measurement state is re-normalized by the factor √(|α00|² + |α01|²)
so that it still satisfies the normalization condition, just as we expect for a legitimate
quantum state.
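The partial measurement described by Equation (1.6) can be sketched as follows. The dictionary representation of the four amplitudes, the function name, and the example state are illustrative assumptions, not notation from the text.

```python
import math

def measure_first_qubit_zero(amps):
    """Given amplitudes keyed by '00', '01', '10', '11', return the probability
    that the first qubit is found to be 0, and the re-normalized state after
    that outcome, as in Equation (1.6)."""
    p0 = abs(amps["00"]) ** 2 + abs(amps["01"]) ** 2
    if p0 == 0:
        raise ValueError("outcome 0 never occurs for this state")
    norm = math.sqrt(p0)
    # Components with the first qubit equal to 1 are discarded by the outcome.
    post = {"00": amps["00"] / norm, "01": amps["01"] / norm, "10": 0j, "11": 0j}
    return p0, post

# Hypothetical normalized two qubit state.
state = {"00": 0.5, "01": 0.5j, "10": -0.5, "11": 0.5}
p0, post = measure_first_qubit_zero(state)
post_norm = sum(abs(a) ** 2 for a in post.values())  # back to 1 after division
```

Dividing by the square root of the outcome probability is exactly what restores the normalization condition for the post-measurement state.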
An important two qubit state is the Bell state or EPR pair,
$$\frac{|00\rangle + |11\rangle}{\sqrt{2}}. \tag{1.7}$$
This innocuous-looking state is responsible for many surprises in quantum computation
and quantum information. It is the key ingredient in quantum teleportation and superdense coding, which we’ll come to in Section 1.3.7 and Section 2.3, respectively, and
the prototype for many other interesting quantum states. The Bell state has the property
that upon measuring the first qubit, one obtains two possible results: 0 with probability
1/2, leaving the post-measurement state |ϕ′⟩ = |00⟩, and 1 with probability 1/2, leaving
|ϕ′⟩ = |11⟩. As a result, a measurement of the second qubit always gives the same result
as the measurement of the first qubit. That is, the measurement outcomes are correlated.
Indeed, it turns out that other types of measurements can be performed on the Bell
state, by first applying some operations to the first or second qubit, and that interesting
correlations still exist between the result of a measurement on the first and second qubit.
These correlations have been the subject of intense interest ever since a famous paper
by Einstein, Podolsky and Rosen, in which they first pointed out the strange properties
of states like the Bell state. EPR’s insights were taken up and greatly improved by John
Bell, who proved an amazing result: the measurement correlations in the Bell state are
stronger than could ever exist between classical systems. These results, described in detail in Section 2.6, were the first intimation that quantum mechanics allows information
processing beyond what is possible in the classical world.
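The perfect agreement between the two computational-basis measurement results can be illustrated by sampling outcomes of the Bell state with probability |α_x|². This sketch is deliberately modest: it reproduces only the correlation in the computational basis, which a pair of classical coins could also mimic, and says nothing about the stronger-than-classical correlations proved by Bell. The names and the seed are illustrative.

```python
import math
import random

# Amplitudes of the Bell state (|00> + |11>)/sqrt(2), keyed by outcome.
bell = {"00": 1 / math.sqrt(2), "01": 0.0, "10": 0.0, "11": 1 / math.sqrt(2)}

def sample_outcome(amps, rng):
    """Draw one two-bit measurement result with probability |amplitude|^2."""
    labels = list(amps)
    weights = [abs(amps[x]) ** 2 for x in labels]
    return rng.choices(labels, weights=weights, k=1)[0]

rng = random.Random(1)
results = [sample_outcome(bell, rng) for _ in range(1000)]
always_equal = all(r[0] == r[1] for r in results)       # the two bits agree
fraction_00 = results.count("00") / len(results)        # roughly one half
```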
More generally, we may consider a system of n qubits. The computational basis states
of this system are of the form |x₁x₂ … xₙ⟩, and so a quantum state of such a system
is specified by 2^n amplitudes. For n = 500 this number is larger than the estimated
number of atoms in the Universe! Trying to store all these complex numbers would not
be possible on any conceivable classical computer. Hilbert space is indeed a big place.
In principle, however, Nature manipulates such enormous quantities of data, even for
systems containing only a few hundred atoms. It is as if Nature were keeping 2^500 hidden
pieces of scratch paper on the side, on which she performs her calculations as the system
evolves. This enormous potential computational power is something we would very much
like to take advantage of. But how can we think of quantum mechanics as computation?
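The size of 2^500 is easy to check with Python's arbitrary-precision integers. Two numbers below are assumptions rather than figures from the text: 10^80 is a commonly quoted order-of-magnitude estimate for the number of atoms in the observable Universe, and 16 bytes per amplitude corresponds to storing each complex number as two 64-bit floats.

```python
n = 500
num_amplitudes = 2 ** n                  # amplitudes needed for n qubits

atoms_estimate = 10 ** 80                # assumption: rough order of magnitude
bytes_per_amplitude = 16                 # assumption: two 64-bit floats

digits = len(str(num_amplitudes))        # number of decimal digits in 2**500
exceeds_atoms = num_amplitudes > atoms_estimate
bytes_needed = num_amplitudes * bytes_per_amplitude
```

Here `digits` evaluates to 151, so 2^500 is of order 10^150, some seventy orders of magnitude beyond the assumed atom count.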
1.3 Quantum computation
Changes occurring to a quantum state can be described using the language of quantum
computation. Analogous to the way a classical computer is built from an electrical circuit
containing wires and logic gates, a quantum computer is built from a quantum circuit
containing wires and elementary quantum gates to carry around and manipulate the
quantum information. In this section we describe some simple quantum gates, and present
several example circuits illustrating their application, including a circuit which teleports
qubits!
1.3.1 Single qubit gates
Classical computer circuits consist of wires and logic gates. The wires are used to carry
information around the circuit, while the logic gates perform manipulations of the information, converting it from one form to another. Consider, for example, classical single bit
logic gates. The only non-trivial member of this class is the NOT gate, whose operation
is defined by its truth table, in which 0 → 1 and 1 → 0, that is, the 0 and 1 states are
interchanged.
Can an analogous quantum NOT gate for qubits be defined? Imagine that we had
some process which took the state |0⟩ to the state |1⟩, and vice versa. Such a process
would obviously be a good candidate for a quantum analogue to the NOT gate. However,
specifying the action of the gate on the states |0⟩ and |1⟩ does not tell us what happens to
superpositions of the states |0⟩ and |1⟩, without further knowledge about the properties
of quantum gates. In fact, the quantum NOT gate acts linearly, that is, it takes the state
$$\alpha|0\rangle + \beta|1\rangle \tag{1.8}$$
to the corresponding state in which the roles of |0⟩ and |1⟩ have been interchanged,
$$\alpha|1\rangle + \beta|0\rangle. \tag{1.9}$$
Why the quantum NOT gate acts linearly and not in some nonlinear fashion is a very
interesting question, and the answer is not at all obvious. It turns out that this linear
behavior is a general property of quantum mechanics, and very well motivated empirically;
moreover, nonlinear behavior can lead to apparent paradoxes such as time travel, faster-than-light communication, and violations of the second law of thermodynamics. We’ll
explore this point in more depth in later chapters, but for now we’ll just take it as given.
There is a convenient way of representing the quantum NOT gate in matrix form,
which follows directly from the linearity of quantum gates. Suppose we define a matrix
X to represent the quantum NOT gate as follows:
$$X \equiv \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}. \tag{1.10}$$
(The notation X for the quantum NOT is used for historical reasons.) If the quantum
state α|0⟩ + β|1⟩ is written in a vector notation as
$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \tag{1.11}$$
with the top entry corresponding to the amplitude for |0⟩ and the bottom entry the
amplitude for |1⟩, then the corresponding output from the quantum NOT gate is
$$X \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} \beta \\ \alpha \end{bmatrix}. \tag{1.12}$$
Notice that the action of the NOT gate is to take the state |0⟩ and replace it by the state
corresponding to the first column of the matrix X. Similarly, the state |1⟩ is replaced by
the state corresponding to the second column of the matrix X.
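The matrix action in Equation (1.12) can be checked directly. In this minimal sketch a state is a two-element list of amplitudes and a gate is a list of rows; the helper name `apply_gate` and the example amplitudes are our own choices.

```python
def apply_gate(gate, state):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-component state."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

X = [[0, 1],
     [1, 0]]

alpha, beta = 0.6, 0.8j                  # hypothetical amplitudes
flipped = apply_gate(X, [alpha, beta])   # amplitudes swapped: [beta, alpha]

image_of_zero = apply_gate(X, [1, 0])    # first column of X, i.e. |1>
image_of_one = apply_gate(X, [0, 1])     # second column of X, i.e. |0>
```

Because the gate acts linearly, knowing the images of |0⟩ and |1⟩ (the two columns) fixes its action on every superposition.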
So quantum gates on a single qubit can be described by two by two matrices. Are there
any constraints on what matrices may be used as quantum gates? It turns out that there
are. Recall that the normalization condition requires |α|² + |β|² = 1 for a quantum state
α|0⟩ + β|1⟩. This must also be true of the quantum state |ψ′⟩ = α′|0⟩ + β′|1⟩ after the
gate has acted. It turns out that the appropriate condition on the matrix representing the
gate is that the matrix U describing the single qubit gate be unitary, that is U†U = I,
where U† is the adjoint of U (obtained by transposing and then complex conjugating
U), and I is the two by two identity matrix. For example, for the NOT gate it is easy to
verify that X†X = I.
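The unitarity condition U†U = I is simple to test numerically. The following sketch, with helper names of our own choosing, forms the adjoint by transposing and conjugating and compares the product with the identity up to a small tolerance.

```python
def dagger(m):
    """Adjoint of a square matrix: transpose, then complex-conjugate."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_unitary(m, tol=1e-12):
    """True if dagger(m) * m equals the identity to within tol."""
    n = len(m)
    product = matmul(dagger(m), m)
    return all(abs(product[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

X = [[0, 1], [1, 0]]
not_a_gate = [[1, 1], [0, 1]]   # invertible, but does not preserve norms

x_ok = is_unitary(X)            # True
shear_ok = is_unitary(not_a_gate)  # False
```

The second matrix is a reminder that invertibility alone is not enough: a valid gate must also preserve the length of every state vector.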
Amazingly, this unitarity constraint is the only constraint on quantum gates. Any
unitary matrix specifies a valid quantum gate! The interesting implication is that in
contrast to the classical case, where only one non-trivial single bit gate exists – the
NOT gate – there are many non-trivial single qubit gates. Two important ones which we shall
use later are the Z gate:
$$Z \equiv \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \tag{1.13}$$
which leaves |0⟩ unchanged, and flips the sign of |1⟩ to give −|1⟩, and the Hadamard
gate,
$$H \equiv \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \tag{1.14}$$
This gate is sometimes described as being like a ‘square-root of NOT’ gate, in that it turns
a |0⟩ into (|0⟩ + |1⟩)/√2 (first column of H), ‘halfway’ between |0⟩ and |1⟩, and turns
|1⟩ into (|0⟩ − |1⟩)/√2 (second column of H), which is also ‘halfway’ between |0⟩ and
|1⟩. Note, however, that H² is not a NOT gate, as simple algebra shows that H² = I, and
thus applying H twice to a state does nothing to it.
Figure 1.4. Visualization of the Hadamard gate on the Bloch sphere, acting on the input state (|0⟩ + |1⟩)/√2.
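The ‘simple algebra’ behind H² = I can also be confirmed numerically. This sketch uses plain Python lists; the helper `apply_gate` is an illustrative name, and equality is checked only up to floating-point rounding.

```python
import math

def apply_gate(gate, state):
    """Multiply a 2x2 matrix (list of rows) by a 2-component state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

s = 1 / math.sqrt(2)
H = [[s, s],
     [s, -s]]

plus = apply_gate(H, [1, 0])     # (|0> + |1>)/sqrt(2), first column of H
minus = apply_gate(H, [0, 1])    # (|0> - |1>)/sqrt(2), second column of H
back = apply_gate(H, plus)       # applying H again returns |0>
```

So although H takes each basis state ‘halfway’, two applications undo each other rather than composing to a NOT.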
The Hadamard gate is one of the most useful quantum gates, and it is worth trying to
visualize its operation by considering the Bloch sphere picture. In this picture, it turns
out that single qubit gates correspond to rotations and reflections of the sphere. The
Hadamard operation is just a rotation of the sphere about the ŷ axis by 90°, followed by
a rotation about the x̂ axis by 180°, as illustrated in Figure 1.4. Some important single
qubit gates are shown in Figure 1.5, and contrasted with the classical case.
Figure 1.5. Single bit (left) and qubit (right) logic gates.
There are infinitely many two by two unitary matrices, and thus infinitely many single
qubit gates. However, it turns out that the properties of the complete set can be understood from the properties of a much smaller set. For example, as explained in Box 1.1,
an arbitrary single qubit unitary gate can be decomposed as a product of rotations
$$\begin{bmatrix} \cos\frac{\gamma}{2} & -\sin\frac{\gamma}{2} \\ \sin\frac{\gamma}{2} & \cos\frac{\gamma}{2} \end{bmatrix}, \tag{1.15}$$
and a gate which we’ll later understand as being a rotation about the ẑ axis,
$$\begin{bmatrix} e^{-i\beta/2} & 0 \\ 0 & e^{i\beta/2} \end{bmatrix}, \tag{1.16}$$
together with a (global) phase shift – a constant multiplier of the form e^{iα}. These gates
can be broken down further – we don’t need to be able to do these gates for arbitrary
α, β and γ, but can build arbitrarily good approximations to such gates using only certain
special fixed values of α, β and γ. In this way it is possible to build up an arbitrary single
qubit gate using a finite set of quantum gates. More generally, an arbitrary quantum
computation on any number of qubits can be generated by a finite set of gates that is said
to be universal for quantum computation. To obtain such a universal set we first need
to introduce some quantum gates involving multiple qubits.
Box 1.1: Decomposing single qubit operations
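In Section 4.2 starting on page 174 we prove that an arbitrary 2×2 unitary matrix
may be decomposed as
$$U = e^{i\alpha} \begin{bmatrix} e^{-i\beta/2} & 0 \\ 0 & e^{i\beta/2} \end{bmatrix} \begin{bmatrix} \cos\frac{\gamma}{2} & -\sin\frac{\gamma}{2} \\ \sin\frac{\gamma}{2} & \cos\frac{\gamma}{2} \end{bmatrix} \begin{bmatrix} e^{-i\delta/2} & 0 \\ 0 & e^{i\delta/2} \end{bmatrix}, \tag{1.17}$$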
where α, β, γ, and δ are real-valued. Notice that the second matrix is just an
ordinary rotation. It turns out that the first and last matrices can also be understood
as rotations in a different plane. This decomposition can be used to give an exact
prescription for performing an arbitrary single qubit quantum logic gate.
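As a concrete check of the decomposition in Equation (1.17), the sketch below multiplies out the three matrices and the phase factor for one particular choice of angles. The choice α = π/2, β = 0, γ = π/2, δ = π is our own worked example (it is not given in the text); with these values the product should reproduce the Hadamard matrix of Equation (1.14). The function names are illustrative.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def z_rotation(angle):
    """Diagonal matrix diag(e^{-i angle/2}, e^{i angle/2}), as in (1.16)."""
    return [[cmath.exp(-1j * angle / 2), 0], [0, cmath.exp(1j * angle / 2)]]

def real_rotation(angle):
    """Ordinary rotation matrix with half-angle entries, as in (1.15)."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]

def compose(alpha, beta, gamma, delta):
    """Build U = e^{i alpha} * Z(beta) * R(gamma) * Z(delta), per (1.17)."""
    u = matmul(matmul(z_rotation(beta), real_rotation(gamma)), z_rotation(delta))
    phase = cmath.exp(1j * alpha)
    return [[phase * entry for entry in row] for row in u]

# Assumed example angles; the result should match the Hadamard gate.
U = compose(math.pi / 2, 0.0, math.pi / 2, math.pi)
```

Other unitaries need other angles; finding them for a given U is the content of the proof referred to in the box.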
1.3.2 Multiple qubit gates
Now let us generalize from one to multiple qubits. Figure 1.6 shows five notable multiple
bit classical gates, the AND, OR, XOR (exclusive-OR), NAND and NOR gates. An important
theoretical result is that any function on bits can be computed from the composition of
NAND gates alone, which is thus known as a universal gate. By contrast, the XOR alone or
even together with NOT is not universal. One way of seeing this is to note that applying
an XOR gate does not change the total parity of the bits. As a result, any circuit involving
only NOT and XOR gates will, if two inputs x and y have the same parity, give outputs
with the same parity, restricting the class of functions which may be computed, and thus
precluding universality.
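The universality of NAND can be made concrete by building the other gates from it. The constructions below are standard textbook ones, offered as a sketch; the trailing underscores in the function names merely avoid clashing with Python keywords. Bits are the integers 0 and 1, and a bit may be used as input to more than one gate.

```python
def nand(a, b):
    """NAND of two bits: 0 only when both inputs are 1."""
    return 1 - (a & b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def xor_(a, b):
    # Standard four-NAND construction of exclusive-or.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))

# Truth tables over all inputs, listed for (0,0), (0,1), (1,0), (1,1).
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_table = [and_(a, b) for a, b in inputs]
or_table = [or_(a, b) for a, b in inputs]
xor_table = [xor_(a, b) for a, b in inputs]
```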
The prototypical multi-qubit quantum logic gate is the controlled-NOT or CNOT gate.
This gate has two input qubits, known as the control qubit and the target qubit, respectively. The circuit representation for the CNOT is shown in the top right of Figure 1.6;
the top line represents the control qubit, while the bottom line represents the target
qubit. The action of the gate may be described as follows. If the control qubit is set to
0, then the target qubit is left alone. If the control qubit is set to 1, then the target qubit
is flipped. In equations:
$$|00\rangle \to |00\rangle; \quad |01\rangle \to |01\rangle; \quad |10\rangle \to |11\rangle; \quad |11\rangle \to |10\rangle. \tag{1.18}$$
Figure 1.6. On the left are some standard single and multiple bit gates, while on the right is the prototypical
multiple qubit gate, the controlled-NOT. The matrix representation of the controlled-NOT, U_CN, is written with
respect to the amplitudes for |00⟩, |01⟩, |10⟩, and |11⟩, in that order.
Another way of describing the CNOT is as a generalization of the classical XOR gate, since
the action of the gate may be summarized as |A, B⟩ → |A, B ⊕ A⟩, where ⊕ is addition
modulo two, which is exactly what the XOR gate does. That is, the control qubit and the
target qubit are XORed and stored in the target qubit.
Yet another way of describing the action of the CNOT is to give a matrix representation, as shown in the bottom right of Figure 1.6. You can easily verify that the first
column of U_CN describes the transformation that occurs to |00⟩, and similarly for the
other computational basis states, |01⟩, |10⟩, and |11⟩. As for the single qubit case, the
requirement that probability be conserved is expressed in the fact that U_CN is a unitary
matrix, that is, U_CN† U_CN = I.
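The matrix U_CN can be written down from Equation (1.18) by placing, in column j, the basis vector that the j-th basis state is sent to. The sketch below builds it that way and checks both the column property and unitarity; the matrix entries follow from (1.18), while the helper names are our own.

```python
BASIS = ["00", "01", "10", "11"]   # ordering of the amplitudes

def cnot_on_basis(label):
    """Apply |A, B> -> |A, B xor A> to a two-character basis label."""
    a, b = int(label[0]), int(label[1])
    return f"{a}{b ^ a}"

# Column j holds a single 1 in the row of the state that BASIS[j] maps to.
U_CN = [[1 if BASIS[row] == cnot_on_basis(BASIS[col]) else 0
         for col in range(4)] for row in range(4)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# U_CN is real and symmetric, so its adjoint is itself and the unitarity
# condition reduces to U_CN * U_CN = I.
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
is_unitary = matmul(U_CN, U_CN) == identity
```

The resulting matrix has rows (1,0,0,0), (0,1,0,0), (0,0,0,1), (0,0,1,0): the identity on the first two basis states and a swap of the last two.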
We noticed that the CNOT can be regarded as a type of generalized-XOR gate. Can
other classical gates such as the NAND or the regular XOR gate be understood as unitary
gates in a sense similar to the way the quantum NOT gate represents the classical NOT
gate? It turns out that this is not possible. The reason is that the XOR and NAND gates
are essentially irreversible or non-invertible. For example, given the output A ⊕ B from
an XOR gate, it is not possible to determine what the inputs A and B were; there is an
irretrievable loss of information associated with the irreversible action of the XOR gate.
On the other hand, unitary quantum gates are always invertible, since the inverse of a
unitary matrix is also a unitary matrix, and thus a quantum gate can always be inverted
by another quantum gate. Understanding how to do classical logic in this reversible or
invertible sense will be a crucial step in understanding how to harness the power of
quantum mechanics for computation. We’ll explain the basic idea of how to do reversible
computation in Section 1.4.1.
Of course, there are many interesting quantum gates other than the controlled-NOT.
However, in a sense the controlled-NOT and single qubit gates are the prototypes for all
other gates because of the following remarkable universality result: Any multiple qubit
logic gate may be composed from CNOT and single qubit gates. The proof is given in
Section 4.5, and is the quantum parallel of the universality of the NAND gate.
1.3.3 Measurements in bases other than the computational basis
We’ve described quantum measurements of a single qubit in the state α|0⟩ + β|1⟩ as
yielding the result 0 or 1 and leaving the qubit in the corresponding state |0⟩ or |1⟩,
with respective probabilities |α|² and |β|². In fact, quantum mechanics allows somewhat
more versatility in the class of measurements that may be performed, although certainly
nowhere near enough to recover α and β from a single measurement!
Note that the states |0⟩ and |1⟩ represent just one of many possible choices of basis
states for a qubit. Another possible choice is the set |+⟩ ≡ (|0⟩ + |1⟩)/√2 and |−⟩ ≡
(|0⟩ − |1⟩)/√2. An arbitrary state |ψ⟩ = α|0⟩ + β|1⟩ can be re-expressed in terms of the
states |+⟩ and |−⟩:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle = \alpha\frac{|+\rangle + |-\rangle}{\sqrt{2}} + \beta\frac{|+\rangle - |-\rangle}{\sqrt{2}} = \frac{\alpha+\beta}{\sqrt{2}}|+\rangle + \frac{\alpha-\beta}{\sqrt{2}}|-\rangle. \tag{1.19}$$
It turns out that it is possible to treat the |+⟩ and |−⟩ states as though they were the computational basis states, and measure with respect to this new basis. Naturally, measuring
with respect to the |+⟩, |−⟩ basis results in the result ‘+’ with probability |α + β|²/2 and
the result ‘−’ with probability |α − β|²/2, with corresponding post-measurement states
|+⟩ and |−⟩, respectively.
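The probabilities read off from Equation (1.19) can be computed directly from α and β. In this minimal sketch the function name and the example states are our own.

```python
import math

def plus_minus_probabilities(alpha, beta):
    """Probabilities of the '+' and '-' results for alpha|0> + beta|1>."""
    amp_plus = (alpha + beta) / math.sqrt(2)    # coefficient of |+> in (1.19)
    amp_minus = (alpha - beta) / math.sqrt(2)   # coefficient of |-> in (1.19)
    return abs(amp_plus) ** 2, abs(amp_minus) ** 2

s = 1 / math.sqrt(2)
from_zero = plus_minus_probabilities(1, 0)   # |0>: each result has probability 1/2
from_plus = plus_minus_probabilities(s, s)   # |+>: result '+' is certain
from_mixed = plus_minus_probabilities(0.6, 0.8)
```

A state such as |+⟩, which gives a random result in the computational basis, gives a definite result in the |+⟩, |−⟩ basis, and vice versa for |0⟩.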
More generally, given any basis states |a⟩ and |b⟩ for a qubit, it is possible to express an
arbitrary state as a linear combination α|a⟩ + β|b⟩ of those states. Furthermore, provided
the states are orthonormal, it is possible to perform a measurement with respect to
the |a⟩, |b⟩ basis, giving the result a with probability |α|² and b with probability |β|².
The orthonormality constraint is necessary in order that |α|² + |β|² = 1 as we expect for
probabilities. In an analogous way it is possible in principle to measure a quantum system
of many qubits with respect to an arbitrary orthonormal basis. However, just because it
is possible in principle does not mean that such a measurement can be done easily, and
we return later to the question of how efficiently a measurement in an arbitrary basis can
be performed.
There are many reasons for using this extended formalism for quantum measurements, but ultimately the best one is this: the formalism allows us to describe observed
experimental results, as we will see in our discussion of the Stern–Gerlach experiment
in Section 1.5.1. An even more sophisticated and convenient (but essentially equivalent)
formalism for describing quantum measurements is described in the next chapter, in
Section 2.2.3.
1.3.4 Quantum circuits
We’ve already met a few simple quantum circuits. Let’s look in a little more detail at
the elements of a quantum circuit. A simple quantum circuit containing three quantum
gates is shown in Figure 1.7. The circuit is to be read from left-to-right. Each line
in the circuit represents a wire in the quantum circuit. This wire does not necessarily
correspond to a physical wire; it may correspond instead to the passage of time, or perhaps
to a physical particle such as a photon – a particle of light – moving from one location
to another through space. It is conventional to assume that the state input to the circuit
is a computational basis state, usually the state consisting of all |0⟩s. This rule is broken
frequently in the literature on quantum computation and quantum information, but it is
considered polite to inform the reader when this is the case.
The circuit in Figure 1.7 accomplishes a simple but useful task – it swaps the states
of the two qubits. To see that this circuit accomplishes the swap operation, note that the
sequence of gates has the following sequence of effects on a computational basis state
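|a, b⟩,
$$\begin{aligned} |a, b\rangle &\longrightarrow |a, a \oplus b\rangle \\ &\longrightarrow |a \oplus (a \oplus b), a \oplus b\rangle = |b, a \oplus b\rangle \\ &\longrightarrow |b, (a \oplus b) \oplus b\rangle = |b, a\rangle, \end{aligned} \tag{1.20}$$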
where all additions are done modulo 2. The effect of the circuit, therefore, is to interchange the state of the two qubits.
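The three steps of Equation (1.20) can be traced on classical bit pairs, which stand in for the computational basis states. The sketch assumes the gate ordering implied by the equation (control on the first qubit, then on the second, then on the first again); the function names are illustrative.

```python
def cnot(bits, control, target):
    """Flip bits[target] when bits[control] is 1 (addition modulo 2)."""
    bits = list(bits)
    bits[target] ^= bits[control]
    return tuple(bits)

def swap_via_cnots(a, b):
    """Follow the three steps of Equation (1.20) on the basis state |a, b>."""
    state = (a, b)
    state = cnot(state, control=0, target=1)   # |a, a xor b>
    state = cnot(state, control=1, target=0)   # |b, a xor b>
    state = cnot(state, control=0, target=1)   # |b, a>
    return state

swapped = {(a, b): swap_via_cnots(a, b) for a in (0, 1) for b in (0, 1)}
```

Since the circuit is linear, swapping every computational basis state is enough to conclude that it swaps arbitrary superpositions as well.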
Figure 1.7. Circuit swapping two qubits, and an equivalent schematic symbol notation for this common and useful
circuit.
There are a few features allowed in classical circuits that are not usually present in
quantum circuits. First of all, we don’t allow ‘loops’, that is, feedback from one part of the
quantum circuit to another; we say the circuit is acyclic. Second, classical circuits allow
wires to be ‘joined’ together, an operation known as FANIN, with the resulting single wire
containing the bitwise OR of the inputs. Obviously this operation is not reversible and
therefore not unitary, so we don’t allow FANIN in our quantum circuits. Third, the inverse
operation, FANOUT, whereby several copies of a bit are produced is also not allowed in
quantum circuits. In fact, it turns out that quantum mechanics forbids the copying of a
qubit, making the FANOUT operation impossible! We’ll see an example of this in the next
section when we attempt to design a circuit to copy a qubit.
As we proceed we’ll introduce new quantum gates as needed. It’s convenient to introduce another convention about quantum circuits at this point. This convention is
illustrated in Figure 1.8. Suppose U is any unitary matrix acting on some number n of
qubits, so U can be regarded as a quantum gate on those qubits. Then we can define a
controlled-U gate which is a natural extension of the controlled-NOT gate. Such a gate
has a single control qubit, indicated by the line with the black dot, and n target qubits,
indicated by the boxed U. If the control qubit is set to 0 then nothing happens to the
target qubits. If the control qubit is set to 1 then the gate U is applied to the target qubits.
The prototypical example of the controlled-U gate is the controlled-NOT gate, which is
a controlled-U gate with U = X, as illustrated in Figure 1.9.
Another important operation is measurement, which we represent by a ‘meter’ symbol,
as shown in Figure 1.10. As previously described, this operation converts a single qubit
state |ψ⟩ = α|0⟩ + β|1⟩ into a probabilistic classical bit M (distinguished from a qubit by
drawing it as a double-line wire), which is 0 with probability |α|², or 1 with probability
|β|².
Figure 1.8. Controlled-U gate.
Figure 1.9. Two different representations for the controlled-NOT.
Figure 1.10. Quantum circuit symbol for measurement.
We shall find quantum circuits useful as models of all quantum processes, including
but not limited to computation, communication, and even quantum noise. Several simple
examples illustrate this below.
1.3.5 Qubit copying circuit?
The CNOT gate is useful for demonstrating one particularly fundamental property of
quantum information. Consider the task of copying a classical bit. This may be done
using a classical CNOT gate, which takes in the bit to copy (in some unknown state x)
and a ‘scratchpad’ bit initialized to zero, as illustrated in Figure 1.11. The output is two
bits, both of which are in the same state x.
Figure 1.11. Classical and quantum circuits to ‘copy’ an unknown bit or qubit.
Suppose we try to copy a qubit in the unknown state |ψ⟩ = a|0⟩ + b|1⟩ in the same
manner by using a CNOT gate. The input state of the two qubits may be written as
$$\bigl[a|0\rangle + b|1\rangle\bigr]|0\rangle = a|00\rangle + b|10\rangle. \tag{1.21}$$
The function of CNOT is to negate the second qubit when the first qubit is 1, and thus
the output is simply a|00⟩ + b|11⟩. Have we successfully copied |ψ⟩? That is, have we
created the state |ψ⟩|ψ⟩? In the case where |ψ⟩ = |0⟩ or |ψ⟩ = |1⟩ that is indeed what this
circuit does; it is possible to use quantum circuits to copy classical information encoded
as a |0⟩ or a |1⟩. However, for a general state |ψ⟩ we see that
$$|\psi\rangle|\psi\rangle = a^2|00\rangle + ab|01\rangle + ab|10\rangle + b^2|11\rangle. \tag{1.22}$$
Comparing with a|00⟩ + b|11⟩, we see that unless ab = 0 the ‘copying circuit’ above does
not copy the quantum state input. In fact, it turns out to be impossible to make a copy
of an unknown quantum state. This property, that qubits cannot be copied, is known
as the no-cloning theorem, and it is one of the chief differences between quantum and
classical information. The no-cloning theorem is discussed at more length in Box 12.1
on page 532; the proof is very simple, and we encourage you to skip ahead and read the
proof now.
There is another way of looking at the failure of the circuit in Figure 1.11, based on
the intuition that a qubit somehow contains ‘hidden’ information not directly accessible
to measurement. Consider what happens when we measure one of the qubits of the state
$a|00\rangle + b|11\rangle$. As previously described, we obtain either 0 or 1 with probabilities $|a|^2$
and $|b|^2$. However, once one qubit is measured, the state of the other one is completely
determined, and no additional information can be gained about $a$ and $b$. In this sense, the
extra hidden information carried in the original qubit $|\psi\rangle$ was lost in the first measurement, and cannot be regained. If, however, the qubit had been copied, then the state of
the other qubit should still contain some of that hidden information. Therefore, a copy
cannot have been created.
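The comparison between the CNOT output and $|\psi\rangle|\psi\rangle$ can be checked numerically. The following is a minimal sketch in plain Python, assuming a two-qubit state is stored as a list of four amplitudes ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the helper names (`cnot`, `tensor`, `try_to_copy`, `is_copy`) are illustrative and do not come from the text.

```python
import math

def cnot(state):
    """CNOT with the first qubit as control: swaps the |10> and |11> amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def tensor(u, v):
    """Tensor product of two single-qubit states [u0, u1] and [v0, v1]."""
    return [x * y for x in u for y in v]

def try_to_copy(psi):
    """Feed |psi>|0> into the CNOT, as in the 'copying circuit'."""
    return cnot(tensor(psi, [1, 0]))

def is_copy(psi, tol=1e-12):
    """True if the CNOT output equals |psi>|psi> amplitude by amplitude."""
    out, want = try_to_copy(psi), tensor(psi, psi)
    return all(abs(o - w) < tol for o, w in zip(out, want))

zero, one = [1, 0], [0, 1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # a = b = 1/sqrt(2), so ab != 0

print(is_copy(zero), is_copy(one), is_copy(plus))
```

For the basis states the check succeeds, while for the equal superposition the output $a|00\rangle + b|11\rangle$ differs from the product state of Equation (1.22), in line with the condition $ab = 0$.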
1.3.6 Example: Bell states
Let’s consider a slightly more complicated circuit, shown in Figure 1.12, which has a
Hadamard gate followed by a CNOT, and transforms the four computational basis states
according to the table given. As an explicit example, the Hadamard gate takes the input
$|00\rangle$ to $(|0\rangle + |1\rangle)|0\rangle/\sqrt{2}$, and then the CNOT gives the output state $(|00\rangle + |11\rangle)/\sqrt{2}$.
Note how this works: first, the Hadamard transform puts the top qubit in a superposition;
this then acts as a control input to the CNOT, and the target gets inverted only when the
control is 1. The output states
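$$|\beta_{00}\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}; \qquad (1.23)$$
$$|\beta_{01}\rangle = \frac{|01\rangle + |10\rangle}{\sqrt{2}}; \qquad (1.24)$$
$$|\beta_{10}\rangle = \frac{|00\rangle - |11\rangle}{\sqrt{2}}; \text{ and} \qquad (1.25)$$
$$|\beta_{11}\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}, \qquad (1.26)$$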
are known as the Bell states, or sometimes the EPR states or EPR pairs, after some of
the people – Bell, and Einstein, Podolsky, and Rosen – who first pointed out the strange
properties of states like these. The mnemonic notation $|\beta_{00}\rangle, |\beta_{01}\rangle, |\beta_{10}\rangle, |\beta_{11}\rangle$ may be
understood via the equations
$$|\beta_{xy}\rangle \equiv \frac{|0, y\rangle + (-1)^x |1, \bar{y}\rangle}{\sqrt{2}}, \qquad (1.27)$$
where $\bar{y}$ is the negation of $y$.
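In              Out
$|00\rangle$    $(|00\rangle + |11\rangle)/\sqrt{2} \equiv |\beta_{00}\rangle$
$|01\rangle$    $(|01\rangle + |10\rangle)/\sqrt{2} \equiv |\beta_{01}\rangle$
$|10\rangle$    $(|00\rangle - |11\rangle)/\sqrt{2} \equiv |\beta_{10}\rangle$
$|11\rangle$    $(|01\rangle - |10\rangle)/\sqrt{2} \equiv |\beta_{11}\rangle$
Figure 1.12. Quantum circuit to create Bell states, and its input–output quantum ‘truth table’.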
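The truth table can be reproduced with a few lines of plain Python. This is a sketch under the assumption that a two-qubit state is a list of four real amplitudes ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the function names are illustrative. It runs the Hadamard-then-CNOT circuit on each basis state and compares the result with the mnemonic formula (1.27).

```python
import math

S = 1 / math.sqrt(2)

def hadamard_first(state):
    """Apply H to the first qubit of [a00, a01, a10, a11]."""
    a00, a01, a10, a11 = state
    return [S * (a00 + a10), S * (a01 + a11), S * (a00 - a10), S * (a01 - a11)]

def cnot(state):
    """CNOT with the first qubit as control and the second as target."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def bell_circuit(x, y):
    """Run the Hadamard + CNOT circuit on the basis state |xy>."""
    state = [0.0] * 4
    state[2 * x + y] = 1.0
    return cnot(hadamard_first(state))

def bell_formula(x, y):
    """Equation (1.27): (|0,y> + (-1)^x |1, not y>) / sqrt(2)."""
    state = [0.0] * 4
    state[y] = S                          # amplitude of |0, y>
    state[2 + (1 - y)] = (-1) ** x * S    # amplitude of |1, not y>
    return state

for x in (0, 1):
    for y in (0, 1):
        agree = all(abs(p - q) < 1e-12
                    for p, q in zip(bell_circuit(x, y), bell_formula(x, y)))
        print(f"|{x}{y}> ->", bell_circuit(x, y), "matches (1.27):", agree)
```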
1.3.7 Example: quantum teleportation
We will now apply the techniques of the last few pages to understand something nontrivial, surprising, and a lot of fun – quantum teleportation! Quantum teleportation is a
technique for moving quantum states around, even in the absence of a quantum communications channel linking the sender of the quantum state to the recipient.
Here’s how quantum teleportation works. Alice and Bob met long ago but now live
far apart. While together they generated an EPR pair, each taking one qubit of the EPR
pair when they separated. Many years later, Bob is in hiding, and Alice’s mission, should
she choose to accept it, is to deliver a qubit $|\psi\rangle$ to Bob. She does not know the state of
the qubit, and moreover can only send classical information to Bob. Should Alice accept
the mission?
Intuitively, things look pretty bad for Alice. She doesn’t know the state $|\psi\rangle$ of the
qubit she has to send to Bob, and the laws of quantum mechanics prevent her from
determining the state when she only has a single copy of $|\psi\rangle$ in her possession. What’s
worse, even if she did know the state $|\psi\rangle$, describing it precisely takes an infinite amount
of classical information since $|\psi\rangle$ takes values in a continuous space. So even if she did
know $|\psi\rangle$, it would take forever for Alice to describe the state to Bob. It’s not looking
good for Alice. Fortunately for Alice, quantum teleportation is a way of utilizing the
entangled EPR pair in order to send $|\psi\rangle$ to Bob, with only a small overhead of classical
communication.
In outline, the steps of the solution are as follows: Alice interacts the qubit $|\psi\rangle$ with
her half of the EPR pair, and then measures the two qubits in her possession, obtaining
one of four possible classical results, 00, 01, 10, and 11. She sends this information to
Bob. Depending on Alice’s classical message, Bob performs one of four operations on his
half of the EPR pair. Amazingly, by doing this he can recover the original state $|\psi\rangle$!
Figure 1.13. Quantum circuit for teleporting a qubit. The two top lines represent Alice’s system, while the bottom
line is Bob’s system. The meters represent measurement, and the double lines coming out of them carry classical
bits (recall that single lines denote qubits).
The quantum circuit shown in Figure 1.13 gives a more precise description of quantum
teleportation. The state to be teleported is $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha$ and $\beta$ are unknown
amplitudes. The state input into the circuit $|\psi_0\rangle$ is
$$|\psi_0\rangle = |\psi\rangle|\beta_{00}\rangle \qquad (1.28)$$
$$= \frac{1}{\sqrt{2}}\Bigl[\alpha|0\rangle(|00\rangle + |11\rangle) + \beta|1\rangle(|00\rangle + |11\rangle)\Bigr], \qquad (1.29)$$
where we use the convention that the first two qubits (on the left) belong to Alice, and
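the third qubit to Bob. As we explained previously, Alice’s second qubit and Bob’s qubit
start out in an EPR state. Alice sends her qubits through a CNOT gate, obtaining
$$|\psi_1\rangle = \frac{1}{\sqrt{2}}\Bigl[\alpha|0\rangle(|00\rangle + |11\rangle) + \beta|1\rangle(|10\rangle + |01\rangle)\Bigr]. \qquad (1.30)$$
She then sends the first qubit through a Hadamard gate, obtaining
$$|\psi_2\rangle = \frac{1}{2}\Bigl[\alpha(|0\rangle + |1\rangle)(|00\rangle + |11\rangle) + \beta(|0\rangle - |1\rangle)(|10\rangle + |01\rangle)\Bigr]. \qquad (1.31)$$
This state may be re-written in the following way, simply by regrouping terms:
$$|\psi_2\rangle = \frac{1}{2}\Bigl[|00\rangle\bigl(\alpha|0\rangle + \beta|1\rangle\bigr) + |01\rangle\bigl(\alpha|1\rangle + \beta|0\rangle\bigr) + |10\rangle\bigl(\alpha|0\rangle - \beta|1\rangle\bigr) + |11\rangle\bigl(\alpha|1\rangle - \beta|0\rangle\bigr)\Bigr]. \qquad (1.32)$$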
This expression naturally breaks down into four terms. The first term has Alice’s qubits
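in the state $|00\rangle$, and Bob’s qubit in the state $\alpha|0\rangle + \beta|1\rangle$ – which is the original state
$|\psi\rangle$. If Alice performs a measurement and obtains the result 00 then Bob’s system will
be in the state $|\psi\rangle$. Similarly, from the previous expression we can read off Bob’s post-measurement state, given the result of Alice’s measurement:
$$00 \longmapsto |\psi_3(00)\rangle \equiv \bigl[\alpha|0\rangle + \beta|1\rangle\bigr] \qquad (1.33)$$
$$01 \longmapsto |\psi_3(01)\rangle \equiv \bigl[\alpha|1\rangle + \beta|0\rangle\bigr] \qquad (1.34)$$
$$10 \longmapsto |\psi_3(10)\rangle \equiv \bigl[\alpha|0\rangle - \beta|1\rangle\bigr] \qquad (1.35)$$
$$11 \longmapsto |\psi_3(11)\rangle \equiv \bigl[\alpha|1\rangle - \beta|0\rangle\bigr]. \qquad (1.36)$$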
Depending on Alice’s measurement outcome, Bob’s qubit will end up in one of these
four possible states. Of course, to know which state it is in, Bob must be told the result of
Alice’s measurement – we will show later that it is this fact which prevents teleportation
from being used to transmit information faster than light. Once Bob has learned the measurement outcome, Bob can ‘fix up’ his state, recovering $|\psi\rangle$, by applying the appropriate
quantum gate. For example, in the case where the measurement yields 00, Bob doesn’t
need to do anything. If the measurement is 01 then Bob can fix up his state by applying
the $X$ gate. If the measurement is 10 then Bob can fix up his state by applying the $Z$
gate. If the measurement is 11 then Bob can fix up his state by applying first an $X$ and
then a $Z$ gate. Summing up, Bob needs to apply the transformation $Z^{M_1} X^{M_2}$ (note how
time goes from left to right in circuit diagrams, but in matrix products terms on the right
happen first) to his qubit, and he will recover the state $|\psi\rangle$.
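The whole protocol is small enough to simulate directly. The sketch below is one possible plain-Python rendering, not a reference implementation: it assumes a three-qubit state is a list of eight complex amplitudes indexed by the bit string (Alice's two qubits first, Bob's last), and all helper names are illustrative. It follows the states $|\psi_0\rangle$ to $|\psi_2\rangle$, measures Alice's qubits, and applies $Z^{M_1} X^{M_2}$ on Bob's side.

```python
import math
import random

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
N = 3  # qubits 0 and 1 belong to Alice, qubit 2 to Bob

def bit(index, q):
    """Value of qubit q (0 = leftmost) in the basis state numbered index."""
    return (index >> (N - 1 - q)) & 1

def apply_1q(gate, q, state):
    """Apply a 2x2 gate to qubit q of the N-qubit state vector."""
    out = [0j] * len(state)
    mask = 1 << (N - 1 - q)
    for i, amp in enumerate(state):
        b = bit(i, q)
        for b_new in (0, 1):
            out[(i & ~mask) | (b_new * mask)] += gate[b_new][b] * amp
    return out

def apply_cnot(control, target, state):
    """Flip the target qubit in every basis state whose control qubit is 1."""
    out = [0j] * len(state)
    mask = 1 << (N - 1 - target)
    for i, amp in enumerate(state):
        out[i ^ mask if bit(i, control) else i] += amp
    return out

def measure(q, state, rng):
    """Measure qubit q; return the outcome and the renormalized state."""
    p1 = sum(abs(a) ** 2 for i, a in enumerate(state) if bit(i, q))
    outcome = 1 if rng.random() < p1 else 0
    norm = math.sqrt(p1 if outcome else 1 - p1)
    return outcome, [a / norm if bit(i, q) == outcome else 0j
                     for i, a in enumerate(state)]

def teleport(alpha, beta, rng):
    """Teleport alpha|0> + beta|1>; return Alice's bits and Bob's final amplitudes."""
    state = [0j] * 8                          # |psi_0>, Equation (1.29)
    state[0b000] = state[0b011] = alpha * S
    state[0b100] = state[0b111] = beta * S
    state = apply_cnot(0, 1, state)           # |psi_1>, Equation (1.30)
    state = apply_1q(H, 0, state)             # |psi_2>, Equation (1.31)
    m1, state = measure(0, state, rng)
    m2, state = measure(1, state, rng)
    if m2:
        state = apply_1q(X, 2, state)         # X^{M2} is applied first
    if m1:
        state = apply_1q(Z, 2, state)         # then Z^{M1}
    base = 4 * m1 + 2 * m2                    # Alice's qubits are left in |m1 m2>
    return (m1, m2), (state[base], state[base + 1])

print(teleport(0.6, 0.8j, random.Random(0)))
```

Whatever pair of bits Alice obtains, the amplitudes returned for Bob's qubit should equal the inputs `alpha` and `beta` up to rounding.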
There are many interesting features of teleportation, some of which we shall return
to later in the book. For now we content ourselves with commenting on a couple of
aspects. First, doesn’t teleportation allow one to transmit quantum states faster than
light? This would be rather peculiar, because the theory of relativity implies that faster
than light information transfer could be used to send information backwards in time.
Fortunately, quantum teleportation does not enable faster than light communication,
because to complete the teleportation Alice must transmit her measurement result to
Bob over a classical communications channel. We will show in Section 2.4.3 that without
this classical communication, teleportation does not convey any information at all. The
classical channel is limited by the speed of light, so it follows that quantum teleportation
cannot be accomplished faster than the speed of light, resolving the apparent paradox.
A second puzzle about teleportation is that it appears to create a copy of the quantum state being teleported, in apparent violation of the no-cloning theorem discussed in
Section 1.3.5. This violation is only illusory since after the teleportation process only the
target qubit is left in the state $|\psi\rangle$, and the original data qubit ends up in one of the
computational basis states $|0\rangle$ or $|1\rangle$, depending upon the measurement result on the first
qubit.
What can we learn from quantum teleportation? Quite a lot! It’s much more than
just a neat trick one can do with quantum states. Quantum teleportation emphasizes the
interchangeability of different resources in quantum mechanics, showing that one shared
EPR pair together with two classical bits of communication is a resource at least the
equal of one qubit of communication. Quantum computation and quantum information
has revealed a plethora of methods for interchanging resources, many built upon quantum
teleportation. In particular, in Chapter 10 we explain how teleportation can be used to
build quantum gates which are resistant to the effects of noise, and in Chapter 12 we show
that teleportation is intimately connected with the properties of quantum error-correcting
codes. Despite these connections with other subjects, it is fair to say that we are only
beginning to understand why it is that quantum teleportation is possible in quantum
mechanics; in later chapters we endeavor to explain some of the insights that make such
an understanding possible.
1.4 Quantum algorithms
What class of computations can be performed using quantum circuits? How does that class
compare with the computations which can be performed using classical logical circuits?
Can we find a task which a quantum computer may perform better than a classical
computer? In this section we investigate these questions, explaining how to perform
classical computations on quantum computers, giving some examples of problems for
which quantum computers offer an advantage over classical computers, and summarizing
the known quantum algorithms.
1.4.1 Classical computations on a quantum computer
Can we simulate a classical logic circuit using a quantum circuit? Not surprisingly, the
answer to this question turns out to be yes. It would be very surprising if this were not
the case, as physicists believe that all aspects of the world around us, including classical
logic circuits, can ultimately be explained using quantum mechanics. As pointed out
earlier, the reason quantum circuits cannot be used to directly simulate classical circuits
is that unitary quantum logic gates are inherently reversible, whereas many classical
logic gates such as the NAND gate are inherently irreversible.
Any classical circuit can be replaced by an equivalent circuit containing only reversible
elements, by making use of a reversible gate known as the Toffoli gate. The Toffoli gate
has three input bits and three output bits, as illustrated in Figure 1.14. Two of the bits are
control bits that are unaffected by the action of the Toffoli gate. The third bit is a target
bit that is flipped if both control bits are set to 1, and otherwise is left alone. Note that
applying the Toffoli gate twice to a set of bits has the effect (a, b, c) → (a, b, c ⊕ ab) →
(a, b, c), and thus the Toffoli gate is a reversible gate, since it has an inverse – itself.
Inputs      Outputs
a  b  c     a′ b′ c′
0  0  0     0  0  0
0  0  1     0  0  1
0  1  0     0  1  0
0  1  1     0  1  1
1  0  0     1  0  0
1  0  1     1  0  1
1  1  0     1  1  1
1  1  1     1  1  0
Figure 1.14. Truth table for the Toffoli gate, and its circuit representation.
The Toffoli gate can be used to simulate NAND gates, as shown in Figure 1.15, and
can also be used to do FANOUT, as shown in Figure 1.16. With these two operations it
becomes possible to simulate all other elements in a classical circuit, and thus an arbitrary
classical circuit can be simulated by an equivalent reversible circuit.
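The two constructions can be written out as a short Python sketch. The NAND construction uses a target ancilla prepared as 1; for FANOUT the sketch assumes the first ancilla is set to 1 and the third to 0, which is one choice of ‘standard ancilla states’ that makes the construction work. Function names are illustrative.

```python
from itertools import product

def toffoli(a, b, c):
    """Classical Toffoli gate: flip the target c when both controls are 1."""
    return a, b, c ^ (a & b)

def nand(a, b):
    """NAND from a Toffoli gate whose target ancilla is prepared as 1."""
    return toffoli(a, b, 1)[2]

def fanout(b):
    """FANOUT from a Toffoli gate (assumed ancillas: first bit 1, third bit 0).
    The copies of b appear on the second and third output bits."""
    _, copy1, copy2 = toffoli(1, b, 0)
    return copy1, copy2

def is_self_inverse():
    """Check (a, b, c) -> (a, b, c XOR ab) -> (a, b, c) on all eight inputs."""
    return all(toffoli(*toffoli(a, b, c)) == (a, b, c)
               for a, b, c in product((0, 1), repeat=3))

for a, b in product((0, 1), repeat=2):
    print(f"NAND({a}, {b}) = {nand(a, b)}")
print("FANOUT(0) =", fanout(0), " FANOUT(1) =", fanout(1))
print("Toffoli is its own inverse:", is_self_inverse())
```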
The Toffoli gate has been described as a classical gate, but it can also be implemented
as a quantum logic gate. By definition, the quantum logic implementation of the Toffoli
gate simply permutes computational basis states in the same way as the classical Toffoli
gate. For example, the quantum Toffoli gate acting on the state $|110\rangle$ flips the third qubit
because the first two are set, resulting in the state $|111\rangle$. It is tedious but not difficult
to write this transformation out as an 8 by 8 matrix, $U$, and verify explicitly that $U$ is
a unitary matrix, and thus the Toffoli gate is a legitimate quantum gate. The quantum
Toffoli gate can be used to simulate irreversible classical logic gates, just as the classical
Toffoli gate was, and ensures that quantum computers are capable of performing any
computation which a classical (deterministic) computer may do.
Figure 1.15. Classical circuit implementing a NAND gate using a Toffoli gate. The top two bits represent the input
to the NAND, while the third bit is prepared in the standard state 1, sometimes known as an ancilla state. The
output from the NAND is on the third bit.
Figure 1.16. FANOUT with the Toffoli gate, with the second bit being the input to the FANOUT (and the other two
bits standard ancilla states), and the output from FANOUT appearing on the second and third bits.
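The 8 by 8 matrix mentioned above need not be written out by hand. As a sketch in plain Python (helper names are illustrative), the permutation of basis states $|a, b, c\rangle \to |a, b, c \oplus ab\rangle$ can be turned into a matrix and the unitarity condition $U^\dagger U = I$ checked entry by entry.

```python
def toffoli_matrix():
    """8x8 permutation matrix U with U|a,b,c> = |a, b, c XOR ab>."""
    U = [[0] * 8 for _ in range(8)]
    for col in range(8):
        a, b, c = (col >> 2) & 1, (col >> 1) & 1, col & 1
        row = (a << 2) | (b << 1) | (c ^ (a & b))
        U[row][col] = 1
    return U

def dagger(A):
    """Conjugate transpose of a square matrix."""
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_unitary(U):
    """True if U^dagger U equals the identity matrix."""
    n = len(U)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(dagger(U), U) == identity

U = toffoli_matrix()
for row in U:
    print(row)
print("unitary:", is_unitary(U))
```

The matrix is the identity except for the lower-right 2 by 2 block, which exchanges $|110\rangle$ and $|111\rangle$.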
What if the classical computer is non-deterministic, that is, has the ability to generate
random bits to be used in the computation? Not surprisingly, it is easy for a quantum
computer to simulate this. To perform such a simulation it turns out to be sufficient to
produce random fair coin tosses, which can be done by preparing a qubit in the state
$|0\rangle$, sending it through a Hadamard gate to produce $(|0\rangle + |1\rangle)/\sqrt{2}$, and then measuring
the state. The result will be $|0\rangle$ or $|1\rangle$ with 50/50 probability. This provides a quantum
computer with the ability to efficiently simulate a non-deterministic classical computer.
Of course, if the ability to simulate classical computers were the only feature of quantum computers there would be little point in going to all the trouble of exploiting quantum
effects! The advantage of quantum computing is that much more powerful functions may
be computed using qubits and quantum gates. In the next few sections we explain how
to do this, culminating in the Deutsch–Jozsa algorithm, our first example of a quantum
algorithm able to solve a problem faster than any classical algorithm.
1.4.2 Quantum parallelism
Quantum parallelism is a fundamental feature of many quantum algorithms. Heuristically, and at the risk of over-simplifying, quantum parallelism allows quantum computers
to evaluate a function $f(x)$ for many different values of $x$ simultaneously. In this section
we explain how quantum parallelism works, and some of its limitations.
Suppose $f(x) : \{0, 1\} \to \{0, 1\}$ is a function with a one-bit domain and range. A
convenient way of computing this function on a quantum computer is to consider a two
qubit quantum computer which starts in the state $|x, y\rangle$. With an appropriate sequence
of logic gates it is possible to transform this state into $|x, y \oplus f(x)\rangle$, where $\oplus$ indicates
addition modulo 2; the first register is called the ‘data’ register, and the second register the
‘target’ register. We give the transformation defined by the map $|x, y\rangle \to |x, y \oplus f(x)\rangle$ a
name, $U_f$, and note that it is easily shown to be unitary. If $y = 0$, then the final state of the
second qubit is just the value $f(x)$. (In Section 3.2.5 we show that given a classical circuit
for computing $f$ there is a quantum circuit of comparable efficiency which computes the
transformation $U_f$ on a quantum computer. For our purposes it can be considered to be
a black box.)
Figure 1.17. Quantum circuit for evaluating $f(0)$ and $f(1)$ simultaneously. $U_f$ is the quantum circuit which takes
inputs like $|x, y\rangle$ to $|x, y \oplus f(x)\rangle$.
Consider the circuit shown in Figure 1.17, which applies $U_f$ to an input not in the
computational basis. Instead, the data register is prepared in the superposition
$(|0\rangle + |1\rangle)/\sqrt{2}$, which can be created with a Hadamard gate acting on $|0\rangle$. Then we apply $U_f$,
resulting in the state:
$$\frac{|0, f(0)\rangle + |1, f(1)\rangle}{\sqrt{2}}. \qquad (1.37)$$
This is a remarkable state! The different terms contain information about both $f(0)$ and
$f(1)$; it is almost as if we have evaluated $f(x)$ for two values of $x$ simultaneously, a feature
known as ‘quantum parallelism’. Unlike classical parallelism, where multiple circuits each
built to compute $f(x)$ are executed simultaneously, here a single $f(x)$ circuit is employed
to evaluate the function for multiple values of $x$ simultaneously, by exploiting the ability
of a quantum computer to be in superpositions of different states.
This procedure can easily be generalized to functions on an arbitrary number of bits, by
using a general operation known as the Hadamard transform, or sometimes the Walsh–
Hadamard transform. This operation is just n Hadamard gates acting in parallel on n
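qubits. For example, shown in Figure 1.18 is the case $n = 2$ with qubits initially prepared
as $|0\rangle$, which gives
$$\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right)\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right) = \frac{|00\rangle + |01\rangle + |10\rangle + |11\rangle}{2} \qquad (1.38)$$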
as output. We write $H^{\otimes 2}$ to denote the parallel action of two Hadamard gates, and read
‘$\otimes$’ as ‘tensor’. More generally, the result of performing the Hadamard transform on $n$
qubits initially in the all $|0\rangle$ state is
$$\frac{1}{\sqrt{2^n}} \sum_x |x\rangle, \qquad (1.39)$$
where the sum is over all possible values of $x$, and we write $H^{\otimes n}$ to denote this action.
That is, the Hadamard transform produces an equal superposition of all computational
basis states. Moreover, it does this extremely efficiently, producing a superposition of $2^n$
states using just $n$ gates.
Figure 1.18. The Hadamard transform $H^{\otimes 2}$ on two qubits.
Quantum parallel evaluation of a function with an $n$ bit input $x$ and 1 bit output, $f(x)$,
can thus be performed in the following manner. Prepare the $n + 1$ qubit state $|0\rangle^{\otimes n}|0\rangle$,
then apply the Hadamard transform to the first $n$ qubits, followed by the quantum circuit
implementing $U_f$. This produces the state
$$\frac{1}{\sqrt{2^n}} \sum_x |x\rangle|f(x)\rangle. \qquad (1.40)$$
In some sense, quantum parallelism enables all possible values of the function f to be
evaluated simultaneously, even though we apparently only evaluated f once. However,
this parallelism is not immediately useful. In our single qubit example, measurement of the
state gives only either $|0, f(0)\rangle$ or $|1, f(1)\rangle$! Similarly, in the general case, measurement of
the state $\sum_x |x, f(x)\rangle$ would give only $f(x)$ for a single value of $x$. Of course, a classical
computer can do this easily! Quantum computation requires something more than just
quantum parallelism to be useful; it requires the ability to extract information about more
than one value of $f(x)$ from superposition states like $\sum_x |x, f(x)\rangle$. Over the next two
sections we investigate examples of how this may be done.
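A small simulation makes the limitation concrete. The sketch below, in plain Python with illustrative names, builds the state of Equation (1.40) for an arbitrary $f$ and then samples a measurement. As a simplifying assumption it writes down the result of the Hadamard transform (an equal superposition of $|x\rangle|0\rangle$) directly instead of applying $n$ individual gates, and it stores amplitudes in a list indexed by $2x + y$.

```python
import math
import random

def parallel_state(f, n):
    """Build the state (1.40): equal superposition over x, then U_f."""
    amp = 1 / math.sqrt(2 ** n)
    state = [0.0] * 2 ** (n + 1)
    for x in range(2 ** n):
        state[2 * x] = amp                 # |x>|0> after the Hadamard transform
    out = [0.0] * len(state)
    for idx, a in enumerate(state):        # U_f: |x, y> -> |x, y XOR f(x)>
        x, y = idx >> 1, idx & 1
        out[2 * x + (y ^ f(x))] += a
    return out

def measure_all(state, rng):
    """Sample one computational-basis outcome (x, y) from the state."""
    r, total, last = rng.random(), 0.0, None
    for idx, a in enumerate(state):
        if a == 0.0:
            continue
        total += a * a
        last = idx
        if r < total:
            break
    return last >> 1, last & 1

def parity(x):
    """Example one-bit function: parity of the bits of x."""
    return bin(x).count("1") % 2

state = parallel_state(parity, 3)
print("nonzero amplitudes:", sum(1 for a in state if a != 0.0))
print("one measurement:", measure_all(state, random.Random(0)))
```

The state carries all $2^n$ values of $f$, yet each run of `measure_all` returns a single pair $(x, f(x))$, which is no more than one classical evaluation would provide.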
1.4.3 Deutsch’s algorithm
A simple modification of the circuit in Figure 1.17 demonstrates how quantum circuits
can outperform classical ones by implementing Deutsch’s algorithm (we actually present
a simplified and improved version of the original algorithm; see ‘History and further
reading’ at the end of the chapter). Deutsch’s algorithm combines quantum parallelism
with a property of quantum mechanics known as interference. As before, let us use the
Hadamard gate to prepare the first qubit as the superposition $(|0\rangle + |1\rangle)/\sqrt{2}$, but now
let us prepare the second qubit $y$ as the superposition $(|0\rangle - |1\rangle)/\sqrt{2}$, using a Hadamard
gate applied to the state $|1\rangle$. Let us follow the states along to see what happens in this
circuit, shown in Figure 1.19.
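Figure 1.19. Quantum circuit implementing Deutsch’s algorithm.
The input state
$$|\psi_0\rangle = |01\rangle \qquad (1.41)$$
is sent through two Hadamard gates to give
$$|\psi_1\rangle = \left[\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right]\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]. \qquad (1.42)$$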
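A little thought shows that if we apply $U_f$ to the state $|x\rangle(|0\rangle - |1\rangle)/\sqrt{2}$ then we obtain
the state $(-1)^{f(x)}|x\rangle(|0\rangle - |1\rangle)/\sqrt{2}$. Applying $U_f$ to $|\psi_1\rangle$ therefore leaves us with one of
two possibilities:
$$|\psi_2\rangle = \begin{cases} \pm\left[\dfrac{|0\rangle + |1\rangle}{\sqrt{2}}\right]\left[\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right] & \text{if } f(0) = f(1) \\[2ex] \pm\left[\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right]\left[\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right] & \text{if } f(0) \neq f(1). \end{cases} \qquad (1.43)$$
The final Hadamard gate on the first qubit thus gives us
$$|\psi_3\rangle = \begin{cases} \pm|0\rangle\left[\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right] & \text{if } f(0) = f(1) \\[2ex] \pm|1\rangle\left[\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right] & \text{if } f(0) \neq f(1). \end{cases} \qquad (1.44)$$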
Realizing that $f(0) \oplus f(1)$ is 0 if $f(0) = f(1)$ and 1 otherwise, we can rewrite this result
concisely as
$$|\psi_3\rangle = \pm|f(0) \oplus f(1)\rangle\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right], \qquad (1.45)$$
so by measuring the first qubit we may determine $f(0) \oplus f(1)$. This is very interesting
indeed: the quantum circuit has given us the ability to determine a global property of
$f(x)$, namely $f(0) \oplus f(1)$, using only one evaluation of $f(x)$! This is faster than is possible
with a classical apparatus, which would require at least two evaluations.
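The state-by-state derivation can be followed in code. The sketch below is a plain-Python simulation with illustrative names, assuming a two-qubit state is a list of four real amplitudes indexed by $2x + y$. It applies $U_f$ exactly once and reads $f(0) \oplus f(1)$ from the first qubit.

```python
import math

S = 1 / math.sqrt(2)

def h_first(s):
    """Hadamard gate on the first qubit."""
    return [S * (s[0] + s[2]), S * (s[1] + s[3]), S * (s[0] - s[2]), S * (s[1] - s[3])]

def h_second(s):
    """Hadamard gate on the second qubit."""
    return [S * (s[0] + s[1]), S * (s[0] - s[1]), S * (s[2] + s[3]), S * (s[2] - s[3])]

def u_f(f, s):
    """U_f: |x, y> -> |x, y XOR f(x)>, a permutation of basis states."""
    out = [0.0] * 4
    for idx, a in enumerate(s):
        x, y = idx >> 1, idx & 1
        out[2 * x + (y ^ f(x))] += a
    return out

def deutsch(f):
    """Return f(0) XOR f(1) using a single application of U_f."""
    psi0 = [0.0, 1.0, 0.0, 0.0]            # |01>, Equation (1.41)
    psi1 = h_second(h_first(psi0))         # Equation (1.42)
    psi2 = u_f(f, psi1)                    # Equation (1.43)
    psi3 = h_first(psi2)                   # Equation (1.44)
    p_one = psi3[2] ** 2 + psi3[3] ** 2    # probability that the first qubit reads 1
    return round(p_one)

functions = {
    "f(x) = 0": lambda x: 0,
    "f(x) = 1": lambda x: 1,
    "f(x) = x": lambda x: x,
    "f(x) = 1 - x": lambda x: 1 - x,
}
for name, f in functions.items():
    print(name, "-> f(0) XOR f(1) =", deutsch(f))
```

Because the measurement probability is exactly 0 or 1 here (up to rounding), the simulation simply rounds it instead of sampling.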
This example highlights the difference between quantum parallelism and classical
randomized algorithms. Naively, one might think that the state $|0\rangle|f(0)\rangle + |1\rangle|f(1)\rangle$
corresponds rather closely to a probabilistic classical computer that evaluates $f(0)$ with
probability one-half, or $f(1)$ with probability one-half. The difference is that in a classical
computer these two alternatives forever exclude one another; in a quantum computer it is
possible for the two alternatives to interfere with one another to yield some global property
of the function $f$, by using something like the Hadamard gate to recombine the different
alternatives, as was done in Deutsch’s algorithm. The essence of the design of many
quantum algorithms is that a clever choice of function and final transformation allows
efficient determination of useful global information about the function – information
which cannot be attained quickly on a classical computer.
1.4.4 The Deutsch–Jozsa algorithm
Deutsch’s algorithm is a simple case of a more general quantum algorithm, which we shall
refer to as the Deutsch–Jozsa algorithm. The application, known as Deutsch’s problem,
may be described as the following game. Alice, in Amsterdam, selects a number x from
0 to $2^n - 1$, and mails it in a letter to Bob, in Boston. Bob calculates some function
$f(x)$ and replies with the result, which is either 0 or 1. Now, Bob has promised to use
a function $f$ which is of one of two kinds; either $f(x)$ is constant for all values of $x$,
or else $f(x)$ is balanced, that is, equal to 1 for exactly half of all the possible $x$, and 0
for the other half. Alice’s goal is to determine with certainty whether Bob has chosen a
constant or a balanced function, corresponding with him as little as possible. How fast
can she succeed?
In the classical case, Alice may only send Bob one value of x in each letter. At worst,
Alice will need to query Bob at least $2^n/2 + 1$ times, since she may receive $2^n/2$ 0s before
finally getting a 1, telling her that Bob’s function is balanced. The best deterministic
classical algorithm she can use therefore requires $2^n/2 + 1$ queries. Note that in each
letter, Alice sends Bob $n$ bits of information. Furthermore, in this example, physical
distance is being used to artificially elevate the cost of calculating $f(x)$, but this is not
needed in the general problem, where $f(x)$ may be inherently difficult to calculate.
If Bob and Alice were able to exchange qubits, instead of just classical bits, and if Bob
agreed to calculate $f(x)$ using a unitary transform $U_f$, then Alice could achieve her goal
in just one correspondence with Bob, using the following algorithm.
Analogously to Deutsch’s algorithm, Alice has an n qubit register to store her query
in, and a single qubit register which she will give to Bob, to store the answer in. She
begins by preparing both her query and answer registers in a superposition state. Bob
will evaluate $f(x)$ using quantum parallelism and leave the result in the answer register.
Alice then interferes states in the superposition using a Hadamard transform on the query
register, and finishes by performing a suitable measurement to determine whether f was
constant or balanced.
The specific steps of the algorithm are depicted in Figure 1.20. Let us follow the states
through this circuit. The input state
$$|\psi_0\rangle = |0\rangle^{\otimes n}|1\rangle \qquad (1.46)$$
is similar to that of Equation (1.41), but here the query register describes the state of $n$
qubits all prepared in the $|0\rangle$ state. After the Hadamard transform on the query register
and the Hadamard gate on the answer register we have
$$|\psi_1\rangle = \sum_{x \in \{0,1\}^n} \frac{|x\rangle}{\sqrt{2^n}}\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]. \qquad (1.47)$$
The query register is now a superposition of all values, and the answer register is in an
evenly weighted superposition of 0 and 1. Next, the function $f$ is evaluated (by Bob)
using $U_f : |x, y\rangle \to |x, y \oplus f(x)\rangle$, giving
$$|\psi_2\rangle = \sum_x \frac{(-1)^{f(x)}|x\rangle}{\sqrt{2^n}}\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]. \qquad (1.48)$$
Figure 1.20. Quantum circuit implementing the general Deutsch–Jozsa algorithm. The wire with a ‘/’ through it
represents a set of $n$ qubits, similar to the common engineering notation.
Alice now has a set of qubits in which the result of Bob’s function evaluation is stored
in the amplitude of the qubit superposition state. She now interferes terms in the superposition using a Hadamard transform on the query register. To determine the result of
the Hadamard transform it helps to first calculate the effect of the Hadamard transform
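on a state $|x\rangle$. By checking the cases $x = 0$ and $x = 1$ separately we see that for a single
qubit $H|x\rangle = \sum_z (-1)^{xz}|z\rangle/\sqrt{2}$. Thus
$$H^{\otimes n}|x_1, \ldots, x_n\rangle = \frac{\sum_{z_1,\ldots,z_n} (-1)^{x_1 z_1 + \cdots + x_n z_n}|z_1, \ldots, z_n\rangle}{\sqrt{2^n}}. \qquad (1.49)$$
This can be summarized more succinctly in the very useful equation
$$H^{\otimes n}|x\rangle = \frac{\sum_z (-1)^{x \cdot z}|z\rangle}{\sqrt{2^n}}, \qquad (1.50)$$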
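where $x \cdot z$ is the bitwise inner product of $x$ and $z$, modulo 2. Using this equation
and (1.48) we can now evaluate $|\psi_3\rangle$,
$$|\psi_3\rangle = \sum_z \sum_x \frac{(-1)^{x \cdot z + f(x)}|z\rangle}{2^n}\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]. \qquad (1.51)$$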
Alice now observes the query register. Note that the amplitude for the state $|0\rangle^{\otimes n}$ is
$\sum_x (-1)^{f(x)}/2^n$. Let’s look at the two possible cases – $f$ constant and $f$ balanced – to
discern what happens. In the case where $f$ is constant the amplitude for $|0\rangle^{\otimes n}$ is $+1$ or
$-1$, depending on the constant value $f(x)$ takes. Because $|\psi_3\rangle$ is of unit length it follows
that all the other amplitudes must be zero, and an observation will yield 0s for all qubits
in the query register. If $f$ is balanced then the positive and negative contributions to the
amplitude for $|0\rangle^{\otimes n}$ cancel, leaving an amplitude of zero, and a measurement must yield
a result other than 0 on at least one qubit in the query register. Summarizing, if Alice
measures all 0s then the function is constant; otherwise the function is balanced. The
Deutsch–Jozsa algorithm is summarized below.
Algorithm: Deutsch–Jozsa
Inputs: (1) A black box $U_f$ which performs the transformation
$|x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle$, for $x \in \{0, \ldots, 2^n - 1\}$ and $f(x) \in \{0, 1\}$. It is
promised that $f(x)$ is either constant for all values of $x$, or else $f(x)$ is balanced,
that is, equal to 1 for exactly half of all the possible $x$, and 0 for the other half.
Outputs: 0 if and only if $f$ is constant.
Runtime: One evaluation of $U_f$. Always succeeds.
Procedure:
1. $|0\rangle^{\otimes n}|1\rangle$ (initialize state)
2. $\to \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle \left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]$ (create superposition using Hadamard gates)
3. $\to \frac{1}{\sqrt{2^n}} \sum_x (-1)^{f(x)} |x\rangle \left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]$ (calculate function $f$ using $U_f$)
4. $\to \frac{1}{2^n} \sum_z \sum_x (-1)^{x \cdot z + f(x)} |z\rangle \left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]$ (perform Hadamard transform)
5. $\to z$ (measure to obtain final output $z$)
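As an illustration of the procedure, the sketch below computes the query-register amplitudes of Equation (1.51) in plain Python and samples the final measurement. It relies on one simplification that should be kept in mind: the answer register stays in $(|0\rangle - |1\rangle)/\sqrt{2}$ throughout, so only the query register is tracked and $U_f$ enters through the phase $(-1)^{f(x)}$. The function names are illustrative, and the classical simulation of course takes time exponential in $n$.

```python
import random

def dj_amplitudes(f, n):
    """Query-register amplitudes of |psi_3>, Equation (1.51):
    amplitude(z) = (1/2^n) * sum over x of (-1)^(x.z + f(x))."""
    size = 2 ** n
    amps = []
    for z in range(size):
        total = 0
        for x in range(size):
            dot = bin(x & z).count("1") % 2   # bitwise inner product modulo 2
            total += (-1) ** (dot + f(x))
        amps.append(total / size)
    return amps

def deutsch_jozsa(f, n, rng=random):
    """Sample the measured z; z is 0 if and only if f is constant."""
    probs = [a * a for a in dj_amplitudes(f, n)]
    z = rng.choices(range(2 ** n), weights=probs)[0]
    return "constant" if z == 0 else "balanced"

n = 3
print(deutsch_jozsa(lambda x: 1, n))        # a constant function
print(deutsch_jozsa(lambda x: x & 1, n))    # balanced: lowest bit of x
```

For a constant $f$ all the probability sits on $z = 0$, and for a balanced $f$ that amplitude vanishes, so the sampled answer does not depend on the random draw.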
We’ve shown that a quantum computer can solve Deutsch’s problem with one evaluation of the function $f$ compared to the classical requirement for $2^n/2 + 1$ evaluations.
This appears impressive, but there are several important caveats. First, Deutsch’s problem is not an especially important problem; it has no known applications. Second, the
comparison between classical and quantum algorithms is in some ways an apples and
oranges comparison, as the method for evaluating the function is quite different in the
two cases. Third, if Alice is allowed to use a probabilistic classical computer, then by
asking Bob to evaluate $f(x)$ for a few randomly chosen $x$ she can very quickly determine
with high probability whether f is constant or balanced. This probabilistic scenario is
perhaps more realistic than the deterministic scenario we have been considering. Despite
these caveats, the Deutsch–Jozsa algorithm contains the seeds for more impressive quantum algorithms, and it is enlightening to attempt to understand the principles behind its
operation.
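The third caveat can be made concrete with a simple randomized strategy, sketched below in plain Python. This is one straightforward approach and is not claimed to be the best classical algorithm: Alice queries $k$ inputs chosen uniformly at random (with replacement) and answers ‘constant’ only if every reply agrees. The name `classical_guess` and the parameter `k` are illustrative.

```python
import random

def classical_guess(f, n, k, rng):
    """Query f at k random n-bit inputs; guess 'constant' if all replies agree."""
    first = f(rng.randrange(2 ** n))
    for _ in range(k - 1):
        if f(rng.randrange(2 ** n)) != first:
            return "balanced"     # two different values seen: f cannot be constant
    return "constant"             # correct for constant f, may be wrong for balanced f

rng = random.Random(1)
print(classical_guess(lambda x: 0, 10, 20, rng))
print(classical_guess(lambda x: x & 1, 10, 20, rng))
```

With this strategy a constant function is never misclassified, while a balanced function is mistaken for a constant one only when all $k$ independent replies coincide, which happens with probability $2 \cdot (1/2)^k = 2^{-(k-1)}$, independent of $n$.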
Exercise 1.1: (Probabilistic classical algorithm) Suppose that the problem is not
to distinguish between the constant and balanced functions with certainty, but
rather, with some probability of error $\epsilon < 1/2$. What is the performance of the
best classical algorithm for this problem?
1.4.5 Quantum algorithms summarized
The Deutsch–Jozsa algorithm suggests that quantum computers may be capable of solving
some computational problems much more efficiently than classical computers. Unfortunately, the problem it solves is of little practical interest. Are there more interesting
problems whose solution may be obtained more efficiently using quantum algorithms?
What are the principles underlying such algorithms? What are the ultimate limits of a
quantum computer’s computational power?
Broadly speaking, there are three classes of quantum algorithms which provide an
advantage over known classical algorithms. First, there is the class of algorithms based
upon quantum versions of the Fourier transform, a tool which is also widely used in
classical algorithms. The Deutsch–Jozsa algorithm is an example of this type of algorithm, as are Shor’s algorithms for factoring and discrete logarithm. The second class
of algorithms is quantum search algorithms. The third class of algorithms is quantum
simulation, whereby a quantum computer is used to simulate a quantum system. We now
briefly describe each of these classes of algorithms, and then summarize what is known
or suspected about the computational power of quantum computers.
Quantum algorithms based upon the Fourier transform
The discrete Fourier transform is usually described as transforming a set $x_0, \ldots, x_{N-1}$
of $N$ complex numbers into a set of complex numbers $y_0, \ldots, y_{N-1}$ defined by
$$y_k \equiv \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} e^{2\pi i jk/N} x_j. \qquad (1.52)$$
Of course, this transformation has an enormous number of applications in many branches
of science; the Fourier transformed version of a problem is often easier than the original
problem, enabling a solution.
The Fourier transform has proved so useful that a beautiful generalized theory of
Fourier transforms has been developed which goes beyond the definition (1.52). This
general theory involves some technical ideas from the character theory of finite groups,
and we will not attempt to describe it here. What is important is that the Hadamard
transform used in the Deutsch–Jozsa algorithm is an example of this generalized class
of Fourier transforms. Moreover, many of the other important quantum algorithms also
involve some type of Fourier transform.
The most important quantum algorithms known, Shor’s fast algorithms for factoring
and discrete logarithm, are two examples of algorithms based upon the Fourier transform defined in Equation (1.52). Equation (1.52) does not appear terribly quantum
mechanical in the form we have written it. Imagine, however, that we define a linear
transformation $U$ on $n$ qubits by its action on computational basis states $|j\rangle$, where
$0 \leq j \leq 2^n - 1$,
$$|j\rangle \longrightarrow \frac{1}{\sqrt{2^n}} \sum_{k=0}^{2^n-1} e^{2\pi i jk/2^n}|k\rangle. \qquad (1.53)$$
It can be checked that this transformation is unitary, and in fact can be realized as a
quantum circuit. Moreover, if we write out its action on superpositions,
$$\sum_{j=0}^{2^n-1} x_j|j\rangle \longrightarrow \frac{1}{\sqrt{2^n}} \sum_{k=0}^{2^n-1}\left[\sum_{j=0}^{2^n-1} e^{2\pi i jk/2^n} x_j\right]|k\rangle = \sum_{k=0}^{2^n-1} y_k|k\rangle, \qquad (1.54)$$
we see that it corresponds to a vector notation for the Fourier transform (1.52) for the
case $N = 2^n$.
38
Introduction and overview
How quickly can we perform the Fourier transform? Classically, the fast Fourier transform takes roughly N log(N) = n 2^n steps to Fourier transform N = 2^n numbers. On a
quantum computer, the Fourier transform can be accomplished using about log^2(N) = n^2
steps, an exponential saving! The quantum circuit to do this is explained in Chapter 5.
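To get a feel for the size of this gap, the two rough step counts can be tabulated. This is only a sketch: the function names are ours, constant factors are ignored, and the logarithm is taken to base 2.

```python
def classical_fft_steps(n):
    """Rough FFT step count N log2(N) = n * 2**n for N = 2**n numbers."""
    return n * 2 ** n

def quantum_ft_steps(n):
    """Rough quantum Fourier transform step count log2(N)**2 = n**2."""
    return n ** 2

for n in (10, 20, 40):
    print(n, classical_fft_steps(n), quantum_ft_steps(n))
```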
This result seems to indicate that quantum computers can be used to very quickly
compute the Fourier transform of a vector of 2^n complex numbers, which would be
fantastically useful in a wide range of applications. However, that is not exactly the case;
the Fourier transform is being performed on the information ‘hidden’ in the amplitudes
of the quantum state. This information is not directly accessible to measurement. The
catch, of course, is that if the output state is measured, it will collapse each qubit into
the state |0⟩ or |1⟩, preventing us from learning the transform result y_k directly. This
example speaks to the heart of the conundrum of devising a quantum algorithm. On the
one hand, we can perform certain calculations on the 2^n amplitudes associated with n
qubits far more efficiently than would be possible on a classical computer. But on the
other hand, the results of such a calculation are not available to us if we go about it in
a straightforward manner. More cleverness is required in order to harness the power of
quantum computation.
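The difficulty can be seen in a small classical simulation. The sketch below assumes NumPy and the usual measurement rule, under which outcome k occurs with probability |y_k|^2; the function name measure and the random example state are ours.

```python
import numpy as np

def measure(state, rng):
    """Simulate a computational-basis measurement: return k with probability |y_k|^2."""
    probs = np.abs(state) ** 2
    return int(rng.choice(len(state), p=probs / probs.sum()))

n = 3
rng = np.random.default_rng(1)
# Hypothetical output state with 2**n amplitudes y_k.
y = rng.normal(size=2 ** n) + 1j * rng.normal(size=2 ** n)
y /= np.linalg.norm(y)

outcome = measure(y, rng)
# A single measurement yields just n classical bits, not the 2**n complex amplitudes.
bits = format(outcome, f"0{n}b")
print(bits)
```

Each run of the whole computation gives one such bit string; the amplitudes themselves are never printed out by the physical device.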
Fortunately, it does turn out to be possible to utilize the quantum Fourier transform
to efficiently solve several problems that are believed to have no efficient solution on a
classical computer. These problems include Deutsch’s problem, and Shor’s algorithms for
discrete logarithm and factoring. This line of thought culminated in Kitaev’s discovery
of a method to solve the Abelian stabilizer problem, and the generalization to the hidden
subgroup problem:
Let f be a function from a finitely generated group G to a finite set X such that
f is constant on the cosets of a subgroup K, and distinct on each coset. Given a
quantum black box for performing the unitary transform U|g⟩|h⟩ = |g⟩|h ⊕ f(g)⟩,
for g ∈ G, h ∈ X, and ⊕ an appropriately chosen binary operation on X, find a
generating set for K.
The Deutsch–Jozsa algorithm, Shor’s algorithms, and related ‘exponentially fast’ quantum algorithms can all be viewed as special cases of this algorithm. The quantum Fourier
transform and its applications are described in Chapter 5.
Quantum search algorithms
A completely different class of algorithms is represented by the quantum search algorithm,
whose basic principles were discovered by Grover. The quantum search algorithm solves
the following problem: Given a search space of size N , and no prior knowledge about the
structure of the information in it, we want to find an element of that search space satisfying
a known property. How long does it take to find an element satisfying that property?
Classically, this problem requires approximately N operations, but the quantum search
algorithm allows it to be solved using approximately √N operations.
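The √N scaling can be observed in a small state-vector simulation. The sketch below assumes NumPy and uses the standard form of the search iteration (a sign flip on the marked item followed by a reflection of all amplitudes about their mean), which this section does not spell out; the function name and the choice of about (π/4)√N iterations are our assumptions.

```python
import math
import numpy as np

def grover_success_probability(num_items, marked):
    """Simulate the search; return (iterations used, probability of finding `marked`)."""
    # Start in the uniform superposition over all N items.
    amps = np.full(num_items, 1 / math.sqrt(num_items))
    iterations = math.floor(math.pi / 4 * math.sqrt(num_items))
    for _ in range(iterations):
        amps[marked] *= -1             # oracle: flip the sign of the marked item
        amps = 2 * amps.mean() - amps  # diffusion: reflect each amplitude about the mean
    return iterations, float(amps[marked] ** 2)

iterations, prob = grover_success_probability(num_items=64, marked=17)
print(iterations, prob)
```

With 64 items the loop runs only 6 times, whereas a classical search would on average examine about half of the items.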
The quantum search algorithm offers only a quadratic speedup, as opposed to the more
impressive exponential speedup offered by algorithms based on the quantum Fourier
transform. However, the quantum search algorithm is still of great interest, since searching heuristics have a wider range of application than the problems solved using the quantum Fourier transform, and adaptations of the quantum search algorithm may have utility
for a very wide range of problems. The quantum search algorithm and its applications
are described in Chapter 6.
Quantum simulation
Simulating naturally occurring quantum mechanical systems is an obvious candidate for
a task at which quantum computers may excel, yet which is believed to be difficult
on a classical computer. Classical computers have difficulty simulating general quantum
systems for much the same reasons they have difficulty simulating quantum computers –
the number of complex numbers needed to describe a quantum system generally grows
exponentially with the size of the system, rather than linearly, as occurs in classical
systems. In general, storing the quantum state of a system with n distinct components
takes something like c^n bits of memory on a classical computer, where c is a constant
which depends upon details of the system being simulated, and the desired accuracy of
the simulation.
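For a system built from qubits the exponential growth is easy to quantify. The sketch below assumes c = 2 (one two-level component per qubit) and 16 bytes per complex amplitude (two 64-bit floating-point numbers); both choices, and the function name, are ours.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Bytes needed to store all 2**num_qubits complex amplitudes of a pure state."""
    return (2 ** num_qubits) * bytes_per_amplitude

for n in (10, 30, 50):
    print(n, state_vector_bytes(n))
```

Each additional qubit doubles the storage, so modest increases in n quickly exhaust any fixed classical memory.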
By contrast, a quantum computer can perform the simulation using kn qubits, where
k is again a constant which depends upon the details of the system being simulated. This
allows quantum computers to efficiently perform simulations of quantum mechanical
systems that are believed not to be efficiently simulatable on a classical computer. A
significant caveat is that even though a quantum computer can simulate many quantum
systems far more efficiently than a classical computer, this does not mean that the fast
simulation will allow the desired information about the quantum system to be obtained.
When measured, a kn qubit simulation will collapse into a definite state, giving only kn
bits of information; the c^n bits of ‘hidden information’ in the wavefunction are not entirely
accessible. Thus, a crucial step in making quantum simulations useful is development of
systematic means by which desired answers can be efficiently extracted; how to do this
is only partially understood.
Despite this caveat, quantum simulation is likely to be an important application of
quantum computers. The simulation of quantum systems is an important problem in
many fields, notably quantum chemistry, where the computational constraints imposed
by classical computers make it difficult to accurately simulate the behavior of even moderately sized molecules, much less the very large molecules that occur in many important
biological systems. Obtaining faster and more accurate simulations of such systems may
therefore have the welcome effect of enabling advances in other fields in which quantum
phenomena are important.
In the future we may discover a physical phenomenon in Nature which cannot be
efficiently simulated on a quantum computer. Far from being bad news, this would be
wonderful! At the least, it will stimulate us to extend our models of computation to
encompass the new phenomenon, and increase the power of our computational models
beyond the existing quantum computing model. It also seems likely that very interesting
new physical effects will be associated with any such phenomenon!
Another application for quantum simulation is as a general method to obtain insight
into other quantum algorithms; for example, in Section 6.2 we explain how the quantum
search algorithm can be viewed as the solution to a problem of quantum simulation. By
approaching the problem in this fashion it becomes much easier to understand the origin
of the quantum search algorithm.
Finally, quantum simulation also gives rise to an interesting and optimistic ‘quantum
corollary’ to Moore’s law. Recall that Moore’s law states that the power of classical
computers will double once every two years or so, for constant cost. However, suppose
we are simulating a quantum system on a classical computer, and want to add a single
qubit (or a larger system) to the system being simulated. This doubles or more the
memory requirements needed for a classical computer to store a description of the state
of the quantum system, with a similar or greater cost in the time needed to simulate the
dynamics. The quantum corollary to Moore’s law follows from this observation, stating
that quantum computers are keeping pace with classical computers provided a single
qubit is added to the quantum computer every two years. This corollary should not be
taken too seriously, as the exact nature of the gain, if any, of quantum computation over
classical is not yet clear. Nevertheless, this heuristic statement helps convey why we
should be interested in quantum computers, and hopeful that they will one day be able
to outperform the most powerful classical computers, at least for some applications.
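The arithmetic behind this corollary can be made explicit. As a sketch, assume (the symbols M_0, b and n(t) are ours) that the memory available for constant cost after t years is M_0 2^{t/2}, doubling every two years, and that storing the state of n qubits takes b 2^n bytes for some fixed b. The largest system that can be stored then satisfies

```latex
\[
b \, 2^{n(t)} = M_0 \, 2^{t/2}
\quad\Longrightarrow\quad
n(t) = \log_2 \frac{M_0}{b} + \frac{t}{2} ,
\]
```

so under these assumptions the number of qubits that can be simulated classically grows by only one every two years.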
The power of quantum computation
How powerful are quantum computers? What gives them their power? Nobody yet knows
the answers to these questions, despite the suspicions fostered by examples such as factoring, which strongly suggest that quantum computers are more powerful than classical
computers. It is still possible that quantum computers are no more powerful than classical
computers, in the sense that any problem which can be efficiently solved on a quantum
computer can also be efficiently solved on a classical computer. On the other hand, it
may eventually be proved that quantum computers are much more powerful than classical computers. We now take a brief look at what is known about the power of quantum
computation.
Computational complexity theory is the subject of classifying the difficulty of various computational problems, both classical and quantum, and to understand the power of
quantum computers we will first examine some general ideas from computational complexity. The most basic idea is that of a complexity class. A complexity class can be
thought of as a collection of computational problems, all of which share some common
feature with respect to the computational resources needed to solve those problems.
Two of the most important complexity classes go by the names P and NP. Roughly
speaking, P is the class of computational problems that can be solved quickly on a classical
computer. NP is the class of problems which have solutions which can be quickly checked
on a classical computer. To understand the distinction between P and NP, consider the
problem of finding the prime factors of an integer, n. So far as is known there is no fast
way of solving this problem on a classical computer, which suggests that the problem is
not in P. On the other hand, if somebody tells you that some number p is a factor of
n, then you can quickly check that this is correct by dividing p into n, so factoring is a
problem in NP.
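The asymmetry between checking and finding can be sketched in a few lines. The function names are ours, and trial division is used only as the most naive classical method, not the best one known.

```python
def is_factor(n, p):
    """Check a claimed nontrivial factor with a single division."""
    return 1 < p < n and n % p == 0

def find_factor(n):
    """Find the smallest nontrivial factor by trial division (about sqrt(n) steps at worst)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return None  # no nontrivial factor exists

n = 9973 * 10007  # product of two primes
assert is_factor(n, 9973)
assert find_factor(n) == 9973
```

Checking costs one division however large n is, while the number of trial divisions grows exponentially with the number of digits of n.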
It is clear that P is a subset of NP, since the ability to solve a problem implies the ability
to check potential solutions. What is not so clear is whether or not there are problems
in NP that are not in P. Perhaps the most important unsolved problem in theoretical
computer science is to determine whether these two classes are different:
\[
\mathrm{P} \stackrel{?}{\neq} \mathrm{NP} . \tag{1.55}
\]
Most researchers believe that NP contains problems that are not in P. In particular,
there is an important subclass of the NP problems, the NP-complete problems, that are
of especial importance for two reasons. First, there are thousands of problems, many
highly important, that are known to be NP-complete. Second, any given NP-complete
problem is in some sense ‘at least as hard’ as all other problems in NP. More precisely,
an algorithm to solve a specific NP-complete problem can be adapted to solve any other
problem in NP, with a small overhead. In particular, if P ≠ NP, then it will follow that
no NP-complete problem can be efficiently solved on a classical computer.
It is not known whether quantum computers can be used to quickly solve all the
problems in NP, despite the fact that they can be used to solve some problems – like
factoring – which are believed by many people to be in NP but not in P. (Note that
factoring is not known to be NP-complete, otherwise we would already know how to
efficiently solve all problems in NP using quantum computers.) It would certainly be
very exciting if it were possible to solve all the problems in NP efficiently on a quantum
computer. There is a very interesting negative result known in this direction which
rules out using a simple variant of quantum parallelism to solve all the problems in
NP. Specifically, one approach to the problem of solving problems in NP on a quantum
computer is to try to use some form of quantum parallelism to search in parallel through
all the possible solutions to the problem. In Section 6.6 we will show that no approach
based upon such a search-based methodology can yield an efficient solution to all the
problems in NP. While it is disappointing that this approach fails, it does not rule out
that some deeper structure exists in the problems in NP that will allow them all to be
solved quickly using a quantum computer.
P and NP are just two of a plethora of complexity classes that have been defined.
Another important complexity class is PSPACE. Roughly speaking, PSPACE consists
of those problems which can be solved using resources which are few in spatial size (that
is, the computer is ‘small’), but not necessarily in time (‘long’ computations are fine).
PSPACE is believed to be strictly larger than both P and NP although, again, this has
never been proved. Finally, the complexity class BPP is the class of problems that can be
solved using randomized algorithms in polynomial time, if a bounded probability of error
(say 1/4) is allowed in the solution to the problem. BPP is widely regarded as being, even
more so than P, the class of problems which should be considered efficiently soluble on
a classical computer. We have elected to concentrate here on P rather than BPP because
P has been studied in more depth; however, many similar ideas and conclusions arise in
connection with BPP.
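The particular error bound in the definition of BPP is not critical, because repeating a randomized algorithm and taking a majority vote drives the error down rapidly. The sketch below computes this exactly, assuming independent runs that each err with the same probability; the function name is ours.

```python
from math import comb

def majority_error(single_run_error, runs):
    """Exact probability that a majority vote over `runs` independent runs is wrong.

    `runs` is assumed odd, so there are no ties.
    """
    threshold = runs // 2 + 1  # number of wrong runs needed to outvote the right ones
    return sum(
        comb(runs, wrong)
        * single_run_error ** wrong
        * (1 - single_run_error) ** (runs - wrong)
        for wrong in range(threshold, runs + 1)
    )

for runs in (1, 3, 11, 51):
    print(runs, majority_error(0.25, runs))
```

A polynomial number of repetitions therefore suffices to make the error negligible, which is one reason BPP is regarded as a reasonable notion of efficient solubility.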
What of quantum complexity classes? We can define BQP to be the class of all computational problems which can be solved efficiently on a quantum computer, where a
bounded probability of error is allowed. (Strictly speaking this makes BQP more analogous to the classical complexity class BPP than to P; however, we will ignore this subtlety
for the purposes of the present discussion, and treat it as the analogue of P.) Exactly
where BQP fits with respect to P, NP and PSPACE is as yet unknown. What is known
is that quantum computers can solve all the problems in P efficiently, but that there
are no problems outside of PSPACE which they can solve efficiently. Therefore, BQP
lies somewhere between P and PSPACE, as illustrated in Figure 1.21. An important
implication is that if it is proved that quantum computers are strictly more powerful than
classical computers, then it will follow that P is not equal to PSPACE. Proving this latter
result has been attempted without success by many computer scientists, suggesting that
it may be non-trivial to prove that quantum computers are more powerful than classical
computers, despite much evidence in favor of this proposition.
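The reasoning behind this implication is short enough to write out. In the notation of set inclusion, the two known facts stated above read

```latex
\[
\mathrm{P} \subseteq \mathrm{BQP} \subseteq \mathrm{PSPACE} ,
\]
\[
\mathrm{P} \neq \mathrm{BQP}
\;\Longrightarrow\;
\mathrm{P} \subsetneq \mathrm{BQP} \subseteq \mathrm{PSPACE}
\;\Longrightarrow\;
\mathrm{P} \neq \mathrm{PSPACE} .
\]
```

Here ‘strictly more powerful’ is read as P ≠ BQP, in keeping with the convention adopted above of treating BQP as the quantum analogue of P.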
Figure 1.21. The relationship between classical and quantum complexity classes. Quantum computers can quickly
solve any problem in P, and it is known that they can’t solve problems outside of PSPACE quickly. Where
quantum computers fit between P and PSPACE is not known, in part because we don’t even know whether
PSPACE is bigger than P!
We won’t speculate further on the ultimate power of quantum computation now,
preferring to wait until after we have better understood the principles on which fast
quantum algorithms are based, a topic which occupies us for most of Part II of this
book. What is already clear is that the theory of quantum computation poses interesting
and significant challenges to the traditional notions of computation. What makes this an
important challenge is that the theoretical model of quantum computation is believed
to be experimentally realizable, because – to the best of our knowledge – this theory is
consistent with the way Nature works. If this were not so then quantum computation
would be just another mathematical curiosity.
1.5 Experimental quantum information processing
Quantum computation and quantum information is a wonderful theoretical discovery,
but its central concepts, such as superpositions and entanglement, run counter to the
intuition we garner from the everyday world around us. What evidence do we have that
these ideas truly describe how Nature operates? Will the realization of large-scale quantum
computers be experimentally feasible? Or might there be some principle of physics which
fundamentally prohibits their eventual scaling? In the next two sections we address these
questions. We begin with a review of the famous ‘Stern–Gerlach’ experiment, which
provides evidence for the existence of qubits in Nature. We then widen our scope,
addressing the broader problem of how to build practical quantum information processing
systems.
1.5.1 The Stern–Gerlach experiment
The qubit is a fundamental element for quantum computation and quantum information.
How do we know that systems with the properties of qubits exist in Nature? At the time
of writing there is an enormous amount of evidence that this is so, but in the early days
of quantum mechanics the qubit structure was not at all obvious, and people struggled
with phenomena that we may now understand in terms of qubits, that is, in terms of
two-level quantum systems.
A decisive (and very famous) early experiment indicating the qubit structure was
conceived by Stern in 1921 and performed with Gerlach in 1922 in Frankfurt. In the
original Stern–Gerlach experiment, hot atoms were ‘beamed’ from an oven through a
magnetic field which caused the atoms to be deflected, and then the position of each atom
was recorded, as illustrated in Figure 1.22. The original experiment was done with silver
atoms, which have a complicated structure that obscures the effects we are discussing.
What we describe below actually follows a 1927 experiment done using hydrogen atoms.
The same basic effect is observed, but with hydrogen atoms the discussion is easier
to follow. Keep in mind, though, that this privilege wasn’t available to people in the
early 1920s, and they had to be very ingenious to think up explanations for the more
complicated effects they observed.
Hydrogen atoms contain a proton and an orbiting electron. You can think of this electron as a little ‘electric current’ around the proton. This electric current causes the atom
to have a magnetic field; each atom has what physicists call a ‘magnetic dipole moment’.
As a result each atom behaves like a little bar magnet with an axis corresponding to the
axis the electron is spinning around. Throwing little bar magnets through a magnetic field
causes the magnets to be deflected by the field, and we expect to see a similar deflection
of atoms in the Stern–Gerlach experiment.
How the atom is deflected depends upon both the atom’s magnetic dipole moment –
the axis the electron is spinning around – and the magnetic field generated by the Stern–
Gerlach device. We won’t go through the details, but suffice to say that by constructing
the Stern–Gerlach device appropriately, we can cause the atom to be deflected by an
amount that depends upon the ẑ component of the atom’s magnetic dipole moment,
where ẑ is some fixed external axis.
Two major surprises emerge when this experiment is performed. First, since the
hot atoms exiting the oven would naturally be expected to have their dipoles oriented
randomly in every direction, it would follow that there would be a continuous distribution
of atoms seen at all angles exiting from the Stern–Gerlach device. Instead, what is seen
is atoms emerging from a discrete set of angles. Physicists were able to explain this by
assuming that the magnetic dipole moment of the atoms is quantized, that is, comes in
discrete multiples of some fundamental amount.
This observation of quantization in the Stern–Gerlach experiment was surprising to
physicists of the 1920s, but not completely astonishing because evidence for quantization
effects in other systems was becoming widespread at that time. What was truly surprising was the number of peaks seen in the experiment. The hydrogen atoms being used
were such that they should have had zero magnetic dipole moment. Classically, this is
surprising in itself, since it corresponds to no orbital motion of the electron, but based
on what was known of quantum mechanics at that time this was an acceptable notion.
Since the hydrogen atoms would therefore have zero magnetic moment, it was expected
that only one beam of atoms would be seen, and this beam would not be deflected by
the magnetic field. Instead, two beams were seen, one deflected up by the magnetic field,
and the other deflected down!
This puzzling doubling was explained after considerable effort by positing that the
electron in the hydrogen atom has associated with it a quantity called spin. This spin
is not in any way associated to the usual rotational motion of the electron around the
proton; it is an entirely new quantity to be associated with an electron. The great physicist
Heisenberg labeled the idea ‘brave’ at the time it was suggested, and it is a brave idea, since
it introduces an essentially new physical quantity into Nature. The spin of the electron
is posited to make an extra contribution to the magnetic dipole moment of a hydrogen
atom, in addition to the contribution due to the rotational motion of the electron.
Figure 1.22. Abstract schematic of the Stern–Gerlach experiment. Hot hydrogen atoms are beamed from an oven
through a magnetic field, causing a deflection either up (|+Z⟩) or down (|−Z⟩).
What is the proper description of the spin of the electron? As a first guess, we might
hypothesize that the spin is specified by a single bit, telling the hydrogen atom to go up or
down. Additional experimental results provide further useful information to determine if
this guess needs refinement or replacement. Let’s represent the original Stern–Gerlach
apparatus as shown in Figure 1.22. Its outputs are two beams of atoms, which we shall
call |+Z⟩ and |−Z⟩. (We’re using suggestive notation which looks quantum mechanical,
but of course you’re free to use whatever notation you prefer.) Now suppose we cascade
two Stern–Gerlach apparatus together, as shown in Figure 1.23. We arrange it so that the
second apparatus is tipped sideways, so the magnetic field deflects atoms along the x̂ axis.
In our thought-experiment we’ll block off the |−Z⟩ output from the first Stern–Gerlach
apparatus, while the |+Z⟩ output is sent through a second apparatus oriented along the
x̂ axis. A detector is placed at the final output to measure the distribution of atoms along
the x̂ axis.
A classical magnetic dipole pointed in the +ẑ direction has no net magnetic moment
in the x̂ direction, so we might expect that the final output would have one central peak.
However, experimentally it is observed that there are two peaks of equal intensity! So
perhaps these atoms are peculiar, and have definite magnetic moments along each axis,
independently. That is, maybe each atom passing through the second apparatus can be
described as being in a state we might write as |+Z⟩|+X⟩ or |+Z⟩|−X⟩, to indicate
the two values for spin that might be observed.
Figure 1.23. Cascaded Stern–Gerlach measurements.
Figure 1.24. Three stage cascaded Stern–Gerlach measurements.
Another experiment, shown in Figure 1.24, can test this hypothesis by sending one
beam of the previous output through a second ẑ oriented Stern–Gerlach apparatus. If
the atoms had retained their |+Z⟩ orientation, then the output would be expected to
have only one peak, at the |+Z⟩ output. However, again two beams are observed at
the final output, of equal intensity. Thus, the conclusion would seem to be that contrary
to classical expectations, a |+Z⟩ state consists of equal portions of |+X⟩ and |−X⟩
states, and a |+X⟩ state consists of equal portions of |+Z⟩ and |−Z⟩ states. Similar
conclusions can be reached if the Stern–Gerlach apparatus is aligned along some other
axis, like the ŷ axis.
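The qubit model provides a simple explanation of this experimentally observed behavior. Let |0⟩ and |1⟩ be the states of a qubit, and make the assignments
\begin{align}
|+Z\rangle &\leftarrow |0\rangle \tag{1.56} \\
|-Z\rangle &\leftarrow |1\rangle \tag{1.57} \\
|+X\rangle &\leftarrow (|0\rangle + |1\rangle)/\sqrt{2} \tag{1.58} \\
|-X\rangle &\leftarrow (|0\rangle - |1\rangle)/\sqrt{2} . \tag{1.59}
\end{align}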
Then the results of the cascaded Stern–Gerlach experiment can be explained by assuming
that the ẑ Stern–Gerlach apparatus measures the spin (that is, the qubit) in the computational basis |0⟩, |1⟩, and the x̂ Stern–Gerlach apparatus measures the spin with respect to
the basis (|0⟩ + |1⟩)/√2, (|0⟩ − |1⟩)/√2. For example, in the cascaded ẑ-x̂-ẑ experiment,
if we assume that the spins are in the state |+Z⟩ = |0⟩ = (|+X⟩ + |−X⟩)/√2 after
exiting the first Stern–Gerlach experiment, then the probability for obtaining |+X⟩
out of the second apparatus is 1/2, and the probability for |−X⟩ is 1/2. Similarly, the
probability for obtaining |+Z⟩ out of the third apparatus is 1/2. A qubit model thus
properly predicts results from this type of cascaded Stern–Gerlach experiment.
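These probabilities can be reproduced with a few lines of linear algebra. The sketch below assumes NumPy and the usual rule that the probability of an outcome is the squared magnitude of the overlap between the outcome state and the incoming state; the variable and function names are ours.

```python
import numpy as np

# Assignments (1.56)-(1.59), written as vectors in the computational basis.
plus_z = np.array([1.0, 0.0])
minus_z = np.array([0.0, 1.0])
plus_x = (plus_z + minus_z) / np.sqrt(2)
minus_x = (plus_z - minus_z) / np.sqrt(2)

def probability(outcome, state):
    """Probability of finding `state` in `outcome`: the squared overlap |<outcome|state>|^2."""
    return abs(np.vdot(outcome, state)) ** 2

# Second (x-oriented) apparatus, fed with the |+Z> beam.
p_plus_x = probability(plus_x, plus_z)
p_minus_x = probability(minus_x, plus_z)

# Third (z-oriented) apparatus, fed with the |+X> beam.
p_plus_z = probability(plus_z, plus_x)
print(p_plus_x, p_minus_x, p_plus_z)
```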
This example demonstrates how qubits could be a believable way of modeling systems
in Nature. Of course it doesn’t establish beyond all doubt that the qubit model is the
correct way of understanding electron spin – far more experimental corroboration is
required. Nevertheless, because of many experiments like these, we now believe that
electron spin is best described by the qubit model. What is more, we believe that the
qubit model (and generalizations of it to higher dimensions; quantum mechanics, in other
words) is capable of describing every physical system. We now turn to the question of
what systems are especially well adapted to quantum information processing.
1.5.2 Prospects for practical quantum information processing
Building quantum information processing devices is a great challenge for scientists and
engineers of the third millennium. Will we rise to meet this challenge? Is it possible at
all? Is it worth attempting? If so, how might the feat be accomplished? These are difficult
and important questions, to which we essay brief answers in this section, to be expanded
upon throughout the book.
The most fundamental question is whether there is any point of principle that prohibits
us from doing one or more forms of quantum information processing. Two possible
obstructions suggest themselves: that noise may place a fundamental barrier to useful
quantum information processing; or that quantum mechanics may fail to be correct.
Noise is without a doubt a significant obstruction to the development of practical
quantum information processing devices. Is it a fundamentally irremovable obstruction
that will forever prevent the development of large-scale quantum information processing devices? The theory of quantum error-correcting codes strongly suggests that while
quantum noise is a practical problem that needs to be addressed, it does not present a
fundamental problem of principle. In particular, there is a threshold theorem for quantum computation, which states, roughly speaking, that provided the level of noise in a
quantum computer can be reduced below a certain constant ‘threshold’ value, quantum
error-correcting codes can be used to push it down even further, essentially ad infinitum, for a small overhead in the complexity of the computation. The threshold theorem
makes some broad assumptions about the nature and magnitude of the noise occurring in
a quantum computer, and the architecture available for performing quantum computation; however, provided those assumptions are satisfied, the effects of noise can be made
essentially negligible for quantum information processing. Chapters 8, 10 and 12 discuss
quantum noise, quantum error-correction and the threshold theorem in detail.
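The flavor of the threshold theorem can be conveyed by a toy calculation. The sketch below assumes a common simplified model of concatenated codes, not stated in this section, in which each level of encoding maps an error rate p to p^2 divided by the threshold; the function name and the numerical threshold value are hypothetical.

```python
def logical_error_rate(physical_error, threshold, levels):
    """Error rate after `levels` rounds of concatenation in a simple model.

    Assumes each level maps an error rate p to p**2 / threshold.
    """
    p = physical_error
    for _ in range(levels):
        p = p * p / threshold
    return p

threshold = 1e-4  # hypothetical threshold value, for illustration only
for levels in range(4):
    below = logical_error_rate(1e-5, threshold, levels)
    above = logical_error_rate(1e-3, threshold, levels)
    print(levels, below, above)
```

In this model noise below the threshold is suppressed doubly exponentially in the number of levels, while noise above the threshold is made worse by encoding.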
A second possibility that may preclude quantum information processing is if quantum mechanics is incorrect. Indeed, probing the validity of quantum mechanics (both
relativistic and non-relativistic) is one reason for being interested in building quantum
information processing devices. Never before have we explored a regime of Nature in
which complete control has been obtained over large-scale quantum systems, and perhaps
Nature may reveal some new surprises in this regime which are not adequately explained
by quantum mechanics. If this occurs, it will be a momentous discovery in the history of
science, and can be expected to have considerable consequences in other areas of science
and technology, as did the discovery of quantum mechanics. Such a discovery might also
impact quantum computation and quantum information; however, whether the impact
would enhance, detract or not affect the power of quantum information processing cannot be predicted in advance. Until and unless such effects are found we have no way of
knowing how they might affect information processing, so for the remainder of this book
we go with all the evidence to date and assume that quantum mechanics is a complete
and correct description of the world.
Given that there is no fundamental obstacle to building quantum information processing devices, why should we invest enormous amounts of time and money in the attempt
to do so? We have already discussed several reasons for wanting to do so: practical applications such as quantum cryptography and the factoring of large composite numbers; and
the desire to obtain fundamental insights into Nature and into information processing.
These are good reasons, and justify a considerable investment of time and money in
the effort to build quantum information processing devices. However, it is fair to say that
a clearer picture of the relative power of quantum and classical information processing is
needed in order to assess their relative merits. To obtain such a picture requires further
theoretical work on the foundations of quantum computation and quantum information.
Of particular interest is a decisive answer to the question ‘Are quantum computers more
powerful than classical computers?’ Even if the answer to such a question eludes us for
the time being, it would be useful to have a clear path of interesting applications at
varying levels of complexity to aid researchers aiming to experimentally realize quantum
information processing. Historically, the advance of technology is often hastened by the
use of short- to medium-term incentives as a stepping-stone to long-term goals. Consider
that microprocessors were initially used as controllers for elevators and other simple
devices, before graduating to be the fundamental component in personal computers (and
then on to who-knows-what). Below we sketch out a path of short- to medium-term goals
for people interested in achieving the long-term goal of large-scale quantum information
processing.
Surprisingly many small-scale applications of quantum computation and quantum information are known. Not all are as flashy as cousins like the quantum factoring algorithm,
but the relative ease of implementing small-scale applications makes them extremely important as medium-term goals in themselves.
Quantum state tomography and quantum process tomography are two elementary
processes whose perfection is of great importance to quantum computation and quantum
information, as well as being of independent interest in their own right. Quantum state
tomography is a method for determining the quantum state of a system. To do this, it
has to overcome the ‘hidden’ nature of the quantum state – remember, the state can’t be
directly determined by a measurement – by performing repeated preparations of the same
quantum state, which is then measured in different ways in order to build up a complete
description of the quantum state. Quantum process tomography is a more ambitious (but
closely related) procedure to completely characterize the dynamics of a quantum system.
Quantum process tomography can, for example, be used to characterize the performance
of an alleged quantum gate or quantum communications channel, or to determine the
types and magnitudes of different noise processes in a system. Besides obvious applications to quantum computation and quantum information, quantum process tomography
can be expected to have significant applications as a diagnostic tool to aid in the evaluation and improvement of primitive operations in any field of science and technology
where quantum effects are important. Quantum state tomography and quantum process
tomography are described in more detail in Chapter 8.
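As a rough illustration of "repeated preparations of the same quantum state, which is then measured in different ways", the following Python sketch simulates tomography of a single qubit. It assumes the state is summarized by a Bloch vector (x, y, z), that each of the Pauli observables X, Y, Z is measured on fresh copies of the state, and it uses the standard single-qubit reconstruction rho = (I + xX + yY + zZ)/2. The function names, the example state, and the number of shots are illustrative assumptions, not taken from the text.

```python
import random

def simulate_pauli_measurements(bloch, shots, rng):
    """Estimate <X>, <Y>, <Z> using `shots` fresh preparations per axis."""
    estimates = []
    for component in bloch:
        # A Pauli measurement yields +1 with probability (1 + component) / 2.
        p_plus = (1.0 + component) / 2.0
        total = sum(1 if rng.random() < p_plus else -1 for _ in range(shots))
        estimates.append(total / shots)
    return tuple(estimates)

def density_matrix(bloch):
    """Return rho = (I + x X + y Y + z Z) / 2 as a 2x2 nested list."""
    x, y, z = bloch
    return [[(1 + z) / 2, (x - 1j * y) / 2],
            [(x + 1j * y) / 2, (1 - z) / 2]]

rng = random.Random(0)              # fixed seed so the run is repeatable
true_bloch = (0.6, 0.0, 0.8)        # hypothetical pure state: 0.36 + 0.64 = 1
estimated_bloch = simulate_pauli_measurements(true_bloch, shots=20000, rng=rng)
rho_estimate = density_matrix(estimated_bloch)
print(estimated_bloch)
```

Because of statistical fluctuations the estimated Bloch vector can come out slightly longer than 1, in which case the reconstructed matrix is not quite a valid quantum state; practical schemes have to deal with this, which this sketch does not attempt.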
Various small-scale communications primitives are also of great interest. We have already mentioned quantum cryptography and quantum teleportation. The former is likely
to be useful in practical applications involving the distribution of a small amount of key
material that needs to be highly secure. The uses of quantum teleportation are perhaps
more open to question. We will see in Chapter 12 that teleportation may be an extremely
useful primitive for transmitting quantum states between distant nodes in a network, in
the presence of noise. The idea is to focus one’s efforts on distributing EPR pairs between
the nodes that wish to communicate. The EPR pairs may be corrupted during communication, but special ‘entanglement distillation’ protocols can then be used to ‘clean up’
the EPR pairs, enabling them to be used to teleport quantum states from one location
to another. In fact, protocols based upon entanglement distillation and teleportation offer performance superior to more conventional quantum error-correction techniques in
enabling noise-free communication of qubits.
What of the medium-scale? A promising medium-scale application of quantum information processing is to the simulation of quantum systems. To simulate a quantum
system containing even a few dozen ‘qubits’ (or the equivalent in terms of some other
basic system) strains the resources of even the largest supercomputers. A simple calculation is instructive. Suppose we have a system containing 50 qubits. To describe the
state of such a system requires $2^{50} \approx 10^{15}$ complex amplitudes. If the amplitudes are
stored to 128 bits of precision, then it requires 256 bits or 32 bytes in order to store each
amplitude, for a total of $32 \times 10^{15}$ bytes of information, or about 32 thousand terabytes
of information, well beyond the capacity of existing computers, and corresponding to
about the storage capacity that might be expected to appear in supercomputers during
the second decade of the twenty-first century, presuming that Moore’s law continues on
schedule. 90 qubits at the same level of precision requires $32 \times 10^{27}$ bytes, which, even
if implemented using single atoms to represent bits, would require kilograms (or more)
of matter.
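The arithmetic in this estimate is easy to reproduce. The short Python sketch below assumes, as in the text, that each complex amplitude is stored as two real numbers of 128 bits each; the function name and its parameter are illustrative.

```python
def state_vector_bytes(n_qubits, bits_per_real=128):
    """Bytes needed to store all 2**n complex amplitudes of an n-qubit state."""
    amplitudes = 2 ** n_qubits
    # One complex amplitude = real part + imaginary part.
    bytes_per_amplitude = 2 * bits_per_real // 8
    return amplitudes * bytes_per_amplitude

bytes_50 = state_vector_bytes(50)
bytes_90 = state_vector_bytes(90)
print(f"50 qubits: {bytes_50:.2e} bytes")
print(f"90 qubits: {bytes_90:.2e} bytes")
```

The exact figures come out slightly larger than the rounded values quoted above, because $2^{50}$ is a little more than $10^{15}$ and $2^{90}$ a little more than $10^{27}$.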
How useful will quantum simulations be? It seems likely that conventional methods will
still be used to determine elementary properties of materials, such as bond strengths and
basic spectroscopic properties. However, once the basic properties are well understood,
it seems likely that quantum simulation will be of great utility as a laboratory for the
design and testing of properties of novel molecules. In a conventional laboratory setup,
many different types of ‘hardware’ – chemicals, detectors, and so on – may be required
to test a wide variety of possible designs for a molecule. On a quantum computer, these
different types of hardware can all be simulated in software, which is likely to be much
less expensive and much faster. Of course, final design and testing must be performed
with real physical systems; however, quantum computers may enable a much larger range
of potential designs to be explored and evaluated en route to a better final design. It is
interesting to note that such ab initio calculations to aid in the design of new molecules
have been attempted on classical computers; however, they have met with limited success
due to the enormous computational resources needed to simulate quantum mechanics on a
classical computer. Quantum computers should be able to do much better in the relatively
near future.
What of large-scale applications? Aside from scaling up applications like quantum
simulation and quantum cryptography, relatively few large-scale applications are known:
the factoring of large numbers, taking discrete logarithms, and quantum searching. Interest in the first two of these derives mainly from the negative effect they would have
of limiting the viability of existing public key cryptographic systems. (They might also
be of substantial practical interest to mathematicians interested in these problems simply for their own sake.) So it does not seem likely that factoring and discrete logarithm
will be all that important as applications for the long run. Quantum searching may be
of tremendous use because of the wide utility of the search heuristic, and we discuss
some possible applications in Chapter 6. What would really be superb are many more
large-scale applications of quantum information processing. This is a great goal for the
future!
Given a path of potential applications for quantum information processing, how can it
be achieved in real physical systems? At the small scale of a few qubits there are already
several working proposals for quantum information processing devices. Perhaps the easiest
to realize are based upon optical techniques, that is, electromagnetic radiation. Simple
devices like mirrors and beamsplitters can be used to do elementary manipulations of
photons. Interestingly, a major difficulty has been producing single photons on demand;
experimentalists have instead opted to use schemes which produce single photons ‘every
now and then’, at random, and wait for such an event to occur. Quantum cryptography,
superdense coding, and quantum teleportation have all been realized using such optical
techniques. A major advantage of the optical techniques is that photons tend to be highly
stable carriers of quantum mechanical information. A major disadvantage is that photons
don’t directly interact with one another. Instead, the interaction has to be mediated by
something else, like an atom, which introduces additional noise and complications into
the experiment. An effective interaction between two photons is set up, which essentially
works in two steps: photon number one interacts with the atom, which in turn interacts
with the second photon, causing an overall interaction between the two photons.
An alternative scheme is based upon methods for trapping different types of atom: there
is the ion trap, in which a small number of charged atoms are trapped in a confined space;
and neutral atom traps, for trapping uncharged atoms in a confined space. Quantum
information processing schemes based upon atom traps use the atoms to store qubits.
Electromagnetic radiation also shows up in these schemes, but in a rather different way
than in what we referred to as the ‘optical’ approach to quantum information processing.
In these schemes, photons are used to manipulate the information stored in the atoms
themselves, rather than as the place the information is stored. Single qubit quantum
gates can be performed by applying appropriate pulses of electromagnetic radiation to
individual atoms. Neighboring atoms can interact with one another via (for example)
dipole forces that enable quantum gates to be accomplished. Moreover, the exact nature of
the interaction between neighboring atoms can be modified by applying appropriate pulses
of electromagnetic radiation to the atoms, giving the experimentalist control over what
gates are performed in the system. Finally, quantum measurement can be accomplished in
these systems using the long established quantum jumps technique, which implements
with superb accuracy the measurements in the computational basis used for quantum
computation.
Another class of quantum information processing schemes is based upon Nuclear
Magnetic Resonance, often known by its initials, NMR. These schemes store quantum
information in the nuclear spin of atoms in a molecule, and manipulate that information
using electromagnetic radiation. Such schemes pose special difficulties, because in NMR
it is not possible to directly access individual nuclei. Instead, a huge number (typically
around $10^{15}$) of essentially identical molecules are stored in solution. Electromagnetic
pulses are applied to the sample, causing each molecule to respond in roughly the same
way. You should think of each molecule as being an independent computer, and the
sample as containing a huge number of computers all running in parallel (classically).
NMR quantum information processing faces three special difficulties that make it rather
different from other quantum information processing schemes. First, the molecules are
typically prepared by letting them equilibrate at room temperature, which is so much
higher than typical spin flip energies that the spins become nearly completely randomly
oriented. This fact makes the initial state rather more ‘noisy’ than is desirable for quantum
information processing. How this noise may be overcome is an interesting story that we
tell in Chapter 7. A second problem is that the class of measurements that may be
performed in NMR falls well short of the most general measurements we would like to
perform in quantum information processing. Nevertheless, for many instances of quantum
information processing the class of measurements allowed in NMR is sufficient. Third,
because molecules cannot be individually addressed in NMR you might ask how it is that
individual qubits can be manipulated in an appropriate way. Fortunately, different nuclei
in the molecule can have different properties that allow them to be individually addressed
– or at least addressed at a sufficiently fine-grained scale to allow the operations essential
for quantum computation.
Many of the elements required to perform large-scale quantum information processing
can be found in existing proposals: superb state preparation and quantum measurements
can be performed on a small number of qubits in the ion trap; superb dynamics can be
performed in small molecules using NMR; fabrication technology in solid state systems
allows designs to be scaled up tremendously. A single system having all these elements
would be a long way down the road to a dream quantum computer. Unfortunately, all
these systems are very different, and we are many, many years from having large-scale
quantum computers. However, we believe that the existence of all these properties in
existing (albeit different) systems does bode well for the long-term existence of large-scale quantum information processors. Furthermore, it suggests that there is a great deal
of merit to pursuing hybrid designs which attempt to marry the best features of two or
more existing technologies. For example, there is much work being done on trapping
atoms inside electromagnetic cavities. This enables flexible manipulation of the atom
inside the cavity via optical techniques, and makes possible real-time feedback control of
single atoms in ways unavailable in conventional atom traps.
To conclude, note that it is important not to assess quantum information processing
as though it were just another technology for information processing. For example, it
is tempting to dismiss quantum computation as yet another technological fad in the
evolution of the computer that will pass in time, much as other fads have passed – for
example, the ‘bubble memories’ widely touted as the next big thing in memory during the
early 1980s. This is a mistake, since quantum computation is an abstract paradigm for
information processing that may have many different implementations in technology. One
can compare two different proposals for quantum computing as regards their technological
merits – it makes sense to compare a ‘good’ proposal to a ‘bad’ proposal – however even
a very poor proposal for a quantum computer is of a different qualitative nature from a
superb design for a classical computer.
1.6 Quantum information
The term ‘quantum information’ is used in two distinct ways in the field of quantum
computation and quantum information. The first usage is as a broad catch-all for all
manner of operations that might be interpreted as related to information processing
using quantum mechanics. This use encompasses subjects such as quantum computation,
quantum teleportation, the no-cloning theorem, and virtually all other topics in this book.
The second use of ‘quantum information’ is much more specialized: it refers to the
study of elementary quantum information processing tasks. It does not typically include,
for example, quantum algorithm design, since the details of specific quantum algorithms
are beyond the scope of ‘elementary’. To avoid confusion we will use the term ‘quantum
information theory’ to refer to this more specialized field, in parallel with the widely
used term ‘(classical) information theory’ to describe the corresponding classical field.
Admittedly, the term ‘quantum information theory’ has a drawback of its own – it might
be seen as implying that theoretical considerations are all that matter! Of course, this
is not the case, and experimental demonstration of the elementary processes studied by
quantum information theory is of great interest.
The purpose of this section is to introduce the basic ideas of quantum information
theory. Even with the restriction to elementary quantum information processing tasks,
quantum information theory may look like a disordered zoo to the beginner, with many
apparently unrelated subjects falling under the ‘quantum information theory’ rubric. In
part, that’s because the subject is still under development, and it’s not yet clear how all
the pieces fit together. However, we can identify a few fundamental goals uniting work
on quantum information theory:
(1) Identify elementary classes of static resources in quantum mechanics. An
example is the qubit. Another example is the bit; classical physics arises as a special
case of quantum physics, so it should not be surprising that elementary static
resources appearing in classical information theory should also be of great relevance
in quantum information theory. Yet another example of an elementary class of
static resources is a Bell state shared between two distant parties.
(2) Identify elementary classes of dynamical processes in quantum mechanics.
A simple example is memory, the ability to store a quantum state over some period
of time. Less trivial processes are quantum information transmission between two
parties, Alice and Bob; copying (or trying to copy) a quantum state, and the process
of protecting quantum information processing against the effects of noise.
(3) Quantify resource tradeoffs incurred performing elementary dynamical
processes. For example, what are the minimal resources required to reliably
transfer quantum information between two parties using a noisy communications
channel?
Similar goals define classical information theory; however, quantum information theory
is broader in scope than classical information theory, for quantum information theory
includes all the static and dynamic elements of classical information theory, as well as
additional static and dynamic elements.
The remainder of this section describes some examples of questions studied by quantum information theory, in each case emphasizing the fundamental static and dynamic
elements under consideration, and the resource tradeoffs being considered. We begin with
an example that will appear quite familiar to classical information theorists: the problem
of sending classical information through a quantum channel. We then begin to branch out
and explore some of the new static and dynamic processes present in quantum mechanics, such as quantum error-correction, the problem of distinguishing quantum states, and
entanglement transformation. The chapter concludes with some reflections on how the
tools of quantum information theory can be applied elsewhere in quantum computation
and quantum information.
1.6.1 Quantum information theory: example problems
Classical information through quantum channels
The fundamental results of classical information theory are Shannon’s noiseless channel
coding theorem and Shannon’s noisy channel coding theorem. The noiseless channel
coding theorem quantifies how many bits are required to store information being emitted
by a source of information, while the noisy channel coding theorem quantifies how much
information can be reliably transmitted through a noisy communications channel.
What do we mean by an information source? Defining this notion is a fundamental
problem of classical and quantum information theory, one we’ll re-examine several times.
For now, let’s go with a provisional definition: a classical information source is described
by a set of probabilities $p_j$, $j = 1, 2, \ldots, d$. Each use of the source results in the ‘letter’
$j$ being emitted, chosen at random with probability $p_j$, independently for each use of
the source. For instance, if the source were of English text, then the numbers $j$ might
correspond to letters of the alphabet and punctuation, with the probabilities $p_j$ giving
the relative frequencies with which the different letters appear in regular English text.
Although it is not true that the letters in English appear in an independent fashion, for
our purposes it will be a good enough approximation.
Regular English text includes a considerable amount of redundancy, and it is possible to
exploit that redundancy to compress the text. For example, the letter ‘e’ occurs much more
frequently in regular English text than does the letter ‘z’. A good scheme for compressing
English text will therefore represent the letter ‘e’ using fewer bits of information than
it uses to represent ‘z’. Shannon’s noiseless channel coding theorem quantifies exactly
how well such a compression scheme can be made to work. More precisely, the noiseless
channel coding theorem tells us that a classical source described by probabilities $p_j$ can be
compressed so that on average each use of the source can be represented using $H(p_j)$ bits
of information, where $H(p_j) \equiv -\sum_j p_j \log(p_j)$ is a function of the source probability
distribution known as the Shannon entropy. Moreover, the noiseless channel coding
theorem tells us that to attempt to represent the source using fewer bits than this will
result in a high probability of error when the information is decompressed. (Shannon’s
noiseless channel coding theorem is discussed in much greater detail in Chapter 12.)
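To make the entropy formula concrete, here is a minimal Python sketch that evaluates $H(p_j)$ with logarithms taken to base 2, so that the answer is in bits. The four-letter source used as an example is hypothetical and chosen only because its entropy is easy to check by hand.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability letters contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical four-letter source with unequal letter frequencies.
skewed_source = [1 / 2, 1 / 4, 1 / 8, 1 / 8]
uniform_source = [1 / 4] * 4

skewed_bits = shannon_entropy(skewed_source)
uniform_bits = shannon_entropy(uniform_source)
print(skewed_bits, uniform_bits)
```

The skewed source needs fewer bits per letter on average than the uniform one, which is the sense in which frequent letters such as ‘e’ can be given shorter representations than rare ones such as ‘z’.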
Shannon’s noiseless coding theorem provides a good example where the goals of information theory listed earlier are all met. Two static resources are identified (goal number 1):
the bit and the information source. A two-stage dynamic process is identified (goal 2),
compressing an information source, and then decompressing to recover the information
source. Finally a quantitative criterion for determining the resources consumed (goal 3)
by an optimal data compression scheme is found.
Shannon’s second major result, the noisy channel coding theorem, quantifies the
amount of information that can be reliably transmitted through a noisy channel. In particular, suppose we wish to transfer the information being produced by some information
source to another location through a noisy channel. That location may be at another point
in space, or at another point in time – the latter is the problem of storing information
in the presence of noise. The idea in both instances is to encode the information being
produced using error-correcting codes, so that any noise introduced by the channel can
be corrected at the other end of the channel. The way error-correcting codes achieve this
is by introducing enough redundancy into the information sent through the channel so
that even after some of the information has been corrupted it is still possible to recover
the original message. For example, suppose the noisy channel is for the transmission of
single bits, and the noise in the channel is such that to achieve reliable transmission each
bit produced by the source must be encoded using two bits before being sent through
the channel. We say that such a channel has a capacity of half a bit, since each use of
the channel can be used to reliably convey roughly half a bit of information. Shannon’s
noisy channel coding theorem provides a general procedure for calculating the capacity
of an arbitrary noisy channel.
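As one concrete instance of such a capacity calculation, consider the binary symmetric channel, which flips each transmitted bit independently with probability $f$. This noise model is not discussed in the passage above and is brought in here purely as an assumed example; its capacity is the standard result $C = 1 - H(f, 1 - f)$. The Python sketch below evaluates that formula, with illustrative function names.

```python
import math

def binary_entropy(q):
    """H(q, 1 - q) in bits, with the convention 0 log 0 = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def bsc_capacity(flip_probability):
    """Capacity, in bits per channel use, of a binary symmetric channel."""
    return 1.0 - binary_entropy(flip_probability)

for f in (0.0, 0.11, 0.5):
    print(f, bsc_capacity(f))
```

A flip probability of roughly 0.11 gives a capacity close to half a bit, matching the situation described above in which about two channel uses are needed per source bit. At $f = 1/2$ the output is independent of the input and the capacity drops to zero.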
Shannon’s noisy channel coding theorem also achieves the three goals of information
theory we stated earlier. Two types of static resources are involved (goal 1), the information source, and the bits being sent through the channel. Three dynamical processes are
involved (goal 2). The primary process is the noise in the channel. To combat this noise
we perform the dual processes of encoding and decoding the state in an error-correcting
code. For a fixed noise model, Shannon’s theorem tells us how much redundancy must
be introduced by an optimal error-correction scheme if reliable information transmission
is to be achieved (goal 3).
For both the noiseless and noisy channel coding theorems Shannon restricted himself
to storing the output from an information source in classical systems – bits and the like. A
natural question for quantum information theory is what happens if the storage medium is
changed so that classical information is transmitted using quantum states as the medium.
For example, it may be that Alice wishes to compress some classical information produced
by an information source, transmitting the compressed information to Bob, who then
decompresses it. If the medium used to store the compressed information is a quantum
state, then Shannon’s noiseless channel coding theorem cannot be used to determine
the optimal compression and decompression scheme. One might wonder, for example,
if using qubits allows a better compression rate than is possible classically. We’ll study
this question in Chapter 12, and prove that, in fact, qubits do not allow any significant
saving in the amount of communication required to transmit information over a noiseless
channel.
Naturally, the next step is to investigate the problem of transmitting classical information through a noisy quantum channel. Ideally, what we’d like is a result that quantifies
the capacity of such a channel for the transmission of information. Evaluating the capacity is a very tricky job for several reasons. Quantum mechanics gives us a huge variety of
noise models, since it takes place in a continuous space, and it is not at all obvious how
to adapt classical error-correction techniques to combat the noise. Might it be advantageous, for example, to encode the classical information using entangled states, which are
then transmitted one piece at a time through the noisy channel? Or perhaps it will be
advantageous to decode using entangled measurements? In Chapter 12 we’ll prove the
HSW (Holevo–Schumacher–Westmoreland) theorem, which provides a lower bound
on the capacity of such a channel. Indeed, it is widely believed that the HSW theorem
provides an exact evaluation of the capacity, although a complete proof of this is not yet
known! What remains at issue is whether or not encoding using entangled states can be
used to raise the capacity beyond the lower bound provided by the HSW theorem. All
evidence to date suggests that this doesn’t help raise the capacity, but it is still a fascinating open problem of quantum information theory to determine the truth or falsity of
this conjecture.
Quantum information through quantum channels
Classical information is, of course, not the only static resource available in quantum
mechanics. Quantum states themselves are a natural static resource, even more natural
than classical information. Let’s look at a different quantum analogue of Shannon’s coding
theorems, this time involving the compression and decompression of quantum states.
To begin, we need to define some quantum notion of an information source, analogous
to the classical definition of an information source. As in the classical case, there are several
different ways of doing this, but for the sake of definiteness let’s make the provisional
definition that a quantum source is described by a set of probabilities $p_j$ and corresponding
quantum states $|\psi_j\rangle$. Each use of the source produces a state $|\psi_j\rangle$ with probability $p_j$,
with different uses of the source being independent of one another.
Is it possible to compress the output from such a quantum mechanical source? Consider
the case of a qubit source which outputs the state $|0\rangle$ with probability $p$ and the state $|1\rangle$
with probability $1 - p$. This is essentially the same as a classical source emitting single
bits, either 0 with probability $p$, or 1 with probability $1 - p$, so it is not surprising that
similar techniques can be used to compress the source so that only $H(p, 1 - p)$ qubits
are required to store the compressed source, where $H(\cdot)$ is again the Shannon entropy
function.
What if the source had instead been producing the state $|0\rangle$ with probability $p$, and
the state $(|0\rangle + |1\rangle)/\sqrt{2}$ with probability $1 - p$? The standard techniques of classical
data compression no longer apply, since in general it is not possible for us to distinguish
the states $|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$. Might it still be possible to perform some type of
compression operation?
It turns out that a type of compression is still possible, even in this instance. What is
interesting is that the compression may no longer be error-free, in the sense that the quantum states being produced by the source may be slightly distorted by the compression–
decompression procedure. Nevertheless, we require that this distortion ought to become
very small and ultimately negligible in the limit of large blocks of source output being
compressed. To quantify the distortion we introduce a fidelity measure for the compression scheme, which measures the average distortion introduced by the compression
scheme. The idea of quantum data compression is that the compressed data should be
recovered with very good fidelity. Think of the fidelity as being analogous to the probability of doing the decompression correctly – in the limit of large block lengths, it should
tend towards the no error limit of 1.
Schumacher’s noiseless channel coding theorem quantifies the resources required to do
quantum data compression, with the restriction that it be possible to recover the source
with fidelity close to 1. In the case of a source producing orthogonal quantum states
$|\psi_j\rangle$ with probabilities $p_j$ Schumacher’s theorem reduces to telling us that the source
may be compressed down to but not beyond the classical limit $H(p_j)$. However, in the
more general case of non-orthogonal states being produced by the source, Schumacher’s
theorem tells us how much a quantum source may be compressed, and the answer is
not the Shannon entropy $H(p_j)$! Instead, a new entropic quantity, the von Neumann
entropy, turns out to be the correct answer. In general, the von Neumann entropy agrees
with the Shannon entropy if and only if the states $|\psi_j\rangle$ are orthogonal. Otherwise, the von
Neumann entropy for the source $p_j, |\psi_j\rangle$ is in general strictly smaller than the Shannon
entropy $H(p_j)$. Thus, for example, a source producing the state $|0\rangle$ with probability $p$
and $(|0\rangle + |1\rangle)/\sqrt{2}$ with probability $1 - p$ can be reliably compressed using fewer than
$H(p, 1 - p)$ qubits per use of the source!
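The gap between the two entropies can be checked numerically for this particular source. The Python sketch below assumes the source is described by the density matrix $\rho = p|0\rangle\langle 0| + (1 - p)|{+}\rangle\langle{+}|$ with $|{+}\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and that the von Neumann entropy is the Shannon entropy of the eigenvalues of $\rho$ (definitions that are developed properly later in the book). For a 2 × 2 matrix with unit trace the eigenvalues follow from the determinant, here $p(1 - p)/2$. The function names are illustrative.

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(q * math.log2(q) for q in probabilities if q > 0)

def source_von_neumann_entropy(p):
    """Entropy of rho = p|0><0| + (1-p)|+><+|, with |+> = (|0>+|1>)/sqrt(2).

    rho has trace 1 and determinant p(1-p)/2, so its eigenvalues are
    (1 +/- sqrt(1 - 2p(1-p))) / 2.
    """
    gap = math.sqrt(1 - 2 * p * (1 - p))
    return shannon_entropy([(1 + gap) / 2, (1 - gap) / 2])

for p in (0.1, 0.5, 0.9):
    print(p, shannon_entropy([p, 1 - p]), source_von_neumann_entropy(p))
```

For every value of $p$ strictly between 0 and 1 the second number printed is smaller than the first, in line with the claim that non-orthogonal states can be compressed below the Shannon limit.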
The basic intuition for this decrease in resources required can be understood quite
easily. Suppose the source emitting states $|0\rangle$ with probability $p$ and $(|0\rangle + |1\rangle)/\sqrt{2}$ with
probability $1 - p$ is used a large number $n$ times. Then by the law of large numbers,
with high probability the source emits about $np$ copies of $|0\rangle$ and $n(1 - p)$ copies of
$(|0\rangle + |1\rangle)/\sqrt{2}$. That is, it has the form
$$|0\rangle^{\otimes np} \left( \frac{|0\rangle + |1\rangle}{\sqrt{2}} \right)^{\otimes n(1-p)}, \tag{1.60}$$
up to re-ordering of the systems involved. Suppose we expand the product of $|0\rangle + |1\rangle$
terms on the right hand side. Since $n(1 - p)$ is large, we can again use the law of large
numbers to deduce that the terms in the product will be roughly one-half $|0\rangle$s and one-half $|1\rangle$s. That is, the $|0\rangle + |1\rangle$ product can be well approximated by a superposition of
states of the form
$$|0\rangle^{\otimes n(1-p)/2} |1\rangle^{\otimes n(1-p)/2}. \tag{1.61}$$
Thus the state emitted by the source can be approximated as a superposition of terms of
the form
$$|0\rangle^{\otimes n(1+p)/2} |1\rangle^{\otimes n(1-p)/2}. \tag{1.62}$$
How many states of this form are there? Roughly $n$ choose $n(1 + p)/2$, which by Stirling’s approximation is equal to $N \equiv 2^{nH[(1+p)/2,(1-p)/2]}$. A simple compression method
then is to label all states of the form (1.62) $|c_1\rangle$ through $|c_N\rangle$. It is possible to perform a unitary transform on the $n$ qubits emitted from the source that takes $|c_j\rangle$ to
$|j\rangle|0\rangle^{\otimes n - nH[(1+p)/2,(1-p)/2]}$, since $j$ is an $nH[(1 + p)/2, (1 - p)/2]$ bit number. The compression operation is to perform this unitary transformation, and then drop the final
$n - nH[(1 + p)/2, (1 - p)/2]$ qubits, leaving a compressed state of $nH[(1 + p)/2, (1 - p)/2]$
qubits. To decompress we append the state $|0\rangle^{\otimes n - nH[(1+p)/2,(1-p)/2]}$ to the compressed
state, and perform the inverse unitary transformation.
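The counting step can be checked numerically. The Python sketch below compares the exact number of bits needed to label the binomial count, $\log_2 \binom{n}{n(1+p)/2}$, with the Stirling estimate $nH[(1 + p)/2, (1 - p)/2]$. The values $n = 1000$ and $p = 1/2$ are arbitrary illustrative choices, and the function names are not from the text.

```python
import math

def binary_entropy(q):
    """H(q, 1 - q) in bits, with the convention 0 log 0 = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def compare_counts(n, p):
    """Return (log2 of the exact binomial count, Stirling estimate n*H)."""
    k = round(n * (1 + p) / 2)            # number of |0> factors in (1.62)
    exact_bits = math.log2(math.comb(n, k))
    entropy_bits = n * binary_entropy((1 + p) / 2)
    return exact_bits, entropy_bits

exact_bits, entropy_bits = compare_counts(n=1000, p=0.5)
print(exact_bits, entropy_bits)
```

The exact count is somewhat smaller than the estimate, but the difference grows only logarithmically with $n$, so the number of qubits needed per use of the source approaches $H[(1 + p)/2, (1 - p)/2]$ as $n$ becomes large.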
This procedure for quantum data compression and decompression results in a storage
requirement of $H[(1 + p)/2, (1 - p)/2]$ qubits per use of the source, which whenever
$p \geq 1/3$ is an improvement over the $H(p, 1 - p)$ qubits we might naively have expected
from Shannon’s noiseless channel coding theorem. In fact, Schumacher’s noiseless channel coding theorem allows us to do somewhat better even than this, as we will see in
Chapter 12; however, the essential reason in that construction is the same as the reason
we were able to compress here: we exploited the fact that $|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$ are not
orthogonal. Intuitively, the states contain some redundancy since both have a component
in the $|0\rangle$ direction, which results in more physical similarity than would be obtained
from orthogonal states. It is this redundancy that we have exploited in the coding scheme
just described, and which is used in the full proof of Schumacher’s noiseless channel
coding theorem. Note that the restriction p ≥ 1/3 arises because when p < 1/3 this
particular scheme doesn’t exploit the redundancy in the states: we end up effectively
increasing the redundancy present in the problem! Of course, this is an artifact of the
particular scheme we have chosen, and the general solution exploits the redundancy in a
much more sensible way to achieve data compression.
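The role of the threshold $p = 1/3$ can be seen by tabulating the two rates. The Python sketch below compares the naive rate $H(p, 1 - p)$ with the rate $H[(1 + p)/2, (1 - p)/2]$ of the scheme just described; the sample values of $p$ and the function names are illustrative choices.

```python
import math

def binary_entropy(q):
    """H(q, 1 - q) in bits, with the convention 0 log 0 = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def naive_rate(p):
    """Qubits per source use suggested by treating the source classically."""
    return binary_entropy(p)

def scheme_rate(p):
    """Qubits per source use for the simple scheme described in the text."""
    return binary_entropy((1 + p) / 2)

for p in (0.2, 1 / 3, 0.5, 0.8):
    print(f"p={p:.3f}  naive={naive_rate(p):.4f}  scheme={scheme_rate(p):.4f}")
```

At $p = 1/3$ the two rates coincide, since $H(2/3, 1/3) = H(1/3, 2/3)$. Above that value the scheme does better than the naive rate, and below it the scheme does worse.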
Schumacher’s noiseless channel coding theorem is an analogue of Shannon’s noiseless
channel coding theorem for the compression and decompression of quantum states. Can
we find an analogue of Shannon’s noisy channel coding theorem? Considerable progress
on this important question has been made, using the theory of quantum error-correcting
codes; however, a fully satisfactory analogue has not yet been found. We review some of
what is known about the quantum channel capacity in Chapter 12.
Quantum distinguishability
Thus far all the dynamical processes we have considered – compression, decompression,
noise, encoding and decoding error-correcting codes – arise in both classical and quantum
information theory. However, the introduction of new types of information, such as
quantum states, enlarges the class of dynamical processes beyond those considered in
classical information theory. A good example is the problem of distinguishing quantum
states. Classically, we are used to being able to distinguish different items of information,
at least in principle. In practice, of course, a smudged letter ‘a’ written on a page may be
very difficult to distinguish from a letter ‘o’, but in principle it is possible to distinguish
between the two possibilities with perfect certainty.
On the other hand, quantum mechanically it is not always possible to distinguish
between arbitrary states. For example, there is no process allowed by quantum mechanics
that will reliably distinguish between the states |0⟩ and (|0⟩ + |1⟩)/√2. Proving this
rigorously requires tools we don’t presently have available (it is done in Chapter 2),
but by considering examples it’s pretty easy to convince oneself that it is not possible.
Suppose, for example, that we try to distinguish the two states by measuring in the
computational basis. Then, if we have been given the state |0⟩, the measurement will
yield 0 with probability 1. However, when we measure (|0⟩ + |1⟩)/√2 the measurement
yields 0 with probability 1/2 and 1 with probability 1/2. Thus, while a measurement
result of 1 implies that the state must have been (|0⟩ + |1⟩)/√2, since it couldn’t have been
|0⟩, we can’t infer the identity of the quantum state with certainty from a measurement
result of 0.
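The measurement statistics just described can be checked numerically. The following is a minimal sketch, assuming NumPy and representing |0⟩ and |1⟩ as the column vectors (1, 0) and (0, 1); the helper name `outcome_probabilities` is ours, introduced only for illustration.

```python
import numpy as np

# Computational basis states |0> and |1> as vectors in C^2.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
# The superposition (|0> + |1>)/sqrt(2).
ket_plus = (ket0 + ket1) / np.sqrt(2)

def outcome_probabilities(state):
    """Probabilities of outcomes 0 and 1 for a computational-basis measurement."""
    # p(k) = |<k|state>|^2; np.vdot conjugates its first argument.
    return np.array([abs(np.vdot(ket0, state)) ** 2,
                     abs(np.vdot(ket1, state)) ** 2])

p_zero = outcome_probabilities(ket0)      # given the state |0>
p_plus = outcome_probabilities(ket_plus)  # given (|0> + |1>)/sqrt(2)
print("given |0>:               ", p_zero)
print("given (|0> + |1>)/sqrt(2):", p_plus)
```

Both states can produce the outcome 0, which is why that outcome does not settle which state was supplied.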
This indistinguishability of non-orthogonal quantum states is at the heart of quantum
computation and quantum information. It is the essence of our assertion that a quantum state contains hidden information that is not accessible to measurement, and thus
plays a key role in quantum algorithms and quantum cryptography. One of the central
problems of quantum information theory is to develop measures quantifying how well
non-orthogonal quantum states may be distinguished, and much of Chapters 9 and 12 is
concerned with this goal. In this introduction we’ll limit ourselves to pointing out two
interesting aspects of indistinguishability – a connection with the possibility of faster-than-light communication, and an application to ‘quantum money.’
Imagine for a moment that we could distinguish between arbitrary quantum states.
We’ll show that this implies the ability to communicate faster than light, using entanglement.
Suppose Alice and Bob share an entangled pair of qubits in the state (|00⟩ + |11⟩)/√2.
Then, if Alice measures in the computational basis, the post-measurement
states will be |00⟩ with probability 1/2, and |11⟩ with probability 1/2. Thus Bob’s system is
either in the state |0⟩, with probability 1/2, or in the state |1⟩, with probability
1/2. Suppose, however, that Alice had instead measured in the |+⟩, |−⟩ basis. Recall that
|0⟩ = (|+⟩ + |−⟩)/√2 and |1⟩ = (|+⟩ − |−⟩)/√2. A little algebra shows that the initial
state of Alice and Bob’s system may be rewritten as (|++⟩ + |−−⟩)/√2. Therefore,
if Alice measures in the |+⟩, |−⟩ basis, the state of Bob’s system after the measurement
will be |+⟩ or |−⟩ with probability 1/2 each. So far, this is all basic quantum mechanics.
But if Bob had access to a device that could distinguish the four states |0⟩, |1⟩, |+⟩, |−⟩
from one another, then he could tell whether Alice had measured in the computational
basis, or in the |+⟩, |−⟩ basis. Moreover, he could get that information instantaneously,
as soon as Alice had made the measurement, providing a means by which Alice and Bob
could achieve faster-than-light communication! Of course, we know that it is not possible
to distinguish non-orthogonal quantum states; this example shows that this restriction is
also intimately tied to other physical properties which we expect the world to obey.
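The ‘little algebra’ in this argument can also be confirmed numerically. The sketch below assumes NumPy and uses the Kronecker product `np.kron` to build two-qubit states from single-qubit ones; the variable names are ours.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
ket_plus = (ket0 + ket1) / np.sqrt(2)    # |+>
ket_minus = (ket0 - ket1) / np.sqrt(2)   # |->

# (|00> + |11>)/sqrt(2), written in the computational basis.
bell_computational = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
# (|++> + |-->)/sqrt(2), written in the |+>, |-> basis.
bell_plus_minus = (np.kron(ket_plus, ket_plus)
                   + np.kron(ket_minus, ket_minus)) / np.sqrt(2)

# The two expressions describe the same vector in C^4.
same_state = bool(np.allclose(bell_computational, bell_plus_minus))
print("same state:", same_state)
```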
The indistinguishability of non-orthogonal quantum states need not always be a handicap. Sometimes it can be a boon. Imagine that a bank produces banknotes imprinted
with a (classical) serial number, and a sequence of qubits each in either the state |0⟩
or (|0⟩ + |1⟩)/√2. Nobody but the bank knows what sequence of these two states is
embedded in the note, and the bank maintains a list matching serial numbers to embedded states. The note is impossible to counterfeit exactly, because it is impossible
for a would-be counterfeiter to determine with certainty the state of the qubits in the
original note, without destroying them. When presented with the banknote a merchant
(of certifiable repute) can verify that it is not a counterfeit by calling the bank, telling
them the serial number, and then asking what sequence of states was embedded in
the note. They can then check that the note is genuine by measuring the qubits in the
|0⟩, |1⟩ or (|0⟩ + |1⟩)/√2, (|0⟩ − |1⟩)/√2 basis, as directed by the bank. With probability
which increases exponentially to one with the number of qubits checked, any would-be
counterfeiter will be detected at this stage! This idea is the basis for numerous other
quantum cryptographic protocols, and demonstrates the utility of the indistinguishability
of non-orthogonal quantum states.
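To get a feel for the exponential detection probability, here is a rough sketch of one particular (and not necessarily optimal) counterfeiting strategy. Everything specific in it is an assumption made for illustration and is not taken from the text: each qubit is |0⟩ or (|0⟩ + |1⟩)/√2 with equal probability, the forger measures each qubit in the computational basis, prepares (|0⟩ + |1⟩)/√2 on outcome 1 and |0⟩ on outcome 0, and the merchant accepts a qubit only if the measurement returns the state recorded by the bank.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)

def prob(outcome, state):
    """Probability that `state` is found in `outcome` (real vectors only)."""
    return abs(np.dot(outcome, state)) ** 2

def pass_probability_per_qubit():
    """Chance that one forged qubit passes, under the assumed strategy."""
    total = 0.0
    for true_state in (ket0, ket_plus):      # bank's choice, assumed 50/50
        # The forger measures the genuine qubit in the computational basis.
        p_see_0 = prob(ket0, true_state)
        p_see_1 = prob(ket1, true_state)
        # Outcome 1 rules out |0>, so the forger prepares |+>;
        # on outcome 0 the forger guesses |0>.
        p_pass = (p_see_0 * prob(true_state, ket0)
                  + p_see_1 * prob(true_state, ket_plus))
        total += 0.5 * p_pass
    return total

p_pass = pass_probability_per_qubit()
for n in (1, 10, 50):
    print(f"{n:2d} qubits: detection probability {1 - p_pass ** n:.4f}")
```

Under these assumptions each forged qubit slips through with probability 7/8, so the chance of fooling the merchant on all n qubits falls off as (7/8)^n.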
Exercise 1.2: Explain how a device which, upon input of one of two non-orthogonal
quantum states |ψ⟩ or |ϕ⟩ correctly identified the state, could be used to build a
device which cloned the states |ψ⟩ and |ϕ⟩, in violation of the no-cloning
theorem. Conversely, explain how a device for cloning could be used to
distinguish non-orthogonal quantum states.
Creation and transformation of entanglement
Entanglement is another elementary static resource of quantum mechanics. Its properties
are amazingly different from those of the resources most familiar from classical information theory, and they are not yet well understood; we have at best an incomplete collage
of results related to entanglement. We don’t yet have all the language needed to understand the solutions, but let’s at least look at two information-theoretic problems related
to entanglement.
Creating entanglement is a simple dynamical process of interest in quantum information theory. How many qubits must two parties exchange if they are to create a particular
entangled state shared between them, given that they share no prior entanglement? A
second dynamical process of interest is transforming entanglement from one form into
another. Suppose, for example, that Alice and Bob share between them a Bell state, and
wish to transform it into some other type of entangled state. What resources do they
need to accomplish this task? Can they do it without communicating? With classical
communication only? If quantum communication is required then how much quantum
communication is required?
Answering these and more complex questions about the creation and transformation of
entanglement forms a fascinating area of study in its own right, and also promises to give
insight into tasks such as quantum computation. For example, a distributed quantum
computation may be viewed as simply a method for generating entanglement between
two or more parties; lower bounds on the amount of communication that must be done
to perform such a distributed quantum computation then follow from lower bounds on
the amount of communication that must be performed to create appropriate entangled
states.
1.6.2 Quantum information in a wider context
We have given but the barest glimpse of quantum information theory. Part III of this
book discusses quantum information theory in much greater detail, especially Chapter 11,
which deals with fundamental properties of entropy in quantum and classical information
theory, and Chapter 12, which focuses on pure quantum information theory.
Quantum information theory is the most abstract part of quantum computation and
quantum information, yet in some sense it is also the most fundamental. The question
driving quantum information theory, and ultimately all of quantum computation and
quantum information, is what makes quantum information processing tick? What is
it that separates the quantum and the classical world? What resources, unavailable in a
classical world, are being utilized in a quantum computation? Existing answers to these
questions are foggy and incomplete; it is our hope that the fog may yet lift in the years
to come, and we will obtain a clear appreciation for the possibilities and limitations of
quantum information processing.
Problem 1.1: (Feynman-Gates conversation) Construct a friendly imaginary
discussion of about 2000 words between Bill Gates and Richard Feynman, set in
the present, on the future of computation. (Comment: You might like to try
waiting until you’ve read the rest of the book before attempting this question.
See the ‘History and further reading’ below for pointers to one possible answer
for this question.)
Problem 1.2: What is the most significant discovery yet made in quantum
computation and quantum information? Write an essay of about 2000 words for
an educated lay audience about the discovery. (Comment: As for the previous
problem, you might like to try waiting until you’ve read the rest of the book
before attempting this question.)
History and further reading
Most of the material in this chapter is revisited in more depth in later chapters. Therefore
the historical references and further reading below are limited to material which does not
recur in later chapters.
Piecing together the historical context in which quantum computation and quantum
information have developed requires a broad overview of the history of many fields. We
have tried to tie this history together in this chapter, but inevitably much background
material was omitted due to limited space and expertise. The following recommendations
attempt to redress this omission.
The history of quantum mechanics has been told in many places. We recommend especially the outstanding works of Pais[Pai82, Pai86, Pai91]. Of these three, [Pai86] is most directly concerned with the development of quantum mechanics; however, Pais’ biographies
of Einstein[Pai82] and of Bohr[Pai91] also contain much material of interest, at a less intense
level. The rise of technologies based upon quantum mechanics has been described by Milburn[Mil97, Mil98]. Turing’s marvelous paper on the foundations of computer science[Tur36]
is well worth reading. It can be found in the valuable historical collection of Davis[Dav65].
Hofstadter[Hof79] and Penrose[Pen89] contain entertaining and informative discussions of
the foundations of computer science. Shasha and Lazere’s biography of fifteen leading
computer scientists[SL98] gives considerable insight into many different facets of the history of computer science. Finally, Knuth’s awesome series of books[Knu97, Knu98a, Knu98b]
contains an amazing amount of historical information. Shannon’s brilliant papers founding
information theory make excellent reading[Sha48] (also reprinted in [SW49]). MacWilliams
and Sloane[MS77] is not only an excellent text on error-correcting codes, but also contains
an enormous amount of useful historical information. Similarly, Cover and Thomas[CT91]
is an excellent text on information theory, with extensive historical information. Shannon’s collected works, together with many useful historical items have been collected in
a large volume[SW93] edited by Sloane and Wyner. Slepian has also collected a useful set
of reprints on information theory[Sle74]. Cryptography is an ancient art with an intricate
and often interesting history. Kahn[Kah96] is a huge history of cryptography containing a wealth of information. For more recent developments we recommend the books
by Menezes, van Oorschot, and Vanstone[MvOV96], Schneier[Sch96a], and by Diffie and
Landau[DL98].
Quantum teleportation was discovered by Bennett, Brassard, Crépeau, Jozsa, Peres,
and Wootters[BBC+93], and later experimentally realized in various different forms by
Boschi, Branca, De Martini, Hardy and Popescu[BBM+98] using optical techniques, by
Bouwmeester, Pan, Mattle, Eibl, Weinfurter, and Zeilinger[BPM+97] using photon polarization, by Furusawa, Sørensen, Braunstein, Fuchs, Kimble, and Polzik using ‘squeezed’
states of light[FSB+98], and by Nielsen, Knill, and Laflamme using NMR[NKL98].
Deutsch’s problem was posed by Deutsch[Deu85], and a one-bit solution was given in the
same paper. The extension to the general n-bit case was given by Deutsch and Jozsa[DJ92].
The algorithms in these early papers have been substantially improved subsequently
by Cleve, Ekert, Macchiavello, and Mosca[CEMM98], and independently in unpublished
work by Tapp. In this chapter we have given the improved version of the algorithm,
which fits very nicely into the hidden subgroup problem framework that will later be
discussed in Chapter 5. The original algorithm of Deutsch only worked probabilistically;
Deutsch and Jozsa improved this to obtain a deterministic algorithm, but their method
required two function evaluations, in contrast to the improved algorithms presented in
this chapter. Nevertheless, it is still conventional to refer to these algorithms as Deutsch’s
algorithm and the Deutsch–Jozsa algorithm in honor of two huge leaps forward: the
concrete demonstration by Deutsch that a quantum computer could do something faster
than a classical computer; and the extension by Deutsch and Jozsa which demonstrated
for the first time a similar gap for the scaling of the time required to solve a problem.
Excellent discussions of the Stern–Gerlach experiment can be found in standard quantum mechanics textbooks such as the texts by Sakurai[Sak95], Volume III of Feynman,
Leighton and Sands[FLS65a], and Cohen-Tannoudji, Diu and Laloë[CTDL77a, CTDL77b].
Problem 1.1 was suggested by the lovely article of Rahim[Rah99].
2 Introduction to quantum mechanics
I ain’t no physicist but I know what matters.
– Popeye the Sailor
Quantum mechanics: Real Black Magic Calculus
– Albert Einstein
Quantum mechanics is the most accurate and complete description of the world known. It
is also the basis for an understanding of quantum computation and quantum information.
This chapter provides all the necessary background knowledge of quantum mechanics
needed for a thorough grasp of quantum computation and quantum information. No
prior knowledge of quantum mechanics is assumed.
Quantum mechanics is easy to learn, despite its reputation as a difficult subject. The
reputation comes from the difficulty of some applications, like understanding the structure of complicated molecules, which aren’t fundamental to a grasp of the subject; we
won’t be discussing such applications. The only prerequisite for understanding is some
familiarity with elementary linear algebra. Provided you have this background you can
begin working out simple problems in a few hours, even with no prior knowledge of the
subject.
Readers already familiar with quantum mechanics can quickly skim through this chapter, to become familiar with our (mostly standard) notational conventions, and to assure
themselves of familiarity with all the material. Readers with little or no prior knowledge
should work through the chapter in detail, pausing to attempt the exercises. If you have
difficulty with an exercise, move on, and return later to make another attempt.
The chapter begins with a review of some material from linear algebra in Section 2.1.
This section assumes familiarity with elementary linear algebra, but introduces the notation used by physicists to describe quantum mechanics, which is different to that used in
most introductions to linear algebra. Section 2.2 describes the basic postulates of quantum mechanics. Upon completion of the section, you will have understood all of the
fundamental principles of quantum mechanics. This section contains numerous simple
exercises designed to help consolidate your grasp of this material. The remaining sections
of the chapter, and of this book, elaborate upon this material, without introducing fundamentally new physical principles. Section 2.3 explains superdense coding, a surprising
and illuminating example of quantum information processing which combines many of
the postulates of quantum mechanics in a simple setting. Sections 2.4 and 2.5 develop
powerful mathematical tools – the density operator, purifications, and the Schmidt decomposition – which are especially useful in the study of quantum computation and
quantum information. Understanding these tools will also help you consolidate your understanding of elementary quantum mechanics. Finally, Section 2.6 examines the question
of how quantum mechanics goes beyond the usual ‘classical’ understanding of the way
the world works.
2.1 Linear algebra
This book is written as much to disturb and annoy as to instruct.
– The first line of About Vectors, by Banesh Hoffmann.
Life is complex – it has both real and imaginary parts.
– Anonymous
Linear algebra is the study of vector spaces and of linear operations on those vector
spaces. A good understanding of quantum mechanics is based upon a solid grasp of
elementary linear algebra. In this section we review some basic concepts from linear
algebra, and describe the standard notations which are used for these concepts in the
study of quantum mechanics. These notations are summarized in Figure 2.1 on page 62,
with the quantum notation in the left column, and the linear-algebraic description in the
right column. You may like to glance at the table, and see how many of the concepts in
the right column you recognize.
In our opinion the chief obstacle to assimilation of the postulates of quantum mechanics is not the postulates themselves, but rather the large body of linear algebraic notions
required to understand them. Coupled with the unusual Dirac notation adopted by physicists for quantum mechanics, it can appear (falsely) quite fearsome. For these reasons,
we advise the reader not familiar with quantum mechanics to quickly read through the
material which follows, pausing mainly to concentrate on understanding the absolute basics of the notation being used. Then proceed to a careful study of the main topic of the
chapter – the postulates of quantum mechanics – returning to study the necessary linear
algebraic notions and notations in more depth, as required.
The basic objects of linear algebra are vector spaces. The vector space of most interest
to us is C^n, the space of all n-tuples of complex numbers, (z_1, ..., z_n). The elements of
a vector space are called vectors, and we will sometimes use the column matrix notation

\[ \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} \tag{2.1} \]

to indicate a vector. There is an addition operation defined which takes pairs of vectors
to other vectors. In C^n the addition operation for vectors is defined by

\[ \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} + \begin{bmatrix} z_1' \\ \vdots \\ z_n' \end{bmatrix} \equiv \begin{bmatrix} z_1 + z_1' \\ \vdots \\ z_n + z_n' \end{bmatrix}, \tag{2.2} \]

where the addition operations on the right are just ordinary additions of complex numbers.
Furthermore, in a vector space there is a multiplication by a scalar operation. In C^n
this operation is defined by

\[ z \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} \equiv \begin{bmatrix} z z_1 \\ \vdots \\ z z_n \end{bmatrix}, \tag{2.3} \]

where z is a scalar, that is, a complex number, and the multiplications on the right
are ordinary multiplication of complex numbers. Physicists sometimes refer to complex
numbers as c-numbers.
Quantum mechanics is our main motivation for studying linear algebra, so we will use
the standard notation of quantum mechanics for linear algebraic concepts. The standard
quantum mechanical notation for a vector in a vector space is the following:

\[ |\psi\rangle. \tag{2.4} \]

ψ is a label for the vector (any label is valid, although we prefer to use simple labels like
ψ and ϕ). The |·⟩ notation is used to indicate that the object is a vector. The entire object
|ψ⟩ is sometimes called a ket, although we won’t use that terminology often.
A vector space also contains a special zero vector, which we denote by 0. It satisfies
the property that for any other vector |v⟩, |v⟩ + 0 = |v⟩. Note that we do not use the
ket notation for the zero vector – it is the only exception we shall make. The reason
for making the exception is that it is conventional to use the ‘obvious’ notation for
the zero vector, |0⟩, to mean something else entirely. The scalar multiplication operation
is such that z0 = 0 for any complex number z. For convenience, we use the notation
(z_1, ..., z_n) to denote a column matrix with entries z_1, ..., z_n. In C^n the zero element
is (0, 0, ..., 0). A vector subspace of a vector space V is a subset W of V such that W is
also a vector space, that is, W must be closed under scalar multiplication and addition.
Notation        Description
z*              Complex conjugate of the complex number z.
                (1 + i)* = 1 − i
|ψ⟩             Vector. Also known as a ket.
⟨ψ|             Vector dual to |ψ⟩. Also known as a bra.
⟨ϕ|ψ⟩           Inner product between the vectors |ϕ⟩ and |ψ⟩.
|ϕ⟩ ⊗ |ψ⟩       Tensor product of |ϕ⟩ and |ψ⟩.
|ϕ⟩|ψ⟩          Abbreviated notation for tensor product of |ϕ⟩ and |ψ⟩.
A*              Complex conjugate of the A matrix.
A^T             Transpose of the A matrix.
A†              Hermitian conjugate or adjoint of the A matrix, A† = (A^T)*.
                \[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}^\dagger = \begin{bmatrix} a^* & c^* \\ b^* & d^* \end{bmatrix}. \]
⟨ϕ|A|ψ⟩         Inner product between |ϕ⟩ and A|ψ⟩.
                Equivalently, inner product between A†|ϕ⟩ and |ψ⟩.

Figure 2.1. Summary of some standard quantum mechanical notation for notions from linear algebra. This style of
notation is known as the Dirac notation.
2.1.1 Bases and linear independence
A spanning set for a vector space is a set of vectors |v_1⟩, ..., |v_n⟩ such that any vector
|v⟩ in the vector space can be written as a linear combination |v⟩ = Σ_i a_i|v_i⟩ of vectors
in that set. For example, a spanning set for the vector space C^2 is the set

\[ |v_1\rangle \equiv \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \quad |v_2\rangle \equiv \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \tag{2.5} \]

since any vector

\[ |v\rangle = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \tag{2.6} \]

in C^2 can be written as a linear combination |v⟩ = a_1|v_1⟩ + a_2|v_2⟩ of the vectors |v_1⟩ and
|v_2⟩. We say that the vectors |v_1⟩ and |v_2⟩ span the vector space C^2.
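Generally, a vector space may have many different spanning sets. A second spanning
set for the vector space C^2 is the set

\[ |v_1\rangle \equiv \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}; \quad |v_2\rangle \equiv \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \tag{2.7} \]

since an arbitrary vector |v⟩ = (a_1, a_2) can be written as a linear combination of |v_1⟩ and
|v_2⟩,

\[ |v\rangle = \frac{a_1 + a_2}{\sqrt{2}} |v_1\rangle + \frac{a_1 - a_2}{\sqrt{2}} |v_2\rangle. \tag{2.8} \]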
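Equation (2.8) is easy to check numerically. The sketch below assumes NumPy; the test vector and the helper name `coefficients` are arbitrary choices made for illustration.

```python
import numpy as np

# The second spanning set for C^2, equation (2.7).
v1 = np.array([1, 1], dtype=complex) / np.sqrt(2)
v2 = np.array([1, -1], dtype=complex) / np.sqrt(2)

def coefficients(v):
    """Expansion coefficients of v = (a1, a2) in |v1>, |v2>, as in (2.8)."""
    a1, a2 = v
    return (a1 + a2) / np.sqrt(2), (a1 - a2) / np.sqrt(2)

v = np.array([2 - 1j, 0.5 + 3j])          # an arbitrary vector in C^2
c1, c2 = coefficients(v)
reconstructed = c1 * v1 + c2 * v2          # should reproduce v
print(np.allclose(reconstructed, v))
```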
A set of non-zero vectors |v_1⟩, ..., |v_n⟩ is linearly dependent if there exists a set of
complex numbers a_1, ..., a_n with a_i ≠ 0 for at least one value of i, such that

\[ a_1|v_1\rangle + a_2|v_2\rangle + \cdots + a_n|v_n\rangle = 0. \tag{2.9} \]
A set of vectors is linearly independent if it is not linearly dependent. It can be shown
that any two sets of linearly independent vectors which span a vector space V contain the
same number of elements. We call such a set a basis for V . Furthermore, such a basis
set always exists. The number of elements in the basis is defined to be the dimension of
V . In this book we will only be interested in finite dimensional vector spaces. There are
many interesting and often difficult questions associated with infinite dimensional vector
spaces. We won’t need to worry about these questions.
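In practice, linear dependence of a finite set of vectors in C^n can be tested by stacking them as the columns of a matrix and comparing its rank with the number of vectors. The following is a sketch assuming NumPy; `linearly_dependent` is a hypothetical helper, and `np.linalg.matrix_rank` works to a floating-point tolerance, so the result is a numerical check and not a proof.

```python
import numpy as np

def linearly_dependent(vectors):
    """True if some non-trivial combination of the vectors is (numerically) zero."""
    matrix = np.column_stack(vectors)
    # Fewer independent columns than vectors means a non-trivial solution of (2.9).
    return bool(np.linalg.matrix_rank(matrix) < len(vectors))

independent_pair = [np.array([1, 0]), np.array([0, 1])]
three_in_c2 = [np.array([1, 0]), np.array([0, 1]), np.array([1, 1])]
complex_pair = [np.array([1, 1j]), np.array([1j, -1])]   # second = i * first

print(linearly_dependent(independent_pair))  # independent
print(linearly_dependent(three_in_c2))       # three vectors in C^2
print(linearly_dependent(complex_pair))      # dependent over the complex numbers
```

The last pair illustrates why the coefficients in (2.9) must be allowed to be complex: neither vector is a real multiple of the other.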
Exercise 2.1: (Linear dependence: example) Show that (1, −1), (1, 2) and (2, 1)
are linearly dependent.
2.1.2 Linear operators and matrices
A linear operator between vector spaces V and W is defined to be any function A :
V → W which is linear in its inputs,

\[ A\left( \sum_i a_i |v_i\rangle \right) = \sum_i a_i A(|v_i\rangle). \tag{2.10} \]

Usually we just write A|v⟩ to denote A(|v⟩). When we say that a linear operator A is
defined on a vector space, V, we mean that A is a linear operator from V to V. An
important linear operator on any vector space V is the identity operator, I_V, defined by
the equation I_V|v⟩ ≡ |v⟩ for all vectors |v⟩. Where no chance of confusion arises we drop
the subscript V and just write I to denote the identity operator. Another important linear
operator is the zero operator, which we denote 0. The zero operator maps all vectors to
the zero vector, 0|v⟩ ≡ 0. It is clear from (2.10) that once the action of a linear operator
A on a basis is specified, the action of A is completely determined on all inputs.
Suppose V, W, and X are vector spaces, and A : V → W and B : W → X are
linear operators. Then we use the notation BA to denote the composition of B with A,
defined by (BA)(|v⟩) ≡ B(A(|v⟩)). Once again, we write BA|v⟩ as an abbreviation for
(BA)(|v⟩).
The most convenient way to understand linear operators is in terms of their matrix
representations. In fact, the linear operator and matrix viewpoints turn out to be completely equivalent. The matrix viewpoint may be more familiar to you, however. To see
the connection, it helps to first understand that an m by n complex matrix A with entries
A_ij is in fact a linear operator sending vectors in the vector space C^n to the vector space
C^m, under matrix multiplication of the matrix A by a vector in C^n. More precisely, the
claim that the matrix A is a linear operator just means that

\[ A\left( \sum_i a_i |v_i\rangle \right) = \sum_i a_i A|v_i\rangle \tag{2.11} \]

is true as an equation where the operation is matrix multiplication of A by column vectors.
Clearly, this is true!
We’ve seen that matrices can be regarded as linear operators. Can linear operators
be given a matrix representation? In fact they can, as we now explain. This equivalence
between the two viewpoints justifies our interchanging terms from matrix theory and
operator theory throughout the book. Suppose A : V → W is a linear operator between
vector spaces V and W. Suppose |v_1⟩, ..., |v_m⟩ is a basis for V and |w_1⟩, ..., |w_n⟩ is a
basis for W. Then for each j in the range 1, ..., m, there exist complex numbers A_1j
through A_nj such that

\[ A|v_j\rangle = \sum_i A_{ij} |w_i\rangle. \tag{2.12} \]

The matrix whose entries are the values A_ij is said to form a matrix representation of the
operator A. This matrix representation of A is completely equivalent to the operator A,
and we will use the matrix representation and abstract operator viewpoints interchangeably. Note that to make the connection between matrices and linear operators we must
specify a set of input and output basis states for the input and output vector spaces of
the linear operator.
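Equation (2.12) can be turned directly into a small computation: column j of the matrix representation holds the coefficients obtained by expanding A|v_j⟩ in the output basis. The sketch below assumes NumPy and assumes all vectors are given by their coordinates in C^n; the function name `matrix_representation` and the example operator are ours, chosen only for illustration.

```python
import numpy as np

def matrix_representation(apply_A, input_basis, output_basis):
    """Return M with A|v_j> = sum_i M[i, j] |w_i>, following (2.12)."""
    W = np.column_stack(output_basis).astype(complex)
    # Column j of `images` is the vector A|v_j>.
    images = np.column_stack([apply_A(v) for v in input_basis]).astype(complex)
    # Solving W @ M = images expands each image in the output basis.
    return np.linalg.solve(W, images)

def apply_A(v):
    """An example linear operator on C^2 (an arbitrary choice)."""
    return np.array([v[0] + v[1], v[0] - v[1]])

standard = [np.array([1, 0]), np.array([0, 1])]
rotated = [np.array([1, 1]), np.array([1, -1])]

M_standard = matrix_representation(apply_A, standard, standard)
M_mixed = matrix_representation(apply_A, rotated, standard)
print(M_standard.real)
print(M_mixed.real)
```

The same operator yields different matrices for different choices of input and output bases, which is why the bases must always be specified.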
Exercise 2.2: (Matrix representations: example) Suppose V is a vector space
with basis vectors |0⟩ and |1⟩, and A is a linear operator from V to V such that
A|0⟩ = |1⟩ and A|1⟩ = |0⟩. Give a matrix representation for A, with respect to
the input basis |0⟩, |1⟩, and the output basis |0⟩, |1⟩. Find input and output bases
which give rise to a different matrix representation of A.
Exercise 2.3: (Matrix representation for operator products) Suppose A is a
linear operator from vector space V to vector space W, and B is a linear
operator from vector space W to vector space X. Let |v_i⟩, |w_j⟩, and |x_k⟩ be
bases for the vector spaces V, W, and X, respectively. Show that the matrix
representation for the linear transformation BA is the matrix product of the
matrix representations for B and A, with respect to the appropriate bases.
Exercise 2.4: (Matrix representation for identity) Show that the identity operator
on a vector space V has a matrix representation which is one along the diagonal
and zero everywhere else, if the matrix representation is taken with respect to the
same input and output bases. This matrix is known as the identity matrix.
2.1.3 The Pauli matrices
Four extremely useful matrices which we shall often have occasion to use are the Pauli
matrices. These are 2 by 2 matrices, which go by a variety of notations. The matrices,
and their corresponding notations, are depicted in Figure 2.2. The Pauli matrices are so
useful in the study of quantum computation and quantum information that we encourage
you to memorize them by working through in detail the many examples and exercises
based upon them in subsequent sections.
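\[ \sigma_0 \equiv I \equiv \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \sigma_1 \equiv \sigma_x \equiv X \equiv \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

\[ \sigma_2 \equiv \sigma_y \equiv Y \equiv \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} \qquad \sigma_3 \equiv \sigma_z \equiv Z \equiv \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \]

Figure 2.2. The Pauli matrices. Sometimes I is omitted from the list with just X, Y and Z known as the Pauli
matrices.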
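For readers who like to experiment, the Pauli matrices of Figure 2.2 can be entered directly as arrays. The sketch below assumes NumPy; the checks it performs (each matrix squares to the identity, and XY = iZ) are standard properties of these matrices, stated here as an aid to memory and not derived in this section.

```python
import numpy as np

# The Pauli matrices, with respect to the basis |0> = (1, 0), |1> = (0, 1).
I = np.array([[1, 0], [0, 1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Each Pauli matrix squares to the identity.
squares_ok = all(np.allclose(P @ P, I) for P in (X, Y, Z))
# The product of X and Y is i times Z.
xy_is_iz = bool(np.allclose(X @ Y, 1j * Z))

print("squares to I:", squares_ok, " XY = iZ:", xy_is_iz)
print("X|0> =", X @ ket0)   # X exchanges |0> and |1>
print("Z|1> =", Z @ ket1)   # Z leaves |0> alone and flips the sign of |1>
```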
2.1.4 Inner products
An inner product is a function which takes as input two vectors |v⟩ and |w⟩ from a vector
space and produces a complex number as output. For the time being, it will be convenient
to write the inner product of |v⟩ and |w⟩ as (|v⟩, |w⟩). This is not the standard quantum
mechanical notation; for pedagogical clarity the (·, ·) notation will be useful occasionally in
this chapter. The standard quantum mechanical notation for the inner product (|v⟩, |w⟩)
is ⟨v|w⟩, where |v⟩ and |w⟩ are vectors in the inner product space, and the notation ⟨v|
is used for the dual vector to the vector |v⟩; the dual is a linear operator from the inner
product space V to the complex numbers C, defined by ⟨v|(|w⟩) ≡ ⟨v|w⟩ ≡ (|v⟩, |w⟩).
We will see shortly that the matrix representation of dual vectors is just a row vector.
A function (·, ·) from V × V to C is an inner product if it satisfies the requirements
that:
(1) (·, ·) is linear in the second argument,

\[ \left( |v\rangle, \sum_i \lambda_i |w_i\rangle \right) = \sum_i \lambda_i \left( |v\rangle, |w_i\rangle \right). \tag{2.13} \]

(2) (|v⟩, |w⟩) = (|w⟩, |v⟩)*.
(3) (|v⟩, |v⟩) ≥ 0 with equality if and only if |v⟩ = 0.
For example, C^n has an inner product defined by

\[ ((y_1, \ldots, y_n), (z_1, \ldots, z_n)) \equiv \sum_i y_i^* z_i = \begin{bmatrix} y_1^* & \cdots & y_n^* \end{bmatrix} \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}. \tag{2.14} \]

We call a vector space equipped with an inner product an inner product space.
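The inner product (2.14) corresponds to NumPy's `np.vdot`, which conjugates its first argument. The sketch below assumes NumPy and uses two arbitrarily chosen vectors to illustrate requirements (2) and (3); a numerical check of this kind is of course no substitute for a proof.

```python
import numpy as np

def inner(y, z):
    """The inner product (2.14) on C^n: sum_i conj(y_i) * z_i."""
    return np.vdot(y, z)

v = np.array([1 + 1j, 2])
w = np.array([3, -1j])

vw = inner(v, w)
wv = inner(w, v)
vv = inner(v, v)

print("(v, w) =", vw)
print("(w, v) =", wv, "(the complex conjugate of (v, w))")
print("(v, v) =", vv, "(real and non-negative)")
```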
Exercise 2.5: Verify that (·, ·) just defined is an inner product on C^n.
Exercise 2.6: Show that any inner product (·, ·) is conjugate-linear in the first
argument,

\[ \left( \sum_i \lambda_i |w_i\rangle, |v\rangle \right) = \sum_i \lambda_i^* \left( |w_i\rangle, |v\rangle \right). \tag{2.15} \]
Discussions of quantum mechanics often refer to Hilbert space. In the finite dimensional complex vector spaces that come up in quantum computation and quantum information, a Hilbert space is exactly the same thing as an inner product space. From now
on we use the two terms interchangeably, preferring the term Hilbert space. In infinite
dimensions Hilbert spaces satisfy additional technical restrictions above and beyond inner
product spaces, which we will not need to worry about.
Vectors |w⟩ and |v⟩ are orthogonal if their inner product is zero. For example, |w⟩ ≡
(1, 0) and |v⟩ ≡ (0, 1) are orthogonal with respect to the inner product defined by (2.14).
We define the norm of a vector |v⟩ by

\[ \| |v\rangle \| \equiv \sqrt{\langle v|v\rangle}. \tag{2.16} \]

A unit vector is a vector |v⟩ such that ‖|v⟩‖ = 1. We also say that |v⟩ is normalized if
‖|v⟩‖ = 1. It is convenient to talk of normalizing a vector by dividing by its norm; thus
|v⟩/‖|v⟩‖ is the normalized form of |v⟩, for any non-zero vector |v⟩. A set |i⟩ of vectors
with index i is orthonormal if each vector is a unit vector, and distinct vectors in the set
are orthogonal, that is, ⟨i|j⟩ = δ_ij, where i and j are both chosen from the index set.
Exercise 2.7: Verify that |w⟩ ≡ (1, 1) and |v⟩ ≡ (1, −1) are orthogonal. What are the
normalized forms of these vectors?
Suppose |w_1⟩, ..., |w_d⟩ is a basis set for some vector space V with an inner product.
There is a useful method, the Gram–Schmidt procedure, which can be used to produce an
orthonormal basis set |v_1⟩, ..., |v_d⟩ for the vector space V. Define |v_1⟩ ≡ |w_1⟩/‖|w_1⟩‖,
and for 1 ≤ k ≤ d − 1 define |v_{k+1}⟩ inductively by

\[ |v_{k+1}\rangle \equiv \frac{|w_{k+1}\rangle - \sum_{i=1}^{k} \langle v_i|w_{k+1}\rangle |v_i\rangle}{\left\| |w_{k+1}\rangle - \sum_{i=1}^{k} \langle v_i|w_{k+1}\rangle |v_i\rangle \right\|}. \tag{2.17} \]

It is not difficult to verify that the vectors |v_1⟩, ..., |v_d⟩ form an orthonormal set which
is also a basis for V. Thus, any finite dimensional vector space of dimension d has an
orthonormal basis, |v_1⟩, ..., |v_d⟩.
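The Gram–Schmidt procedure translates almost line for line into code. The following is a minimal sketch of (2.17), assuming NumPy and assuming the input vectors are linearly independent; the function name `gram_schmidt` and the example basis are ours. It follows the formula literally and makes no attempt at the numerical refinements a production routine would want.

```python
import numpy as np

def gram_schmidt(basis):
    """Orthonormalize a linearly independent list of vectors, following (2.17)."""
    orthonormal = []
    for w in basis:
        w = np.asarray(w, dtype=complex)
        # Remove the components along the vectors found so far: <v_i|w> |v_i>.
        residual = w - sum(np.vdot(v, w) * v for v in orthonormal)
        # Divide by the norm to obtain a unit vector.
        orthonormal.append(residual / np.linalg.norm(residual))
    return orthonormal

basis = [np.array([1, 1j, 0]), np.array([1, 0, 1]), np.array([0, 1, 1])]
vs = gram_schmidt(basis)

# Matrix of inner products <v_i|v_j>; orthonormality means it is the identity.
gram = np.array([[np.vdot(a, b) for b in vs] for a in vs])
print(np.allclose(gram, np.eye(3)))
```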
Exercise 2.8: Prove that the Gram–Schmidt procedure produces an orthonormal basis
for V .
From now on, when we speak of a matrix representation for a linear operator, we mean
a matrix representation with respect to orthonormal input and output bases. We also use
the convention that if the input and output spaces for a linear operator are the same, then
the input and output bases are the same, unless noted otherwise.
With these conventions, the inner product on a Hilbert space can be given a convenient
matrix representation. Let |w⟩ = Σ_i w_i|i⟩ and |v⟩ = Σ_j v_j|j⟩ be representations of
vectors |w⟩ and |v⟩ with respect to some orthonormal basis |i⟩. Then, since ⟨i|j⟩ = δ_ij,

\[ \langle v|w\rangle = \left( \sum_i v_i |i\rangle, \sum_j w_j |j\rangle \right) = \sum_{ij} v_i^* w_j \delta_{ij} = \sum_i v_i^* w_i \tag{2.18} \]

\[ = \begin{bmatrix} v_1^* & \cdots & v_n^* \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}. \tag{2.19} \]

That is, the inner product of two vectors is equal to the vector inner product between
two matrix representations of those vectors, provided the representations are written
with respect to the same orthonormal basis. We also see that the dual vector ⟨v| has a
nice interpretation as the row vector whose components are complex conjugates of the
corresponding components of the column vector representation of |v⟩.
There is a useful way of representing linear operators which makes use of the inner
product, known as the outer product representation. Suppose |v⟩ is a vector in an inner
product space V, and |w⟩ is a vector in an inner product space W. Define |w⟩⟨v| to be
the linear operator from V to W whose action is defined by

\[ \left( |w\rangle\langle v| \right) \left( |v'\rangle \right) \equiv |w\rangle \langle v|v'\rangle = \langle v|v'\rangle |w\rangle. \tag{2.20} \]

This equation fits beautifully into our notational conventions, according to which the
expression |w⟩⟨v|v′⟩ could potentially have one of two meanings: we will use it to denote
the result when the operator |w⟩⟨v| acts on |v′⟩, and it has an existing interpretation as
the result of multiplying |w⟩ by the complex number ⟨v|v′⟩. Our definitions are chosen
so that these two potential meanings coincide. Indeed, we define the former in terms of
the latter!
We can take linear combinations of outer product operators |w⟩⟨v| in the obvious way.
By definition Σ_i a_i|w_i⟩⟨v_i| is the linear operator which, when acting on |v′⟩, produces
Σ_i a_i|w_i⟩⟨v_i|v′⟩ as output.
The usefulness of the outer product notation can be discerned from an important result
known as the completeness relation for orthonormal vectors. Let $|i\rangle$ be any orthonormal
basis for the vector space $V$, so an arbitrary vector $|v\rangle$ can be written $|v\rangle = \sum_i v_i|i\rangle$ for
some set of complex numbers $v_i$. Note that $\langle i|v\rangle = v_i$ and therefore
$$\left(\sum_i |i\rangle\langle i|\right)|v\rangle = \sum_i |i\rangle\langle i|v\rangle = \sum_i v_i|i\rangle = |v\rangle. \qquad (2.21)$$
Since the last equation is true for all $|v\rangle$ it follows that
$$\sum_i |i\rangle\langle i| = I. \qquad (2.22)$$
This equation is known as the completeness relation. One application of the completeness
relation is to give a means for representing any operator in the outer product notation.
Suppose $A : V \to W$ is a linear operator, $|v_i\rangle$ is an orthonormal basis for $V$, and $|w_j\rangle$
an orthonormal basis for $W$. Using the completeness relation twice we obtain
$$A = I_W A I_V \qquad (2.23)$$
$$= \sum_{ij} |w_j\rangle\langle w_j|A|v_i\rangle\langle v_i| \qquad (2.24)$$
$$= \sum_{ij} \langle w_j|A|v_i\rangle |w_j\rangle\langle v_i|, \qquad (2.25)$$
which is the outer product representation for $A$. We also see from this equation that $A$
has matrix element $\langle w_j|A|v_i\rangle$ in the $i$th column and $j$th row, with respect to the input
basis $|v_i\rangle$ and output basis $|w_j\rangle$.
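The completeness relation and the outer product representation can be checked numerically on a small example. The sketch below assumes matrices stored as lists of rows and uses the same orthonormal basis for input and output; the helper names and the sample matrix `A` are illustrative choices, not part of the text.

```python
def inner(v, w):
    """Inner product <v|w>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))

def outer(w, v):
    """Matrix of |w><v|: entry (r, c) is w[r] * conj(v[c])."""
    return [[wr * complex(vc).conjugate() for vc in v] for wr in w]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(z, a):
    return [[z * x for x in row] for row in a]

def apply(a, v):
    """Apply the matrix a to the column vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

s = 2 ** -0.5
basis = [[s, s], [s, -s]]   # the orthonormal basis (|0> + |1>)/sqrt(2), (|0> - |1>)/sqrt(2)

# Completeness relation (2.22): sum_i |i><i| = I.
identity = [[0, 0], [0, 0]]
for b in basis:
    identity = mat_add(identity, outer(b, b))

# Outer product representation (2.25): A = sum_ij <w_j|A|v_i> |w_j><v_i|.
A = [[1, 2j], [3, 4]]
rebuilt = [[0, 0], [0, 0]]
for vi in basis:
    for wj in basis:
        element = inner(wj, apply(A, vi))
        rebuilt = mat_add(rebuilt, scale(element, outer(wj, vi)))
```

Up to floating-point rounding, `identity` is the 2 by 2 identity matrix and `rebuilt` reproduces `A`.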
A second application illustrating the usefulness of the completeness relation is the
Cauchy–Schwarz inequality. This important result is discussed in Box 2.1, on this
page.
Exercise 2.9: (Pauli operators and the outer product) The Pauli matrices
(Figure 2.2 on page 65) can be considered as operators with respect to an
orthonormal basis $|0\rangle$, $|1\rangle$ for a two-dimensional Hilbert space. Express each of
the Pauli operators in the outer product notation.
Exercise 2.10: Suppose $|v_i\rangle$ is an orthonormal basis for an inner product space $V$.
What is the matrix representation for the operator $|v_j\rangle\langle v_k|$, with respect to the
$|v_i\rangle$ basis?
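Box 2.1: The Cauchy–Schwarz inequality
The Cauchy–Schwarz inequality is an important geometric fact about Hilbert
spaces. It states that for any two vectors $|v\rangle$ and $|w\rangle$, $|\langle v|w\rangle|^2 \le \langle v|v\rangle\langle w|w\rangle$. To
see this, use the Gram–Schmidt procedure to construct an orthonormal basis $|i\rangle$
for the vector space such that the first member of the basis $|i\rangle$ is $|w\rangle/\sqrt{\langle w|w\rangle}$.
Using the completeness relation $\sum_i |i\rangle\langle i| = I$, and dropping some non-negative
terms gives
$$\langle v|v\rangle\langle w|w\rangle = \sum_i \langle v|i\rangle\langle i|v\rangle\langle w|w\rangle \qquad (2.26)$$
$$\ge \frac{\langle v|w\rangle\langle w|v\rangle}{\langle w|w\rangle}\langle w|w\rangle \qquad (2.27)$$
$$= \langle v|w\rangle\langle w|v\rangle = |\langle v|w\rangle|^2, \qquad (2.28)$$
as required. A little thought shows that equality occurs if and only if $|v\rangle$ and $|w\rangle$
are linearly related, $|v\rangle = z|w\rangle$ or $|w\rangle = z|v\rangle$, for some scalar $z$.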
2.1.5 Eigenvectors and eigenvalues
An eigenvector of a linear operator $A$ on a vector space is a non-zero vector $|v\rangle$ such that
$A|v\rangle = v|v\rangle$, where $v$ is a complex number known as the eigenvalue of $A$ corresponding
to $|v\rangle$. It will often be convenient to use the notation $v$ both as a label for the eigenvector,
and to represent the eigenvalue. We assume that you are familiar with the elementary
properties of eigenvalues and eigenvectors – in particular, how to find them, via the
characteristic equation. The characteristic function is defined to be $c(\lambda) \equiv \det|A - \lambda I|$,
where det is the determinant function for matrices; it can be shown that the characteristic
function depends only upon the operator $A$, and not on the specific matrix representation
used for $A$. The solutions of the characteristic equation $c(\lambda) = 0$ are the eigenvalues
of the operator $A$. By the fundamental theorem of algebra, every polynomial has at least
one complex root, so every operator $A$ has at least one eigenvalue, and a corresponding
eigenvector. The eigenspace corresponding to an eigenvalue $v$ is the set of vectors which
have eigenvalue $v$. It is a vector subspace of the vector space on which $A$ acts.
A diagonal representation for an operator $A$ on a vector space $V$ is a representation
$A = \sum_i \lambda_i|i\rangle\langle i|$, where the vectors $|i\rangle$ form an orthonormal set of eigenvectors for $A$,
with corresponding eigenvalues $\lambda_i$. An operator is said to be diagonalizable if it has a
diagonal representation. In the next section we will find a simple set of necessary and
sufficient conditions for an operator on a Hilbert space to be diagonalizable. As an example
of a diagonal representation, note that the Pauli $Z$ matrix may be written
$$Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = |0\rangle\langle 0| - |1\rangle\langle 1|, \qquad (2.29)$$
where the matrix representation is with respect to orthonormal vectors $|0\rangle$ and $|1\rangle$, respectively.
Diagonal representations are sometimes also known as orthonormal decompositions.
When an eigenspace is more than one dimensional we say that it is degenerate. For
example, the matrix $A$ defined by
$$A \equiv \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad (2.30)$$
has a two-dimensional eigenspace corresponding to the eigenvalue 2. The eigenvectors
$(1, 0, 0)$ and $(0, 1, 0)$ are said to be degenerate because they are linearly independent
eigenvectors of $A$ with the same eigenvalue.
Exercise 2.11: (Eigendecomposition of the Pauli matrices) Find the
eigenvectors, eigenvalues, and diagonal representations of the Pauli matrices
X, Y , and Z.
Exercise 2.12: Prove that the matrix
$$\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \qquad (2.31)$$
is not diagonalizable.
2.1.6 Adjoints and Hermitian operators
Suppose $A$ is any linear operator on a Hilbert space, $V$. It turns out that there exists a
unique linear operator $A^\dagger$ on $V$ such that for all vectors $|v\rangle, |w\rangle \in V$,
$$(|v\rangle, A|w\rangle) = (A^\dagger|v\rangle, |w\rangle). \qquad (2.32)$$
This linear operator is known as the adjoint or Hermitian conjugate of the operator
$A$. From the definition it is easy to see that $(AB)^\dagger = B^\dagger A^\dagger$. By convention, if $|v\rangle$ is
a vector, then we define $|v\rangle^\dagger \equiv \langle v|$. With this definition it is not difficult to see that
$(A|v\rangle)^\dagger = \langle v|A^\dagger$.
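Exercise 2.13: If $|w\rangle$ and $|v\rangle$ are any two vectors, show that $(|w\rangle\langle v|)^\dagger = |v\rangle\langle w|$.
Exercise 2.14: (Anti-linearity of the adjoint) Show that the adjoint operation is
anti-linear,
$$\left(\sum_i a_i A_i\right)^\dagger = \sum_i a_i^* A_i^\dagger. \qquad (2.33)$$
Exercise 2.15: Show that $(A^\dagger)^\dagger = A$.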
In a matrix representation of an operator $A$, the action of the Hermitian conjugation
operation is to take the matrix of $A$ to the conjugate-transpose matrix, $A^\dagger \equiv (A^*)^T$,
where the $*$ indicates complex conjugation, and $T$ indicates the transpose operation. For
example, we have
$$\begin{bmatrix} 1+3i & 2i \\ 1+i & 1-4i \end{bmatrix}^\dagger = \begin{bmatrix} 1-3i & 1-i \\ -2i & 1+4i \end{bmatrix}. \qquad (2.34)$$
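The conjugate-transpose rule is easy to try out directly. The following sketch reproduces the example (2.34) and spot-checks the identity $(AB)^\dagger = B^\dagger A^\dagger$ on one pair of matrices; the helper names `dagger` and `matmul` and the second matrix `B` are illustrative assumptions.

```python
def dagger(a):
    """Conjugate transpose of a matrix given as a list of rows."""
    rows, cols = len(a), len(a[0])
    return [[complex(a[r][c]).conjugate() for r in range(rows)]
            for c in range(cols)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

A = [[1 + 3j, 2j], [1 + 1j, 1 - 4j]]
A_dag = dagger(A)                     # right-hand side of Eq. (2.34)

B = [[0, 1], [1j, 2]]                 # arbitrary second matrix
lhs = dagger(matmul(A, B))            # (AB)^dagger
rhs = matmul(dagger(B), dagger(A))    # B^dagger A^dagger
```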
An operator $A$ whose adjoint is $A$ is known as a Hermitian or self-adjoint operator.
An important class of Hermitian operators is the projectors. Suppose $W$ is a
$k$-dimensional vector subspace of the $d$-dimensional vector space $V$. Using the Gram–Schmidt
procedure it is possible to construct an orthonormal basis $|1\rangle, \ldots, |d\rangle$ for $V$
such that $|1\rangle, \ldots, |k\rangle$ is an orthonormal basis for $W$. By definition,
$$P \equiv \sum_{i=1}^{k} |i\rangle\langle i| \qquad (2.35)$$
is the projector onto the subspace $W$. It is easy to check that this definition is independent
of the orthonormal basis $|1\rangle, \ldots, |k\rangle$ used for $W$. From the definition it can be shown that
$|v\rangle\langle v|$ is Hermitian for any vector $|v\rangle$, so $P$ is Hermitian, $P^\dagger = P$. We will often refer
to the ‘vector space’ $P$, as shorthand for the vector space onto which $P$ is a projector.
The orthogonal complement of $P$ is the operator $Q \equiv I - P$. It is easy to see that $Q$ is
a projector onto the vector space spanned by $|k+1\rangle, \ldots, |d\rangle$, which we also refer to as
the orthogonal complement of $P$, and may denote by $Q$.
Exercise 2.16: Show that any projector $P$ satisfies the equation $P^2 = P$.
An operator A is said to be normal if AA† = A† A. Clearly, an operator which
is Hermitian is also normal. There is a remarkable representation theorem for normal
operators known as the spectral decomposition, which states that an operator is a normal
operator if and only if it is diagonalizable. This result is proved in Box 2.2 on page 72,
which you should read closely.
Exercise 2.17: Show that a normal matrix is Hermitian if and only if it has real
eigenvalues.
A matrix $U$ is said to be unitary if $U^\dagger U = I$. Similarly an operator $U$ is unitary if
$U^\dagger U = I$. It is easily checked that an operator is unitary if and only if each of its matrix
representations is unitary. A unitary operator also satisfies $UU^\dagger = I$, and therefore $U$ is
normal and has a spectral decomposition. Geometrically, unitary operators are important
because they preserve inner products between vectors. To see this, let $|v\rangle$ and $|w\rangle$ be any
two vectors. Then the inner product of $U|v\rangle$ and $U|w\rangle$ is the same as the inner product
of $|v\rangle$ and $|w\rangle$,
$$\left(U|v\rangle, U|w\rangle\right) = \langle v|U^\dagger U|w\rangle = \langle v|I|w\rangle = \langle v|w\rangle. \qquad (2.36)$$
This result suggests the following elegant outer product representation of any unitary $U$.
Let $|v_i\rangle$ be any orthonormal basis set. Define $|w_i\rangle \equiv U|v_i\rangle$, so $|w_i\rangle$ is also an orthonormal
basis set, since unitary operators preserve inner products. Note that $U = \sum_i |w_i\rangle\langle v_i|$.
Conversely, if $|v_i\rangle$ and $|w_i\rangle$ are any two orthonormal bases, then it is easily checked that
the operator $U$ defined by $U \equiv \sum_i |w_i\rangle\langle v_i|$ is a unitary operator.
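Both statements about unitary operators can be illustrated numerically. The sketch below assumes the Hadamard matrix as a sample unitary and the computational basis as the orthonormal set $|v_i\rangle$; the vectors `v` and `w` are arbitrary, and all helper names are ours.

```python
def inner(v, w):
    """Inner product <v|w>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))

def apply(u, v):
    """Apply the matrix u (list of rows) to the vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in u]

def outer(w, v):
    """Matrix of |w><v|."""
    return [[wr * complex(vc).conjugate() for vc in v] for wr in w]

s = 2 ** -0.5
U = [[s, s], [s, -s]]        # Hadamard matrix, used here as a sample unitary

v, w = [1, 2j], [3 - 1j, 0.5]
before = inner(v, w)
after = inner(apply(U, v), apply(U, w))   # agrees with `before` up to rounding

# Rebuild U as sum_i |w_i><v_i| with |w_i> = U|v_i>.
basis = [[1, 0], [0, 1]]
rebuilt = [[0, 0], [0, 0]]
for vi in basis:
    term = outer(apply(U, vi), vi)
    rebuilt = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(rebuilt, term)]
```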
Exercise 2.18: Show that all eigenvalues of a unitary matrix have modulus 1, that is,
can be written in the form $e^{i\theta}$ for some real $\theta$.
Exercise 2.19: (Pauli matrices: Hermitian and unitary) Show that the Pauli
matrices are Hermitian and unitary.
Exercise 2.20: (Basis changes) Suppose $A'$ and $A''$ are matrix representations of an
operator $A$ on a vector space $V$ with respect to two different orthonormal bases,
$|v_i\rangle$ and $|w_i\rangle$. Then the elements of $A'$ and $A''$ are $A'_{ij} = \langle v_i|A|v_j\rangle$ and
$A''_{ij} = \langle w_i|A|w_j\rangle$. Characterize the relationship between $A'$ and $A''$.
A special subclass of Hermitian operators is extremely important. This is the positive
operators. A positive operator $A$ is defined to be an operator such that for any vector $|v\rangle$,
$(|v\rangle, A|v\rangle)$ is a real, non-negative number. If $(|v\rangle, A|v\rangle)$ is strictly greater than zero for
all $|v\rangle \neq 0$ then we say that $A$ is positive definite. In Exercise 2.24 on this page you will
show that any positive operator is automatically Hermitian, and therefore by the spectral
decomposition has diagonal representation $\sum_i \lambda_i|i\rangle\langle i|$, with non-negative eigenvalues $\lambda_i$.
Exercise 2.21: Repeat the proof of the spectral decomposition in Box 2.2 for the case
when M is Hermitian, simplifying the proof wherever possible.
Exercise 2.22: Prove that two eigenvectors of a Hermitian operator with different
eigenvalues are necessarily orthogonal.
Exercise 2.23: Show that the eigenvalues of a projector P are all either 0 or 1.
Exercise 2.24: (Hermiticity of positive operators) Show that a positive operator
is necessarily Hermitian. (Hint: Show that an arbitrary operator A can be
written A = B + iC where B and C are Hermitian.)
Exercise 2.25: Show that for any operator A, A† A is positive.
2.1.7 Tensor products
The tensor product is a way of putting vector spaces together to form larger vector spaces.
This construction is crucial to understanding the quantum mechanics of multiparticle
systems. The following discussion is a little abstract, and may be difficult to follow if
you’re not already familiar with the tensor product, so feel free to skip ahead now and
revisit later when you come to the discussion of tensor products in quantum mechanics.
Box 2.2: The spectral decomposition – important!
The spectral decomposition is an extremely useful representation theorem for normal operators.
Theorem 2.1: (Spectral decomposition) Any normal operator $M$ on a vector
space $V$ is diagonal with respect to some orthonormal basis for $V$.
Conversely, any diagonalizable operator is normal.
Proof
The converse is a simple exercise, so we prove merely the forward implication,
by induction on the dimension $d$ of $V$. The case $d = 1$ is trivial. Let $\lambda$ be an
eigenvalue of $M$, $P$ the projector onto the $\lambda$ eigenspace, and $Q$ the projector onto
the orthogonal complement. Then $M = (P + Q)M(P + Q) = PMP + QMP + PMQ + QMQ$.
Obviously $PMP = \lambda P$. Furthermore, $QMP = 0$, as $M$ takes
the subspace $P$ into itself. We claim that $PMQ = 0$ also. To see this, let $|v\rangle$
be an element of the subspace $P$. Then $MM^\dagger|v\rangle = M^\dagger M|v\rangle = \lambda M^\dagger|v\rangle$. Thus,
$M^\dagger|v\rangle$ has eigenvalue $\lambda$ and therefore is an element of the subspace $P$. It follows
that $QM^\dagger P = 0$. Taking the adjoint of this equation gives $PMQ = 0$. Thus
$M = PMP + QMQ$. Next, we prove that $QMQ$ is normal. To see this, note that
$QM = QM(P + Q) = QMQ$, and $QM^\dagger = QM^\dagger(P + Q) = QM^\dagger Q$. Therefore,
by the normality of $M$, and the observation that $Q^2 = Q$,
$$QMQ\,QM^\dagger Q = QMQM^\dagger Q \qquad (2.37)$$
$$= QMM^\dagger Q \qquad (2.38)$$
$$= QM^\dagger MQ \qquad (2.39)$$
$$= QM^\dagger QMQ \qquad (2.40)$$
$$= QM^\dagger Q\,QMQ, \qquad (2.41)$$
so $QMQ$ is normal. By induction, $QMQ$ is diagonal with respect to some orthonormal
basis for the subspace $Q$, and $PMP$ is already diagonal with respect
to some orthonormal basis for $P$. It follows that $M = PMP + QMQ$ is diagonal
with respect to some orthonormal basis for the total vector space.
In terms of the outer product representation, this means that $M$ can be written as
$M = \sum_i \lambda_i|i\rangle\langle i|$, where $\lambda_i$ are the eigenvalues of $M$, $|i\rangle$ is an orthonormal basis
for $V$, and each $|i\rangle$ an eigenvector of $M$ with eigenvalue $\lambda_i$. In terms of projectors,
$M = \sum_i \lambda_i P_i$, where $\lambda_i$ are again the eigenvalues of $M$, and $P_i$ is the projector
onto the $\lambda_i$ eigenspace of $M$. These projectors satisfy the completeness relation
$\sum_i P_i = I$, and the orthonormality relation $P_i P_j = \delta_{ij} P_i$.
Suppose $V$ and $W$ are vector spaces of dimension $m$ and $n$ respectively. For convenience
we also suppose that $V$ and $W$ are Hilbert spaces. Then $V \otimes W$ (read ‘$V$ tensor
$W$’) is an $mn$ dimensional vector space. The elements of $V \otimes W$ are linear combinations
of ‘tensor products’ $|v\rangle \otimes |w\rangle$ of elements $|v\rangle$ of $V$ and $|w\rangle$ of $W$. In particular, if $|i\rangle$ and
$|j\rangle$ are orthonormal bases for the spaces $V$ and $W$ then $|i\rangle \otimes |j\rangle$ is a basis for $V \otimes W$. We
often use the abbreviated notations $|v\rangle|w\rangle$, $|v, w\rangle$ or even $|vw\rangle$ for the tensor product
$|v\rangle \otimes |w\rangle$. For example, if $V$ is a two-dimensional vector space with basis vectors $|0\rangle$ and
$|1\rangle$ then $|0\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle$ is an element of $V \otimes V$.
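By definition the tensor product satisfies the following basic properties:
(1) For an arbitrary scalar $z$ and elements $|v\rangle$ of $V$ and $|w\rangle$ of $W$,
$$z\left(|v\rangle \otimes |w\rangle\right) = \left(z|v\rangle\right) \otimes |w\rangle = |v\rangle \otimes \left(z|w\rangle\right). \qquad (2.42)$$
(2) For arbitrary $|v_1\rangle$ and $|v_2\rangle$ in $V$ and $|w\rangle$ in $W$,
$$\left(|v_1\rangle + |v_2\rangle\right) \otimes |w\rangle = |v_1\rangle \otimes |w\rangle + |v_2\rangle \otimes |w\rangle. \qquad (2.43)$$
(3) For arbitrary $|v\rangle$ in $V$ and $|w_1\rangle$ and $|w_2\rangle$ in $W$,
$$|v\rangle \otimes \left(|w_1\rangle + |w_2\rangle\right) = |v\rangle \otimes |w_1\rangle + |v\rangle \otimes |w_2\rangle. \qquad (2.44)$$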
What sorts of linear operators act on the space $V \otimes W$? Suppose $|v\rangle$ and $|w\rangle$ are
vectors in $V$ and $W$, and $A$ and $B$ are linear operators on $V$ and $W$, respectively. Then
we can define a linear operator $A \otimes B$ on $V \otimes W$ by the equation
$$(A \otimes B)(|v\rangle \otimes |w\rangle) \equiv A|v\rangle \otimes B|w\rangle. \qquad (2.45)$$
The definition of $A \otimes B$ is then extended to all elements of $V \otimes W$ in the natural way
to ensure linearity of $A \otimes B$, that is,
$$(A \otimes B)\left(\sum_i a_i|v_i\rangle \otimes |w_i\rangle\right) \equiv \sum_i a_i A|v_i\rangle \otimes B|w_i\rangle. \qquad (2.46)$$
It can be shown that $A \otimes B$ defined in this way is a well-defined linear operator on
$V \otimes W$. This notion of the tensor product of two operators extends in the obvious way
to the case where $A : V \to V'$ and $B : W \to W'$ map between different vector spaces.
Indeed, an arbitrary linear operator $C$ mapping $V \otimes W$ to $V' \otimes W'$ can be represented
as a linear combination of tensor products of operators mapping $V$ to $V'$ and $W$ to $W'$,
$$C = \sum_i c_i A_i \otimes B_i, \qquad (2.47)$$
where by definition
$$\left(\sum_i c_i A_i \otimes B_i\right)|v\rangle \otimes |w\rangle \equiv \sum_i c_i A_i|v\rangle \otimes B_i|w\rangle. \qquad (2.48)$$
The inner products on the spaces $V$ and $W$ can be used to define a natural inner
product on $V \otimes W$. Define
$$\left(\sum_i a_i|v_i\rangle \otimes |w_i\rangle, \sum_j b_j|v'_j\rangle \otimes |w'_j\rangle\right) \equiv \sum_{ij} a_i^* b_j \langle v_i|v'_j\rangle\langle w_i|w'_j\rangle. \qquad (2.49)$$
It can be shown that the function so defined is a well-defined inner product. From this
inner product, the inner product space $V \otimes W$ inherits the other structure we are familiar
with, such as notions of an adjoint, unitarity, normality, and Hermiticity.
All this discussion is rather abstract. It can be made much more concrete by moving
to a convenient matrix representation known as the Kronecker product. Suppose $A$ is
an $m$ by $n$ matrix, and $B$ is a $p$ by $q$ matrix. Then we have the matrix representation:
$$A \otimes B \equiv \begin{bmatrix} A_{11}B & A_{12}B & \ldots & A_{1n}B \\ A_{21}B & A_{22}B & \ldots & A_{2n}B \\ \vdots & \vdots & \vdots & \vdots \\ A_{m1}B & A_{m2}B & \ldots & A_{mn}B \end{bmatrix}, \qquad (2.50)$$
a matrix with $mp$ rows and $nq$ columns.
In this representation terms like $A_{11}B$ denote $p$ by $q$ submatrices whose entries are
proportional to $B$, with overall proportionality constant $A_{11}$. For example, the tensor
product of the vectors $(1, 2)$ and $(2, 3)$ is the vector
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} \otimes \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \times 2 \\ 1 \times 3 \\ 2 \times 2 \\ 2 \times 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 4 \\ 6 \end{bmatrix}. \qquad (2.51)$$
The tensor product of the Pauli matrices $X$ and $Y$ is
$$X \otimes Y = \begin{bmatrix} 0 \cdot Y & 1 \cdot Y \\ 1 \cdot Y & 0 \cdot Y \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & -i \\ 0 & 0 & i & 0 \\ 0 & -i & 0 & 0 \\ i & 0 & 0 & 0 \end{bmatrix}. \qquad (2.52)$$
Finally, we mention the useful notation $|\psi\rangle^{\otimes k}$, which means $|\psi\rangle$ tensored with itself $k$
times. For example $|\psi\rangle^{\otimes 2} = |\psi\rangle \otimes |\psi\rangle$. An analogous notation is also used for operators
on tensor product spaces.
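The Kronecker product is short enough to implement directly, which makes the block structure of (2.50) easy to experiment with. In this sketch matrices are lists of rows, column vectors are stored as $n$ by 1 matrices, and the function names `kron` and `matmul` are our own choices.

```python
def kron(a, b):
    """Kronecker product of a (m x n) and b (p x q), following Eq. (2.50).

    Row index runs over pairs (i, k) and column index over pairs (j, l),
    with entry a[i][j] * b[k][l].
    """
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
v = [[1], [2]]           # column vector (1, 2)
w = [[2], [3]]           # column vector (2, 3)

vw = kron(v, w)          # Eq. (2.51): the column vector (2, 3, 4, 6)
XY = kron(X, Y)          # Eq. (2.52)

# Defining property (2.45): (X ⊗ Y)(|v> ⊗ |w>) = X|v> ⊗ Y|w>.
lhs = matmul(XY, vw)
rhs = kron(matmul(X, v), matmul(Y, w))
```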
Exercise 2.26: Let $|\psi\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. Write out $|\psi\rangle^{\otimes 2}$ and $|\psi\rangle^{\otimes 3}$ explicitly, both
in terms of tensor products like $|0\rangle|1\rangle$, and using the Kronecker product.
Exercise 2.27: Calculate the matrix representation of the tensor products of the Pauli
operators (a) $X$ and $Z$; (b) $I$ and $X$; (c) $X$ and $I$. Is the tensor product
commutative?
Exercise 2.28: Show that the transpose, complex conjugation, and adjoint operations
distribute over the tensor product,
$$(A \otimes B)^* = A^* \otimes B^*; \quad (A \otimes B)^T = A^T \otimes B^T; \quad (A \otimes B)^\dagger = A^\dagger \otimes B^\dagger. \qquad (2.53)$$
Exercise 2.29: Show that the tensor product of two unitary operators is unitary.
Exercise 2.30: Show that the tensor product of two Hermitian operators is Hermitian.
Exercise 2.31: Show that the tensor product of two positive operators is positive.
Exercise 2.32: Show that the tensor product of two projectors is a projector.
Exercise 2.33: The Hadamard operator on one qubit may be written as
$$H = \frac{1}{\sqrt{2}}\Big[(|0\rangle + |1\rangle)\langle 0| + (|0\rangle - |1\rangle)\langle 1|\Big]. \qquad (2.54)$$
Show explicitly that the Hadamard transform on $n$ qubits, $H^{\otimes n}$, may be written
as
$$H^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{x,y} (-1)^{x \cdot y}|x\rangle\langle y|. \qquad (2.55)$$
Write out an explicit matrix representation for $H^{\otimes 2}$.
2.1.8 Operator functions
There are many important functions which can be defined for operators and matrices.
Generally speaking, given a function $f$ from the complex numbers to the complex numbers,
it is possible to define a corresponding matrix function on normal matrices (or some
subclass, such as the Hermitian matrices) by the following construction.
Let $A = \sum_a a|a\rangle\langle a|$ be a spectral decomposition for a normal operator $A$. Define
$f(A) \equiv \sum_a f(a)|a\rangle\langle a|$. A little thought shows that $f(A)$ is uniquely defined. This procedure
can be used, for example, to define the square root of a positive operator, the
logarithm of a positive-definite operator, or the exponential of a normal operator. As an
example,
$$\exp(\theta Z) = \begin{bmatrix} e^{\theta} & 0 \\ 0 & e^{-\theta} \end{bmatrix}, \qquad (2.56)$$
since $Z$ has eigenvectors $|0\rangle$ and $|1\rangle$.
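As a further worked illustration of the same construction (not taken from the text), consider $\exp(\theta X)$ for real $\theta$. We assume the standard fact that $X$ has eigenvectors $|\pm\rangle \equiv (|0\rangle \pm |1\rangle)/\sqrt{2}$ with eigenvalues $\pm 1$; the notation $|\pm\rangle$ is introduced here only for this example.

```latex
\begin{align*}
X &= |+\rangle\langle +| - |-\rangle\langle -|,
  \qquad |\pm\rangle\langle \pm| = \tfrac{1}{2}(I \pm X), \\
\exp(\theta X) &= e^{\theta}\,|+\rangle\langle +| + e^{-\theta}\,|-\rangle\langle -| \\
  &= \tfrac{1}{2}\left(e^{\theta} + e^{-\theta}\right) I
   + \tfrac{1}{2}\left(e^{\theta} - e^{-\theta}\right) X \\
  &= \cosh(\theta)\, I + \sinh(\theta)\, X
   = \begin{bmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{bmatrix}.
\end{align*}
```

Unlike $\exp(\theta Z)$, the result is not diagonal in the $|0\rangle$, $|1\rangle$ basis, because the function is applied in the eigenbasis of $X$ rather than entry by entry.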
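Exercise 2.34: Find the square root and logarithm of the matrix
$$\begin{bmatrix} 4 & 3 \\ 3 & 4 \end{bmatrix}. \qquad (2.57)$$
Exercise 2.35: (Exponential of the Pauli matrices) Let $\vec{v}$ be any real,
three-dimensional unit vector and $\theta$ a real number. Prove that
$$\exp(i\theta\, \vec{v} \cdot \vec{\sigma}) = \cos(\theta) I + i \sin(\theta)\, \vec{v} \cdot \vec{\sigma}, \qquad (2.58)$$
where $\vec{v} \cdot \vec{\sigma} \equiv \sum_{i=1}^{3} v_i \sigma_i$. This exercise is generalized in Problem 2.1 on page 117.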
Another important matrix function is the trace of a matrix. The trace of $A$ is defined
to be the sum of its diagonal elements,
$$\mathrm{tr}(A) \equiv \sum_i A_{ii}. \qquad (2.59)$$
The trace is easily seen to be cyclic, tr(AB) = tr(BA), and linear, tr(A + B) =
tr(A) + tr(B), tr(zA) = z tr(A), where A and B are arbitrary matrices, and z is a complex
number. Furthermore, from the cyclic property it follows that the trace of a matrix
is invariant under the unitary similarity transformation A → U AU †, as tr(U AU †) =
tr(U † U A) = tr(A). In light of this result, it makes sense to define the trace of an operator
A to be the trace of any matrix representation of A. The invariance of the trace under
unitary similarity transformations ensures that the trace of an operator is well defined.
As an example of the trace, suppose $|\psi\rangle$ is a unit vector and $A$ is an arbitrary operator.
To evaluate $\mathrm{tr}(A|\psi\rangle\langle\psi|)$ use the Gram–Schmidt procedure to extend $|\psi\rangle$ to an
orthonormal basis $|i\rangle$ which includes $|\psi\rangle$ as the first element. Then we have
$$\mathrm{tr}(A|\psi\rangle\langle\psi|) = \sum_i \langle i|A|\psi\rangle\langle\psi|i\rangle \qquad (2.60)$$
$$= \langle\psi|A|\psi\rangle. \qquad (2.61)$$
This result, that $\mathrm{tr}(A|\psi\rangle\langle\psi|) = \langle\psi|A|\psi\rangle$, is extremely useful in evaluating the trace of an
operator.
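The identity $\mathrm{tr}(A|\psi\rangle\langle\psi|) = \langle\psi|A|\psi\rangle$ and the cyclic property can be spot-checked numerically. The matrices `A`, `B` and the unit vector `psi` below are arbitrary illustrative choices, and the helper functions are ours.

```python
def inner(v, w):
    """Inner product <v|w>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))

def outer(w, v):
    """Matrix of |w><v|."""
    return [[wr * complex(vc).conjugate() for vc in v] for wr in w]

def apply(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal elements, Eq. (2.59)."""
    return sum(a[i][i] for i in range(len(a)))

A = [[1, 2j], [3, 4]]
B = [[0, 1], [1j, 2]]
s = 2 ** -0.5
psi = [s, 1j * s]                        # a unit vector

lhs = trace(matmul(A, outer(psi, psi)))  # tr(A |psi><psi|)
rhs = inner(psi, apply(A, psi))          # <psi|A|psi>

cyclic_gap = trace(matmul(A, B)) - trace(matmul(B, A))   # should vanish
```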
Exercise 2.36: Show that the Pauli matrices except for I have trace zero.
Exercise 2.37: (Cyclic property of the trace) If A and B are two linear operators
show that
tr(AB) = tr(BA).
(2.62)
Exercise 2.38: (Linearity of the trace) If A and B are two linear operators, show
that
tr(A + B) = tr(A) + tr(B)
(2.63)
and if z is an arbitrary complex number show that
tr(zA) = ztr(A).
(2.64)
Exercise 2.39: (The Hilbert–Schmidt inner product on operators) The set $L_V$
of linear operators on a Hilbert space $V$ is obviously a vector space – the sum of
two linear operators is a linear operator, $zA$ is a linear operator if $A$ is a linear
operator and $z$ is a complex number, and there is a zero element 0. An important
additional result is that the vector space $L_V$ can be given a natural inner product
structure, turning it into a Hilbert space.
(1) Show that the function $(\cdot, \cdot)$ on $L_V \times L_V$ defined by
$$(A, B) \equiv \mathrm{tr}(A^\dagger B) \qquad (2.65)$$
is an inner product function. This inner product is known as the
Hilbert–Schmidt or trace inner product.
(2) If $V$ has $d$ dimensions show that $L_V$ has dimension $d^2$.
(3) Find an orthonormal basis of Hermitian matrices for the Hilbert space $L_V$.
2.1.9 The commutator and anti-commutator
The commutator between two operators A and B is defined to be
[A, B] ≡ AB − BA.
(2.66)
If [A, B] = 0, that is, AB = BA, then we say A commutes with B. Similarly, the
anti-commutator of two operators A and B is defined by
{A, B} ≡ AB + BA;
(2.67)
we say $A$ anti-commutes with $B$ if $\{A, B\} = 0$. It turns out that many important properties
of pairs of operators can be deduced from their commutator and anti-commutator.
Perhaps the most useful relation is the following connection between the commutator and
the property of being able to simultaneously diagonalize Hermitian operators $A$ and $B$,
that is, write $A = \sum_i a_i|i\rangle\langle i|$, $B = \sum_i b_i|i\rangle\langle i|$, where $|i\rangle$ is some common orthonormal
set of eigenvectors for $A$ and $B$.
Theorem 2.2: (Simultaneous diagonalization theorem) Suppose A and B are
Hermitian operators. Then [A, B] = 0 if and only if there exists an orthonormal
basis such that both A and B are diagonal with respect to that basis. We say that
A and B are simultaneously diagonalizable in this case.
This result connects the commutator of two operators, which is often easy to compute,
to the property of being simultaneously diagonalizable, which is a priori rather difficult
to determine. As an example, consider that
$$[X, Y] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix} - \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad (2.68)$$
$$= 2i\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \qquad (2.69)$$
$$= 2iZ, \qquad (2.70)$$
so $X$ and $Y$ do not commute. You have already shown, in Exercise 2.11, that $X$ and $Y$
do not have common eigenvectors, as we expect from the simultaneous diagonalization
theorem.
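The commutator computation above is easy to reproduce mechanically. The sketch below recomputes $[X, Y]$ and also checks the easy direction of Theorem 2.2 on one pair of matrices that are diagonal in the same basis; the diagonal matrices `D1` and `D2` and the function names are illustrative assumptions.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def commutator(a, b):
    """[A, B] = AB - BA, Eq. (2.66)."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

XY_comm = commutator(X, Y)                       # Eqs. (2.68)-(2.70)
two_i_Z = [[2j * entry for entry in row] for row in Z]

# Two operators diagonal in the same basis commute.
D1 = [[1, 0], [0, 2]]
D2 = [[3, 0], [0, 5]]
D_comm = commutator(D1, D2)
```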
Proof
You can (and should!) easily verify that if $A$ and $B$ are diagonal in the same orthonormal
basis then $[A, B] = 0$. To show the converse, let $|a, j\rangle$ be an orthonormal basis for the
eigenspace $V_a$ of $A$ with eigenvalue $a$; the index $j$ is used to label possible degeneracies.
Note that
$$AB|a, j\rangle = BA|a, j\rangle = aB|a, j\rangle, \qquad (2.71)$$
and therefore $B|a, j\rangle$ is an element of the eigenspace $V_a$. Let $P_a$ denote the projector
onto the space $V_a$ and define $B_a \equiv P_a B P_a$. It is easy to see that the restriction of $B_a$ to
the space $V_a$ is Hermitian on $V_a$, and therefore has a spectral decomposition in terms of
an orthonormal set of eigenvectors which span the space $V_a$. Let’s call these eigenvectors
$|a, b, k\rangle$, where the indices $a$ and $b$ label the eigenvalues of $A$ and $B_a$, and $k$ is an extra
index to allow for the possibility of a degenerate $B_a$. Note that $B|a, b, k\rangle$ is an element
of $V_a$, so $B|a, b, k\rangle = P_a B|a, b, k\rangle$. Moreover we have $P_a|a, b, k\rangle = |a, b, k\rangle$, so
$$B|a, b, k\rangle = P_a B P_a|a, b, k\rangle = b|a, b, k\rangle. \qquad (2.72)$$
It follows that $|a, b, k\rangle$ is an eigenvector of $B$ with eigenvalue $b$, and therefore $|a, b, k\rangle$ is
an orthonormal set of eigenvectors of both $A$ and $B$, spanning the entire vector space on
which $A$ and $B$ are defined. That is, $A$ and $B$ are simultaneously diagonalizable.
Exercise 2.40: (Commutation relations for the Pauli matrices) Verify the
commutation relations
$$[X, Y] = 2iZ; \quad [Y, Z] = 2iX; \quad [Z, X] = 2iY. \qquad (2.73)$$
There is an elegant way of writing this using $\epsilon_{jkl}$, the antisymmetric tensor on
three indices, for which $\epsilon_{jkl} = 0$ except for $\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = 1$, and
$\epsilon_{321} = \epsilon_{213} = \epsilon_{132} = -1$:
$$[\sigma_j, \sigma_k] = 2i \sum_{l=1}^{3} \epsilon_{jkl}\sigma_l. \qquad (2.74)$$
Exercise 2.41: (Anti-commutation relations for the Pauli matrices) Verify the
anti-commutation relations
$$\{\sigma_i, \sigma_j\} = 0 \qquad (2.75)$$
where $i \neq j$ are both chosen from the set 1, 2, 3. Also verify that ($i = 0, 1, 2, 3$)
$$\sigma_i^2 = I. \qquad (2.76)$$
Exercise 2.42: Verify that
$$AB = \frac{[A, B] + \{A, B\}}{2}. \qquad (2.77)$$
Exercise 2.43: Show that for $j, k = 1, 2, 3$,
$$\sigma_j \sigma_k = \delta_{jk} I + i \sum_{l=1}^{3} \epsilon_{jkl}\sigma_l. \qquad (2.78)$$
Exercise 2.44: Suppose [A, B] = 0, {A, B} = 0, and A is invertible. Show that B
must be 0.
Exercise 2.45: Show that [A, B]† = [B † , A† ].
Exercise 2.46: Show that [A, B] = −[B, A].
Exercise 2.47: Suppose A and B are Hermitian. Show that i[A, B] is Hermitian.
2.1.10 The polar and singular value decompositions
The polar and singular value decompositions are useful ways of breaking linear operators
up into simpler parts. In particular, these decompositions allow us to break general linear
operators up into products of unitary operators and positive operators. While we don’t
understand the structure of general linear operators terribly well, we do understand
unitary operators and positive operators in quite some detail. The polar and singular
value decompositions allow us to apply this understanding to better understand general
linear operators.
Theorem 2.3: (Polar decomposition) Let $A$ be a linear operator on a vector space $V$.
Then there exists unitary $U$ and positive operators $J$ and $K$ such that
$$A = UJ = KU, \qquad (2.79)$$
where the unique positive operators $J$ and $K$ satisfying these equations are
defined by $J \equiv \sqrt{A^\dagger A}$ and $K \equiv \sqrt{AA^\dagger}$. Moreover, if $A$ is invertible then $U$ is
unique.
We call the expression A = U J the left polar decomposition of A, and A = KU the
right polar decomposition of A. Most often, we’ll omit the ‘right’ or ‘left’ nomenclature,
and use the term ‘polar decomposition’ for both expressions, with context indicating
which is meant.
Proof
$J \equiv \sqrt{A^\dagger A}$ is a positive operator, so it can be given a spectral decomposition,
$J = \sum_i \lambda_i|i\rangle\langle i|$ ($\lambda_i \ge 0$). Define $|\psi_i\rangle \equiv A|i\rangle$. From the definition, we see that $\langle\psi_i|\psi_i\rangle = \lambda_i^2$.
Consider for now only those $i$ for which $\lambda_i \neq 0$. For those $i$ define $|e_i\rangle \equiv |\psi_i\rangle/\lambda_i$, so
the $|e_i\rangle$ are normalized. Moreover, they are orthogonal, since if $i \neq j$ then
$\langle e_i|e_j\rangle = \langle i|A^\dagger A|j\rangle/\lambda_i\lambda_j = \langle i|J^2|j\rangle/\lambda_i\lambda_j = 0$.
We have been considering $i$ such that $\lambda_i \neq 0$. Now use the Gram–Schmidt procedure
to extend the orthonormal set $|e_i\rangle$ so it forms an orthonormal basis, which we also label
$|e_i\rangle$. Define a unitary operator $U \equiv \sum_i |e_i\rangle\langle i|$. When $\lambda_i \neq 0$ we have $UJ|i\rangle = \lambda_i|e_i\rangle =
|\psi_i\rangle = A|i\rangle$. When $\lambda_i = 0$ we have $UJ|i\rangle = 0 = |\psi_i\rangle$. We have proved that the action of
$A$ and $UJ$ agree on the basis $|i\rangle$, and thus that $A = UJ$.
$J$ is unique, since multiplying $A = UJ$ on the left by the adjoint equation $A^\dagger = JU^\dagger$
gives $J^2 = A^\dagger A$, from which we see that $J = \sqrt{A^\dagger A}$, uniquely. A little thought shows that
if $A$ is invertible, then so is $J$, so $U$ is uniquely determined by the equation $U = AJ^{-1}$.
The proof of the right polar decomposition follows, since $A = UJ = UJU^\dagger U = KU$,
where $K \equiv UJU^\dagger$ is a positive operator. Since $AA^\dagger = KUU^\dagger K = K^2$ we must have
$K = \sqrt{AA^\dagger}$, as claimed.
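For a concrete feel for the theorem, the polar decomposition of a small invertible matrix can be computed by hand or by machine. The sketch below is restricted to real 2 by 2 matrices, so that $A^\dagger = A^T$, and it assumes a closed form for the square root of a positive definite 2 by 2 matrix, $\sqrt{M} = (M + sI)/t$ with $s = \sqrt{\det M}$ and $t = \sqrt{\mathrm{tr}\,M + 2s}$, which is a standard consequence of the Cayley–Hamilton theorem rather than anything used in the text. The sample matrix `A` is an arbitrary choice.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def sqrt_pd_2x2(m):
    """Square root of a real, positive definite 2x2 matrix (closed form)."""
    s = math.sqrt(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    t = math.sqrt(m[0][0] + m[1][1] + 2 * s)
    return [[(m[0][0] + s) / t, m[0][1] / t],
            [m[1][0] / t, (m[1][1] + s) / t]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

A = [[1.0, 2.0], [0.0, 1.0]]                  # real and invertible
J = sqrt_pd_2x2(matmul(transpose(A), A))      # J = sqrt(A^dagger A)
K = sqrt_pd_2x2(matmul(A, transpose(A)))      # K = sqrt(A A^dagger)
U = matmul(A, inverse_2x2(J))                 # U = A J^{-1}, unique as A is invertible
```

For this choice of `A` one finds, up to rounding, $U = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$, a rotation, with both `matmul(U, J)` and `matmul(K, U)` reproducing `A`.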
The singular value decomposition combines the polar decomposition and the spectral
theorem.
Corollary 2.4: (Singular value decomposition) Let A be a square matrix. Then
there exist unitary matrices U and V , and a diagonal matrix D with
non-negative entries such that
A = U DV .
(2.80)
The diagonal elements of D are called the singular values of A.
Proof
By the polar decomposition, A = SJ, for unitary S, and positive J. By the spectral
theorem, J = T DT † , for unitary T and diagonal D with non-negative entries. Setting
U ≡ ST and V ≡ T † completes the proof.
Exercise 2.48: What is the polar decomposition of a positive matrix P ? Of a unitary
matrix U ? Of a Hermitian matrix, H?
Exercise 2.49: Express the polar decomposition of a normal matrix in the outer
product representation.
Exercise 2.50: Find the left and right polar decompositions of the matrix
$$\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \qquad (2.81)$$
2.2 The postulates of quantum mechanics
All understanding begins with our not accepting the world as it appears.
– Alan Kay
The most incomprehensible thing about the world is that it is comprehensible.
– Albert Einstein
Quantum mechanics is a mathematical framework for the development of physical theories. On its own quantum mechanics doesn’t tell you what laws a physical system must
obey, but it does provide a mathematical and conceptual framework for the development
of such laws. In the next few sections we give a complete description of the basic postulates of quantum mechanics. These postulates provide a connection between the physical
world and the mathematical formalism of quantum mechanics.
The postulates of quantum mechanics were derived after a long process of trial and
(mostly) error, which involved a considerable amount of guessing and fumbling by the
originators of the theory. Don’t be surprised if the motivation for the postulates is not
always clear; even to experts the basic postulates of quantum mechanics appear surprising.
What you should expect to gain in the next few sections is a good working grasp of the
postulates – how to apply them, and when.
2.2.1 State space
The first postulate of quantum mechanics sets up the arena in which quantum mechanics
takes place. The arena is our familiar friend from linear algebra, Hilbert space.
Postulate 1: Associated to any isolated physical system is a complex vector space
with inner product (that is, a Hilbert space) known as the state space of the
system. The system is completely described by its state vector, which is a unit
vector in the system’s state space.
Quantum mechanics does not tell us, for a given physical system, what the state space
of that system is, nor does it tell us what the state vector of the system is. Figuring that
out for a specific system is a difficult problem for which physicists have developed many
intricate and beautiful rules. For example, there is the wonderful theory of quantum
electrodynamics (often known as QED), which describes how atoms and light interact.
One aspect of QED is that it tells us what state spaces to use to give quantum descriptions
of atoms and light. We won’t be much concerned with the intricacies of theories like QED
(except in so far as they apply to physical realizations, in Chapter 7), as we are mostly
interested in the general framework provided by quantum mechanics. For our purposes
it will be sufficient to make some very simple (and reasonable) assumptions about the
state spaces of the systems we are interested in, and stick with those assumptions.
The simplest quantum mechanical system, and the system which we will be most
concerned with, is the qubit. A qubit has a two-dimensional state space. Suppose $|0\rangle$ and
$|1\rangle$ form an orthonormal basis for that state space. Then an arbitrary state vector in the
state space can be written
$$|\psi\rangle = a|0\rangle + b|1\rangle, \qquad (2.82)$$
where $a$ and $b$ are complex numbers. The condition that $|\psi\rangle$ be a unit vector, $\langle\psi|\psi\rangle = 1$,
is therefore equivalent to $|a|^2 + |b|^2 = 1$. The condition $\langle\psi|\psi\rangle = 1$ is often known as the
normalization condition for state vectors.
We will take the qubit as our fundamental quantum mechanical system. Later, in
Chapter 7, we will see that there are real physical systems which may be described in
terms of qubits. For now, though, it is sufficient to think of qubits in abstract terms,
without reference to a specific realization. Our discussions of qubits will always be referred
to some orthonormal set of basis vectors, $|0\rangle$ and $|1\rangle$, which should be thought of as being
fixed in advance. Intuitively, the states $|0\rangle$ and $|1\rangle$ are analogous to the two values 0 and
1 which a bit may take. The way a qubit differs from a bit is that superpositions of these
two states, of the form $a|0\rangle + b|1\rangle$, can also exist, in which it is not possible to say that
the qubit is definitely in the state $|0\rangle$, or definitely in the state $|1\rangle$.
We conclude with some useful terminology which is often used in connection with
the description of quantum states. We say that any linear combination $\sum_i \alpha_i|\psi_i\rangle$ is a
superposition of the states $|\psi_i\rangle$ with amplitude $\alpha_i$ for the state $|\psi_i\rangle$. So, for example,
the state
\[ \frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{2.83} \]
is a superposition of the states $|0\rangle$ and $|1\rangle$ with amplitude $1/\sqrt{2}$ for the state $|0\rangle$, and
amplitude $-1/\sqrt{2}$ for the state $|1\rangle$.
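The normalization condition and the notion of amplitude can be checked numerically. The following is a minimal sketch in plain Python, assuming a qubit state is stored as a list of two complex amplitudes in the basis $|0\rangle$, $|1\rangle$; the helper names `norm_squared` and `normalize` are ours and purely illustrative.

```python
import math

def norm_squared(state):
    """Return <psi|psi> for a state given as a list of complex amplitudes."""
    return sum(abs(amp) ** 2 for amp in state)

def normalize(state):
    """Rescale a non-zero vector so that it satisfies <psi|psi> = 1."""
    length = math.sqrt(norm_squared(state))
    return [amp / length for amp in state]

# The state (|0> - |1>)/sqrt(2): amplitudes 1/sqrt(2) and -1/sqrt(2).
minus = [1 / math.sqrt(2), -1 / math.sqrt(2)]
print(norm_squared(minus))      # close to 1, so this is a valid state vector

# An unnormalized vector with a complex entry, rescaled to a valid qubit state.
psi = normalize([3, 4j])
print(psi, norm_squared(psi))
```

Any non-zero pair of complex numbers can be rescaled in this way, which is why only the relative sizes and phases of $a$ and $b$ need to be specified.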
2.2.2 Evolution
How does the state, $|\psi\rangle$, of a quantum mechanical system change with time? The following
postulate gives a prescription for the description of such state changes.
Postulate 2: The evolution of a closed quantum system is described by a unitary
transformation. That is, the state $|\psi\rangle$ of the system at time $t_1$ is related to the
state $|\psi'\rangle$ of the system at time $t_2$ by a unitary operator $U$ which depends only on
the times $t_1$ and $t_2$,
\[ |\psi'\rangle = U|\psi\rangle. \tag{2.84} \]
Just as quantum mechanics does not tell us the state space or quantum state of a
particular quantum system, it does not tell us which unitary operators $U$ describe real-world quantum dynamics. Quantum mechanics merely assures us that the evolution of
any closed quantum system may be described in such a way. An obvious question to ask
is: what unitary operators are natural to consider? In the case of single qubits, it turns
out that any unitary operator at all can be realized in realistic systems.
Let’s look at a few examples of unitary operators on a single qubit which are important in quantum computation and quantum information. We have already seen several
examples of such unitary operators – the Pauli matrices, defined in Section 2.1.3, and
the quantum gates described in Chapter 1. As remarked in Section 1.3.1, the $X$ matrix is
often known as the quantum NOT gate, by analogy to the classical NOT gate. The $X$ and
$Z$ Pauli matrices are also sometimes referred to as the bit flip and phase flip matrices: the
$X$ matrix takes $|0\rangle$ to $|1\rangle$, and $|1\rangle$ to $|0\rangle$, thus earning the name bit flip; and the $Z$ matrix
leaves $|0\rangle$ invariant, and takes $|1\rangle$ to $-|1\rangle$, with the extra factor of $-1$ added known as a
phase factor, thus justifying the term phase flip. We will not use the term phase flip for
Z very often, since it is easily confused with the phase gate to be defined in Chapter 4.
(Section 2.2.7 contains more discussion of the many uses of the term ‘phase’.)
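Another interesting unitary operator is the Hadamard gate, which we denote $H$. This
has the action $H|0\rangle \equiv (|0\rangle + |1\rangle)/\sqrt{2}$, $H|1\rangle \equiv (|0\rangle - |1\rangle)/\sqrt{2}$, and corresponding matrix
representation
\[ H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \tag{2.85} \]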
Exercise 2.51: Verify that the Hadamard gate H is unitary.
Exercise 2.52: Verify that $H^2 = I$.
Exercise 2.53: What are the eigenvalues and eigenvectors of H?
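Before working the exercises by hand, it can help to see the action of these gates numerically. The sketch below uses plain Python with 2×2 matrices stored as nested lists; the helper names (`matmul`, `dagger`, `apply_gate`, `is_unitary`) are our own illustrative choices. A floating-point check of this kind is a sanity test only and is not a substitute for the algebraic verification the exercises ask for.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def apply_gate(a, state):
    """Apply a 2x2 matrix to a two-component state vector."""
    return [sum(a[i][k] * state[k] for k in range(2)) for i in range(2)]

def is_unitary(a, tol=1e-12):
    """Check numerically that a^dagger a equals the identity."""
    product = matmul(dagger(a), a)
    identity = [[1, 0], [0, 1]]
    return all(abs(product[i][j] - identity[i][j]) < tol
               for i in range(2) for j in range(2))

s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]
ket0, ket1 = [1, 0], [0, 1]

print(apply_gate(X, ket0))   # bit flip: |0> is sent to |1>
print(apply_gate(Z, ket1))   # phase flip: |1> is sent to -|1>
print(apply_gate(H, ket0))   # equal superposition (|0> + |1>)/sqrt(2)
print(all(is_unitary(g) for g in (X, Z, H)))
```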
Postulate 2 requires that the system being described be closed. That is, it is not
interacting in any way with other systems. In reality, of course, all systems (except the
Universe as a whole) interact at least somewhat with other systems. Nevertheless, there
are interesting systems which can be described to a good approximation as being closed,
and which are described by unitary evolution to some good approximation. Furthermore,
at least in principle every open system can be described as part of a larger closed system
(the Universe) which is undergoing unitary evolution. Later, we’ll introduce more tools
which allow us to describe systems which are not closed, but for now we’ll continue with
the description of the evolution of closed systems.
Postulate 2 describes how the quantum states of a closed quantum system at two
different times are related. A more refined version of this postulate can be given which
describes the evolution of a quantum system in continuous time. From this more refined
postulate we will recover Postulate 2. Before we state the revised postulate, it is worth
pointing out two things. First, a notational remark. The operator H appearing in the
following discussion is not the same as the Hadamard operator, which we just introduced.
Second, the following postulate makes use of the apparatus of differential equations.
Readers with little background in the study of differential equations should be reassured
that they will not be necessary for much of the book, with the exception of parts of
Chapter 7, on real physical implementations of quantum information processing.
Postulate 2′: The time evolution of the state of a closed quantum system is
described by the Schrödinger equation,
\[ i\hbar \frac{d|\psi\rangle}{dt} = H|\psi\rangle. \tag{2.86} \]
In this equation, $\hbar$ is a physical constant known as Planck’s constant whose value
must be experimentally determined. The exact value is not important to us. In
practice, it is common to absorb the factor $\hbar$ into $H$, effectively setting $\hbar = 1$. $H$
is a fixed Hermitian operator known as the Hamiltonian of the closed system.
If we know the Hamiltonian of a system, then (together with a knowledge of $\hbar$) we
understand its dynamics completely, at least in principle. In general figuring out the
Hamiltonian needed to describe a particular physical system is a very difficult problem
– much of twentieth century physics has been concerned with this problem – which
requires substantial input from experiment in order to be answered. From our point of
view this is a problem of detail to be addressed by physical theories built within the
framework of quantum mechanics – what Hamiltonian do we need to describe atoms
in such-and-such a configuration – and is not a question that needs to be addressed by
the theory of quantum mechanics itself. Most of the time in our discussion of quantum
computation and quantum information we won’t need to discuss Hamiltonians, and when
we do, we will usually just posit that some matrix is the Hamiltonian as a starting point,
and proceed from there, without attempting to justify the use of that Hamiltonian.
Because the Hamiltonian is a Hermitian operator it has a spectral decomposition
\[ H = \sum_E E\,|E\rangle\langle E|, \tag{2.87} \]
with eigenvalues $E$ and corresponding normalized eigenvectors $|E\rangle$. The states $|E\rangle$ are
conventionally referred to as energy eigenstates, or sometimes as stationary states, and
$E$ is the energy of the state $|E\rangle$. The lowest energy is known as the ground state energy
for the system, and the corresponding energy eigenstate (or eigenspace) is known as the
ground state. The reason the states $|E\rangle$ are sometimes known as stationary states is
because their only change in time is to acquire an overall numerical factor,
\[ |E\rangle \rightarrow \exp(-iEt/\hbar)\,|E\rangle. \tag{2.88} \]
As an example, suppose a single qubit has Hamiltonian
\[ H = \hbar\omega X. \tag{2.89} \]
In this equation $\omega$ is a parameter that, in practice, needs to be experimentally determined.
We won’t worry about the parameter overly much here – the point is to give you a feel
for the sort of Hamiltonians that are sometimes written down in the study of quantum
computation and quantum information. The energy eigenstates of this Hamiltonian are
obviously the same as the eigenstates of $X$, namely $(|0\rangle + |1\rangle)/\sqrt{2}$ and $(|0\rangle - |1\rangle)/\sqrt{2}$,
with corresponding energies $\hbar\omega$ and $-\hbar\omega$. The ground state is therefore $(|0\rangle - |1\rangle)/\sqrt{2}$,
and the ground state energy is $-\hbar\omega$.
What is the connection between the Hamiltonian picture of dynamics, Postulate 2′,
and the unitary operator picture, Postulate 2? The answer is provided by writing down
the solution to Schrödinger’s equation, which is easily verified to be:
\[ |\psi(t_2)\rangle = \exp\left[\frac{-iH(t_2 - t_1)}{\hbar}\right] |\psi(t_1)\rangle = U(t_1, t_2)\,|\psi(t_1)\rangle, \tag{2.90} \]
where we define
\[ U(t_1, t_2) \equiv \exp\left[\frac{-iH(t_2 - t_1)}{\hbar}\right]. \tag{2.91} \]
You will show in the exercises that this operator is unitary, and furthermore, that any
unitary operator U can be realized in the form U = exp(iK) for some Hermitian operator
K. There is therefore a one-to-one correspondence between the discrete-time description
of dynamics using unitary operators, and the continuous time description using Hamiltonians. For most of the book we use the unitary formulation of quantum dynamics.
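To make the link between the two pictures concrete, consider the single-qubit example Hamiltonian proportional to $X$, written with $\hbar$ explicit as $H = \hbar\omega X$. Because $X^2 = I$, the power series for the exponential in Equation (2.91) sums to $\cos(\omega\,\Delta t)\,I - i\sin(\omega\,\Delta t)\,X$ with $\Delta t = t_2 - t_1$. The sketch below (plain Python; the function names and the numerical values of `omega` and `dt` are illustrative assumptions) builds this operator and checks that an energy eigenstate only acquires the phase of Equation (2.88).

```python
import cmath
import math

def evolution_operator(omega, dt):
    """U(t1, t2) = exp(-i H dt / hbar) for H = hbar * omega * X, with dt = t2 - t1.

    Since X squared is the identity, the exponential series sums to
    cos(omega dt) I - i sin(omega dt) X.
    """
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    return [[c, -1j * s], [-1j * s, c]]

def apply_gate(u, state):
    """Apply a 2x2 matrix to a two-component state vector."""
    return [sum(u[i][k] * state[k] for k in range(2)) for i in range(2)]

omega, dt = 1.0, 0.3                 # illustrative values, arbitrary units
U = evolution_operator(omega, dt)

# Energy eigenstate (|0> + |1>)/sqrt(2), with energy +hbar*omega.
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
evolved = apply_gate(U, plus)
phase = cmath.exp(-1j * omega * dt)  # exp(-i E t / hbar) with E = hbar*omega
print(all(abs(evolved[k] - phase * plus[k]) < 1e-12 for k in range(2)))

# A state that is not an energy eigenstate changes by more than a phase:
# |0> evolves to cos(omega dt)|0> - i sin(omega dt)|1>.
print(apply_gate(U, [1, 0]))
```

The first check prints `True`: the eigenstate is stationary up to an overall factor, whereas $|0\rangle$ rotates into a genuine superposition.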
Exercise 2.54: Suppose A and B are commuting Hermitian operators. Prove that
exp(A) exp(B) = exp(A + B). (Hint: Use the results of Section 2.1.9.)
Exercise 2.55: Prove that $U(t_1, t_2)$ defined in Equation (2.91) is unitary.
Exercise 2.56: Use the spectral decomposition to show that $K \equiv -i\log(U)$ is
Hermitian for any unitary $U$, and thus $U = \exp(iK)$ for some Hermitian $K$.
In quantum computation and quantum information we often speak of applying a
unitary operator to a particular quantum system. For example, in the context of quantum
circuits we may speak of applying the unitary gate X to a single qubit. Doesn’t this
contradict what we said earlier, about unitary operators describing the evolution of a
closed quantum system? After all, if we are ‘applying’ a unitary operator, then that
implies that there is an external ‘we’ who is interacting with the quantum system, and
the system is not closed.
An example of this occurs when a laser is focused on an atom. After a lot of thought
and hard work it is possible to write down a Hamiltonian describing the total atom–
laser system. The interesting thing is that when we write down the Hamiltonian for the
atom–laser system and consider the effects on the atom alone, the behavior of the state
vector of the atom turns out to be almost but not quite perfectly described by another
Hamiltonian, the atomic Hamiltonian. The atomic Hamiltonian contains terms related
to laser intensity, and other parameters of the laser, which we can vary at will. It is as if
the evolution of the atom were being described by a Hamiltonian which we can vary at
will, despite the atom not being a closed system.
More generally, for many systems like this it turns out to be possible to write down
a time-varying Hamiltonian for a quantum system, in which the Hamiltonian for the
system is not a constant, but varies according to some parameters which are under an
experimentalist’s control, and which may be changed during the course of an experiment. The system is not, therefore, closed, but it does evolve according to Schrödinger’s
equation with a time-varying Hamiltonian, to some good approximation.
The upshot is that to begin we will often describe the evolution of quantum systems –
even systems which aren’t closed – using unitary operators. The main exception to this,
quantum measurement, will be described in the next section. Later on we will investigate
in more detail possible deviations from unitary evolution due to the interaction with other
systems, and understand more precisely the dynamics of realistic quantum systems.
2.2.3 Quantum measurement
We postulated that closed quantum systems evolve according to unitary evolution. The
evolution of systems which don’t interact with the rest of the world is all very well, but
there must also be times when the experimentalist and their experimental equipment –
an external physical system in other words – observes the system to find out what is
going on inside the system, an interaction which makes the system no longer closed, and
thus not necessarily subject to unitary evolution. To explain what happens when this
is done, we introduce Postulate 3, which provides a means for describing the effects of
measurements on quantum systems.
Postulate 3: Quantum measurements are described by a collection $\{M_m\}$ of
measurement operators. These are operators acting on the state space of the
system being measured. The index $m$ refers to the measurement outcomes that
may occur in the experiment. If the state of the quantum system is $|\psi\rangle$
immediately before the measurement then the probability that result $m$ occurs is
given by
\[ p(m) = \langle\psi| M_m^\dagger M_m |\psi\rangle, \tag{2.92} \]
and the state of the system after the measurement is
\[ \frac{M_m|\psi\rangle}{\sqrt{\langle\psi| M_m^\dagger M_m |\psi\rangle}}. \tag{2.93} \]
The measurement operators satisfy the completeness equation,
\[ \sum_m M_m^\dagger M_m = I. \tag{2.94} \]
The completeness equation expresses the fact that probabilities sum to one:
\[ 1 = \sum_m p(m) = \sum_m \langle\psi| M_m^\dagger M_m |\psi\rangle. \tag{2.95} \]
This equation being satisfied for all $|\psi\rangle$ is equivalent to the completeness equation.
However, the completeness equation is much easier to check directly, so that’s why it
appears in the statement of the postulate.
A simple but important example of a measurement is the measurement of a qubit in
the computational basis. This is a measurement on a single qubit with two outcomes
defined by the two measurement operators $M_0 = |0\rangle\langle 0|$, $M_1 = |1\rangle\langle 1|$. Observe that
each measurement operator is Hermitian, and that $M_0^2 = M_0$, $M_1^2 = M_1$. Thus the
completeness relation is obeyed, $I = M_0^\dagger M_0 + M_1^\dagger M_1 = M_0 + M_1$. Suppose the state
being measured is $|\psi\rangle = a|0\rangle + b|1\rangle$. Then the probability of obtaining measurement
outcome 0 is
\[ p(0) = \langle\psi| M_0^\dagger M_0 |\psi\rangle = \langle\psi| M_0 |\psi\rangle = |a|^2. \tag{2.96} \]
Similarly, the probability of obtaining the measurement outcome 1 is $p(1) = |b|^2$. The
state after measurement in the two cases is therefore
\[ \frac{M_0|\psi\rangle}{|a|} = \frac{a}{|a|}\,|0\rangle \tag{2.97} \]
\[ \frac{M_1|\psi\rangle}{|b|} = \frac{b}{|b|}\,|1\rangle. \tag{2.98} \]
We will see in Section 2.2.7 that multipliers like $a/|a|$, which have modulus one, can
effectively be ignored, so the two post-measurement states are effectively $|0\rangle$ and $|1\rangle$, just
as described in Chapter 1.
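The computational-basis measurement just described is simple enough to write out directly. The following is a small sketch in plain Python, assuming the state is given by its two amplitudes $a$ and $b$; the function name `measure_computational_basis` and the example amplitudes are illustrative choices of ours. It returns, for each outcome, the probability and the normalized post-measurement state prescribed by Postulate 3.

```python
import math

def measure_computational_basis(a, b):
    """Return [(outcome, probability, post_state), ...] for |psi> = a|0> + b|1>.

    Uses M0 = |0><0| and M1 = |1><1|, so M0|psi> = a|0> and M1|psi> = b|1>.
    """
    results = []
    for outcome, amplitude in ((0, a), (1, b)):
        prob = abs(amplitude) ** 2                  # p(m) = <psi|Mm^dagger Mm|psi>
        if prob == 0:
            results.append((outcome, 0.0, None))    # this outcome never occurs
            continue
        post = [0j, 0j]
        post[outcome] = amplitude / math.sqrt(prob) # Mm|psi> / sqrt(p(m))
        results.append((outcome, prob, post))
    return results

a, b = 0.6, 0.8j                                    # |a|^2 + |b|^2 = 1
for outcome, prob, post in measure_computational_basis(a, b):
    print(outcome, prob, post)
```

For these amplitudes the post-measurement state after outcome 1 carries the modulus-one multiplier $b/|b| = i$, which is exactly the kind of factor that can effectively be ignored.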
The status of Postulate 3 as a fundamental postulate intrigues many people. Measuring
devices are quantum mechanical systems, so the quantum system being measured and
the measuring device together are part of a larger, isolated, quantum mechanical system.
(It may be necessary to include quantum systems other than the system being measured
and the measuring device to obtain a completely isolated system, but the point is that
this can be done.) According to Postulate 2, the evolution of this larger isolated system
can be described by a unitary evolution. Might it be possible to derive Postulate 3 as a
consequence of this picture? Despite considerable investigation along these lines there is
still disagreement between physicists about whether or not this is possible. We, however,
are going to take the very pragmatic approach that in practice it is clear when to apply
Postulate 2 and when to apply Postulate 3, and not worry about deriving one postulate
from the other.
Over the next few sections we apply Postulate 3 to several elementary but important
measurement scenarios. Section 2.2.4 examines the problem of distinguishing a set of
quantum states. Section 2.2.5 explains a special case of Postulate 3, the projective or
von Neumann measurements. Section 2.2.6 explains another special case of Postulate 3,
known as POVM measurements. Many introductions to quantum mechanics only discuss
projective measurements, omitting a full discussion of Postulate 3 or of POVM elements.
For this reason we have included Box 2.5 on page 91 which comments on the relationship
between the different classes of measurement we describe.
Exercise 2.57: (Cascaded measurements are single measurements) Suppose
$\{L_l\}$ and $\{M_m\}$ are two sets of measurement operators. Show that a
measurement defined by the measurement operators $\{L_l\}$ followed by a
measurement defined by the measurement operators $\{M_m\}$ is physically
equivalent to a single measurement defined by measurement operators $\{N_{lm}\}$
with the representation $N_{lm} \equiv M_m L_l$.
2.2.4 Distinguishing quantum states
An important application of Postulate 3 is to the problem of distinguishing quantum
states. In the classical world, distinct states of an object are usually distinguishable, at
least in principle. For example, we can always identify whether a coin has landed heads or
tails, at least in the ideal limit. Quantum mechanically, the situation is more complicated.
In Section 1.6 we gave a plausible argument that non-orthogonal quantum states cannot
be distinguished. With Postulate 3 as a firm foundation we can now give a much more
convincing demonstration of this fact.
Distinguishability, like many ideas in quantum computation and quantum information,
is most easily understood using the metaphor of a game involving two parties, Alice and
Bob. Alice chooses a state $|\psi_i\rangle$ ($1 \le i \le n$) from some fixed set of states known to both
parties. She gives the state $|\psi_i\rangle$ to Bob, whose task it is to identify the index $i$ of the
state Alice has given him.
Suppose the states $|\psi_i\rangle$ are orthonormal. Then Bob can do a quantum measurement
to distinguish these states, using the following procedure. Define measurement operators
$M_i \equiv |\psi_i\rangle\langle\psi_i|$, one for each possible index $i$, and an additional measurement operator
$M_0$ defined as the positive square root of the positive operator $I - \sum_{i \neq 0} |\psi_i\rangle\langle\psi_i|$.
These operators satisfy the completeness relation, and if the state $|\psi_i\rangle$ is prepared then
$p(i) = \langle\psi_i| M_i |\psi_i\rangle = 1$, so the result $i$ occurs with certainty. Thus, it is possible to
reliably distinguish the orthonormal states $|\psi_i\rangle$.
By contrast, if the states $|\psi_i\rangle$ are not orthonormal then we can prove that there is no
quantum measurement capable of distinguishing the states. The idea is that Bob will do
a measurement described by measurement operators $M_j$, with outcome $j$. Depending on
the outcome of the measurement Bob tries to guess what the index $i$ was using some rule,
$i = f(j)$, where $f(\cdot)$ represents the rule he uses to make the guess. The key to why Bob
can’t distinguish non-orthogonal states $|\psi_1\rangle$ and $|\psi_2\rangle$ is the observation that $|\psi_2\rangle$ can be
decomposed into a (non-zero) component parallel to $|\psi_1\rangle$, and a component orthogonal
to $|\psi_1\rangle$. Suppose $j$ is a measurement outcome such that $f(j) = 1$, that is, Bob guesses
that the state was $|\psi_1\rangle$ when he observes $j$. But because of the component of $|\psi_2\rangle$ parallel
to $|\psi_1\rangle$, there is a non-zero probability of getting outcome $j$ when $|\psi_2\rangle$ is prepared, so
sometimes Bob will make an error identifying which state was prepared. A more rigorous
argument that non-orthogonal states can’t be distinguished is given in Box 2.3, but this
captures the essential idea.
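The contrast between the two cases can be seen in a small numerical experiment. In the sketch below (plain Python; the function name `outcome_probabilities` and the choice of candidate states are our own assumptions for illustration), Bob measures in the basis $|0\rangle$, $|1\rangle$ and uses the guessing rule $f(0) = 1$, $f(1) = 2$. This illustrates the failure for one natural choice of measurement only; it does not show that every measurement fails, which requires the general argument.

```python
import math

def outcome_probabilities(projector_states, psi):
    """p(j) = |<phi_j|psi>|^2 for a measurement with operators M_j = |phi_j><phi_j|."""
    probs = []
    for phi in projector_states:
        overlap = sum(complex(x).conjugate() * y for x, y in zip(phi, psi))
        probs.append(abs(overlap) ** 2)
    return probs

ket0, ket1 = [1, 0], [0, 1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]
basis = [ket0, ket1]            # Bob's measurement operators M_j = |j><j|

# Orthonormal candidates |0> and |1>: each produces its own outcome with certainty.
print(outcome_probabilities(basis, ket0))
print(outcome_probabilities(basis, ket1))

# Non-orthogonal candidates |0> and (|0> + |1>)/sqrt(2): the second state has a
# component along |0>, so it yields outcome 0 about half the time and Bob,
# guessing "state 1" on outcome 0, is sometimes wrong.
print(outcome_probabilities(basis, plus))
```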
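Box 2.3: Proof that non-orthogonal states can’t be reliably distinguished
A proof by contradiction shows that no measurement distinguishing the non-orthogonal states $|\psi_1\rangle$ and $|\psi_2\rangle$ is possible. Suppose such a measurement is possible.
If the state $|\psi_1\rangle$ ($|\psi_2\rangle$) is prepared then the probability of measuring $j$ such that
$f(j) = 1$ ($f(j) = 2$) must be 1. Defining $E_i \equiv \sum_{j:f(j)=i} M_j^\dagger M_j$, these observations
may be written as:
\[ \langle\psi_1|E_1|\psi_1\rangle = 1; \quad \langle\psi_2|E_2|\psi_2\rangle = 1. \tag{2.99} \]
Since $\sum_i E_i = I$ it follows that $\sum_i \langle\psi_1|E_i|\psi_1\rangle = 1$, and since $\langle\psi_1|E_1|\psi_1\rangle = 1$
we must have $\langle\psi_1|E_2|\psi_1\rangle = 0$, and thus $\sqrt{E_2}|\psi_1\rangle = 0$. Suppose we decompose
$|\psi_2\rangle = \alpha|\psi_1\rangle + \beta|\varphi\rangle$, where $|\varphi\rangle$ is orthonormal to $|\psi_1\rangle$, $|\alpha|^2 + |\beta|^2 = 1$, and $|\beta| < 1$
since $|\psi_1\rangle$ and $|\psi_2\rangle$ are not orthogonal. Then $\sqrt{E_2}|\psi_2\rangle = \beta\sqrt{E_2}|\varphi\rangle$, which implies
a contradiction with (2.99), as
\[ \langle\psi_2|E_2|\psi_2\rangle = |\beta|^2 \langle\varphi|E_2|\varphi\rangle \le |\beta|^2 < 1, \tag{2.100} \]
where the second last inequality follows from the observation that
\[ \langle\varphi|E_2|\varphi\rangle \le \sum_i \langle\varphi|E_i|\varphi\rangle = \langle\varphi|\varphi\rangle = 1. \tag{2.101} \]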
2.2.5 Projective measurements
In this section we explain an important special case of the general measurement postulate,
Postulate 3. This special class of measurements is known as projective measurements.
For many applications of quantum computation and quantum information we will be
concerned primarily with projective measurements. Indeed, projective measurements actually turn out to be equivalent to the general measurement postulate, when they are
augmented with the ability to perform unitary transformations, as described in Postulate 2. We will explain this equivalence in detail in Section 2.2.8, as the statement of the
measurement postulate for projective measurements is superficially rather different from
the general postulate, Postulate 3.
Projective measurements: A projective measurement is described by an
observable, $M$, a Hermitian operator on the state space of the system being
observed. The observable has a spectral decomposition,
\[ M = \sum_m m P_m, \tag{2.102} \]
where $P_m$ is the projector onto the eigenspace of $M$ with eigenvalue $m$. The
possible outcomes of the measurement correspond to the eigenvalues, $m$, of the
observable. Upon measuring the state $|\psi\rangle$, the probability of getting result $m$ is
given by
\[ p(m) = \langle\psi| P_m |\psi\rangle. \tag{2.103} \]
Given that outcome $m$ occurred, the state of the quantum system immediately
after the measurement is
\[ \frac{P_m|\psi\rangle}{\sqrt{p(m)}}. \tag{2.104} \]
Projective measurements can be understood as a special case of Postulate 3. Suppose the
measurement operators in Postulate 3, in addition to satisfying the completeness relation
$\sum_m M_m^\dagger M_m = I$, also satisfy the conditions that $M_m$ are orthogonal projectors, that is,
the $M_m$ are Hermitian, and $M_m M_{m'} = \delta_{m,m'} M_m$. With these additional restrictions,
Postulate 3 reduces to a projective measurement as just defined.
Projective measurements have many nice properties. In particular, it is very easy to
calculate average values for projective measurements. By definition, the average (see
Appendix 1 for elementary definitions and results in probability theory) value of the
measurement is
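\begin{align}
\mathbf{E}(M) &= \sum_m m\, p(m) \tag{2.110} \\
&= \sum_m m\, \langle\psi|P_m|\psi\rangle \tag{2.111} \\
&= \langle\psi| \Big( \sum_m m P_m \Big) |\psi\rangle \tag{2.112} \\
&= \langle\psi|M|\psi\rangle. \tag{2.113}
\end{align}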
This is a useful formula, which simplifies many calculations. The average value of the
observable $M$ is often written $\langle M\rangle \equiv \langle\psi|M|\psi\rangle$. From this formula for the average
follows a formula for the standard deviation associated to observations of $M$,
\begin{align}
[\Delta(M)]^2 &= \langle (M - \langle M\rangle)^2 \rangle \tag{2.114} \\
&= \langle M^2\rangle - \langle M\rangle^2. \tag{2.115}
\end{align}
The standard deviation is a measure of the typical spread of the observed values upon measurement of $M$. In particular, if we perform a large number of experiments in which the
state $|\psi\rangle$ is prepared and the observable $M$ is measured, then the standard deviation $\Delta(M)$
of the observed values is determined by the formula $\Delta(M) = \sqrt{\langle M^2\rangle - \langle M\rangle^2}$. This formulation of measurement and standard deviations in terms of observables gives rise in
an elegant way to results such as the Heisenberg uncertainty principle (see Box 2.4).
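As a quick illustration of the formulas for the average and the standard deviation, the sketch below evaluates both for the observable $Z$ on a state whose amplitudes are chosen so that the outcome $+1$ has probability 0.8 and the outcome $-1$ has probability 0.2. It is plain Python with helper names (`expectation`, `std_dev`) of our own choosing, and it assumes the matrix passed in is Hermitian so that taking the real part of the expectation value is harmless.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expectation(m, psi):
    """<psi|M|psi> for a Hermitian 2x2 matrix M and a state psi."""
    m_psi = [sum(m[i][k] * psi[k] for k in range(2)) for i in range(2)]
    total = sum(complex(psi[i]).conjugate() * m_psi[i] for i in range(2))
    return total.real

def std_dev(m, psi):
    """Delta(M) = sqrt(<M^2> - <M>^2)."""
    variance = expectation(matmul(m, m), psi) - expectation(m, psi) ** 2
    return math.sqrt(max(variance, 0.0))   # guard against tiny negative rounding

Z = [[1, 0], [0, -1]]
psi = [math.sqrt(0.8), math.sqrt(0.2)]     # p(+1) = 0.8, p(-1) = 0.2

print(expectation(Z, psi))   # approximately 0.8 - 0.2 = 0.6
print(std_dev(Z, psi))       # approximately sqrt(1 - 0.36) = 0.8
```

The same numbers follow from the outcome probabilities directly: the mean is $(+1)(0.8) + (-1)(0.2)$, and since $Z^2 = I$ the variance is $1 - 0.6^2$.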
Exercise 2.58: Suppose we prepare a quantum system in an eigenstate $|\psi\rangle$ of some
observable $M$, with corresponding eigenvalue $m$. What is the average observed
value of $M$, and the standard deviation?
Two widely used nomenclatures for measurements deserve emphasis. Rather than giving an observable to describe a projective measurement, often people simply list a complete set of orthogonal projectors $P_m$ satisfying the relations $\sum_m P_m = I$ and $P_m P_{m'} = \delta_{mm'} P_m$. The corresponding observable implicit in this usage is $M = \sum_m m P_m$. Another widely used phrase, to ‘measure in a basis $|m\rangle$’, where $|m\rangle$ form an orthonormal basis, simply means to perform the projective measurement with projectors $P_m = |m\rangle\langle m|$.
Box 2.4: The Heisenberg uncertainty principle
Perhaps the best known result of quantum mechanics is the Heisenberg uncertainty principle. Suppose $A$ and $B$ are two Hermitian operators, and $|\psi\rangle$ is a
quantum state. Suppose $\langle\psi|AB|\psi\rangle = x + iy$, where $x$ and $y$ are real. Note that
$\langle\psi|[A, B]|\psi\rangle = 2iy$ and $\langle\psi|\{A, B\}|\psi\rangle = 2x$. This implies that
\[ |\langle\psi|[A, B]|\psi\rangle|^2 + |\langle\psi|\{A, B\}|\psi\rangle|^2 = 4\,|\langle\psi|AB|\psi\rangle|^2. \tag{2.105} \]
By the Cauchy–Schwarz inequality
\[ |\langle\psi|AB|\psi\rangle|^2 \le \langle\psi|A^2|\psi\rangle \langle\psi|B^2|\psi\rangle, \tag{2.106} \]
which combined with Equation (2.105) and dropping a non-negative term gives
\[ |\langle\psi|[A, B]|\psi\rangle|^2 \le 4\,\langle\psi|A^2|\psi\rangle \langle\psi|B^2|\psi\rangle. \tag{2.107} \]
Suppose $C$ and $D$ are two observables. Substituting $A = C - \langle C\rangle$ and $B = D - \langle D\rangle$
into the last equation, we obtain Heisenberg’s uncertainty principle as it is usually
stated:
\[ \Delta(C)\Delta(D) \ge \frac{|\langle\psi|[C, D]|\psi\rangle|}{2}. \tag{2.108} \]
You should be wary of a common misconception about the uncertainty principle,
that measuring an observable $C$ to some ‘accuracy’ $\Delta(C)$ causes the value of $D$ to
be ‘disturbed’ by an amount $\Delta(D)$ in such a way that some sort of inequality similar
to (2.108) is satisfied. While it is true that measurements in quantum mechanics
cause disturbance to the system being measured, this is most emphatically not the
content of the uncertainty principle.
The correct interpretation of the uncertainty principle is that if we prepare a large
number of quantum systems in identical states, $|\psi\rangle$, and then perform measurements
of $C$ on some of those systems, and of $D$ in others, then the standard deviation
$\Delta(C)$ of the $C$ results times the standard deviation $\Delta(D)$ of the results for $D$ will
satisfy the inequality (2.108).
As an example of the uncertainty principle, consider the observables $X$ and $Y$
when measured for the quantum state $|0\rangle$. In Equation (2.70) we showed that
$[X, Y] = 2iZ$, so the uncertainty principle tells us that
\[ \Delta(X)\Delta(Y) \ge \langle 0|Z|0\rangle = 1. \tag{2.109} \]
One elementary consequence of this is that $\Delta(X)$ and $\Delta(Y)$ must both be strictly
greater than 0, as can be verified by direct calculation.
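The uncertainty relation for the observables $X$ and $Y$ on the state $|0\rangle$, as discussed in Box 2.4, can be checked by direct calculation. The following plain-Python sketch (helper names are ours and purely illustrative) computes both standard deviations and the commutator bound $|\langle\psi|[X, Y]|\psi\rangle|/2$.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expectation(m, psi):
    """<psi|M|psi>, returned as a complex number."""
    m_psi = [sum(m[i][k] * psi[k] for k in range(2)) for i in range(2)]
    return sum(complex(psi[i]).conjugate() * m_psi[i] for i in range(2))

def std_dev(m, psi):
    """Delta(M) = sqrt(<M^2> - <M>^2) for a Hermitian matrix M."""
    mean = expectation(m, psi).real
    mean_sq = expectation(matmul(m, m), psi).real
    return math.sqrt(max(mean_sq - mean ** 2, 0.0))

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
ket0 = [1, 0]

xy, yx = matmul(X, Y), matmul(Y, X)
commutator = [[xy[i][j] - yx[i][j] for j in range(2)] for i in range(2)]  # equals 2iZ

lhs = std_dev(X, ket0) * std_dev(Y, ket0)
rhs = abs(expectation(commutator, ket0)) / 2
print(lhs, rhs, lhs >= rhs)
```

In this particular case the two sides coincide, so the state $|0\rangle$ saturates the inequality for $X$ and $Y$.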
Let’s look at an example of projective measurements on single qubits. First is the
measurement of the observable $Z$. This has eigenvalues $+1$ and $-1$ with corresponding
eigenvectors $|0\rangle$ and $|1\rangle$. Thus, for example, measurement of $Z$ on the state $|\psi\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$
gives the result $+1$ with probability $\langle\psi|0\rangle\langle 0|\psi\rangle = 1/2$, and similarly the
result $-1$ with probability $1/2$. More generally, suppose $\vec{v}$ is any real three-dimensional
unit vector. Then we can define an observable:
\[ \vec{v}\cdot\vec{\sigma} \equiv v_1\sigma_1 + v_2\sigma_2 + v_3\sigma_3. \tag{2.116} \]
Measurement of this observable is sometimes referred to as a ‘measurement of spin along
the $\vec{v}$ axis’, for historical reasons. The following two exercises encourage you to work out
some elementary but important properties of such a measurement.
Exercise 2.59: Suppose we have a qubit in the state $|0\rangle$, and we measure the observable
$X$. What is the average value of $X$? What is the standard deviation of $X$?
Exercise 2.60: Show that $\vec{v}\cdot\vec{\sigma}$ has eigenvalues $\pm 1$, and that the projectors onto the
corresponding eigenspaces are given by $P_\pm = (I \pm \vec{v}\cdot\vec{\sigma})/2$.
Exercise 2.61: Calculate the probability of obtaining the result $+1$ for a measurement
of $\vec{v}\cdot\vec{\sigma}$, given that the state prior to measurement is $|0\rangle$. What is the state of the
system after the measurement if $+1$ is obtained?
2.2.6 POVM measurements
The quantum measurement postulate, Postulate 3, involves two elements. First, it gives
a rule describing the measurement statistics, that is, the respective probabilities of the
different possible measurement outcomes. Second, it gives a rule describing the post-measurement state of the system. However, for some applications the post-measurement
state of the system is of little interest, with the main item of interest being the probabilities
of the respective measurement outcomes. This is the case, for example, in an experiment
where the system is measured only once, upon conclusion of the experiment. In such
instances there is a mathematical tool known as the POVM formalism which is especially
well adapted to the analysis of the measurements. (The acronym POVM stands for
‘Positive Operator-Valued Measure’, a technical term whose historical origins we won’t
worry about.) This formalism is a simple consequence of the general description of
measurements introduced in Postulate 3, but the theory of POVMs is so elegant and
widely used that it merits a separate discussion here.
Suppose a measurement described by measurement operators $M_m$ is performed upon
a quantum system in the state $|\psi\rangle$. Then the probability of outcome $m$ is given by
$p(m) = \langle\psi| M_m^\dagger M_m |\psi\rangle$. Suppose we define
\[ E_m \equiv M_m^\dagger M_m. \tag{2.117} \]
Then from Postulate 3 and elementary linear algebra, $E_m$ is a positive operator such
that $\sum_m E_m = I$ and $p(m) = \langle\psi|E_m|\psi\rangle$. Thus the set of operators $E_m$ are sufficient to
determine the probabilities of the different measurement outcomes. The operators $E_m$
are known as the POVM elements associated with the measurement. The complete set
$\{E_m\}$ is known as a POVM.
As an example of a POVM, consider a projective measurement described by measurement operators $P_m$, where the $P_m$ are projectors such that $P_m P_{m'} = \delta_{mm'} P_m$ and
$\sum_m P_m = I$. In this instance (and only this instance) all the POVM elements are the
same as the measurement operators themselves, since $E_m \equiv P_m^\dagger P_m = P_m$.
Box 2.5: General measurements, projective measurements, and POVMs
Most introductions to quantum mechanics describe only projective measurements,
and consequently the general description of measurements given in Postulate 3
may be unfamiliar to many physicists, as may the POVM formalism described in
Section 2.2.6. The reason most physicists don’t learn the general measurement
formalism is because most physical systems can only be measured in a very coarse
manner. In quantum computation and quantum information we aim for an exquisite
level of control over the measurements that may be done, and consequently it helps
to use a more comprehensive formalism for the description of measurements.
Of course, when the other axioms of quantum mechanics are taken into account,
projective measurements augmented by unitary operations turn out to be completely
equivalent to general measurements, as shown in Section 2.2.8. So a physicist
trained in the use of projective measurements might ask to what end we start with
the general formalism, Postulate 3? There are several reasons for doing so. First,
mathematically general measurements are in some sense simpler than projective
measurements, since they involve fewer restrictions on the measurement operators;
there is, for example, no requirement for general measurements analogous to the
condition $P_i P_j = \delta_{ij} P_i$ for projective measurements. This simpler structure also
gives rise to many useful properties for general measurements that are not possessed
by projective measurements. Second, it turns out that there are important problems
in quantum computation and quantum information – such as the optimal way
to distinguish a set of quantum states – the answer to which involves a general
measurement, rather than a projective measurement.
A third reason for preferring Postulate 3 as a starting point is related to a property
of projective measurements known as repeatability. Projective measurements are
repeatable in the sense that if we perform a projective measurement once, and
obtain the outcome m, repeating the measurement gives the outcome m again and
does not change the state. To see this, suppose $|\psi\rangle$ was the initial state. After the
first measurement the state is $|\psi_m\rangle = \left(P_m|\psi\rangle\right)/\sqrt{\langle\psi|P_m|\psi\rangle}$. Applying $P_m$ to
$|\psi_m\rangle$ does not change it, so we have $\langle\psi_m|P_m|\psi_m\rangle = 1$, and therefore repeated
measurement gives the result $m$ each time, without changing the state.
This repeatability of projective measurements tips us off to the fact that many
important measurements in quantum mechanics are not projective measurements.
For instance, if we use a silvered screen to measure the position of a photon we
destroy the photon in the process. This certainly makes it impossible to repeat
the measurement of the photon’s position! Many other quantum measurements
are also not repeatable in the same sense as a projective measurement. For such
measurements, the general measurement postulate, Postulate 3, must be employed.
Where do POVMs fit in this picture? POVMs are best viewed as a special case
of the general measurement formalism, providing the simplest means by which
one can study general measurement statistics, without the necessity for knowing
the post-measurement state. They are a mathematical convenience that sometimes
gives extra insight into quantum measurements.
Exercise 2.62: Show that any measurement where the measurement operators and the
POVM elements coincide is a projective measurement.
Above we noticed that the POVM operators are positive and satisfy $\sum_m E_m = I$.
Suppose now that $\{E_m\}$ is some arbitrary set of positive operators such that $\sum_m E_m = I$.
We will show that there exists a set of measurement operators $M_m$ defining a measurement
described by the POVM $\{E_m\}$. Defining $M_m \equiv \sqrt{E_m}$ we see that $\sum_m M_m^\dagger M_m = \sum_m E_m = I$,
and therefore the set $\{M_m\}$ describes a measurement with POVM $\{E_m\}$.
For this reason it is convenient to define a POVM to be any set of operators $\{E_m\}$ such
that: (a) each operator $E_m$ is positive; and (b) the completeness relation $\sum_m E_m = I$ is
obeyed, expressing the fact that probabilities sum to one. To complete the description
of POVMs, we note again that given a POVM $\{E_m\}$, the probability of outcome $m$ is
given by $p(m) = \langle\psi|E_m|\psi\rangle$.
We’ve looked at projective measurements as an example of the use of POVMs, but
it wasn’t very exciting since we didn’t learn much that was new. The following more
sophisticated example illustrates the use of the POVM formalism as a guide for our
intuition in quantum computation and quantum information. Suppose Alice gives Bob a
qubit prepared in one of two states, $|\psi_1\rangle = |0\rangle$ or $|\psi_2\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$. As explained in
Section 2.2.4 it is impossible for Bob to determine whether he has been given $|\psi_1\rangle$ or $|\psi_2\rangle$
with perfect reliability. However, it is possible for him to perform a measurement which
distinguishes the states some of the time, but never makes an error of mis-identification.
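Consider a POVM containing three elements,
\begin{align}
E_1 &\equiv \frac{\sqrt{2}}{1+\sqrt{2}}\, |1\rangle\langle 1|, \tag{2.118} \\
E_2 &\equiv \frac{\sqrt{2}}{1+\sqrt{2}}\, \frac{(|0\rangle - |1\rangle)(\langle 0| - \langle 1|)}{2}, \tag{2.119} \\
E_3 &\equiv I - E_1 - E_2. \tag{2.120}
\end{align}
It is straightforward to verify that these are positive operators which satisfy the completeness
relation $\sum_m E_m = I$, and therefore form a legitimate POVM.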
Suppose Bob is given the state $|\psi_1\rangle = |0\rangle$. He performs the measurement described
by the POVM $\{E_1, E_2, E_3\}$. There is zero probability that he will observe the result
$E_1$, since $E_1$ has been cleverly chosen to ensure that $\langle\psi_1|E_1|\psi_1\rangle = 0$. Therefore, if the
result of his measurement is $E_1$ then Bob can safely conclude that the state he received
must have been $|\psi_2\rangle$. A similar line of reasoning shows that if the measurement outcome
$E_2$ occurs then it must have been the state $|\psi_1\rangle$ that Bob received. Some of the time,
however, Bob will obtain the measurement outcome $E_3$, and he can infer nothing about
the identity of the state he was given. The key point, however, is that Bob never makes a
mistake identifying the state he has been given. This infallibility comes at the price that
sometimes Bob obtains no information about the identity of the state.
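The three-element POVM above can be checked numerically. The sketch below, assuming NumPy, builds $E_1$, $E_2$, $E_3$ as matrices and evaluates $\langle\psi|E_m|\psi\rangle$ for both candidate states; the names `povm`, `outcome_probs` and `min_eigs` are introduced only for this illustration.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
psi1 = ket0
psi2 = (ket0 + ket1) / np.sqrt(2)

c = np.sqrt(2) / (1 + np.sqrt(2))
minus = (ket0 - ket1) / np.sqrt(2)
E1 = c * np.outer(ket1, ket1.conj())       # (2.118)
E2 = c * np.outer(minus, minus.conj())     # (2.119): (|0>-|1>)(<0|-<1|)/2 = |-><-|
E3 = np.eye(2) - E1 - E2                   # (2.120)
povm = [E1, E2, E3]

def outcome_probs(state):
    """p(m) = <psi|E_m|psi> for each POVM element."""
    return [np.vdot(state, E @ state).real for E in povm]

probs_psi1 = outcome_probs(psi1)   # outcome 1 never occurs for |psi_1>
probs_psi2 = outcome_probs(psi2)   # outcome 2 never occurs for |psi_2>

# Positivity check: smallest eigenvalue of each element (E3 has a zero eigenvalue).
min_eigs = [np.linalg.eigvalsh(E).min() for E in povm]
```

For either input state the inconclusive outcome $E_3$ occurs with probability $1/\sqrt{2} \approx 0.71$ in this example, so Bob learns the state with certainty only about 29% of the time.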
This simple example demonstrates the utility of the POVM formalism as a simple
and intuitive way of gaining insight into quantum measurements in instances where
only the measurement statistics matter. In many instances later in the book we will only
be concerned with measurement statistics, and will therefore use the POVM formalism
rather than the more general formalism for measurements described in Postulate 3.
Exercise 2.63: Suppose a measurement is described by measurement operators $M_m$.
Show that there exist unitary operators $U_m$ such that $M_m = U_m\sqrt{E_m}$, where
$E_m$ is the POVM associated to the measurement.
Exercise 2.64: Suppose Bob is given a quantum state chosen from a set $|\psi_1\rangle, \ldots, |\psi_m\rangle$
of linearly independent states. Construct a POVM $\{E_1, E_2, \ldots, E_{m+1}\}$ such that
if outcome $E_i$ occurs, $1 \le i \le m$, then Bob knows with certainty that he was
given the state $|\psi_i\rangle$. (The POVM must be such that $\langle\psi_i|E_i|\psi_i\rangle > 0$ for each $i$.)
2.2.7 Phase
‘Phase’ is a commonly used term in quantum mechanics, with several different meanings dependent upon context. At this point it is convenient to review a couple of these
meanings. Consider, for example, the state $e^{i\theta}|\psi\rangle$, where $|\psi\rangle$ is a state vector, and $\theta$ is a
real number. We say that the state $e^{i\theta}|\psi\rangle$ is equal to $|\psi\rangle$, up to the global phase factor
$e^{i\theta}$. It is interesting to note that the statistics of measurement predicted for these two
states are the same. To see this, suppose $M_m$ is a measurement operator associated to
some quantum measurement, and note that the respective probabilities for outcome $m$
occurring are $\langle\psi|M_m^\dagger M_m|\psi\rangle$ and $\langle\psi|e^{-i\theta} M_m^\dagger M_m e^{i\theta}|\psi\rangle = \langle\psi|M_m^\dagger M_m|\psi\rangle$. Therefore,
from an observational point of view these two states are identical. For this reason we may
ignore global phase factors as being irrelevant to the observed properties of the physical
system.
There is another kind of phase known as the relative phase, which has quite a different
meaning. Consider the states
\[ \frac{|0\rangle + |1\rangle}{\sqrt{2}} \quad\text{and}\quad \frac{|0\rangle - |1\rangle}{\sqrt{2}}. \tag{2.121} \]
In the first state the amplitude of $|1\rangle$ is $1/\sqrt{2}$. For the second state the amplitude is
$-1/\sqrt{2}$. In each case the magnitude of the amplitudes is the same, but they differ in
sign. More generally, we say that two amplitudes, $a$ and $b$, differ by a relative phase if
there is a real $\theta$ such that $a = \exp(i\theta)b$. More generally still, two states are said to differ
by a relative phase in some basis if each of the amplitudes in that basis is related by such
a phase factor. For example, the two states displayed above are the same up to a relative
phase shift because the $|0\rangle$ amplitudes are identical (a relative phase factor of 1), and
the $|1\rangle$ amplitudes differ only by a relative phase factor of $-1$. The difference between
relative phase factors and global phase factors is that for relative phase the phase factors
may vary from amplitude to amplitude. This makes the relative phase a basis-dependent
concept unlike global phase. As a result, states which differ only by relative phases in
some basis give rise to physically observable differences in measurement statistics, and
it is not possible to regard these states as physically equivalent, as we do with states
differing by a global phase factor.
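The contrast between global and relative phase can be seen in a small numerical experiment. This is a sketch assuming NumPy; the angle `theta`, the choice of measurement bases, and the helper `probs` are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)     # first state in (2.121)
minus = (ket0 - ket1) / np.sqrt(2)    # second state in (2.121)

def probs(state, basis):
    """Outcome probabilities |<b|state>|^2 for a projective measurement in `basis`."""
    return np.array([abs(np.vdot(b, state)) ** 2 for b in basis])

computational = [ket0, ket1]
rotated = [plus, minus]               # the basis {(|0>+|1>)/sqrt2, (|0>-|1>)/sqrt2}

theta = 0.7                           # arbitrary angle, chosen only for illustration
global_shifted = np.exp(1j * theta) * plus

# A global phase changes no measurement statistics.
same_under_global = np.allclose(probs(plus, rotated), probs(global_shifted, rotated))

# A relative phase is invisible in the computational basis ...
comp_plus, comp_minus = probs(plus, computational), probs(minus, computational)
# ... but fully visible in the rotated basis.
rot_plus, rot_minus = probs(plus, rotated), probs(minus, rotated)
```

Here the two states give identical statistics in the computational basis, yet a measurement in the rotated basis distinguishes them perfectly.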
Exercise 2.65: Express the states $(|0\rangle + |1\rangle)/\sqrt{2}$ and $(|0\rangle - |1\rangle)/\sqrt{2}$ in a basis in
which they are not the same up to a relative phase shift.
2.2.8 Composite systems
Suppose we are interested in a composite quantum system made up of two (or more)
distinct physical systems. How should we describe states of the composite system? The
following postulate describes how the state space of a composite system is built up from
the state spaces of the component systems.
Postulate 4: The state space of a composite physical system is the tensor product
of the state spaces of the component physical systems. Moreover, if we have
systems numbered 1 through $n$, and system number $i$ is prepared in the state
$|\psi_i\rangle$, then the joint state of the total system is $|\psi_1\rangle \otimes |\psi_2\rangle \otimes \cdots \otimes |\psi_n\rangle$.
Why is the tensor product the mathematical structure used to describe the state space of
a composite physical system? At one level, we can simply accept it as a basic postulate, not
reducible to something more elementary, and move on. After all, we certainly expect that
there be some canonical way of describing composite systems in quantum mechanics.
Is there some other way we can arrive at this postulate? Here is one heuristic that is
sometimes used. Physicists sometimes like to speak of the superposition principle of
quantum mechanics, which states that if $|x\rangle$ and $|y\rangle$ are two states of a quantum system,
then any superposition $\alpha|x\rangle + \beta|y\rangle$ should also be an allowed state of a quantum system,
where $|\alpha|^2 + |\beta|^2 = 1$. For composite systems, it seems natural that if $|A\rangle$ is a state of
system $A$, and $|B\rangle$ is a state of system $B$, then there should be some corresponding state,
which we might denote $|A\rangle|B\rangle$, of the joint system $AB$. Applying the superposition
principle to product states of this form, we arrive at the tensor product postulate given
above. This is not a derivation, since we are not taking the superposition principle as a
fundamental part of our description of quantum mechanics, but it gives you the flavor of
the various ways in which these ideas are sometimes reformulated.
A variety of different notations for composite systems appear in the literature. Part of
the reason for this proliferation is that different notations are better adapted for different
applications, and we will also find it convenient to introduce some specialized notations
on occasion. At this point it suffices to mention a useful subscript notation to denote
states and operators on different systems, when it is not clear from context. For example,
in a system containing three qubits, $X_2$ is the Pauli $\sigma_x$ operator acting on the second
qubit.
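In a matrix representation, the subscript notation corresponds to a tensor (Kronecker) product with identities on the untouched systems. The sketch below, assuming NumPy, builds $X_2$ for three qubits; the helper `basis_state` and the variable names are introduced only for this illustration.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

# X_2 on three qubits: identity on qubits 1 and 3, Pauli X on qubit 2.
X2 = np.kron(np.kron(I2, X), I2)

def basis_state(bits):
    """Computational basis state |b1 b2 ... bn> built as a tensor product of single-qubit kets."""
    state = np.array([1.0])
    for b in bits:
        state = np.kron(state, np.eye(2)[b])
    return state

flipped = X2 @ basis_state([0, 0, 0])   # flips only the second qubit
```

With this ordering convention (first qubit as the leftmost tensor factor), `flipped` equals the vector for $|010\rangle$.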
Exercise 2.66: Show that the average value of the observable $X_1 Z_2$ for a two qubit
system measured in the state $(|00\rangle + |11\rangle)/\sqrt{2}$ is zero.
In Section 2.2.5 we claimed that projective measurements together with unitary dynamics are sufficient to implement a general measurement. The proof of this statement
makes use of composite quantum systems, and is a nice illustration of Postulate 4 in
action. Suppose we have a quantum system with state space Q, and we want to perform a measurement described by measurement operators Mm on the system Q. To do
this, we introduce an ancilla system, with state space $M$, having an orthonormal basis
$|m\rangle$ in one-to-one correspondence with the possible outcomes of the measurement we
wish to implement. This ancilla system can be regarded as merely a mathematical device
appearing in the construction, or it can be interpreted physically as an extra quantum
system introduced into the problem, which we assume has a state space with the required
properties.
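Letting $|0\rangle$ be any fixed state of $M$, define an operator $U$ on products $|\psi\rangle|0\rangle$ of states
$|\psi\rangle$ from $Q$ with the state $|0\rangle$ by
\[ U|\psi\rangle|0\rangle \equiv \sum_m M_m|\psi\rangle|m\rangle. \tag{2.122} \]
Using the orthonormality of the states $|m\rangle$ and the completeness relation
$\sum_m M_m^\dagger M_m = I$, we can see that $U$ preserves inner products between states of the form $|\psi\rangle|0\rangle$,
\begin{align}
\langle\varphi|\langle 0|U^\dagger U|\psi\rangle|0\rangle &= \sum_{m,m'} \langle\varphi|M_m^\dagger M_{m'}|\psi\rangle \langle m|m'\rangle \tag{2.123} \\
&= \sum_m \langle\varphi|M_m^\dagger M_m|\psi\rangle \tag{2.124} \\
&= \langle\varphi|\psi\rangle. \tag{2.125}
\end{align}
By the results of Exercise 2.67 it follows that $U$ can be extended to a unitary operator on
the space $Q \otimes M$, which we also denote by $U$.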
Exercise 2.67: Suppose $V$ is a Hilbert space with a subspace $W$. Suppose
$U : W \to V$ is a linear operator which preserves inner products, that is, for any
$|w_1\rangle$ and $|w_2\rangle$ in $W$,
\[ \langle w_1|U^\dagger U|w_2\rangle = \langle w_1|w_2\rangle. \tag{2.126} \]
Prove that there exists a unitary operator $U' : V \to V$ which extends $U$. That is,
$U'|w\rangle = U|w\rangle$ for all $|w\rangle$ in $W$, but $U'$ is defined on the entire space $V$. Usually
we omit the prime symbol $'$ and just write $U$ to denote the extension.
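Next, suppose we perform a projective measurement on the two systems described by
projectors $P_m \equiv I_Q \otimes |m\rangle\langle m|$. Outcome $m$ occurs with probability
\begin{align}
p(m) &= \langle\psi|\langle 0|U^\dagger P_m U|\psi\rangle|0\rangle \tag{2.127} \\
&= \sum_{m',m''} \langle\psi|M_{m'}^\dagger \langle m'|(I_Q \otimes |m\rangle\langle m|) M_{m''}|\psi\rangle|m''\rangle \tag{2.128} \\
&= \langle\psi|M_m^\dagger M_m|\psi\rangle, \tag{2.129}
\end{align}
just as given in Postulate 3. The joint state of the system $QM$ after measurement,
conditional on result $m$ occurring, is given by
\[ \frac{P_m U|\psi\rangle|0\rangle}{\sqrt{\langle\psi|U^\dagger P_m U|\psi\rangle}} = \frac{M_m|\psi\rangle|m\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}}. \tag{2.130} \]
It follows that the state of system $M$ after the measurement is $|m\rangle$, and the state of
system $Q$ is
\[ \frac{M_m|\psi\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}}, \tag{2.131} \]
just as prescribed by Postulate 3. Thus unitary dynamics, projective measurements, and
the ability to introduce ancillary systems, together allow any measurement of the form
described in Postulate 3 to be realized.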
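The ancilla construction can be traced numerically on a small case. The sketch below, assuming NumPy, uses two hypothetical non-projective measurement operators `M0` and `M1` on one qubit (chosen only because they satisfy the completeness relation) and builds the map $|\psi\rangle \mapsto \sum_m M_m|\psi\rangle|m\rangle$ as a rectangular matrix `V`. It checks only the action of $U$ on inputs of the form $|\psi\rangle|0\rangle$ and does not construct the full unitary extension.

```python
import numpy as np

# Hypothetical measurement operators with M0†M0 + M1†M1 = I (not from the text).
M0 = np.array([[1, 0], [0, np.sqrt(0.5)]], dtype=complex)
M1 = np.array([[0, 0], [0, np.sqrt(0.5)]], dtype=complex)
ops = [M0, M1]

dim_q, n_out = 2, len(ops)
anc = np.eye(n_out)                       # ancilla basis vectors |m>

# V|psi> = sum_m (M_m|psi>) ⊗ |m>, i.e. U restricted to inputs |psi>|0>  (2.122)
V = sum(np.kron(M, anc[m].reshape(-1, 1)) for m, M in enumerate(ops))

psi = np.array([0.6, 0.8], dtype=complex)
joint = V @ psi

# Inner products are preserved exactly when V†V = I  (2.123)-(2.125)
isometry_ok = np.allclose(V.conj().T @ V, np.eye(dim_q))

probs_ancilla, probs_direct, post_states_match = [], [], True
for m, M in enumerate(ops):
    P = np.kron(np.eye(dim_q), np.outer(anc[m], anc[m]))           # P_m = I_Q ⊗ |m><m|
    p = np.vdot(joint, P @ joint).real                             # (2.127)
    probs_ancilla.append(p)
    probs_direct.append(np.vdot(psi, M.conj().T @ M @ psi).real)   # (2.129)
    post = P @ joint / np.sqrt(p)                                  # left side of (2.130)
    expected = np.kron(M @ psi, anc[m]) / np.sqrt(p)               # right side of (2.130)
    post_states_match = post_states_match and bool(np.allclose(post, expected))
```

In this example the projective measurement on the ancilla reproduces the probabilities and post-measurement states that Postulate 3 assigns to the operators $M_m$.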
Postulate 4 also enables us to define one of the most interesting and puzzling ideas
associated with composite quantum systems – entanglement. Consider the two qubit state
\[ |\psi\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}. \tag{2.132} \]
This state has the remarkable property that there are no single qubit states $|a\rangle$ and $|b\rangle$
such that $|\psi\rangle = |a\rangle|b\rangle$, a fact which you should now convince yourself of:
Exercise 2.68: Prove that $|\psi\rangle \neq |a\rangle|b\rangle$ for all single qubit states $|a\rangle$ and $|b\rangle$.
We say that a state of a composite system having this property (that it can’t be written
as a product of states of its component systems) is an entangled state. For reasons which
nobody fully understands, entangled states play a crucial role in quantum computation
and quantum information, and arise repeatedly through the remainder of this book. We
have already seen entanglement play a crucial role in quantum teleportation, as described
in Section 1.3.7. In this chapter we give two examples of the strange effects enabled by
entangled quantum states, superdense coding (Section 2.3), and the violation of Bell’s
inequality (Section 2.6).
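One way to test numerically whether a two qubit pure state is a product state is to arrange its four amplitudes $c_{ij}$ into a $2 \times 2$ matrix: a product state $|a\rangle|b\rangle$ has $c_{ij} = a_i b_j$, which is a rank-one matrix, while an entangled state has rank two. The following sketch assumes NumPy; the function name `is_product_state` and the tolerance are illustrative choices, and the rank test is a standard linear-algebra criterion rather than the method used in the text.

```python
import numpy as np

def is_product_state(state, tol=1e-12):
    """True if the two-qubit pure state sum_ij c_ij |i>|j> has a rank-one coefficient matrix."""
    c = np.asarray(state, dtype=complex).reshape(2, 2)
    singular_values = np.linalg.svd(c, compute_uv=False)
    return int(np.sum(singular_values > tol)) == 1

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)                    # (|00> + |11>)/sqrt(2)
product = np.kron([1, 0], np.array([1, 1]) / np.sqrt(2))      # |0> ⊗ (|0> + |1>)/sqrt(2)

bell_is_product = is_product_state(bell)
product_is_product = is_product_state(product)
```

For the state (2.132) the coefficient matrix is proportional to the identity, so the check reports that it is not a product state.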
2.2.9 Quantum mechanics: a global view
We have now explained all the fundamental postulates of quantum mechanics. Most of
the rest of the book is taken up with deriving consequences of these postulates. Let’s
quickly review the postulates and try to place them in some kind of global perspective.
Postulate 1 sets the arena for quantum mechanics, by specifying how the state of an
isolated quantum system is to be described. Postulate 2 tells us that the dynamics of
closed quantum systems are described by the Schrödinger equation, and thus by unitary
evolution. Postulate 3 tells us how to extract information from our quantum systems by
giving a prescription for the description of measurement. Postulate 4 tells us how the
state spaces of different quantum systems may be combined to give a description of the
composite system.
What’s odd about quantum mechanics, at least by our classical lights, is that we can’t
directly observe the state vector. It’s a little bit like a game of chess where you can
never find out exactly where each piece is, but only know the rank of the board they
are on. Classical physics – and our intuition – tells us that the fundamental properties
of an object, like energy, position, and velocity, are directly accessible to observation. In
quantum mechanics these quantities no longer appear as fundamental, being replaced by
the state vector, which can’t be directly observed. It is as though there is a hidden world
in quantum mechanics, which we can only indirectly and imperfectly access. Moreover,
merely observing a classical system does not necessarily change the state of the system.
Imagine how difficult it would be to play tennis if each time you looked at the ball its
position changed! But according to Postulate 3, observation in quantum mechanics is an
invasive procedure that typically changes the state of the system.
What conclusions should we draw from these strange features of quantum mechanics?
Might it be possible to reformulate quantum mechanics in a mathematically equivalent
way so that it had a structure more like classical physics? In Section 2.6 we’ll prove
Bell’s inequality, a surprising result that shows any attempt at such a reformulation is
doomed to failure. We’re stuck with the counter-intuitive nature of quantum mechanics.
Of course, the proper reaction to this is glee, not sorrow! It gives us an opportunity
to develop tools of thought that make quantum mechanics intuitive. Moreover, we can
exploit the hidden nature of the state vector to do information processing tasks beyond
what is possible in the classical world. Without this counter-intuitive behavior, quantum
computation and quantum information would be a lot less interesting.
We can also turn this discussion about, and ask ourselves: ‘If quantum mechanics is
so different from classical physics, then how come the everyday world looks so classical?’
Why do we see no evidence of a hidden state vector in our everyday lives? It turns out
that the classical world we see can be derived from quantum mechanics as an approximate
description of the world that will be valid on the sort of time, length and mass scales
we commonly encounter in our everyday lives. Explaining the details of how quantum
mechanics gives rise to classical physics is beyond the scope of this book, but the interested
reader should check out the discussion of this topic in ‘History and further reading’ at
the end of Chapter 8.
2.3 Application: superdense coding
Superdense coding is a simple yet surprising application of elementary quantum mechanics. It combines in a concrete, non-trivial way all the basic ideas of elementary quantum
mechanics, as covered in the previous sections, and is therefore an ideal example of the
information processing tasks that can be accomplished using quantum mechanics.
Superdense coding involves two parties, conventionally known as ‘Alice’ and ‘Bob’,
who are a long way away from one another. Their goal is to transmit some classical
information from Alice to Bob. Suppose Alice is in possession of two classical bits of
information which she wishes to send Bob, but is only allowed to send a single qubit to
Bob. Can she achieve her goal?
Superdense coding tells us that the answer to this question is yes. Suppose Alice and
Bob initially share a pair of qubits in the entangled state
\[ |\psi\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}. \tag{2.133} \]
Alice is initially in possession of the first qubit, while Bob has possession of the second
qubit, as illustrated in Figure 2.3. Note that $|\psi\rangle$ is a fixed state; there is no need for Alice
to have sent Bob any qubits in order to prepare this state. Instead, some third party may
prepare the entangled state ahead of time, sending one of the qubits to Alice, and the
other to Bob.
Figure 2.3. The initial setup for superdense coding, with Alice and Bob each in possession of one half of an
entangled pair of qubits. Alice can use superdense coding to transmit two classical bits of information to Bob, using
only a single qubit of communication and this preshared entanglement.
By sending the single qubit in her possession to Bob, it turns out that Alice can
communicate two bits of classical information to Bob. Here is the procedure she uses. If
she wishes to send the bit string ‘00’ to Bob then she does nothing at all to her qubit. If
she wishes to send ‘01’ then she applies the phase flip $Z$ to her qubit. If she wishes to
send ‘10’ then she applies the quantum NOT gate, $X$, to her qubit. If she wishes to send
‘11’ then she applies the $iY$ gate to her qubit. The four resulting states are easily seen
to be:
\begin{align}
00 &: |\psi\rangle \to \frac{|00\rangle + |11\rangle}{\sqrt{2}} \tag{2.134} \\
01 &: |\psi\rangle \to \frac{|00\rangle - |11\rangle}{\sqrt{2}} \tag{2.135} \\
10 &: |\psi\rangle \to \frac{|10\rangle + |01\rangle}{\sqrt{2}} \tag{2.136} \\
11 &: |\psi\rangle \to \frac{|01\rangle - |10\rangle}{\sqrt{2}}. \tag{2.137}
\end{align}
As we noted in Section 1.3.6, these four states are known as the Bell basis, Bell states,
or EPR pairs, in honor of several of the pioneers who first appreciated the novelty of
entanglement. Notice that the Bell states form an orthonormal basis, and can therefore
be distinguished by an appropriate quantum measurement. If Alice sends her qubit to
Bob, giving Bob possession of both qubits, then by doing a measurement in the Bell basis
Bob can determine which of the four possible bit strings Alice sent.
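The whole protocol fits in a few lines of linear algebra. The sketch below, assuming NumPy, encodes each two-bit string by acting on Alice's (first) qubit only and decodes by computing overlaps with the four Bell states; the names `encoders`, `send` and `decode` are hypothetical helpers for this illustration.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
iY = np.array([[0, 1], [-1, 0]], dtype=complex)    # i times the Pauli Y matrix

encoders = {"00": I2, "01": Z, "10": X, "11": iY}

s = 1 / np.sqrt(2)
# Bell states (2.134)-(2.137), amplitudes ordered as |00>, |01>, |10>, |11>.
bell_basis = {
    "00": s * np.array([1, 0, 0, 1], dtype=complex),
    "01": s * np.array([1, 0, 0, -1], dtype=complex),
    "10": s * np.array([0, 1, 1, 0], dtype=complex),
    "11": s * np.array([0, 1, -1, 0], dtype=complex),
}
shared = bell_basis["00"]                          # the preshared state (2.133)

def send(bits):
    """Alice applies her gate to the first qubit only, then hands it to Bob."""
    return np.kron(encoders[bits], I2) @ shared

def decode(state):
    """Bob's Bell-basis measurement: probability of each of the four outcomes."""
    return {b: abs(np.vdot(v, state)) ** 2 for b, v in bell_basis.items()}

decoded = {}
for bits in encoders:
    outcome_probs = decode(send(bits))
    decoded[bits] = max(outcome_probs, key=outcome_probs.get)
```

Because the four encoded states are orthonormal, each Bell measurement outcome occurs with probability one for exactly one message, so `decoded` maps every bit string to itself.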
Summarizing, Alice, interacting with only a single qubit, is able to transmit two bits
of information to Bob. Of course, two qubits are involved in the protocol, but Alice
never need interact with the second qubit. Classically, the task Alice accomplishes would
have been impossible had she only transmitted a single classical bit, as we will show
in Chapter 12. Furthermore, this remarkable superdense coding protocol has received
partial verification in the laboratory. (See ‘History and further reading’ for references to
the experimental verification.) In later chapters we will see many other examples, some
of them much more spectacular than superdense coding, of quantum mechanics being
harnessed to perform information processing tasks. However, a key point can already be
seen in this beautiful example: information is physical, and surprising physical theories
such as quantum mechanics may predict surprising information processing abilities.
Exercise 2.69: Verify that the Bell basis forms an orthonormal basis for the two qubit
state space.
Exercise 2.70: Suppose $E$ is any positive operator acting on Alice’s qubit. Show that
$\langle\psi|E \otimes I|\psi\rangle$ takes the same value when $|\psi\rangle$ is any of the four Bell states.
Suppose some malevolent third party (‘Eve’) intercepts Alice’s qubit on the way
to Bob in the superdense coding protocol. Can Eve infer anything about which
of the four possible bit strings 00, 01, 10, 11 Alice is trying to send? If so, how, or
if not, why not?
2.4 The density operator
We have formulated quantum mechanics using the language of state vectors. An alternate
formulation is possible using a tool known as the density operator or density matrix.
This alternate formulation is mathematically equivalent to the state vector approach,
but it provides a much more convenient language for thinking about some commonly
encountered scenarios in quantum mechanics. The next three sections describe the density
operator formulation of quantum mechanics. Section 2.4.1 introduces the density operator
using the concept of an ensemble of quantum states. Section 2.4.2 develops some general
properties of the density operator. Finally, Section 2.4.3 describes an application where
the density operator really shines – as a tool for the description of individual subsystems
of a composite quantum system.
2.4.1 Ensembles of quantum states
The density operator language provides a convenient means for describing quantum
systems whose state is not completely known. More precisely, suppose a quantum system
is in one of a number of states $|\psi_i\rangle$, where $i$ is an index, with respective probabilities $p_i$.
We shall call $\{p_i, |\psi_i\rangle\}$ an ensemble of pure states. The density operator for the system
is defined by the equation
\[ \rho \equiv \sum_i p_i |\psi_i\rangle\langle\psi_i|. \tag{2.138} \]
The density operator is often known as the density matrix; we will use the two terms
interchangeably. It turns out that all the postulates of quantum mechanics can be reformulated in terms of the density operator language. The purpose of this section and
the next is to explain how to perform this reformulation, and explain when it is useful.
Whether one uses the density operator language or the state vector language is a matter of
taste, since both give the same results; however it is sometimes much easier to approach
problems from one point of view rather than the other.
Suppose, for example, that the evolution of a closed quantum system is described by
the unitary operator $U$. If the system was initially in the state $|\psi_i\rangle$ with probability $p_i$ then
after the evolution has occurred the system will be in the state $U|\psi_i\rangle$ with probability
$p_i$. Thus, the evolution of the density operator is described by the equation
\[ \rho = \sum_i p_i |\psi_i\rangle\langle\psi_i| \xrightarrow{\,U\,} \sum_i p_i U|\psi_i\rangle\langle\psi_i|U^\dagger = U\rho U^\dagger. \tag{2.139} \]
Measurements are also easily described in the density operator language. Suppose we
perform a measurement described by measurement operators $M_m$. If the initial state was
$|\psi_i\rangle$, then the probability of getting result $m$ is
\[ p(m|i) = \langle\psi_i|M_m^\dagger M_m|\psi_i\rangle = \mathrm{tr}(M_m^\dagger M_m|\psi_i\rangle\langle\psi_i|), \tag{2.140} \]
where we have used Equation (2.61) to obtain the last equality. By the law of total
probability (see Appendix 1 for an explanation of this and other elementary notions of
probability theory) the probability of obtaining result $m$ is
\begin{align}
p(m) &= \sum_i p(m|i)\,p_i \tag{2.141} \\
&= \sum_i p_i\, \mathrm{tr}(M_m^\dagger M_m|\psi_i\rangle\langle\psi_i|) \tag{2.142} \\
&= \mathrm{tr}(M_m^\dagger M_m \rho). \tag{2.143}
\end{align}
What is the density operator of the system after obtaining the measurement result $m$? If
the initial state was $|\psi_i\rangle$ then the state after obtaining the result $m$ is
\[ |\psi_i^m\rangle = \frac{M_m|\psi_i\rangle}{\sqrt{\langle\psi_i|M_m^\dagger M_m|\psi_i\rangle}}. \tag{2.144} \]
Thus, after a measurement which yields the result $m$ we have an ensemble of states $|\psi_i^m\rangle$
with respective probabilities $p(i|m)$. The corresponding density operator $\rho_m$ is therefore
\[ \rho_m = \sum_i p(i|m)\,|\psi_i^m\rangle\langle\psi_i^m| = \sum_i p(i|m)\, \frac{M_m|\psi_i\rangle\langle\psi_i|M_m^\dagger}{\langle\psi_i|M_m^\dagger M_m|\psi_i\rangle}. \tag{2.145} \]
But by elementary probability theory, $p(i|m) = p(m, i)/p(m) = p(m|i)p_i/p(m)$. Substituting from (2.143) and (2.140) we obtain
\begin{align}
\rho_m &= \sum_i p_i\, \frac{M_m|\psi_i\rangle\langle\psi_i|M_m^\dagger}{\mathrm{tr}(M_m^\dagger M_m \rho)} \tag{2.146} \\
&= \frac{M_m \rho M_m^\dagger}{\mathrm{tr}(M_m^\dagger M_m \rho)}. \tag{2.147}
\end{align}
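These formulas can be compared against the ensemble picture directly. The sketch below, assuming NumPy, uses a hypothetical two-state ensemble and a computational-basis measurement; the helper names `prob` and `post_state` are introduced for the example only.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
ensemble = [(0.25, ket0), (0.75, plus)]            # hypothetical {p_i, |psi_i>}

rho = sum(p * np.outer(v, v.conj()) for p, v in ensemble)      # (2.138)

# Measurement operators: projectors onto |0> and |1> (illustrative choice).
Ms = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

def prob(M, rho):
    """p(m) = tr(M† M rho), Equation (2.143)."""
    return np.trace(M.conj().T @ M @ rho).real

def post_state(M, rho):
    """rho_m = M rho M† / tr(M† M rho), Equation (2.147)."""
    return M @ rho @ M.conj().T / prob(M, rho)

m = 0
p_from_rho = prob(Ms[m], rho)
# Law of total probability over the ensemble, Equation (2.141).
p_from_ensemble = sum(p * np.vdot(v, Ms[m].conj().T @ Ms[m] @ v).real for p, v in ensemble)
rho_m = post_state(Ms[m], rho)
```

Both routes give the same probability (0.625 for this ensemble), and the post-measurement density operator again has trace one.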
What we have shown is that the basic postulates of quantum mechanics related to
unitary evolution and measurement can be rephrased in the language of density operators.
In the next section we complete this rephrasing by giving an intrinsic characterization of
the density operator that does not rely on the idea of a state vector.
Before doing so, however, it is useful to introduce some more language, and one more
fact about the density operator. First, the language. A quantum system whose state $|\psi\rangle$
is known exactly is said to be in a pure state. In this case the density operator is simply
$\rho = |\psi\rangle\langle\psi|$. Otherwise, $\rho$ is in a mixed state; it is said to be a mixture of the different
pure states in the ensemble for $\rho$. In the exercises you will be asked to demonstrate a
simple criterion for determining whether a state is pure or mixed: a pure state satisfies
$\mathrm{tr}(\rho^2) = 1$, while a mixed state satisfies $\mathrm{tr}(\rho^2) < 1$. A few words of warning about the
nomenclature: sometimes people use the term ‘mixed state’ as a catch-all to include both
pure and mixed quantum states. The origin for this usage seems to be that it implies that
the writer is not necessarily assuming that a state is pure. Second, the term ‘pure state’
is often used in reference to a state vector $|\psi\rangle$, to distinguish it from a density operator
$\rho$.
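The purity criterion is easy to evaluate for explicit matrices. This is a minimal sketch assuming NumPy; the function name `purity` and the two example states are illustrative.

```python
import numpy as np

def purity(rho):
    """tr(rho^2): equal to 1 for a pure state, strictly less than 1 for a mixed state."""
    return np.trace(rho @ rho).real

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_pure = np.outer(plus, plus.conj())                 # |+><+|
rho_mixed = np.diag([0.75, 0.25]).astype(complex)      # 3/4 |0><0| + 1/4 |1><1|

purity_pure = purity(rho_pure)
purity_mixed = purity(rho_mixed)
```

For these two examples the purity is 1 and $0.75^2 + 0.25^2 = 0.625$ respectively.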
Finally, imagine a quantum system is prepared in the state $\rho_i$ with probability $p_i$. It is
not difficult to convince yourself that the system may be described by the density matrix
$\sum_i p_i \rho_i$. A proof of this is to suppose that $\rho_i$ arises from some ensemble $\{p_{ij}, |\psi_{ij}\rangle\}$
(note that $i$ is fixed) of pure states, so the probability for being in the state $|\psi_{ij}\rangle$ is $p_i p_{ij}$.
The density matrix for the system is thus
\begin{align}
\rho &= \sum_{ij} p_i p_{ij} |\psi_{ij}\rangle\langle\psi_{ij}| \tag{2.148} \\
&= \sum_i p_i \rho_i, \tag{2.149}
\end{align}
where we have used the definition $\rho_i = \sum_j p_{ij} |\psi_{ij}\rangle\langle\psi_{ij}|$. We say that $\rho$ is a mixture
of the states $\rho_i$ with probabilities $p_i$. This concept of a mixture comes up repeatedly in
the analysis of problems like quantum noise, where the effect of the noise is to introduce
ignorance into our knowledge of the quantum state. A simple example is provided by the
measurement scenario described above. Imagine that, for some reason, our record of the
result $m$ of the measurement was lost. We would have a quantum system in the state
$\rho_m$ with probability $p(m)$, but would no longer know the actual value of $m$. The state of
such a quantum system would therefore be described by the density operator
\begin{align}
\rho &= \sum_m p(m)\rho_m \tag{2.150} \\
&= \sum_m \mathrm{tr}(M_m^\dagger M_m \rho)\, \frac{M_m \rho M_m^\dagger}{\mathrm{tr}(M_m^\dagger M_m \rho)} \tag{2.151} \\
&= \sum_m M_m \rho M_m^\dagger, \tag{2.152}
\end{align}
a nice compact formula which may be used as the starting point for analysis of further
operations on the system.
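As a concrete instance of (2.152), consider measuring the pure state $(|0\rangle + |1\rangle)/\sqrt{2}$ in the computational basis and then discarding the record of the result. The sketch assumes NumPy, and the choice of state and measurement is illustrative.

```python
import numpy as np

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())                  # pure state before the measurement

# Projective measurement in the computational basis (illustrative choice).
Ms = [np.diag([1, 0]).astype(complex), np.diag([0, 1]).astype(complex)]

# State after the measurement when the outcome is unknown, Equation (2.152).
rho_after = sum(M @ rho @ M.conj().T for M in Ms)
```

The off-diagonal entries of the density matrix are removed, leaving the mixed state $I/2$: losing the record has turned a pure state into a mixture.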
2.4.2 General properties of the density operator
The density operator was introduced as a means of describing ensembles of quantum
states. In this section we move away from this description to develop an intrinsic characterization of density operators that does not rely on an ensemble interpretation. This
allows us to complete the program of giving a description of quantum mechanics that
does not take as its foundation the state vector. We also take the opportunity to develop
numerous other elementary properties of the density operator.
The class of operators that are density operators are characterized by the following
useful theorem:
Theorem 2.5: (Characterization of density operators) An operator $\rho$ is the density
operator associated to some ensemble $\{p_i, |\psi_i\rangle\}$ if and only if it satisfies the
conditions:
(1) (Trace condition) $\rho$ has trace equal to one.
(2) (Positivity condition) $\rho$ is a positive operator.
Proof
Suppose $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ is a density operator. Then
\[ \mathrm{tr}(\rho) = \sum_i p_i\, \mathrm{tr}(|\psi_i\rangle\langle\psi_i|) = \sum_i p_i = 1, \tag{2.153} \]
so the trace condition $\mathrm{tr}(\rho) = 1$ is satisfied. Suppose $|\varphi\rangle$ is an arbitrary vector in state
space. Then
\begin{align}
\langle\varphi|\rho|\varphi\rangle &= \sum_i p_i \langle\varphi|\psi_i\rangle\langle\psi_i|\varphi\rangle \tag{2.154} \\
&= \sum_i p_i |\langle\varphi|\psi_i\rangle|^2 \tag{2.155} \\
&\geq 0, \tag{2.156}
\end{align}
so the positivity condition is satisfied.
Conversely, suppose $\rho$ is any operator satisfying the trace and positivity conditions.
Since $\rho$ is positive, it must have a spectral decomposition
\[ \rho = \sum_j \lambda_j |j\rangle\langle j|, \tag{2.157} \]
where the vectors $|j\rangle$ are orthogonal, and $\lambda_j$ are real, non-negative eigenvalues of $\rho$.
From the trace condition we see that $\sum_j \lambda_j = 1$. Therefore, a system in state $|j\rangle$ with
probability $\lambda_j$ will have density operator $\rho$. That is, the ensemble $\{\lambda_j, |j\rangle\}$ is an ensemble
of states giving rise to the density operator $\rho$.
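Both directions of the theorem translate into short numerical checks. The sketch below assumes NumPy; `is_density_operator` and `spectral_ensemble` are hypothetical helper names, and the tolerance is an arbitrary choice for floating-point comparisons.

```python
import numpy as np

def is_density_operator(rho, tol=1e-12):
    """Check the trace and positivity conditions of Theorem 2.5."""
    rho = np.asarray(rho, dtype=complex)
    hermitian = np.allclose(rho, rho.conj().T)     # positive operators are Hermitian
    trace_one = abs(np.trace(rho) - 1) < tol
    positive = hermitian and np.linalg.eigvalsh(rho).min() > -tol
    return bool(trace_one and positive)

def spectral_ensemble(rho):
    """Ensemble {lambda_j, |j>} read off from the spectral decomposition (2.157)."""
    eigenvalues, eigenvectors = np.linalg.eigh(rho)
    return [(lam, eigenvectors[:, j]) for j, lam in enumerate(eigenvalues)]

rho = np.array([[0.625, 0.375], [0.375, 0.375]], dtype=complex)
rebuilt = sum(lam * np.outer(v, v.conj()) for lam, v in spectral_ensemble(rho))

# Trace one but a negative eigenvalue, so not a density operator.
not_positive = np.array([[1.5, 0.0], [0.0, -0.5]])
```

Here `rebuilt` reproduces `rho`, illustrating the converse direction: the eigenvalues and eigenvectors supply one ensemble that gives rise to the operator.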
This theorem provides a characterization of density operators that is intrinsic to the
operator itself: we can define a density operator to be a positive operator ρ which has
trace equal to one. Making this definition allows us to reformulate the postulates of
quantum mechanics in the density operator picture. For ease of reference we state all the
reformulated postulates here:
Postulate 1: Associated to any isolated physical system is a complex vector space
with inner product (that is, a Hilbert space) known as the state space of the
system. The system is completely described by its density operator, which is a
positive operator ρ with trace one, acting on the state space of the system. If a
quantum system is in the state $\rho_i$ with probability $p_i$, then the density operator for
the system is $\sum_i p_i \rho_i$.
Postulate 2: The evolution of a closed quantum system is described by a unitary
transformation. That is, the state $\rho$ of the system at time $t_1$ is related to the state
$\rho'$ of the system at time $t_2$ by a unitary operator $U$ which depends only on the
times $t_1$ and $t_2$,
\[ \rho' = U \rho U^\dagger. \tag{2.158} \]
Postulate 3: Quantum measurements are described by a collection $\{M_m\}$ of
measurement operators. These are operators acting on the state space of the
system being measured. The index $m$ refers to the measurement outcomes that
may occur in the experiment. If the state of the quantum system is $\rho$ immediately
before the measurement then the probability that result $m$ occurs is given by
\[ p(m) = \mathrm{tr}(M_m^\dagger M_m \rho), \tag{2.159} \]
and the state of the system after the measurement is
\[ \frac{M_m \rho M_m^\dagger}{\mathrm{tr}(M_m^\dagger M_m \rho)}. \tag{2.160} \]
The measurement operators satisfy the completeness equation,
\[ \sum_m M_m^\dagger M_m = I. \tag{2.161} \]
Postulate 4: The state space of a composite physical system is the tensor product
of the state spaces of the component physical systems. Moreover, if we have
systems numbered 1 through $n$, and system number $i$ is prepared in the state $\rho_i$,
then the joint state of the total system is $\rho_1 \otimes \rho_2 \otimes \cdots \otimes \rho_n$.
These reformulations of the fundamental postulates of quantum mechanics in terms of
the density operator are, of course, mathematically equivalent to the description in terms
of the state vector. Nevertheless, as a way of thinking about quantum mechanics, the
density operator approach really shines for two applications: the description of quantum
systems whose state is not known, and the description of subsystems of a composite
quantum system, as will be described in the next section. For the remainder of this
section we flesh out the properties of the density matrix in more detail.
Exercise 2.71: (Criterion to decide if a state is mixed or pure) Let $\rho$ be a
density operator. Show that $\mathrm{tr}(\rho^2) \le 1$, with equality if and only if $\rho$ is a pure
state.
It is a tempting (and surprisingly common) fallacy to suppose that the eigenvalues
and eigenvectors of a density matrix have some special significance with regard to the
ensemble of quantum states represented by that density matrix. For example, one might
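suppose that a quantum system with density matrix
\[ \rho = \frac{3}{4}|0\rangle\langle 0| + \frac{1}{4}|1\rangle\langle 1| \tag{2.162} \]
must be in the state $|0\rangle$ with probability 3/4 and in the state $|1\rangle$ with probability 1/4.
However, this is not necessarily the case. Suppose we define
\begin{align}
|a\rangle &\equiv \sqrt{\frac{3}{4}}\,|0\rangle + \sqrt{\frac{1}{4}}\,|1\rangle \tag{2.163} \\
|b\rangle &\equiv \sqrt{\frac{3}{4}}\,|0\rangle - \sqrt{\frac{1}{4}}\,|1\rangle, \tag{2.164}
\end{align}
and the quantum system is prepared in the state $|a\rangle$ with probability 1/2 and in the state
$|b\rangle$ with probability 1/2. Then it is easily checked that the corresponding density matrix
is
\[ \rho = \frac{1}{2}|a\rangle\langle a| + \frac{1}{2}|b\rangle\langle b| = \frac{3}{4}|0\rangle\langle 0| + \frac{1}{4}|1\rangle\langle 1|. \tag{2.165} \]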
That is, these two different ensembles of quantum states give rise to the same density
matrix. In general, the eigenvectors and eigenvalues of a density matrix just indicate one
of many possible ensembles that may give rise to a specific density matrix, and there is
no reason to suppose it is an especially privileged ensemble.
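The claim that the two ensembles give the same density matrix can be confirmed in a few lines. The sketch assumes NumPy; the helper name `density` is introduced only for this example.

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
a = np.sqrt(3 / 4) * ket0 + np.sqrt(1 / 4) * ket1      # (2.163)
b = np.sqrt(3 / 4) * ket0 - np.sqrt(1 / 4) * ket1      # (2.164)

def density(ensemble):
    """rho = sum_i p_i |psi_i><psi_i| for a list of (p_i, |psi_i>) pairs."""
    return sum(p * np.outer(v, np.conj(v)) for p, v in ensemble)

rho_eigen = density([(0.75, ket0), (0.25, ket1)])      # ensemble suggested by (2.162)
rho_ab = density([(0.5, a), (0.5, b)])                 # ensemble of (2.165)
```

The cross terms $|0\rangle\langle 1|$ and $|1\rangle\langle 0|$ from $|a\rangle\langle a|$ and $|b\rangle\langle b|$ cancel, so both ensembles yield the matrix $\mathrm{diag}(3/4, 1/4)$.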
A natural question to ask in the light of this discussion is what class of ensembles does
give rise to a particular density matrix? The solution to this problem, which we now give,
has surprisingly many applications in quantum computation and quantum information,
notably in the understanding of quantum noise and quantum error-correction (Chapters 8
and 10). For the solution it is convenient to make use of vectors $|\tilde\psi_i\rangle$ which may not be
normalized to unit length. We say the set $|\tilde\psi_i\rangle$ generates the operator $\rho \equiv \sum_i |\tilde\psi_i\rangle\langle\tilde\psi_i|$,
and thus the connection to the usual ensemble picture of density operators is expressed
by the equation $|\tilde\psi_i\rangle = \sqrt{p_i}\,|\psi_i\rangle$. When do two sets of vectors, $|\tilde\psi_i\rangle$ and $|\tilde\varphi_j\rangle$, generate the
same operator ρ? The solution to this problem will enable us to answer the question of
what ensembles give rise to a given density matrix.
Theorem 2.6: (Unitary freedom in the ensemble for density matrices) The sets
$|\tilde\psi_i\rangle$ and $|\tilde\varphi_j\rangle$ generate the same density matrix if and only if
\[ |\tilde\psi_i\rangle = \sum_j u_{ij} |\tilde\varphi_j\rangle , \tag{2.166} \]
where $u_{ij}$ is a unitary matrix of complex numbers, with indices i and j, and we
‘pad’ whichever set of vectors $|\tilde\psi_i\rangle$ or $|\tilde\varphi_j\rangle$ is smaller with additional vectors 0 so
that the two sets have the same number of elements.
As a consequence of the theorem, note that $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i| = \sum_j q_j |\varphi_j\rangle\langle\varphi_j|$ for
normalized states $|\psi_i\rangle$, $|\varphi_j\rangle$ and probability distributions $p_i$ and $q_j$ if and only if
\[ \sqrt{p_i}\,|\psi_i\rangle = \sum_j u_{ij} \sqrt{q_j}\,|\varphi_j\rangle , \tag{2.167} \]
for some unitary matrix $u_{ij}$, and we may pad the smaller ensemble with entries having
probability zero in order to make the two ensembles the same size. Thus, Theorem 2.6
characterizes the freedom in ensembles $\{p_i, |\psi_i\rangle\}$ giving rise to a given density matrix ρ.
Indeed, it is easily checked that our earlier example of a density matrix with two different
decompositions, (2.162), arises as a special case of this general result. Let’s turn now to
the proof of the theorem.
Proof
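Suppose $|\tilde\psi_i\rangle = \sum_j u_{ij} |\tilde\varphi_j\rangle$ for some unitary $u_{ij}$. Then
\begin{align}
\sum_i |\tilde\psi_i\rangle\langle\tilde\psi_i| &= \sum_{ijk} u_{ij} u^*_{ik} |\tilde\varphi_j\rangle\langle\tilde\varphi_k| \tag{2.168} \\
&= \sum_{jk} \Big( \sum_i u^\dagger_{ki} u_{ij} \Big) |\tilde\varphi_j\rangle\langle\tilde\varphi_k| \tag{2.169} \\
&= \sum_{jk} \delta_{kj} |\tilde\varphi_j\rangle\langle\tilde\varphi_k| \tag{2.170} \\
&= \sum_j |\tilde\varphi_j\rangle\langle\tilde\varphi_j| , \tag{2.171}
\end{align}
which shows that $|\tilde\psi_i\rangle$ and $|\tilde\varphi_j\rangle$ generate the same operator.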
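Conversely, suppose
\[ A = \sum_i |\tilde\psi_i\rangle\langle\tilde\psi_i| = \sum_j |\tilde\varphi_j\rangle\langle\tilde\varphi_j| . \tag{2.172} \]
Let $A = \sum_k \lambda_k |k\rangle\langle k|$ be a decomposition for A such that the states $|k\rangle$ are orthonormal,
and the $\lambda_k$ are strictly positive. Our strategy is to relate the states $|\tilde\psi_i\rangle$ to the states
$|\tilde k\rangle \equiv \sqrt{\lambda_k}\,|k\rangle$, and similarly relate the states $|\tilde\varphi_j\rangle$ to the states $|\tilde k\rangle$. Combining the two
relations will give the result. Let $|\psi\rangle$ be any vector orthonormal to the space spanned by
the $|\tilde k\rangle$, so $\langle\psi|\tilde k\rangle\langle\tilde k|\psi\rangle = 0$ for all k, and thus we see that
\[ 0 = \langle\psi|A|\psi\rangle = \sum_i \langle\psi|\tilde\psi_i\rangle\langle\tilde\psi_i|\psi\rangle = \sum_i |\langle\psi|\tilde\psi_i\rangle|^2 . \tag{2.173} \]
Thus $\langle\psi|\tilde\psi_i\rangle = 0$ for all i and all $|\psi\rangle$ orthonormal to the space spanned by the $|\tilde k\rangle$.
It follows that each $|\tilde\psi_i\rangle$ can be expressed as a linear combination of the $|\tilde k\rangle$, $|\tilde\psi_i\rangle = \sum_k c_{ik} |\tilde k\rangle$. Since $A = \sum_k |\tilde k\rangle\langle\tilde k| = \sum_i |\tilde\psi_i\rangle\langle\tilde\psi_i|$ we see that
\[ \sum_k |\tilde k\rangle\langle\tilde k| = \sum_{kl} \Big( \sum_i c_{ik} c^*_{il} \Big) |\tilde k\rangle\langle\tilde l| . \tag{2.174} \]
The operators $|\tilde k\rangle\langle\tilde l|$ are easily seen to be linearly independent, and thus it must be that
$\sum_i c_{ik} c^*_{il} = \delta_{kl}$. This ensures that we may append extra columns to c to obtain a unitary
matrix v such that $|\tilde\psi_i\rangle = \sum_k v_{ik} |\tilde k\rangle$, where we have appended zero vectors to the list
of $|\tilde k\rangle$. Similarly, we can find a unitary matrix w such that $|\tilde\varphi_j\rangle = \sum_k w_{jk} |\tilde k\rangle$. Thus
$|\tilde\psi_i\rangle = \sum_j u_{ij} |\tilde\varphi_j\rangle$, where $u = vw^\dagger$ is unitary.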
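As an illustration of Theorem 2.6, the two ensembles of (2.162) and (2.165) turn out to be related by a specific unitary. The sketch below assumes NumPy; the choice of the Hadamard matrix for $u_{ij}$ is an observation about this particular example, and the helper name `generated_operator` is illustrative.

```python
import numpy as np

# Unnormalized vectors |phi~_j> = sqrt(q_j)|phi_j> for the ensemble of (2.162).
# Each row of the array is one vector.
phi_tilde = np.array([
    [np.sqrt(0.75), 0.0],   # sqrt(3/4) |0>
    [0.0, np.sqrt(0.25)],   # sqrt(1/4) |1>
])

# Candidate unitary u_ij: the Hadamard matrix (specific to this example).
u = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

# |psi~_i> = sum_j u_ij |phi~_j>, as in Eq. (2.166).
psi_tilde = u @ phi_tilde

def generated_operator(vectors):
    """Return sum_i |v_i><v_i| for the rows v_i of `vectors`."""
    return sum(np.outer(v, v.conj()) for v in vectors)

rho_phi = generated_operator(phi_tilde)
rho_psi = generated_operator(psi_tilde)

# The rows of psi_tilde should equal sqrt(1/2)|a> and sqrt(1/2)|b>.
ket_a = np.array([np.sqrt(0.75), np.sqrt(0.25)])
ket_b = np.array([np.sqrt(0.75), -np.sqrt(0.25)])
```

Both sets generate the same operator, and the rows of `psi_tilde` reproduce the states $|a\rangle$ and $|b\rangle$ weighted by $\sqrt{1/2}$, in agreement with (2.167).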
Exercise 2.72: (Bloch sphere for mixed states) The Bloch sphere picture for pure
states of a single qubit was introduced in Section 1.2. This description has an
important generalization to mixed states as follows.
(1) Show that an arbitrary density matrix for a mixed state qubit may be written
as
\[ \rho = \frac{I + \vec r \cdot \vec\sigma}{2} , \tag{2.175} \]
where $\vec r$ is a real three-dimensional vector such that $\|\vec r\| \le 1$. This vector is
known as the Bloch vector for the state ρ.
(2) What is the Bloch vector representation for the state ρ = I/2?
(3) Show that a state ρ is pure if and only if $\|\vec r\| = 1$.
(4) Show that for pure states the description of the Bloch vector we have given
coincides with that in Section 1.2.
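A numerical experiment can help build intuition for the Bloch vector before attempting the proofs. The sketch below assumes NumPy and uses the standard relation $r_k = \mathrm{tr}(\rho\,\sigma_k)$, which is not stated in the exercise; the function names are illustrative.

```python
import numpy as np

# Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(rho):
    """Return r with components r_k = tr(rho sigma_k) (assumed relation)."""
    return np.real([np.trace(rho @ s) for s in (X, Y, Z)])

def from_bloch(r):
    """Return (I + r . sigma) / 2, the form of Eq. (2.175)."""
    return (np.eye(2) + r[0] * X + r[1] * Y + r[2] * Z) / 2

rho_mixed = np.diag([0.75, 0.25]).astype(complex)    # the state of Eq. (2.162)
r_mixed = bloch_vector(rho_mixed)                     # (0, 0, 0.5)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # pure state (|0> + |1>)/sqrt(2)
r_pure = bloch_vector(np.outer(plus, plus.conj()))    # (1, 0, 0)
```

In this example the mixed state has a Bloch vector of length 1/2, strictly inside the sphere, while the pure state sits on the surface.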
Exercise 2.73: Let ρ be a density operator. A minimal ensemble for ρ is an ensemble
$\{p_i, |\psi_i\rangle\}$ containing a number of elements equal to the rank of ρ. Let $|\psi\rangle$ be
any state in the support of ρ. (The support of a Hermitian operator A is the
vector space spanned by the eigenvectors of A with non-zero eigenvalues.) Show
that there is a minimal ensemble for ρ that contains $|\psi\rangle$, and moreover that in
any such ensemble $|\psi\rangle$ must appear with probability
\[ p_i = \frac{1}{\langle\psi_i|\rho^{-1}|\psi_i\rangle} , \tag{2.176} \]
where $\rho^{-1}$ is defined to be the inverse of ρ, when ρ is considered as an operator
acting only on the support of ρ. (This definition removes the problem that ρ may
not have an inverse.)
2.4.3 The reduced density operator
Perhaps the deepest application of the density operator is as a descriptive tool for subsystems of a composite quantum system. Such a description is provided by the reduced
density operator, which is the subject of this section. The reduced density operator is so
useful as to be virtually indispensable in the analysis of composite quantum systems.
Suppose we have physical systems A and B, whose state is described by a density
operator $\rho^{AB}$. The reduced density operator for system A is defined by
\[ \rho^A \equiv \mathrm{tr}_B(\rho^{AB}) , \tag{2.177} \]
where $\mathrm{tr}_B$ is a map of operators known as the partial trace over system B. The partial
trace is defined by
\[ \mathrm{tr}_B\big( |a_1\rangle\langle a_2| \otimes |b_1\rangle\langle b_2| \big) \equiv |a_1\rangle\langle a_2| \, \mathrm{tr}(|b_1\rangle\langle b_2|) , \tag{2.178} \]
where $|a_1\rangle$ and $|a_2\rangle$ are any two vectors in the state space of A, and $|b_1\rangle$ and $|b_2\rangle$ are any
two vectors in the state space of B. The trace operation appearing on the right hand side
is the usual trace operation for system B, so $\mathrm{tr}(|b_1\rangle\langle b_2|) = \langle b_2|b_1\rangle$. We have defined the
partial trace operation only on a special subclass of operators on AB; the specification is
completed by requiring in addition to Equation (2.178) that the partial trace be linear in
its input.
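One way to realize the partial trace on matrices is sketched below. It assumes NumPy and the usual Kronecker-product ordering of the composite basis (A index major, B index minor); the function name `partial_trace_B` is illustrative. The check compares the result with the right hand side of (2.178) for randomly chosen vectors.

```python
import numpy as np

def partial_trace_B(rho_AB, dim_A, dim_B):
    """Trace out system B from an operator on A (x) B.

    Reshaping gives a 4-index array rho[a, b, a', b']; summing the entries
    with b = b' is the linear extension of Eq. (2.178).
    """
    rho = rho_AB.reshape(dim_A, dim_B, dim_A, dim_B)
    return np.einsum("abcb->ac", rho)

# Random vectors |a1>, |a2> in a 2-dimensional A and |b1>, |b2> in a 3-dimensional B.
rng = np.random.default_rng(0)
a1, a2 = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
b1, b2 = rng.normal(size=(2, 3)) + 1j * rng.normal(size=(2, 3))

op_A = np.outer(a1, a2.conj())           # |a1><a2|
op_B = np.outer(b1, b2.conj())           # |b1><b2|

lhs = partial_trace_B(np.kron(op_A, op_B), 2, 3)
rhs = op_A * np.vdot(b2, b1)             # |a1><a2| <b2|b1>
```

Because every operator on AB is a linear combination of such elementary tensor products, agreement on them together with linearity fixes the map completely.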
It is not obvious that the reduced density operator for system A is in any sense a
description for the state of system A. The physical justification for making this identification is that the reduced density operator provides the correct measurement statistics for
measurements made on system A. This is explained in more detail in Box 2.6 on page 107.
The following simple example calculations may also help understand the reduced density
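operator. First, suppose a quantum system is in the product state $\rho^{AB} = \rho \otimes \sigma$, where
ρ is a density operator for system A, and σ is a density operator for system B. Then
\[ \rho^A = \mathrm{tr}_B(\rho \otimes \sigma) = \rho \, \mathrm{tr}(\sigma) = \rho , \tag{2.184} \]
which is the result we intuitively expect. Similarly, $\rho^B = \sigma$ for this state. A less trivial
example is the Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$. This has density operator
\begin{align}
\rho &= \left( \frac{|00\rangle + |11\rangle}{\sqrt{2}} \right) \left( \frac{\langle 00| + \langle 11|}{\sqrt{2}} \right) \tag{2.185} \\
&= \frac{|00\rangle\langle 00| + |11\rangle\langle 00| + |00\rangle\langle 11| + |11\rangle\langle 11|}{2} . \tag{2.186}
\end{align}
Tracing out the second qubit, we find the reduced density operator of the first qubit,
\begin{align}
\rho^1 &= \mathrm{tr}_2(\rho) \tag{2.187} \\
&= \frac{\mathrm{tr}_2(|00\rangle\langle 00|) + \mathrm{tr}_2(|11\rangle\langle 00|) + \mathrm{tr}_2(|00\rangle\langle 11|) + \mathrm{tr}_2(|11\rangle\langle 11|)}{2} \tag{2.188} \\
&= \frac{|0\rangle\langle 0| \langle 0|0\rangle + |1\rangle\langle 0| \langle 0|1\rangle + |0\rangle\langle 1| \langle 1|0\rangle + |1\rangle\langle 1| \langle 1|1\rangle}{2} \tag{2.189} \\
&= \frac{|0\rangle\langle 0| + |1\rangle\langle 1|}{2} \tag{2.190} \\
&= \frac{I}{2} . \tag{2.191}
\end{align}
Notice that this state is a mixed state, since $\mathrm{tr}((I/2)^2) = 1/2 < 1$. This is quite a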
remarkable result. The state of the joint system of two qubits is a pure state, that is,
it is known exactly; however, the first qubit is in a mixed state, that is, a state about
which we apparently do not have maximal knowledge. This strange property, that the
joint state of a system can be completely known, yet a subsystem can be in a mixed state, is
another hallmark of quantum entanglement.
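The Bell state calculation (2.185)-(2.191) can be reproduced in a few lines. This is a sketch assuming NumPy, with the basis ordered as $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the function name `partial_trace_second` is illustrative.

```python
import numpy as np

def partial_trace_second(rho, d1=2, d2=2):
    """Trace out the second subsystem of an operator on a (d1*d2)-dimensional space."""
    return np.trace(rho.reshape(d1, d2, d1, d2), axis1=1, axis2=3)

# Bell state (|00> + |11>)/sqrt(2).
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(bell, bell.conj())

rho1 = partial_trace_second(rho)             # reduced state of the first qubit
purity_joint = np.trace(rho @ rho).real      # tr(rho^2) for the joint state
purity_first = np.trace(rho1 @ rho1).real    # tr(rho1^2) for the first qubit
```

The joint state has $\mathrm{tr}(\rho^2) = 1$ while the reduced state has $\mathrm{tr}((\rho^1)^2) = 1/2$, matching the purity criterion of Exercise 2.71.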
Exercise 2.74: Suppose a composite of systems A and B is in the state $|a\rangle|b\rangle$, where
$|a\rangle$ is a pure state of system A, and $|b\rangle$ is a pure state of system B. Show that
the reduced density operator of system A alone is a pure state.
Exercise 2.75: For each of the four Bell states, find the reduced density operator for
each qubit.
Quantum teleportation and the reduced density operator
A useful application of the reduced density operator is to the analysis of quantum teleportation. Recall from Section 1.3.7 that quantum teleportation is a procedure for sending
quantum information from Alice to Bob, given that Alice and Bob share an EPR pair,
and have a classical communications channel.
Box 2.6: Why the partial trace?
Why is the partial trace used to describe part of a larger quantum system? The
reason for doing this is because the partial trace operation is the unique operation
which gives rise to the correct description of observable quantities for subsystems
of a composite system, in the following sense.
Suppose M is any observable on system A, and we have some measuring device
which is capable of realizing measurements of M. Let $\tilde M$ denote the corresponding
observable for the same measurement, performed on the composite system AB.
Our immediate goal is to argue that $\tilde M$ is necessarily equal to $M \otimes I_B$. Note that
if the system AB is prepared in the state $|m\rangle|\psi\rangle$, where $|m\rangle$ is an eigenstate of M
with eigenvalue m, and $|\psi\rangle$ is any state of B, then the measuring device must yield
the result m for the measurement, with probability one. Thus, if $P_m$ is the projector
onto the m eigenspace of the observable M, then the corresponding projector for
$\tilde M$ is $P_m \otimes I_B$. We therefore have
\[ \tilde M = \sum_m m P_m \otimes I_B = M \otimes I_B . \tag{2.179} \]
The next step is to show that the partial trace procedure gives the correct measurement statistics for observations on part of a system. Suppose we perform a
measurement on system A described by the observable M. Physical consistency
requires that any prescription for associating a ‘state’, $\rho^A$, to system A, must have
the property that measurement averages be the same whether computed via $\rho^A$ or
$\rho^{AB}$,
\[ \mathrm{tr}(M \rho^A) = \mathrm{tr}(\tilde M \rho^{AB}) = \mathrm{tr}((M \otimes I_B) \rho^{AB}) . \tag{2.180} \]
This equation is certainly satisfied if we choose $\rho^A \equiv \mathrm{tr}_B(\rho^{AB})$. In fact, the partial
trace turns out to be the unique function having this property. To see this uniqueness property, let f(·) be any map of density operators on AB to density operators
on A such that
\[ \mathrm{tr}(M f(\rho^{AB})) = \mathrm{tr}((M \otimes I_B) \rho^{AB}) , \tag{2.181} \]
for all observables M. Let $M_i$ be an orthonormal basis of operators for the space of
Hermitian operators with respect to the Hilbert–Schmidt inner product (X, Y) ≡
tr(XY) (compare Exercise 2.39 on page 76). Then expanding $f(\rho^{AB})$ in this basis
gives
\begin{align}
f(\rho^{AB}) &= \sum_i M_i \, \mathrm{tr}(M_i f(\rho^{AB})) \tag{2.182} \\
&= \sum_i M_i \, \mathrm{tr}((M_i \otimes I_B) \rho^{AB}) . \tag{2.183}
\end{align}
It follows that f is uniquely determined by Equation (2.180). Moreover, the partial
trace satisfies (2.180), so it is the unique function having this property.
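The consistency condition (2.180) of Box 2.6 can be spot-checked on random inputs. The sketch below assumes NumPy; the helpers `random_density_matrix` and `random_observable` are illustrative constructions, not part of the text, and a single random instance is of course a check rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_density_matrix(dim):
    """Random density matrix: G G^dagger normalised to unit trace."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho)

def random_observable(dim):
    """Random Hermitian matrix."""
    h = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (h + h.conj().T) / 2

dim_A, dim_B = 2, 3
rho_AB = random_density_matrix(dim_A * dim_B)
M = random_observable(dim_A)

# rho_A = tr_B(rho_AB): reshape to indices [a, b, a', b'] and sum over b = b'.
rho_A = np.einsum("abcb->ac", rho_AB.reshape(dim_A, dim_B, dim_A, dim_B))

avg_reduced = np.trace(M @ rho_A).real                          # tr(M rho_A)
avg_joint = np.trace(np.kron(M, np.eye(dim_B)) @ rho_AB).real   # tr((M (x) I_B) rho_AB)
```

The two averages agree to numerical precision, as (2.180) requires.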
At first sight it appears as though teleportation can be used to do faster than light
communication, a big no-no according to the theory of relativity. We surmised in Section 1.3.7 that what prevents faster than light communication is the need for Alice to
communicate her measurement result to Bob. The reduced density operator allows us to
make this rigorous.
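Recall that immediately before Alice makes her measurement the quantum state of the
three qubits is (Equation (1.32)):
\begin{align}
|\psi_2\rangle = \frac{1}{2} \Big[ & |00\rangle \big( \alpha|0\rangle + \beta|1\rangle \big) + |01\rangle \big( \alpha|1\rangle + \beta|0\rangle \big) \nonumber \\
& + |10\rangle \big( \alpha|0\rangle - \beta|1\rangle \big) + |11\rangle \big( \alpha|1\rangle - \beta|0\rangle \big) \Big] . \tag{2.192}
\end{align}
Measuring in Alice’s computational basis, the state of the system after the measurement
is:
\begin{align}
|00\rangle \big( \alpha|0\rangle + \beta|1\rangle \big) &\quad \text{with probability } \tfrac{1}{4} \tag{2.193} \\
|01\rangle \big( \alpha|1\rangle + \beta|0\rangle \big) &\quad \text{with probability } \tfrac{1}{4} \tag{2.194} \\
|10\rangle \big( \alpha|0\rangle - \beta|1\rangle \big) &\quad \text{with probability } \tfrac{1}{4} \tag{2.195} \\
|11\rangle \big( \alpha|1\rangle - \beta|0\rangle \big) &\quad \text{with probability } \tfrac{1}{4} . \tag{2.196}
\end{align}
The density operator of the system is thus
\begin{align}
\rho = \frac{1}{4} \Big[ & |00\rangle\langle 00| (\alpha|0\rangle + \beta|1\rangle)(\alpha^* \langle 0| + \beta^* \langle 1|) + |01\rangle\langle 01| (\alpha|1\rangle + \beta|0\rangle)(\alpha^* \langle 1| + \beta^* \langle 0|) \nonumber \\
& + |10\rangle\langle 10| (\alpha|0\rangle - \beta|1\rangle)(\alpha^* \langle 0| - \beta^* \langle 1|) + |11\rangle\langle 11| (\alpha|1\rangle - \beta|0\rangle)(\alpha^* \langle 1| - \beta^* \langle 0|) \Big] . \tag{2.197}
\end{align}
Tracing out Alice’s system, we see that the reduced density operator of Bob’s system is
\begin{align}
\rho^B &= \frac{1}{4} \Big[ (\alpha|0\rangle + \beta|1\rangle)(\alpha^* \langle 0| + \beta^* \langle 1|) + (\alpha|1\rangle + \beta|0\rangle)(\alpha^* \langle 1| + \beta^* \langle 0|) \nonumber \\
&\qquad + (\alpha|0\rangle - \beta|1\rangle)(\alpha^* \langle 0| - \beta^* \langle 1|) + (\alpha|1\rangle - \beta|0\rangle)(\alpha^* \langle 1| - \beta^* \langle 0|) \Big] \tag{2.198} \\
&= \frac{2(|\alpha|^2 + |\beta|^2)|0\rangle\langle 0| + 2(|\alpha|^2 + |\beta|^2)|1\rangle\langle 1|}{4} \tag{2.199} \\
&= \frac{|0\rangle\langle 0| + |1\rangle\langle 1|}{2} \tag{2.200} \\
&= \frac{I}{2} , \tag{2.201}
\end{align}
where we have used the completeness relation in the last line. Thus, the state of Bob’s
system after Alice has performed the measurement but before Bob has learned the measurement result is I/2. This state has no dependence upon the state $|\psi\rangle$ being teleported,
and thus any measurements performed by Bob will contain no information about $|\psi\rangle$,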
thus preventing Alice from using teleportation to transmit information to Bob faster than
light.
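The independence of Bob's reduced state from α and β can be seen by averaging the four branches (2.193)-(2.196) directly. The following sketch assumes NumPy; the function name `bob_state_before_message` and the two test amplitudes are illustrative choices.

```python
import numpy as np

def bob_state_before_message(alpha, beta):
    """Average Bob's four post-measurement states, each with probability 1/4."""
    branches = [
        np.array([alpha, beta]),    # Alice saw 00: alpha|0> + beta|1>
        np.array([beta, alpha]),    # Alice saw 01: alpha|1> + beta|0>
        np.array([alpha, -beta]),   # Alice saw 10: alpha|0> - beta|1>
        np.array([-beta, alpha]),   # Alice saw 11: alpha|1> - beta|0>
    ]
    return sum(0.25 * np.outer(v, v.conj()) for v in branches)

# Two different normalised states to teleport.
rho_B_first = bob_state_before_message(1.0, 0.0)
rho_B_second = bob_state_before_message(0.6, 0.8j)
```

Both calls return I/2: the off-diagonal contributions of the four branches cancel in pairs, and the diagonal entries sum to $(|\alpha|^2 + |\beta|^2)/2 = 1/2$.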
2.5 The Schmidt decomposition and purifications
Density operators and the partial trace are just the beginning of a wide array of tools
useful for the study of composite quantum systems, which are at the heart of quantum computation and quantum information. Two additional tools of great value are the
Schmidt decomposition and purifications. In this section we present both these tools,
and try to give the flavor of their power.
Theorem 2.7: (Schmidt decomposition) Suppose $|\psi\rangle$ is a pure state of a composite
system, AB. Then there exist orthonormal states $|i_A\rangle$ for system A, and
orthonormal states $|i_B\rangle$ of system B such that
\[ |\psi\rangle = \sum_i \lambda_i |i_A\rangle|i_B\rangle , \tag{2.202} \]
where $\lambda_i$ are non-negative real numbers satisfying $\sum_i \lambda_i^2 = 1$ known as Schmidt
co-efficients.
This result is very useful. As a taste of its power, consider the following consequence:
let $|\psi\rangle$ be a pure state of a composite system, AB. Then by the Schmidt decomposition
$\rho^A = \sum_i \lambda_i^2 |i_A\rangle\langle i_A|$ and $\rho^B = \sum_i \lambda_i^2 |i_B\rangle\langle i_B|$, so the eigenvalues of $\rho^A$ and $\rho^B$ are
identical, namely $\lambda_i^2$ for both density operators. Many important properties of quantum
systems are completely determined by the eigenvalues of the reduced density operator of
the system, so for a pure state of a composite system such properties will be the same for
both systems. As an example, consider the state of two qubits, $(|00\rangle + |01\rangle + |11\rangle)/\sqrt{3}$.
This has no obvious symmetry property, yet if you calculate $\mathrm{tr}((\rho^A)^2)$ and $\mathrm{tr}((\rho^B)^2)$
you will discover that they have the same value, 7/9 in each case. This is but one small
consequence of the Schmidt decomposition.
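The value 7/9 quoted above can be checked by writing the state as a coefficient matrix. This sketch assumes NumPy and the convention $|\psi\rangle = \sum_{jk} a_{jk} |j\rangle|k\rangle$, under which the two reduced density matrices are simple matrix products.

```python
import numpy as np

# Coefficient matrix a_jk of (|00> + |01> + |11>)/sqrt(3).
a = np.array([[1, 1],
              [0, 1]]) / np.sqrt(3)

rho_A = a @ a.conj().T      # tr_B |psi><psi|, entries sum_k a_jk a*_j'k
rho_B = a.T @ a.conj()      # tr_A |psi><psi|, entries sum_j a_jk a*_jk'

purity_A = np.trace(rho_A @ rho_A).real
purity_B = np.trace(rho_B @ rho_B).real

# The spectra coincide even though the matrices themselves differ.
eigs_A = np.sort(np.linalg.eigvalsh(rho_A))
eigs_B = np.sort(np.linalg.eigvalsh(rho_B))
```

Here $\rho^A$ and $\rho^B$ are different matrices, yet they share the same eigenvalues and therefore the same purity.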
Proof
We give the proof for the case where systems A and B have state spaces of the same
dimension, and leave the general case to Exercise 2.76. Let $|j\rangle$ and $|k\rangle$ be any fixed
orthonormal bases for systems A and B, respectively. Then $|\psi\rangle$ can be written
\[ |\psi\rangle = \sum_{jk} a_{jk} |j\rangle|k\rangle , \tag{2.203} \]
for some matrix a of complex numbers $a_{jk}$. By the singular value decomposition, a = udv,
where d is a diagonal matrix with non-negative elements, and u and v are unitary matrices.
Thus
\[ |\psi\rangle = \sum_{ijk} u_{ji} d_{ii} v_{ik} |j\rangle|k\rangle . \tag{2.204} \]
Defining $|i_A\rangle \equiv \sum_j u_{ji} |j\rangle$, $|i_B\rangle \equiv \sum_k v_{ik} |k\rangle$, and $\lambda_i \equiv d_{ii}$, we see that this gives
\[ |\psi\rangle = \sum_i \lambda_i |i_A\rangle|i_B\rangle . \tag{2.205} \]
It is easy to check that $|i_A\rangle$ forms an orthonormal set, from the unitarity of u and the
orthonormality of $|j\rangle$, and similarly that the $|i_B\rangle$ form an orthonormal set.
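The proof is constructive, so it translates directly into a numerical procedure. The sketch below assumes NumPy, whose `numpy.linalg.svd` returns factors with `a = u @ diag(d) @ v`, matching the convention a = udv used in the proof; the function name `schmidt_decomposition` is illustrative.

```python
import numpy as np

def schmidt_decomposition(psi, dim_A, dim_B):
    """Return (lambdas, basis_A, basis_B) with psi = sum_i lambdas[i] |i_A>|i_B>."""
    a = psi.reshape(dim_A, dim_B)          # coefficient matrix a_jk
    u, d, v = np.linalg.svd(a)             # a = u @ diag(d) @ v
    n = len(d)
    basis_A = [u[:, i] for i in range(n)]  # |i_A> = sum_j u_ji |j>
    basis_B = [v[i, :] for i in range(n)]  # |i_B> = sum_k v_ik |k>
    return d, basis_A, basis_B

psi = np.array([1, 1, 0, 1]) / np.sqrt(3)  # (|00> + |01> + |11>)/sqrt(3)
lambdas, basis_A, basis_B = schmidt_decomposition(psi, 2, 2)

# Rebuild the state from its Schmidt form as a consistency check.
rebuilt = sum(l * np.kron(x, y) for l, x, y in zip(lambdas, basis_A, basis_B))
```

The returned singular values are non-negative and their squares sum to one for a normalized state, as Theorem 2.7 requires.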
Exercise 2.76: Extend the proof of the Schmidt decomposition to the case where A
and B may have state spaces of different dimensionality.
Exercise 2.77: Suppose ABC is a three component quantum system. Show by
example that there are quantum states $|\psi\rangle$ of such systems which can not be
written in the form
\[ |\psi\rangle = \sum_i \lambda_i |i_A\rangle|i_B\rangle|i_C\rangle , \tag{2.206} \]
where $\lambda_i$ are real numbers, and $|i_A\rangle$, $|i_B\rangle$, $|i_C\rangle$ are orthonormal bases of the
respective systems.
The bases $|i_A\rangle$ and $|i_B\rangle$ are called the Schmidt bases for A and B, respectively, and
the number of non-zero values $\lambda_i$ is called the Schmidt number for the state $|\psi\rangle$. The
Schmidt number is an important property of a composite quantum system, which in
some sense quantifies the ‘amount’ of entanglement between systems A and B. To get
some idea of why this is the case, consider the following obvious but important property:
the Schmidt number is preserved under unitary transformations on system A or system
B alone. To see this, notice that if $\sum_i \lambda_i |i_A\rangle|i_B\rangle$ is the Schmidt decomposition for $|\psi\rangle$
then $\sum_i \lambda_i (U|i_A\rangle)|i_B\rangle$ is the Schmidt decomposition for $U|\psi\rangle$, where U is a unitary
operator acting on system A alone. Algebraic invariance properties of this type make the
Schmidt number a very useful tool.
Exercise 2.78: Prove that a state $|\psi\rangle$ of a composite system AB is a product state if
and only if it has Schmidt number 1. Prove that $|\psi\rangle$ is a product state if and only
if $\rho^A$ (and thus $\rho^B$) are pure states.
A second, related technique for quantum computation and quantum information is
purification. Suppose we are given a state $\rho^A$ of a quantum system A. It is possible to
introduce another system, which we denote R, and define a pure state $|AR\rangle$ for the joint
system AR such that $\rho^A = \mathrm{tr}_R(|AR\rangle\langle AR|)$. That is, the pure state $|AR\rangle$ reduces to $\rho^A$
when we look at system A alone. This is a purely mathematical procedure, known as
purification, which allows us to associate pure states with mixed states. For this reason
we call system R a reference system: it is a fictitious system, without a direct physical
significance.
To prove that purification can be done for any state, we explain how to construct
a system R and purification $|AR\rangle$ for $\rho^A$. Suppose $\rho^A$ has orthonormal decomposition
$\rho^A = \sum_i p_i |i^A\rangle\langle i^A|$. To purify $\rho^A$ we introduce a system R which has the same state
space as system A, with orthonormal basis states $|i^R\rangle$, and define a pure state for the
combined system
\[ |AR\rangle \equiv \sum_i \sqrt{p_i}\,|i^A\rangle|i^R\rangle . \tag{2.207} \]
We now calculate the reduced density operator for system A corresponding to the state
$|AR\rangle$:
\begin{align}
\mathrm{tr}_R(|AR\rangle\langle AR|) &= \sum_{ij} \sqrt{p_i p_j}\,|i^A\rangle\langle j^A| \, \mathrm{tr}(|i^R\rangle\langle j^R|) \tag{2.208} \\
&= \sum_{ij} \sqrt{p_i p_j}\,|i^A\rangle\langle j^A| \, \delta_{ij} \tag{2.209} \\
&= \sum_i p_i |i^A\rangle\langle i^A| \tag{2.210} \\
&= \rho^A . \tag{2.211}
\end{align}
Thus $|AR\rangle$ is a purification of $\rho^A$.
Notice the close relationship of the Schmidt decomposition to purification: the procedure used to purify a mixed state of system A is to define a pure state whose Schmidt
basis for system A is just the basis in which the mixed state is diagonal, with the Schmidt
coefficients being the square root of the eigenvalues of the density operator being purified.
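The construction (2.207) is straightforward to carry out numerically. The sketch below assumes NumPy and takes R to be a copy of A with the computational basis as $|i^R\rangle$; the function name `purify` and the example matrix are illustrative choices.

```python
import numpy as np

def purify(rho_A):
    """Return |AR> = sum_i sqrt(p_i) |i_A>|i_R>, with R a copy of A (Eq. 2.207)."""
    p, vecs = np.linalg.eigh(rho_A)        # rho_A = sum_i p_i |i_A><i_A|
    p = np.clip(p, 0.0, None)              # guard against tiny negative round-off
    dim = rho_A.shape[0]
    basis_R = np.eye(dim)                  # |i_R>: computational basis of R
    return sum(np.sqrt(p[i]) * np.kron(vecs[:, i], basis_R[i]) for i in range(dim))

rho_A = np.array([[0.75, 0.25],
                  [0.25, 0.25]])           # an example mixed state
state_AR = purify(rho_A)

# Trace out R (indices [a, r, a', r'], summing over r = r') and compare.
dim = rho_A.shape[0]
joint = np.outer(state_AR, state_AR.conj()).reshape(dim, dim, dim, dim)
recovered = np.einsum("arbr->ab", joint)
```

The Schmidt coefficients of `state_AR` are the square roots of the eigenvalues of `rho_A`, which is the relationship to the Schmidt decomposition noted above.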
In this section we’ve explained two tools for studying composite quantum systems, the
Schmidt decomposition and purifications. These tools will be indispensable to the study of
quantum computation and quantum information, especially quantum information, which
is the subject of Part III of this book.
Exercise 2.79: Consider a composite system consisting of two qubits. Find the
Schmidt decompositions of the states
\[ \frac{|00\rangle + |11\rangle}{\sqrt{2}} ; \quad \frac{|00\rangle + |01\rangle + |10\rangle + |11\rangle}{2} ; \quad \text{and} \quad \frac{|00\rangle + |01\rangle + |10\rangle}{\sqrt{3}} . \tag{2.212} \]
Exercise 2.80: Suppose $|\psi\rangle$ and $|\varphi\rangle$ are two pure states of a composite quantum
system with components A and B, with identical Schmidt coefficients. Show
that there are unitary transformations U on system A and V on system B such
that $|\psi\rangle = (U \otimes V)|\varphi\rangle$.
Exercise 2.81: (Freedom in purifications) Let $|AR_1\rangle$ and $|AR_2\rangle$ be two
purifications of a state $\rho^A$ to a composite system AR. Prove that there exists a
unitary transformation $U_R$ acting on system R such that
$|AR_1\rangle = (I_A \otimes U_R)|AR_2\rangle$.
Exercise 2.82: Suppose $\{p_i, |\psi_i\rangle\}$ is an ensemble of states generating a density matrix
$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ for a quantum system A. Introduce a system R with
orthonormal basis $|i\rangle$.
(1) Show that $\sum_i \sqrt{p_i}\,|\psi_i\rangle|i\rangle$ is a purification of ρ.
(2) Suppose we measure R in the basis $|i\rangle$, obtaining outcome i. With what
probability do we obtain the result i, and what is the corresponding state of
system A?
(3) Let $|AR\rangle$ be any purification of ρ to the system AR. Show that there exists
an orthonormal basis $|i\rangle$ in which R can be measured such that the
corresponding post-measurement state for system A is $|\psi_i\rangle$ with probability
$p_i$.
2.6 EPR and the Bell inequality
Anybody who is not shocked by quantum theory has not understood it.
– Niels Bohr
I recall that during one walk Einstein suddenly stopped, turned to me and asked
whether I really believed that the moon exists only when I look at it. The rest
of this walk was devoted to a discussion of what a physicist should mean by the
term ‘to exist’.
– Abraham Pais
...quantum phenomena do not occur in a Hilbert space, they occur in a laboratory.
– Asher Peres
...what is proved by impossibility proofs is lack of imagination.
– John Bell
This chapter has focused on introducing the tools and mathematics of quantum mechanics. As these techniques are applied in the following chapters of this book, an important
recurring theme is the unusual, non-classical properties of quantum mechanics. But
what exactly is the difference between quantum mechanics and the classical world? Understanding this difference is vital in learning how to perform information processing
tasks that are difficult or impossible with classical physics. This section concludes the
chapter with a discussion of the Bell inequality, a compelling example of an essential
difference between quantum and classical physics.
When we speak of an object such as a person or a book, we assume that the physical
properties of that object have an existence independent of observation. That is, measurements merely act to reveal such physical properties. For example, a tennis ball has as one
of its physical properties its position, which we typically measure using light scattered
from the surface of the ball. As quantum mechanics was being developed in the 1920s
and 1930s a strange point of view arose that differs markedly from the classical view. As
described earlier in the chapter, according to quantum mechanics, an unobserved particle
does not possess physical properties that exist independent of observation. Rather, such
physical properties arise as a consequence of measurements performed upon the system.
For example, according to quantum mechanics a qubit does not possess definite properties of ‘spin in the z direction, $\sigma_z$’, and ‘spin in the x direction, $\sigma_x$’, each of which can
be revealed by performing the appropriate measurement. Rather, quantum mechanics
gives a set of rules which specify, given the state vector, the probabilities for the possible
measurement outcomes when the observable $\sigma_z$ is measured, or when the observable $\sigma_x$
is measured.
Many physicists rejected this new view of Nature. The most prominent objector was
Albert Einstein. In the famous ‘EPR paper’, co-authored with Nathan Rosen and Boris
Podolsky, Einstein proposed a thought experiment which, he believed, demonstrated that
quantum mechanics is not a complete theory of Nature.
The essence of the EPR argument is as follows. EPR were interested in what they
termed ‘elements of reality’. Their belief was that any such element of reality must be
represented in any complete physical theory. The goal of the argument was to show that
quantum mechanics is not a complete physical theory, by identifying elements of reality
that were not included in quantum mechanics. The way they attempted to do this was
by introducing what they claimed was a sufficient condition for a physical property to
be an element of reality, namely, that it be possible to predict with certainty the value
that property will have, immediately before measurement.
Box 2.7: Anti-correlations in the EPR experiment
Suppose we prepare the two qubit state
\[ |\psi\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}} , \tag{2.213} \]
a state sometimes known as the spin singlet for historical reasons. It is not difficult
to show that this state is an entangled state of the two qubit system. Suppose we
perform a measurement of spin along the v axis on both qubits, that is, we measure
the observable v · σ (defined in Equation (2.116) on page 90) on each qubit, getting
a result of +1 or −1 for each qubit. It turns out that no matter what choice of v
we make, the results of the two measurements are always opposite to one another.
That is, if the measurement on the first qubit yields +1, then the measurement on
the second qubit will yield −1, and vice versa. It is as though the second qubit
knows the result of the measurement on the first, no matter how the first qubit is
measured. To see why this is true, suppose $|a\rangle$ and $|b\rangle$ are the eigenstates of v · σ.
Then there exist complex numbers α, β, γ, δ such that
\begin{align}
|0\rangle &= \alpha|a\rangle + \beta|b\rangle \tag{2.214} \\
|1\rangle &= \gamma|a\rangle + \delta|b\rangle . \tag{2.215}
\end{align}
Substituting we obtain
\[ \frac{|01\rangle - |10\rangle}{\sqrt{2}} = (\alpha\delta - \beta\gamma) \frac{|ab\rangle - |ba\rangle}{\sqrt{2}} . \tag{2.216} \]
But αδ − βγ is the determinant of the unitary matrix $\begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}$, and thus is equal
to a phase factor $e^{i\theta}$ for some real θ. Thus
\[ \frac{|01\rangle - |10\rangle}{\sqrt{2}} = \frac{|ab\rangle - |ba\rangle}{\sqrt{2}} , \tag{2.217} \]
up to an unobservable global phase factor. As a result, if a measurement of v · σ
is performed on both qubits, then we can see that a result of +1 (−1) on the first
qubit implies a result of −1 (+1) on the second qubit.
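The perfect anti-correlation described in Box 2.7 can be tested for randomly chosen axes. The sketch below assumes NumPy; it evaluates the expectation value of the product of the two outcomes, which equals −1 exactly when the outcomes are always opposite. The function name `correlation` is illustrative.

```python
import numpy as np

# Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Spin singlet (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def correlation(v):
    """Expectation of (v.sigma) (x) (v.sigma) in the singlet state, v normalised."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    v_sigma = v[0] * X + v[1] * Y + v[2] * Z
    return np.vdot(singlet, np.kron(v_sigma, v_sigma) @ singlet).real

# Five random measurement directions.
rng = np.random.default_rng(2)
values = [correlation(rng.normal(size=3)) for _ in range(5)]
```

Since each outcome is ±1, an average product of −1 leaves no room for any run in which the two results agree.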
Consider, for example, an entangled pair of qubits belonging to Alice and Bob, respectively:
\[ \frac{|01\rangle - |10\rangle}{\sqrt{2}} . \tag{2.218} \]
Suppose Alice and Bob are a long way away from one another. Alice performs a measurement of spin along the v axis, that is, she measures the observable v · σ (defined in
Equation (2.116) on page 90). Suppose Alice receives the result +1. Then a simple quantum mechanical calculation, given in Box 2.7, shows that she can predict with certainty
that Bob will measure −1 on his qubit if he also measures spin along the v axis. Similarly,
if Alice measured −1, then she can predict with certainty that Bob will measure +1 on
his qubit. Because it is always possible for Alice to predict the value of the measurement
result recorded when Bob’s qubit is measured in the v direction, that physical property
must correspond to an element of reality, by the EPR criterion, and should be represented in any complete physical theory. However, standard quantum mechanics, as we
have presented it, merely tells one how to calculate the probabilities of the respective
measurement outcomes if v · σ is measured. Standard quantum mechanics certainly does
not include any fundamental element intended to represent the value of v · σ, for all unit
vectors v.
The goal of EPR was to show that quantum mechanics is incomplete, by demonstrating
that quantum mechanics lacked some essential ‘element of reality’, by their criterion. They
hoped to force a return to a more classical view of the world, one in which systems could
be ascribed properties which existed independently of measurements performed on those
systems. Unfortunately for EPR, most physicists did not accept the above reasoning as
convincing. The attempt to impose on Nature by fiat properties which she must obey
seems a most peculiar way of studying her laws.
Indeed, Nature has had the last laugh on EPR. Nearly thirty years after the EPR paper
was published, an experimental test was proposed that could be used to check whether
the picture of the world to which EPR were hoping to force a return is valid.
It turns out that Nature experimentally invalidates that point of view, while agreeing
with quantum mechanics.
The key to this experimental invalidation is a result known as Bell’s inequality. Bell’s
inequality is not a result about quantum mechanics, so the first thing we need to do is
momentarily forget all our knowledge of quantum mechanics. To obtain Bell’s inequality,
we’re going to do a thought experiment, which we will analyze using our common sense
notions of how the world works – the sort of notions Einstein and his collaborators thought
Nature ought to obey. After we have done the common sense analysis, we will perform a
quantum mechanical analysis which we can show is not consistent with the common sense
analysis. Nature can then be asked, by means of a real experiment, to decide between
our common sense notions of how the world works, and quantum mechanics.
Imagine we perform the following experiment, illustrated in Figure 2.4. Charlie prepares two particles. It doesn’t matter how he prepares the particles, just that he is capable
of repeating the experimental procedure which he uses. Once he has performed the preparation, he sends one particle to Alice, and the second particle to Bob.
Once Alice receives her particle, she performs a measurement on it. Imagine that she
has available two different measurement apparatuses, so she could choose to do one of
two different measurements. These measurements are of physical properties which we
shall label $P_Q$ and $P_R$, respectively. Alice doesn’t know in advance which measurement
she will choose to perform. Rather, when she receives the particle she flips a coin or
uses some other random method to decide which measurement to perform. We suppose
for simplicity that the measurements can each have one of two outcomes, +1 or −1.
Suppose Alice’s particle has a value Q for the property $P_Q$. Q is assumed to be an
objective property of Alice’s particle, which is merely revealed by the measurement,
much as we imagine the position of a tennis ball to be revealed by the particles of light
being scattered off it. Similarly, let R denote the value revealed by a measurement of the
property $P_R$.
Similarly, suppose that Bob is capable of measuring one of two properties, $P_S$ or $P_T$,
once again revealing an objectively existing value S or T for the property, each taking
value +1 or −1. Bob does not decide beforehand which property he will measure, but
waits until he has received the particle and then chooses randomly. The timing of the
experiment is arranged so that Alice and Bob do their measurements at the same time
(or, to use the more precise language of relativity, in a causally disconnected manner).
Therefore, the measurement which Alice performs cannot disturb the result of Bob’s
measurement (or vice versa), since physical influences cannot propagate faster than light.
[Figure 2.4: diagram not reproduced; its labels are Q = ±1, R = ±1, S = ±1, T = ±1.]
Figure 2.4. Schematic experimental setup for the Bell inequalities. Alice can choose to measure either Q or R, and
Bob chooses to measure either S or T. They perform their measurements simultaneously. Alice and Bob are
assumed to be far enough apart that performing a measurement on one system can not have any effect on the result
of measurements on the other.
We are going to do some simple algebra with the quantity QS + RS + RT − QT .
Notice that
\[ QS + RS + RT - QT = (Q + R)S + (R - Q)T . \tag{2.219} \]
Because R, Q = ±1 it follows that either (Q + R)S = 0 or (R − Q)T = 0. In either
case, it is easy to see from (2.219) that QS + RS + RT − QT = ±2. Suppose next that
p(q, r, s, t) is the probability that, before the measurements are performed, the system is
in a state where Q = q, R = r, S = s, and T = t. These probabilities may depend on
how Charlie performs his preparation, and on experimental noise. Letting E(·) denote
the mean value of a quantity, we have
\begin{align}
E(QS + RS + RT - QT) &= \sum_{qrst} p(q, r, s, t)(qs + rs + rt - qt) \tag{2.220} \\
&\le \sum_{qrst} p(q, r, s, t) \times 2 \tag{2.221} \\
&= 2 . \tag{2.222}
\end{align}
Also,
\begin{align}
E(QS + RS + RT - QT) &= \sum_{qrst} p(q, r, s, t)\,qs + \sum_{qrst} p(q, r, s, t)\,rs \nonumber \\
&\quad + \sum_{qrst} p(q, r, s, t)\,rt - \sum_{qrst} p(q, r, s, t)\,qt \tag{2.223} \\
&= E(QS) + E(RS) + E(RT) - E(QT) . \tag{2.224}
\end{align}
Comparing (2.222) and (2.224) we obtain the Bell inequality,
\[ E(QS) + E(RS) + E(RT) - E(QT) \le 2 . \tag{2.225} \]
This result is also often known as the CHSH inequality after the initials of its four
discoverers. It is part of a larger set of inequalities known generically as Bell inequalities,
since the first was found by John Bell.
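The key step in the derivation, that QS + RS + RT − QT can only equal ±2 when all four quantities have definite values, can be checked by brute force. The following sketch uses only the Python standard library; the variable names are illustrative.

```python
from itertools import product

# Every assignment of definite values +1/-1 to Q, R, S, T.
chsh_values = {
    (q, r, s, t): q * s + r * s + r * t - q * t
    for q, r, s, t in product((+1, -1), repeat=4)
}

largest = max(chsh_values.values())            # upper bound for any single run
distinct = sorted(set(chsh_values.values()))   # all values that can occur
```

Only the values −2 and +2 occur, so any average over a probability distribution p(q, r, s, t) lies between −2 and 2, which is the content of (2.220)-(2.222).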
By repeating the experiment many times, Alice and Bob can determine each quantity on
the left hand side of the Bell inequality. For example, after finishing a set of experiments,
Alice and Bob get together to analyze their data. They look at all the experiments where
Alice measured $P_Q$ and Bob measured $P_S$. By multiplying the results of their experiments
together, they get a sample of values for QS. By averaging over this sample, they can
estimate E(QS) to an accuracy only limited by the number of experiments which they
perform. Similarly, they can estimate all the other quantities on the left hand side of the
Bell inequality, and thus check to see whether it is obeyed in a real experiment.
It’s time to put some quantum mechanics back in the picture. Imagine we perform the
following quantum mechanical experiment. Charlie prepares a quantum system of two
qubits in the state
|ψ =
|01 − |10
√
.
2
(2.226)
He passes the first qubit to Alice, and the second qubit to Bob. They perform measurements of the following observables:
Q = Z1
R = X1
−Z2 − X2
√
2
Z2 − X2
T = √
.
2
S=
(2.227)
(2.228)
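Simple calculations show that the average values for these observables, written in the
quantum mechanical ⟨·⟩ notation, are:
$$\langle QS \rangle = \frac{1}{\sqrt{2}}; \quad \langle RS \rangle = \frac{1}{\sqrt{2}}; \quad \langle RT \rangle = \frac{1}{\sqrt{2}}; \quad \langle QT \rangle = -\frac{1}{\sqrt{2}}. \tag{2.229}$$
Thus,
$$\langle QS \rangle + \langle RS \rangle + \langle RT \rangle - \langle QT \rangle = 2\sqrt{2}. \tag{2.230}$$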
Hold on! We learned back in (2.225) that the average value of QS plus the average value
of RS plus the average value of RT minus the average value of QT can never exceed
two. Yet here, quantum mechanics predicts that this sum of averages yields 2√2!
Fortunately, we can ask Nature to resolve the apparent paradox for us. Clever experiments using photons – particles of light – have been done to check the prediction (2.230)
of quantum mechanics versus the Bell inequality (2.225) which we were led to by our
common sense reasoning. The details of the experiments are outside the scope of the
book, but the results were resoundingly in favor of the quantum mechanical prediction.
The Bell inequality (2.225) is not obeyed by Nature.
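The quantum mechanical prediction (2.230) can be checked numerically. The sketch below uses plain Python lists for the matrices; the helper names `kron`, `combine` and `expectation` are our own illustrative choices, and the state is written in the basis |00⟩, |01⟩, |10⟩, |11⟩, which is an assumed ordering convention.

```python
from math import sqrt

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def kron(a, b):
    """Tensor (Kronecker) product of two square matrices given as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def combine(ca, a, cb, b):
    """Return the matrix ca*a + cb*b."""
    return [[ca * x + cb * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def expectation(op, psi):
    """<psi|op|psi> for a real state vector psi."""
    dim = len(psi)
    return sum(psi[i] * op[i][j] * psi[j] for i in range(dim) for j in range(dim))

# The state (|01> - |10>)/sqrt(2) from (2.226).
psi = [0, 1 / sqrt(2), -1 / sqrt(2), 0]

Q, R = Z, X                                      # Alice's observables, on qubit 1
S = combine(-1 / sqrt(2), Z, -1 / sqrt(2), X)    # Bob's observables, on qubit 2
T = combine(1 / sqrt(2), Z, -1 / sqrt(2), X)

QS, RS, RT, QT = (expectation(kron(a, b), psi)
                  for a, b in ((Q, S), (R, S), (R, T), (Q, T)))
chsh = QS + RS + RT - QT
print(QS, RS, RT, QT, chsh)
```

Up to floating-point rounding, the four averages come out as 1/√2, 1/√2, 1/√2 and −1/√2, and their combination as 2√2 ≈ 2.83, above the bound of 2 in (2.225).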
What does this mean? It means that one or more of the assumptions that went into
the derivation of the Bell inequality must be incorrect. Vast tomes have been written
analyzing the various forms in which this type of argument can be made, and analyzing
the subtly different assumptions which must be made to reach Bell-like inequalities. Here
we merely summarize the main points.
There are two assumptions made in the proof of (2.225) which are questionable:
(1) The assumption that the physical properties P_Q, P_R, P_S, P_T have definite values
Q, R, S, T which exist independent of observation. This is sometimes known as the
assumption of realism.
(2) The assumption that Alice performing her measurement does not influence the
result of Bob’s measurement. This is sometimes known as the assumption of
locality.
These two assumptions together are known as the assumptions of local realism. They are
certainly intuitively plausible assumptions about how the world works, and they fit our
everyday experience. Yet the Bell inequalities show that at least one of these assumptions
is not correct.
What can we learn from Bell’s inequality? For physicists, the most important lesson
is that their deeply held commonsense intuitions about how the world works are wrong.
The world is not locally realistic. Most physicists take the point of view that it is the
assumption of realism which needs to be dropped from our worldview in quantum mechanics, although others have argued that the assumption of locality should be dropped
instead. Regardless, Bell’s inequality together with substantial experimental evidence now
points to the conclusion that either or both of locality and realism must be dropped from
our view of the world if we are to develop a good intuitive understanding of quantum
mechanics.
What lessons can the fields of quantum computation and quantum information learn
from Bell’s inequality? Historically the most useful lesson has perhaps also been the most
vague: there is something profoundly ‘up’ with entangled states like the EPR state. A lot
of mileage in quantum computation and, especially, quantum information, has come from
asking the simple question: ‘what would some entanglement buy me in this problem?’
As we saw in teleportation and superdense coding, and as we will see repeatedly later
in the book, by throwing some entanglement into a problem we open up a new world
of possibilities unimaginable with classical information. The bigger picture is that Bell’s
inequality teaches us that entanglement is a fundamentally new resource in the world that
goes essentially beyond classical resources; iron to the classical world’s bronze age. A major
task of quantum computation and quantum information is to exploit this new resource to
do information processing tasks impossible or much more difficult with classical resources.
Problem 2.1: (Functions of the Pauli matrices) Let f (·) be any function from
complex numbers to complex numbers. Let n be a normalized vector in three
dimensions, and let θ be real. Show that
$$f(\theta\, \vec{n} \cdot \vec{\sigma}) = \frac{f(\theta) + f(-\theta)}{2}\, I + \frac{f(\theta) - f(-\theta)}{2}\, \vec{n} \cdot \vec{\sigma}. \tag{2.231}$$
Problem 2.2: (Properties of the Schmidt number) Suppose |ψ⟩ is a pure state of
a composite system with components A and B.
(1) Prove that the Schmidt number of |ψ⟩ is equal to the rank of the reduced
density matrix ρ_A ≡ tr_B(|ψ⟩⟨ψ|). (Note that the rank of a Hermitian
operator is equal to the dimension of its support.)
(2) Suppose |ψ⟩ = Σ_j |α_j⟩|β_j⟩ is a representation for |ψ⟩, where |α_j⟩ and |β_j⟩
are (un-normalized) states for systems A and B, respectively. Prove that the
number of terms in such a decomposition is greater than or equal to the
Schmidt number of |ψ⟩, Sch(ψ).
(3) Suppose |ψ⟩ = α|ϕ⟩ + β|γ⟩. Prove that
$$\mathrm{Sch}(\psi) \ge |\mathrm{Sch}(\varphi) - \mathrm{Sch}(\gamma)|. \tag{2.232}$$
Problem 2.3: (Tsirelson’s inequality) Suppose
Q = q · σ, R = r · σ, S = s · σ, T = t · σ, where q, r, s and t are real unit vectors
in three dimensions. Show that
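$$(Q \otimes S + R \otimes S + R \otimes T - Q \otimes T)^2 = 4I + [Q, R] \otimes [S, T]. \tag{2.233}$$
Use this result to prove that
$$\langle Q \otimes S \rangle + \langle R \otimes S \rangle + \langle R \otimes T \rangle - \langle Q \otimes T \rangle \le 2\sqrt{2}, \tag{2.234}$$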
so the violation of the Bell inequality found in Equation (2.230) is the maximum
possible in quantum mechanics.
History and further reading
There are an enormous number of books on linear algebra at levels ranging from High
School through to Graduate School. Perhaps our favorites are the two volume set by
Horn and Johnson[HJ85, HJ91], which cover an extensive range of topics in an accessible
manner. Other useful references include Marcus and Minc[MM92], and Bhatia[Bha97]. Good
introductions to linear algebra include Halmos[Hal58], Perlis[Per52], and Strang[Str76].
There are many excellent books on quantum mechanics. Unfortunately, most of
these books focus on topics of tangential interest to quantum information and computation. Perhaps the most relevant in the existing literature is Peres’ superb book[Per93].
Beside an extremely clear exposition of elementary quantum mechanics, Peres gives
an extensive discussion of the Bell inequalities and related results. Good introductory
level texts include Sakurai’s book[Sak95], Volume III of the superb series by Feynman,
Leighton, and Sands[FLS65a], and the two volume work by Cohen-Tannoudji, Diu and
Laloë[CTDL77a, CTDL77b]. All three of these works are somewhat closer in spirit to quantum computation and quantum information than are most other quantum mechanics
texts, although the great bulk of each is still taken up by applications far removed from
quantum computation and quantum information. As a result, none of these texts need
be read in detail by someone interested in learning about quantum computation and
quantum information. However, any one of these texts may prove handy as a reference,
especially when reading articles by physicists. References for the history of quantum
mechanics may be found at the end of Chapter 1.
Many texts on quantum mechanics deal only with projective measurements. For applications to quantum computing and quantum information it is more convenient – and,
we believe, easier for novices – to start with the general description of measurements,
of which projective measurements can be regarded as a special case. Of course, ultimately, as we have shown, the two approaches are equivalent. The theory of generalized
measurements which we have employed was developed between the 1940s and 1970s.
Much of the history can be distilled from the book of Kraus[Kra83]. Interesting discussion of quantum measurements may be found in Section 2.2 of Gardiner[Gar91], and in
the book by Braginsky and Khalili[BK92]. The POVM measurement for distinguishing
non-orthogonal states described in Section 2.2.6 is due to Peres[Per88]. The extension
described in Exercise 2.64 appeared in Duan and Guo[DG98].
Superdense coding was invented by Bennett and Wiesner[BW92]. An experiment implementing a variant of superdense coding using entangled photon pairs was performed
by Mattle, Weinfurter, Kwiat, and Zeilinger[MWKZ96].
The density operator formalism was introduced independently by Landau[Lan27] and
by von Neumann[von27]. The unitary freedom in the ensemble for density matrices, Theorem 2.6, was first pointed out by Schrödinger[Sch36], and was later rediscovered and
extended by Jaynes[Jay57] and by Hughston, Jozsa and Wootters[HJW93]. The result of Exercise 2.73 is from the paper by Jaynes, and the results of Exercises 2.81 and 2.82 appear
in the paper by Hughston, Jozsa and Wootters. The class of probability distributions
which may appear in a density matrix decomposition for a given density matrix has been
studied by Uhlmann[Uhl70] and by Nielsen[Nie99b]. Schmidt’s eponymous decomposition
appeared in [Sch06]. The result of Exercise 2.77 was noted by Peres[Per95].
The EPR thought experiment is due to Einstein, Podolsky and Rosen[EPR35], and
was recast in essentially the form we have given here by Bohm[Boh51]. It is sometimes
misleadingly referred to as the EPR ‘paradox’. The Bell inequality is named in honour
of Bell[Bel64], who first derived inequalities of this type. The form we have presented is
due to Clauser, Horne, Shimony, and Holt[CHSH69], and is often known as the CHSH
inequality. This inequality was derived independently by Bell, who did not publish the
result.
Part 3 of Problem 2.2 is due to Thapliyal (private communication). Tsirelson’s inequality is due to Tsirelson[Tsi80].
3 Introduction to computer science
In natural science, Nature has given us a world and we’re just to discover its
laws. In computers, we can stuff laws into it and create a world.
– Alan Kay
Our field is still in its embryonic stage. It’s great that we haven’t been around
for 2000 years. We are still at a stage where very, very important results occur
in front of our eyes.
– Michael Rabin, on computer science
Algorithms are the key concept of computer science. An algorithm is a precise recipe for
performing some task, such as the elementary algorithm for adding two numbers which
we all learn as children. This chapter outlines the modern theory of algorithms developed
by computer science. Our fundamental model for algorithms will be the Turing machine.
This is an idealized computer, rather like a modern personal computer, but with a simpler
set of basic instructions, and an idealized unbounded memory. The apparent simplicity
of Turing machines is misleading; they are very powerful devices. We will see that they
can be used to execute any algorithm whatsoever, even one running on an apparently
much more powerful computer.
The fundamental question we are trying to address in the study of algorithms is: what
resources are required to perform a given computational task? This question splits up
naturally into two parts. First, we’d like to understand what computational tasks are possible, preferably by giving explicit algorithms for solving specific problems. For example,
we have many excellent examples of algorithms that can quickly sort a list of numbers
into ascending order. The second aspect of this question is to demonstrate limitations
on what computational tasks may be accomplished. For example, lower bounds can be
given for the number of operations that must be performed by any algorithm which
sorts a list of numbers into ascending order. Ideally, these two tasks – the finding of
algorithms for solving computational problems, and proving limitations on our ability to
solve computational problems – would dovetail perfectly. In practice, a significant gap
often exists between the best techniques known for solving a computational problem, and
the most stringent limitations known on the solution. The purpose of this chapter is to
give a broad overview of the tools which have been developed to aid in the analysis of
computational problems, and in the construction and analysis of algorithms to solve such
problems.
Why should a person interested in quantum computation and quantum information
spend time investigating classical computer science? There are three good reasons for this
effort. First, classical computer science provides a vast body of concepts and techniques
which may be reused to great effect in quantum computation and quantum information. Many of the triumphs of quantum computation and quantum information have
come by combining existing ideas from computer science with novel ideas from quantum
mechanics. For example, some of the fast algorithms for quantum computers are based
upon the Fourier transform, a powerful tool utilized by many classical algorithms. Once
it was realized that quantum computers could perform a type of Fourier transform much
more quickly than classical computers this enabled the development of many important
quantum algorithms.
Second, computer scientists have expended great effort understanding what resources
are required to perform a given computational task on a classical computer. These results
can be used as the basis for a comparison with quantum computation and quantum
information. For example, much attention has been focused on the problem of finding the
prime factors of a given number. On a classical computer this problem is believed to have
no ‘efficient’ solution, where ‘efficient’ has a meaning we’ll explain later in the chapter.
What is interesting is that an efficient solution to this problem is known for quantum
computers. The lesson is that, for this task of finding prime factors, there appears to be a
gap between what is possible on a classical computer and what is possible on a quantum
computer. This is both intrinsically interesting, and also interesting in the broader sense
that it suggests such a gap may exist for a wider class of computational problems than
merely the finding of prime factors. By studying this specific problem further, it may be
possible to discern features of the problem which make it more tractable on a quantum
computer than on a classical computer, and then act on these insights to find interesting
quantum algorithms for the solution of other problems.
Third, and most important, there is learning to think like a computer scientist. Computer scientists think in a rather different style than does a physicist or other natural
scientist. Anybody wanting a deep understanding of quantum computation and quantum
information must learn to think like a computer scientist at least some of the time; they
must instinctively know what techniques and, most importantly, what
problems are of greatest interest to a computer scientist.
The structure of this chapter is as follows. In Section 3.1 we introduce two models for
computation: the Turing machine model, and the circuit model. The Turing machine
model will be used as our fundamental model for computation. In practice, however,
we mostly make use of the circuit model of computation, and it is this model which is
most useful in the study of quantum computation. With our models for computation
in hand, the remainder of the chapter discusses resource requirements for computation.
Section 3.2 begins by overviewing the computational tasks we are interested in as well
as discussing some associated resource questions. It continues with a broad look at the
key concepts of computational complexity, a field which examines the time and space
requirements necessary to solve particular computational problems, and provides a broad
classification of problems based upon their difficulty of solution. Finally, the section
concludes with an examination of the energy resources required to perform computations.
Surprisingly, it turns out that the energy required to perform a computation can be made
vanishingly small, provided one can make the computation reversible. We explain how to
construct reversible computers, and explain some of the reasons they are important both
for computer science and for quantum computation and quantum information. Section 3.3
concludes the chapter with a broad look at the entire field of computer science, focusing
on issues of particular relevance to quantum computation and quantum information.
3.1 Models for computation
...algorithms are concepts that have existence apart from any programming
language.
– Donald Knuth
What does it mean to have an algorithm for performing some task? As children we all
learn a procedure which enables us to add together any two numbers, no matter how
large those numbers are. This is an example of an algorithm. Finding a mathematically
precise formulation of the concept of an algorithm is the goal of this section.
Historically, the notion of an algorithm goes back centuries; undergraduates learn
Euclid’s two thousand year old algorithm for finding the greatest common divisor of two
positive integers. However, it wasn’t until the 1930s that the fundamental notions of
the modern theory of algorithms, and thus of computation, were introduced, by Alonzo
Church, Alan Turing, and other pioneers of the computer era. This work arose in response
to a profound challenge laid down by the great mathematician David Hilbert in the early
part of the twentieth century. Hilbert asked whether or not there existed some algorithm
which could be used, in principle, to solve all the problems of mathematics. Hilbert
expected that the answer to this question, sometimes known as the Entscheidungsproblem,
would be yes.
Amazingly, the answer to Hilbert’s challenge turned out to be no: there is no algorithm
to solve all mathematical problems. To prove this, Church and Turing had to solve the
deep problem of capturing in a mathematical definition what we mean when we use the
intuitive concept of an algorithm. In so doing, they laid the foundations for the modern
theory of algorithms, and consequently for the modern theory of computer science.
In this chapter, we use two ostensibly different approaches to the theory of computation. The first approach is that proposed by Turing. Turing defined a class of machines,
now known as Turing machines, in order to capture the notion of an algorithm to perform
a computational task. In Section 3.1.1, we describe Turing machines, and then discuss
some of the simpler variants of the Turing machine model. The second approach is via
the circuit model of computation, an approach that is especially useful as preparation for
our later study of quantum computers. The circuit model is described in Section 3.1.2.
Although these models of computation appear different on the surface, it turns out that
they are equivalent. Why introduce more than one model of computation, you may ask?
We do so because different models of computation may yield different insights into the
solution of specific problems. Two (or more) ways of thinking about a concept are better
than one.
3.1.1 Turing machines
The basic elements of a Turing machine are illustrated in Figure 3.1. A Turing machine
contains four main elements: (a) a program, rather like an ordinary computer; (b) a finite
state control, which acts like a stripped-down microprocessor, co-ordinating the other
operations of the machine; (c) a tape, which acts like a computer memory; and (d) a read-write tape-head, which points to the position on the tape which is currently readable or
writable. We now describe each of these four elements in more detail.
Figure 3.1. Main elements of a Turing machine. In the text, blanks on the tape are denoted by a ‘b’. Note the ⊲ marking the left hand end of the tape.
The finite state control for a Turing machine consists of a finite set of internal states,
q1 , . . . , qm . The number m is allowed to be varied; it turns out that for m sufficiently
large this does not affect the power of the machine in any essential way, so without loss
of generality we may suppose that m is some fixed constant. The best way to think of
the finite state control is as a sort of microprocessor, co-ordinating the Turing machine’s
operation. It provides temporary storage off-tape, and a central place where all processing
for the machine may be done. In addition to the states q1 , . . . , qm , there are also two special
internal states, labelled qs and qh . We call these the starting state and the halting state,
respectively. The idea is that at the beginning of the computation, the Turing machine
is in the starting state qs . The execution of the computation causes the Turing machine’s
internal state to change. If the computation ever finishes, the Turing machine ends up
in the state qh to indicate that the machine has completed its operation.
The Turing machine tape is a one-dimensional object, which stretches off to infinity
in one direction. The tape consists of an infinite sequence of tape squares. We number
the tape squares 0, 1, 2, 3, . . .. The tape squares each contain one symbol drawn from
some alphabet, Γ, which contains a finite number of distinct symbols. For now, it will
be convenient to assume that the alphabet contains four symbols, which we denote by
0, 1, b (the ‘blank’ symbol), and ⊲, to mark the left hand edge of the tape. Initially, the
tape contains a ⊲ at the left hand end, a finite number of 0s and 1s, and the rest of the
tape contains blanks. The read-write tape-head identifies a single square on the Turing
machine tape as the square that is currently being accessed by the machine.
Summarizing, the machine starts its operation with the finite state control in the state
qs , and with the read-write head at the leftmost tape square, the square numbered 0. The
computation then proceeds in a step by step manner according to the program, to be
defined below. If the current state is qh , then the computation has halted, and the output
of the computation is the current (non-blank) contents of the tape.
A program for a Turing machine is a finite ordered list of program lines of the form
⟨q, x, q′, x′, s⟩. The first item in the program line, q, is a state from the set of internal
states of the machine. The second item, x, is taken from the alphabet of symbols which
may appear on the tape, Γ. The way the program works is that on each machine cycle,
the Turing machine looks through the list of program lines in order, searching for a
line ⟨q, x, ·, ·, ·⟩, such that the current internal state of the machine is q, and the symbol
being read on the tape is x. If it doesn’t find such a program line, the internal state of the
machine is changed to qh , and the machine halts operation. If such a line is found, then
that program line is executed. Execution of a program line involves the following steps:
the internal state of the machine is changed to q′; the symbol x on the tape is overwritten
by the symbol x′, and the tape-head moves left, right, or stands still, depending on
whether s is −1, +1, or 0, respectively. The only exception to this rule is if the tape-head
is at the leftmost tape square, and s = −1, in which case the tape-head stays put.
Now that we know what a Turing machine is, let’s see how it may be used to compute
a simple function. Consider the following example of a Turing machine. The machine
starts with a binary number, x, on the tape, followed by blanks. The machine has three
internal states, q1 , q2 , and q3 , in addition to the starting state qs and halting state qh . The
program contains the following program lines (the numbers on the left hand side are for
convenience in referring to the program lines in later discussion, and do not form part
of the program):
1 : ⟨qs, ⊲, q1, ⊲, +1⟩
2 : ⟨q1, 0, q1, b, +1⟩
3 : ⟨q1, 1, q1, b, +1⟩
4 : ⟨q1, b, q2, b, −1⟩
5 : ⟨q2, b, q2, b, −1⟩
6 : ⟨q2, ⊲, q3, ⊲, +1⟩
7 : ⟨q3, b, qh, 1, 0⟩.
What function does this program compute? Initially the machine is in the state qs and
at the left-most tape position so line 1, ⟨qs, ⊲, q1, ⊲, +1⟩, is executed, which causes the
tape-head to move right without changing what is written on the tape, but changing the
internal state of the machine to q1 . The next three lines of the program ensure that while
the machine is in the state q1 the tape-head will continue moving right while it reads
either 0s (line 2) or 1s (line 3) on the tape, over-writing the tape contents with blanks as
it goes and remaining in the state q1 , until it reaches a tape square that is already blank,
at which point the tape-head is moved one position to the left, and the internal state is
changed to q2 (line 4). Line 5 then ensures that the tape-head keeps moving left while
blanks are being read by the tape-head, without changing the contents of the tape. This
keeps up until the tape-head returns to its starting point, at which point it reads a ⊲ on
the tape, changes the internal state to q3 , and moves one step to the right (line 6). Line
7 completes the program, simply printing the number 1 onto the tape, and then halting.
The preceding analysis shows that this program computes the constant function f (x) =
1. That is, regardless of what number is input on the tape the number 1 is output. More
generally, a Turing machine can be thought of as computing functions from the non-negative integers to the non-negative integers; the initial state of the tape is used to
represent the input to the function, and the final state of the tape is used to represent
the output of the function.
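The rules for executing program lines, and the seven-line example program, can be tried out with a small simulator. What follows is a minimal sketch rather than a definitive implementation: the names `run_turing_machine` and `CONSTANT_ONE` are our own, program lines are represented as Python tuples (q, x, q′, x′, s), and the `max_steps` guard is an added safety limit that is not part of the Turing machine model.

```python
BLANK, LEFT_END = "b", "⊲"

def run_turing_machine(program, tape_input, max_steps=10_000):
    """Run a single-tape Turing machine and return the non-blank tape contents."""
    tape = [LEFT_END] + list(tape_input)
    state, head, steps = "qs", 0, 0
    while state != "qh":
        if steps == max_steps:            # assumed guard against non-halting programs
            raise RuntimeError("machine did not halt within max_steps")
        steps += 1
        while head >= len(tape):          # the tape is unbounded to the right
            tape.append(BLANK)
        symbol = tape[head]
        # Search the program lines in order for one matching (state, symbol).
        for q, x, q_new, x_new, s in program:
            if q == state and x == symbol:
                state, tape[head] = q_new, x_new
                head = max(0, head + s)   # the head stays put at the left edge
                break
        else:
            state = "qh"                  # no matching line: halt
    return "".join(tape[1:]).rstrip(BLANK)

# The seven program lines from the text.
CONSTANT_ONE = [
    ("qs", "⊲", "q1", "⊲", +1),
    ("q1", "0", "q1", "b", +1),
    ("q1", "1", "q1", "b", +1),
    ("q1", "b", "q2", "b", -1),
    ("q2", "b", "q2", "b", -1),
    ("q2", "⊲", "q3", "⊲", +1),
    ("q3", "b", "qh", "1", 0),
]

print(run_turing_machine(CONSTANT_ONE, "1011"))  # 1
```

Whatever bit string is supplied as input, the machine erases it, returns to the left hand end of the tape, and writes a single 1, in agreement with the analysis above.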
It seems as though we have gone to a very great deal of trouble to compute this
simple function using our Turing machines. Is it possible to build up more complicated
functions using Turing machines? For example, could we construct a machine such that
when two numbers, x and y, are input on the tape with a blank to demarcate them, it will
output the sum x + y on the tape? More generally, what class of functions is it possible
to compute using a Turing machine?
It turns out that the Turing machine model of computation can be used to compute an
enormous variety of functions. For example, it can be used to do all the basic arithmetical
operations, to search through text represented as strings of bits on the tape, and many
other interesting operations. Surprisingly, it turns out that a Turing machine can be
used to simulate all the operations performed on a modern computer! Indeed, according
to a thesis put forward independently by Church and by Turing, the Turing machine
model of computation completely captures the notion of computing a function using an
algorithm. This is known as the Church–Turing thesis:
The class of functions computable by a Turing machine corresponds exactly to
the class of functions which we would naturally regard as being computable by an
algorithm.
The Church–Turing thesis asserts an equivalence between a rigorous mathematical
concept – function computable by a Turing machine – and the intuitive concept of
what it means for a function to be computable by an algorithm. The thesis derives its
importance from the fact that it makes the study of real-world algorithms, prior to 1936
a rather vague concept, amenable to rigorous mathematical study. To understand the
significance of this point it may be helpful to consider the definition of a continuous
function from real analysis. Every child can tell you what it means for a line to be
continuous on a piece of paper, but it is far from obvious how to capture that intuition in
a rigorous definition. Mathematicians in the nineteenth century spent a great deal of time
arguing about the merits of various definitions of continuity before the modern definition
of continuity came to be accepted. When making fundamental definitions like that of
continuity or of computability it is important that good definitions be chosen, ensuring
that one’s intuitive notions closely match the precise mathematical definition. From this
point of view the Church–Turing thesis is simply the assertion that the Turing machine
model of computation provides a good foundation for computer science, capturing the
intuitive notion of an algorithm in a rigorous definition.
A priori it is not obvious that every function which we would intuitively regard as
computable by an algorithm can be computed using a Turing machine. Church, Turing and many other people have spent a great deal of time gathering evidence for the
Church–Turing thesis, and in sixty years no evidence to the contrary has been found.
Nevertheless, it is possible that in the future we will discover in Nature a process which
computes a function not computable on a Turing machine. It would be wonderful if
that ever happened, because we could then harness that process to help us perform new
computations which could not be performed before. Of course, we would also need to
overhaul the definition of computability, and with it, computer science.
Exercise 3.1: (Non-computable processes in Nature) How might we recognize
that a process in Nature computes a function not computable by a Turing
machine?
Exercise 3.2: (Turing numbers) Show that single-tape Turing machines can each
be given a number from the list 1, 2, 3, . . . in such a way that the number
uniquely specifies the corresponding machine. We call this number the Turing
number of the corresponding Turing machine. (Hint: Every positive integer has
a unique prime factorization $p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}$, where p_i are distinct prime numbers,
and a1 , . . . , ak are non-negative integers.)
In later chapters, we will see that quantum computers also obey the Church–Turing
thesis. That is, quantum computers can compute the same class of functions as is computable by a Turing machine. The difference between quantum computers and Turing
machines turns out to lie in the efficiency with which the computation of the function
may be performed – there are functions which can be computed much more efficiently
on a quantum computer than is believed to be possible with a classical computing device
such as a Turing machine.
Demonstrating in complete detail that the Turing machine model of computation can
be used to build up all the usual concepts used in computer programming languages is
beyond the scope of this book (see ‘History and further reading’ at the end of the chapter
for more information). When specifying algorithms, instead of explicitly specifying the
Turing machine used to compute the algorithm, we shall usually use a much higher level
pseudocode, trusting in the Church–Turing thesis that this pseudocode can be translated
into the Turing machine model of computation. We won’t give any sort of rigorous
definition for pseudocode. Think of it as a slightly more formal version of English or, if
you like, a sloppy version of a high-level programming language such as C++ or BASIC.
Pseudocode provides a convenient way of expressing algorithms, without going into the
extreme level of detail required by a Turing machine. An example use of pseudocode
may be found in Box 3.2 on page 130; it is also used later in the book to describe quantum
algorithms.
There are many variants on the basic Turing machine model. We might imagine
Turing machines with different kinds of tapes. For example, one could consider two-way
infinite tapes, or perhaps computation with tapes of more than one dimension. So far
as is presently known, it is not possible to change any aspect of the Turing model in
a way that is physically reasonable, and which manages to extend the class of functions
computable by the model.
As an example consider a Turing machine equipped with multiple tapes. For simplicity
we consider the two-tape case, as the generalization to more than two tapes is clear from
this example. Like the basic Turing machine, a two-tape Turing machine has a finite
number of internal states q1, . . . , qm, a start state qs, and a halt state qh. It has two tapes,
each of which contains symbols from some finite alphabet of symbols, Γ. As before we
find it convenient to assume that the alphabet contains four symbols, 0, 1, b and ⊲, where
⊲ marks the left hand edge of each tape. The machine has two tape-heads, one for each
tape. The main difference between the two-tape Turing machine and the basic Turing
machine is in the program. Program lines are of the form ⟨q, x1, x2, q′, x′1, x′2, s1, s2⟩,
meaning that if the internal state of the machine is q, tape one is reading x1 at its current
position, and tape two is reading x2 at its current position, then the internal state of the
machine should be changed to q′, x1 overwritten with x′1, x2 overwritten with x′2, and
the tape-heads for tape one and tape two moved according to whether s1 and s2,
respectively, are equal to +1, −1 or 0.
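The program-line format for a two-tape machine can be made concrete with a small simulator. The Python sketch below is illustrative only: the dictionary encoding of program lines, the name run_two_tape, the step limit, and the example program COPY (which copies the bits on tape one onto tape two) are all assumptions introduced here, and the character ">" stands in for the left-edge marker.

```python
BLANK, LEFT_END = "b", ">"  # ">" is a stand-in for the left-edge marker

def run_two_tape(program, tape1, tape2, start="qs", halt="qh", max_steps=10_000):
    """Run a two-tape machine; program maps (q, x1, x2) -> (q', x1', x2', s1, s2)."""
    tapes = [list(tape1), list(tape2)]
    heads = [0, 0]
    state = start
    for _ in range(max_steps):
        if state == halt:
            # Report both tapes with trailing blanks stripped.
            return tuple("".join(t).rstrip(BLANK) for t in tapes)
        # Tapes are unbounded to the right: grow them with blanks on demand.
        for tape, head in zip(tapes, heads):
            while head >= len(tape):
                tape.append(BLANK)
        x1, x2 = tapes[0][heads[0]], tapes[1][heads[1]]
        state, w1, w2, s1, s2 = program[(state, x1, x2)]
        tapes[0][heads[0]], tapes[1][heads[1]] = w1, w2
        # Heads never move left of the edge marker.
        heads[0] = max(0, heads[0] + s1)
        heads[1] = max(0, heads[1] + s2)
    raise RuntimeError("step limit reached; the machine may not halt")

# Hypothetical example program: copy the bit string on tape one to tape two.
COPY = {
    ("qs", LEFT_END, LEFT_END): ("q1", LEFT_END, LEFT_END, +1, +1),
    ("q1", "0", BLANK): ("q1", "0", "0", +1, +1),
    ("q1", "1", BLANK): ("q1", "1", "1", +1, +1),
    ("q1", BLANK, BLANK): ("qh", BLANK, BLANK, 0, 0),
}

result = run_two_tape(COPY, ">101", ">")
```

Each dictionary entry plays the role of one program line: the key holds the current state and the two symbols being read, and the value holds the new state, the two symbols to write, and the two head movements.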
In what sense are the basic Turing machine and the two-tape Turing machine equivalent models of computation? They are equivalent in the sense that each computational
model is able to simulate the other. Suppose we have a two-tape Turing machine which
takes as input a bit string x on the first tape and blanks on the remainder of both tapes,
except the endpoint marker ⊲. This machine computes a function f (x), where f (x) is
defined to be the contents of the first tape after the Turing machine has halted. Rather
remarkably, it turns out that given a two-tape Turing machine to compute f , there exists
an equivalent single-tape Turing machine that is also able to compute f . We won’t explain how to do this explicitly, but the basic idea is that the single-tape Turing machine
simulates the two-tape Turing machine, using its single tape to store the contents of both
tapes of the two-tape Turing machine. There is some computational overhead required
to do this simulation, but the important point is that in principle it can always be done.
In fact, there exists a Universal Turing machine (see Box 3.1) which can simulate any
other Turing machine!
Another interesting variant of the Turing machine model is to introduce randomness
into the model. For example, imagine that the Turing machine can execute a program
line whose effect is the following: if the internal state is q and the tape-head reads x,
then flip an unbiased coin. If the coin lands heads, change the internal state to q_iH,
and if it lands tails, change the internal state to q_iT, where q_iH and q_iT are two internal
states of the Turing machine. Such a program line can be represented as ⟨q, x, q_iH, q_iT⟩.
However, even this variant doesn’t change the essential power of the Turing machine
model of computation. It is not difficult to see that we can simulate the effect of the above
algorithm on a deterministic Turing machine by explicitly ‘searching out’ all the possible
computational paths corresponding to different values of the coin tosses. Of course, this
deterministic simulation may be far less efficient than the random model, but the key
point for the present discussion is that the class of functions computable is not changed
by introducing randomness into the underlying model.
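The idea of 'searching out' every computational path can be sketched in Python. The sketch assumes, for simplicity, that the randomized procedure uses a fixed, known number of coin tosses; the names outcome_distribution and noisy_parity are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def outcome_distribution(randomized, x, num_coins):
    """Deterministically run `randomized` on every coin-toss sequence and tally outputs."""
    counts = {}
    for coins in product((0, 1), repeat=num_coins):
        out = randomized(x, coins)
        counts[out] = counts.get(out, 0) + 1
    total = 2 ** num_coins  # every sequence of fair tosses is equally likely
    return {out: Fraction(c, total) for out, c in counts.items()}

def noisy_parity(x, coins):
    """Hypothetical randomized routine: parity of x, flipped only if both coins are heads."""
    parity = sum(x) % 2
    return parity ^ (coins[0] & coins[1])

dist = outcome_distribution(noisy_parity, (1, 0, 1), num_coins=2)
```

The deterministic loop visits 2^num_coins paths, which illustrates why the simulation may be far less efficient than the randomized computation even though it computes the same thing.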
Exercise 3.3: (Turing machine to reverse a bit string) Describe a Turing
machine which takes a binary number x as input, and outputs the bits of x in
reverse order. (Hint: In this exercise and the next it may help to use a multi-tape
Turing machine and/or symbols other than ⊲, 0, 1 and the blank.)
Exercise 3.4: (Turing machine to add modulo 2) Describe a Turing machine to
add two binary numbers x and y modulo 2. The numbers are input on the
Turing machine tape in binary, in the form x, followed by a single blank,
followed by a y. If one number is not as long as the other then you may assume
that it has been padded with leading 0s to make the two numbers the same
length.
Let us return to Hilbert’s entscheidungsproblem, the original inspiration for the
founders of computer science. Is there an algorithm to decide all the problems of mathematics? The answer to this question was shown by Church and Turing to be no. In
Box 3.2, we explain Turing’s proof of this remarkable fact. This phenomenon of undecidability is now known to extend far beyond the examples which Church and Turing
constructed. For example, it is known that the problem of deciding whether two topological spaces are topologically equivalent (‘homeomorphic’) is undecidable. There are
simple problems related to the behavior of dynamical systems which are undecidable, as
you will show in Problem 3.4. References for these and other examples are given in the
end of chapter ‘History and further reading’.
Besides its intrinsic interest, undecidability foreshadows a topic of great concern in
computer science, and also to quantum computation and quantum information: the
distinction between problems which are easy to solve, and problems which are hard to solve.
Undecidability provides the ultimate example of problems which are hard to solve – so
hard that they are in fact impossible to solve.
Box 3.1: The Universal Turing Machine
We’ve described Turing machines as containing three elements which may vary
from machine to machine – the initial configuration of the tape, the internal states
of the finite state control, and the program for the machine. A clever idea known
as the Universal Turing Machine (UTM) allows us to fix the program and finite
state control once and for all, leaving the initial contents of the tape as the only part
of the machine which needs to be varied.
The Universal Turing Machine (see the figure below) has the following property.
Let M be any Turing machine, and let T_M be the Turing number associated to
machine M. Then on input of the binary representation for T_M followed by a blank,
followed by any string of symbols x on the remainder of the tape, the Universal
Turing Machine gives as output whatever machine M would have on input of x.
Thus, the Universal Turing Machine is capable of simulating any other Turing
machine!
[Figure: the Universal Turing Machine.]
The Universal Turing Machine is similar in spirit to a modern programmable
computer, in which the action to be taken by the computer – the ‘program’ – is
stored in memory, analogous to the bit string T_M stored at the beginning of the
tape by the Universal Turing Machine. The data to be processed by the program
is stored in a separate part of memory, analogous to the role of x in the Universal
Turing Machine. Then some fixed hardware is used to run the program, producing
the output. This fixed hardware is analogous to the internal states and the (fixed)
program being executed by the Universal Turing Machine.
Describing the detailed construction of a Universal Turing Machine is beyond the
scope of this book. (Though industrious readers may like to attempt the construction.)
The key point is the existence of such a machine, showing that a single fixed
machine can be used to run any algorithm whatsoever. The existence of a Universal
Turing Machine also explains our earlier statement that the number of internal
states in a Turing machine does not matter much, for provided that number m
exceeds the number needed for a Universal Turing Machine, such a machine can
be used to simulate a Turing machine with any number of internal states.
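The idea in Box 3.1, that one fixed procedure can run any machine once that machine's description is supplied as data, can be sketched in Python. This is only an analogy, not a construction of a Universal Turing Machine: the program is passed as a dictionary rather than encoded as a Turing number on the tape, and the names run_machine and FLIP_BITS, the step limit, and the use of ">" for the left-edge marker are assumptions made for illustration.

```python
BLANK, LEFT_END = "b", ">"  # ">" is a stand-in for the left-edge marker

def run_machine(program, tape, start="qs", halt="qh", max_steps=10_000):
    """Fixed interpreter: `program` maps (q, x) -> (q', x', s) with s in {-1, 0, +1}."""
    cells, head, state = list(tape), 0, start
    for _ in range(max_steps):
        if state == halt:
            return "".join(cells).rstrip(BLANK)
        if head >= len(cells):
            cells.append(BLANK)  # the tape is unbounded to the right
        # Look up the matching program line, then update state, tape and head.
        state, cells[head], step = program[(state, cells[head])]
        head = max(0, head + step)
    raise RuntimeError("step limit reached; the machine may not halt")

# Hypothetical machine description: flip every bit of the input.
FLIP_BITS = {
    ("qs", LEFT_END): ("q1", LEFT_END, +1),
    ("q1", "0"): ("q1", "1", +1),
    ("q1", "1"): ("q1", "0", +1),
    ("q1", BLANK): ("qh", BLANK, 0),
}

flipped = run_machine(FLIP_BITS, ">1101")
```

The interpreter itself never changes; only the data handed to it (the program table and the input string) varies from machine to machine.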
Exercise 3.5: (Halting problem with no inputs) Show that given a Turing
machine M there is no algorithm to determine whether M halts when the input
to the machine is a blank tape.
Exercise 3.6: (Probabilistic halting problem) Suppose we number the
probabilistic Turing machines using a scheme similar to that found in
Exercise 3.2 and define the probabilistic halting function hp (x) to be 1 if
machine x halts on input of x with probability at least 1/2 and 0 if machine x
halts on input of x with probability less than 1/2. Show that there is no
probabilistic Turing machine which can output hp (x) with probability of
correctness strictly greater than 1/2 for all x.
Exercise 3.7: (Halting oracle) Suppose a black box is made available to us which
takes a non-negative integer x as input, and then outputs the value of h(x),
where h(·) is the halting function defined in Box 3.2 on page 130. This type of
black box is sometimes known as an oracle for the halting problem. Suppose we
have a regular Turing machine which is augmented by the power to call the
oracle. One way of accomplishing this is to use a two-tape Turing machine, and
add an extra program instruction to the Turing machine which results in the
oracle being called, and the value of h(x) being printed on the second tape,
where x is the current contents of the second tape. It is clear that this model for
computation is more powerful than the conventional Turing machine model,
since it can be used to compute the halting function. Is the halting problem for
this model of computation undecidable? That is, can a Turing machine aided by
an oracle for the halting problem decide whether a program for the Turing
machine with oracle will halt on a particular input?
3.1.2 Circuits
Turing machines are rather idealized models of computing devices. Real computers are
finite in size, whereas for Turing machines we assumed a computer of unbounded size.
In this section we investigate an alternative model of computation, the circuit model, that
is equivalent to the Turing machine in terms of computational power, but is more convenient and realistic for many applications. In particular the circuit model of computation
is especially important as preparation for our investigation of quantum computers.
A circuit is made up of wires and gates, which carry information around, and perform
simple computational tasks, respectively. For example, Figure 3.2 shows a simple circuit
which takes as input a single bit, a. This bit is passed through a NOT gate, which flips
the bit, taking 1 to 0 and 0 to 1. The wires before and after the NOT gate serve merely to
carry the bit to and from the NOT gate; they can represent movement of the bit through
space, or perhaps just through time.
More generally, a circuit may involve many input and output bits, many wires, and
many logical gates. A logic gate is a function f : {0, 1}^k → {0, 1}^l from some fixed
number k of input bits to some fixed number l of output bits. For example, the NOT gate
is a gate with one input bit and one output bit which computes the function f (a) = 1 ⊕ a,
where a is a single bit, and ⊕ is modulo 2 addition. It is also usual to make the convention
that no loops are allowed in the circuit, to avoid possible instabilities, as illustrated in
Figure 3.3. We say such a circuit is acyclic, and we adhere to the convention that circuits
in the circuit model of computation be acyclic.
Box 3.2: The halting problem
In Exercise 3.2 you showed that each Turing machine can be uniquely associated
with a number from the list 1, 2, 3, . . .. To solve Hilbert’s problem, Turing used this
numbering to pose the halting problem: does the machine with Turing number x
halt upon input of the number y? This is a well posed and interesting mathematical
problem. After all, it is a matter of some considerable interest to us whether our
algorithms halt or not. Yet it turns out that there is no algorithm which is capable of
solving the halting problem. To see this, Turing asked whether there is an algorithm
to solve an even more specialized problem: does the machine with Turing number
x halt upon input of the same number x? Turing defined the halting function,
\[
h(x) \equiv
\begin{cases}
0 & \text{if machine number } x \text{ does not halt upon input of } x \\
1 & \text{if machine number } x \text{ halts upon input of } x.
\end{cases}
\]
If there is an algorithm to solve the halting problem, then there surely is an algorithm to evaluate h(x). We will try to reach a contradiction by supposing such
an algorithm exists, denoted by HALT(x). Consider an algorithm computing the
function TURING(x), with pseudocode
TURING(x)
    y = HALT(x)
    if y = 0 then
        halt
    else
        loop forever
    end if
Since HALT is a valid program, TURING must also be a valid program, with
some Turing number, t. By definition of the halting function, h(t) = 1 if and only
if TURING halts on input of t. But by inspection of the program for TURING,
we see that TURING halts on input of t if and only if h(t) = 0. Thus h(t) = 1 if
and only if h(t) = 0, a contradiction. Therefore, our initial assumption that there
is an algorithm to evaluate h(x) must have been wrong. We conclude that there is
no algorithm allowing us to solve the halting problem.
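The construction of TURING from HALT in Box 3.2 can be mimicked in Python, with the caveat that this illustrates the argument rather than proving anything. In this sketch Python functions stand in for Turing numbers, the candidate decider is handed the program itself rather than a number, and the names make_turing and always_says_loops are hypothetical. Whatever the candidate predicts about the constructed program, the program does the opposite.

```python
def make_turing(candidate_halt):
    """Build the analogue of TURING from any claimed halting decider."""
    def turing():
        if candidate_halt(turing):  # decider claims "this program halts" ...
            while True:             # ... so loop forever instead
                pass
        return "halted"             # decider claims "does not halt", so halt
    return turing

def always_says_loops(program):
    """A deliberately naive candidate decider, used only for illustration."""
    return False

turing = make_turing(always_says_loops)
prediction = always_says_loops(turing)  # the candidate predicts "does not halt"
result = turing()                       # yet the program halts immediately
```

A candidate that answered True instead would be refuted in the other direction, since the constructed program would then loop forever; that case is not executed here for obvious reasons.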
Figure 3.2. Elementary circuit performing a single NOT gate on a single input bit.
Figure 3.3. Circuits containing cycles can be unstable, and are not usually permitted in the circuit model of
computation.
There are many other elementary logic gates which are useful for computation. A
partial list includes the AND gate, the OR gate, the XOR gate, the NAND gate, and the
NOR gate. Each of these gates takes two bits as input, and produces a single bit as output.
The AND gate outputs 1 if and only if both of its inputs are 1. The OR gate outputs 1 if
and only if at least one of its inputs is 1. The XOR gate outputs the sum, modulo 2, of
its inputs. The NAND and NOR gates take the AND and OR, respectively, of their inputs,
and then apply a NOT to whatever is output. The action of these gates is illustrated in
Figure 3.4.
Figure 3.4. Elementary circuits performing the AND, OR, XOR, NAND, and NOR gates.
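The behavior of the elementary gates can be written out as small Python functions on the bits 0 and 1. This is a minimal sketch; the helper truth_table is an illustrative assumption that lists a two-input gate's outputs for the inputs 00, 01, 10, 11 in that order.

```python
def NOT(a):
    return 1 ^ a  # f(a) = 1 XOR a

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def XOR(a, b):
    return a ^ b  # sum of the inputs modulo 2

def NAND(a, b):
    return NOT(AND(a, b))

def NOR(a, b):
    return NOT(OR(a, b))

def truth_table(gate):
    """Outputs of a two-input gate for inputs 00, 01, 10, 11."""
    return [gate(a, b) for a in (0, 1) for b in (0, 1)]
```

For instance, truth_table(NAND) is the bitwise complement of truth_table(AND), which reflects the definition of NAND as AND followed by NOT.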
There are two important ‘gates’ missing from Figure 3.4, namely the FANOUT gate and
the CROSSOVER gate. In circuits we often allow bits to ‘divide’, replacing a bit with two
copies of itself, an operation referred to as FANOUT. We also allow bits to CROSSOVER,
that is, the values of two bits are interchanged. A third operation missing from Figure 3.4,
not really a logic gate at all, is to allow the preparation of extra ancilla or work bits, to
allow extra working space during the computation.
These simple circuit elements can be put together to perform an enormous variety of
computations. Below we’ll show that these elements can be used to compute any function
whatsoever. In the meantime, let’s look at a simple example of a circuit which adds two
n bit integers, using essentially the same algorithm taught to school-children around the
world. The basic element in this circuit is a smaller circuit known as a half-adder, shown
in Figure 3.5. A half-adder takes two bits, x and y, as input, and outputs the sum of
the bits x ⊕ y modulo 2, together with a carry bit set to 1 if x and y are both 1, or 0
otherwise.
Figure 3.5. Half-adder circuit. The carry bit c is set to 1 when x and y are both 1, otherwise it is 0.
Two cascaded half-adders may be used to build a full-adder, as shown in Figure 3.6.
A full-adder takes as input three bits, x, y, and c. The bits x and y should be thought
of as data to be added, while c is a carry bit from an earlier computation. The circuit
outputs two bits. One output bit is the modulo 2 sum, x ⊕ y ⊕ c of all three input bits.
The second output bit, c′, is a carry bit, which is set to 1 if two or more of the inputs are
1, and is 0 otherwise.
Figure 3.6. Full-adder circuit.
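The half-adder and full-adder can be sketched in Python as follows. The way the two carry bits are combined here (with OR) is an assumption about the wiring rather than a reading of the figure; XOR would give the same result, because the two half-adders can never both produce a carry.

```python
def half_adder(x, y):
    """Return (sum, carry): x XOR y, and 1 only if both inputs are 1."""
    return x ^ y, x & y

def full_adder(x, y, c):
    """Add three bits using two cascaded half-adders; return (sum, carry)."""
    s1, c1 = half_adder(x, y)   # add the two data bits
    s2, c2 = half_adder(s1, c)  # add the incoming carry to the partial sum
    return s2, c1 | c2          # at most one of c1, c2 can be 1
```

The defining property is that the two output bits, read as a two-bit number with the carry as the high bit, equal x + y + c.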
By cascading many of these full-adders together we obtain a circuit to add two n-bit
integers, as illustrated in Figure 3.7 for the case n = 3.
Figure 3.7. Addition circuit for two three-bit integers, x = x2 x1 x0 and y = y2 y1 y0 , using the elementary
algorithm taught to school-children.
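Cascading full-adders gives the schoolbook addition algorithm, sketched below in Python. The bit ordering (least significant bit first), the helper names to_bits and from_bits, and the choice to feed an initial carry of 0 into a full-adder for the lowest bit are assumptions made for this sketch; a circuit may equally use a half-adder in that position.

```python
def full_adder(x, y, c):
    """Return (sum, carry) for three input bits."""
    total = x ^ y ^ c
    carry = (x & y) | (c & (x ^ y))
    return total, carry

def add_bits(x_bits, y_bits):
    """Add two equal-length bit lists (least significant bit first); returns n + 1 bits."""
    carry = 0
    out = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)  # the carry ripples to the next position
        out.append(s)
    out.append(carry)  # the final carry is the most significant output bit
    return out

def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

example = from_bits(add_bits(to_bits(5, 3), to_bits(6, 3)))
```

The number of full-adders used is n, so the size of this addition circuit grows linearly with the number of bits.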
We claimed earlier that just a few fixed gates can be used to compute any function
f : {0, 1}^n → {0, 1}^m whatsoever. We will now prove this for the simplified case of a
function f : {0, 1}^n → {0, 1} with n input bits and a single output bit. Such a function
is known as a Boolean function, and the corresponding circuit is a Boolean circuit. The
general universality proof follows immediately from the special case of Boolean functions.
The proof is by induction on n. For n = 1 there are four possible functions: the identity,
which has a circuit consisting of a single wire; the bit flip, which is implemented using
a single NOT gate; the function which replaces the input bit with a 0, which can be
obtained by ANDing the input with a work bit initially in the 0 state; and the function
which replaces the input with a 1, which can be obtained by ORing the input with a work
bit initially in the 1 state.
To complete the induction, suppose that any function on n bits may be computed
by a circuit, and let f be a function on n + 1 bits. Define n-bit functions f0 and f1
by f0 (x1 , . . . , xn ) ≡ f (0, x1 , . . . , xn ) and f1 (x1 , . . . , xn ) ≡ f (1, x1 , . . . , xn ). These are
both n-bit functions, so by the inductive hypothesis there are circuits to compute these
functions.
It is now an easy matter to design a circuit which computes f . The circuit computes
both f0 and f1 on the last n bits of the input. Then, depending on whether the first bit of
the input was a 0 or a 1 it outputs the appropriate answer. A circuit to do this is shown
in Figure 3.8. This completes the induction.
Figure 3.8. Circuit to compute an arbitrary function f on n + 1 bits, assuming by induction that there are circuits
to compute the n-bit functions f0 and f1 .
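The inductive construction can be followed step by step in Python. This sketch is one plausible realization, not necessarily the exact wiring of the figure: the selection step is assumed to be (NOT x0 AND f0) XOR (x0 AND f1), and the name build_circuit is hypothetical. The returned evaluator combines values using only the gate functions defined here.

```python
def NOT(a):
    return 1 ^ a

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def XOR(a, b):
    return a ^ b

def build_circuit(f, n):
    """Return an evaluator for f: {0,1}^n -> {0,1} assembled from gates by induction on n."""
    if n == 1:
        # Base case: the four one-bit functions, keyed by (f(0), f(1)).
        base = {
            (0, 1): lambda x: x,          # identity: a single wire
            (1, 0): lambda x: NOT(x),     # bit flip
            (0, 0): lambda x: AND(x, 0),  # AND with a work bit in state 0
            (1, 1): lambda x: OR(x, 1),   # OR with a work bit in state 1
        }
        return base[(f(0), f(1))]
    # Inductive step: circuits for f0 and f1 on the last n - 1 bits.
    c0 = build_circuit(lambda *rest: f(0, *rest), n - 1)
    c1 = build_circuit(lambda *rest: f(1, *rest), n - 1)
    # Select the appropriate answer according to the first bit.
    return lambda x0, *rest: XOR(AND(NOT(x0), c0(*rest)), AND(x0, c1(*rest)))

majority = lambda a, b, c: int(a + b + c >= 2)
majority_circuit = build_circuit(majority, 3)
```

Since each step builds two sub-circuits, the construction produces a circuit whose size grows exponentially in n; it establishes that a circuit exists, not that it is small.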
Five elements may be identified in the universal circuit construction: (1) wires, which
preserve the states of the bits; (2) ancilla bits prepared in standard states, used in the
n = 1 case of the proof; (3) the FANOUT operation, which takes a single bit as input
and outputs two copies of that bit; (4) the CROSSOVER operation, which interchanges
the value of two bits; and (5) the AND, XOR, and NOT gates. In Chapter 4 we’ll define
the quantum circuit model of computation in a manner analogous to classical circuits. It
is interesting to note that many of these five elements pose some interesting challenges
when extending to the quantum case: it is not necessarily obvious that good quantum
wires for the preservation of qubits can be constructed, even in principle, the FANOUT
operation cannot be performed in a straightforward manner in quantum mechanics, due
to the no-cloning theorem (as explained in Section 1.3.5), and the AND and XOR gates
are not invertible, and thus can’t be implemented in a straightforward manner as unitary
quantum gates. There is certainly plenty to think about in defining a quantum circuit
model of computation!
Exercise 3.8: (Universality of NAND) Show that the NAND gate can be used to
simulate the AND, XOR and NOT gates, provided wires, ancilla bits and FANOUT
are available.
Let’s return from our brief quantum digression, to the properties of classical circuits.
We claimed earlier that the Turing machine model is equivalent to the circuit model of
computation. In what sense do we mean the two models are equivalent? On the face of
it, the two models appear quite different. The unbounded nature of a Turing machine
makes it more useful for abstractly specifying what it is we mean by an algorithm,
while circuits more closely capture what an actual physical computer does.
The two models are connected by introducing the notion of a uniform circuit family.
A circuit family consists of a collection of circuits, {Cn }, indexed by a positive integer
n. The circuit Cn has n input bits, and may have any finite number of extra work bits,
and output bits. The output of the circuit Cn , upon input of a number x of at most n
bits in length, is denoted by Cn (x). We require that the circuits be consistent, that is, if
m < n and x is at most m bits in length, then Cm (x) = Cn (x). The function computed
by the circuit family {Cn } is the function C(·) such that if x is n bits in length then
C(x) = Cn (x). For example, consider a circuit Cn that squares an n-bit number. This
defines a family of circuits {Cn } that computes the function, C(x) = x^2 , where x is any
positive integer.
It’s not enough to consider unrestricted families of circuits, however. In practice, we
need an algorithm to build the circuit. Indeed, if we don’t place any restrictions on the
circuit family then it becomes possible to compute all sorts of functions which we do
not expect to be able to compute in a reasonable model of computation. For example, let
hn (x) denote the halting function, restricted to values of x which are n bits in length.
Thus hn is a function from n bits to 1 bit, and we have proved there exists a circuit
Cn to compute hn (·). Therefore the circuit family {Cn } computes the halting function!
However, what prevents us from using this circuit family to solve the halting problem is
that we haven’t specified an algorithm which will allow us to build the circuit Cn for all
values of n. Adding this requirement results in the notion of a uniform circuit family.
That is, a family of circuits {Cn } is said to be a uniform circuit family if there is some
algorithm running on a Turing machine which, upon input of n, generates a description
of Cn . That is, the algorithm outputs a description of what gates are in the circuit Cn ,
how those gates are connected together to form a circuit, any ancilla bits needed by
the circuit, FANOUT and CROSSOVER operations, and where the output from the circuit
should be read out. For example, the family of circuits we described earlier for squaring
n-bit numbers is certainly a uniform circuit family, since there is an algorithm which,
given n, outputs a description of the circuit needed to square an n-bit number. You can
think of this algorithm as the means by which an engineer is able to generate a description
of (and thus build) the circuit for any n whatsoever. By contrast, a circuit family that is
not uniform is said to be a non-uniform circuit family. There is no algorithm to construct
the circuit for arbitrary n, which prevents our engineer from building circuits to compute
functions like the halting function.
Intuitively, a uniform circuit family is a family of circuits that can be generated by some
reasonable algorithm. It can be shown that the class of functions computable by uniform
circuit families is exactly the same as the class of functions which can be computed on a
Turing machine. With this uniformity restriction, results in the Turing machine model
of computation can usually be given a straightforward translation into the circuit model
of computation, and vice versa. Later we give similar attention to issues of uniformity in
the quantum circuit model of computation.
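Uniformity can be illustrated by an ordinary program that, given n, prints out a description of the circuit Cn. The Python sketch below does this for n-bit addition rather than squaring, since the adder was constructed earlier; the description format (a list of gate triples and wire names), the function names describe_adder and evaluate, and the implicit treatment of FANOUT as reuse of a named wire are all assumptions made for illustration.

```python
def describe_adder(n):
    """On input n >= 1, output a gate-level description of an n-bit adder circuit."""
    gates = []    # each gate is (kind, (input wire, input wire), output wire)
    outputs = []  # wires holding the sum, least significant bit first
    carry = None
    for i in range(n):
        x, y = f"x{i}", f"y{i}"
        gates.append(("XOR", (x, y), f"p{i}"))  # partial sum of the data bits
        gates.append(("AND", (x, y), f"g{i}"))  # carry generated by the data bits
        if carry is None:
            sum_wire, carry = f"p{i}", f"g{i}"  # lowest bit: a half-adder suffices
        else:
            gates.append(("XOR", (f"p{i}", carry), f"s{i}"))
            gates.append(("AND", (f"p{i}", carry), f"t{i}"))
            gates.append(("OR", (f"g{i}", f"t{i}"), f"c{i}"))
            sum_wire, carry = f"s{i}", f"c{i}"
        outputs.append(sum_wire)
    outputs.append(carry)
    return {"gates": gates, "outputs": outputs}

OPS = {"XOR": lambda a, b: a ^ b, "AND": lambda a, b: a & b, "OR": lambda a, b: a | b}

def evaluate(circuit, assignment):
    """Evaluate a described circuit on a dict mapping input wire names to bits."""
    wires = dict(assignment)
    for kind, (a, b), out in circuit["gates"]:
        wires[out] = OPS[kind](wires[a], wires[b])
    return [wires[w] for w in circuit["outputs"]]
```

Here describe_adder plays the role of the engineer's algorithm: it works for every n, which is exactly what a non-uniform family, such as one computing the halting function, lacks.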
3.2 The analysis of computational problems
The analysis of computational problems depends upon the answer to three fundamental
questions:
(1) What is a computational problem? Multiplying two numbers together is a
computational problem; so is programming a computer to exceed human abilities in
the writing of poetry. In order to make progress developing a general theory for the
analysis of computational problems we are going to isolate a special class of
problems known as decision problems, and concentrate our analysis on those.
Restricting ourselves in this way enables the development of a theory which is both
elegant and rich in structure. Most important, it is a theory whose principles have
application far beyond decision problems.
(2) How may we design algorithms to solve a given computational problem?
Once a problem has been specified, what algorithms can be used to solve the
problem? Are there general techniques which can be used to solve wide classes of
problems? How can we be sure an algorithm behaves as claimed?
(3) What are the minimal resources required to solve a given computational
problem? Running an algorithm requires the consumption of resources, such as
time, space, and energy. In different situations it may be desirable to minimize
consumption of one or more resource. Can we classify problems according to the
resource requirements needed to solve them?
In the next few sections we investigate these three questions, especially questions 1
and 3. Although question 1, ‘what is a computational problem?’, is perhaps the most
fundamental of the questions, we shall defer answering it until Section 3.2.3, pausing first
to establish some background notions related to resource quantification in Section 3.2.1,
and then reviewing the key ideas of computational complexity in Section 3.2.2.
Question 2, how to design good algorithms, is the subject of an enormous amount of
ingenious work by many researchers. So much so that in this brief introduction we cannot
even begin to describe the main ideas employed in the design of good algorithms. If you
are interested in this beautiful subject, we refer you to the end of chapter ‘History and
further reading’. Our closest direct contact with this subject will occur later in the book,
when we study quantum algorithms. The techniques involved in the creation of quantum
algorithms have typically involved a blend of deep existing ideas in algorithm design for
classical computers, and the creation of new, wholly quantum mechanical techniques for
algorithm design. For this reason, and because the spirit of quantum algorithm design
is so similar in many ways to classical algorithm design, we encourage you to become
familiar with at least the basic ideas of algorithm design.
Question 3, what are the minimal resources required to solve a given computational
problem, is the main focus of the next few sections. For example, suppose we are given
two numbers, each n bits in length, which we wish to multiply. If the multiplication
is performed on a single-tape Turing machine, how many computational steps must be
executed by the Turing machine in order to complete the task? How much space is used
on the Turing machine while completing the task?
These are examples of the type of resource questions we may ask. Generally speaking, computers make use of many different kinds of resources; however, we will focus
most of our attention on time, space, and energy. Traditionally in computer science, time
and space have been the two major resource concerns in the study of algorithms, and
we study these issues in Sections 3.2.2 through 3.2.4. Energy has been a less important consideration; however, the study of energy requirements motivates the subject of
reversible classical computation, which in turn is a prerequisite for quantum computation, so we examine energy requirements for computation in some considerable detail in
Section 3.2.5.
3.2.1 How to quantify computational resources
Different models of computation lead to different resource requirements for computation. Even something as simple as changing from a single-tape to a two-tape Turing
machine may change the resources required to solve a given computational problem. For
a computational task which is extremely well understood, like addition of integers, for
example, such differences between computational models may be of interest. However,
for a first pass at understanding a problem, we would like a way of quantifying resource
requirements that is independent of relatively trivial changes in the computational model.
One of the tools which has been developed to do this is the asymptotic notation, which
can be used to summarize the essential behavior of a function. This asymptotic notation
can be used, for example, to summarize the essence of how many time steps it takes a
given algorithm to run, without worrying too much about the exact time count. In this
section we describe this notation in detail, and apply it to a simple problem illustrating
the quantification of computational resources – the analysis of algorithms for sorting a
list of names into alphabetical order.
Suppose, for example, that we are interested in the number of gates necessary to add
together two n-bit numbers. Exact counts of the number of gates required obscure the
big picture: perhaps a specific algorithm requires 24n + 2⌈log n⌉ + 16 gates to perform
this task. However, in the limit of large problem size the only term which matters is the
24n term. Furthermore, we disregard constant factors as being of secondary importance
to the analysis of the algorithm. The essential behavior of the algorithm is summed up
by saying that the number of operations required scales like n, where n is the number of
bits in the numbers being added. The asymptotic notation consists of three tools which
make this notion precise.
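A quick numerical check shows why only the 24n term matters for large n. The Python sketch below takes the gate count quoted above as a purely hypothetical example and assumes the logarithm is base 2; the name gate_count is illustrative.

```python
import math

def gate_count(n):
    """Hypothetical exact gate count 24n + 2*ceil(log n) + 16, with log taken base 2."""
    return 24 * n + 2 * math.ceil(math.log2(n)) + 16

# Gates per input bit for increasingly large problem sizes.
ratios = [gate_count(n) / n for n in (10, 1_000, 1_000_000)]
```

The ratio gate_count(n) / n approaches the constant 24 as n grows, which is the sense in which the count 'scales like n'.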
The O (‘big O’) notation is used to set upper bounds on the behavior of a function.
Suppose f (n) and g(n) are two functions on the non-negative integers. We say ‘f (n) is
in the class of functions O(g(n))’, or just ‘f (n) is O(g(n))’, if there are constants c and
n0 such that for all values of n greater than n0 , f (n) ≤ cg(n). That is, for sufficiently
large n, the function g(n) is an upper bound on f (n), up to an unimportant constant
factor. The big O notation is particularly useful for studying the worst-case behavior of
specific algorithms, where we are often satisfied with an upper bound on the resources
consumed by an algorithm.
When studying the behaviors of a class of algorithms – say the entire class of algorithms
which can be used to multiply two numbers – it is interesting to set lower bounds on
the resources required. For this the Ω (‘big Omega’) notation is used. A function f (n)
is said to be Ω(g(n)) if there exist constants c and n0 such that for all n greater than n0 ,
cg(n) ≤ f (n). That is, for sufficiently large n, g(n) is a lower bound on f (n), up to an
unimportant constant factor.
Finally, the Θ (‘big Theta’) notation is used to indicate that f (n) behaves the same as
g(n) asymptotically, up to unimportant constant factors. That is, we say f (n) is Θ(g(n))
if it is both O(g(n)) and Ω(g(n)).
Asymptotic notation: examples
Let’s consider a few simple examples of the asymptotic notation. The function 2n is
in the class O(n^2), since 2n ≤ 2n^2 for all positive n. The function 2^n is Ω(n^3), since
n^3 ≤ 2^n for sufficiently large n. Finally, the function 7n^2 + √n log(n) is Θ(n^2), since
7n^2 ≤ 7n^2 + √n log(n) ≤ 8n^2 for all sufficiently large values of n. In the following
few exercises you will work through some of the elementary properties of the asymptotic
notation that make it a useful tool in the analysis of algorithms.
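The inequalities behind these three examples can be spot-checked numerically. The Python sketch below checks each bound over a finite range only, so it illustrates the definitions rather than proving anything; the natural logarithm is assumed, and the function names are illustrative.

```python
import math

def f(n):
    """The example function 7n^2 + sqrt(n) * log(n), using the natural logarithm."""
    return 7 * n**2 + math.sqrt(n) * math.log(n)

def big_o_example(n_max):
    """Check 2n <= 2n^2 for 1 <= n <= n_max."""
    return all(2 * n <= 2 * n**2 for n in range(1, n_max + 1))

def big_omega_example(n0, n_max):
    """Check n^3 <= 2^n for n0 <= n <= n_max."""
    return all(n**3 <= 2**n for n in range(n0, n_max + 1))

def big_theta_example(n0, n_max):
    """Check 7n^2 <= f(n) <= 8n^2 for n0 <= n <= n_max."""
    return all(7 * n**2 <= f(n) <= 8 * n**2 for n in range(n0, n_max + 1))
```

The second check illustrates what 'sufficiently large' means in practice: the inequality n^3 ≤ 2^n fails at n = 9 (729 versus 512) but holds at n = 10 (1000 versus 1024) and over the rest of the range tested, so a threshold such as n0 = 9 in the definition serves for this example.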
Exercise 3.9: Prove that f (n) is O(g(n)) if and only if g(n) is Ω(f (n)). Deduce that
f (n) is Θ(g(n)) if and only if g(n) is Θ(f (n)).
Exercise 3.10: Suppose g(n) is a polynomial of degree k. Show that g(n) is O(n^l) for
any l ≥ k.
Exercise 3.11: Show that log n is O(n^k) for any k > 0.
Exercise 3.12: (n^(log n) is super-polynomial) Show that n^k is O(n^(log n)) for any k, but
that n^(log n) is never O(n^k).
Exercise 3.13: (n^(log n) is sub-exponential) Show that c^n is Ω(n^(log n)) for any c > 1,
but that n^(log n) is never Ω(c^n).
Exercise 3.14: Suppose e(n) is O(f (n)) and g(n) is O(h(n)). Show that e(n)g(n) is
O(f (n)h(n)).
An example of the use of the asymptotic notation in quantifying resources is the
following simple application to the problem of sorting an n element list of names into
alphabetical order. Many sorting algorithms are based upon the ‘compare-and-swap’
operation: two elements of an n element list are compared, and swapped if they are in
the wrong order. If this compare-and-swap operation is the only means by which we can
access the list, how many such operations are required in order to ensure that the list has
been correctly sorted?
A simple compare-and-swap algorithm for solving the sorting problem is as follows:
(note that compare-and-swap(j,k) compares the list entries numbered j and k, and
swaps them if they are out of order)
for j = 1 to n-1
    for k = j+1 to n
        compare-and-swap(j,k)
    end k
end j
It is clear that this algorithm correctly sorts a list of n names into alphabetical order.
Note that the number of compare-and-swap operations executed by the algorithm is
(n − 1) + (n − 2) + · · · + 1 = n(n − 1)/2. Thus the number of compare-and-swap
operations used by the algorithm is Θ(n^2). Can we do better than this? It turns out that we
can. Algorithms such as ‘heapsort’ are known which run using O(n log n) compare-and-swap
operations. Furthermore, in Exercise 3.15 you’ll work through a simple counting
argument that shows any algorithm based upon the compare-and-swap operation requires
Ω(n log n) such operations. Thus, the sorting problem requires Θ(n log n) compare-and-swap
operations, in general.
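The double loop above can be written as a short Python sketch that also counts the compare-and-swap operations. The function names `compare_and_swap` and `simple_sort` and the sample list are illustrative assumptions, and indices are 0-based rather than 1-based as in the text.

```python
def compare_and_swap(items, j, k):
    """Swap items[j] and items[k] if they are out of order (assumes j < k)."""
    if items[j] > items[k]:
        items[j], items[k] = items[k], items[j]


def simple_sort(names):
    """Sort a copy of `names` with the double loop from the text.

    Returns the sorted list together with the number of
    compare-and-swap operations that were executed.
    """
    items = list(names)
    n = len(items)
    ops = 0
    for j in range(n - 1):         # j = 1 .. n-1 in the text
        for k in range(j + 1, n):  # k = j+1 .. n in the text
            compare_and_swap(items, j, k)
            ops += 1
    return items, ops


# Hypothetical sample input.
names = ["Turing", "Church", "Shor", "Deutsch", "Grover"]
sorted_names, ops = simple_sort(names)
print(sorted_names)  # alphabetical order
print(ops)           # n(n - 1)/2 = 10 for n = 5
```

The count depends only on n, not on the initial ordering, which is why the algorithm uses Θ(n^2) operations on every input.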
Exercise 3.15: (Lower bound for compare-and-swap based sorts) Suppose an n
element list is sorted by applying some sequence of compare-and-swap
operations to the list. There are n! possible initial orderings of the list. Show that
after k of the compare-and-swap operations have been applied, at most 2^k of the
possible initial orderings will have been sorted into the correct order. Conclude
that Ω(n log n) compare-and-swap operations are required to sort all possible
initial orderings into the correct order.
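As a numerical illustration of the counting bound stated in Exercise 3.15 (not a proof of it), the following Python sketch takes the claim "at most 2^k orderings are sorted after k operations" as given and computes the smallest k with 2^k ≥ n!. The function name `min_compare_and_swaps` is an assumption made for this example.

```python
import math


def min_compare_and_swaps(n):
    """Smallest k with 2**k >= n!, that is, ceil(log2(n!)).

    Computed exactly with integer arithmetic: for an integer x >= 1,
    (x - 1).bit_length() equals ceil(log2(x)).
    """
    return (math.factorial(n) - 1).bit_length()


for n in (4, 16, 64, 256):
    lower = min_compare_and_swaps(n)
    upper = n * (n - 1) // 2  # cost of the simple double-loop sort
    print(n, lower, round(n * math.log2(n)), upper)
```

For growing n the lower bound tracks n log n, while the simple double-loop algorithm uses n(n − 1)/2 operations.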
3.2.2 Computational complexity
The idea that there won’t be an algorithm to solve it – this is something fundamental that won’t ever change – that idea appeals to me.
– Stephen Cook
Sometimes it is good that some things are impossible. I am happy there are
many things that nobody can do to me.
– Leonid Levin
It should not come as a surprise that our choice of polynomial algorithms as
the mathematical concept that is supposed to capture the informal notion of
‘practically efficient computation’ is open to criticism from all sides. [. . . ] Ultimately, our argument for our choice must be this: Adopting polynomial
worst-case performance as our criterion of efficiency results in an
elegant and useful theory that says something meaningful about
practical computation, and would be impossible without this simplification.
– Christos Papadimitriou
What time and space resources are required to perform a computation? In many cases
these are the most important questions we can ask about a computational problem. Problems like addition and multiplication of numbers are regarded as efficiently solvable
because we have fast algorithms to perform addition and multiplication, which consume
little space when running. Many other problems have no known fast algorithm, and are
effectively impossible to solve, not because we can’t find an algorithm to solve the problem, but because all known algorithms consume such vast quantities of space or time as
to render them practically useless.
Computational complexity is the study of the time and space resources required to
solve computational problems. The task of computational complexity is to prove lower
bounds on the resources required by the best possible algorithm for solving a problem,
even if that algorithm is not explicitly known. In this and the next two sections, we
give an overview of computational complexity, its major concepts, and some of the more
important results of the field. Note that computational complexity is in a sense complementary to the field of algorithm design; ideally, the most efficient algorithms we could
design would match perfectly with the lower bounds proved by computational complexity. Unfortunately, this is often not the case. As already noted, in this book we won’t
examine classical algorithm design in any depth.
One difficulty in formulating a theory of computational complexity is that different
computational models may require different resources to solve the same problem. For instance, multiple-tape Turing machines can solve many problems substantially faster than
single-tape Turing machines. This difficulty is resolved in a rather coarse way. Suppose
a problem is specified by giving n bits as input. For instance, we might be interested in
whether a particular n-bit number is prime or not. The chief distinction made in computational complexity is between problems which can be solved using resources which
are bounded by a polynomial in n, or which require resources which grow faster than
any polynomial in n. In the latter case we usually say that the resources required are
exponential in the problem size, abusing the term exponential, since there are functions
like n^{log n} which grow faster than any polynomial (and thus are ‘exponential’ according to this convention), yet which grow slower than any true exponential. A problem
is regarded as easy, tractable or feasible if an algorithm for solving the problem using
polynomial resources exists, and as hard, intractable or infeasible if the best possible
algorithm requires exponential resources.
As a simple example, suppose we have two numbers with binary expansions x_1 . . . x_{m_1}
and y_1 . . . y_{m_2}, and we wish to determine the sum of the two numbers. The total size of
the input is n ≡ m_1 + m_2. It’s easy to see that the two numbers can be added using a
number of elementary operations that scales as Θ(n); this algorithm uses a polynomial
(indeed, linear) number of operations to perform its tasks. By contrast, it is believed
(though it has never been proved!) that the problem of factoring an integer into its prime
factors is intractable. That is, the belief is that there is no algorithm which can factor
an arbitrary n-bit integer using O(p(n)) operations, where p is some fixed polynomial
function of n. We will later give many other examples of problems which are believed to
be intractable in this sense.
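The linear scaling of addition can be made concrete with a schoolbook ripple-carry sketch in Python. The function name `add_binary`, the bit-list representation (most significant bit first) and the choice to count one "step" per bit position are assumptions made for illustration.

```python
def add_binary(x_bits, y_bits):
    """Add two numbers given as lists of bits, most significant bit first.

    Returns (sum_bits, steps), where `steps` counts the bit positions
    processed. This is max(m1, m2), which is at most n = m1 + m2.
    """
    i, j = len(x_bits) - 1, len(y_bits) - 1
    carry = 0
    out = []  # sum bits, least significant first
    steps = 0
    while i >= 0 or j >= 0:
        a = x_bits[i] if i >= 0 else 0
        b = y_bits[j] if j >= 0 else 0
        total = a + b + carry
        out.append(total % 2)
        carry = total // 2
        i -= 1
        j -= 1
        steps += 1
    if carry:
        out.append(1)
    out.reverse()
    return out, steps


# 1011 (eleven) + 110 (six) = 10001 (seventeen)
bits, steps = add_binary([1, 0, 1, 1], [1, 1, 0])
print(bits, steps)  # [1, 0, 0, 0, 1] 4
```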
The polynomial versus exponential classification is rather coarse. In practice, an algorithm that solves a problem using 2^{n/1000} operations is probably more useful than one
which runs in n^{1000} operations. Only for very large input sizes (n ≈ 10^8) will the ‘efficient’ polynomial algorithm be preferable to the ‘inefficient’ exponential algorithm, and
for many purposes it may be more practical to prefer the ‘inefficient’ algorithm.
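To get a feel for where the two operation counts cross, one can compare their base-2 logarithms instead of the (astronomically large) counts themselves. The Python sketch below is a rough illustration; the function names and the search interval are assumptions, and the search relies on n/1000 − 1000 log2(n) being increasing on that interval.

```python
import math


def exponential_is_cheaper(n):
    """True if 2**(n/1000) < n**1000, compared via base-2 logarithms."""
    return n / 1000 < 1000 * math.log2(n)


def crossover(lo=2_000_000, hi=10**9):
    """Smallest n in (lo, hi] at which the exponential count is no longer smaller.

    Assumes exponential_is_cheaper(lo) is True, exponential_is_cheaper(hi)
    is False, and that the comparison flips only once in between.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if exponential_is_cheaper(mid):
            lo = mid
        else:
            hi = mid
    return hi


print(crossover())
```

The value found lies between 10^7 and 10^8, in the same ballpark as the rough figure quoted above.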
Nevertheless, there are many reasons to base computational complexity primarily on
the polynomial versus exponential classification. First, historically, with few exceptions,
polynomial resource algorithms have been much faster than exponential algorithms. We
might speculate that the reason for this is lack of imagination: coming up with algorithms
requiring n, n^2 or some other low degree polynomial number of operations is often much
easier than finding a natural algorithm which requires n^{1000} operations, although examples
like the latter do exist. Thus, the predisposition for the human mind to come up with
relatively simple algorithms has meant that in practice polynomial algorithms usually do
perform much more efficiently than their exponential cousins.
A second and more fundamental reason for emphasizing the polynomial versus exponential classification is derived from the strong Church–Turing thesis. As discussed in
Section 1.1, it was observed in the 1960s and 1970s that probabilistic Turing machines
appear to be the strongest ‘reasonable’ model of computation. More precisely, researchers
consistently found that if it was possible to compute a function using k elementary operations in some model that was not the probabilistic Turing machine model of computation,
then it was always possible to compute the same function in the probabilistic Turing machine model, using at most p(k) elementary operations, where p(·) is some polynomial
function. This statement is known as the strong Church–Turing thesis:
Strong Church–Turing thesis: Any model of computation can be simulated
on a probabilistic Turing machine with at most a polynomial increase in the
number of elementary operations required.
The strong Church–Turing thesis is great news for the theory of computational complexity, for it implies that attention may be restricted to the probabilistic Turing machine
model of computation. After all, if a problem has no polynomial resource solution on
a probabilistic Turing machine, then the strong Church–Turing thesis implies that it
has no efficient solution on any computing device. Thus, the strong Church–Turing
thesis implies that the entire theory of computational complexity will take on an elegant, model-independent form if the notion of efficiency is identified with polynomial
resource algorithms, and this elegance has provided a strong impetus towards acceptance
of the identification of ‘solvable with polynomial resources’ and ‘efficiently solvable’. Of
course, one of the prime reasons for interest in quantum computers is that they cast
into doubt the strong Church–Turing thesis, by enabling the efficient solution of a problem which is believed to be intractable on all classical computers, including probabilistic
Turing machines! Nevertheless, it is useful to understand and appreciate the role the
strong Church–Turing thesis has played in the search for a model-independent theory
of computational complexity.
Finally, we note that, in practice, computer scientists are not only interested in the
polynomial versus exponential classification of problems. This is merely the first and
coarsest way of understanding how difficult a computational problem is. However, it
is an exceptionally important distinction, and illustrates many broader points about the
nature of resource questions in computer science. For most of this book, it will be our
central concern in evaluating the efficiency of a given algorithm.
Having examined the merits of the polynomial versus exponential classification, we
now have to confess that the theory of computational complexity has one remarkable
outstanding failure: it seems very hard to prove that there are interesting classes of problems which require exponential resources to solve. It is quite easy to give non-constructive
proofs that most problems require exponential resources (see Exercise 3.16, below), and
furthermore many interesting problems are conjectured to require exponential resources
for their solution, but rigorous proofs seem very hard to come by, at least with the present
state of knowledge. This failure of computational complexity has important implications
for quantum computation, because it turns out that the computational power of quantum
computers can be related to some major open problems in classical computational complexity theory. Until these problems are resolved, it cannot be stated with certainty how
computationally powerful a quantum computer is, or even whether it is more powerful
than a classical computer!
Exercise 3.16: (Hard-to-compute functions exist) Show that there exist Boolean
functions on n inputs which require at least 2^n / log n logic gates to compute.
3.2.3 Decision problems and the complexity classes P and NP
Many computational problems are most cleanly formulated as decision problems – problems with a yes or no answer. For example, is a given number m a prime number or not?
This is the primality decision problem. The main ideas of computational complexity are
most easily and most often formulated in terms of decision problems, for two reasons:
the theory takes its simplest and most elegant form in this setting, while still generalizing
in a natural way to more complex scenarios; and historically computational complexity
arose primarily from the study of decision problems.
Although most decision problems can easily be stated in simple, familiar language,
discussion of the general properties of decision problems is greatly helped by the terminology of formal languages. In this terminology, a language L over the alphabet Σ is a
subset of the set Σ∗ of all (finite) strings of symbols from Σ. For example, if Σ = {0, 1},
then the set of binary representations of even numbers, L = {0, 10, 100, 110, . . .} is a
language over Σ.
Decision problems may be encoded in an obvious way as problems about languages.
For instance, the primality decision problem can be encoded using the binary alphabet
Σ = {0, 1}. Strings from Σ∗ can be interpreted in a natural way as non-negative integers.
For example, 0010 can be interpreted as the number 2. The language L is defined to
consist of all binary strings such that the corresponding number is prime.
To solve the primality decision problem, what we would like is a Turing machine
which, when started with a given number n on its input tape, eventually outputs some
equivalent of ‘yes’ if n is prime, and outputs ‘no’ if n is not prime. To make this idea
precise, it is convenient to modify our old Turing machine definition (of Section 3.1.1)
slightly, replacing the halting state q_h with two states q_Y and q_N to represent the answers
‘yes’ and ‘no’ respectively. In all other ways the machine behaves in the same way, and
it still halts when it enters the state q_Y or q_N. More generally, a language L is decided
by a Turing machine if the machine is able to decide whether an input x on its tape is
a member of the language L or not, eventually halting in the state q_Y if x ∈ L, and
eventually halting in the state q_N if x ∉ L. We say that the machine has accepted or
rejected x depending on which of these two cases comes about.
How quickly can we determine whether or not a number is prime? That is, what is the
fastest Turing machine which decides the language representing the primality decision
problem? We say that a problem is in TIME(f (n)) if there exists a Turing machine
which decides whether a candidate x is in the language in time O(f (n)), where n is the
length of x. A problem is said to be solvable in polynomial time if it is in TIME(n^k)
for some finite k. The collection of all languages which are in TIME(n^k), for some k,
is denoted P. P is our first example of a complexity class. More generally, a complexity
class is defined to be a collection of languages. Much of computational complexity theory
is concerned with the definition of various complexity classes, and understanding the
relationship between different complexity classes.
Not surprisingly, there are problems which cannot be solved in polynomial time.
Unfortunately, proving that any given problem can’t be solved in polynomial time seems
to be very difficult, although conjectures abound! A simple example of an interesting
decision problem which is believed not to be in P is the factoring decision problem:
FACTORING: Given a composite integer m and l < m, does m have a non-trivial
factor less than l?
An interesting property of factoring is that if somebody claims that the answer is ‘yes,
m does have a non-trivial factor less than l’ then they can establish this by exhibiting
such a factor, which can then be efficiently checked by other parties, simply by doing
long-division. We call such a factor a witness to the fact that m has a factor less than l.
This idea of an easily checkable witness is the key idea in the definition of the complexity
class NP, below. We have phrased factoring as a decision problem, but you can easily
verify that the decision problem is equivalent to finding the factors of a number:
Exercise 3.17: Prove that a polynomial-time algorithm for finding the factors of a
number m exists if and only if the factoring decision problem is in P.
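The witness check for the factoring decision problem amounts to a single division. A minimal Python sketch, with the hypothetical name `is_factoring_witness`:

```python
def is_factoring_witness(m, l, w):
    """True if w is a non-trivial factor of m that is less than l."""
    return 1 < w < l and w < m and m % w == 0


# 91 = 7 * 13, so 7 witnesses that 91 has a non-trivial factor less than 10.
print(is_factoring_witness(91, 10, 7))  # True
print(is_factoring_witness(91, 7, 7))   # False: 7 is not less than 7
print(is_factoring_witness(91, 10, 5))  # False: 5 does not divide 91
```

Checking a proposed witness is cheap; nothing in this sketch says how to find one.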
Factoring is an example of a problem in an important complexity class known as NP.
What distinguishes problems in NP is that ‘yes’ instances of a problem can easily be
verified with the aid of an appropriate witness. More rigorously, a language L is in NP
if there is a Turing machine M with the following properties:
(1) If x ∈ L then there exists a witness string w such that M halts in the state q_Y after
a time polynomial in |x| when the machine is started in the state x-blank-w.
(2) If x ∉ L then for all strings w which attempt to play the role of a witness, the
machine halts in state q_N after a time polynomial in |x| when M is started in the
state x-blank-w.
There is an interesting asymmetry in the definition of NP. While we have to be able
to quickly decide whether a possible witness to x ∈ L is truly a witness, there is no such
need to produce a witness to x ∉ L. For instance, in the factoring problem, we have
an easy way of proving that a given number m has a factor less than l, but exhibiting a
witness to prove that m has no non-trivial factor less than l is more daunting. This suggests
defining coNP, the class of languages which have witnesses to ‘no’ instances; obviously
the languages in coNP are just the complements of languages in NP.
How are P and NP related? It is clear that P is a subset of NP. The most famous
open problem in computer science is whether or not there are problems in NP which are
not in P, often abbreviated as the P ≠ NP problem. Most computer scientists believe
that P ≠ NP, but despite decades of work nobody has been able to prove this, and the
possibility remains that P = NP.
Exercise 3.18: Prove that if coNP ≠ NP then P ≠ NP.
Upon first acquaintance it’s tempting to conclude that the conjecture P ≠ NP ought
to be pretty easy to resolve. To see why it’s actually rather subtle it helps to see a couple of
examples of problems that are in P and NP. We’ll draw the examples from graph theory,
a rich source of decision problems with surprisingly many practical applications. A graph
is a finite collection of vertices {v_1, . . . , v_n} connected by edges, which are pairs (v_i, v_j)
of vertices. For now, we are only concerned with undirected graphs, in which the order
of the vertices (in each edge pair) does not matter; similar ideas can be investigated for
directed graphs in which the order of vertices does matter. A typical graph is illustrated
in Figure 3.9.
Figure 3.9. A graph.
A cycle in a graph is a sequence v_1, . . . , v_m of vertices such that each pair (v_j, v_{j+1}) is
an edge, as is (v_1, v_m). A simple cycle is a cycle in which none of the vertices is repeated,
except for the first and last vertices. A Hamiltonian cycle is a simple cycle which visits
every vertex in the graph. Examples of graphs with and without Hamiltonian cycles are
shown in Figure 3.10.
Figure 3.10. The graph on the left contains a Hamiltonian cycle, 0, 1, 2, 3, 4, 5, 0. The graph on the right contains
no Hamiltonian cycle, as can be verified by inspection.
The Hamiltonian cycle problem (HC) is to determine whether a given graph contains
a Hamiltonian cycle or not. HC is a decision problem in NP, since if a given graph has a
Hamiltonian cycle, then that cycle can be used as an easily checkable witness. Moreover,
HC has no known polynomial time algorithm. Indeed, HC is a problem in the class of
so-called NP-complete problems, which can be thought of as the ‘hardest’ problems in
NP, in the sense that solving HC in time t allows any other problem in NP to be solved
in time O(poly(t)). This also means that if any NP-complete problem has a polynomial
time solution then it will follow that P = NP.
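The phrase ‘easily checkable witness’ can be illustrated with a short Python sketch that verifies a proposed Hamiltonian cycle. The function name, the vertex labels 0, . . . , n − 1 and the edge-list representation are assumptions made for this example; the check assumes n ≥ 3.

```python
def is_hamiltonian_cycle(n, edges, cycle):
    """Check a proposed witness for the Hamiltonian cycle problem.

    `edges` is a list of unordered vertex pairs on vertices 0..n-1 and
    `cycle` lists the vertices in visiting order (the return to the
    first vertex is implicit).
    """
    if n < 3 or sorted(cycle) != list(range(n)):
        return False  # must visit every vertex exactly once
    edge_set = {frozenset(e) for e in edges}
    return all(
        frozenset((cycle[i], cycle[(i + 1) % n])) in edge_set
        for i in range(n)
    )


hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(is_hamiltonian_cycle(6, hexagon, [0, 1, 2, 3, 4, 5]))  # True
print(is_hamiltonian_cycle(6, hexagon, [0, 2, 1, 3, 4, 5]))  # False
```

The check takes time polynomial in the size of the graph. It verifies a given cycle but gives no help in finding one.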
There is a problem, the Euler cycle decision problem, which is superficially similar to
the Hamiltonian cycle problem HC, but which has astonishingly different properties. An Euler cycle is an ordering of
the edges of a graph G so that every edge in the graph is visited exactly once. The Euler
cycle decision problem (EC) is to determine, given a graph G on n vertices, whether that
graph contains an Euler cycle or not. EC is, in fact, exactly the same problem as HC, only
the path visits edges, rather than vertices. Consider the following remarkable theorem,
to be proven in Exercise 3.20:
Theorem 3.1: (Euler’s theorem) A connected graph contains an Euler cycle if and
only if every vertex has an even number of edges incident upon it.
Euler’s theorem gives us a method for efficiently solving EC. First, check to see whether
the graph is connected; this is easily done with O(n^2) operations, as shown in Exercise 3.19. If the graph is not connected, then obviously no Euler cycle exists. If the graph
is connected then for each vertex check whether there is an even number of edges incident
upon the vertex. If a vertex is found for which this is not the case, then there is no Euler
cycle, otherwise an Euler cycle exists. Since there are n vertices, and at most n(n − 1)/2
edges, this algorithm requires O(n^3) elementary operations. Thus EC is in P! Somehow,
there is a structure present in the problem of visiting each edge that can be exploited
to provide an efficient algorithm for EC, yet which does not seem to be reflected in the
problem of visiting each vertex; it is not at all obvious why such a structure should be
present in one case, but not in the other, if indeed it is absent for the HC
problem.
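The procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `has_euler_cycle` is hypothetical, vertices are labeled 0, . . . , n − 1 with n ≥ 1, and connectivity is tested with a depth-first search from vertex 0 rather than with the reachability subroutine of Exercise 3.19.

```python
def has_euler_cycle(n, edges):
    """Decide the Euler cycle problem using Euler's theorem.

    The graph has vertices 0..n-1 and undirected edges given as pairs.
    Returns True iff the graph is connected and every vertex has even degree.
    """
    adjacency = {v: [] for v in range(n)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Step 1: connectivity check by depth-first search from vertex 0.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if len(seen) != n:
        return False

    # Step 2: every vertex must have an even number of incident edges.
    return all(len(adjacency[v]) % 2 == 0 for v in range(n))


triangle = [(0, 1), (1, 2), (2, 0)]
path = [(0, 1), (1, 2)]
print(has_euler_cycle(3, triangle))  # True
print(has_euler_cycle(3, path))      # False: the end vertices have odd degree
```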
Exercise 3.19: The REACHABILITY problem is to determine whether there is a path
between two specified vertices in a graph. Show that REACHABILITY can be
solved using O(n) operations if the graph has n vertices. Use the solution to
REACHABILITY to show that it is possible to decide whether a graph is connected
in O(n^2) operations.
Exercise 3.20: (Euler’s theorem) Prove Euler’s theorem. In particular, if each
vertex has an even number of incident edges, give a constructive procedure for
finding an Euler cycle.
The equivalence between the factoring decision problem and the factoring problem
proper is a special instance of one of the most important ideas in computer science, an
idea known as reduction. Intuitively, we know that some problems can be viewed as
special instances of other problems. A less trivial example of reduction is the reduction
of the Hamiltonian cycle problem HC to the traveling salesman decision problem (TSP). The traveling salesman decision
problem is as follows: we are given n cities 1, 2, . . . , n and a non-negative integer distance
d_ij between each pair of cities. Given a distance d the problem is to determine if there
is a tour of all the cities of distance less than d.
The reduction of HC to TSP goes as follows. Suppose we have a graph containing n
vertices. We turn this into an instance of TSP by thinking of each vertex of the graph as a
‘city’ and defining the distance d_ij between cities i and j to be one if vertices i and j are
connected, and the distance to be two if the vertices are unconnected. Then a tour of the
cities of distance less than n + 1 must be of distance n, and be a Hamiltonian cycle for the
graph. Conversely, if a Hamiltonian cycle exists then a tour of the cities of distance less
than n + 1 must exist. In this way, given an algorithm for solving TSP, we can convert
it into an algorithm for solving HC without much overhead. Two consequences can be
inferred from this. First, if TSP is a tractable problem, then HC is also tractable. Second, if
HC is hard then TSP must also be hard. This is an example of a general technique known
as reduction: we’ve reduced the problem HC to the problem TSP. This is a technique we
will use repeatedly throughout this book.
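The reduction from the Hamiltonian cycle problem to the traveling salesman decision problem can be sketched in Python. The names `hc_to_tsp` and `tsp_decision_brute_force` are assumptions for this illustration, as is the brute-force solver, which takes exponential time and merely stands in for whatever traveling salesman algorithm one might have. The sketch assumes n ≥ 3 and vertices labeled 0, . . . , n − 1.

```python
from itertools import permutations


def hc_to_tsp(n, edges):
    """Map a graph to a traveling salesman instance (dist, d).

    Distance 1 between connected vertices, 2 otherwise; the bound is
    d = n + 1, so a tour of distance less than d uses only graph edges.
    """
    edge_set = {frozenset(e) for e in edges}
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dist[i][j] = 1 if frozenset((i, j)) in edge_set else 2
    return dist, n + 1


def tsp_decision_brute_force(dist, d):
    """Is there a tour of all cities of total distance less than d?"""
    n = len(dist)
    for rest in permutations(range(1, n)):  # fix city 0 as the start
        tour = (0,) + rest
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < d:
            return True
    return False


square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # has a Hamiltonian cycle
star = [(0, 1), (0, 2), (0, 3)]            # has none
print(tsp_decision_brute_force(*hc_to_tsp(4, square)))  # True
print(tsp_decision_brute_force(*hc_to_tsp(4, star)))    # False
```

Only `hc_to_tsp` is the reduction itself, and it runs in time polynomial in n; the cost of answering the question sits entirely in the solver.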
A more general notion of reduction is illustrated in Figure 3.11. A language B is
said to be reducible to another language A if there exists a Turing machine operating
in polynomial time such that given as input x it outputs R(x), and x ∈ B if and only
if R(x) ∈ A. Thus, if we have an algorithm for deciding A, then with a little extra
overhead we can decide the language B. In this sense, the language B is essentially no
more difficult to decide than the language A.
Is x ∈ B? → Compute R(x) in polynomial time → Is R(x) ∈ A? → ‘Yes’ or ‘No’
Figure 3.11. Reduction of B to A.
Exercise 3.21: (Transitive property of reduction) Show that if a language L_1 is
reducible to the language L_2 and the language L_2 is reducible to L_3 then the
language L_1 is reducible to the language L_3.
Some complexity classes have problems which are complete with respect to that complexity class, meaning there is a language L in the complexity class which is the ‘most
difficult’ to decide, in the sense that every other language in the complexity class can
be reduced to L. Not all complexity classes have complete problems, but many of the
complexity classes we are concerned with do have complete problems. A trivial example
is provided by P. Let L be any language in P which is not empty or equal to the set
of all words. That is, there exists a string x_1 such that x_1 ∉ L and a string x_2 such
that x_2 ∈ L. Then any other language L′ in P can be reduced to L using the following
reduction: given an input x, use the polynomial time decision procedure to determine
whether x ∈ L′ or not. If it is not, then set R(x) = x_1, otherwise set R(x) = x_2.
Exercise 3.22: Suppose L is complete for a complexity class, and L′ is another
language in the complexity class such that L reduces to L′ . Show that L′ is
complete for the complexity class.
Less trivially, NP also contains complete problems. An important example of such a
problem and the prototype for all other NP-complete problems is the circuit satisfiability
problem or CSAT: given a Boolean circuit composed of AND, OR and NOT gates, is there
an assignment of values to the inputs to the circuit that results in the circuit outputting 1,
that is, is the circuit satisfiable for some input? The NP-completeness of CSAT is known
as the Cook–Levin theorem, for which we now outline a proof.
Theorem 3.2: (Cook–Levin) CSAT is NP-complete.
Proof
The proof has two parts. The first part of the proof is to show that CSAT is in NP, and
the second part is to show that any language in NP can be reduced to CSAT. Both parts
of the proof are based on simulation techniques: the first part of the proof is essentially
showing that a Turing machine can efficiently simulate a circuit, while the second part of
the proof is essentially showing that a circuit can efficiently simulate a Turing machine.
Both parts of the proof are quite straightforward; for the purposes of illustration we give
the second part in some detail.
The first part of the proof is to show that CSAT is in NP. Given a circuit containing
n circuit elements, and a potential witness w, it is obviously easy to check in polynomial
time on a Turing machine whether or not w satisfies the circuit, which establishes that
CSAT is in NP.
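Checking a witness for circuit satisfiability is just circuit evaluation, one gate at a time. The Python sketch below assumes a simple, hypothetical circuit encoding chosen for this example: wires 0, . . . , num_inputs − 1 carry the inputs, each gate in the list adds one new wire, and the last wire is the output.

```python
def evaluate_circuit(num_inputs, gates, w):
    """Evaluate a Boolean circuit on the input bits w (a list of 0/1).

    Each gate is ("AND", i, j), ("OR", i, j) or ("NOT", i), where i and j
    index earlier wires. One pass over the gates suffices, so the work is
    linear in the number of circuit elements.
    """
    if len(w) != num_inputs:
        raise ValueError("witness has the wrong length")
    wires = list(w)
    for op, *args in gates:
        if op == "AND":
            wires.append(wires[args[0]] & wires[args[1]])
        elif op == "OR":
            wires.append(wires[args[0]] | wires[args[1]])
        elif op == "NOT":
            wires.append(1 - wires[args[0]])
        else:
            raise ValueError(f"unknown gate {op!r}")
    return wires[-1]


# The circuit x0 AND (NOT x1): wire 2 = NOT x1, wire 3 = x0 AND wire 2.
gates = [("NOT", 1), ("AND", 0, 2)]
print(evaluate_circuit(2, gates, [1, 0]))  # 1, so [1, 0] is a witness
print(evaluate_circuit(2, gates, [1, 1]))  # 0
```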
The second part of the proof is to show that any language L ∈ NP can be reduced to
CSAT. That is, we aim to show that there is a polynomial time computable reduction R
such that x ∈ L if and only if R(x) is a satisfiable circuit. The idea of the reduction is
to find a circuit which simulates the action of the machine M which is used to check
instance-witness pairs, (x, w), for the language L. The input variables for the circuit
will represent the witness; the idea is that finding a witness which satisfies the circuit is
equivalent to M accepting (x, w) for some specific witness w. Without loss of generality
we may make the following assumptions about M to simplify the construction:
(1) M ’s tape alphabet is ⊲,0,1 and the blank symbol.
(2) M runs using time at most t(n) and total space at most s(n) where t(n) and s(n)
are polynomials in n.
(3) Machine M can actually be assumed to run using time exactly t(n) for all inputs of
size n. This is done by adding the lines ⟨q_Y, x, q_Y, x, 0⟩ and ⟨q_N, x, q_N, x, 0⟩ for
each of x = ⊲, 0, 1 and the blank, artificially halting the machine after exactly t(n)
steps.
The basic idea of the construction to simulate M is outlined in Figure 3.12. Each
internal state of the Turing machine is represented by a single bit in the circuit. We
name the corresponding bits q̃_s, q̃_1, . . . , q̃_m, q̃_Y, q̃_N. Initially, q̃_s is set to one, and all the
other bits representing internal states are set to zero. Each square on the Turing machine
tape is represented by three bits: two bits to represent the letter of the alphabet (⊲, 0, 1
or blank) currently residing on the tape, and a single ‘flag’ bit which is set to one if the
read-write head is pointing to the square, and set to zero otherwise. We denote the bits
representing the tape contents by (u_1, v_1), . . . , (u_{s(n)}, v_{s(n)}) and the corresponding flag
bits by f_1, . . . , f_{s(n)}. Initially the u_j and v_j bits are set to represent the inputs x and w,
as appropriate, while f_1 = 1 and all other f_j = 0. There is also a lone extra ‘global flag’
bit, F , whose function will be explained later. F is initially set to zero. We regard all the
bits input to the circuit as fixed, except for those representing the witness w, which are
the variable bits for the circuit.
Figure 3.12. Outline of the procedure used to simulate a Turing machine using a circuit.
The action of M is obtained by repeating t(n) times a ‘simulation step’ which
simulates the execution of a single program line for the Turing machine. Each
simulation step may be broken up into a sequence of steps corresponding in turn to the
respective program lines, with a final step which resets the global flag F to zero, as
illustrated in Figure 3.13. To complete the simulation, we only need to simulate a
program line of the form ⟨q_i, x, q_j, x′, s⟩. For convenience, we assume q_i ≠ q_j, but a
similar construction works in the case when q_i = q_j. The procedure is as follows:
(1) Check to see that q̃_i = 1, indicating that the current state of the machine is q_i.
(2) For each tape square:
    (a) Check to see that the global flag bit is set to zero, indicating that no action has
        yet been taken by the Turing machine.
    (b) Check that the flag bit is set to one, indicating that the tape head is at this tape
        square.
    (c) Check that the simulated tape contents at this point are x.
    (d) If all conditions check out, then perform the following steps:
        1. Set q̃_i = 0 and q̃_j = 1.
        2. Update the simulated tape contents at this tape square to x′.
        3. Update the flag bit of this and adjacent ‘squares’ as appropriate, depending
           on whether s = +1, 0, −1, and whether we are at the left hand end of the
           tape.
        4. Set the global flag bit to one, indicating that this round of computation has
           been completed.
This is a fixed procedure which involves a constant number of bits, and by the universality
result of Section 3.1.2 can be performed using a circuit containing a constant number of
gates.
Figure 3.13. Outline of the simulation step used to simulate a Turing machine using a circuit.
The total number of gates in the entire circuit is easily seen to be O(t(n)(s(n) + n)),
which is polynomial in n. At the end of the circuit, it is clear that q̃_Y = 1 if and only
if the machine M accepts (x, w). Thus, the circuit is satisfiable if and only if there exists
w such that machine M accepts (x, w), and we have found the desired reduction from
L to CSAT.
gives us a foot in the door which enables us to easily prove that many other
problems are NP-complete. Instead of directly proving that a problem is NP-complete,
reduces to it, so by Exercise 3.22 the
we can instead prove that it is in NP and that
problem must be NP-complete. A small sample of NP-complete problems is discussed
in Box 3.3. An example of another NP-complete problem is the satisfiability problem
( ), which is phrased in terms of a Boolean formula. Recall that a Boolean formula
ϕ is composed of the following elements: a set of Boolean variables, x1 , x2 , . . .; Boolean
connectives, that is, a Boolean function with one or two inputs and one output, such as
∧ (AND), ∨ (OR), and ¬ (NOT); and parentheses. The truth or falsity of a Boolean
formula for a given set of Boolean variables is decided according to the usual laws of
Boolean algebra. For example, the formula ϕ = x1 ∨ ¬x2 has the satisfying assignment
x1 = 0 and x2 = 0, while x1 = 0 and x2 = 1 is not a satisfying assignment. The
satisfiability problem is to determine, given a Boolean formula ϕ, whether or not it is
satisfiable by any set of possible inputs.
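As a small illustration of this definition, the following Python sketch decides satisfiability by brute force, trying every assignment of the variables. Representing a formula as a Python function of a tuple of bits is an assumption made here for convenience, and the names `satisfying_assignments` and `phi` are hypothetical.

```python
from itertools import product

def satisfying_assignments(formula, num_vars):
    """Return every assignment (tuple of 0/1) that makes `formula` true.

    Brute force: tries all 2**num_vars assignments, so the running time is
    exponential in the number of variables. Only meant for tiny examples.
    """
    return [bits for bits in product((0, 1), repeat=num_vars) if formula(bits)]

def phi(bits):
    # The example formula from the text: x1 OR (NOT x2).
    x1, x2 = bits
    return bool(x1 or not x2)

solutions = satisfying_assignments(phi, 2)
is_satisfiable = len(solutions) > 0
```

For the example formula, the assignment x1 = 0, x2 = 0 appears among the solutions and x1 = 0, x2 = 1 does not, in agreement with the text.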
Exercise 3.23: Show that SAT is NP-complete by first showing that SAT is in NP, and
then showing that CSAT reduces to SAT. (Hint: for the reduction it may help to
represent each distinct wire in an instance of CSAT by different variables in a
Boolean formula.)
An important restricted case of SAT is also NP-complete, the 3-satisfiability problem
(3SAT), which is concerned with formulae in 3-conjunctive normal form. A formula is
said to be in conjunctive normal form if it is the AND of a collection of clauses, each of
which is the OR of one or more literals, where a literal is an expression of the form x
or ¬x. For example, the formula (x1 ∨ ¬x2 ) ∧ (x2 ∨ x3 ∨ ¬x4 ) is in conjunctive normal
form. A formula is in 3-conjunctive normal form or 3-CNF if each clause has exactly
three literals. For example, the formula (¬x1 ∨x2 ∨¬x2 )∧(¬x1 ∨x3 ∨¬x4 )∧(x2 ∨x3 ∨x4 )
is in 3-conjunctive normal form. The 3-satisfiability problem is to determine whether a
formula in 3-conjunctive normal form is satisfiable or not.
The proof that 3SAT is NP-complete is straightforward, but is a little too lengthy to
justify inclusion in this overview. Even more than CSAT and SAT, 3SAT is in some sense
the NP-complete problem, and it is the basis for countless proofs that other problems
are NP-complete. We conclude our discussion of NP-completeness with the surprising
fact that 2SAT, the analogue of 3SAT in which every clause has two literals, can be solved
in polynomial time:
Exercise 3.24: (2SAT has an efficient solution) Suppose ϕ is a Boolean formula in
conjunctive normal form, in which each clause contains only two literals.
(1) Construct a (directed) graph G(ϕ) with directed edges in the following way:
the vertices of G correspond to variables xj and their negations ¬xj in ϕ.
There is a (directed) edge (α, β) in G if and only if the clause (¬α ∨ β) or
the clause (β ∨ ¬α) is present in ϕ. Show that ϕ is not satisfiable if and only
if there exists a variable x such that there are paths from x to ¬x and from
¬x to x in G(ϕ).
(2) Show that given a directed graph G containing n vertices it is possible to
determine whether two vertices v1 and v2 are connected in polynomial time.
(3) Find an efficient algorithm to solve 2SAT.
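The graph construction in this exercise can be sketched in Python as follows. This is one possible approach under stated assumptions, not a worked solution from the text: literals are encoded as nonzero integers (j for xj and −j for ¬xj), each clause (a ∨ b) is read as the two implications ¬a → b and ¬b → a, and the helper names are hypothetical. The correctness of the criterion is what part (1) asks the reader to prove.

```python
def reachable(graph, start):
    """Set of vertices reachable from `start` by depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def two_sat_satisfiable(clauses):
    """Decide satisfiability of a formula whose clauses each have two literals.

    A literal is a nonzero integer: j stands for x_j and -j for NOT x_j.
    Each clause (a OR b) contributes the directed edges NOT a -> b and
    NOT b -> a.
    """
    graph = {}
    variables = set()
    for a, b in clauses:
        graph.setdefault(-a, set()).add(b)
        graph.setdefault(-b, set()).add(a)
        variables.update((abs(a), abs(b)))
    # Criterion from part (1): unsatisfiable exactly when, for some variable x,
    # there are paths from x to NOT x and from NOT x to x.
    return not any(
        -x in reachable(graph, x) and x in reachable(graph, -x)
        for x in variables
    )
```

Each reachability search visits every edge at most once, so the whole procedure runs in time polynomial in the size of the formula.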
Box 3.3: A zoo of NP-complete problems
The importance of the class NP derives, in part, from the enormous number of
computational problems that are known to be NP-complete. We can’t possibly hope
to survey this topic here (see ‘History and further reading’), but the following examples, taken from many distinct areas of mathematics, give an idea of the delicious
melange of problems known to be NP-complete.
• CLIQUE (graph theory): A clique in an undirected graph G is a subset of
vertices, each pair of which is connected by an edge. The size of a clique is the
number of vertices it contains. Given an integer m and a graph G, does G have
a clique of size m?
• SUBSET-SUM (arithmetic): Given a finite collection S of positive integers and a
target t, is there any subset of S which sums to t?
• 0-1 INTEGER PROGRAMMING (linear programming): Given an integer m × n
matrix A and an m-dimensional vector y with integer values, does there exist
an n-dimensional vector x with entries in the set {0, 1} such that Ax ≤ y?
• VERTEX COVER (graph theory): A vertex cover for an undirected graph G is a
set of vertices V ′ such that every edge in the graph has one or both vertices
contained in V ′ . Given an integer m and a graph G, does G have a vertex
cover V ′ containing m vertices?
Assuming that P ≠ NP it is possible to prove that there is a non-empty class of
problems NPI (NP intermediate) which are neither solvable with polynomial resources,
nor are NP-complete. Obviously, there are no problems known to be in NPI (otherwise
we would know that P ≠ NP) but there are several problems which are regarded as
being likely candidates. Two of the strongest candidates are the factoring and graph
isomorphism problems:
GRAPH ISOMORPHISM: Suppose G and G′ are two undirected graphs over the
vertices V ≡ {v1 , . . . , vn }. Are G and G′ isomorphic? That is, does there exist a
one-to-one function ϕ : V → V such that the edge (vi , vj ) is contained in G if
and only if (ϕ(vi ), ϕ(vj )) is contained in G′ ?
Problems in NPI are interesting to researchers in quantum computation and quantum
information for two reasons. First, it is desirable to find fast quantum algorithms to solve
problems which are not in P. Second, many suspect that quantum computers will not
be able to efficiently solve all problems in NP, ruling out NP-complete problems. Thus,
it is natural to focus on the class NPI. Indeed, a fast quantum algorithm for factoring
has been discovered (Chapter 5), and this has motivated the search for fast quantum
algorithms for other problems suspected to be in NPI.
3.2.4 A plethora of complexity classes
We have investigated some of the elementary properties of some important complexity
classes. A veritable pantheon of complexity classes exists, and there are many non-trivial
relationships known or suspected between these classes. For quantum computation and
quantum information, it is not necessary to understand all the different complexity classes
that have been defined. However, it is useful to have some appreciation for the more
important of the complexity classes, many of which have natural analogues in the study
of quantum computation and quantum information. Furthermore, if we are to understand
how powerful quantum computers are, then it behooves us to understand how the class
of problems solvable on a quantum computer fits into the zoo of complexity classes which
may be defined for classical computers.
There are essentially three properties that may be varied in the definition of a complexity class: the resource of interest (time, space, . . . ), the type of problem being considered
(decision problem, optimization problem, . . . ), and the underlying computational model
(deterministic Turing machine, probabilistic Turing machine, quantum computer, . . . ).
Not surprisingly, this gives us an enormous range to define complexity classes. In this
section, we briefly review a few of the more important complexity classes and some of
their elementary properties. We begin with a complexity class defined by changing the
resource of interest from time to space.
The most natural space-bounded complexity class is the class PSPACE of decision
problems which may be solved on a Turing machine using a polynomial number of
working bits, with no limitation on the amount of time that may be used by the machine
(see Exercise 3.25). Obviously, P is included in PSPACE, since a Turing machine that
halts after polynomial time can only traverse polynomially many squares, but it is also true
that NP is a subset of PSPACE. To see this, suppose L is any language in NP. Suppose
problems of size n have witnesses of size at most p(n), where p(n) is some polynomial
in n. To determine whether or not the problem has a solution, we may sequentially
test all 2^{p(n)} possible witnesses. Each test can be run in polynomial time, and therefore
polynomial space. If we erase all the intermediate working between tests then we can test
all the possibilities using polynomial space.
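The argument above can be sketched in Python as follows. The function names, the choice of SUBSET-SUM as the example problem, and the simplification that every witness has exactly the stated length are assumptions made for illustration.

```python
from itertools import product

def decide_by_exhaustive_search(verify, x, witness_length):
    """Decide whether some witness w of the given length makes verify(x, w) true.

    Witnesses are generated one at a time and discarded after each test, so
    only one witness is held in memory at any moment, although the running
    time grows like 2**witness_length.
    """
    for w in product((0, 1), repeat=witness_length):
        if verify(x, w):
            return True
    return False

def verify_subset_sum(instance, w):
    # Hypothetical polynomial-time verifier: bit i of w says whether the
    # i-th number belongs to the chosen subset.
    numbers, target = instance
    return sum(n for n, bit in zip(numbers, w) if bit) == target
```

The generator returned by `product` plays the role of erasing the intermediate working between tests: the space used is that of a single witness plus a single run of the verifier.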
Unfortunately, at present it is not even known whether PSPACE contains problems
which are not in P! This is a pretty remarkable situation – it seems fairly obvious that
having unlimited time and polynomial spatial resources must be more powerful than
having only a polynomial amount of time. However, despite considerable effort and ingenuity,
this has never been shown. We will see later that the class of problems solvable
on a quantum computer in polynomial time is a subset of PSPACE, so proving that a
problem efficiently solvable on a quantum computer is not efficiently solvable on a classical
computer would establish that P ≠ PSPACE, and thus solve a major outstanding
problem of computer science. An optimistic way of looking at this result is that ideas
from quantum computation might be useful in proving that P ≠ PSPACE. Pessimistically,
one might conclude that it will be a long time before anyone rigorously proves that
quantum computers can be used to efficiently solve problems that are intractable on a
classical computer. Even more pessimistically, it is possible that P = PSPACE, in which
case quantum computers offer no advantage over classical computers! However, very few
(if any) computational complexity theorists believe that P = PSPACE.
Exercise 3.25: (PSPACE ⊆ EXP) The complexity class EXP (for exponential time)
contains all decision problems which may be decided by a Turing machine
running in exponential time, that is time O(2^{n^k}), where k is any constant. Prove
that PSPACE ⊆ EXP. (Hint: If a Turing machine has l internal states, an m
letter alphabet, and uses space p(n), argue that the machine can exist in one of at
most l m^{p(n)} different states, and that if the Turing machine is to avoid infinite
loops then it must halt before revisiting a state.)
Exercise 3.26: (L ⊆ P) The complexity class L (for logarithmic space) contains all
decision problems which may be decided by a Turing machine running in
logarithmic space, that is, in space O(log(n)). More precisely, the class L is
defined using a two-tape Turing machine. The first tape contains the problem
instance, of size n, and is a read-only tape, in the sense that only program lines
which don’t change the contents of the first tape are allowed. The second tape is
a working tape which initially contains only blanks. The logarithmic space
requirement is imposed on the second, working tape only. Show that L ⊆ P.
Does allowing more time or space give greater computational power? The answer
to this question is yes in both cases. Roughly speaking, the time hierarchy theorem
states that TIME(f (n)) is a proper subset of TIME(f (n) log^2 (f (n))). Similarly, the space
hierarchy theorem states that SPACE(f (n)) is a proper subset of SPACE(f (n) log(f (n))),
where SPACE(f (n)) is, of course, the complexity class consisting of all languages that
can be decided with spatial resources O(f (n)). The hierarchy theorems have interesting
implications with respect to the equality of complexity classes. We know that
L ⊆ P ⊆ NP ⊆ PSPACE ⊆ EXP.    (3.1)
Unfortunately, although each of these inclusions is widely believed to be strict, none of
them has ever been proved to be strict. However, the time hierarchy theorem implies
that P is a strict subset of EXP, and the space hierarchy theorem implies that L is a strict
subset of PSPACE! So we can conclude that at least one of the inclusions in (3.1) must
be strict, although we do not know which one.
What should we do with a problem once we know that it is NP-complete, or that
some other hardness criterion holds? It turns out that this is far from being the end of
the story in problem analysis. One possible line of attack is to identify special cases of
the problem which may be amenable to attack. For example, in Exercise 3.24 we saw
that the 2SAT problem has an efficient solution, despite the NP-completeness of SAT.
Another approach is to change the type of problem which is being considered, a tactic
which typically results in the definition of new complexity classes. For example, instead
of finding exact solutions to an NP-complete problem, we can instead try to find good
algorithms for finding approximate solutions to a problem. For instance, the VERTEX COVER
problem is an NP-complete problem, yet in Exercise 3.27 we show that it is
possible to efficiently find an approximation to the minimal vertex cover which is correct
to within a factor two! On the other hand, in Problem 3.6 we show that it is not possible
to find approximations to solutions of TSP correct to within any factor, unless P = NP!
Exercise 3.27: (Approximation algorithm for VERTEX COVER) Let G = (V, E)
be an undirected graph. Prove that the following algorithm finds a vertex cover
for G that is within a factor two of being a minimal vertex cover:
    VC = ∅
    E′ = E
    do until E′ = ∅
        let (α, β) be any edge of E′
        VC = VC ∪ {α, β}
        remove from E′ every edge incident on α or β
    return VC.
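For readers who wish to experiment, the pseudocode above translates almost line for line into Python. This is a minimal sketch, assuming the graph is given as a list of vertex pairs with no self-loops; the function name `approx_vertex_cover` is hypothetical. It does not replace the proof requested in the exercise.

```python
def approx_vertex_cover(edges):
    """Greedy procedure from Exercise 3.27.

    `edges` is an iterable of pairs (u, v) with u != v. Returns a set of
    vertices that touches every edge.
    """
    remaining = {frozenset(e) for e in edges}   # E' = E
    cover = set()                               # VC = empty set
    while remaining:
        # "let (alpha, beta) be any edge of E'": take an arbitrary remaining edge.
        alpha, beta = tuple(next(iter(remaining)))
        cover |= {alpha, beta}
        # Remove from E' every edge incident on alpha or beta.
        remaining = {e for e in remaining if alpha not in e and beta not in e}
    return cover
```

On the path graph with edges (1, 2), (2, 3), (3, 4), whose smallest vertex cover is {2, 3}, the procedure returns either two or four vertices depending on which edge is picked first.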
Why is it possible to approximate the solution of one NP-complete problem, but
not another? After all, isn’t it possible to efficiently transform from one problem to
another? This is certainly true; however, it is not necessarily true that this transformation
preserves the notion of a ‘good approximation’ to a solution. As a result, the computational
complexity theory of approximation algorithms for problems in NP has a structure that
goes beyond the structure of NP proper. An entire complexity theory of approximation
algorithms exists, which unfortunately is beyond the scope of this book. The basic idea,
however, is to define a notion of reduction that corresponds to being able to efficiently
reduce one approximation problem to another, in such a way that the notion of good
approximation is preserved. With such a notion, it is possible to define complexity classes
such as MAXSNP by analogy to the class NP, as the set of problems for which it is
possible to efficiently verify approximate solutions to the problem. Complete problems
exist for MAXSNP, just as for NP, and it is an interesting open problem to determine
how the class MAXSNP compares to the class of approximation problems which are
efficiently solvable.
We conclude our discussion with a complexity class that results when the underlying
model of computation itself is changed. Suppose a Turing machine is endowed with
the ability to flip coins, using the results of the coin tosses to decide what actions to
take during the computation. Such a Turing machine may only accept or reject inputs
with a certain probability. The complexity class BPP (for bounded-error probabilistic
time) contains all languages L with the property that there exists a probabilistic Turing
machine M such that if x ∈ L then M accepts x with probability at least 3/4, and if
x ∉ L, then M rejects x with probability at least 3/4. The following exercise shows that
the choice of the constant 3/4 is essentially arbitrary:
Exercise 3.28: (Arbitrariness of the constant in the definition of BPP) Suppose
k is a fixed constant, 1/2 < k ≤ 1. Suppose L is a language such that there
exists a Turing machine M with the property that whenever x ∈ L, M accepts
x with probability at least k, and whenever x ∉ L, M rejects x with probability
at least k. Show that L ∈ BPP.
Indeed, the Chernoff bound, discussed in Box 3.4, implies that with just a few repetitions
of an algorithm deciding a language in BPP the probability of success can be amplified
to the point where it is essentially equal to one, for all intents and purposes. For this
reason, BPP even more than P is the class of decision problems which is usually regarded
as being efficiently solvable on a classical computer, and it is the quantum analogue of
BPP, known as BQP, that is most interesting in our study of quantum algorithms.
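To get a feel for how quickly repetition helps, the following Python sketch compares the exact error probability of a majority vote with the bound exp(−2ε²n) of Theorem 3.3 in Box 3.4. It assumes the runs are independent and each is correct with probability 1/2 + ε; the function names are hypothetical.

```python
from math import comb, exp

def majority_error(n, eps):
    """Exact probability that at most n/2 of n independent runs are correct,
    when each run is correct with probability 1/2 + eps."""
    p = 0.5 + eps
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1))

def chernoff_bound(n, eps):
    """The upper bound exp(-2 * eps**2 * n) from Theorem 3.3."""
    return exp(-2 * eps**2 * n)
```

With ε = 1/4 the bound equals exp(−n/8), which first drops below 10^{−20} at n = 369, consistent with the 'few hundred repetitions' quoted in Box 3.4.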
3.2.5 Energy and computation
Computational complexity studies the amount of time and space required to solve a
computational problem. Another important computational resource is energy. In this
section, we study the energy requirements for computation. Surprisingly, it turns out that
computation, both classical and quantum, can in principle be done without expending
any energy! Energy consumption in computation turns out to be deeply linked to the
reversibility of the computation. Consider a gate like the NAND gate, which takes as
input two bits, and produces a single bit as output. This gate is intrinsically irreversible
because, given the output of the gate, the input is not uniquely determined. For example,
if the output of the NAND gate is 1, then the input could have been any one of 00, 01,
or 10. On the other hand, the NOT gate is an example of a reversible logic gate because,
given the output of the NOT gate, it is possible to infer what the input must have been.
Another way of understanding irreversibility is to think of it in terms of information
erasure. If a logic gate is irreversible, then some of the information input to the gate is lost
irretrievably when the gate operates – that is, some of the information has been erased by
the gate. Conversely, in a reversible computation, no information is ever erased, because
the input can always be recovered from the output. Thus, saying that a computation is
reversible is equivalent to saying that no information is erased during the computation.
What is the connection between energy consumption and irreversibility in computation? Landauer’s principle provides the connection, stating that, in order to erase
information, it is necessary to dissipate energy. More precisely, Landauer’s principle
may be stated as follows:
Landauer’s principle (first form): Suppose a computer erases a single bit of
information. The amount of energy dissipated into the environment is at least
kB T ln 2, where kB is a universal constant known as Boltzmann’s constant, and T
is the temperature of the environment of the computer.
According to the laws of thermodynamics, Landauer’s principle can be given an alternative form stated not in terms of energy dissipation, but rather in terms of entropy:
Landauer’s principle (second form): Suppose a computer erases a single bit of
information. The entropy of the environment increases by at least kB ln 2, where
kB is Boltzmann’s constant.
Box 3.4: BPP and the Chernoff bound
Suppose we have an algorithm for a decision problem which gives the correct answer
with probability 1/2 + ε, and the wrong answer with probability 1/2 − ε. If we run
the algorithm n times, then it seems reasonable to guess that the correct answer is
whichever appeared most frequently. How reliably does this procedure work? The
Chernoff bound is a simple result from elementary probability which answers this
question.
Theorem 3.3: (The Chernoff bound) Suppose X1 , . . . , Xn are independent and
identically distributed random variables, each taking the value 1 with
probability 1/2 + ε, and the value 0 with probability 1/2 − ε. Then
    p\left( \sum_{i=1}^{n} X_i \le n/2 \right) \le e^{-2\epsilon^2 n}.    (3.2)
Proof
Consider any sequence (x1 , . . . , xn ) containing at most n/2 ones. The probability
of such a sequence occurring is maximized when it contains ⌊n/2⌋ ones, so
    p(X_1 = x_1, \ldots, X_n = x_n) \le \left( \frac{1}{2} - \epsilon \right)^{n/2} \left( \frac{1}{2} + \epsilon \right)^{n/2}    (3.3)
        = \frac{(1 - 4\epsilon^2)^{n/2}}{2^n}.    (3.4)
There can be at most 2^n such sequences, so
    p\left( \sum_i X_i \le n/2 \right) \le 2^n \times \frac{(1 - 4\epsilon^2)^{n/2}}{2^n} = (1 - 4\epsilon^2)^{n/2}.    (3.5)
Finally, by calculus, 1 − x ≤ exp(−x), so
    p\left( \sum_i X_i \le n/2 \right) \le e^{-4\epsilon^2 n/2} = e^{-2\epsilon^2 n}.    (3.6)
What this tells us is that for fixed ε, the probability of making an error decreases
exponentially quickly in the number of repetitions of the algorithm. In the case of
BPP we have ε = 1/4, so it takes only a few hundred repetitions of the algorithm
to reduce the probability of error below 10^{−20} , at which point an error in one of
the computer’s components becomes much more likely than an error due to the
probabilistic nature of the algorithm.
Justifying Landauer’s principle is a problem of physics that lies beyond the scope of this
book; see the end of chapter ‘History and further reading’ if you wish to understand why
Landauer’s principle holds. However, if we accept Landauer’s principle as given, then it
raises a number of interesting questions. First of all, Landauer’s principle only provides
a lower bound on the amount of energy that must be dissipated to erase information.
How close are existing computers to this lower bound? Not very, turns out to be the
answer – computers circa the year 2000 dissipate roughly 500kB T ln 2 in energy for each
elementary logical operation.
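To put numbers on these statements, the short Python sketch below evaluates the Landauer bound kB T ln 2. The temperature of 300 K is an assumed room-temperature value chosen for illustration, and the variable names are hypothetical; the factor of 500 is the rough figure quoted above.

```python
from math import log

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in joules per kelvin

def landauer_limit(temperature_kelvin):
    """Minimum energy in joules dissipated when one bit is erased: k_B * T * ln 2."""
    return BOLTZMANN_K * temperature_kelvin * log(2)

room_temperature = 300.0                 # kelvin; assumed for illustration
per_bit = landauer_limit(room_temperature)
per_operation_2000 = 500 * per_bit       # the text's rough estimate for circa 2000
```

At 300 K the bound is a little under 3 × 10^{−21} joules per erased bit, so 500 times that is still only of order 10^{−18} joules per elementary operation.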
Although existing computers are far from the limit set by Landauer’s principle, it is
still an interesting problem of principle to understand how much the energy consumption
can be reduced. Aside from the intrinsic interest of the problem, a practical reason for the
interest follows from Moore’s law: if computer power keeps increasing then the amount
of energy dissipated must also increase, unless the energy dissipated per operation drops
at least as fast as the rate of increase in computing power.
If all computations could be done reversibly, then Landauer’s principle would imply no
lower bound on the amount of energy dissipated by the computer, since no bits at all are
erased during a reversible computation. Of course, it is possible that some other physical
principle might require that energy be dissipated during the computation; fortunately,
this turns out not to be the case. But is it possible to perform universal computation
without erasing any information? Physicists can cheat on this problem to see in advance
that the answer to this question must be yes, because our present understanding of the
laws of physics is that they are fundamentally reversible. That is, if we know the final
state of a closed physical system, then the laws of physics allow us to work out the initial
state of the system. If we believe that those laws are correct, then we must conclude that
hidden in the irreversible logic gates like AND and OR, there must be some underlying
reversible computation. But where is this hidden reversibility, and can we use it to
construct manifestly reversible computers?
We will use two different techniques to give reversible circuit-based models capable
of universal computation. The first model, a computer built entirely of billiard balls and
mirrors, gives a beautiful concrete realization of the principles of reversible computation.
The second model, based on a reversible logic gate known as the Toffoli gate (which we
first encountered in Section 1.4.1), is a more abstract view of reversible computation that
will later be of great use in our discussion of quantum computation. It is also possible to
build reversible Turing machines that are universal for computation; however, we won’t
study these here, since the reversible circuit models turn out to be much more useful for
quantum computation.
The basic idea of the billiard ball computer is illustrated in Figure 3.14. Billiard ball
‘inputs’ enter the computer from the left hand side, bouncing off mirrors and each other,
before exiting as ‘outputs’ on the right hand side. The presence or absence of a billiard
ball at a possible input site is used to indicate a logical 1 or a logical 0, respectively. The
fascinating thing about this model is that it is manifestly reversible, insofar as its operation
is based on the laws of classical mechanics. Furthermore, this model of computation turns
out to be universal in the sense that it can be used to simulate an arbitrary computation
in the standard circuit model of computation.
Of course, if a billiard ball computer were ever built it would be highly unstable. As
any billiards player can attest, a billiard ball rolling frictionlessly over a smooth surface is
easily knocked off course by small perturbations. The billiard ball model of computation
depends on perfect operation, and the absence of external perturbations such as those
caused by thermal noise. Periodic corrections can be performed, but information gained
by doing this would have to be erased, requiring work to be performed. Expenditure of
energy thus serves the purpose of reducing this susceptibility to noise, which is necessary
for a practical, real-world computational machine. For the purposes of this introduction,
we will ignore the effects of noise on the billiard ball computer, and concentrate on
understanding the essential elements of reversible computation.
Figure 3.14. A simple billiard ball computer, with three input bits and three output bits, shown entering on the left
and leaving on the right, respectively. The presence or absence of a billiard ball indicates a 1 or a 0, respectively.
Empty circles illustrate potential paths due to collisions. This particular computer implements the Fredkin classical
reversible logic gate, discussed in the text.
The billiard ball computer provides an elegant means for implementing a reversible
universal logic gate known as the Fredkin gate. Indeed, the properties of the Fredkin gate
provide an informative overview of the general principles of reversible logic gates and
circuits. The Fredkin gate has three input bits and three output bits, which we refer to
as a, b, c and a′ , b′ , c′ , respectively. The bit c is a control bit, whose value is not changed
by the action of the Fredkin gate, that is, c′ = c. The bit c is called the control bit
because it controls what happens to the other two bits, a and b. If c is set to 0 then a
and b are left alone, a′ = a, b′ = b. If c is set to 1, a and b are swapped, a′ = b, b′ = a.
The explicit truth table for the Fredkin gate is shown in Figure 3.15. It is easy to see
that the Fredkin gate is reversible, because given the output a′ , b′ , c′ , we can determine
the inputs a, b, c. In fact, to recover the original inputs a, b and c we need only apply
another Fredkin gate to a′ , b′ , c′ :
Exercise 3.29: (Fredkin gate is self-inverse) Show that applying two consecutive
Fredkin gates gives the same outputs as inputs.
Examining the paths of the billiard balls in Figure 3.14, it is not difficult to verify that
this billiard ball computer implements the Fredkin gate:
Exercise 3.30: Verify that the billiard ball computer in Figure 3.14 computes the
Fredkin gate.
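The definition of the Fredkin gate is simple enough to check exhaustively by machine. The following Python sketch, in which the function name `fredkin` and the tuple ordering (a, b, c) are choices made here for illustration, tabulates the gate and tests three of its properties over all eight inputs.

```python
from itertools import product

def fredkin(a, b, c):
    """Fredkin gate: swap a and b when the control bit c is 1; c is unchanged."""
    return (b, a, c) if c else (a, b, c)

inputs = list(product((0, 1), repeat=3))
truth_table = {bits: fredkin(*bits) for bits in inputs}

# Applying the gate twice returns the original input (the gate is its own inverse).
self_inverse = all(fredkin(*fredkin(*bits)) == bits for bits in inputs)
# The number of 1s is the same before and after the gate.
ones_preserved = all(sum(out) == sum(bits) for bits, out in truth_table.items())
# Distinct inputs give distinct outputs, so the gate permutes the 8 states.
reversible = len(set(truth_table.values())) == 8
```

A check of this kind is no substitute for the argument requested in Exercise 3.29, but it is a convenient way to catch mistakes when building larger circuits out of Fredkin gates.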
In addition to reversibility, the Fredkin gate also has the interesting property that
the number of 1s is conserved between the input and output. In terms of the billiard
ball computer, this corresponds to the number of billiard balls going into the Fredkin
gate being equal to the number coming out. Thus, it is sometimes referred to as being
a conservative reversible logic gate. Such reversibility and conservative properties are
interesting to a physicist because they can be motivated by fundamental physical principles.
The laws of Nature are reversible, with the possible exception of the measurement
postulate of quantum mechanics, discussed in Section 2.2.3 on page 84. The conservative
property can be thought of as analogous to properties such as conservation of mass, or
conservation of energy. Indeed, in the billiard ball model of computation the conservative
property corresponds exactly to conservation of mass.

    Inputs      Outputs
    a  b  c     a′ b′ c′
    0  0  0     0  0  0
    0  0  1     0  0  1
    0  1  0     0  1  0
    0  1  1     1  0  1
    1  0  0     1  0  0
    1  0  1     0  1  1
    1  1  0     1  1  0
    1  1  1     1  1  1

Figure 3.15. Fredkin gate truth table and circuit representation. The bits a and b are swapped if the control bit c is
set, and otherwise are left alone.
Figure 3.16. Fredkin gate configured to perform the elementary gates AND (left), NOT (middle), and a primitive
routing function, the CROSSOVER (right). The middle gate also serves to perform the FANOUT operation, since it
produces two copies of x at the output. Note that each of these configurations requires the use of extra ‘ancilla’ bits
prepared in standard states (for example, the 0 input on the first line of the AND gate), and in general the output
contains ‘garbage’ not needed for the remainder of the computation.
The Fredkin gate is not only reversible and conservative, it’s a universal logic gate
as well! As illustrated in Figure 3.16, the Fredkin gate can be configured to simulate
AND, NOT, CROSSOVER and FANOUT functions, and thus can be cascaded to simulate any
classical circuit whatsoever.
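These configurations can be sketched in Python as follows. The particular wiring, meaning which line carries which ancilla, is one possible choice made here and may differ from the arrangement drawn in Figure 3.16; the wrapper function names are hypothetical.

```python
def fredkin(a, b, c):
    """Fredkin gate: swap a and b when the control bit c is 1; c is unchanged."""
    return (b, a, c) if c else (a, b, c)

def and_gate(x, y):
    # Ancilla 0 on the second data line; y acts as the control.
    a_out, b_out, c_out = fredkin(x, 0, y)
    return b_out                  # b_out = x AND y; a_out and c_out are garbage

def not_and_fanout(x):
    # Ancillas 1 and 0 on the data lines; x acts as the control.
    a_out, b_out, c_out = fredkin(1, 0, x)
    return a_out, b_out, c_out    # (NOT x, x, x)

def crossover(x, y):
    # Control ancilla fixed to 1, so the two data lines always swap.
    a_out, b_out, _ = fredkin(x, y, 1)
    return a_out, b_out
```

In each case the ancilla inputs are fixed constants, and the outputs that are not returned or not needed play the role of the 'garbage' bits.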
To simulate irreversible gates such as AND using the Fredkin gate, we made use of two
ideas. First, we allowed the input of ‘ancilla’ bits to the Fredkin gate, in specially prepared
states, either 0 or 1. Second, the output of the Fredkin gate contained extraneous ‘garbage’
not needed for the remainder of the computation. These ancilla and garbage bits are not
directly important to the computation. Their importance lies in the fact that they make
the computation reversible. Indeed the irreversibility of gates like the AND and OR may
be viewed as a consequence of the ancilla and garbage bits being ‘hidden’. Summarizing,
given any classical circuit computing a function f (x), we can build a reversible circuit
made entirely of Fredkin gates, which on input of x, together with some ancilla bits
in a standard state a, computes f (x), together with some extra ‘garbage’ output, g(x).
Therefore, we represent the action of the computation as (x, a) → (f (x), g(x)).
We now know how to compute functions reversibly. Unfortunately, this computation
produces unwanted garbage bits. With some modifications it turns out to be possible
to perform the computation so that any garbage bits produced are in a standard state.
This construction is crucial for quantum computation, because garbage bits whose value
depends upon x will in general destroy the interference properties crucial to quantum
computation. To understand how this works it is convenient to assume that the NOT gate
is available in our repertoire of reversible gates, so we may as well assume that the ancilla
bits a all start out as 0s, with NOT gates being added where necessary to turn the ancilla
0s into 1s. It will also be convenient to assume that the classical controlled-NOT gate is
available, defined in a manner analogous to the quantum definition of Section 1.3.2, that
is, the inputs (c, t) are taken to (c, t ⊕ c), where ⊕ denotes addition modulo 2. Notice
that t = 0 gives (c, 0) → (c, c), so the controlled-NOT can be thought of as a reversible
copying gate or FANOUT, which leaves no garbage bits at the output.
With the additional NOT gates appended at the beginning of the circuit, the action
of the computation may be written as (x, 0) → (f (x), g(x)). We could also have added
CNOT gates to the beginning of the circuit, in order to create a copy of x which is not
changed during the subsequent computation. With this modification, the action of the
circuit may be written
(x, 0, 0) → (x, f (x), g(x)) .    (3.7)
Equation (3.7) is a very useful way of writing the action of the reversible circuit, because
it allows an idea known as uncomputation to be used to get rid of the garbage bits, for a
small cost in the running time of the computation. The idea is the following. Suppose we
start with a four register computer in the state (x, 0, 0, y). The second register is used to
store the result of the computation, and the third register is used to provide workspace for
the computation, that is, the garbage bits g(x). The use of the fourth register is described
shortly, and we assume it starts in an arbitrary state y.
We begin as before, by applying a reversible circuit to compute f , resulting in the state
(x, f (x), g(x), y). Next, we use CNOTs to add the result f (x) bitwise to the fourth register,
leaving the machine in the state (x, f (x), g(x), y ⊕ f (x)). However, all the steps used to
compute f (x) were reversible and did not affect the fourth register, so by applying the
reverse of the circuit used to compute f we come to the state (x, 0, 0, y ⊕f (x)). Typically,
we omit the ancilla 0s from the description of the function evaluation, and just write the
action of the circuit as
(x, y) → (x, y ⊕ f (x)) .    (3.8)
In general we refer to this modified circuit computing f as the reversible circuit computing
f , even though in principle there are many other reversible circuits which could be used
to compute f .
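The compute, copy and uncompute steps can be traced in a small Python simulation. Everything specific here is an assumption made for illustration: the example function f (x) = x0 AND x1 AND x2, the use of Toffoli gates (introduced in Section 1.4.1) rather than Fredkin gates to build the circuit, and the wire layout and helper names.

```python
def apply(gate, bits):
    """Apply one reversible gate in place. Each gate here is its own inverse."""
    kind, *wires = gate
    if kind == "NOT":
        (t,) = wires
        bits[t] ^= 1
    elif kind == "CNOT":
        c, t = wires
        bits[t] ^= bits[c]
    elif kind == "TOFFOLI":
        c1, c2, t = wires
        bits[t] ^= bits[c1] & bits[c2]
    else:
        raise ValueError(f"unknown gate {kind!r}")

def run(circuit, bits):
    for gate in circuit:
        apply(gate, bits)

# Wire layout: 0-2 hold x, 3 holds the result, 4 is workspace, 5 holds y.
COMPUTE_F = [("TOFFOLI", 0, 1, 4),   # workspace g(x) = x0 AND x1
             ("TOFFOLI", 4, 2, 3)]   # result f(x) = x0 AND x1 AND x2

def evaluate(x, y):
    bits = list(x) + [0, 0, y]
    run(COMPUTE_F, bits)              # (x, f(x), g(x), y)
    apply(("CNOT", 3, 5), bits)       # (x, f(x), g(x), y XOR f(x))
    run(reversed(COMPUTE_F), bits)    # (x, 0, 0, y XOR f(x))
    return bits
```

Because every gate used is its own inverse, running the list of gates in reverse order undoes the computation, which is why the result and workspace registers return to 0 while the fourth register keeps y ⊕ f (x).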
What resource overhead is involved in doing reversible computation? To analyze this
question, we need to count the number of extra ancilla bits needed in a reversible circuit,
and compare the gate counts with classical models. It ought to be clear that the number of
gates in a reversible circuit is the same as in an irreversible circuit to within the constant
factor which represents the number of Fredkin gates needed to simulate a single element
of the irreversible circuit, and an additional factor of two for uncomputation, with an
overhead for the extra CNOT operations used in reversible computation which is linear in
the number of bits involved in the circuit. Similarly, the number of ancilla bits required
scales at most linearly with the number of gates in the irreversible circuit, since each
element in the irreversible circuit can be simulated using a constant number of ancilla
bits. As a result, natural complexity classes such as P and NP are the same no matter
whether a reversible or irreversible model of computation is used. For more elaborate
complexity classes like PSPACE the situation is not so immediately clear; see Problem 3.9
and ‘History and further reading’ for a discussion of some such subtleties.
Exercise 3.31: (Reversible half-adder) Construct a reversible circuit which, when
two bits x and y are input, outputs (x, y, c, x ⊕ y), where c is the carry bit when
x and y are added.
The Fredkin gate and its implementation using the billiard ball computer offer a
beautiful paradigm for reversible computation. There is another reversible logic gate, the
Toffoli gate, which is also universal for classical computation. While the Toffoli gate does
not have quite the same elegant physical simplicity as the billiard ball implementation of
the Fredkin gate, it will be more useful in the study of quantum computation. We have
already met the Toffoli gate in Section 1.4.1, but for convenience we review its properties
here.
The Toffoli gate has three input bits, a, b and c. a and b are known as the first and
second control bits, while c is the target bit. The gate leaves both control bits unchanged,
flips the target bit if both control bits are set, and otherwise leaves the target bit alone.
The truth table and circuit representation for the Toffoli gate are shown in Figure 3.17.
Inputs      Outputs
a b c       a′ b′ c′
0 0 0       0 0 0
0 0 1       0 0 1
0 1 0       0 1 0
0 1 1       0 1 1
1 0 0       1 0 0
1 0 1       1 0 1
1 1 0       1 1 1
1 1 1       1 1 0
Figure 3.17. Truth table and circuit representation of the Toffoli gate.
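The truth table can be regenerated from the verbal definition of the gate. The following is a minimal sketch; the function name toffoli and the tuple representation of the three bits are assumptions made for illustration.

```python
from itertools import product

def toffoli(a, b, c):
    """Return (a', b', c'): controls unchanged, target flipped iff a = b = 1."""
    return a, b, c ^ (a & b)

# Print the eight rows of the truth table in the order used in Figure 3.17.
for bits in product((0, 1), repeat=3):
    print(bits, "->", toffoli(*bits))
```

Applying the function twice returns the original bits, which is one way to see that the gate is reversible.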
How can the Toffoli gate be used to do universal computation? Suppose we wish to
NAND the bits a and b. To do this using the Toffoli gate, we input a and b as control
bits, and send in an ancilla bit set to 1 as the target bit, as shown in Figure 3.18. The
NAND of a and b is output as the target bit. As expected from our study of the Fredkin
gate, the Toffoli gate simulation of a NAND requires the use of a special ancilla input,
and some of the outputs from the simulation are garbage bits.
The Toffoli gate can also be used to implement the FANOUT operation by inputting
an ancilla 1 to the first control bit, a to the second control bit, and an ancilla 0 to the
target bit, producing the output 1, a, a. This is illustrated in Figure 3.19. Recalling that
NAND and FANOUT are together
universal for computation, we see that an arbitrary circuit can be efficiently simulated
using a reversible circuit consisting only of Toffoli gates and ancilla bits, and that useful
additional techniques such as uncomputation may be achieved using the same methods
as were employed with the Fredkin gate.
Figure 3.18. Implementing a NAND gate using a Toffoli gate. The top two bits represent the input to the NAND,
while the third bit is prepared in the standard state 1, sometimes known as an ancilla state. The output from the
NAND is on the third bit.
Figure 3.19. FANOUT with the Toffoli gate, with the second bit being the input to the FANOUT, and the other two
bits standard ancilla states. The output from FANOUT appears on the second and third bits.
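The two constructions of Figures 3.18 and 3.19, a NAND gate and a FANOUT (copy) operation built from a single Toffoli gate each, can be checked exhaustively. This is a minimal sketch under the stated ancilla choices; the function names are assumptions, and the target ancilla for FANOUT is taken to be 0, which is what the output 1, a, a requires.

```python
def toffoli(a, b, c):
    """Controls a, b unchanged; target c flipped iff both controls are 1."""
    return a, b, c ^ (a & b)

def nand(a, b):
    # Target ancilla prepared in 1; the two control outputs are garbage bits.
    _, _, out = toffoli(a, b, 1)
    return out

def fanout(a):
    # Ancilla 1 on the first control, a on the second, ancilla 0 on the target.
    _, first_copy, second_copy = toffoli(1, a, 0)
    return first_copy, second_copy

for a in (0, 1):
    for b in (0, 1):
        print(f"NAND({a}, {b}) = {nand(a, b)}")
    print(f"FANOUT({a}) = {fanout(a)}")
```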
Our interest in reversible computation was motivated by our desire to understand the
energy requirements for computation. It is clear that the noise-free billiard ball model
of computation requires no energy for its operation; what about models based upon
the Toffoli gate? This can only be determined by examining specific models for the
computation of the Toffoli gate. In Chapter 7, we examine several such implementations,
and it turns out that, indeed, the Toffoli gate can be implemented in a manner which
does not require the expenditure of energy.
There is a significant caveat attached to the idea that computation can be done without
the expenditure of energy. As we noted earlier, the billiard ball model of computation is
highly sensitive to noise, and this is true of many other models of reversible computation.
To nullify the effects of noise, some form of error-correction needs to be done. Such
error-correction typically involves the performance of measurements on the system to
determine whether the system is behaving as expected, or if an error has occurred.
Because the computer’s memory is finite, the bits used to store the measurement results
utilized in error-correction must eventually be erased to make way for new measurement
results. According to Landauer’s principle, this erasure carries an associated energy cost
that must be accounted for when tallying the total energy cost of the computation. We
analyze the energy cost associated with error-correction in more detail in Section 12.4.4.
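To give a sense of scale for the erasure cost, the following sketch evaluates the standard form of Landauer's bound, k_B T ln 2 of dissipated energy per erased bit. The function name, the choice of 300 K as an illustrative room temperature, and the example bit count are assumptions; the sketch says nothing about how many bits a particular error-correction scheme must erase.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)

def landauer_bound_joules(bits_erased, temperature_kelvin):
    """Lower bound on the energy dissipated when erasing the given number of bits."""
    return bits_erased * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)

# Illustrative numbers only: one bit, and 10**9 bits, erased at 300 K.
print(landauer_bound_joules(1, 300.0))
print(landauer_bound_joules(10**9, 300.0))
```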
What can we conclude from our study of reversible computation? There are three
key ideas. First, reversibility stems from keeping track of every bit of information; irreversibility occurs only when information is lost or erased. Second, by doing computation
reversibly, we obviate the need for energy expenditure during computation. All computations can be done, in principle, for zero cost in energy. Third, reversible computation can
be done efficiently, without the production of garbage bits whose value depends upon the
input to the computation. That is, if there is an irreversible circuit computing a function
f , then there is an efficient simulation of this circuit by a reversible circuit with action
(x, y) → (x, y ⊕ f (x)).
What are the implications of these results for physics, computer science, and for
quantum computation and quantum information? From the point of view of a physicist
or hardware engineer worried about heat dissipation, the good news is that, in principle,
it is possible to make computation dissipation-free by making it reversible, although in
practice energy dissipation is required for system stability and immunity from noise. At
an even more fundamental level, the ideas leading to reversible computation also lead to
the resolution of a century-old problem in the foundations of physics, the famous problem
of Maxwell’s demon. The story of this problem and its resolution is outlined in Box 3.5
on page 162. From the point of view of a computer scientist, reversible computation
validates the use of irreversible elements in models of computation such as the Turing
machine (since using them or not gives polynomially equivalent models). Moreover, since
the physical world is fundamentally reversible, one can argue that complexity classes
based upon reversible models of computation are more natural than complexity classes
based upon irreversible models, a point revisited in Problem 3.9 and ‘History and further
reading’. From the point of view of quantum computation and quantum information,
reversible computation is enormously important. To harness the full power of quantum
computation, any classical subroutines in a quantum computation must be performed
reversibly and without the production of garbage bits depending on the classical input.
Exercise 3.32: (From Fredkin to Toffoli and back again) What is the smallest
number of Fredkin gates needed to simulate a Toffoli gate? What is the smallest
number of Toffoli gates needed to simulate a Fredkin gate?
3.3 Perspectives on computer science
In a short introduction such as this chapter, it is not remotely possible to cover in detail
all the great ideas of a field as rich as computer science. We hope to have conveyed
to you something of what it means to think like a computer scientist, and provided a
basic vocabulary and overview of some of the fundamental concepts important in the
understanding of computation. To conclude this chapter, we briefly touch on some more
general issues, in order to provide some perspective on how quantum computation and
quantum information fits into the overall picture of computer science.
Box 3.5: Maxwell’s demon
The laws of thermodynamics govern the amount of work that can be performed by
a physical system at thermodynamic equilibrium. One of these laws, the second law
of thermodynamics, states that the entropy in a closed system can never decrease.
In 1871, James Clerk Maxwell proposed the existence of a machine that apparently
violated this law. He envisioned a miniature little ‘demon’, like that shown in the
figure below, which could reduce the entropy of a gas cylinder initially at equilibrium
by individually separating the fast and slow molecules into the two halves of the
cylinder. This demon would sit at a little door at the middle partition. When a
fast molecule approaches from the left side the demon opens a door between the
partitions, allowing the molecule through, and then closes the door. By doing this
many times the total entropy of the cylinder can be decreased, in apparent violation
of the second law of thermodynamics.
The resolution to the Maxwell’s demon paradox lies in the fact that the demon must
perform measurements on the molecules moving between the partitions, in order
to determine their velocities. The result of this measurement must be stored in the
demon’s memory. Because any memory is finite, the demon must eventually begin
erasing information from its memory, in order to have space for new measurement
results. By Landauer’s principle, this act of erasing information increases the total
entropy of the combined system – demon, gas cylinder, and their environments. In
fact, a complete analysis shows that Landauer’s principle implies that the entropy of
the combined system is increased at least as much by this act of erasing information
as the entropy of the combined system is decreased by the actions of the demon,
thus ensuring that the second law of thermodynamics is obeyed.
Our discussion has revolved around the Turing machine model of computation. How
does the computational power of unconventional models of computation such as massively
parallel computers, DNA computers and analog computers compare with the standard
Turing machine model of computation and, implicitly, with quantum computation? Let’s
begin with parallel computing architectures. The vast majority of computers in existence
are serial computers, processing instructions one at a time in some central processing
unit. By contrast, parallel computers can process more than one instruction at a time,
leading to substantial savings in time and money for some applications. Nevertheless,
parallel processing does not offer any fundamental advantage over the standard Turing
machine model when issues of efficiency are concerned, because a Turing machine can
simulate a parallel computer with polynomially equivalent total physical resources – the
total space and time used by the computation. What a parallel computer gains in time,
it loses in the total spatial resources required to perform the computation, resulting in
no essential net change in the power of the computing model.
An interesting specific example of massively parallel computing is the technique of
DNA computing. A strand of DNA, deoxyribonucleic acid, is a molecule composed
of a sequence (a polymer) of four kinds of nucleotides distinguished by the bases they
carry, denoted by the letters A (adenine), C (cytosine), G (guanine) and T (thymine).
Two strands, under certain circumstances, can anneal to form a double strand, if the
respective base pairs form complements of each other (A matches T and G matches
C). The ends are also distinct and must match appropriately. Chemical techniques can
be used to amplify the number of strands beginning or ending with specific sequences
(polymerase chain reaction), separate the strands by length (gel electrophoresis), dissolve
double strands into single strands (changing temperature and pH), read the sequence on
a strand, cut strands at a specific position (restriction enzymes), and detect if a certain
sequence of DNA is in a test tube. The procedure for using these mechanisms in a robust
manner is rather involved, but the basic idea can be appreciated from an example.
The directed Hamiltonian path problem is a simple and equivalently hard variant of
the Hamiltonian cycle problem of Section 3.2.2, in which the goal is to determine if a
path exists or not between two specified vertices j1 and jN in a directed graph G of N
vertices, entering each vertex exactly once, and following only allowed edge directions.
This problem can be solved with a DNA computer using the following five steps, in
which xj are chosen to be unique sequences of bases (and x̄j their complements), DNA
strands xj xk encode edges, and strands x̄j x̄j encode vertices. (1) Generate random paths
through G, by combining a mixture of all possible vertex and edge DNA strands, and
waiting for the strands to anneal. (2) Keep only the paths beginning with j1 and ending
with jN , by amplifying only the double strands beginning with x̄j1 and ending with x̄jN .
(3) Select only paths of length N , by separating the strands according to their length. (4)
Select only paths which enter each vertex at least once, by dissolving the DNA into single
strands, and annealing with all possible vertex strands one at a time and filtering out only
those strands which anneal. And (5) detect if any strands have survived the selection
steps; if so, then a path exists, and otherwise, it does not. To ensure the answer is correct
with sufficiently high probability, xj may be chosen to contain many (≈ 30) bases, and
a large number (≈ 10^14 or more are feasible) of strands are used in the reaction.
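The logic of the five steps can be mimicked on an ordinary computer. The following is a rough classical sketch, not a model of the chemistry: random walks on the graph stand in for annealing, list filters stand in for amplification, gel electrophoresis and affinity separation, and the function name, the strands parameter and the fixed random seed are all assumptions made for illustration.

```python
import random

def dna_style_hamiltonian_path(n, edges, start, end, strands=20000, seed=0):
    """Generate-and-filter search for a directed Hamiltonian path on vertices 0..n-1."""
    rng = random.Random(seed)
    successors = {v: [] for v in range(n)}
    for u, v in edges:
        successors[u].append(v)

    # (1) Generate random paths through G; each random walk plays the role
    #     of one annealed strand, of random length up to 2n vertices.
    tube = []
    for _ in range(strands):
        path = [rng.randrange(n)]
        target_len = rng.randint(1, 2 * n)
        while len(path) < target_len and successors[path[-1]]:
            path.append(rng.choice(successors[path[-1]]))
        tube.append(path)

    # (2) Keep only paths that begin at `start` and end at `end`.
    tube = [p for p in tube if p[0] == start and p[-1] == end]
    # (3) Keep only paths that visit exactly n vertices.
    tube = [p for p in tube if len(p) == n]
    # (4) Keep only paths that contain every vertex, one vertex at a time.
    for v in range(n):
        tube = [p for p in tube if v in p]
    # (5) Detect whether any strand survived the selection steps.
    return len(tube) > 0

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
print(dna_style_hamiltonian_path(4, edges, start=0, end=3))
```

As with the chemical procedure, a negative answer from this sketch is only probabilistic evidence: the number of random strands needed to find an existing path with high probability grows quickly with the number of vertices.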
Heuristic methods are available to improve upon this basic idea. Of course, exhaustive
search methods such as this only work as long as all possible paths can be generated
efficiently, and thus the number of molecules used must grow exponentially with the size of
the problem (the number of vertices in the example above). DNA molecules are relatively
small and readily synthesized, and the huge number of DNA combinations one can fit
into a test tube can stave off the exponential complexity cost increase for a while – up
to a few dozen vertices – but eventually the exponential cost limits the applicability of
this method. Thus, while DNA computing offers an attractive and physically realizable
model of computation for the solution of certain problems, it is a classical computing
technique and offers no essential improvement in principle over a Turing machine.
Analog computers offer yet another paradigm for performing computation. A computer is analog when the physical representation of information it uses for computation
is based on continuous degrees of freedom, instead of zeroes and ones. For example,
a thermometer is an analog computer. Analog circuitry, using resistors, capacitors, and
amplifiers, is also said to perform analog computation. Such machines have an infinite
resource to draw upon in the ideal limit, since continuous variables like position and
voltage can store an unlimited amount of information. But this is only true in the absence
of noise. The presence of a finite amount of noise reduces the number of distinguishable
states of a continuous variable to a finite number – and thus restricts analog computers
to the representation of a finite amount of information. In practice, noise reduces analog
computers to being no more powerful than conventional digital computers, and through
them Turing machines. One might suspect that quantum computers are just analog computers, because of the use of continuous parameters in describing qubit states; however,
it turns out that the effects of noise on a quantum computer can effectively be digitized.
As a result, their computational advantages remain even in the presence of a finite amount
of noise, as we shall see in Chapter 10.
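The effect of noise on an analog variable can be illustrated with a crude counting argument. In the sketch below, which is an assumed toy model rather than anything derived in the text, a variable confined to a range of width value_range and disturbed by noise of amplitude noise is taken to have about value_range / noise distinguishable states, so it stores about the base-2 logarithm of that many bits.

```python
import math

def distinguishable_states(value_range, noise):
    """Assumed toy model: states closer together than the noise amplitude merge."""
    return math.floor(value_range / noise)

def information_bits(value_range, noise):
    """Bits of information storable in one noisy continuous variable."""
    return math.log2(distinguishable_states(value_range, noise))

# Halving the noise adds only one bit; the capacity is finite for any noise > 0.
for noise in (2**-4, 2**-5, 2**-10):
    print(noise, information_bits(1.0, noise))
```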
What of the effects of noise on digital computers? In the early days of computation,
noise was a very real problem for computers. In some of the original computers a vacuum
tube would malfunction every few minutes. Even today, noise is a problem for computational devices such as modems and hard drives. Considerable effort was devoted to the
problem of understanding how to construct reliable computers from unreliable components. It was proven by von Neumann that this is possible with only a polynomial increase
in the resources required for computation. Ironically, however, modern computers use
none of those results, because the components of modern computers are fantastically
reliable. Failure rates of 10^-17 and even less are common in modern electronic components. For this reason, failures happen so rarely that the extra effort required to protect
against them is not regarded as being worth making. On the other hand, we shall find
that quantum computers are very delicate machines, and will likely require substantial
application of error-correction techniques.
Different architectures may change the effects of noise. For example, if the effect of
noise is ignored, then changing to a computer architecture in which many operations are
performed in parallel may not change the number of operations which need to be done.
However, a parallel system may be substantially more resistant to noise, because the effects
of noise have less time to accumulate. Therefore, in a realistic analysis, the parallel version
of an algorithm may have some substantial advantages over a serial implementation.
Architecture design is a well developed field of study for classical computers. Hardly
anything similar has been developed along the same lines for quantum computers, but
the study of noise already suggests some desirable traits for future quantum computer
architectures, such as a high level of parallelism.
A fourth model of computation is distributed computation, in which two or more
spatially separated computational units are available to solve a computational problem.
Obviously, such a model of computation is no more powerful than the Turing machine
model in the sense that it can be efficiently simulated on a Turing machine. However,
distributed computation gives rise to an intriguing new resource challenge: how best to
utilize multiple computational units when the cost of communication between the units is
high. This problem of distributed computation becomes especially interesting as computers are connected through high speed networks; although the total computational capacity
of all the computers on a network might be extremely large, utilization of that potential
is difficult. Most interesting problems do not divide easily into independent chunks that
can be solved separately, and may frequently require global communication between different computational subsystems to exchange intermediate results or synchronize status.
The field of communication complexity has been developed to address such issues, by
quantifying the cost of communication requirements in solving problems. When quantum resources are available and can be exchanged between distributed computers, the
communication costs can sometimes be greatly reduced.
A recurring theme through these concluding thoughts and through the entire book is
that despite the traditional independence of computer science from physical constraints,
ultimately physical laws have tremendous impact not only upon how computers are
realized, but also the class of problems they are capable of solving. The success of quantum
computation and quantum information as a physically reasonable alternative model of
computation questions closely held tenets of computer science, and thrusts notions of
computer science into the forefront of physics. The task of the remainder of this book is
to stir together ideas from these disparate fields, and to delight in what results!
Problem 3.1: (Minsky machines) A Minsky machine consists of a finite set of
registers, r1 , r2 , . . . , rk , each capable of holding an arbitrary non-negative
integer, and a program, made up of orders of one of two types. The first type has
the form:
[Diagram omitted: an increment order on register rj, leading from program point m to program point n.]
The interpretation is that at point m in the program register rj is incremented
by one, and execution proceeds to point n in the program. The second type of
order has the form:
[Diagram omitted: a decrement order on register rj at program point m, with one exit to point n and one exit to point p.]
The interpretation is that at point m in the program, register rj is decremented
if it contains a positive integer, and execution proceeds to point n in the
program. If register rj is zero then execution simply proceeds to point p in the
program. The program for the Minsky machine consists of a collection of such
orders, of a form like:
[Diagram omitted: an example program combining orders of the two types.]
The starting and all possible halting points for the program are conventionally
labeled zero. This program takes the contents of register r1 and adds them to
register r2 , while decrementing r1 to zero.
(1) Prove that all (Turing) computable functions can be computed on a Minsky
machine, in the sense that given a computable function f (·) there is a
Minsky machine program that when the registers start in the state
(n, 0, . . . , 0) gives as output (f (n), 0, . . . , 0).
(2) Sketch a proof that any function which can be computed on a Minsky
machine, in the sense just defined, can also be computed on a Turing
machine.
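To experiment with the definition, a Minsky machine can be interpreted in a few lines. The following is a minimal sketch: the tuple encoding of orders, the separate "start" and "halt" labels (used here in place of labeling both points zero), the step limit, and the name run_minsky are assumptions made for illustration. The example program is one possible way to add r1 to r2 while decrementing r1 to zero; it is not claimed to match the omitted diagram.

```python
def run_minsky(program, registers, start="start", halt="halt", max_steps=100_000):
    """Run a Minsky machine.

    program maps a point label to an order:
      ("inc", j, n)     increment register j, then go to point n
      ("dec", j, n, p)  if register j is positive, decrement it and go to n;
                        otherwise go to p
    Registers are indexed from 0, so r1 is registers[0].
    """
    regs = list(registers)
    point = start
    for _ in range(max_steps):
        if point == halt:
            return regs
        order = program[point]
        if order[0] == "inc":
            _, j, nxt = order
            regs[j] += 1
            point = nxt
        else:
            _, j, nxt, if_zero = order
            if regs[j] > 0:
                regs[j] -= 1
                point = nxt
            else:
                point = if_zero
    raise RuntimeError("step limit reached before halting")

# Hypothetical two-order program: move the contents of r1 into r2.
ADD_R1_TO_R2 = {
    "start": ("dec", 0, "bump", "halt"),
    "bump": ("inc", 1, "start"),
}

print(run_minsky(ADD_R1_TO_R2, [3, 4]))
```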
Problem 3.2: (Vector games) A vector game is specified by a finite list of vectors,
all of the same dimension, and with integer co-ordinates. The game is to start
with a vector x of non-negative integer co-ordinates and to add to x the first
vector from the list which preserves the non-negativity of all the components,
and to repeat this process until it is no longer possible. Prove that for any
computable function f (·) there is a vector game which when started with the
vector (n, 0, . . . , 0) reaches (f (n), 0, . . . , 0). (Hint: Show that a vector game in
k + 2 dimensions can simulate a Minsky machine containing k registers.)
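The rule of a vector game is simple enough to simulate directly, which may help when exploring the hint. This is a minimal sketch; the function name, the step limit, and the one-vector example game are assumptions made for illustration, and the sketch does not carry out the Minsky machine simulation asked for in the problem.

```python
def play_vector_game(vectors, x, max_steps=100_000):
    """Repeatedly add the first listed vector that keeps all components non-negative."""
    x = tuple(x)
    for _ in range(max_steps):
        for v in vectors:
            y = tuple(a + b for a, b in zip(x, v))
            if all(c >= 0 for c in y):
                x = y
                break
        else:
            return x  # no vector applies: the game is over
    raise RuntimeError("step limit reached before the game ended")

# Hypothetical game with a single vector: it moves (n, 0) to (0, n).
print(play_vector_game([(-1, 1)], (3, 0)))
```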
Problem 3.3: (Fractran) A Fractran program is defined by a list of positive rational
numbers q1 , . . . , qn . It acts on a positive integer m by replacing it by qi m, where
i is the least number such that qi m is an integer. If there is ever a time when
there is no i such that qi m is an integer, then execution stops. Prove that for any
computable function f (·) there is a Fractran program which when started with
2^n reaches 2^f(n) without going through any intermediate powers of 2. (Hint: use
the previous problem.)
Problem 3.4: (Undecidability of dynamical systems) A Fractran program is
essentially just a very simple dynamical system taking positive integers to
positive integers. Prove that there is no algorithm to decide whether such a
dynamical system ever reaches 1.
Problem 3.5: (Non-universality of two bit reversible logic) Suppose we are
trying to build circuits using only one and two bit reversible logic gates, and
ancilla bits. Prove that there are Boolean functions which cannot be computed in
this fashion. Deduce that the Toffoli gate cannot be simulated using one and two
bit reversible gates, even with the aid of ancilla bits.
Problem 3.6: (Hardness of approximation of TSP) Let r ≥ 1 and suppose that
there is an approximation algorithm for TSP which is guaranteed to find the
shortest tour among n cities to within a factor r. Let G = (V, E) be any graph on
n vertices. Define an instance of TSP by identifying cities with vertices in V , and
defining the distance between cities i and j to be 1 if (i, j) is an edge of G, and to
be ⌈r⌉|V | + 1 otherwise. Show that if the approximation algorithm is applied to
this instance of TSP then it returns a Hamiltonian cycle for G if one exists, and
otherwise returns a tour of length more than ⌈r⌉|V |. From the NP-completeness
of HC it follows that no such approximation algorithm can exist unless P = NP.
Problem 3.7: (Reversible Turing machines)
(1) Explain how to construct a reversible Turing machine that can compute the
same class of functions as is computable on an ordinary Turing machine.
(Hint: It may be helpful to use a multi-tape construction.)
(2) Give general space and time bounds for the operation of your reversible
Turing machine, in terms of the time t(x) and space s(x) required on an
ordinary single-tape Turing machine to compute a function f (x).
Problem 3.8: (Find a hard-to-compute class of functions (Research)) Find a
natural class of functions on n inputs which requires a super-polynomial number
of Boolean gates to compute.
Problem 3.9: (Reversible PSPACE = PSPACE) It can be shown that the problem
‘quantified satisfiability’, or QSAT, is PSPACE-complete. That is, every other
language in PSPACE can be reduced to QSAT in polynomial time. The language
QSAT is defined to consist of all Boolean formulae ϕ in n variables x1 , . . . , xn ,
and in conjunctive normal form, such that:
∃x1 ∀x2 ∃x3 . . . ∀xn ϕ if n is even;    (3.9)
∃x1 ∀x2 ∃x3 . . . ∃xn ϕ if n is odd.    (3.10)
Prove that a reversible Turing machine operating in polynomial space can be
used to solve QSAT. Thus, the class of languages decidable by a computer
operating reversibly in polynomial space is equal to PSPACE.
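The alternating-quantifier condition in (3.9) and (3.10) can be made concrete with a brute-force evaluator. The sketch below is an ordinary irreversible recursion, so it does not answer the problem; it only illustrates the definition of the language. The clause encoding (a signed integer k for x_k, and -k for its negation) and the function name are assumptions made for illustration.

```python
def quantified_sat(n, clauses, assignment=()):
    """Decide whether the quantified formula ∃x1 ∀x2 ∃x3 ... ϕ is true.

    clauses is a CNF formula: a list of clauses, each a list of non-zero
    integers, where k stands for x_k and -k for NOT x_k.
    """
    k = len(assignment)  # number of variables already fixed
    if k == n:
        return all(
            any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
            for clause in clauses
        )
    branches = (quantified_sat(n, clauses, assignment + (b,)) for b in (False, True))
    # x1, x3, ... are existentially quantified; x2, x4, ... universally.
    return any(branches) if k % 2 == 0 else all(branches)

# ϕ = (x1 OR x2) AND (x1 OR NOT x2): choosing x1 = 1 works for every x2.
print(quantified_sat(2, [[1, 2], [1, -2]]))
```

The recursion depth is n and each level stores one bit of the assignment, so the space used is polynomial in the size of the formula even though the running time is exponential.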
Problem 3.10: (Ancilla bits and efficiency of reversible computation) Let pm
be the mth prime number. Outline the construction of a reversible circuit which,
upon input of m and n such that n > m, outputs the product pm pn , that is
(m, n) → (pm pn , g(m, n)), where g(m, n) is the final state of the ancilla bits
used by the circuit. Estimate the number of ancilla bits your circuit requires.
Prove that if a polynomial (in log n) size reversible circuit can be found that uses
O(log(log n)) ancilla bits then the problem of factoring a product of two prime
numbers is in P.
History and further reading
Computer science is a huge subject with many interesting subfields. We cannot hope
for any sort of completeness in this brief space, but instead take the opportunity to
recommend a few titles of general interest, and some works on subjects of specific interest
in relation to topics covered in this book, with the hope that they may prove stimulating.
Modern computer science dates to the wonderful 1936 paper of Turing[Tur36]. The
Church–Turing thesis was first stated by Church[Chu36] in 1936, and was then given
a more complete discussion from a different point of view by Turing. Several other
researchers found their way to similar conclusions at about the same time. Many of
these contributions and a discussion of the history may be found in a volume edited
by Davis[Dav65]. Provocative discussions of the Church–Turing thesis and undecidability
may be found in Hofstadter[Hof79] and Penrose[Pen89].
There are many excellent books on algorithm design. We mention only three. First,
there is the classic series by Knuth[Knu97, Knu98a, Knu98b] which covers an enormous portion
of computer science. Second, there is the marvelous book by Cormen, Leiserson, and
Rivest[CLR90]. This huge book contains a plethora of well-written material on many areas
of algorithm design. Finally, the book of Motwani and Raghavan[MR95] is an excellent
survey of the field of randomized algorithms.
The modern theory of computational complexity was especially influenced by the
papers of Cook[Coo71] and Karp[Kar72]. Many similar ideas were arrived at independently
in Russia by Levin[Lev73], but unfortunately took time to propagate to the West. The
classic book by Garey and Johnson[GJ79] has also had an enormous influence on the
field. More recently, Papadimitriou[Pap94] has written a beautiful book that surveys many
of the main ideas of computational complexity theory. Much of the material in this
chapter is based upon Papadimitriou’s book. In this chapter we considered only one type
of reducibility between languages, polynomial time reducibility. There are many other
notions of reductions between languages. An early survey of these notions was given by
Ladner, Lynch and Selman[LLS75]. The study of different notions of reducibility later
blossomed into a subfield of research known as structural complexity, which has been
reviewed by Balcázar, Diaz, and Gabarró[BDG88a, BDG88b].
The connection between information, energy dissipation, and computation has a long
history. The modern understanding is due to a 1961 paper by Landauer[Lan61], in which
Landauer’s principle was first formulated. A paper by Szilard[Szi29] and a 1949 lecture
by von Neumann[von66] (page 66) arrive at conclusions close to Landauer’s principle, but
do not fully grasp the essential point that it is the erasure of information that requires
dissipation.
Reversible Turing machines were invented by Lecerf[Lec63] and later, but independently, in an influential paper by Bennett[Ben73]. Fredkin and Toffoli[FT82] introduced
reversible circuit models of computation. Two interesting historical documents are Barton’s May, 1978 MIT 6.895 term paper[Bar78], and Ressler’s 1981 Master’s thesis[Res81],
which contain designs for a reversible PDP-10! Today, reversible logic is potentially
important in implementations of low-power CMOS circuitry[YK95].
Maxwell’s demon is a fascinating subject, with a long and intricate history. Maxwell
proposed his demon in 1871[Max71]. Szilard published a key paper in 1929[Szi29] which anticipated many of the details of the final resolution of the problem of Maxwell’s demon.
In 1965 Feynman[FLS65b] resolved a special case of Maxwell’s demon. Bennett, building on Landauer’s work[Lan61], wrote two beautiful papers on the subject[BBBW82, Ben87]
which completed the resolution of the problem. An interesting book about the history of
Maxwell’s demon and its exorcism is the collection of papers by Leff and Rex[LR90].
DNA computing was invented by Adleman, and the solution of the directed Hamiltonian
path problem we describe is his[Adl94]. Lipton has also shown how SAT and
circuit satisfiability can be solved in this model[Lip95]. A good general article is Adleman’s
Scientific American article[Adl98]; for an insightful look into the universality of DNA
operations, see Winfree[Win98]. An interesting place to read about performing reliable
computation in the presence of noise is the book by Winograd and Cowan[WC67]. This
topic will be addressed again in Chapter 10. A good textbook on computer architecture
is by Hennessy, Goldberg, and Patterson[HGP96].
Problems 3.1 through 3.4 explore a line of thought originated by Minsky (in his
beautiful book on computational machines[Min67]) and developed by Conway[Con72, Con86].
The Fractran programming language is certainly one of the most beautiful and elegant
universal computational models known, as demonstrated by the following example, known
as PRIMEGAME[Con86]. PRIMEGAME is defined by the list of rational numbers:
17/91; 78/85; 19/51; 23/38; 29/33; 77/29; 95/23; 77/19; 1/17; 11/13; 13/11; 15/2; 1/7; 55/1.    (3.11)
Amazingly, when PRIMEGAME is started at 2, the other powers of 2 that appear,
namely, 2^2, 2^3, 2^5, 2^7, 2^11, 2^13, . . . , are precisely the prime powers of 2, with the powers
stepping through the prime numbers, in order. Problem 3.9 is a special case of the more
general subject of the spatial requirements for reversible computation. See the papers by
Bennett[Ben89], and by Li, Tromp and Vitanyi[LV96, LTV98].
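The behaviour of PRIMEGAME can be observed with a short Fractran interpreter that follows the rule stated in Problem 3.3. This is a minimal sketch; the representation of each rational as a (numerator, denominator) pair and the function names are assumptions made for illustration.

```python
# The fourteen rationals of Equation (3.11), as (numerator, denominator) pairs.
PRIMEGAME = [(17, 91), (78, 85), (19, 51), (23, 38), (29, 33), (77, 29), (95, 23),
             (77, 19), (1, 17), (11, 13), (13, 11), (15, 2), (1, 7), (55, 1)]

def fractran(program, m):
    """Yield the successive values of m; stop if no q_i makes q_i * m an integer."""
    while True:
        for num, den in program:
            if (m * num) % den == 0:
                m = m * num // den
                break
        else:
            return
        yield m

def primegame_exponents(count):
    """Exponents of the first `count` powers of 2 reached after starting at 2."""
    exponents = []
    for m in fractran(PRIMEGAME, 2):
        if m & (m - 1) == 0:  # m is a power of two
            exponents.append(m.bit_length() - 1)
            if len(exponents) == count:
                return exponents

print(primegame_exponents(5))
```

Because the last rational is 55/1, some fraction always applies and the program never halts, so the sketch stops after a requested number of powers of 2 rather than waiting for termination.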
II Quantum computation
4 Quantum circuits
The theory of computation has traditionally been studied almost entirely in
the abstract, as a topic in pure mathematics. This is to miss the point of it.
Computers are physical objects, and computations are physical processes. What
computers can or cannot compute is determined by the laws of physics alone,
and not by pure mathematics.
– David Deutsch
Like mathematics, computer science will be somewhat different from the other
sciences, in that it deals with artificial laws that can be proved, instead of
natural laws that are never known with certainty.
– Donald Knuth
The opposite of a profound truth may well be another profound truth.
– Niels Bohr
This chapter begins Part II of the book, in which we explore quantum computation in
detail. The chapter develops the fundamental principles of quantum computation, and
establishes the basic building blocks for quantum circuits, a universal language for describing sophisticated quantum computations. The two fundamental quantum algorithms
known to date are constructed from these circuits in the following two chapters. Chapter 5 presents the quantum Fourier transform and its applications to phase estimation,
order-finding and factoring. Chapter 6 describes the quantum search algorithm, and its
applications to database search, counting and speedup of solutions to NP-complete problems. Chapter 7 concludes Part II with a discussion of how quantum computation may
one day be experimentally realized. Two other topics of great interest for quantum computation, quantum noise and quantum error-correction, are deferred until Part III of the
book, in view of their wide interest also outside quantum computation.
There are two main ideas introduced in this chapter. First, we explain in detail the
fundamental model of quantum computation, the quantum circuit model. Second, we
demonstrate that there exists a small set of gates which are universal, that is, any quantum
computation whatsoever can be expressed in terms of those gates. Along the way we also
have occasion to describe many other basic results of quantum computation. Section 4.1
begins the chapter with an overview of quantum algorithms, focusing on what algorithms
are known, and the unifying techniques underlying their construction. Section 4.2 is a
detailed study of single qubit operations. Despite their simplicity, single qubit operations
offer a rich playground for the construction of examples and techniques, and it is essential
to understand them in detail. Section 4.3 shows how to perform multi-qubit controlled
unitary operations, and Section 4.4 discusses the description of measurement in the
quantum circuits model. These elements are then brought together in Section 4.5 for the
statement and proof of the universality theorem. We summarize all the basic elements
of quantum computation in Section 4.6, and discuss possible variants of the model, and
the important question of the relationship in computational power between classical and
quantum computers. Section 4.7 concludes the chapter with an important and instructive
application of quantum computation to the simulation of real quantum systems.
This chapter is perhaps the most reader-intensive of all the chapters in the book, with
a high density of exercises for you to complete, and it is worth explaining the reason for
this intensity. Obtaining facility with the basic elements of the quantum circuit model
of computation is quite easy, but requires assimilating a large number of simple results
and techniques that must become second nature if one is to progress to the more difficult
problem of designing quantum algorithms. For this reason we take an example-oriented
approach in this chapter, and ask you to fill in many of the details, in order to acquire
such a facility. A less intensive, but somewhat superficial overview of the basic elements
of quantum computation may be obtained by skipping to Section 4.6.
4.1 Quantum algorithms
What is a quantum computer good for? We’re all familiar with the frustration of needing
more computer resources to solve a computational problem. Practically speaking, many
interesting problems are impossible to solve on a classical computer, not because they
are in principle insoluble, but because of the astronomical resources required to solve
realistic cases of the problem.
The spectacular promise of quantum computers is to enable new algorithms which
render feasible problems requiring exorbitant resources for their solution on a classical
computer. At the time of writing, two broad classes of quantum algorithms are known
which fulfill this promise. The first class of algorithms is based upon Shor’s quantum
Fourier transform, and includes remarkable algorithms for solving the factoring and discrete logarithm problems, providing a striking exponential speedup over the best known
classical algorithms. The second class of algorithms is based upon Grover’s algorithm
for performing quantum searching. These provide a less striking but still remarkable
quadratic speedup over the best possible classical algorithms. The quantum searching
algorithm derives its importance from the widespread use of search-based techniques in
classical algorithms, which in many instances allows a straightforward adaptation of the
classical algorithm to give a faster quantum algorithm.
Figure 4.1 sketches the state of knowledge about quantum algorithms at the time of
writing, including some sample applications of those algorithms. Naturally, at the core of
the diagram are the quantum Fourier transform and the quantum searching algorithm.
Of particular interest in the figure is the quantum counting algorithm. This algorithm is
a clever combination of the quantum searching and Fourier transform algorithms, which
can be used to estimate the number of solutions to a search problem more quickly than
is possible on a classical computer.
The quantum searching algorithm has many potential applications, of which but a few
are illustrated. It can be used to extract statistics, such as the minimal element, from
an unordered data set, more quickly than is possible on a classical computer. It can be
used to speed up algorithms for some problems in NP – specifically, those problems for
which a straightforward search for a solution is the best algorithm known. Finally, it can
be used to speed up the search for keys to cryptosystems such as the widely used Data
Encryption Standard (DES). These and other applications are explained in Chapter 6.
[Figure 4.1 diagram. Nodes: Quantum search; Fourier transform; Hidden subgroup problem; Quantum counting; Statistics (mean, median, min); Order-finding; Discrete log; Factoring; Speed up for some NP problems; Search for crypto keys; Break cryptosystems (RSA).]
Figure 4.1. The main quantum algorithms and their relationships, including some notable applications.
The quantum Fourier transform also has many interesting applications. It can be used
to solve the discrete logarithm and factoring problems. These results, in turn, enable a
quantum computer to break many of the most popular cryptosystems now in use, including the RSA cryptosystem. The Fourier transform also turns out to be closely related
to an important problem in mathematics, finding a hidden subgroup (a generalization of
finding the period of a periodic function). The quantum Fourier transform and several of
its applications, including fast quantum algorithms for factoring and discrete logarithm,
are explained in Chapter 5.
Why are there so few quantum algorithms known which are better than their classical
counterparts? The answer is that coming up with good quantum algorithms seems to be
a difficult problem. There are at least two reasons for this. First, algorithm design, be
it classical or quantum, is not an easy business! The history of algorithms shows us that
considerable ingenuity is often required to come up with near optimal algorithms, even for
apparently very simple problems, like the multiplication of two numbers. Finding good
quantum algorithms is made doubly difficult because of the additional constraint that we
want our quantum algorithms to be better than the best known classical algorithms. A
second reason for the difficulty of finding good quantum algorithms is that our intuitions
are much better adapted to the classical world than they are to the quantum world. If
we think about problems using our native intuition, then the algorithms which we come
up with are going to be classical algorithms. It takes special insights and special tricks to
come up with good quantum algorithms.
Further study of quantum algorithms will be postponed until the next chapter. In this
chapter we provide an efficient and powerful language for describing quantum algorithms,
the language of quantum circuits – assemblies of discrete sets of components which
describe computational procedures. This construction will enable us to quantify the cost
of an algorithm in terms of things like the total number of gates required, or the circuit
depth. The circuit language also comes with a toolbox of tricks that simplifies algorithm
design and provides ready conceptual understanding.
4.2 Single qubit operations
The development of our quantum computational toolkit begins with operations on the
simplest quantum system of all – a single qubit. Single qubit gates were introduced in
Section 1.3.1. Let us quickly summarize what we learned there; you may find it useful
to refer to the notes on notation on page xxiii as we go along.
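A single qubit is a vector $|\psi\rangle = a|0\rangle + b|1\rangle$ parameterized by two complex numbers
satisfying $|a|^2 + |b|^2 = 1$. Operations on a qubit must preserve this norm, and thus are
described by 2×2 unitary matrices. Of these, some of the most important are the Pauli
matrices; it is useful to list them again here:
\[
X \equiv \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}; \quad
Y \equiv \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}; \quad
Z \equiv \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}. \tag{4.1}
\]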
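Three other quantum gates will play a large part in what follows, the Hadamard gate
(denoted H), phase gate (denoted S), and π/8 gate (denoted T):
\[
H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}; \quad
S = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}; \quad
T = \begin{bmatrix} 1 & 0 \\ 0 & \exp(i\pi/4) \end{bmatrix}. \tag{4.2}
\]
A couple of useful algebraic facts to keep in mind are that $H = (X + Z)/\sqrt{2}$ and $S = T^2$.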
You might wonder why the T gate is called the π/8 gate when it is π/4 that appears in
the definition. The reason is that the gate has historically often been referred to as the
π/8 gate, simply because up to an unimportant global phase T is equal to a gate which
has exp(±iπ/8) appearing on its diagonals.
\[
T = \exp(i\pi/8) \begin{bmatrix} \exp(-i\pi/8) & 0 \\ 0 & \exp(i\pi/8) \end{bmatrix}. \tag{4.3}
\]
Nevertheless, the nomenclature is in some respects rather unfortunate, and we often refer
to this gate as the T gate.
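The algebraic facts quoted above for H, S and T are easy to confirm numerically. The sketch below uses plain nested lists rather than a matrix library; the helper names `matmul` and `close` are illustrative.

```python
import cmath
import math

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, tol=1e-12):
    """True when every entry of A and B agrees to within tol."""
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

r = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[r, r], [r, -r]]
S = [[1, 0], [0, 1j]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

# H = (X + Z) / sqrt(2)
h_from_paulis = [[(X[i][j] + Z[i][j]) * r for j in range(2)] for i in range(2)]

# Equation (4.3): T = exp(i pi/8) * diag(exp(-i pi/8), exp(i pi/8))
g = cmath.exp(1j * math.pi / 8)
t_symmetric = [[g * g.conjugate(), 0], [0, g * g]]

checks = {
    "H = (X + Z)/sqrt(2)": close(H, h_from_paulis),
    "S = T^2": close(S, matmul(T, T)),
    "T matches Equation (4.3)": close(T, t_symmetric),
}
print(checks)
```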
Recall also that a single qubit in the state $a|0\rangle + b|1\rangle$ can be visualized as a point (θ, ϕ)
on the unit sphere, where $a = \cos(\theta/2)$, $b = e^{i\varphi}\sin(\theta/2)$, and a can be taken to be real
because the overall phase of the state is unobservable. This is called the Bloch sphere
representation, and the vector (cos ϕ sin θ, sin ϕ sin θ, cos θ) is called the Bloch vector.
We shall return to this picture often as an aid to intuition.
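As a small illustration of the Bloch sphere parameterization, the sketch below converts a pair of amplitudes (a, b) into a Bloch vector. It assumes the input is normalized, and it removes the global phase so that a is real and non-negative, as in the text. The function name `bloch_vector` is illustrative.

```python
import cmath
import math

def bloch_vector(a, b):
    """Bloch vector of the normalized state a|0> + b|1>."""
    # Strip the unobservable global phase so that a is real and >= 0.
    phase = cmath.exp(-1j * cmath.phase(a)) if abs(a) > 0 else 1
    a, b = a * phase, b * phase
    theta = 2 * math.acos(min(1.0, abs(a)))      # a = cos(theta/2)
    phi = cmath.phase(b) if abs(b) > 0 else 0.0  # b = e^{i phi} sin(theta/2)
    return (math.cos(phi) * math.sin(theta),
            math.sin(phi) * math.sin(theta),
            math.cos(theta))

r = 1 / math.sqrt(2)
for label, (a, b) in {"|0>": (1, 0), "|1>": (0, 1),
                      "(|0>+|1>)/sqrt2": (r, r),
                      "(|0>+i|1>)/sqrt2": (r, 1j * r)}.items():
    print(label, bloch_vector(a, b))
```

Up to rounding, the four sample states land on the +z, −z, +x and +y axes respectively.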
Exercise 4.1: In Exercise 2.11, which you should do now if you haven’t already done
it, you computed the eigenvectors of the Pauli matrices. Find the points on the
Bloch sphere which correspond to the normalized eigenvectors of the different
Pauli matrices.
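The Pauli matrices give rise to three useful classes of unitary matrices when they are
exponentiated, the rotation operators about the x̂, ŷ, and ẑ axes, defined by the equations:
\[
R_x(\theta) \equiv e^{-i\theta X/2} = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, X =
\begin{bmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2} \\ -i\sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix} \tag{4.4}
\]
\[
R_y(\theta) \equiv e^{-i\theta Y/2} = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, Y =
\begin{bmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{bmatrix} \tag{4.5}
\]
\[
R_z(\theta) \equiv e^{-i\theta Z/2} = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, Z =
\begin{bmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{bmatrix}. \tag{4.6}
\]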
Exercise 4.2: Let x be a real number and A a matrix such that $A^2 = I$. Show that
\[
\exp(iAx) = \cos(x)\, I + i \sin(x)\, A. \tag{4.7}
\]
Use this result to verify Equations (4.4) through (4.6).
Exercise 4.3: Show that, up to a global phase, the π/8 gate satisfies T = Rz (π/4).
Exercise 4.4: Express the Hadamard gate H as a product of $R_x$ and $R_z$ rotations and
$e^{i\varphi}$ for some ϕ.
If $\hat{n} = (n_x, n_y, n_z)$ is a real unit vector in three dimensions then we generalize the
previous definitions by defining a rotation by θ about the n̂ axis by the equation
\[
R_{\hat{n}}(\theta) \equiv \exp(-i\theta\, \hat{n} \cdot \vec{\sigma}/2) = \cos\frac{\theta}{2}\, I - i \sin\frac{\theta}{2}\, (n_x X + n_y Y + n_z Z), \tag{4.8}
\]
where $\vec{\sigma}$ denotes the three component vector (X, Y, Z) of Pauli matrices.
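The closed form in Equation (4.8) can be compared against a direct evaluation of the matrix exponential. The sketch below sums a truncated power series for exp(M); the axis and angle are arbitrary illustrative choices, and the helper names are not from the text.

```python
import math

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

def n_dot_sigma(n):
    """The matrix n_x X + n_y Y + n_z Z."""
    nx, ny, nz = n
    return [[nx * X[i][j] + ny * Y[i][j] + nz * Z[i][j] for j in range(2)]
            for i in range(2)]

def rotation(n, theta):
    """Closed form cos(theta/2) I - i sin(theta/2) n.sigma of Equation (4.8)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    ns = n_dot_sigma(n)
    return [[c * I2[i][j] - 1j * s * ns[i][j] for j in range(2)] for i in range(2)]

def exp_series(M, terms=40):
    """Truncated power series sum_k M^k / k! (adequate for small matrices)."""
    total = [[1, 0], [0, 1]]
    term = [[1, 0], [0, 1]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return total

n = (1 / 3, 2 / 3, 2 / 3)   # an arbitrary unit vector
theta = 0.7                 # an arbitrary angle
generator = [[-1j * theta / 2 * x for x in row] for row in n_dot_sigma(n)]

squares_to_identity = close(matmul(n_dot_sigma(n), n_dot_sigma(n)), I2)
matches_series = close(rotation(n, theta), exp_series(generator))
print(squares_to_identity, matches_series)
```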
Exercise 4.5: Prove that $(\hat{n} \cdot \vec{\sigma})^2 = I$, and use this to verify Equation (4.8).
Exercise 4.6: (Bloch sphere interpretation of rotations) One reason why the
Rn̂ (θ) operators are referred to as rotation operators is the following fact, which
you are to prove. Suppose a single qubit has a state represented by the Bloch
vector λ. Then the effect of the rotation Rn̂ (θ) on the state is to rotate it by an
angle θ about the n̂ axis of the Bloch sphere. This fact explains the rather
mysterious looking factor of two in the definition of the rotation matrices.
Exercise 4.7: Show that XY X = −Y and use this to prove that
XRy (θ)X = Ry (−θ).
Exercise 4.8: An arbitrary single qubit unitary operator can be written in the form
U = exp(iα)Rn̂ (θ)
(4.9)
for some real numbers α and θ, and a real three-dimensional unit vector n̂.
1. Prove this fact.
2. Find values for α, θ, and n̂ giving the Hadamard gate H.
3. Find values for α, θ, and n̂ giving the phase gate
\[
S = \begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}. \tag{4.10}
\]
An arbitrary unitary operator on a single qubit can be written in many ways as a
combination of rotations, together with global phase shifts on the qubit. The following
theorem provides a means of expressing an arbitrary single qubit rotation that will be
particularly useful in later applications to controlled operations.
Theorem 4.1: (Z-Y decomposition for a single qubit) Suppose U is a unitary
operation on a single qubit. Then there exist real numbers α, β, γ and δ such that
\[
U = e^{i\alpha} R_z(\beta) R_y(\gamma) R_z(\delta). \tag{4.11}
\]
Proof
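Since U is unitary, the rows and columns of U are orthonormal, from which it follows
that there exist real numbers α, β, γ, and δ such that
\[
U = \begin{bmatrix}
e^{i(\alpha - \beta/2 - \delta/2)} \cos\frac{\gamma}{2} & -e^{i(\alpha - \beta/2 + \delta/2)} \sin\frac{\gamma}{2} \\
e^{i(\alpha + \beta/2 - \delta/2)} \sin\frac{\gamma}{2} & e^{i(\alpha + \beta/2 + \delta/2)} \cos\frac{\gamma}{2}
\end{bmatrix}. \tag{4.12}
\]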
Equation (4.11) now follows immediately from the definition of the rotation matrices and
matrix multiplication.
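Theorem 4.1 asserts existence; the sketch below shows one way to recover suitable angles numerically from a given unitary and then rebuild it via Equation (4.11). The extraction recipe (take α from the determinant, then read β, γ, δ from the first column of the phase-stripped matrix) is an illustrative choice and is not the argument used in the text. The function names are hypothetical.

```python
import cmath
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, tol=1e-9):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def zy_angles(U):
    """Return (alpha, beta, gamma, delta) with U = e^{i alpha} Rz(beta) Ry(gamma) Rz(delta)."""
    det = U[0][0] * U[1][1] - U[0][1] * U[1][0]
    alpha = cmath.phase(det) / 2                 # det U = e^{2 i alpha}
    w00 = U[0][0] * cmath.exp(-1j * alpha)       # e^{-i(beta+delta)/2} cos(gamma/2)
    w10 = U[1][0] * cmath.exp(-1j * alpha)       # e^{+i(beta-delta)/2} sin(gamma/2)
    gamma = 2 * math.atan2(abs(w10), abs(w00))
    beta = cmath.phase(w10) - cmath.phase(w00)
    delta = -cmath.phase(w10) - cmath.phase(w00)
    return alpha, beta, gamma, delta

def rebuild(alpha, beta, gamma, delta):
    """Evaluate the right hand side of Equation (4.11)."""
    M = matmul(rz(beta), matmul(ry(gamma), rz(delta)))
    return [[cmath.exp(1j * alpha) * x for x in row] for row in M]

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
angles_for_h = zy_angles(H)
print(angles_for_h, close(rebuild(*angles_for_h), H))
```

The angles are not unique, so the sketch should be judged by whether `rebuild` reproduces the input rather than by the particular values returned.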
Exercise 4.9: Explain why any single qubit unitary operator may be written in the
form (4.12).
Exercise 4.10: (X-Y decomposition of rotations) Give a decomposition
analogous to Theorem 4.1 but using Rx instead of Rz .
Exercise 4.11: Suppose m̂ and n̂ are non-parallel real unit vectors in three
dimensions. Use Theorem 4.1 to show that an arbitrary single qubit unitary U
may be written
\[
U = e^{i\alpha} R_{\hat{n}}(\beta) R_{\hat{m}}(\gamma) R_{\hat{n}}(\delta), \tag{4.13}
\]
for appropriate choices of α, β, γ and δ.
The utility of Theorem 4.1 lies in the following mysterious looking corollary, which
is the key to the construction of controlled multi-qubit unitary operations, as explained
in the next section.
Corollary 4.2: Suppose U is a unitary gate on a single qubit. Then there exist unitary
operators A, B, C on a single qubit such that $ABC = I$ and $U = e^{i\alpha} AXBXC$,
where α is some overall phase factor.
Proof
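In the notation of Theorem 4.1, set $A \equiv R_z(\beta) R_y(\gamma/2)$, $B \equiv R_y(-\gamma/2) R_z(-(\delta + \beta)/2)$
and $C \equiv R_z((\delta - \beta)/2)$. Note that
\[
ABC = R_z(\beta)\, R_y\!\left(\frac{\gamma}{2}\right) R_y\!\left(-\frac{\gamma}{2}\right) R_z\!\left(-\frac{\delta + \beta}{2}\right) R_z\!\left(\frac{\delta - \beta}{2}\right) = I. \tag{4.14}
\]
Since $X^2 = I$, and using Exercise 4.7, we see that
\[
XBX = X R_y\!\left(-\frac{\gamma}{2}\right) X X R_z\!\left(-\frac{\delta + \beta}{2}\right) X = R_y\!\left(\frac{\gamma}{2}\right) R_z\!\left(\frac{\delta + \beta}{2}\right). \tag{4.15}
\]
Thus
\[
AXBXC = R_z(\beta)\, R_y\!\left(\frac{\gamma}{2}\right) R_y\!\left(\frac{\gamma}{2}\right) R_z\!\left(\frac{\delta + \beta}{2}\right) R_z\!\left(\frac{\delta - \beta}{2}\right) \tag{4.16}
\]
\[
= R_z(\beta) R_y(\gamma) R_z(\delta). \tag{4.17}
\]
Thus $U = e^{i\alpha} AXBXC$ and $ABC = I$, as required.
Exercise 4.12: Give A, B, C, and α for the Hadamard gate.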
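The construction in the proof of Corollary 4.2 can be checked numerically for any particular choice of angles. In the sketch below the values of α, β, γ and δ are arbitrary illustrative numbers; A, B and C are built exactly as in the proof.

```python
import cmath
import math
from functools import reduce

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chain(*matrices):
    """Matrix product of the arguments, in the order written."""
    return reduce(matmul, matrices)

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

alpha, beta, gamma, delta = 0.3, 1.1, 0.4, -2.0   # arbitrary test angles

# A, B, C as defined in the proof of Corollary 4.2.
A = chain(rz(beta), ry(gamma / 2))
B = chain(ry(-gamma / 2), rz(-(delta + beta) / 2))
C = rz((delta - beta) / 2)

phase = cmath.exp(1j * alpha)
U = [[phase * x for x in row] for row in chain(rz(beta), ry(gamma), rz(delta))]
AXBXC = [[phase * x for x in row] for row in chain(A, X, B, X, C)]

abc_is_identity = close(chain(A, B, C), I2)
u_is_recovered = close(AXBXC, U)
print(abc_is_identity, u_is_recovered)
```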
Exercise 4.13: (Circuit identities) It is useful to be able to simplify circuits by
inspection, using well-known identities. Prove the following three identities:
HXH = Z; HY H = −Y ; HZH = X.
(4.18)
Exercise 4.14: Use the previous exercise to show that HT H = Rx (π/4), up to a
global phase.
Exercise 4.15: (Composition of single qubit operations) The Bloch
representation gives a nice way to visualize the effect of composing two rotations.
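(1) Prove that if a rotation through an angle $\beta_1$ about the axis $\hat{n}_1$ is followed by a
rotation through an angle $\beta_2$ about an axis $\hat{n}_2$, then the overall rotation is
through an angle $\beta_{12}$ about an axis $\hat{n}_{12}$ given by
\[
c_{12} = c_1 c_2 - s_1 s_2\, \hat{n}_1 \cdot \hat{n}_2 \tag{4.19}
\]
\[
s_{12}\, \hat{n}_{12} = s_1 c_2\, \hat{n}_1 + c_1 s_2\, \hat{n}_2 - s_1 s_2\, \hat{n}_2 \times \hat{n}_1, \tag{4.20}
\]
where $c_i = \cos(\beta_i/2)$, $s_i = \sin(\beta_i/2)$, $c_{12} = \cos(\beta_{12}/2)$, and $s_{12} = \sin(\beta_{12}/2)$.
(2) Show that if $\beta_1 = \beta_2$ and $\hat{n}_1 = \hat{z}$ these equations simplify to
\[
c_{12} = c^2 - s^2\, \hat{z} \cdot \hat{n}_2 \tag{4.21}
\]
\[
s_{12}\, \hat{n}_{12} = sc\, (\hat{z} + \hat{n}_2) - s^2\, \hat{n}_2 \times \hat{z}, \tag{4.22}
\]
where $c = c_1$ and $s = s_1$.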
Symbols for the common single qubit gates are shown in Figure 4.2. Recall the basic
properties of quantum circuits: time proceeds from left to right; wires represent qubits,
and a ‘/’ may be used to indicate a bundle of qubits.
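Hadamard (H): $\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$
Pauli-X (X): $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
Pauli-Y (Y): $\begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}$
Pauli-Z (Z): $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$
Phase (S): $\begin{bmatrix} 1 & 0 \\ 0 & i \end{bmatrix}$
π/8 (T): $\begin{bmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{bmatrix}$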
Figure 4.2. Names, symbols, and unitary matrices for the common single qubit gates.
4.3 Controlled operations
‘If A is true, then do B’. This type of controlled operation is one of the most useful in
computing, both classical and quantum. In this section we explain how complex controlled
operations may be implemented using quantum circuits built from elementary operations.
The prototypical controlled operation is the controlled-NOT, which we met in Section 1.2.1.
Recall that this gate, which we’ll often refer to as CNOT, is a quantum gate
with two input qubits, known as the control qubit and target qubit, respectively. It is
drawn as shown in Figure 4.3. In terms of the computational basis, the action of the
CNOT is given by $|c\rangle|t\rangle \rightarrow |c\rangle|t \oplus c\rangle$; that is, if the control qubit is set to $|1\rangle$ then the
target qubit is flipped, otherwise the target qubit is left alone. Thus, in the computational
basis $|\text{control}, \text{target}\rangle$ the matrix representation of CNOT is
\[
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{bmatrix}. \tag{4.23}
\]
Figure 4.3. Circuit representation for the controlled-NOT gate. The top line represents the control qubit, the
bottom line the target qubit.
More generally, suppose U is an arbitrary single qubit unitary operation. A controlled-U
operation is a two qubit operation, again with a control and a target qubit. If the control
qubit is set then U is applied to the target qubit, otherwise the target qubit is left alone;
that is, $|c\rangle|t\rangle \rightarrow |c\rangle U^c |t\rangle$. The controlled-U operation is represented by the circuit shown
in Figure 4.4.
Figure 4.4. Controlled-U operation. The top line is the control qubit, and the bottom line is the target qubit. If the
control qubit is set then U is applied to the target, otherwise it is left alone.
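In the basis ordering |control, target⟩, the rule $|c\rangle|t\rangle \rightarrow |c\rangle U^c|t\rangle$ gives a block-diagonal matrix with the identity in the upper-left block and U in the lower-right block. The sketch below builds that matrix for a given 2×2 U and recovers Equation (4.23) for U = X; the function name `controlled` is illustrative.

```python
def controlled(U):
    """4x4 matrix of controlled-U in the basis |control, target>."""
    M = [[0] * 4 for _ in range(4)]
    M[0][0] = M[1][1] = 1            # control = 0: target untouched
    for i in range(2):
        for j in range(2):
            M[2 + i][2 + j] = U[i][j]  # control = 1: apply U to the target
    return M

X = [[0, 1], [1, 0]]
CNOT = controlled(X)
for row in CNOT:
    print(row)

# Basis state |c, t> has index 2*c + t; CNOT should send it to |c, t XOR c>.
acts_as_xor = all(CNOT[2 * c + (t ^ c)][2 * c + t] == 1
                  for c in (0, 1) for t in (0, 1))
print(acts_as_xor)
```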
Exercise 4.16: (Matrix representation of multi-qubit gates) What is the 4×4
unitary matrix for the circuit
in the computational basis? What is the unitary matrix for the circuit
in the computational basis?
Exercise 4.17: (Building CNOT from controlled-Z gates) Construct a CNOT gate
from one controlled-Z gate, that is, the gate whose action in the computational
basis is specified by the unitary matrix
\[
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1
\end{bmatrix},
\]
and two Hadamard gates, specifying the control and target qubits.
Exercise 4.18: Show that
Exercise 4.19: (CNOT action on density matrices) The CNOT gate is a simple
permutation whose action on a density matrix ρ is to rearrange the elements in
the matrix. Write out this action explicitly in the computational basis.
Exercise 4.20: (CNOT basis transformations) Unlike ideal classical gates, ideal
quantum gates do not have (as electrical engineers say) ‘high-impedance’ inputs.
In fact, the role of ‘control’ and ‘target’ are arbitrary – they depend on what basis
you think of a device as operating in. We have described how the CNOT behaves
with respect to the computational basis, and in this description the state of the
control qubit is not changed. However, if we work in a different basis then the
control qubit does change: we will show that its phase is flipped depending on
the state of the ‘target’ qubit! Show that
Introducing basis states $|\pm\rangle \equiv (|0\rangle \pm |1\rangle)/\sqrt{2}$, use this circuit identity to show
that the effect of a CNOT with the first qubit as control and the second qubit as
target is as follows:
\[
|+\rangle|+\rangle \rightarrow |+\rangle|+\rangle \tag{4.24}
\]
\[
|-\rangle|+\rangle \rightarrow |-\rangle|+\rangle \tag{4.25}
\]
\[
|+\rangle|-\rangle \rightarrow |-\rangle|-\rangle \tag{4.26}
\]
\[
|-\rangle|-\rangle \rightarrow |+\rangle|-\rangle. \tag{4.27}
\]
Thus, with respect to this new basis, the state of the target qubit is not changed,
while the state of the control qubit is flipped if the target starts as $|-\rangle$, otherwise
it is left alone. That is, in this basis, the target and control have essentially
interchanged roles!
Our immediate goal is to understand how to implement the controlled-U operation
for arbitrary single qubit U, using only single qubit operations and the CNOT gate. Our
strategy is a two-part procedure based upon the decomposition $U = e^{i\alpha} AXBXC$ given
in Corollary 4.2 on page 176.
Our first step will be to apply the phase shift exp(iα) on the target qubit, controlled
by the control qubit. That is, if the control qubit is $|0\rangle$, then the target qubit is left alone,
while if the control qubit is $|1\rangle$, a phase shift exp(iα) is applied to the target. A circuit
implementing this operation using just a single qubit unitary gate is depicted on the right
hand side of Figure 4.5. To verify that this circuit works correctly, note that the effect
of the circuit on the right hand side is
\[
|00\rangle \rightarrow |00\rangle, \quad |01\rangle \rightarrow |01\rangle, \quad
|10\rangle \rightarrow e^{i\alpha}|10\rangle, \quad |11\rangle \rightarrow e^{i\alpha}|11\rangle, \tag{4.28}
\]
which is exactly what is required for the controlled operation on the left hand side.
Figure 4.5. Controlled phase shift gate and an equivalent circuit for two qubits.
We may now complete the construction of the controlled-U operation, as shown in
Figure 4.6. To understand why this circuit works, recall from Corollary 4.2 that U
may be written in the form $U = e^{i\alpha} AXBXC$, where A, B and C are single qubit
operations such that ABC = I. Suppose that the control qubit is set. Then the operation
$e^{i\alpha} AXBXC = U$ is applied to the second qubit. If, on the other hand, the control qubit
is not set, then the operation ABC = I is applied to the second qubit; that is, no change
is made. That is, this circuit implements the controlled-U operation.
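The two-part construction can be simulated as 4×4 matrices. The sketch below assumes the gate ordering described by the argument above: C, CNOT, B, CNOT, A on the target in time order, together with the phase gate diag(1, e^{iα}) on the control. The figure itself is not reproduced here, so that ordering is an assumption, as are the angle values and helper names.

```python
import cmath
import math
from functools import reduce

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chain(*matrices):
    return reduce(matmul, matrices)

def kron(A, B):
    """Tensor product; the first factor is the more significant qubit."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

def rz(t):
    return [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]

def ry(t):
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c, -s], [s, c]]

def controlled(U):
    M = [[0] * 4 for _ in range(4)]
    M[0][0] = M[1][1] = 1
    for i in range(2):
        for j in range(2):
            M[2 + i][2 + j] = U[i][j]
    return M

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
CNOT = controlled(X)

alpha, beta, gamma, delta = 0.3, 1.1, 0.4, -2.0   # arbitrary test angles
A = chain(rz(beta), ry(gamma / 2))
B = chain(ry(-gamma / 2), rz(-(delta + beta) / 2))
C = rz((delta - beta) / 2)
P = [[1, 0], [0, cmath.exp(1j * alpha)]]          # controlled phase, Figure 4.5

U = [[cmath.exp(1j * alpha) * x for x in row]
     for row in chain(rz(beta), ry(gamma), rz(delta))]

# Operators act right to left: C first, then CNOT, B, CNOT, and finally A with P.
circuit = chain(kron(P, A), CNOT, kron(I2, B), CNOT, kron(I2, C))
implements_controlled_u = close(circuit, controlled(U))
print(implements_controlled_u)
```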
Figure 4.6. Circuit implementing the controlled-U operation for single qubit U. α, A, B and C satisfy
U = exp(iα)AXBXC, ABC = I.
Now that we know how to condition on a single qubit being set, what about conditioning on multiple qubits? We’ve already met one example of multiple qubit conditioning,
the Toffoli gate, which flips the third qubit, the target qubit, conditioned on the first
two qubits, the control qubits, being set to one. More generally, suppose we have n + k
qubits, and U is a k qubit unitary operator. Then we define the controlled operation
$C^n(U)$ by the equation
\[
C^n(U)\, |x_1 x_2 \ldots x_n\rangle |\psi\rangle = |x_1 x_2 \ldots x_n\rangle\, U^{x_1 x_2 \ldots x_n} |\psi\rangle, \tag{4.29}
\]
where $x_1 x_2 \ldots x_n$ in the exponent of U means the product of the bits $x_1, x_2, \ldots, x_n$.
That is, the operator U is applied to the last k qubits if the first n qubits are all equal
to one, and otherwise, nothing is done. Such conditional operations are so useful that we
introduce a special circuit notation for them, illustrated in Figure 4.7. For the following
we assume that k = 1, for simplicity. Larger k can be dealt with using essentially the
same methods; however, for k ≥ 2 there is the added complication that we don’t (yet)
know how to perform arbitrary operations on k qubits.
Figure 4.7. Sample circuit representation for the $C^n(U)$ operation, where U is a unitary operator on k qubits, for
n = 4 and k = 3.
Suppose U is a single qubit unitary operator, and V is a unitary operator chosen so
that $V^2 = U$. Then the operation $C^2(U)$ may be implemented using the circuit shown
in Figure 4.8.
Exercise 4.21: Verify that Figure 4.8 implements the $C^2(U)$ operation.
Exercise 4.22: Prove that a $C^2(U)$ gate (for any single qubit unitary U) can be
constructed using at most eight one-qubit gates, and six controlled-NOTs.
Exercise 4.23: Construct a $C^1(U)$ gate for $U = R_x(\theta)$ and $U = R_y(\theta)$, using only
CNOT and single qubit gates. Can you reduce the number of single qubit gates
needed in the construction from three to two?
Figure 4.8. Circuit for the $C^2(U)$ gate. V is any unitary operator satisfying $V^2 = U$. The special case
V ≡ (1 − i)(I + iX)/2 corresponds to the Toffoli gate.
The familiar Toffoli gate is an especially important special case of the $C^2(U)$ operation,
the case $C^2(X)$. Defining V ≡ (1 − i)(I + iX)/2 and noting that $V^2 = X$, we see that
Figure 4.8 gives an implementation of the Toffoli gate in terms of one and two qubit
operations. From a classical viewpoint this is a remarkable result; recall from Problem 3.5
that one and two bit classical reversible gates are not sufficient to implement the Toffoli
gate, or, more generally, universal computation. By contrast, in the quantum case we see
that one and two qubit reversible gates are sufficient to implement the Toffoli gate, and
will eventually prove that they suffice for universal computation.
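The claim $V^2 = X$ and the logic of the $C^2(U)$ construction can be checked at the level of 2×2 matrices. The sketch below assumes the usual gate sequence for this kind of circuit: controlled-V from the second control, a CNOT between the controls, controlled-V† from the second control, another CNOT, and controlled-V from the first control. Figure 4.8 is not reproduced here, so that sequence is an assumption. For computational-basis control values (c1, c2), the target then sees V^{c2}, then (V†)^{c1⊕c2}, then V^{c1}.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(A, B, tol=1e-12):
    n = len(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(n) for j in range(n))

def dagger(M):
    """Conjugate transpose."""
    n = len(M)
    return [[complex(M[j][i]).conjugate() for j in range(n)] for i in range(n)]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]

# V = (1 - i)(I + iX)/2, the square root of X used for the Toffoli gate.
V = [[(1 - 1j) * (I2[i][j] + 1j * X[i][j]) / 2 for j in range(2)] for i in range(2)]

def gate_if(M, bit):
    """M when the controlling bit is 1, otherwise the identity."""
    return M if bit else I2

def net_target_operator(c1, c2):
    """Operator applied to the target for control bits (c1, c2)."""
    ops_in_time_order = [gate_if(V, c2), gate_if(dagger(V), c1 ^ c2), gate_if(V, c1)]
    result = I2
    for op in ops_in_time_order:
        result = matmul(op, result)   # later gates multiply on the left
    return result

v_squared_is_x = close(matmul(V, V), X)
acts_as_toffoli = all(
    close(net_target_operator(c1, c2), X if (c1 and c2) else I2)
    for c1 in (0, 1) for c2 in (0, 1)
)
print(v_squared_is_x, acts_as_toffoli)
```

Because the circuit acts linearly, agreement on all four computational-basis control settings is enough to fix its action on arbitrary control states.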
Ultimately we will show that any unitary operation can be composed to an arbitrarily
good approximation from just the Hadamard, phase, controlled-NOT and π/8 gates.
Because of the great usefulness of the Toffoli gate it is interesting to see how it can be
built from just this gate set. Figure 4.9 illustrates a simple circuit for the Toffoli gate
made up of just Hadamard, phase, controlled-NOT and π/8 gates.
Figure 4.9. Implementation of the Toffoli gate using Hadamard, phase, controlled-NOT and π/8 gates.
Exercise 4.24: Verify that Figure 4.9 implements the Toffoli gate.
Exercise 4.25: (Fredkin gate construction) Recall that the Fredkin
(controlled-swap) gate performs the transform
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}. \tag{4.30}
\]
(1) Give a quantum circuit which uses three Toffoli gates to construct the
Fredkin gate (Hint: think of the swap gate construction – you can control
each gate, one at a time).
(2) Show that the first and last Toffoli gates can be replaced by CNOT gates.
(3) Now replace the middle Toffoli gate with the circuit in Figure 4.8 to obtain
a Fredkin gate construction using only six two-qubit gates.
(4) Can you come up with an even simpler construction, with only five
two-qubit gates?
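Exercise 4.26: Show that the circuit:
[Circuit: two control qubits and a target qubit; on the target, the gate sequence Ry(π/…), ⊕, Ry(π/…), ⊕, Ry(−π/…), ⊕, Ry(−π/…), where each ⊕ is the target of a controlled-NOT.]
differs from a Toffoli gate only by relative phases. That is, the circuit takes
$|c_1, c_2, t\rangle$ to $e^{i\theta(c_1, c_2, t)} |c_1, c_2, t \oplus c_1 \cdot c_2\rangle$, where $e^{i\theta(c_1, c_2, t)}$ is some relative phase
factor. Such gates can sometimes be useful in experimental implementations,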
where it may be much easier to implement a gate that is the same as the Toffoli
up to relative phases than it is to do the Toffoli directly.
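Exercise 4.27: Using just CNOTs and Toffoli gates, construct a quantum circuit to
perform the transformation
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}. \tag{4.31}
\]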
This kind of partial cyclic permutation operation will be useful later, in
Chapter 7.
How may we implement $C^n(U)$ gates using our existing repertoire of gates, where U
is an arbitrary single qubit unitary operation? A particularly simple circuit for achieving
this task is illustrated in Figure 4.10. The circuit divides up into three stages, and makes
use of a small number (n − 1) of working qubits, which all start and end in the state
$|0\rangle$. Suppose the control qubits are in the computational basis state $|c_1, c_2, \ldots, c_n\rangle$. The
first stage of the circuit is to reversibly AND all the control bits $c_1, \ldots, c_n$ together to
produce the product $c_1 \cdot c_2 \ldots c_n$. To do this, the first gate in the circuit ANDs $c_1$ and
$c_2$ together, using a Toffoli gate, changing the state of the first work qubit to $|c_1 \cdot c_2\rangle$.
The next Toffoli gate ANDs $c_3$ with the product $c_1 \cdot c_2$, changing the state of the second
work qubit to $|c_1 \cdot c_2 \cdot c_3\rangle$. We continue applying Toffoli gates in this fashion, until the
final work qubit is in the state $|c_1 \cdot c_2 \ldots c_n\rangle$. Next, a U operation on the target qubit is
performed, conditional on the final work qubit being set to one. That is, U is applied if
and only if all of $c_1$ through $c_n$ are set. Finally, the last part of the circuit just reverses
the steps of the first stage, returning all the work qubits to their initial state, $|0\rangle$. The
combined result, therefore, is to apply the unitary operator U to the target qubit, if and
only if all the control bits $c_1$ through $c_n$ are set, as desired.
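For control qubits in a computational basis state, the three stages described above reduce to classical reversible bit operations, which makes them easy to trace. The sketch below tracks only the control and work bits and reports whether U would be applied to the target; the function name and bit layout are illustrative choices.

```python
from itertools import product

def cn_u_with_work_qubits(controls):
    """Trace the three-stage circuit for basis-state control bits.

    Returns (apply_u, work_bits_at_end). Bits 0..n-1 are the controls and
    bits n..2n-2 are the n-1 work qubits, all starting in 0. Requires n >= 2.
    """
    n = len(controls)
    bits = list(controls) + [0] * (n - 1)

    # Stage 1: Toffoli cascade. The first gate ANDs c1 and c2 into work bit 0;
    # each later gate ANDs the next control with the previous work bit.
    gates = [(0, 1, n)] + [(k + 1, n + k - 1, n + k) for k in range(1, n - 1)]
    for a, b, t in gates:
        bits[t] ^= bits[a] & bits[b]

    # Stage 2: U acts on the target only if the final work bit is set.
    apply_u = bits[-1] == 1

    # Stage 3: undo stage 1 in reverse order to reset the work bits.
    for a, b, t in reversed(gates):
        bits[t] ^= bits[a] & bits[b]

    return apply_u, bits[n:]

results = {c: cn_u_with_work_qubits(c) for c in product((0, 1), repeat=5)}
print(sum(apply for apply, _ in results.values()))   # number of inputs that trigger U
```

By linearity, correct behavior on every computational basis state of the controls extends to superpositions, which is why the work qubits must be returned exactly to zero.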
Figure 4.10. Network implementing the $C^n(U)$ operation, for the case n = 5.
Exercise 4.28: For $U = V^2$ with V unitary, construct a $C^5(U)$ gate analogous to that
in Figure 4.10, but using no work qubits. You may use controlled-V and
controlled-V† gates.
Exercise 4.29: Find a circuit containing $O(n^2)$ Toffoli, CNOT and single qubit gates
which implements a $C^n(X)$ gate (for n > 3), using no work qubits.
Exercise 4.30: Suppose U is a single qubit unitary operation. Find a circuit
containing $O(n^2)$ Toffoli, CNOT and single qubit gates which implements a
$C^n(U)$ gate (for n > 3), using no work qubits.
In the controlled gates we have been considering, conditional dynamics on the target
qubit occurs if the control bits are set to one. Of course, there is nothing special about
one, and it is often useful to consider dynamics which occur conditional on the control
bit being set to zero. For instance, suppose we wish to implement a two qubit gate in
which the second (‘target’) qubit is flipped, conditional on the first (‘control’) qubit being
set to zero. In Figure 4.11 we introduce a circuit notation for this gate, together with an
equivalent circuit in terms of the gates we have already introduced. Generically we shall
use the open circle notation to indicate conditioning on the qubit being set to zero, while
a closed circle indicates conditioning on the qubit being set to one.
Figure 4.11. Controlled operation with a NOT gate being performed on the second qubit, conditional on the first
qubit being set to zero.
A more elaborate example of this convention, involving three control qubits, is illustrated in Figure 4.12. The operation U is applied to the target qubit if the first and third
qubits are set to zero, and the second qubit is set to one. It is easy to verify by inspection
that the circuit on the right hand side of the figure implements the desired operation.
More generally, it is easy to move between circuits which condition on qubits being set
to one and circuits which condition on qubits being set to zero, by insertion of X gates
in appropriate locations, as illustrated in Figure 4.12.
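The X-gate trick can be confirmed for the two qubit case: surrounding the control of a CNOT with X gates yields a gate that flips the target when the control is zero. The sketch below multiplies the corresponding 4×4 matrices in the basis |control, target⟩; the helper names are illustrative.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(A, B):
    """Tensor product; the first factor acts on the control qubit."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

# X on the control, then CNOT, then X on the control again.
flip_control = kron(X, I2)
zero_controlled_not = matmul(flip_control, matmul(CNOT, flip_control))
for row in zero_controlled_not:
    print(row)
```

The resulting matrix swaps |00⟩ with |01⟩ and leaves |10⟩ and |11⟩ alone, which is the open-circle gate described above.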
Another convention which is sometimes useful is to allow controlled-NOT gates to have
multiple targets, as shown in Figure 4.13. This natural notation means that when the
control qubit is 1, then all the qubits marked with a ⊕ are flipped, and otherwise nothing
happens. It is convenient to use, for example, in constructing classical functions such as
permutations, or in encoders and decoders for quantum error-correction circuits, as we
shall see in Chapter 10.
Exercise 4.31: (More circuit identities) Let subscripts denote which qubit an
operator acts on, and let C be a CNOT with qubit 1 the control qubit and qubit 2
the target qubit. Prove the following identities:
\[ C X_1 C = X_1 X_2 \tag{4.32} \]
\[ C Y_1 C = Y_1 X_2 \tag{4.33} \]
\[ C Z_1 C = Z_1 \tag{4.34} \]
\[ C X_2 C = X_2 \tag{4.35} \]
\[ C Y_2 C = Z_1 Y_2 \tag{4.36} \]
\[ C Z_2 C = Z_1 Z_2 \tag{4.37} \]
\[ R_{z,1}(\theta) C = C R_{z,1}(\theta) \tag{4.38} \]
\[ R_{x,2}(\theta) C = C R_{x,2}(\theta) . \tag{4.39} \]
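A numerical sanity check is no substitute for the proofs requested in Exercise 4.31, but it can catch sign and ordering mistakes early. The sketch below, which assumes qubit 1 is the most significant bit and uses hypothetical helper names (`matmul`, `kron`, `conj_by_cnot`, `close`), conjugates each Pauli operator by the CNOT and compares the result with the right hand sides of (4.32) to (4.37).

```python
# Pauli matrices and CNOT as nested lists; qubit 1 is the most significant bit.
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Tensor product a (x) b, with a acting on qubit 1 and b on qubit 2."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def conj_by_cnot(op):
    """Return C op C for a 4x4 operator op."""
    return matmul(CNOT, matmul(op, CNOT))

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a[0])))

# Each entry: (operator to conjugate, claimed result).
identities = {
    "4.32": (kron(X, I2), kron(X, X)),
    "4.33": (kron(Y, I2), kron(Y, X)),
    "4.34": (kron(Z, I2), kron(Z, I2)),
    "4.35": (kron(I2, X), kron(I2, X)),
    "4.36": (kron(I2, Y), kron(Z, Y)),
    "4.37": (kron(I2, Z), kron(Z, Z)),
}
results = {label: close(conj_by_cnot(lhs), rhs)
           for label, (lhs, rhs) in identities.items()}
```

The dictionary `results` maps each equation number to `True` when the identity holds numerically; the rotation identities (4.38) and (4.39) can be checked the same way for sample values of θ.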
4.4 Measurement
A final element used in quantum circuits, almost implicitly sometimes, is measurement.
In our circuits, we shall denote a projective measurement in the computational basis
(Section 2.2.5) using a ‘meter’ symbol, illustrated in Figure 4.14. In the theory of quantum circuits it is conventional to not use any special symbols to denote more general
measurements, because, as explained in Chapter 2, they can always be represented by
unitary transforms with ancilla qubits followed by projective measurements.
There are two important principles that it is worth bearing in mind about quantum circuits. Both principles are rather obvious; however, they are of such great utility that they
are worth emphasizing early. The first principle is that classically conditioned operations
can be replaced by quantum conditioned operations:
Figure 4.12. Controlled-U operation and its equivalent in terms of circuit elements we already know how to
implement. The fourth qubit has U applied if the first and third qubits are set to zero, and the second qubit is set
to one.
Figure 4.13. Controlled-NOT gate with multiple targets.
Principle of deferred measurement: Measurements can always be moved from
an intermediate stage of a quantum circuit to the end of the circuit; if the
measurement results are used at any stage of the circuit then the classically
controlled operations can be replaced by conditional quantum operations.
Often, quantum measurements are performed as an intermediate step in a quantum
circuit, and the measurement results are used to conditionally control subsequent quantum gates. This is the case, for example, in the teleportation circuit of Figure 1.13 on
page 27. However, such measurements can always be moved to the end of the circuit.
Figure 4.15 illustrates how this may be done by replacing all the classical conditional
operations by corresponding quantum conditional operations. (Of course, some of the
interpretation of this circuit as performing ‘teleportation’ is lost, because no classical information is transmitted from Alice to Bob, but it is clear that the overall action of the
two quantum circuits is the same, which is the key point.)
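The deferred-measurement version of teleportation can be checked with a small state-vector simulation. The sketch below is an illustration under stated assumptions: qubit 0 holds the state to be teleported, qubits 1 and 2 share the Bell state (|00⟩ + |11⟩)/√2, and the classically controlled X and Z corrections are replaced by quantum-controlled gates. The helper names `apply_gate` and `teleport_deferred` are hypothetical.

```python
import math

def apply_gate(state, gate, target, n, control=None):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state vector.

    Qubit 0 is the most significant bit. If `control` is given, the gate
    acts only on basis states whose control bit is 1.
    """
    new = [0j] * len(state)
    t_shift = n - 1 - target
    for index, amp in enumerate(state):
        if control is not None and not (index >> (n - 1 - control)) & 1:
            new[index] += amp          # control bit is 0: leave untouched
            continue
        bit = (index >> t_shift) & 1
        for out in (0, 1):
            dest = (index & ~(1 << t_shift)) | (out << t_shift)
            new[dest] += gate[out][bit] * amp
    return new

R = 1 / math.sqrt(2)
H = [[R, R], [R, -R]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

def teleport_deferred(psi):
    """Teleportation circuit with both measurements moved to the end."""
    state = [0j] * 8
    for q0 in (0, 1):                  # |psi> (x) (|00> + |11>)/sqrt(2)
        state[4 * q0 + 0b00] = psi[q0] * R
        state[4 * q0 + 0b11] = psi[q0] * R
    state = apply_gate(state, X, 1, 3, control=0)   # Alice's controlled-NOT
    state = apply_gate(state, H, 0, 3)              # Alice's Hadamard
    state = apply_gate(state, X, 2, 3, control=1)   # quantum-controlled X
    state = apply_gate(state, Z, 2, 3, control=0)   # quantum-controlled Z
    return state

psi = [0.6, 0.8j]                      # hypothetical input state
final = teleport_deferred(psi)
```

For every value of Alice's two qubits the amplitudes on Bob's qubit are proportional to `psi`, so measuring Alice's qubits at the end leaves Bob holding the input state whatever the outcomes are.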
The second principle is even more obvious – and surprisingly useful!
Figure 4.14. Symbol for projective measurement on a single qubit. In this circuit nothing further is done with the
measurement result, but in more general quantum circuits it is possible to change later parts of the quantum
circuit, conditional on measurement outcomes in earlier parts of the circuit. Such a usage of classical information is
depicted using wires drawn with double lines (not shown here).
Figure 4.15. Quantum teleportation circuit in which measurements are done at the end, instead of in the middle of
the circuit. As in Figure 1.13, the top two qubits belong to Alice, and the bottom one to Bob.
Principle of implicit measurement: Without loss of generality, any
unterminated quantum wires (qubits which are not measured) at the end of a
quantum circuit may be assumed to be measured.
To understand why this is true, imagine you have a quantum circuit containing just
two qubits, and only the first qubit is measured at the end of the circuit. Then the
measurement statistics observed at this time are completely determined by the reduced
density matrix of the first qubit. However, if a measurement had also been performed on
the second qubit, then it would be highly surprising if that measurement could change
the statistics of measurement on the first qubit. You’ll prove this in Exercise 4.32 by
showing that the reduced density matrix of the first qubit is not affected by performing
a measurement on the second.
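The claim can be illustrated numerically before it is proved. The sketch below builds the density matrix of a two qubit state with hypothetical amplitudes, applies an unread computational-basis measurement to the second qubit in the form ρ → P0 ρ P0 + P1 ρ P1, and compares the reduced density matrices of the first qubit before and after. The basis index is assumed to be 2·q1 + q2, and all function names are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]

# Projectors I (x) |0><0| and I (x) |1><1| acting on the second qubit.
P0 = [[1 if i == j and i % 2 == 0 else 0 for j in range(4)] for i in range(4)]
P1 = [[1 if i == j and i % 2 == 1 else 0 for j in range(4)] for i in range(4)]

def measure_second_qubit(rho):
    """State assigned by an observer who does not learn the outcome."""
    return add(matmul(P0, matmul(rho, P0)), matmul(P1, matmul(rho, P1)))

def partial_trace_2(rho):
    """Trace out the second qubit of a 4x4 density matrix."""
    return [[sum(rho[2 * a + k][2 * b + k] for k in (0, 1))
             for b in (0, 1)] for a in (0, 1)]

# Hypothetical normalized amplitudes for |00>, |01>, |10>, |11>.
amps = [0.5, 0.5j, -0.5, 0.5]
rho = [[amps[i] * complex(amps[j]).conjugate() for j in range(4)]
       for i in range(4)]

rho_after = measure_second_qubit(rho)
reduced_before = partial_trace_2(rho)
reduced_after = partial_trace_2(rho_after)
```

The full density matrix changes (its coherences between different values of the second qubit vanish), yet `reduced_before` and `reduced_after` agree, which is the content of the principle of implicit measurement.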
As you consider the role of measurements in quantum circuits, it is important to
keep in mind that in its role as an interface between the quantum and classical worlds,
measurement is generally considered to be an irreversible operation, destroying quantum
information and replacing it with classical information. In certain carefully designed cases,
however, this need not be true, as is vividly illustrated by teleportation and quantum
error-correction (Chapter 10). What teleportation and quantum error-correction have in
common is that in neither instance does the measurement result reveal any information
about the identity of the quantum state being measured. Indeed, we will see in Chapter 10
that this is a more general feature of measurement – in order for a measurement to be
reversible, it must reveal no information about the quantum state being measured!
Exercise 4.32: Suppose ρ is the density matrix describing a two qubit system.
Suppose we perform a projective measurement in the computational basis of the
second qubit. Let P0 = |0⟩⟨0| and P1 = |1⟩⟨1| be the projectors onto the |0⟩ and
|1⟩ states of the second qubit, respectively. Let ρ′ be the density matrix which
would be assigned to the system after the measurement by an observer who did
not learn the measurement result. Show that
\[ \rho' = P_0 \rho P_0 + P_1 \rho P_1 . \tag{4.40} \]
Also show that the reduced density matrix for the first qubit is not affected by
the measurement, that is, tr2(ρ) = tr2(ρ′).
Exercise 4.33: (Measurement in the Bell basis) The measurement model we have
specified for the quantum circuit model is that measurements are performed only
in the computational basis. However, often we want to perform a measurement
in some other basis, defined by a complete set of orthonormal states. To perform
this measurement, simply unitarily transform from the basis we wish to perform
the measurement in to the computational basis, then measure. For example,
show that the circuit
[Circuit diagram not recoverable from the source: a two qubit circuit in which both qubits end in a computational-basis measurement.]
performs a measurement in the basis of the Bell states. More precisely, show that
this circuit results in a measurement being performed with corresponding
POVM elements the four projectors onto the Bell states. What are the
corresponding measurement operators?
Exercise 4.34: (Measuring an operator) Suppose we have a single qubit operator
U with eigenvalues ±1, so that U is both Hermitian and unitary, so it can be
regarded both as an observable and a quantum gate. Suppose we wish to measure
the observable U . That is, we desire to obtain a measurement result indicating
one of the two eigenvalues, and leaving a post-measurement state which is the
corresponding eigenvector. How can this be implemented by a quantum circuit?
Show that the following circuit implements a measurement of U :
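[Circuit: an ancilla qubit prepared in |0⟩ passes through H, acts as the control (•) of a controlled-U, passes through a second H, and is measured; the target qubit enters as |ψin⟩, has U applied, and leaves as |ψout⟩.]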
Exercise 4.35: (Measurement commutes with controls) A consequence of the
principle of deferred measurement is that measurements commute with quantum
gates when the qubit being measured is a control qubit, that is:
[Circuit identity: (controlled-U followed by measurement of the control qubit) = (measurement of the control qubit followed by controlled-U) = (measurement of the first qubit, with the classical result controlling U on the second qubit).]
(Recall that the double lines represent classical bits in this diagram.) Prove the
first equality. The rightmost circuit is simply a convenient notation to depict the
use of a measurement result to classically control a quantum gate.
4.5 Universal quantum gates
A small set of gates (e.g. AND, OR, NOT) can be used to compute an arbitrary classical
function, as we saw in Section 3.1.2. We say that such a set of gates is universal for classical computation. In fact, since the Toffoli gate is universal for classical computation,
quantum circuits subsume classical circuits. A similar universality result is true for quantum computation, where a set of gates is said to be universal for quantum computation
if any unitary operation may be approximated to arbitrary accuracy by a quantum circuit
involving only those gates. We now describe three universality constructions for quantum
computation. These constructions build upon each other, and culminate in a proof that
any unitary operation can be approximated to arbitrary accuracy using Hadamard, phase,
CNOT, and π/8 gates. You may wonder why the phase gate appears in this list, since it
can be constructed from two π/8 gates; it is included because of its natural role in the
fault-tolerant constructions described in Chapter 10.
The first construction shows that an arbitrary unitary operator may be expressed exactly as a product of unitary operators that each acts non-trivially only on a subspace
spanned by two computational basis states. The second construction combines the first
construction with the results of the previous section to show that an arbitrary unitary
operator may be expressed exactly using single qubit and CNOT gates. The third construction combines the second construction with a proof that any single qubit operation may
be approximated to arbitrary accuracy using the Hadamard, phase, and π/8 gates. This in
turn implies that any unitary operation can be approximated to arbitrary accuracy using
Hadamard, phase, CNOT, and π/8 gates.
Our constructions say little about efficiency – how many (polynomially or exponentially many) gates must be composed in order to create a given unitary transform. In
Section 4.5.4 we show that there exist unitary transforms which require exponentially
many gates to approximate. Of course, the goal of quantum computation is to find interesting families of unitary transformations that can be performed efficiently.
Exercise 4.36: Construct a quantum circuit to add two two-bit numbers x and y
modulo 4. That is, the circuit should perform the transformation
|x, y⟩ → |x, x + y mod 4⟩.
4.5.1 Two-level unitary gates are universal
Consider a unitary matrix U which acts on a d-dimensional Hilbert space. In this section
we explain how U may be decomposed into a product of two-level unitary matrices;
that is, unitary matrices which act non-trivially only on two-or-fewer vector components.
The essential idea behind this decomposition may be understood by considering the case
when U is 3×3, so suppose that U has the form
\[ U = \begin{bmatrix} a & d & g \\ b & e & h \\ c & f & j \end{bmatrix} . \tag{4.41} \]
We will find two-level unitary matrices U1, U2, U3 such that
\[ U_3 U_2 U_1 U = I . \tag{4.42} \]
It follows that
\[ U = U_1^\dagger U_2^\dagger U_3^\dagger . \tag{4.43} \]
U1 , U2 and U3 are all two-level unitary matrices, and it is easy to see that their inverses,
U1† , U2† and U3† are also two-level unitary matrices. Thus, if we can demonstrate (4.42),
then we will have shown how to break U up into a product of two-level unitary matrices.
Use the following procedure to construct U1: if b = 0 then set
\[ U_1 \equiv \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} . \tag{4.44} \]
If b ≠ 0 then set
\[ U_1 \equiv \begin{bmatrix} \frac{a^*}{\sqrt{|a|^2+|b|^2}} & \frac{b^*}{\sqrt{|a|^2+|b|^2}} & 0 \\ \frac{b}{\sqrt{|a|^2+|b|^2}} & \frac{-a}{\sqrt{|a|^2+|b|^2}} & 0 \\ 0 & 0 & 1 \end{bmatrix} . \tag{4.45} \]
Note that in either case U1 is a two-level unitary matrix, and when we multiply the
matrices out we get
\[ U_1 U = \begin{bmatrix} a' & d' & g' \\ 0 & e' & h' \\ c' & f' & j' \end{bmatrix} . \tag{4.46} \]
The key point to note is that the middle entry in the left hand column is zero. We denote
the other entries in the matrix with a generic prime ′ ; their actual values do not matter.
Now apply a similar procedure to find a two-level matrix U2 such that U2 U1 U has a zero
entry in the bottom left corner. That is, if c′ = 0 we set
\[ U_2 \equiv \begin{bmatrix} {a'}^{*} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} , \tag{4.47} \]
while if c′ ≠ 0 then we set
\[ U_2 \equiv \begin{bmatrix} \frac{{a'}^{*}}{\sqrt{|a'|^2+|c'|^2}} & 0 & \frac{{c'}^{*}}{\sqrt{|a'|^2+|c'|^2}} \\ 0 & 1 & 0 \\ \frac{c'}{\sqrt{|a'|^2+|c'|^2}} & 0 & \frac{-a'}{\sqrt{|a'|^2+|c'|^2}} \end{bmatrix} . \tag{4.48} \]
In either case, when we carry out the matrix multiplication we find that
\[ U_2 U_1 U = \begin{bmatrix} 1 & d'' & g'' \\ 0 & e'' & h'' \\ 0 & f'' & j'' \end{bmatrix} . \tag{4.49} \]
Since U, U1 and U2 are unitary, it follows that U2 U1 U is unitary, and thus d″ = g″ = 0,
since the first row of U2 U1 U must have norm 1. Finally, set
\[ U_3 \equiv \begin{bmatrix} 1 & 0 & 0 \\ 0 & {e''}^{*} & {f''}^{*} \\ 0 & {h''}^{*} & {j''}^{*} \end{bmatrix} . \tag{4.50} \]
It is now easy to verify that U3 U2 U1 U = I, and thus U = U1† U2† U3†, which is a decomposition of U into two-level unitaries.
More generally, suppose U acts on a d-dimensional space. Then, in a similar fashion
to the 3×3 case, we can find two-level unitary matrices U1, . . . , Ud−1 such that the matrix
Ud−1 Ud−2 . . . U1 U has a one in the top left hand corner, and all zeroes elsewhere in the
first row and column. We then repeat this procedure for the d − 1 by d − 1 unitary
submatrix in the lower right hand corner of Ud−1 Ud−2 . . . U1 U, and so on, with the end
result that an arbitrary d×d unitary matrix may be written
\[ U = V_1 \cdots V_k , \tag{4.51} \]
where the matrices Vi are two-level unitary matrices, and k ≤ (d − 1) + (d − 2) + · · · + 1 =
d(d − 1)/2.
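The procedure above translates fairly directly into code. The following is a minimal sketch rather than a definitive implementation: it assumes d ≥ 2, treats entries smaller than a tolerance `tol` as zero, and uses hypothetical helper names (`two_level`, `decompose`). Each elimination step uses the matrices of the form (4.45) or (4.47), and the final lower right 2×2 block is taken as a single two-level factor, as U3 is in the 3×3 case.

```python
import cmath
import math

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def two_level(d, i, j, block):
    """Embed a 2x2 block on components i < j of the d-dimensional identity."""
    m = identity(d)
    m[i][i], m[i][j] = block[0]
    m[j][i], m[j][j] = block[1]
    return m

def decompose(u, tol=1e-12):
    """Return two-level unitaries V_1, ..., V_k with u = V_1 V_2 ... V_k."""
    d = len(u)
    m = [list(row) for row in u]
    vs = []
    for c in range(d - 2):                     # clear column c below the diagonal
        for r in range(c + 1, d):
            a, b = complex(m[c][c]), complex(m[r][c])
            if abs(b) > tol:                   # rotation of the form (4.45)
                norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
                block = [[a.conjugate() / norm, b.conjugate() / norm],
                         [b / norm, -a / norm]]
            elif r == d - 1 and abs(a - 1) > tol:
                block = [[a.conjugate(), 0], [0, 1]]   # phase fix as in (4.47)
            else:
                continue                       # identity step, nothing to record
            step = two_level(d, c, r, block)
            m = matmul(step, m)
            vs.append(dagger(step))
    # What remains is the identity except for the lower right 2x2 block.
    vs.append(two_level(d, d - 2, d - 1,
                        [[m[d - 2][d - 2], m[d - 2][d - 1]],
                         [m[d - 1][d - 2], m[d - 1][d - 1]]]))
    return vs

# Hypothetical example: the 3x3 discrete Fourier transform matrix.
w = cmath.exp(2j * math.pi / 3)
u = [[w ** (j * k) / math.sqrt(3) for k in range(3)] for j in range(3)]
vs = decompose(u)
```

Multiplying the returned factors in order reproduces `u` up to rounding error, and the number of factors never exceeds d(d − 1)/2.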
Exercise 4.37: Provide a decomposition of the transform
\[ \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{bmatrix} \tag{4.52} \]
into a product of two-level unitaries. This is a special case of the quantum
Fourier transform, which we study in more detail in the next chapter.
A corollary of the above result is that an arbitrary unitary matrix on an n qubit system
may be written as a product of at most 2^{n−1}(2^n − 1) two-level unitary matrices. For
specific unitary matrices, it may be possible to find much more efficient decompositions,
but as you will now show there exist matrices which cannot be decomposed as a product
of fewer than d − 1 two-level unitary matrices!
Exercise 4.38: Prove that there exists a d×d unitary matrix U which cannot be
decomposed as a product of fewer than d − 1 two-level unitary matrices.
4.5.2 Single qubit and CNOT gates are universal
We have just shown that an arbitrary unitary matrix on a d-dimensional Hilbert space
may be written as a product of two-level unitary matrices. Now we show that single
qubit and CNOT gates together can be used to implement an arbitrary two-level unitary
operation on the state space of n qubits. Combining these results we see that single qubit
and CNOT gates can be used to implement an arbitrary unitary operation on n qubits,
and therefore are universal for quantum computation.
Suppose U is a two-level unitary matrix on an n qubit quantum computer. Suppose
in particular that U acts non-trivially on the space spanned by the computational basis
states |s⟩ and |t⟩, where s = s1 . . . sn and t = t1 . . . tn are the binary expansions for s
and t. Let Ũ be the non-trivial 2×2 unitary submatrix of U ; Ũ can be thought of as a
unitary operator on a single qubit.
Our immediate goal is to construct a circuit implementing U, built from single qubit
and CNOT gates. To do this, we need to make use of Gray codes. Suppose we have
distinct binary numbers, s and t. A Gray code connecting s and t is a sequence of binary
numbers, starting with s and concluding with t, such that adjacent members of the list
differ in exactly one bit. For instance, with s = 101001 and t = 110011 we have the Gray
code
1 0 1 0 0 1
1 0 1 0 1 1
1 0 0 0 1 1
1 1 0 0 1 1
(4.53)
Let g1 through gm be the elements of a Gray code connecting s and t, with g1 = s and
gm = t. Note that we can always find a Gray code such that m ≤ n + 1 since s and t can
differ in at most n locations.
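One simple way to produce such a Gray code is to walk through the bit positions and flip each bit in which s and t differ, one at a time. The sketch below assumes the bits are flipped from right to left, which happens to reproduce the code (4.53); any other order of the differing positions would serve equally well. The function name `gray_path` is hypothetical.

```python
def gray_path(s, t):
    """Return bit strings from s to t, changing one differing bit per step."""
    if len(s) != len(t):
        raise ValueError("s and t must have the same number of bits")
    path = [s]
    current = list(s)
    for i in range(len(s) - 1, -1, -1):   # rightmost differing bit first
        if current[i] != t[i]:
            current[i] = t[i]
            path.append("".join(current))
    return path

example = gray_path("101001", "110011")
```

The path has one element more than the number of positions in which s and t differ, so its length m satisfies m ≤ n + 1.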
The basic idea of the quantum circuit implementing U is to perform a sequence of gates
effecting the state changes |g1⟩ → |g2⟩ → . . . → |gm−1⟩, then to perform a controlled-Ũ
operation, with the target qubit located at the single bit where gm−1 and gm differ, and
then to undo the first stage, transforming |gm−1⟩ → |gm−2⟩ → . . . → |g1⟩. Each of these
steps can be easily implemented using operations developed earlier in this chapter, and
the final result is an implementation of U .
A more precise description of the implementation is as follows. The first step is to swap
the states |g1⟩ and |g2⟩. Suppose g1 and g2 differ at the ith digit. Then we accomplish
the swap by performing a controlled bit flip on the ith qubit, conditional on the values
of the other qubits being identical to those in both g1 and g2. Next we use a controlled
operation to swap |g2⟩ and |g3⟩. We continue in this fashion until we swap |gm−2⟩ with
|gm−1⟩. The effect of this sequence of m − 2 operations is to achieve the operation
\[ |g_1\rangle \to |g_{m-1}\rangle \tag{4.54} \]
\[ |g_2\rangle \to |g_1\rangle \tag{4.55} \]
\[ |g_3\rangle \to |g_2\rangle \tag{4.56} \]
\[ \vdots \]
\[ |g_{m-1}\rangle \to |g_{m-2}\rangle . \tag{4.57} \]
All other computational basis states are left unchanged by this sequence of operations.
Next, suppose gm−1 and gm differ in the jth bit. We apply a controlled-Ũ operation
with the jth qubit as target, conditional on the other qubits having the same values as
appear in both gm and gm−1. Finally, we complete the U operation by undoing the swap
operations: we swap |gm−1⟩ with |gm−2⟩, then |gm−2⟩ with |gm−3⟩ and so on, until we
swap |g2⟩ with |g1⟩.
A simple example illuminates the procedure further. Suppose we wish to implement
the two-level unitary transformation
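\[ U = \begin{bmatrix} a & 0 & 0 & 0 & 0 & 0 & 0 & c \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ b & 0 & 0 & 0 & 0 & 0 & 0 & d \end{bmatrix} . \tag{4.58} \]
Here, a, b, c and d are any complex numbers such that
\[ \tilde{U} \equiv \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]
is a unitary matrix.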
Notice that U acts non-trivially only on the states |000⟩ and |111⟩. We write a Gray code
connecting 000 and 111:
A B C
0 0 0
0 0 1
0 1 1
1 1 1
(4.59)
From this we read off the required circuit, shown in Figure 4.16. The first two gates
shuffle the states so that |000⟩ gets swapped with |011⟩. Next, the operation Ũ is applied
to the first qubit of the states |011⟩ and |111⟩, conditional on the second and third qubits
being in the state |11⟩. Finally, we unshuffle the states, ensuring that |011⟩ gets swapped
back with the state |000⟩.
Figure 4.16. Circuit implementing the two-level unitary operation defined by (4.58).
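The construction can be checked by multiplying out the gates. The sketch below assembles the sequence of multiply-controlled operations read off from a Gray code and compares the product with (4.58). It makes several assumptions that are not in the text: qubit A is the most significant bit, the Gray code flips differing bits from right to left, and when the target bit of gm−1 is 1 rather than 0 the block X Ũ X is used so that Ũ keeps its orientation. The names `gray_path`, `controlled_gate` and `two_level_circuit` are hypothetical.

```python
def gray_path(s, t):
    """Bit strings from s to t, flipping one differing bit per step."""
    path, current = [s], list(s)
    for i in range(len(s) - 1, -1, -1):
        if current[i] != t[i]:
            current[i] = t[i]
            path.append("".join(current))
    return path

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def controlled_gate(n, target, pattern, gate):
    """Matrix applying `gate` to qubit `target` when every other qubit
    equals the corresponding bit of `pattern` (a string of n bits)."""
    dim = 2 ** n
    m = [[0j] * dim for _ in range(dim)]
    shift = n - 1 - target
    for col in range(dim):
        bits = format(col, "0{}b".format(n))
        if any(bits[q] != pattern[q] for q in range(n) if q != target):
            m[col][col] = 1                    # controls not satisfied
            continue
        b = (col >> shift) & 1
        for out in (0, 1):
            row = (col & ~(1 << shift)) | (out << shift)
            m[row][col] += gate[out][b]
    return m

X = [[0, 1], [1, 0]]

def two_level_circuit(s, t, u_tilde):
    """Product of the controlled operations implementing the two-level U."""
    n = len(s)
    path = gray_path(s, t)
    swaps = []
    for g, h in zip(path[:-2], path[1:-1]):    # |g1> -> ... -> |g_{m-1}>
        i = next(q for q in range(n) if g[q] != h[q])
        swaps.append(controlled_gate(n, i, g, X))
    g, h = path[-2], path[-1]
    j = next(q for q in range(n) if g[q] != h[q])
    block = u_tilde if g[j] == "0" else matmul(X, matmul(u_tilde, X))
    sequence = swaps + [controlled_gate(n, j, g, block)] + swaps[::-1]
    total = [[1 if r == c else 0 for c in range(2 ** n)] for r in range(2 ** n)]
    for gate in sequence:                      # earlier gates act first
        total = matmul(gate, total)
    return total

# Hypothetical unitary block with a = 0.6, c = -0.8, b = 0.8, d = 0.6.
u_tilde = [[0.6, -0.8], [0.8, 0.6]]
circuit = two_level_circuit("000", "111", u_tilde)
```

For this example the product has a, c in the corners of the first row and b, d in the corners of the last row, with the identity elsewhere, in agreement with (4.58).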
Returning to the general case, we see that implementing the two-level unitary operation
U requires at most 2(n − 1) controlled operations to swap |g1⟩ with |gm−1⟩ and then back
again. Each of these controlled operations can be realized using O(n) single qubit and
CNOT gates; the controlled-Ũ operation also requires O(n) gates. Thus, implementing
U requires O(n^2) single qubit and CNOT gates. We saw in the previous section that an
arbitrary unitary matrix on the 2^n-dimensional state space of n qubits may be written as
a product of O(2^{2n}) = O(4^n) two-level unitary operations. Combining these results, we
see that an arbitrary unitary operation on n qubits can be implemented using a circuit
containing O(n^2 4^n) single qubit and CNOT gates. Obviously, this construction does not
provide terribly efficient quantum circuits! However, we show in Section 4.5.4 that the
construction is close to optimal in the sense that there are unitary operations that require
an exponential number of gates to implement. Thus, to find fast quantum algorithms we
will clearly need a different approach than is taken in the universality construction.
Exercise 4.39: Find a quantum circuit using single qubit operations and CNOTs to
implement the transformation
\[ \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & a & 0 & 0 & 0 & 0 & c \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & b & 0 & 0 & 0 & 0 & d \end{bmatrix} , \tag{4.60} \]
where
\[ \tilde{U} = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]
is an arbitrary 2×2 unitary matrix.
4.5.3 A discrete set of universal operations
In the previous section we proved that the CNOT and single qubit unitaries together form
a universal set for quantum computation. Unfortunately, no straightforward method is
known to implement all these gates in a fashion which is resistant to errors. Fortunately,
in this section we’ll find a discrete set of gates which can be used to perform universal
quantum computation, and in Chapter 10 we’ll show how to perform these gates in an
error-resistant fashion, using quantum error-correcting codes.
Approximating unitary operators
Obviously, a discrete set of gates can’t be used to implement an arbitrary unitary operation
exactly, since the set of unitary operations is continuous. Rather, it turns out that a
discrete set can be used to approximate any unitary operation. To understand how this
works, we first need to study what it means to approximate a unitary operation. Suppose
U and V are two unitary operators on the same state space. U is the target unitary operator
that we wish to implement, and V is the unitary operator that is actually implemented
in practice. We define the error when V is implemented instead of U by
\[ E(U, V) \equiv \max_{|\psi\rangle} \left\| (U - V)|\psi\rangle \right\| , \tag{4.61} \]
where the maximum is over all normalized quantum states |ψ⟩ in the state space. In
Box 4.1 on page 195 we show that this measure of error has the interpretation that if
E(U, V) is small, then any measurement performed on the state V|ψ⟩ will give approximately the same measurement statistics as a measurement of U|ψ⟩, for any initial state
|ψ⟩. More precisely, we show that if M is a POVM element in an arbitrary POVM, and
PU (or PV) is the probability of obtaining this outcome if U (or V) were performed with
a starting state |ψ⟩, then
|PU − PV | ≤ 2E(U, V ) .
(4.62)
Thus, if E(U, V ) is small, then measurement outcomes occur with similar probabilities,
regardless of whether U or V were performed. Also shown in Box 4.1 is that if we
perform a sequence of gates V1 , . . . , Vm intended to approximate some other sequence
of gates U1 , . . . , Um , then the errors add at most linearly,
\[ E(U_m U_{m-1} \cdots U_1, V_m V_{m-1} \cdots V_1) \le \sum_{j=1}^{m} E(U_j, V_j) . \tag{4.63} \]
The approximation results (4.62) and (4.63) are extremely useful. Suppose we wish
to perform a quantum circuit containing m gates, U1 through Um . Unfortunately, we
are only able to approximate the gate Uj by the gate Vj . In order that the probabilities
of different measurement outcomes obtained from the approximate circuit be within a
tolerance Δ > 0 of the correct probabilities, it suffices that E(Uj , Vj ) ≤ Δ/(2m), by the
results (4.62) and (4.63).
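For a concrete feel for these quantities, note that the maximum in (4.61) is the operator norm of U − V, that is, its largest singular value. The sketch below evaluates it in closed form for 2×2 matrices and checks the chaining inequality (4.63) on one hypothetical pair of slightly mis-rotated gates. The function names `rz`, `rx` and `error`, and all angles, are illustrative assumptions.

```python
import cmath
import math

def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]

def rx(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def error(u, v):
    """E(U, V) for 2x2 matrices: the largest singular value of U - V."""
    a = [[u[i][j] - v[i][j] for j in range(2)] for i in range(2)]
    # G = A^dagger A is Hermitian and positive semidefinite; its larger
    # eigenvalue follows from the trace and determinant.
    g = [[sum(complex(a[k][i]).conjugate() * a[k][j] for k in range(2))
          for j in range(2)] for i in range(2)]
    tr = (g[0][0] + g[1][1]).real
    det = (g[0][0] * g[1][1] - g[0][1] * g[1][0]).real
    disc = max(tr * tr - 4 * det, 0.0)
    return math.sqrt((tr + math.sqrt(disc)) / 2)

# Ideal gates U1, U2 and hypothetical imperfect versions V1, V2.
U1, V1 = rz(0.30), rz(0.31)
U2, V2 = rx(1.10), rx(1.12)

total_error = error(matmul(U2, U1), matmul(V2, V1))
bound = error(U2, V2) + error(U1, V1)
```

Here `total_error` does not exceed `bound`, as (4.63) requires. Combined with (4.62), keeping each gate error below Δ/(2m) keeps every outcome probability of an m gate circuit within Δ of its ideal value.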
Box 4.1: Approximating quantum circuits
Suppose a quantum system starts in the state |ψ⟩, and we perform either the unitary
operation U, or the unitary operation V. Following this, we perform a measurement.
Let M be a POVM element associated with the measurement, and let PU (or PV)
be the probability of obtaining the corresponding measurement outcome if the
operation U (or V) was performed. Then
\[ |P_U - P_V| = \left| \langle\psi|U^\dagger M U|\psi\rangle - \langle\psi|V^\dagger M V|\psi\rangle \right| . \tag{4.64} \]
Let |Δ⟩ ≡ (U − V)|ψ⟩. Simple algebra and the Cauchy–Schwarz inequality show
that
\[ |P_U - P_V| = \left| \langle\psi|U^\dagger M|\Delta\rangle + \langle\Delta|M V|\psi\rangle \right| \tag{4.65} \]
\[ \le \left| \langle\psi|U^\dagger M|\Delta\rangle \right| + \left| \langle\Delta|M V|\psi\rangle \right| \tag{4.66} \]
\[ \le \left\| |\Delta\rangle \right\| + \left\| |\Delta\rangle \right\| \tag{4.67} \]
\[ \le 2E(U, V) . \tag{4.68} \]
The inequality |PU − PV| ≤ 2E(U, V) gives quantitative expression to the idea
that when the error E(U, V) is small, the difference in probabilities between measurement outcomes is also small.
Suppose we perform a sequence V1, V2, . . . , Vm of gates intended to approximate
some other sequence of gates, U1, U2, . . . , Um. Then it turns out that the error
caused by the entire sequence of imperfect gates is at most the sum of the errors
in the individual gates,
\[ E(U_m U_{m-1} \cdots U_1, V_m V_{m-1} \cdots V_1) \le \sum_{j=1}^{m} E(U_j, V_j) . \tag{4.69} \]
To prove this we start with the case m = 2. Note that for some state |ψ⟩ we have
\[ E(U_2 U_1, V_2 V_1) = \left\| (U_2 U_1 - V_2 V_1)|\psi\rangle \right\| \tag{4.70} \]
\[ = \left\| (U_2 U_1 - V_2 U_1)|\psi\rangle + (V_2 U_1 - V_2 V_1)|\psi\rangle \right\| . \tag{4.71} \]
Using the triangle inequality ‖|a⟩ + |b⟩‖ ≤ ‖|a⟩‖ + ‖|b⟩‖, we obtain
\[ E(U_2 U_1, V_2 V_1) \le \left\| (U_2 - V_2) U_1 |\psi\rangle \right\| + \left\| V_2 (U_1 - V_1)|\psi\rangle \right\| \tag{4.72} \]
\[ \le E(U_2, V_2) + E(U_1, V_1) , \tag{4.73} \]
which was the desired result. The result for general m follows by induction.
Universality of Hadamard + phase + CNOT + π/8 gates
We’re now in a good position to study the approximation of arbitrary unitary operations
by discrete sets of gates. We’re going to consider two different discrete sets of gates, both
of which are universal. The first set, the standard set of universal gates, consists of the
Hadamard, phase, controlled-NOT and π/8 gates. We provide fault-tolerant constructions
for these gates in Chapter 10; they also provide an exceptionally simple universality
construction. The second set of gates we consider consists of the Hadamard gate, phase
gate, the controlled-NOT gate, and the Toffoli gate. These gates can also all be done fault-tolerantly; however, the universality proof and fault-tolerance construction for these gates
is a little less appealing.
We begin the universality proof by showing that the Hadamard and π/8 gates can be
used to approximate any single qubit unitary operation to arbitrary accuracy. Consider
the gates T and HT H. T is, up to an unimportant global phase, a rotation by π/4 radians
around the ẑ axis on the Bloch sphere, while HT H is a rotation by π/4 radians around
the x̂ axis on the Bloch sphere (Exercise 4.14). Composing these two operations gives,
up to a global phase,
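\[ \exp\left(-i\frac{\pi}{8} Z\right) \exp\left(-i\frac{\pi}{8} X\right) = \left[ \cos\frac{\pi}{8}\, I - i \sin\frac{\pi}{8}\, Z \right] \left[ \cos\frac{\pi}{8}\, I - i \sin\frac{\pi}{8}\, X \right] \tag{4.74} \]
\[ = \cos^2\frac{\pi}{8}\, I - i \left[ \cos\frac{\pi}{8}\,(X + Z) + \sin\frac{\pi}{8}\, Y \right] \sin\frac{\pi}{8} . \tag{4.75} \]
This is a rotation of the Bloch sphere about an axis along n = (cos π/8, sin π/8, cos π/8) with
corresponding unit vector n̂, and through an angle θ defined by cos(θ/2) ≡ cos²(π/8). That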
is, using only the Hadamard and π/8 gates we can construct Rn̂ (θ). Moreover, this θ
can be shown to be an irrational multiple of 2π. Proving this latter fact is a little beyond
our scope; see the end of chapter ‘History and further reading’.
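The axis and angle quoted above can be confirmed numerically from the gate matrices themselves. The sketch below multiplies T by HTH, strips the global phase exp(iπ/4) that comes from writing T as exp(iπ/8) times a z rotation, and then reads off cos(θ/2) and the components of sin(θ/2) n̂ by taking traces against the Pauli matrices. Variable names are illustrative.

```python
import cmath
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r = 1 / math.sqrt(2)
H = [[r, r], [r, -r]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]
PAULIS = {"x": [[0, 1], [1, 0]],
          "y": [[0, -1j], [1j, 0]],
          "z": [[1, 0], [0, -1]]}

# T (HTH) equals exp(i pi/4) exp(-i pi/8 Z) exp(-i pi/8 X).
product = matmul(T, matmul(H, matmul(T, H)))
phase = cmath.exp(-1j * math.pi / 4)
R = [[phase * entry for entry in row] for row in product]

# For R = cos(theta/2) I - i sin(theta/2) (n . sigma):
#   tr(R) = 2 cos(theta/2),  tr(sigma_k R) = -2i sin(theta/2) n_k.
cos_half = (R[0][0] + R[1][1]).real / 2
axis_times_sin = {}
for name, sigma in PAULIS.items():
    m = matmul(sigma, R)
    axis_times_sin[name] = -(m[0][0] + m[1][1]).imag / 2

theta = 2 * math.acos(cos_half)
```

The three values in `axis_times_sin` are sin(π/8) times (cos π/8, sin π/8, cos π/8), and `cos_half` equals cos²(π/8), in agreement with (4.75). The code does not, of course, say anything about the irrationality of θ/2π.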
Next, we show that repeated iteration of Rn̂ (θ) can be used to approximate to arbitrary
accuracy any rotation Rn̂ (α). To see this, let δ > 0 be the desired accuracy, and let N be
an integer larger than 2π/δ. Define θk so that θk ∈ [0, 2π) and θk = (kθ) mod 2π. Then
the pigeonhole principle implies that there are distinct j and k in the range 1, . . . , N such
that |θk − θj| ≤ 2π/N < δ. Without loss of generality assume that k > j, so we have
|θ_{k−j}| < δ. Since j ≠ k and θ is an irrational multiple of 2π we must have θ_{k−j} ≠ 0. It
follows that the sequence θ_{l(k−j)} fills up the interval [0, 2π) as l is varied, so that adjacent
members of the sequence are no more than δ apart. It follows that for any ε > 0 there
exists an n such that
\[ E(R_{\hat n}(\alpha), R_{\hat n}(\theta)^n) < \frac{\epsilon}{3} . \tag{4.76} \]
Exercise 4.40: For arbitrary α and β show that
E(Rn̂ (α), Rn̂ (α + β)) = |1 − exp(iβ/2)| ,
(4.77)
and use this to justify (4.76).
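The existence argument can be turned into a brute-force search. The sketch below takes θ from cos(θ/2) = cos²(π/8), uses the formula (4.77) with β = nθ − α to evaluate the error of the nth power, and returns the first n whose error falls below a requested tolerance. The search limit `max_n`, the function names and the example values are assumptions made for illustration; the search says nothing about how quickly n must grow as the tolerance shrinks.

```python
import cmath
import math

# Rotation angle generated by T and HTH: cos(theta/2) = cos^2(pi/8).
THETA = 2 * math.acos(math.cos(math.pi / 8) ** 2)

def rotation_error(alpha, n, theta=THETA):
    """E(R(alpha), R(theta)^n), using (4.77) with beta = n*theta - alpha."""
    return abs(1 - cmath.exp(1j * (n * theta - alpha) / 2))

def find_power(alpha, eps, max_n=100000):
    """Smallest n <= max_n with error below eps, or None if none is found."""
    for n in range(1, max_n + 1):
        if rotation_error(alpha, n) < eps:
            return n
    return None

alpha = 2.5                       # hypothetical target rotation angle
n_found = find_power(alpha, 0.1)
```

Tightening `eps` generally pushes the required power higher, which is the inefficiency that the Solovay–Kitaev theorem later removes.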
We are now in position to verify that any single qubit operation can be approximated to
arbitrary accuracy using the Hadamard and π/8 gates. Simple algebra implies that for
any α
HRn̂ (α)H = Rm̂ (α) ,
(4.78)
where m̂ is a unit vector in the direction (cos π/8, − sin π/8, cos π/8), from which it follows
that
\[ E(R_{\hat m}(\alpha), R_{\hat m}(\theta)^n) < \frac{\epsilon}{3} . \tag{4.79} \]
But by Exercise 4.11 an arbitrary unitary U on a single qubit may be written as
U = Rn̂ (β)Rm̂ (γ)Rn̂ (δ),
(4.80)
up to an unimportant global phase shift. The results (4.76) and (4.79), together with the
chaining inequality (4.63) therefore imply that for suitable positive integers n1, n2, n3,
\[ E(U, R_{\hat n}(\theta)^{n_1} H R_{\hat n}(\theta)^{n_2} H R_{\hat n}(\theta)^{n_3}) < \epsilon . \tag{4.81} \]
That is, given any single qubit unitary operator U and any ε > 0 it is possible to
approximate U to within ε using a circuit composed of Hadamard gates and π/8 gates
alone.
Since the π/8 and Hadamard gates allow us to approximate any single qubit unitary operator, it follows from the arguments of Section 4.5.2 that we can approximate
any m gate quantum circuit, as follows. Given a quantum circuit containing m gates,
either CNOTs or single qubit unitary gates, we may approximate it using Hadamard,
controlled-NOT and π/8 gates (later, we will find that phase gates make it possible to do
the approximation fault-tolerantly, but for the present universality argument they are not
strictly necessary). If we desire an accuracy of ε for the entire circuit, then this may be
achieved by approximating each single qubit unitary using the above procedure to within
ε/m and applying the chaining inequality (4.63) to obtain an accuracy of ε for the entire
circuit.
How efficient is this procedure for approximating quantum circuits using a discrete
set of gates? This is an important question. Suppose, for example, that approximating
an arbitrary single qubit unitary to within a distance ε were to require Ω(2^{1/ε}) gates
from the discrete set. Then to approximate the m gate quantum circuit considered in
the previous paragraph would require Ω(m 2^{m/ε}) gates, an exponential increase over
the original circuit size! Fortunately, the rate of convergence is much better than this.
Intuitively, it is plausible that the sequence of angles θk ‘fills in’ the interval [0, 2π) in a
more or less uniform fashion, so that to approximate an arbitrary single qubit gate ought
to take roughly Θ(1/ε) gates from the discrete set. If we use this estimate for the number
of gates required to approximate an arbitrary single qubit gate, then the number required
to approximate an m gate circuit to accuracy ε becomes Θ(m^2/ε). This is a quadratic
increase over the original size of the circuit, m, which for many applications may be
sufficient.
Rather remarkably, however, a much faster rate of convergence can be proved. The
Solovay–Kitaev theorem, proved in Appendix 3, implies that an arbitrary single qubit
gate may be approximated to an accuracy ε using O(log^c(1/ε)) gates from our discrete set,
where c is a constant approximately equal to 2. The Solovay–Kitaev theorem therefore
implies that to approximate a circuit containing m CNOTs and single qubit unitaries to
an accuracy ε requires O(m log^c(m/ε)) gates from the discrete set, a polylogarithmic
increase over the size of the original circuit, which is likely to be acceptable for virtually
all applications.
To sum up, we have shown that the Hadamard, phase, controlled-NOT and π/8 gates
are universal for quantum computation in the sense that given a circuit containing CNOTs
and arbitrary single qubit unitaries it is possible to simulate this circuit to good accuracy
using only this discrete set of gates. Moreover, the simulation can be performed efficiently, in the sense that the overhead required to perform the simulation is polynomial
in log(m/ε), where m is the number of gates in the original circuit, and ε is the desired
accuracy of the simulation.
Exercise 4.41: This and the next two exercises develop a construction showing that
the Hadamard, phase, controlled-NOT and Toffoli gates are universal. Show that
the circuit in Figure 4.17 applies the operation Rz(θ) to the third (target) qubit if
the measurement outcomes are both 0, where cos θ = 3/5, and otherwise applies
Z to the target qubit. Show that the probability of both measurement outcomes
being 0 is 5/8, and explain how repeated use of this circuit and Z = S^2 gates
may be used to apply a Rz(θ) gate with probability approaching 1.
Figure 4.17. Provided both measurement outcomes are 0 this circuit applies Rz(θ) to the target, where
cos θ = 3/5. If some other measurement outcome occurs then the circuit applies Z to the target.
Exercise 4.42: (Irrationality of θ) Suppose cos θ = 3/5. We give a proof by
contradiction that θ is an irrational multiple of 2π.
(1) Using the fact that e^{iθ} = (3 + 4i)/5, show that if θ is a rational multiple of 2π, then there
must exist a positive integer m such that (3 + 4i)^m = 5^m.
(2) Show that (3 + 4i)^m = 3 + 4i (mod 5) for all m > 0, and conclude that no m
such that (3 + 4i)^m = 5^m can exist.
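The congruence in part (2) is easy to explore with exact integer arithmetic, which may help in spotting the induction step. The sketch below represents a Gaussian integer x + iy as the pair (x, y), reduces both components modulo 5 after every multiplication, and records the residues of the first fifty powers of 3 + 4i. The function name `gauss_mul_mod5` is hypothetical, and a finite check is evidence, not a proof.

```python
def gauss_mul_mod5(z, w):
    """Multiply Gaussian integers z = (a, b), w = (c, d), reducing mod 5."""
    (a, b), (c, d) = z, w
    return ((a * c - b * d) % 5, (a * d + b * c) % 5)

base = (3, 4)
power = base
residues = []
for m in range(1, 51):
    residues.append(power)        # residue of (3 + 4i)^m modulo 5
    power = gauss_mul_mod5(power, base)
```

Every recorded residue is (3, 4), whereas 5^m reduces to (0, 0), so none of these powers can equal 5^m.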
Exercise 4.43: Use the results of the previous two exercises to show that the
Hadamard, phase, controlled-NOT and Toffoli gates are universal for quantum
computation.
Exercise 4.44: Show that the three qubit gate G defined by the circuit:
[Circuit: two control qubits (•) condition the application of iRx(πα) to the third qubit.]
is universal for quantum computation whenever α is irrational.
Exercise 4.45: Suppose U is a unitary transform implemented by an n qubit quantum
circuit constructed from H, S, CNOT and Toffoli gates. Show that U is of the
form 2^{−k/2} M, for some integer k, where M is a 2^n × 2^n matrix with only
complex integer entries. Repeat this exercise with the Toffoli gate replaced by
the π/8 gate.
4.5.4 Approximating arbitrary unitary gates is generically hard
We’ve seen that any unitary transformation on n qubits can be built up out of a small set
of elementary gates. Is it always possible to do this efficiently? That is, given a unitary
transformation U on n qubits does there always exist a circuit of size polynomial in n
approximating U ? The answer to this question turns out to be a resounding no: in fact,
most unitary transformations can only be implemented very inefficiently. One way to see
this is to consider the question: how many gates does it take to generate an arbitrary state
of n qubits? A simple counting argument shows that this requires exponentially many
operations, in general; it immediately follows that there are unitary operations requiring
exponentially many operations. To see this, suppose we have g different types of gates
available, and each gate works on at most f input qubits. These numbers, f and g,
are fixed by the computing hardware we have available, and may be considered to be
constants. Suppose we have a quantum circuit containing m gates, starting from the
computational basis state $|0\rangle^{\otimes n}$. For any particular gate in the circuit there are therefore
at most $\binom{n}{f}^g = O(n^{fg})$ possible choices. It follows that at most $O(n^{fgm})$ different
states may be computed using m gates.
Figure 4.18. Visualization of covering the set of possible states with patches of constant radius.
Suppose we wish to approximate a particular state, $|\psi\rangle$, to within a distance ε. The idea
of the proof is to cover the set of all possible states with a collection of ‘patches,’ each of
radius ε (Figure 4.18), and then to show that the number of patches required rises doubly
exponentially in n; comparing with the exponential number of different states that may
be computed using m gates will imply the result. The first observation we need is that the
space of state vectors of n qubits can be regarded as just the unit $(2^{n+1}-1)$-sphere. To see
this, suppose the n qubit state has amplitudes $\psi_j = X_j + iY_j$, where $X_j$ and $Y_j$ are the
real and imaginary parts, respectively, of the jth amplitude. The normalization condition
for quantum states can be written $\sum_j (X_j^2 + Y_j^2) = 1$, which is just the condition for a
point to be on the unit sphere in $2^{n+1}$ real dimensions, that is, the unit $(2^{n+1}-1)$-sphere.
Similarly, the surface area of radius ε near $|\psi\rangle$ is approximately the same as the volume
of a $(2^{n+1}-2)$-sphere of radius ε. Using the formula $S_k(r) = 2\pi^{(k+1)/2} r^k/\Gamma((k+1)/2)$ for
the surface area of a k-sphere of radius r, and $V_k(r) = 2\pi^{(k+1)/2} r^{k+1}/[(k+1)\Gamma((k+1)/2)]$
for the volume of a k-sphere of radius r, we see that the number of patches needed to
cover the state space goes like
$$ \frac{S_{2^{n+1}-1}(1)}{V_{2^{n+1}-2}(\epsilon)} = \frac{\sqrt{\pi}\,\Gamma(2^n - \frac{1}{2})\,(2^{n+1}-1)}{\Gamma(2^n)\,\epsilon^{2^{n+1}-1}} , \tag{4.82} $$
where Γ is the usual generalization of the factorial function. But $\Gamma(2^n - 1/2) \geq \Gamma(2^n)/2^n$,
so the number of patches required to cover the space is at least
$$ \Omega\!\left(\frac{1}{\epsilon^{2^{n+1}-1}}\right) . \tag{4.83} $$
Recall that the number of patches which can be reached in m gates is $O(n^{fgm})$, so in
order to reach all the ε-patches we must have
$$ O\!\left(n^{fgm}\right) \geq \Omega\!\left(\frac{1}{\epsilon^{2^{n+1}-1}}\right) \tag{4.84} $$
which gives us
$$ m = \Omega\!\left(\frac{2^n \log(1/\epsilon)}{\log(n)}\right) . \tag{4.85} $$
That is, there are states of n qubits which take $\Omega(2^n \log(1/\epsilon)/\log(n))$ operations to
approximate to within a distance ε. This is exponential in n, and thus is ‘difficult’,
in the sense of computational complexity introduced in Chapter 3. Furthermore, this
immediately implies that there are unitary transformations U on n qubits which take
$\Omega(2^n \log(1/\epsilon)/\log(n))$ operations to approximate by a quantum circuit implementing an
operation V such that E(U, V) ≤ ε. By contrast, using our universality constructions
and the Solovay–Kitaev theorem it follows that an arbitrary unitary operation U on n
qubits may be approximated to within a distance ε using $O(n^2 4^n \log^c(n^2 4^n/\epsilon))$ gates.
Thus, to within a polynomial factor the construction for universality we have given is
optimal; unfortunately, it does not address the problem of determining which families of
unitary operations can be computed efficiently in the quantum circuits model.
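To get a feel for the size of the bound (4.85), the counting argument can be evaluated numerically. The sketch below is only illustrative: it takes the patch count of Equation (4.82) at face value, uses the gate count $\binom{n}{f}^g$ from the text, ignores the constants hidden in the O and Ω notation, and the defaults f = 2 and g = 4 are arbitrary assumptions of ours rather than values from the text.

```python
import math


def log_patch_count(n, eps):
    """Natural log of the right-hand side of Equation (4.82)."""
    d = 2 ** n
    return (0.5 * math.log(math.pi) + math.lgamma(d - 0.5) + math.log(2 * d - 1)
            - math.lgamma(d) - (2 * d - 1) * math.log(eps))


def min_gates(n, eps, f=2, g=4):
    """Smallest m with binom(n, f)**(g*m) >= patch count (constants ignored).

    Requires n > f so that there is more than one way to place a gate."""
    log_choices_per_gate = g * math.log(math.comb(n, f))
    return log_patch_count(n, eps) / log_choices_per_gate


for n in (5, 10, 15):
    print(n, round(min_gates(n, eps=0.01)))
```

The estimate grows roughly like $2^n/\log n$ as n increases, in line with (4.85).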
4.5.5 Quantum computational complexity
In Chapter 3 we described a theory of ‘computational complexity’ for classical computers that classified the resource requirements to solve computational problems on classical computers. Not surprisingly there is considerable interest in developing a theory of
quantum computational complexity, and relating it to classical computational complexity
theory. Although only first steps have been taken in this direction, it will doubtless be
an enormously fruitful direction for future researchers. We content ourselves with presenting one result about quantum complexity classes, relating the quantum complexity
class BQP to the classical complexity class PSPACE. Our discussion of this result is
rather informal; for more details you are referred to the paper of Bernstein and Vazirani
referenced in the end of chapter ‘History and further reading’.
Recall that PSPACE was defined in Chapter 3 as the class of decision problems which
can be solved on a Turing machine using space polynomial in the problem size and an
arbitrary amount of time. BQP is an essentially quantum complexity class consisting
of those decision problems that can be solved with bounded probability of error using
a polynomial size quantum circuit. Slightly more formally, we say a language L is in
BQP if there is a family of polynomial size quantum circuits which decides the language,
accepting strings in the language with probability at least 3/4, and rejecting strings which
aren’t in the language with probability at least 3/4. In practice, what this means is that
the quantum circuit takes as input binary strings, and tries to determine whether they are
elements of the language or not. At the conclusion of the circuit one qubit is measured,
with 0 indicating that the string has been accepted, and 1 indicating rejection. By running
the circuit on the string a few times and taking the majority outcome, we can determine with very high
probability whether a given string is in L.
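The effect of repeating the test can be made quantitative. The following sketch assumes the runs are independent and that the final answer is decided by a strict majority vote over an odd number of runs; both the voting rule and the function name are our assumptions, since the text only says the string is tested a few times.

```python
from math import comb


def majority_success_probability(runs, p_single=0.75):
    """Probability that a strict majority of `runs` independent runs is correct,
    when each run is correct with probability `p_single` (use an odd `runs`)."""
    return sum(comb(runs, k) * p_single ** k * (1 - p_single) ** (runs - k)
               for k in range(runs // 2 + 1, runs + 1))


for runs in (1, 3, 11, 51):
    print(runs, majority_success_probability(runs))
```

With a single-run success probability of 3/4, three runs already raise the success probability to 27/32, and the failure probability falls off exponentially in the number of runs.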
Of course, a quantum circuit is a fixed entity, and any given quantum circuit can only
decide whether strings up to some finite length are in L. For this reason, we use an
entire family of circuits in the definition of BQP; for every possible input length there is
a different circuit in the family. We place two restrictions on the circuits in addition to the
acceptance / rejection criterion already described. First, the size of the circuits should
only grow polynomially with the size of the input string x for which we are trying to
determine whether x ∈ L. Second, we require that the circuits be uniformly generated,
in a sense similar to that described in Section 3.1.2. This uniformity requirement arises
because, in practice, given a string x of some length n, somebody will have to build
a quantum circuit capable of deciding whether x is in L. To do so, they will need to
have a clear set of instructions – an algorithm – for building the circuit. For this reason,
we require that our quantum circuits be uniformly generated, that is, there is a Turing
machine capable of efficiently outputting a description of the quantum circuit. This
restriction may seem rather technical, and in practice is nearly always satisfied trivially,
but it does save us from pathological examples such as that described in Section 3.1.2.
(You might also wonder if it matters whether the Turing machine used in the uniformity
requirement is a quantum or classical Turing machine; it turns out that it doesn’t matter
– see ‘History and further reading’.)
One of the most significant results in quantum computational complexity is that BQP
⊆ PSPACE. It is clear that BPP ⊆ BQP, where BPP is the classical complexity class
of decision problems which can be solved with bounded probability of error using polynomial time on a classical Turing machine. Thus we have the chain of inclusions BPP
⊆ BQP ⊆ PSPACE. Proving that BQP ≠ BPP – intuitively the statement that quantum computers are more powerful than classical computers – will therefore imply that
BPP ≠ PSPACE. However, it is not presently known whether BPP ≠ PSPACE,
and proving this would represent a major breakthrough in classical computer science! So
proving that quantum computers are more powerful than classical computers would have
some very interesting implications for classical computational complexity! Unfortunately,
it also means that providing such a proof may be quite difficult.
Why is it that BQP ⊆ PSPACE? Here is an intuitive outline of the proof (a rigorous
proof is left to the references in ‘History and further reading’). Suppose we have an n
qubit quantum computer, and do a computation involving a sequence of p(n) gates, where
p(n) is some polynomial in n. Supposing the quantum circuit starts in the state $|0\rangle$ we
will explain how to evaluate in polynomial space on a classical computer the probability
that it ends up in the state $|y\rangle$. Suppose the gates that are executed on the quantum
computer are, in order, $U_1, U_2, \ldots, U_{p(n)}$. Then the probability of ending up in the state
$|y\rangle$ is the modulus squared of
$$ \langle y|U_{p(n)} \cdots U_2 U_1|0\rangle . \tag{4.86} $$
This quantity may be estimated in polynomial space on a classical computer. The basic
idea is to insert the completeness relation $\sum_x |x\rangle\langle x| = I$ between each term in (4.86),
obtaining
$$ \langle y|U_{p(n)} \cdots U_2 U_1|0\rangle = \sum_{x_1,\ldots,x_{p(n)-1}} \langle y|U_{p(n)}|x_{p(n)-1}\rangle \langle x_{p(n)-1}|U_{p(n)-1}|x_{p(n)-2}\rangle \cdots \langle x_2|U_2|x_1\rangle \langle x_1|U_1|0\rangle . \tag{4.87} $$
Given that the individual unitary gates appearing in this sum are operations such as the
Hadamard gate, controlled-NOT, and so on, it is clear that each term in the sum can be calculated
to high accuracy using only polynomial space on a classical computer, and thus the sum
as a whole can be calculated using polynomial space, since individual terms in the sum
can be erased after being added to the running total. Of course, this algorithm is rather
slow, since there are exponentially many terms in the sum which need to be calculated
and added to the total; however, only polynomially much space is consumed, and thus
BQP ⊆ PSPACE, as we set out to show.
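The sum over intermediate basis states in (4.87) can be sketched as a recursion that never stores a full state vector. This is an illustrative sketch only: the gate representation (a small matrix plus the tuple of qubits it acts on) and the function name `amplitude` are our own choices, and the first listed qubit is taken as the most significant index bit of the matrix.

```python
import math
from itertools import product

S2 = 1 / math.sqrt(2)
HADAMARD = [[S2, S2], [S2, -S2]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control listed first


def amplitude(gates, y, t=None):
    """Return <y| U_t ... U_1 |0...0> by summing over intermediate basis states.

    `gates` is a list of (matrix, qubits) pairs applied in order; `y` is a tuple
    of bits. Memory use is one bit string per recursion level, i.e. O(t * n)."""
    if t is None:
        t = len(gates)
    if t == 0:
        return 1.0 if all(bit == 0 for bit in y) else 0.0
    matrix, qubits = gates[t - 1]
    row = int("".join(str(y[q]) for q in qubits), 2)
    total = 0.0
    # Only intermediate states that differ from y on the gate's qubits contribute.
    for col, bits in enumerate(product((0, 1), repeat=len(qubits))):
        entry = matrix[row][col]
        if entry == 0:
            continue
        x = list(y)
        for q, b in zip(qubits, bits):
            x[q] = b
        total += entry * amplitude(gates, tuple(x), t - 1)
    return total


# Usage: a Hadamard on qubit 0 followed by a controlled-NOT prepares a Bell state.
bell_circuit = [(HADAMARD, (0,)), (CNOT, (0, 1))]
print(amplitude(bell_circuit, (1, 1)))
```

The running time grows exponentially with the number of gates, as the text notes, but only a polynomial amount of memory is ever in use.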
A similar procedure can be used to simulate an arbitrary quantum computation on
a classical computer, no matter the length of the quantum computation. Therefore, the
class of problems solvable on a quantum computer with unlimited time and space resources is no larger than the class of problems solvable on a classical computer. Stated
another way, this means that quantum computers do not violate the Church–Turing thesis that any algorithmic process can be simulated using a Turing machine. Of
course, quantum computers may be much more efficient than their classical counterparts,
thereby challenging the strong Church–Turing thesis that any algorithmic process can
be simulated efficiently using a probabilistic Turing machine.
4.6 Summary of the quantum circuit model of computation
In this book the term ‘quantum computer’ is synonymous with the quantum circuit
model of computation. This chapter has provided a detailed look at quantum circuits,
their basic elements, universal families of gates, and some applications. Before we move
on to more sophisticated applications, let us summarize the key elements of the quantum
circuit model of computation:
(1) Classical resources: A quantum computer consists of two parts, a classical part
and a quantum part. In principle, there is no need for the classical part of the
computer, but in practice certain tasks may be made much easier if parts of the
computation can be done classically. For example, many schemes for quantum
error-correction (Chapter 10) are likely to involve classical computations in order to
maximize efficiency. While classical computations can always be done, in principle,
on a quantum computer, it may be more convenient to perform the calculations on
a classical computer.
(2) A suitable state space: A quantum circuit operates on some number, n, of qubits.
The state space is thus a $2^n$-dimensional complex Hilbert space. Product states of
the form $|x_1, \ldots, x_n\rangle$, where $x_i = 0, 1$, are known as computational basis states of
the computer. $|x\rangle$ denotes a computational basis state, where x is the number
whose binary representation is $x_1 \ldots x_n$.
(3) Ability to prepare states in the computational basis: It is assumed that any
computational basis state $|x_1, \ldots, x_n\rangle$ can be prepared in at most n steps.
(4) Ability to perform quantum gates: Gates can be applied to any subset of qubits
as desired, and a universal family of gates can be implemented. For example, it
should be possible to apply the controlled-NOT gate to any pair of qubits in the quantum
computer. The Hadamard, phase, controlled-NOT and π/8 gates form a family of gates from
which any unitary operation can be approximated, and thus is a universal set of
gates. Other universal families exist.
(5) Ability to perform measurements in the computational basis:
Measurements may be performed in the computational basis of one or more of the
qubits in the computer.
The quantum circuit model of quantum computation is equivalent to many other
models of computation which have been proposed, in the sense that other models result
in essentially the same resource requirements for the same problems. As a simple example
which illustrates the basic idea, one might wonder whether moving to a design based
on three-level quantum systems, rather than the two-level qubits, would confer any
computational advantage. Of course, although there may be some slight advantage in
using three-level quantum systems (qutrits) over two-level systems, any difference will
be essentially negligible from the theoretical point of view. At a less trivial level, the
‘quantum Turing machine’ model of computation, a quantum generalization of the
classical Turing machine model, has been shown to be equivalent to the model based
upon quantum circuits. We do not consider that model of computation in this book, but
the reader interested in learning more about quantum Turing machines may consult the
references given in the end of chapter ‘History and further reading’.
Despite the simplicity and attraction of the quantum circuit model, it is useful to keep
in mind possible criticisms, modifications, and extensions. For example, it is by no means
clear that the basic assumptions underlying the state space and starting conditions in the
quantum circuit model are justified. Everything is phrased in terms of finite dimensional
state spaces. Might there be anything to be gained by using systems whose state space is
infinite dimensional? Assuming that the starting state of the computer is a computational
basis state is also not necessary; we know that many systems in Nature ‘prefer’ to sit in
highly entangled states of many systems; might it be possible to exploit this preference
to obtain extra computational power? It might be that having access to certain states
allows particular computations to be done much more easily than if we are constrained
to start in the computational basis. Likewise, the ability to efficiently perform entangling
measurements in multi-qubit bases might be as useful as being able to perform just
entangling unitary operations. Indeed, it may be possible to harness such measurements
to perform tasks intractable within the quantum circuit model.
A detailed examination and attempted justification of the physics underlying the quantum circuit model is outside the scope of the present discussion, and, indeed, outside the
scope of present knowledge! By raising these issues we wish to introduce the question
of the completeness of the quantum circuit model, and re-emphasize the fundamental
point that information is physical. In our attempts to formulate models for information
processing we should always attempt to go back to fundamental physical laws. For the
purposes of this book, we shall stay within the quantum circuit model of computation. It
offers a rich and powerful model of computation that exploits the properties of quantum
mechanics to perform amazing feats of information processing, without classical precedent.
Whether physically reasonable models of computation exist which go beyond the
quantum circuit model is a fascinating question which we leave open for you.
4.7 Simulation of quantum systems
Perhaps [...] we need a mathematical theory of quantum automata. [...] the
quantum state space has far greater capacity than the classical one: for a classical system with N states, its quantum version allowing superposition accommodates $c^N$ states. When we join two classical systems, their number of states
$N_1$ and $N_2$ are multiplied, and in the quantum case we get the exponential
growth $c^{N_1 N_2}$. [...] These crude estimates show that the quantum behavior of
the system might be much more complex than its classical simulation.
– Yu Manin (1980)[Man80], as translated in [Man99]
The quantum-mechanical computation of one molecule of methane requires $10^{42}$
grid points. Assuming that at each point we have to perform only 10 elementary operations, and that the computation is performed at the extremely low
temperature $T = 3 \times 10^{-3}$ K, we would still have to use all the energy produced
on Earth during the last century.
– R. P. Poplavskii (1975)[Pop75], as quoted by Manin
Can physics be simulated by a universal computer? [...] the physical world
is quantum mechanical, and therefore the proper problem is the simulation of
quantum physics [...] the full description of quantum mechanics for a large
system with R particles [...] has too many variables, it cannot be simulated
with a normal computer with a number of elements proportional to R [ ... but
it can be simulated with ] quantum computer elements. [...] Can a quantum
system be probabilistically simulated by a classical (probabilistic, I’d assume)
universal computer? [...] If you take the computer to be the classical kind I’ve
described so far [...] the answer is certainly, No!
– Richard P. Feynman (1982)[Fey82]
Let us close out this chapter by providing an interesting and useful application of the
quantum circuit model. One of the most important practical applications of computation
is the simulation of physical systems. For example, in the engineering design of a new
building, finite element analysis and modeling is used to ensure safety while minimizing
cost. Cars are made lightweight, structurally sound, attractive, and inexpensive, by using
computer aided design. Modern aeronautical engineering depends heavily on computational fluid dynamics simulations for aircraft designs. Nuclear weapons are no longer
exploded (for the most part), but rather, tested by exhaustive computational modeling.
Examples abound, because of the tremendous practical applications of predictive simulations. We begin by describing some instances of the simulation problem, then we present
a quantum algorithm for simulation and an illustrative example, concluding with some
perspective on this application.
4.7.1 Simulation in action
The heart of simulation is the solution of differential equations which capture the physical
laws governing the dynamical behavior of a system. Some examples include Newton’s
law,
$$ \frac{d}{dt}\left(m\frac{dx}{dt}\right) = F , \tag{4.88} $$
Poisson’s equation,
$$ -\nabla\cdot(k\,\nabla u) = Q , \tag{4.89} $$
the electromagnetic vector wave equation,
$$ \nabla\cdot\nabla\vec{E} = \epsilon_0\mu_0\frac{\partial^2\vec{E}}{\partial t^2} , \tag{4.90} $$
and the diffusion equation,
$$ \nabla^2\psi = \frac{1}{a^2}\frac{\partial\psi}{\partial t} , \tag{4.91} $$
just to name a very few. The goal is generally: given an initial state of the system,
what is the state at some other time and/or position? Solutions are usually obtained by
approximating the state with a digital representation, then discretizing the differential
equation in space and time such that an iterative application of a procedure carries the
state from the initial to the final conditions. Importantly, the error in this procedure is
bounded, and known not to grow faster than some small power of the number of iterations.
Furthermore, not all dynamical systems can be simulated efficiently: generally, only those
systems which can be described efficiently can be simulated efficiently.
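The discretize-and-iterate recipe described above can be illustrated with the diffusion equation (4.91) in one dimension. This is a minimal sketch under our own assumptions: an explicit forward-Euler update, periodic boundary conditions, and a step size chosen so that $a^2\Delta t/\Delta x^2 \leq 1/2$, the usual stability condition for this scheme. None of these choices come from the text.

```python
def diffuse(psi, a, dx, dt, steps):
    """Iterate psi(t + dt) = psi(t) + a^2 dt * (discrete second derivative of psi),
    i.e. a forward-Euler discretization of d(psi)/dt = a^2 d^2(psi)/dx^2
    on a periodic grid."""
    r = a * a * dt / (dx * dx)  # the explicit scheme is stable for r <= 1/2
    psi = list(psi)
    n = len(psi)
    for _ in range(steps):
        psi = [psi[i] + r * (psi[(i - 1) % n] - 2 * psi[i] + psi[(i + 1) % n])
               for i in range(n)]
    return psi


# Usage: a spike in the middle of the grid spreads out while the total is conserved.
initial = [0.0] * 10 + [1.0] + [0.0] * 10
after = diffuse(initial, a=1.0, dx=1.0, dt=0.25, steps=40)
print(max(after), sum(after))
```

Each iteration carries the digital representation of the state one time step forward, which is the pattern the quantum simulation algorithm later in this section imitates.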
Simulation of quantum systems by classical computers is possible, but generally only
very inefficiently. The dynamical behavior of many simple quantum systems is governed
by Schrödinger’s equation,
$$ i\hbar\frac{d}{dt}|\psi\rangle = H|\psi\rangle . \tag{4.92} $$
We will find it convenient to absorb ℏ into H, and use this convention for the rest of
this section. For a typical Hamiltonian of interest to physicists dealing with real particles
in space (rather than abstract systems such as qubits, which we have been dealing with!),
this reduces to
$$ i\frac{\partial}{\partial t}\psi(x) = \left[-\frac{1}{2m}\frac{\partial^2}{\partial x^2} + V(x)\right]\psi(x) , \tag{4.93} $$
using a convention known as the position representation $\langle x|\psi\rangle = \psi(x)$. This is a partial differential
equation very much like Equation (4.91). So just simulating Schrödinger’s equation is
not the especial difficulty faced in simulating quantum systems. What is the difficulty?
The key challenge in simulating quantum systems is the exponential number of
differential equations which must be solved. For one qubit evolving according to the
Schrödinger equation, a system of two differential equations must be solved; for two
qubits, four equations; and for n qubits, $2^n$ equations. Sometimes, insightful approximations can be made which reduce the effective number of equations involved, thus making
classical simulation of the quantum system feasible. However, there are many physically
interesting quantum systems for which no such approximations are known.
Exercise 4.46: (Exponential complexity growth of quantum systems) Let ρ be
a density matrix describing the state of n qubits. Show that describing ρ requires
$4^n - 1$ independent real numbers.
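The practical meaning of this exponential growth is easy to tabulate. The sketch below assumes each complex amplitude is stored as two 64-bit floating point numbers (16 bytes); that storage format is our assumption, not something the text specifies.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**n complex amplitudes of an n-qubit pure state."""
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (10, 30, 50):
    print(n, "qubits:", state_vector_bytes(n), "bytes")
```

Under this assumption thirty qubits already need 16 GiB, and fifty qubits need about 1.8 × 10^16 bytes.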
The reader with a physics background may appreciate that there are many important
quantum systems for which classical simulation is intractable. These include the Hubbard
model, a model of interacting fermionic particles with the Hamiltonian
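$$ H = \sum_{k=1}^{n} V_0\, n_{k\uparrow} n_{k\downarrow} + \sum_{k,j\ \text{neighbors},\,\sigma} t_0\, c^*_{k\sigma} c_{j\sigma} , \tag{4.94} $$
which is useful in the study of superconductivity and magnetism, the Ising model,
$$ H = \sum_{k=1}^{n} \sigma_k \cdot \sigma_{k+1} , \tag{4.95} $$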
and many others. Solutions to such models give many physical properties such as the
dielectric constant, conductivity, and magnetic susceptibility of materials. More sophisticated models such as quantum electrodynamics (QED) and quantum chromodynamics
(QCD) can be used to compute constants such as the mass of the proton.
Quantum computers can efficiently simulate quantum systems for which there is no
known efficient classical simulation. Intuitively, this is possible for much the same reason
any quantum circuit can be constructed from a universal set of quantum gates. Moreover,
just as there exist unitary operations which cannot be efficiently approximated, it is
possible in principle to imagine quantum systems with Hamiltonians which cannot be
efficiently simulated on a quantum computer. Of course, we believe that such systems
aren’t actually realized in Nature, otherwise we’d be able to exploit them to do information
processing beyond the quantum circuit model.
4.7.2 The quantum simulation algorithm
Classical simulation begins with the realization that in solving a simple differential equation such as dy/dt = f(y), to first order, it is known that y(t + Δt) ≈ y(t) + f(y)Δt.
Similarly, the quantum case is concerned with the solution of $i\,d|\psi\rangle/dt = H|\psi\rangle$, which,
for a time-independent H, is just
$$ |\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle . \tag{4.96} $$
Since H is usually extremely difficult to exponentiate (it may be sparse, but it is also
exponentially large), a good beginning is the first order solution $|\psi(t+\Delta t)\rangle \approx (I - iH\Delta t)|\psi(t)\rangle$.
This is tractable, because for many Hamiltonians H it is straightforward to
compose quantum gates to efficiently approximate $I - iH\Delta t$. However, such first order
solutions are generally not very satisfactory.
Efficient approximation of the solution to Equation (4.96), to high order, is possible for
many classes of Hamiltonian. For example, in most physical systems, the Hamiltonian
can be written as a sum over many local interactions. Specifically, for a system of n
particles,
$$ H = \sum_{k=1}^{L} H_k , \tag{4.97} $$
where each $H_k$ acts on at most a constant c number of systems, and L is a polynomial in
n. For example, the terms $H_k$ are often just two-body interactions such as $X_i X_j$ and one-body Hamiltonians such as $X_i$. Both the Hubbard and Ising models have Hamiltonians
of this form. Such locality is quite physically reasonable, and originates in many systems
from the fact that most interactions fall off with increasing distance or difference in energy.
There are sometimes additional global symmetry constraints such as particle statistics;
we shall come to those shortly. The important point is that although $e^{-iHt}$ is difficult to
compute, $e^{-iH_k t}$ acts on a much smaller subsystem, and is straightforward to approximate
using quantum circuits. But because $[H_j, H_k] \neq 0$ in general, $e^{-iHt} \neq \prod_k e^{-iH_k t}$! How,
then, can $e^{-iH_k t}$ be useful in constructing $e^{-iHt}$?
Exercise 4.47: For $H = \sum_{k}^{L} H_k$, prove that $e^{-iHt} = e^{-iH_1 t} e^{-iH_2 t} \cdots e^{-iH_L t}$ for all t
if $[H_j, H_k] = 0$, for all j, k.
Exercise 4.48: Show that the restriction of Hk to involve at most c particles implies
that in the sum (4.97), L is upper bounded by a polynomial in n.
The heart of quantum simulation algorithms is the following asymptotic approximation
theorem:
Theorem 4.3: (Trotter formula) Let A and B be Hermitian operators. Then for any
real t,
$$ \lim_{n\to\infty} \left(e^{iAt/n} e^{iBt/n}\right)^n = e^{i(A+B)t} . \tag{4.98} $$
Note that (4.98) is true even if A and B do not commute. Even more interestingly,
perhaps, it can be generalized to hold for A and B which are generators of certain kinds
of semigroups, which correspond to general quantum operations; we shall describe such
generators (the ‘Lindblad form’) in Section 8.4.1 of Chapter 8. For now, we only consider
the case of A and B being Hermitian matrices.
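Proof
By definition,
$$ e^{iAt/n} = I + \frac{1}{n} iAt + O\!\left(\frac{1}{n^2}\right) , \tag{4.99} $$
and thus
$$ e^{iAt/n} e^{iBt/n} = I + \frac{1}{n} i(A+B)t + O\!\left(\frac{1}{n^2}\right) . \tag{4.100} $$
Taking products of these gives us
$$ \left(e^{iAt/n} e^{iBt/n}\right)^n = I + \sum_{k=1}^{n} \binom{n}{k} \frac{1}{n^k} \left[i(A+B)t\right]^k + O\!\left(\frac{1}{n}\right) , \tag{4.101} $$
and since $\binom{n}{k}\frac{1}{n^k} = \left(1 + O\!\left(\frac{1}{n}\right)\right)/k!$, this gives
$$ \lim_{n\to\infty} \left(e^{iAt/n} e^{iBt/n}\right)^n = \lim_{n\to\infty} \sum_{k=0}^{n} \frac{(i(A+B)t)^k}{k!}\left(1 + O\!\left(\frac{1}{n}\right)\right) + O\!\left(\frac{1}{n}\right) = e^{i(A+B)t} . \tag{4.102} $$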
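The convergence asserted by the Trotter formula can be observed numerically in the smallest non-commuting case. The sketch below takes A = X and B = Z on a single qubit, an example of our choosing, and uses the identity $e^{i\theta P} = \cos\theta\, I + i\sin\theta\, P$, valid for any matrix P with $P^2 = I$; since $(X+Z)/\sqrt{2}$ also squares to the identity, the exact exponential is available in closed form for comparison.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def exp_i_pauli(theta, p):
    """exp(i*theta*P) = cos(theta) I + i sin(theta) P, valid when P*P = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * I2[i][j] + 1j * s * p[i][j] for j in range(2)] for i in range(2)]


def exact(t):
    """exp(i (X + Z) t), using the fact that (X + Z)/sqrt(2) squares to I."""
    unit = [[(X[i][j] + Z[i][j]) / math.sqrt(2) for j in range(2)] for i in range(2)]
    return exp_i_pauli(math.sqrt(2) * t, unit)


def trotter(t, n):
    """(exp(iXt/n) exp(iZt/n))^n, the left-hand side of (4.98) at finite n."""
    step = matmul(exp_i_pauli(t / n, X), exp_i_pauli(t / n, Z))
    result = I2
    for _ in range(n):
        result = matmul(result, step)
    return result


def max_error(t, n):
    """Largest entrywise difference between the Trotter product and the exact result."""
    u, v = trotter(t, n), exact(t)
    return max(abs(u[i][j] - v[i][j]) for i in range(2) for j in range(2))


for n in (10, 100, 1000):
    print(n, max_error(1.0, n))
```

For this example the error shrinks roughly in proportion to 1/n, consistent with the O(1/n) term in (4.101).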
Modifications of the Trotter formula provide the methods by which higher order
approximations can be derived for performing quantum simulations. For example, using
similar reasoning to the proof above, it can be shown that
$$ e^{i(A+B)\Delta t} = e^{iA\Delta t} e^{iB\Delta t} + O(\Delta t^2) . \tag{4.103} $$
Similarly,
$$ e^{i(A+B)\Delta t} = e^{iA\Delta t/2} e^{iB\Delta t} e^{iA\Delta t/2} + O(\Delta t^3) . \tag{4.104} $$
An overview of the quantum simulation algorithm is given below, and an explicit example of simulating the one-dimensional non-relativistic Schrödinger equation is shown in
Box 4.2.
Algorithm: Quantum simulation
Inputs: (1) A Hamiltonian $H = \sum_k H_k$ acting on an N-dimensional system,
where each $H_k$ acts on a small subsystem of size independent of N, (2) an initial
state $|\psi_0\rangle$ of the system at t = 0, (3) a positive, non-zero accuracy δ, and (4) a
time $t_f$ at which the evolved state is desired.
Outputs: A state $|\tilde\psi(t_f)\rangle$ such that $|\langle\tilde\psi(t_f)|e^{-iHt_f}|\psi_0\rangle|^2 \geq 1 - \delta$.
Runtime: O(poly(1/δ)) operations.
Procedure: Choose a representation such that the state $|\tilde\psi\rangle$ of n = poly(log N)
qubits approximates the system and the operators $e^{-iH_k\Delta t}$ have efficient
quantum circuit approximations. Select an approximation method (see for
example Equations (4.103)–(4.105)) and Δt such that the expected error is
acceptable (and $j\Delta t = t_f$ for an integer j), construct the corresponding quantum
circuit $U_{\Delta t}$ for the iterative step, and do:
1. $|\tilde\psi_0\rangle \leftarrow |\psi_0\rangle$ ; j = 0   (initialize state)
2. → $|\tilde\psi_{j+1}\rangle = U_{\Delta t}|\tilde\psi_j\rangle$   (iterative update)
3. → j = j + 1 ; goto 2 until $j\Delta t \geq t_f$   (loop)
4. → $|\tilde\psi(t_f)\rangle = |\tilde\psi_j\rangle$   (final result)
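The four steps of the procedure can be emulated classically for a toy system. This is a sketch under assumptions of our own: a single qubit with $H = H_1 + H_2$, $H_1 = X$, $H_2 = Z$, and the symmetric splitting of Equation (4.104) (with i replaced by −i) as the iterative step $U_{\Delta t}$. A real quantum simulation would apply $U_{\Delta t}$ as a circuit rather than as a matrix acting on stored amplitudes.

```python
import math

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]  # equals (X + Z)/sqrt(2)


def exp_minus_i(theta, p):
    """exp(-i*theta*P) = cos(theta) I - i sin(theta) P, valid when P*P = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * (i == j) - 1j * s * p[i][j] for j in range(2)] for i in range(2)]


def apply(m, psi):
    """Apply a 2x2 matrix to a two-component state vector."""
    return [m[0][0] * psi[0] + m[0][1] * psi[1],
            m[1][0] * psi[0] + m[1][1] * psi[1]]


def simulate(psi0, t_final, steps):
    """Approximate exp(-i (X + Z) t_final) |psi0> with `steps` symmetric Trotter steps."""
    dt = t_final / steps
    half_x, full_z = exp_minus_i(dt / 2, X), exp_minus_i(dt, Z)
    psi = list(psi0)                                             # 1. initialize state
    for _ in range(steps):                                       # 3. loop until j*dt = t_final
        psi = apply(half_x, apply(full_z, apply(half_x, psi)))   # 2. iterative update
    return psi                                                   # 4. final result


def exact(psi0, t_final):
    """exp(-i (X + Z) t_final) |psi0>, using X + Z = sqrt(2) * HADAMARD."""
    return apply(exp_minus_i(math.sqrt(2) * t_final, HADAMARD), psi0)


def fidelity(a, b):
    """|<a|b>|^2 for two normalized state vectors."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2


print(fidelity(simulate([1, 0], 1.0, 100), exact([1, 0], 1.0)))
```

Increasing `steps` (that is, decreasing Δt) drives the fidelity toward 1, which is how the accuracy δ in the algorithm's output condition would be met.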
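Exercise 4.49: (Baker–Campbell–Hausdorff formula) Prove that
$$ e^{(A+B)\Delta t} = e^{A\Delta t} e^{B\Delta t} e^{-\frac{1}{2}[A,B]\Delta t^2} + O(\Delta t^3) , \tag{4.105} $$
and also prove Equations (4.103) and (4.104).
Exercise 4.50: Let $H = \sum_{k}^{L} H_k$, and define
$$ U_{\Delta t} = \left[e^{-iH_1\Delta t} e^{-iH_2\Delta t} \cdots e^{-iH_L\Delta t}\right]\left[e^{-iH_L\Delta t} e^{-iH_{L-1}\Delta t} \cdots e^{-iH_1\Delta t}\right] . \tag{4.106} $$
(a) Prove that $U_{\Delta t} = e^{-2iH\Delta t} + O(\Delta t^3)$.
(b) Use the results in Box 4.1 to prove that for a positive integer m,
$$ E(U_{\Delta t}^m, e^{-2miH\Delta t}) \leq m\alpha\Delta t^3 , \tag{4.107} $$
for some constant α.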
Box 4.2: Quantum simulation of Schrödinger’s equation
The methods and limitations of quantum simulation may be illustrated by the following example, drawn from the conventional models studied by physicists, rather
than the abstract qubit model. Consider a single particle living on a line, in a one-dimensional potential V(x), governed by the Hamiltonian
$$ H = \frac{p^2}{2m} + V(x) , \tag{4.108} $$
where p is the momentum operator and x is the position operator. The eigenvalues
of x are continuous, and the system state $|\psi\rangle$ resides in an infinite dimensional
Hilbert space; in the x basis, it can be written as
$$ |\psi\rangle = \int_{-\infty}^{\infty} |x\rangle\langle x|\psi\rangle\, dx . \tag{4.109} $$
In practice, only some finite region is of interest, which we may take to be the
range −d ≤ x ≤ d. Furthermore, it is possible to choose a differential step size Δx
sufficiently small compared to the shortest wavelength in the system such that
$$ |\tilde\psi\rangle = \sum_{k=-d/\Delta x}^{d/\Delta x} a_k |k\Delta x\rangle \tag{4.110} $$
provides a good physical approximation of $|\psi\rangle$. This state can be represented using
$n = \lceil \log(2d/\Delta x + 1) \rceil$ qubits; we simply replace the basis $|k\Delta x\rangle$ (an eigenstate of
the x operator) with $|k\rangle$, a computational basis state of n qubits. Note that only
n qubits are required for this simulation, whereas classically $2^n$ complex numbers
would have to be kept track of, thus leading to an exponential resource saving when
performing the simulation on a quantum computer.
Computation of $|\tilde\psi(t)\rangle = e^{-iHt}|\tilde\psi(0)\rangle$ must utilize one of the approximations of
Equations (4.103)–(4.105) because in general $H_1 = V(x)$ does not commute with
$H_0 = p^2/2m$. Thus, we must be able to compute $e^{-iH_1\Delta t}$ and $e^{-iH_0\Delta t}$. Because $|\tilde\psi\rangle$
is expressed in the eigenbasis of $H_1$, $e^{-iH_1\Delta t}$ is a diagonal transformation of the
form
$$ |k\rangle \rightarrow e^{-iV(k\Delta x)\Delta t}|k\rangle . \tag{4.111} $$
It is straightforward to compute this, since we can compute V(kΔx)Δt. (See
also Problem 4.1.) The second term is also simple, because x and p are conjugate
variables related by a quantum Fourier transform $U_{\rm FFT}\, x\, U_{\rm FFT}^\dagger = p$, and thus
$e^{-iH_0\Delta t} = U_{\rm FFT}\, e^{-ix^2\Delta t/2m}\, U_{\rm FFT}^\dagger$; to compute $e^{-iH_0\Delta t}$, do
$$ |k\rangle \rightarrow U_{\rm FFT}\, e^{-ix^2\Delta t/2m}\, U_{\rm FFT}^\dagger |k\rangle . \tag{4.112} $$
The construction of $U_{\rm FFT}$ is discussed in Chapter 5.
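One time step of the scheme in Box 4.2 can be emulated classically on a small grid. The sketch below is illustrative only: it stores all grid amplitudes explicitly (which is exactly the cost a quantum computer avoids), uses a naive discrete Fourier transform in place of $U_{\rm FFT}$, assumes periodic boundary conditions and ℏ = 1, and combines the two factors as in the first order splitting (4.103). The function names and the momentum grid convention are our own.

```python
import cmath
import math


def dft(values, sign):
    """Unitary discrete Fourier transform (sign=-1) or its inverse (sign=+1)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * math.pi * j * k / n)
                for k, v in enumerate(values)) / math.sqrt(n)
            for j in range(n)]


def split_step(psi, potential, dx, dt, mass=1.0):
    """Apply exp(-i H0 dt) exp(-i H1 dt) to the grid amplitudes psi."""
    n = len(psi)
    # Diagonal phase from the potential in the position basis, as in (4.111).
    psi = [cmath.exp(-1j * v * dt) * a for v, a in zip(potential, psi)]
    # Kinetic term: change basis, apply a diagonal phase, change back, as in (4.112).
    phi = dft(psi, -1)
    momenta = [2 * math.pi * (j if j < n // 2 else j - n) / (n * dx) for j in range(n)]
    phi = [cmath.exp(-1j * p * p * dt / (2 * mass)) * a for p, a in zip(momenta, phi)]
    return dft(phi, +1)


# Usage: a Gaussian wave packet in a harmonic potential on a 32-point grid.
n, dx = 32, 0.5
xs = [(k - n / 2) * dx for k in range(n)]
packet = [math.exp(-x * x) for x in xs]
norm = math.sqrt(sum(a * a for a in packet))
packet = [a / norm for a in packet]
evolved = split_step(packet, [0.5 * x * x for x in xs], dx=dx, dt=0.1)
print(sum(abs(a) ** 2 for a in evolved))
```

Both factors are unitary, so the norm of the state is preserved at every step regardless of the splitting error.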
4.7.3 An illustrative example
The procedure we have described for quantum simulations has concentrated on simulating Hamiltonians which are sums of local interactions. However, this is not a fundamental
requirement! As the following example illustrates, efficient quantum simulations are possible even for Hamiltonians which act non-trivially on all or nearly all parts of a large
system.
Suppose we have the Hamiltonian
$$ H = Z_1 \otimes Z_2 \otimes \cdots \otimes Z_n , \tag{4.113} $$
which acts on an n qubit system. Although this interaction involves all of the
system, it can nevertheless be simulated efficiently. What we desire is a simple quantum circuit
which implements $e^{-iH\Delta t}$, for arbitrary values of Δt. A circuit doing precisely this, for
n = 3, is shown in Figure 4.19. The main insight is that although the Hamiltonian
involves all the qubits in the system, it does so in a classical manner: the phase shift
applied to the system is $e^{-i\Delta t}$ if the parity of the n qubits in the computational basis is
even; otherwise, the phase shift should be $e^{i\Delta t}$. Thus, simple simulation of H is possible
by first classically computing the parity (storing the result in an ancilla qubit), then
applying the appropriate phase shift conditioned on the parity, then uncomputing the
parity (to erase the ancilla). This strategy clearly works not only for n = 3, but also for
arbitrary values of n.
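[Figure: circuit diagram. Three controlled-NOT gates, one controlled by each system qubit, compute the parity into an ancilla qubit prepared in $|0\rangle$; the gate $e^{-i\Delta t Z}$ is applied to the ancilla; the three controlled-NOT gates are then repeated, returning the ancilla to $|0\rangle$.]
Figure 4.19. Quantum circuit for simulating the Hamiltonian $H = Z_1 \otimes Z_2 \otimes Z_3$ for time $\Delta t$.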
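The parity argument can be checked classically, one computational basis state at a time. The following Python sketch follows the three steps of the circuit (compute the parity, apply the phase, uncompute) and compares the resulting phase with the corresponding diagonal entry of $e^{-iH\Delta t}$. The function names are illustrative choices, not notation from the text.

```python
import cmath

def z_product_eigenvalue(bits):
    """Eigenvalue of Z_1 x ... x Z_n on the basis state |bits>: product of (-1)^bit."""
    value = 1
    for b in bits:
        value *= -1 if b else 1
    return value

def circuit_phase(bits, dt):
    """Phase applied to the basis state |bits> by the parity circuit."""
    ancilla = 0
    for b in bits:              # controlled-NOTs compute the parity into the ancilla
        ancilla ^= b
    # exp(-i dt Z) on the ancilla: e^{-i dt} for |0>, e^{+i dt} for |1>
    phase = cmath.exp(-1j * dt) if ancilla == 0 else cmath.exp(1j * dt)
    for b in bits:              # the same controlled-NOTs uncompute the parity
        ancilla ^= b
    assert ancilla == 0         # the ancilla is returned to |0>
    return phase

def exact_phase(bits, dt):
    """Diagonal entry of exp(-i H dt) for H = Z x ... x Z."""
    return cmath.exp(-1j * dt * z_product_eigenvalue(bits))

def all_bitstrings(n):
    """All n bit strings, as lists of 0/1 values."""
    return [[(x >> k) & 1 for k in range(n)] for x in range(2 ** n)]
```

Because H is diagonal in the computational basis, agreement on every basis state is enough to establish agreement of the two unitaries.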
Furthermore, extending the same procedure allows us to simulate more complicated
extended Hamiltonians. Specifically, we can efficiently simulate any Hamiltonian of the
form
$$H = \bigotimes_{k=1}^{n} \sigma^{k}_{c(k)}, \tag{4.114}$$
where $\sigma^{k}_{c(k)}$ is a Pauli matrix (or the identity) acting on the kth qubit, with c(k) ∈
{0, 1, 2, 3} specifying one of {I, X, Y, Z}. The qubits upon which the identity operation
is performed can be disregarded, and X or Y terms can be transformed by single qubit
gates to Z operations. This leaves us with a Hamiltonian of the form of (4.113), which
is simulated as described above.
Exercise 4.51: Construct a quantum circuit to simulate the Hamiltonian
$$H = X_1 \otimes Y_2 \otimes Z_3, \tag{4.115}$$
performing the unitary transform $e^{-i\Delta t H}$ for any $\Delta t$.
Using this procedure allows us to simulate a wide class of Hamiltonians containing
terms which are not local. In particular, it is possible to simulate a Hamiltonian of the form $H = \sum_{k=1}^{L} H_k$ where the only restriction is that the individual $H_k$ have a tensor product structure, and that L is polynomial in the total number of particles n. More generally, all that is required is that there be an efficient circuit to simulate each $H_k$ separately. As an example, the Hamiltonian $H = \sum_{k=1}^{n} X_k + Z^{\otimes n}$ can easily be simulated using the above
techniques. Such Hamiltonians typically do not arise in Nature. However, they provide
a new and possibly valuable vista on the world of quantum algorithms.
4.7.4 Perspectives on quantum simulation
The quantum simulation algorithm is very similar to classical methods, but also differs
in a fundamental way. Each iteration of the quantum algorithm must completely replace
the old state with the new one; there is no way to obtain (non-trivial) information from
an intermediate step without significantly changing the algorithm, because the state is a
quantum one. Furthermore, the final measurement must be chosen cleverly to provide the
desired result, because it disturbs the quantum state. Of course, the quantum simulation
can be repeated to obtain statistics, but it is desirable to repeat the algorithm only at
most a polynomial number of times. It may be that even though the simulation can be
performed efficiently, there is no way to efficiently perform a desired measurement.
Also, there are Hamiltonians which simply can’t be simulated efficiently. In Section 4.5.4, we saw that there exist unitary transformations which quantum computers
cannot efficiently approximate. As a corollary, not all Hamiltonian evolutions can be efficiently simulated on a quantum computer, for if this were possible, then all unitary
transformations could be efficiently approximated!
Another difficult problem – one which is very interesting – is the simulation of equilibration processes. A system with Hamiltonian H in contact with an environment at
temperature T will generally come to thermal equilibrium in a state known as the Gibbs state, $\rho_{\rm therm} = e^{-H/k_B T}/Z$, where $k_B$ is Boltzmann’s constant, and $Z = {\rm tr}\, e^{-H/k_B T}$ is the usual partition function normalization, which ensures that tr(ρ) = 1. The process
by which this equilibration occurs is not very well understood, although certain requirements are known: the environment must be large, it must have non-zero population in
states with energies matching the eigenstates of H, and its coupling with the system
should be weak. Obtaining ρtherm for arbitrary H and T is generally an exponentially
difficult problem for a classical computer. Might a quantum computer be able to solve
this efficiently? We do not yet know.
On the other hand, as we discussed above, many interesting quantum problems can indeed be simulated efficiently with a quantum computer, even when they have extra constraints beyond the simple algorithms presented here. A particular class of these involves global symmetries originating from particle statistics. In the everyday world, we
are used to being able to identify different particles; tennis balls can be followed around a
tennis court, keeping track of which is which. This ability to keep track of which object is
which is a general feature of classical objects – by continuously measuring the position of a
classical particle it can be tracked at all times, and thus uniquely distinguished from other
particles. However, this breaks down in quantum mechanics, which prevents us from
following the motion of individual particles exactly. If the two particles are inherently
different, say a proton and an electron, then we can distinguish them by measuring the
sign of the charge to tell which particle is which. But in the case of identical particles,
like two electrons, it is found that they are truly indistinguishable.
Indistinguishability of particles places a constraint on the state vector of a system which
manifests itself in two ways. Experimentally, particles in Nature are found to come in
two distinct flavors, known as bosons and fermions. The state vector of a system of
bosons remains unchanged under permutation of any two constituents, reflecting their
fundamental indistinguishability. Systems of fermions, in contrast, experience a sign
change in their state vector under interchange of any two constituents. Both kinds of
systems can be simulated efficiently on a quantum computer. The detailed description
of how this is done is outside the scope of this book; suffice it to say the procedure is
fairly straightforward. Given an initial state of the wrong symmetry, it can be properly
symmetrized before the simulation begins. And the operators used in the simulation can
be constructed to respect the desired symmetry, even allowing for the effects of higher
order error terms. The reader who is interested in pursuing this and other topics further
will find pointers to the literature in ‘History and further reading,’ at the end of the
chapter.
Problem 4.1: (Computable phase shifts) Let m and n be positive integers. Suppose $f : \{0, \ldots, 2^m - 1\} \to \{0, \ldots, 2^n - 1\}$ is a classical function from m to n bits which may be computed reversibly using T Toffoli gates, as described in Section 3.2.5. That is, the function $(x, y) \to (x, y \oplus f(x))$ may be implemented using T Toffoli gates. Give a quantum circuit using 2T + n (or fewer) one, two, and three qubit gates to implement the unitary operation defined by
$$|x\rangle \to \exp\left(\frac{-2i\pi f(x)}{2^n}\right)|x\rangle. \tag{4.116}$$
Problem 4.2: Find a depth O(log n) construction for the $C^n(X)$ gate. (Comment: The depth of a circuit is the number of distinct timesteps at which gates are applied; the point of this problem is that it is possible to parallelize the $C^n(X)$ construction by applying many gates in parallel during the same timestep.)
Problem 4.3: (Alternate universality construction) Suppose U is a unitary matrix on n qubits. Define H ≡ i ln(U). Show that
(1) H is Hermitian, with eigenvalues in the range 0 to 2π.
(2) H can be written
$$H = \sum_{g} h_g\, g, \tag{4.117}$$
where $h_g$ are real numbers and the sum is over all n-fold tensor products g of the Pauli matrices {I, X, Y, Z}.
(3) Let Δ = 1/k, for some positive integer k. Explain how the unitary operation $\exp(-ih_g g\Delta)$ may be implemented using O(n) one and two qubit operations.
(4) Show that
$$\exp(-iH\Delta) = \prod_{g} \exp(-ih_g g\Delta) + O(4^n\Delta^2), \tag{4.118}$$
where the product is taken with respect to any fixed ordering of the n-fold tensor products of Pauli matrices, g.
(5) Show that
$$U = \left[\prod_{g} \exp(-ih_g g\Delta)\right]^{k} + O(4^n\Delta). \tag{4.119}$$
(6) Explain how to approximate U to within a distance $\epsilon > 0$ using $O(n16^n/\epsilon)$ one and two qubit unitary operations.
Problem 4.4: (Minimal Toffoli construction) (Research)
(1) What is the smallest number of two qubit gates that can be used to implement the Toffoli gate?
(2) What is the smallest number of one qubit gates and controlled-NOT gates that can be used to implement the Toffoli gate?
(3) What is the smallest number of one qubit gates and controlled-Z gates that can be used to implement the Toffoli gate?
Problem 4.5: (Research) Construct a family of Hamiltonians, $\{H_n\}$, on n qubits, such that simulating $H_n$ requires a number of operations super-polynomial in n. (Comment: This problem seems to be quite difficult.)
Problem 4.6: (Universality with prior entanglement) Controlled-NOT gates and single qubit gates form a universal set of quantum logic gates. Show that an alternative universal set of resources is comprised of single qubit unitaries, the ability to perform measurements of pairs of qubits in the Bell basis, and the ability to prepare arbitrary four qubit entangled states.
Summary of Chapter 4: Quantum circuits
• Universality: Any unitary operation on n qubits may be implemented exactly by composing single qubit and controlled-NOT gates.
• Universality with a discrete set: The Hadamard gate, phase gate, controlled-NOT gate, and π/8 gate are universal for quantum computation, in the sense that an arbitrary unitary operation on n qubits can be approximated to an arbitrary accuracy $\epsilon > 0$ using a circuit composed of only these gates. Replacing the π/8 gate in this list with the Toffoli gate also gives a universal family.
• Not all unitary operations can be efficiently implemented: There are unitary operations on n qubits which require $\Omega(2^n \log(1/\epsilon)/\log(n))$ gates to approximate to within a distance $\epsilon$ using any finite set of gates.
• Simulation: For a Hamiltonian $H = \sum_k H_k$ which is a sum of polynomially many terms $H_k$ such that efficient quantum circuits for $H_k$ can be constructed, a quantum computer can efficiently simulate the evolution $e^{-iHt}$ and approximate $|\psi(t)\rangle = e^{-iHt}|\psi(0)\rangle$, given $|\psi(0)\rangle$.
History and further reading
The gate constructions in this chapter are drawn from a wide variety of sources. The paper by Barenco, Bennett, Cleve, DiVincenzo, Margolus, Shor, Sleator, Smolin, and Weinfurter[BBC+95] was the source of many of the circuit constructions in this chapter, and of the universality proof for single qubit and controlled-NOT gates. Another useful source of insights about quantum circuits is the paper by Beckman, Chari, Devabhaktuni, and Preskill[BCDP96]. A gentle and accessible introduction has been provided by DiVincenzo[DiV98]. The fact that measurements commute with control qubit terminals was pointed out by Griffiths and Niu[GN96].
The universality proof for two-level unitaries is due to Reck, Zeilinger, Bernstein, and Bertani[RZBB94]. The universality of the controlled-NOT and single qubit gates was proved by DiVincenzo[DiV95b]. The universal gate G in Exercise 4.44 is sometimes known as the Deutsch gate[Deu89]. Deutsch, Barenco, and Ekert[DBE95] and Lloyd[Llo95] independently proved that almost any two qubit quantum logic gate is universal. That the error caused by a sequence of gates is at most the sum of the errors of the individual gates was proven by Bernstein and Vazirani[BV97]. The specific universal set of gates we have focused on – the Hadamard, phase, controlled-NOT and π/8 gates – was proved universal in Boykin, Mor, Pulver, Roychowdhury, and Vatan[BMP+99], which also contains a proof that θ defined by $\cos(\theta/2) \equiv \cos^2(\pi/8)$ is an irrational multiple of π. The bound in Section 4.5.4 is based on a paper by Knill[Kni95], which does a much more detailed investigation of the hardness of approximating arbitrary unitary operations using quantum circuits. In particular, Knill obtains tighter and more general bounds than we do, and his analysis applies also to cases where the universal set is a continuum of gates, not just a finite set, as we have considered.
The quantum circuit model of computation is due to Deutsch[Deu89], and was further
developed by Yao[Yao93]. The latter paper showed that the quantum circuit model of
computation is equivalent to the quantum Turing machine model. Quantum Turing
machines were introduced in 1980 by Benioff[Ben80], further developed by Deutsch[Deu85]
and Yao[Yao93], and their modern definition given by Bernstein and Vazirani[BV97]. The
latter two papers also take first steps towards setting up a theory of quantum computational
complexity, analogous to classical computational complexity theory. In particular, the
inclusion BQP ⊆ PSPACE and some slightly stronger results were proved by Bernstein
and Vazirani. Knill and Laflamme[KL99] develop some fascinating connections between
quantum and classical computational complexity. Other interesting work on quantum
computational complexity includes the paper by Adleman, Demarrais and Huang[ADH97],
and the paper by Watrous[Wat99]. The latter paper gives intriguing evidence to suggest
that quantum computers are more powerful than classical computers in the setting of
‘interactive proof systems’.
The suggestion that non-computational basis starting states may be used to obtain computational power beyond the quantum circuit model was made by Daniel Gottesman and Michael Nielsen.
That quantum computers might simulate quantum systems more efficiently than classical computers was intimated by Manin[Man80] in 1980, and independently developed in
more detail by Feynman[Fey82] in 1982. Much more detailed investigations were subsequently carried out by Abrams and Lloyd[AL97], Boghosian and Taylor[BT97], Sornborger
and Stewart[SS99], Wiesner[Wie96], and Zalka[Zal98]. The Trotter formula is attributed to
Trotter[Tro59], and was also proven by Chernoff[Che68], although the simpler form for
unitary operators is much older, and goes back to the time of Sophus Lie. The third
order version of the Baker–Campbell–Hausdorff formula, Equation (4.104), was given by
Sornborger and Stewart[SS99]. Abrams and Lloyd[AL97] give a procedure for simulating
many-body Fermi systems on a quantum computer. Terhal and DiVincenzo address the
problem of simulating the equilibration of quantum systems to the Gibbs state[TD98].
The method used to simulate the Schrödinger equation in Box 4.2 is due to Zalka[Zal98]
and Wiesner[Wie96].
Exercise 4.25 is due to Vandersypen, and is related to work by Chau and Wilczek[CW95]. Exercise 4.45 is due to Boykin, Mor, Pulver, Roychowdhury, and Vatan[BMP+99]. Problem 4.2 is due to Gottesman. Problem 4.6 is due to Gottesman and Chuang[GC99].
5 The quantum Fourier transform and its applications
If computers that you build are quantum,
Then spies everywhere will all want ’em.
Our codes will all fail,
And they’ll read our email,
Till we get crypto that’s quantum, and daunt ’em.
– Jennifer and Peter Shor
To read our E-mail, how mean
of the spies and their quantum machine;
be comforted though,
they do not yet know
how to factorize twelve or fifteen.
– Volker Strassen
Computer programming is an art form, like the creation of poetry or music.
– Donald Knuth
The most spectacular discovery in quantum computing to date is that quantum computers can efficiently perform some tasks which are not feasible on a classical computer.
For example, finding the prime factorization of an n-bit integer is thought to require $\exp(\Theta(n^{1/3} \log^{2/3} n))$ operations using the best classical algorithm known at the time of writing, the so-called number field sieve. This is exponential in the size of the number being factored, so factoring is generally considered to be an intractable problem on a classical computer: it quickly becomes impossible to factor even modest numbers. In contrast, a quantum algorithm can accomplish the same task using $O(n^2 \log n \log\log n)$
operations. That is, a quantum computer can factor a number exponentially faster than
the best known classical algorithms. This result is important in its own right, but perhaps the most exciting aspect is the question it raises: what other problems can be done
efficiently on a quantum computer which are infeasible on a classical computer?
In this chapter we develop the quantum Fourier transform, which is the key ingredient
for quantum factoring and many other interesting quantum algorithms. The quantum
Fourier transform, with which we begin in Section 5.1, is an efficient quantum algorithm
for performing a Fourier transform of quantum mechanical amplitudes. It does not speed
up the classical task of computing Fourier transforms of classical data. But one important
task which it does enable is phase estimation, the approximation of the eigenvalues of
a unitary operator under certain circumstances, as described in Section 5.2. This allows
us to solve several other interesting problems, including the order-finding problem and
the factoring problem, which are covered in Section 5.3. Phase estimation can also be
combined with the quantum search algorithm to solve the problem of counting solutions
to a search problem, as described in the next chapter. Section 5.4 concludes the chapter
with a discussion of how the quantum Fourier transform may be used to solve the hidden
subgroup problem, a generalization of the phase estimation and order-finding problems
that has among its special cases an efficient quantum algorithm for the discrete logarithm
problem, another problem thought to be intractable on a classical computer.
5.1 The quantum Fourier transform
A good idea has a way of becoming simpler and solving problems other than
that for which it was intended.
– Robert Tarjan
One of the most useful ways of solving a problem in mathematics or computer science
is to transform it into some other problem for which a solution is known. There are a
few transformations of this type which appear so often and in so many different contexts
that the transformations are studied for their own sake. A great discovery of quantum
computation has been that some such transformations can be computed much faster on
a quantum computer than on a classical computer, a discovery which has enabled the
construction of fast algorithms for quantum computers.
One such transformation is the discrete Fourier transform. In the usual mathematical
notation, the discrete Fourier transform takes as input a vector of complex numbers, $x_0, \ldots, x_{N-1}$ where the length N of the vector is a fixed parameter. It outputs the transformed data, a vector of complex numbers $y_0, \ldots, y_{N-1}$, defined by
$$y_k \equiv \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} x_j\, e^{2\pi i jk/N}. \tag{5.1}$$
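As a concrete reference point, Equation (5.1) can be evaluated directly. The sketch below is a plain transcription of the formula in Python, with a function name chosen here for illustration; it uses the positive sign in the exponent and the $1/\sqrt{N}$ normalization of the text, which differ from the conventions of some numerical libraries.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform with the sign and normalization of Equation (5.1)."""
    N = len(x)
    return [
        sum(x[j] * cmath.exp(2j * math.pi * j * k / N) for j in range(N)) / math.sqrt(N)
        for k in range(N)
    ]
```

For the input vector (0, 1, 0, 0) the output is (1, i, −1, −i)/2 up to rounding, since only the j = 1 term contributes and $e^{2\pi i k/4} = i^k$.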
The quantum Fourier transform is exactly the same transformation, although the
conventional notation for the quantum Fourier transform is somewhat different. The
quantum Fourier transform on an orthonormal basis $|0\rangle, \ldots, |N-1\rangle$ is defined to be a linear operator with the following action on the basis states,
$$|j\rangle \longrightarrow \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} e^{2\pi i jk/N} |k\rangle. \tag{5.2}$$
Equivalently, the action on an arbitrary state may be written
$$\sum_{j=0}^{N-1} x_j |j\rangle \longrightarrow \sum_{k=0}^{N-1} y_k |k\rangle, \tag{5.3}$$
where the amplitudes yk are the discrete Fourier transform of the amplitudes xj . It is not
obvious from the definition, but this transformation is a unitary transformation, and thus
can be implemented as the dynamics for a quantum computer. We shall demonstrate
the unitarity of the Fourier transform by constructing a manifestly unitary quantum
circuit computing the Fourier transform. It is also easy to prove directly that the Fourier
transform is unitary:
Exercise 5.1: Give a direct proof that the linear transformation defined by
Equation (5.2) is unitary.
Exercise 5.2: Explicitly compute the Fourier transform of the n qubit state $|00\ldots0\rangle$.
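Before attempting a proof, it can be reassuring to check unitarity numerically for small N. The sketch below builds the matrix of the transformation (5.2) and measures how far $F^\dagger F$ is from the identity; it is a sanity check under floating point arithmetic, not a substitute for the proof requested in Exercise 5.1, and the function names are illustrative.

```python
import cmath
import math

def qft_matrix(N):
    """Matrix of Equation (5.2): entry [k][j] is e^{2 pi i jk/N} / sqrt(N)."""
    return [[cmath.exp(2j * math.pi * j * k / N) / math.sqrt(N) for j in range(N)]
            for k in range(N)]

def max_deviation_from_unitary(F):
    """Largest absolute entry of F^dagger F - I."""
    N = len(F)
    worst = 0.0
    for a in range(N):
        for b in range(N):
            entry = sum(F[k][a].conjugate() * F[k][b] for k in range(N))
            target = 1.0 if a == b else 0.0
            worst = max(worst, abs(entry - target))
    return worst
```

The deviation should be at the level of rounding error for any N, not only for powers of two.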
In the following, we take $N = 2^n$, where n is some integer, and the basis $|0\rangle, \ldots, |2^n - 1\rangle$ is the computational basis for an n qubit quantum computer. It is helpful to write the state $|j\rangle$ using the binary representation $j = j_1 j_2 \ldots j_n$. More formally, $j = j_1 2^{n-1} + j_2 2^{n-2} + \cdots + j_n 2^0$. It is also convenient to adopt the notation $0.j_l j_{l+1} \ldots j_m$ to represent the binary fraction $j_l/2 + j_{l+1}/4 + \cdots + j_m/2^{m-l+1}$.
With a little algebra the quantum Fourier transform can be given the following useful
product representation:
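$$|j_1, \ldots, j_n\rangle \to \frac{\left(|0\rangle + e^{2\pi i 0.j_n}|1\rangle\right)\left(|0\rangle + e^{2\pi i 0.j_{n-1}j_n}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i 0.j_1 j_2 \cdots j_n}|1\rangle\right)}{2^{n/2}}. \tag{5.4}$$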
This product representation is so useful that you may even wish to consider this to be the definition of the quantum Fourier transform. As we explain shortly, this representation allows us to construct an efficient quantum circuit computing the Fourier transform, yields a proof that the quantum Fourier transform is unitary, and provides insight into algorithms based upon the quantum Fourier transform. As an incidental bonus we obtain the classical fast Fourier transform, in the exercises!
The equivalence of the product representation (5.4) and the definition (5.2) follows
from some elementary algebra:
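\begin{align}
|j\rangle &\to \frac{1}{2^{n/2}} \sum_{k=0}^{2^n-1} e^{2\pi i jk/2^n} |k\rangle \tag{5.5} \\
&= \frac{1}{2^{n/2}} \sum_{k_1=0}^{1} \cdots \sum_{k_n=0}^{1} e^{2\pi i j \left(\sum_{l=1}^{n} k_l 2^{-l}\right)} |k_1 \ldots k_n\rangle \tag{5.6} \\
&= \frac{1}{2^{n/2}} \sum_{k_1=0}^{1} \cdots \sum_{k_n=0}^{1} \bigotimes_{l=1}^{n} e^{2\pi i j k_l 2^{-l}} |k_l\rangle \tag{5.7} \\
&= \frac{1}{2^{n/2}} \bigotimes_{l=1}^{n} \left[ \sum_{k_l=0}^{1} e^{2\pi i j k_l 2^{-l}} |k_l\rangle \right] \tag{5.8} \\
&= \frac{1}{2^{n/2}} \bigotimes_{l=1}^{n} \left[ |0\rangle + e^{2\pi i j 2^{-l}} |1\rangle \right] \tag{5.9} \\
&= \frac{\left(|0\rangle + e^{2\pi i 0.j_n}|1\rangle\right)\left(|0\rangle + e^{2\pi i 0.j_{n-1}j_n}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i 0.j_1 j_2 \cdots j_n}|1\rangle\right)}{2^{n/2}}. \tag{5.10}
\end{align}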
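The equivalence of the definition and the product form can also be confirmed numerically for small n. The sketch below, with illustrative function names, expands the tensor product of Equation (5.9) factor by factor and compares the resulting amplitudes with those of Equation (5.5). It assumes the ordering used in the text, in which $k_1$ is the most significant bit of k.

```python
import cmath
import math

def qft_amplitudes(j, n):
    """Amplitudes of the transformed |j> from the definition (5.5), with N = 2^n."""
    N = 2 ** n
    return [cmath.exp(2j * math.pi * j * k / N) / math.sqrt(N) for k in range(N)]

def product_form_amplitudes(j, n):
    """Amplitudes obtained by expanding the tensor product in (5.9).

    The l-th factor is (|0> + e^{2 pi i j / 2^l}|1>)/sqrt(2); factor l = 1
    corresponds to k_1, the most significant bit of the output label k.
    """
    amps = [1.0 + 0j]
    for l in range(1, n + 1):
        factor = [1 / math.sqrt(2),
                  cmath.exp(2j * math.pi * j / 2 ** l) / math.sqrt(2)]
        # Appending qubit l as the next (less significant) bit of the index.
        amps = [a * f for a in amps for f in factor]
    return amps
```

The phase $e^{2\pi i j 2^{-l}}$ depends only on the fractional part of $j/2^l$, which is the binary fraction $0.j_{n-l+1} \ldots j_n$; this is the step from (5.9) to (5.10).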
The product representation (5.4) makes it easy to derive an efficient circuit for the
quantum Fourier transform. Such a circuit is shown in Figure 5.1. The gate Rk denotes
the unitary transformation
$$R_k \equiv \begin{bmatrix} 1 & 0 \\ 0 & e^{2\pi i/2^k} \end{bmatrix}. \tag{5.11}$$
Figure 5.1. Efficient circuit for the quantum Fourier transform. This circuit is easily derived from the product representation (5.4) for the quantum Fourier transform. Not shown are swap gates at the end of the circuit which reverse the order of the qubits, or normalization factors of $1/\sqrt{2}$ in the output.
To see that the pictured circuit computes the quantum Fourier transform, consider what happens when the state $|j_1 \ldots j_n\rangle$ is input. Applying the Hadamard gate to the first bit produces the state
$$\frac{1}{2^{1/2}} \left(|0\rangle + e^{2\pi i 0.j_1}|1\rangle\right) |j_2 \ldots j_n\rangle, \tag{5.12}$$
since $e^{2\pi i 0.j_1} = -1$ when $j_1 = 1$, and is +1 otherwise. Applying the controlled-$R_2$ gate produces the state
$$\frac{1}{2^{1/2}} \left(|0\rangle + e^{2\pi i 0.j_1 j_2}|1\rangle\right) |j_2 \ldots j_n\rangle. \tag{5.13}$$
We continue applying the controlled-$R_3$, $R_4$ through $R_n$ gates, each of which adds an extra bit to the phase of the co-efficient of the first $|1\rangle$. At the end of this procedure we have the state
$$\frac{1}{2^{1/2}} \left(|0\rangle + e^{2\pi i 0.j_1 j_2 \ldots j_n}|1\rangle\right) |j_2 \ldots j_n\rangle. \tag{5.14}$$
Next, we perform a similar procedure on the second qubit. The Hadamard gate puts us in the state
$$\frac{1}{2^{2/2}} \left(|0\rangle + e^{2\pi i 0.j_1 j_2 \ldots j_n}|1\rangle\right) \left(|0\rangle + e^{2\pi i 0.j_2}|1\rangle\right) |j_3 \ldots j_n\rangle, \tag{5.15}$$
and the controlled-$R_2$ through $R_{n-1}$ gates yield the state
$$\frac{1}{2^{2/2}} \left(|0\rangle + e^{2\pi i 0.j_1 j_2 \ldots j_n}|1\rangle\right) \left(|0\rangle + e^{2\pi i 0.j_2 \ldots j_n}|1\rangle\right) |j_3 \ldots j_n\rangle. \tag{5.16}$$
We continue in this fashion for each qubit, giving a final state
$$\frac{1}{2^{n/2}} \left(|0\rangle + e^{2\pi i 0.j_1 j_2 \ldots j_n}|1\rangle\right) \left(|0\rangle + e^{2\pi i 0.j_2 \ldots j_n}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i 0.j_n}|1\rangle\right). \tag{5.17}$$
Swap operations (see Section 1.3.4 for a description of the circuit), omitted from Figure 5.1 for clarity, are then used to reverse the order of the qubits. After the swap operations, the state of the qubits is
$$\frac{1}{2^{n/2}} \left(|0\rangle + e^{2\pi i 0.j_n}|1\rangle\right) \left(|0\rangle + e^{2\pi i 0.j_{n-1} j_n}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i 0.j_1 j_2 \cdots j_n}|1\rangle\right). \tag{5.18}$$
Comparing with Equation (5.4) we see that this is the desired output from the quantum
Fourier transform. This construction also proves that the quantum Fourier transform is
unitary, since each gate in the circuit is unitary. An explicit example showing a circuit
for the quantum Fourier transform on three qubits is given in Box 5.1.
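The gate sequence just described can be simulated on a small state vector to confirm that it reproduces the definition (5.2). The following Python sketch is one possible way to do so. The helper names are illustrative, qubit 1 is taken to be the most significant bit of the basis label, and the final swaps are modelled by their net effect of reversing the qubit order rather than gate by gate.

```python
import cmath
import math

def apply_hadamard(state, n, q):
    """Hadamard on qubit q (1-indexed, qubit 1 = most significant bit)."""
    mask = 1 << (n - q)
    out = list(state)
    for i in range(len(state)):
        if not i & mask:
            a, b = state[i], state[i | mask]
            out[i] = (a + b) / math.sqrt(2)
            out[i | mask] = (a - b) / math.sqrt(2)
    return out

def apply_controlled_r(state, n, control, target, k):
    """Controlled-R_k: multiply by e^{2 pi i / 2^k} when both qubits are 1."""
    cmask, tmask = 1 << (n - control), 1 << (n - target)
    phase = cmath.exp(2j * math.pi / 2 ** k)
    return [amp * phase if (i & cmask and i & tmask) else amp
            for i, amp in enumerate(state)]

def reverse_qubits(state, n):
    """Net effect of the final swaps: reverse the order of the qubits."""
    out = [0j] * len(state)
    for i, amp in enumerate(state):
        out[int(format(i, f"0{n}b")[::-1], 2)] = amp
    return out

def qft_circuit(state, n):
    """Apply the circuit of Figure 5.1, followed by the qubit reversal."""
    for q in range(1, n + 1):
        state = apply_hadamard(state, n, q)
        for k in range(2, n - q + 2):            # R_2 through R_{n-q+1}
            state = apply_controlled_r(state, n, q + k - 1, q, k)
    return reverse_qubits(state, n)

def basis_state(j, n):
    """State vector of the computational basis state |j> on n qubits."""
    return [1 + 0j if i == j else 0j for i in range(2 ** n)]
```

Feeding in `basis_state(j, n)` should return amplitudes $e^{2\pi i jk/2^n}/2^{n/2}$ for each k, up to rounding error.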
Box 5.1: Three qubit quantum Fourier transform
For concreteness it may help to look at the explicit circuit for the three qubit quantum Fourier transform:
[Figure: circuit for the three qubit quantum Fourier transform.]
Recall that S and T are the phase and π/8 gates (see page xxiii). As a matrix the quantum Fourier transform in this instance may be written out explicitly, using $\omega = e^{2\pi i/8} = \sqrt{i}$, as
$$\frac{1}{\sqrt{8}} \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 & \omega^5 & \omega^6 & \omega^7 \\
1 & \omega^2 & \omega^4 & \omega^6 & 1 & \omega^2 & \omega^4 & \omega^6 \\
1 & \omega^3 & \omega^6 & \omega^1 & \omega^4 & \omega^7 & \omega^2 & \omega^5 \\
1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 & 1 & \omega^4 \\
1 & \omega^5 & \omega^2 & \omega^7 & \omega^4 & \omega^1 & \omega^6 & \omega^3 \\
1 & \omega^6 & \omega^4 & \omega^2 & 1 & \omega^6 & \omega^4 & \omega^2 \\
1 & \omega^7 & \omega^6 & \omega^5 & \omega^4 & \omega^3 & \omega^2 & \omega^1
\end{bmatrix}. \tag{5.19}$$
How many gates does this circuit use? We start by doing a Hadamard gate and n − 1 conditional rotations on the first qubit – a total of n gates. This is followed by a Hadamard gate and n − 2 conditional rotations on the second qubit, for a total of n + (n − 1) gates. Continuing in this way, we see that n + (n − 1) + · · · + 1 = n(n + 1)/2 gates are required, plus the gates involved in the swaps. At most n/2 swaps are required, and each swap can be accomplished using three controlled-NOT gates. Therefore, this circuit provides a $\Theta(n^2)$ algorithm for performing the quantum Fourier transform.
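The count can be tabulated directly. The short sketch below is an illustrative helper rather than anything defined in the text; it tallies the Hadamard gates, conditional rotations and swaps for an n qubit circuit, and assumes that reversing n qubits takes ⌊n/2⌋ swaps, each built from three controlled-NOT gates.

```python
def qft_gate_counts(n):
    """Gate counts for the n qubit quantum Fourier transform circuit."""
    hadamards = n
    controlled_rotations = n * (n - 1) // 2      # (n-1) + (n-2) + ... + 0
    swaps = n // 2                               # pairs exchanged to reverse the qubit order
    return {
        "hadamard": hadamards,
        "controlled_rotation": controlled_rotations,
        "swap": swaps,
        "total_before_swaps": hadamards + controlled_rotations,   # n(n+1)/2
        "total_with_swaps_as_cnots": hadamards + controlled_rotations + 3 * swaps,
    }
```

For n = 3 this gives three Hadamard gates, three conditional rotations and one swap, so six gates before the swap, in agreement with n(n + 1)/2.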
In contrast, the best classical algorithms for computing the discrete Fourier transform on $2^n$ elements are algorithms such as the Fast Fourier Transform (FFT), which compute the discrete Fourier transform using $\Theta(n2^n)$ gates. That is, it requires exponentially more operations to compute the Fourier transform on a classical computer than it does to implement the quantum Fourier transform on a quantum computer.
At face value this sounds terrific, since the Fourier transform is a crucial step in so many
real-world data processing applications. For example, in computer speech recognition,
the first step in phoneme recognition is to Fourier transform the digitized sound. Can
we use the quantum Fourier transform to speed up the computation of these Fourier
transforms? Unfortunately, the answer is that there is no known way to do this. The
problem is that the amplitudes in a quantum computer cannot be directly accessed by
measurement. Thus, there is no way of determining the Fourier transformed amplitudes
of the original state. Worse still, there is in general no way to efficiently prepare the
original state to be Fourier transformed. Thus, finding uses for the quantum Fourier
transform is more subtle than we might have hoped. In this and the next chapter we
develop several algorithms based upon a more subtle application of the quantum Fourier
transform.
Exercise 5.3: (Classical fast Fourier transform) Suppose we wish to perform a Fourier transform of a vector containing $2^n$ complex numbers on a classical computer. Verify that the straightforward method for performing the Fourier transform, based upon direct evaluation of Equation (5.1), requires $\Theta(2^{2n})$ elementary arithmetic operations. Find a method for reducing this to $\Theta(n2^n)$ operations, based upon Equation (5.4).
Exercise 5.4: Give a decomposition of the controlled-$R_k$ gate into single qubit and controlled-NOT gates.
Exercise 5.5: Give a quantum circuit to perform the inverse quantum Fourier
transform.
Exercise 5.6: (Approximate quantum Fourier transform) The quantum circuit
construction of the quantum Fourier transform apparently requires gates of
exponential precision in the number of qubits used. However, such precision is
never required in any quantum circuit of polynomial size. For example, let U be
the ideal quantum Fourier transform on n qubits, and V be the transform which
results if the controlled-Rk gates are performed to a precision Δ = 1/p(n) for
some polynomial p(n). Show that the error $E(U, V) \equiv \max_{|\psi\rangle} \|(U - V)|\psi\rangle\|$ scales as $\Theta(n^2/p(n))$, and thus polynomial precision in each gate is sufficient to
guarantee polynomial accuracy in the output state.
5.2 Phase estimation
The Fourier transform is the key to a general procedure known as phase estimation,
which in turn is the key for many quantum algorithms. Suppose a unitary operator U
has an eigenvector $|u\rangle$ with eigenvalue $e^{2\pi i\varphi}$, where the value of ϕ is unknown. The goal of the phase estimation algorithm is to estimate ϕ. To perform the estimation we assume that we have available black boxes (sometimes known as oracles) capable of preparing the state $|u\rangle$ and performing the controlled-$U^{2^j}$ operation, for suitable non-negative integers j. The use of black boxes indicates that the phase estimation procedure is not a complete
quantum algorithm in its own right. Rather, you should think of phase estimation as a
kind of ‘subroutine’ or ‘module’ that, when combined with other subroutines, can be
used to perform interesting computational tasks. In specific applications of the phase
estimation procedure we shall do exactly this, describing how these black box operations
are to be performed, and combining them with the phase estimation procedure to do
genuinely useful tasks. For the moment, though, we will continue to imagine them as
black boxes.
The quantum phase estimation procedure uses two registers. The first register contains t qubits initially in the state $|0\rangle$. How we choose t depends on two things: the number
of digits of accuracy we wish to have in our estimate for ϕ, and with what probability
we wish the phase estimation procedure to be successful. The dependence of t on these
quantities emerges naturally from the following analysis.
The second register begins in the state $|u\rangle$, and contains as many qubits as is necessary to store $|u\rangle$. Phase estimation is performed in two stages. First, we apply the circuit shown in Figure 5.2. The circuit begins by applying a Hadamard transform to the first register, followed by application of controlled-U operations on the second register, with U raised to successive powers of two. The final state of the first register is easily seen to be:
$$\frac{1}{2^{t/2}} \left(|0\rangle + e^{2\pi i 2^{t-1}\varphi}|1\rangle\right) \left(|0\rangle + e^{2\pi i 2^{t-2}\varphi}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i 2^{0}\varphi}|1\rangle\right) = \frac{1}{2^{t/2}} \sum_{k=0}^{2^t-1} e^{2\pi i \varphi k} |k\rangle. \tag{5.20}$$
We omit the second register from this description, since it stays in the state $|u\rangle$ throughout the computation.
[Figure: circuit diagram. Each of the t qubits of the first register starts in $|0\rangle$ and passes through a Hadamard gate; these qubits then control the operations $U^{2^0}, U^{2^1}, U^{2^2}, \ldots, U^{2^{t-1}}$ applied to the second register, which starts and ends in $|u\rangle$. The first register qubits leave in the states $|0\rangle + e^{2\pi i(2^{t-1}\varphi)}|1\rangle$, …, $|0\rangle + e^{2\pi i(2^{1}\varphi)}|1\rangle$, $|0\rangle + e^{2\pi i(2^{0}\varphi)}|1\rangle$.]
Figure 5.2. The first stage of the phase estimation procedure. Normalization factors of $1/\sqrt{2}$ have been omitted, on the right.
Exercise 5.7: Additional insight into the circuit in Figure 5.2 may be obtained by
showing, as you should now do, that the effect of the sequence of controlled-U
operations like that in Figure 5.2 is to take the state $|j\rangle|u\rangle$ to $|j\rangle U^j|u\rangle$. (Note that this does not depend on $|u\rangle$ being an eigenstate of U.)
The second stage of phase estimation is to apply the inverse quantum Fourier transform
on the first register. This is obtained by reversing the circuit for the quantum Fourier
transform in the previous section (Exercise 5.5), and can be done in $\Theta(t^2)$ steps. The
third and final stage of phase estimation is to read out the state of the first register by
doing a measurement in the computational basis. We will show that this provides a pretty
good estimate of ϕ. An overall schematic of the algorithm is shown in Figure 5.3.
To sharpen our intuition as to why phase estimation works, suppose ϕ may be expressed exactly in t bits, as $\varphi = 0.\varphi_1 \ldots \varphi_t$. Then the state (5.20) resulting from the first stage of phase estimation may be rewritten
$$\frac{1}{2^{t/2}} \left(|0\rangle + e^{2\pi i 0.\varphi_t}|1\rangle\right) \left(|0\rangle + e^{2\pi i 0.\varphi_{t-1}\varphi_t}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i 0.\varphi_1 \varphi_2 \cdots \varphi_t}|1\rangle\right). \tag{5.21}$$
The second stage of phase estimation is to apply the inverse quantum Fourier transform. But comparing the previous equation with the product form for the Fourier transform, Equation (5.4), we see that the output state from the second stage is the product state $|\varphi_1 \ldots \varphi_t\rangle$. A measurement in the computational basis therefore gives us ϕ exactly!
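The exact case is easy to reproduce numerically. The sketch below, with illustrative function names, starts from the first register state of Equation (5.20), applies the inverse of the transform (5.2) by direct summation, and returns the measurement probabilities. It treats the controlled-U stage as already done, so only the first register is represented.

```python
import cmath
import math

def first_stage_state(phi, t):
    """Amplitudes of the first register after stage one, Equation (5.20)."""
    N = 2 ** t
    return [cmath.exp(2j * math.pi * phi * k) / math.sqrt(N) for k in range(N)]

def inverse_qft(state):
    """Inverse of (5.2): amplitude of |l> is sum_k e^{-2 pi i kl/N} x_k / sqrt(N)."""
    N = len(state)
    return [
        sum(state[k] * cmath.exp(-2j * math.pi * k * l / N) for k in range(N)) / math.sqrt(N)
        for l in range(N)
    ]

def outcome_probabilities(phi, t):
    """Probability of each measurement outcome l = 0, ..., 2^t - 1."""
    return [abs(a) ** 2 for a in inverse_qft(first_stage_state(phi, t))]
```

With ϕ = 0.101 in binary (that is, 0.625) and t = 3, all of the probability should sit on the outcome l = 5, whose binary label is 101. For a ϕ with no exact t bit expansion the probabilities spread over several outcomes while still summing to one.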
Figure 5.3. Schematic of the overall phase estimation procedure. The top t qubits (the ‘/’ denotes a bundle of wires, as usual) are the first register, and the bottom qubits are the second register, numbering as many as required to perform U. $|u\rangle$ is an eigenstate of U with eigenvalue $e^{2\pi i\varphi}$. The output of the measurement is an approximation to ϕ accurate to $t - \left\lceil \log\left(2 + \frac{1}{2\epsilon}\right) \right\rceil$ bits, with probability of success at least $1 - \epsilon$.
Summarizing, the phase estimation algorithm allows one to estimate the phase $\varphi$ of an
eigenvalue of a unitary operator $U$, given the corresponding eigenvector $|u\rangle$. An essential
feature at the heart of this procedure is the ability of the inverse Fourier transform to
perform the transformation
\[
\frac{1}{2^{t/2}} \sum_{j=0}^{2^t-1} e^{2\pi i\varphi j}|j\rangle|u\rangle \to |\tilde\varphi\rangle|u\rangle, \tag{5.22}
\]
where $|\tilde\varphi\rangle$ denotes a state which is a good estimator for $\varphi$ when measured.
5.2.1 Performance and requirements
The above analysis applies to the ideal case, where ϕ can be written exactly with a t
bit binary expansion. What happens when this is not the case? It turns out that the
procedure we have described will produce a pretty good approximation to ϕ with high
probability, as foreshadowed by the notation used in (5.22). Showing this requires some
careful manipulations.
Let $b$ be the integer in the range $0$ to $2^t - 1$ such that $b/2^t = 0.b_1 \ldots b_t$ is the best $t$ bit
approximation to $\varphi$ which is less than $\varphi$. That is, the difference $\delta \equiv \varphi - b/2^t$ between
$\varphi$ and $b/2^t$ satisfies $0 \leq \delta \leq 2^{-t}$. We aim to show that the observation at the end of
the phase estimation procedure produces a result which is close to $b$, and thus enables us
to estimate $\varphi$ accurately, with high probability. Applying the inverse quantum Fourier
transform to the state (5.20) produces the state
\[
\frac{1}{2^t} \sum_{k,l=0}^{2^t-1} e^{-2\pi i k l/2^t}\, e^{2\pi i\varphi k}|l\rangle. \tag{5.23}
\]
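Let $\alpha_l$ be the amplitude of $|(b + l) \pmod{2^t}\rangle$,
\[
\alpha_l \equiv \frac{1}{2^t} \sum_{k=0}^{2^t-1} \left(e^{2\pi i(\varphi - (b+l)/2^t)}\right)^k. \tag{5.24}
\]
This is the sum of a geometric series, so
\begin{align}
\alpha_l &= \frac{1}{2^t}\left(\frac{1 - e^{2\pi i(2^t\varphi - (b+l))}}{1 - e^{2\pi i(\varphi - (b+l)/2^t)}}\right) \tag{5.25} \\
&= \frac{1}{2^t}\left(\frac{1 - e^{2\pi i(2^t\delta - l)}}{1 - e^{2\pi i(\delta - l/2^t)}}\right). \tag{5.26}
\end{align}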
Suppose the outcome of the final measurement is $m$. We aim to bound the probability of
obtaining a value of $m$ such that $|m - b| > e$, where $e$ is a positive integer characterizing
our desired tolerance to error. The probability of observing such an $m$ is given by
\[
p(|m - b| > e) = \sum_{-2^{t-1} < l \leq -(e+1)} |\alpha_l|^2 + \sum_{e+1 \leq l \leq 2^{t-1}} |\alpha_l|^2. \tag{5.27}
\]
But for any real $\theta$, $|1 - \exp(i\theta)| \leq 2$, so
\[
|\alpha_l| \leq \frac{2}{2^t\,|1 - e^{2\pi i(\delta - l/2^t)}|}. \tag{5.28}
\]
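By elementary geometry or calculus $|1 - \exp(i\theta)| \geq 2|\theta|/\pi$ whenever $-\pi \leq \theta \leq \pi$. But
when $-2^{t-1} < l \leq 2^{t-1}$ we have $-\pi \leq 2\pi(\delta - l/2^t) \leq \pi$. Thus
\[
|\alpha_l| \leq \frac{1}{2^{t+1}\,|\delta - l/2^t|}. \tag{5.29}
\]
Combining (5.27) and (5.29) gives
\[
p(|m - b| > e) \leq \frac{1}{4}\left[\sum_{l=-2^{t-1}+1}^{-(e+1)} \frac{1}{(l - 2^t\delta)^2} + \sum_{l=e+1}^{2^{t-1}} \frac{1}{(l - 2^t\delta)^2}\right]. \tag{5.30}
\]
Recalling that $0 \leq 2^t\delta \leq 1$, we obtain
\begin{align}
p(|m - b| > e) &\leq \frac{1}{4}\left[\sum_{l=-2^{t-1}+1}^{-(e+1)} \frac{1}{l^2} + \sum_{l=e+1}^{2^{t-1}} \frac{1}{(l-1)^2}\right] \tag{5.31} \\
&\leq \frac{1}{2} \sum_{l=e}^{2^{t-1}-1} \frac{1}{l^2} \tag{5.32} \\
&\leq \frac{1}{2} \int_{e-1}^{2^{t-1}-1} \frac{dl}{l^2} \tag{5.33} \\
&\leq \frac{1}{2(e-1)}. \tag{5.34}
\end{align}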
Suppose we wish to approximate $\varphi$ to an accuracy $2^{-n}$, that is, we choose $e = 2^{t-n} - 1$.
By making use of $t = n + p$ qubits in the phase estimation algorithm we see from (5.34)
that the probability of obtaining an approximation correct to this accuracy is at least
$1 - 1/(2(2^p - 2))$. Thus to successfully obtain $\varphi$ accurate to $n$ bits with probability of
success at least $1 - \epsilon$ we choose
\[
t = n + \left\lceil \log\left(2 + \frac{1}{2\epsilon}\right) \right\rceil. \tag{5.35}
\]
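The choice of $t$ in (5.35) can be tried out numerically. The sketch below is an illustration under stated assumptions: it uses NumPy, takes the logarithm to base 2, and the helper names `qubits_needed` and `success_probability` are hypothetical. It computes $t$ for a target accuracy and error probability, then sums the outcome probabilities over the window $|m - b| \leq e$ with $e = 2^{t-n} - 1$ for a phase that is not exactly representable.

```python
import math
import numpy as np

def qubits_needed(n, eps):
    """t = n + ceil(log2(2 + 1/(2 eps))), following Equation (5.35)."""
    return n + math.ceil(math.log2(2 + 1 / (2 * eps)))

def success_probability(phi, t, n):
    """Probability that the measured m satisfies |m - b| <= e (mod 2^t), e = 2^(t-n) - 1."""
    dim = 2 ** t
    k = np.arange(dim)
    # Amplitudes after the inverse Fourier transform, as in Equation (5.23).
    amplitudes = np.fft.fft(np.exp(2j * np.pi * phi * k)) / dim
    probs = np.abs(amplitudes) ** 2
    b = math.floor(phi * dim)          # best t-bit approximation from below
    e = 2 ** (t - n) - 1
    window = [(b + l) % dim for l in range(-e, e + 1)]
    return float(probs[window].sum())

n, eps = 3, 0.1
t = qubits_needed(n, eps)
p_ok = success_probability(1 / 3, t, n)
print(t, p_ok >= 1 - eps)
```

The bound (5.34) is not tight, so the observed success probability is typically well above $1 - \epsilon$.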
In order to make use of the phase estimation algorithm, we need to be able to prepare an
eigenstate $|u\rangle$ of $U$. What if we do not know how to prepare such an eigenstate? Suppose
that we prepare some other state $|\psi\rangle$ in place of $|u\rangle$. Expanding this state in terms of
eigenstates $|u\rangle$ of $U$ gives $|\psi\rangle = \sum_u c_u|u\rangle$. Suppose the eigenstate $|u\rangle$ has eigenvalue
$e^{2\pi i\varphi_u}$. Intuitively, the result of running the phase estimation algorithm will be to give
as output a state close to $\sum_u c_u|\tilde\varphi_u\rangle|u\rangle$, where $\tilde\varphi_u$ is a pretty good approximation to the
phase $\varphi_u$. Therefore, we expect that reading out the first register will give us a good
approximation to $\varphi_u$, where $u$ is chosen at random with probability $|c_u|^2$. Making this
argument rigorous is left for Exercise 5.8. This procedure allows us to avoid preparing
a (possibly unknown) eigenstate, at the cost of introducing some additional randomness
into the algorithm.
Exercise 5.8: Suppose the phase estimation algorithm takes the state $|0\rangle|u\rangle$ to the
state $|\tilde\varphi_u\rangle|u\rangle$, so that given the input $|0\rangle\left(\sum_u c_u|u\rangle\right)$, the algorithm outputs
$\sum_u c_u|\tilde\varphi_u\rangle|u\rangle$. Show that if $t$ is chosen according to (5.35), then the probability
for measuring $\varphi_u$ accurate to $n$ bits at the conclusion of the phase estimation
algorithm is at least $|c_u|^2(1 - \epsilon)$.
Why is phase estimation interesting? For its own sake, phase estimation solves a problem which is both non-trivial and interesting from a physical point of view: how to
estimate the eigenvalue associated to a given eigenvector of a unitary operator. Its real
use, though, comes from the observation that other interesting problems can be reduced
to phase estimation, as will be shown in subsequent sections. The phase estimation algorithm is summarized below.
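Algorithm: Quantum phase estimation
Inputs: (1) A black box which performs a controlled-$U^j$ operation, for integer $j$,
(2) an eigenstate $|u\rangle$ of $U$ with eigenvalue $e^{2\pi i\varphi_u}$, and (3) $t = n + \left\lceil \log\left(2 + \frac{1}{2\epsilon}\right) \right\rceil$
qubits initialized to $|0\rangle$.
Outputs: An $n$-bit approximation $\tilde\varphi_u$ to $\varphi_u$.
Runtime: $O(t^2)$ operations and one call to controlled-$U^j$ black box. Succeeds
with probability at least $1 - \epsilon$.
Procedure:
1. $|0\rangle|u\rangle$    (initial state)
2. $\to \frac{1}{\sqrt{2^t}} \sum_{j=0}^{2^t-1} |j\rangle|u\rangle$    (create superposition)
3. $\to \frac{1}{\sqrt{2^t}} \sum_{j=0}^{2^t-1} |j\rangle U^j|u\rangle$    (apply black box)
   $= \frac{1}{\sqrt{2^t}} \sum_{j=0}^{2^t-1} e^{2\pi i j\varphi_u}|j\rangle|u\rangle$    (result of black box)
4. $\to |\tilde\varphi_u\rangle|u\rangle$    (apply inverse Fourier transform)
5. $\to \tilde\varphi_u$    (measure first register)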
Exercise 5.9: Let $U$ be a unitary transform with eigenvalues $\pm 1$, which acts on a state
$|\psi\rangle$. Using the phase estimation procedure, construct a quantum circuit to
collapse $|\psi\rangle$ into one or the other of the two eigenspaces of $U$, giving also a
classical indicator as to which space the final state is in. Compare your result
with Exercise 4.34.
5.3 Applications: order-finding and factoring
The phase estimation procedure can be used to solve a variety of interesting problems. We
now describe two of the most interesting of these problems: the order-finding problem,
and the factoring problem. These two problems are, in fact, equivalent to one another, so
in Section 5.3.1 we explain a quantum algorithm for solving the order-finding problem,
and in Section 5.3.2 we explain how the order-finding problem implies the ability to
factor as well.
To understand the quantum algorithms for factoring and order-finding requires a
little background in number theory. All the required materials are collected together in
Appendix 4. The description we give over the next two sections focuses on the quantum
aspects of the problem, and requires only a little familiarity with modular arithmetic to
be readable. Detailed proofs of the number-theoretic results we quote here may be found
in Appendix 4.
The fast quantum algorithms for order-finding and factoring are interesting for at least
three reasons. First, and most important in our opinion, they provide evidence for the idea
that quantum computers may be inherently more powerful than classical computers, and
provide a credible challenge to the strong Church–Turing thesis. Second, both problems
are of sufficient intrinsic worth to justify interest in any novel algorithm, be it classical
or quantum. Third, and most important from a practical standpoint, efficient algorithms
for order-finding and factoring can be used to break the RSA public-key cryptosystem
(Appendix 5).
5.3.1 Application: order-finding
For positive integers $x$ and $N$, $x < N$, with no common factors, the order of $x$ modulo $N$
is defined to be the least positive integer, $r$, such that $x^r = 1 \pmod N$. The order-finding
problem is to determine the order for some specified $x$ and $N$. Order-finding is believed
to be a hard problem on a classical computer, in the sense that no algorithm is known
to solve the problem using resources polynomial in the $O(L)$ bits needed to specify the
problem, where $L \equiv \lceil \log(N) \rceil$ is the number of bits needed to specify $N$. In this section
we explain how phase estimation may be used to obtain an efficient quantum algorithm
for order-finding.
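To make the definition concrete, here is a minimal classical sketch that finds the order by trying successive powers. The function name `order` is an illustrative choice; the loop can take up to $r$ steps, which is exponential in $L$ in the worst case, so this only serves to check small examples.

```python
from math import gcd

def order(x, N):
    """Least positive integer r with x**r = 1 (mod N), found by brute force."""
    if gcd(x, N) != 1:
        raise ValueError("x and N must have no common factors")
    r, y = 1, x % N
    while y != 1:
        y = (y * x) % N   # y now holds x**(r+1) mod N
        r += 1
    return r

print(order(5, 21), order(7, 15))
```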
Exercise 5.10: Show that the order of x = 5 modulo N = 21 is 6.
Exercise 5.11: Show that the order of x satisfies r ≤ N .
The quantum algorithm for order-finding is just the phase estimation algorithm applied
to the unitary operator
\[
U|y\rangle \equiv |xy \pmod N\rangle, \tag{5.36}
\]
with $y \in \{0, 1\}^L$. (Note that here and below, when $N \leq y \leq 2^L - 1$, we use the
convention that $xy \pmod N$ is just $y$ again. That is, $U$ only acts non-trivially when
$0 \leq y \leq N - 1$.) A simple calculation shows that the states defined by
\[
|u_s\rangle \equiv \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} \exp\left[\frac{-2\pi i s k}{r}\right] |x^k \bmod N\rangle, \tag{5.37}
\]
for integer $0 \leq s \leq r - 1$ are eigenstates of $U$, since
\begin{align}
U|u_s\rangle &= \frac{1}{\sqrt{r}} \sum_{k=0}^{r-1} \exp\left[\frac{-2\pi i s k}{r}\right] |x^{k+1} \bmod N\rangle \tag{5.38} \\
&= \exp\left[\frac{2\pi i s}{r}\right] |u_s\rangle. \tag{5.39}
\end{align}
Using the phase estimation procedure allows us to obtain, with high accuracy, the corresponding
eigenvalues $\exp(2\pi i s/r)$, from which we can obtain the order $r$ with a little
bit more work.
Exercise 5.12: Show that U is unitary (Hint: x is co-prime to N , and therefore has
an inverse modulo N ).
There are two important requirements for us to be able to use the phase estimation
procedure: we must have efficient procedures to implement a controlled-$U^{2^j}$ operation
for any integer $j$, and we must be able to efficiently prepare an eigenstate $|u_s\rangle$ with a
non-trivial eigenvalue, or at least a superposition of such eigenstates. The first requirement
is satisfied by using a procedure known as modular exponentiation, with which we
can implement the entire sequence of controlled-$U^{2^j}$ operations applied by the phase
estimation procedure using $O(L^3)$ gates, as described in Box 5.2.
The second requirement is a little trickier: preparing $|u_s\rangle$ requires that we know $r$, so
this is out of the question. Fortunately, there is a clever observation which allows us to
circumvent the problem of preparing $|u_s\rangle$, which is that
\[
\frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} |u_s\rangle = |1\rangle. \tag{5.44}
\]
In performing the phase estimation procedure, if we use $t = 2L + 1 + \left\lceil \log\left(2 + \frac{1}{2\epsilon}\right) \right\rceil$
qubits in the first register (referring to Figure 5.3), and prepare the second register in
the state $|1\rangle$ (which is trivial to construct), it follows that for each $s$ in the range 0
through $r - 1$, we will obtain an estimate of the phase $\varphi \approx s/r$ accurate to $2L + 1$ bits,
with probability at least $(1 - \epsilon)/r$. The order-finding algorithm is schematically depicted
in Figure 5.4.
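Both the eigenvalue relation for the states $|u_s\rangle$ and the observation that their uniform superposition is $|1\rangle$ can be checked on a small instance. The following is a sketch only, assuming NumPy; the function name `order_finding_eigenstates` and the instance $x = 7$, $N = 15$, $L = 4$ are illustrative. It builds $U$ as a permutation matrix (acting as the identity on $y \geq N$, per the stated convention) and the vectors $|u_s\rangle$ from the orbit $1, x, x^2, \ldots$ modulo $N$.

```python
import numpy as np

def order_finding_eigenstates(x, N, L):
    """Return (U, r, [u_0, ..., u_{r-1}]) for U|y> = |x y mod N> on L qubits."""
    dim = 2 ** L
    U = np.zeros((dim, dim))
    for y in range(dim):
        target = (x * y) % N if y < N else y   # identity outside 0..N-1
        U[target, y] = 1.0
    # Orbit 1, x, x^2, ... (mod N); its length is the order r.
    orbit = [1]
    while (orbit[-1] * x) % N != 1:
        orbit.append((orbit[-1] * x) % N)
    r = len(orbit)
    states = []
    for s in range(r):
        u = np.zeros(dim, dtype=complex)
        for k, basis_state in enumerate(orbit):
            u[basis_state] = np.exp(-2j * np.pi * s * k / r) / np.sqrt(r)
        states.append(u)
    return U, r, states

U, r, states = order_finding_eigenstates(7, 15, 4)
eigen_ok = all(
    np.allclose(U @ u, np.exp(2j * np.pi * s / r) * u) for s, u in enumerate(states)
)
ket_one = np.zeros(16)
ket_one[1] = 1.0
sum_ok = np.allclose(sum(states) / np.sqrt(r), ket_one)
print(r, eigen_ok, sum_ok)
```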
Exercise 5.13: Prove (5.44). (Hint: $\sum_{s=0}^{r-1} \exp(-2\pi i s k/r) = r\delta_{k0}$.) In fact, prove that
\[
\frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} e^{2\pi i s k/r} |u_s\rangle = |x^k \bmod N\rangle. \tag{5.45}
\]
Exercise 5.14: The quantum state produced in the order-finding algorithm, before
the inverse Fourier transform, is
\[
|\psi\rangle = \sum_{j=0}^{2^t-1} |j\rangle U^j|1\rangle = \sum_{j=0}^{2^t-1} |j\rangle|x^j \bmod N\rangle, \tag{5.46}
\]
if we initialize the second register as $|1\rangle$. Show that the same state is obtained if
we replace $U^j$ with a different unitary transform $V$, which computes
\[
V|j\rangle|k\rangle = |j\rangle|k + x^j \bmod N\rangle, \tag{5.47}
\]
and start the second register in the state $|0\rangle$. Also show how to construct $V$ using
$O(L^3)$ gates.
Box 5.2: Modular exponentiation
How can we compute the sequence of controlled-$U^{2^j}$ operations used by the phase
estimation procedure as part of the order-finding algorithm? That is, we wish to
compute the transformation
\begin{align}
|z\rangle|y\rangle &\to |z\rangle\, U^{z_t 2^{t-1}} \cdots U^{z_1 2^0}|y\rangle \tag{5.40} \\
&= |z\rangle\,|x^{z_t 2^{t-1}} \times \cdots \times x^{z_1 2^0}\, y \pmod N\rangle \tag{5.41} \\
&= |z\rangle\,|x^z y \pmod N\rangle. \tag{5.42}
\end{align}
Thus the sequence of controlled-$U^{2^j}$ operations used in phase estimation is equivalent
to multiplying the contents of the second register by the modular exponential
$x^z \pmod N$, where $z$ is the contents of the first register. This operation may be
accomplished easily using the techniques of reversible computation. The basic idea
is to reversibly compute the function $x^z \pmod N$ of $z$ in a third register, and then
to reversibly multiply the contents of the second register by $x^z \pmod N$, using the
trick of uncomputation to erase the contents of the third register upon completion.
The algorithm for computing the modular exponential has two stages. The first stage
uses modular multiplication to compute $x^2 \pmod N$, by squaring $x$ modulo $N$, then
computes $x^4 \pmod N$ by squaring $x^2 \pmod N$, and continues in this way, computing
$x^{2^j} \pmod N$ for all $j$ up to $t - 1$. We use $t = 2L + 1 + \lceil \log(2 + 1/(2\epsilon)) \rceil = O(L)$,
so a total of $t - 1 = O(L)$ squaring operations is performed at a cost of $O(L^2)$
each (this cost assumes the circuit used to do the squaring implements the familiar
algorithm we all learn as children for multiplication), for a total cost of $O(L^3)$ for
the first stage. The second stage of the algorithm is based upon the observation
we’ve already noted,
\[
x^z \pmod N = \left(x^{z_t 2^{t-1}} \pmod N\right)\left(x^{z_{t-1} 2^{t-2}} \pmod N\right) \cdots \left(x^{z_1 2^0} \pmod N\right). \tag{5.43}
\]
Performing $t - 1$ modular multiplications with a cost $O(L^2)$ each, we see that this
product can be computed using $O(L^3)$ gates. This is sufficiently efficient for our
purposes, but more efficient algorithms are possible based on more efficient algorithms
for multiplication (see ‘History and further reading’). Using the techniques
of Section 3.2.5, it is now straightforward to construct a reversible circuit with a
$t$ bit register and an $L$ bit register which, when started in the state $(z, y)$ outputs
$(z, x^z y \pmod N)$, using $O(L^3)$ gates, which can be translated into a quantum circuit
using $O(L^3)$ gates computing the transformation $|z\rangle|y\rangle \to |z\rangle|x^z y \pmod N\rangle$.
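The two-stage modular exponentiation described in Box 5.2 can be sketched classically as follows. This is an ordinary (irreversible) Python illustration of the arithmetic only, not the reversible circuit; the function name `modexp_two_stage` is hypothetical, and bit $j$ of `z` (counting from 0) plays the role of $z_{j+1}$ in the text.

```python
def modexp_two_stage(x, z, N, t):
    """Compute x**z mod N for a t-bit exponent z by squaring, then multiplying."""
    # Stage 1: squares[j] = x**(2**j) mod N for j = 0, ..., t-1 (t - 1 squarings).
    squares = [x % N]
    for _ in range(t - 1):
        squares.append((squares[-1] * squares[-1]) % N)
    # Stage 2: multiply together the squares selected by the bits of z.
    result = 1
    for j in range(t):
        if (z >> j) & 1:
            result = (result * squares[j]) % N
    return result

print(modexp_two_stage(7, 11, 15, 4))
```

Python's built-in `pow(x, z, N)` computes the same quantity and is a convenient cross-check.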
[Circuit diagram: Register 1 ($t$ qubits, initialized to $|0\rangle$) passes through $H^{\otimes t}$ and, holding $|j\rangle$, controls the
operation $x^j \bmod N$ on Register 2 ($L$ qubits, initialized to $|1\rangle$); Register 1 is then acted on by $FT^\dagger$ and measured.]
Figure 5.4. Quantum circuit for the order-finding algorithm. The second register is shown as being initialized to
the $|1\rangle$ state, but if the method of Exercise 5.14 is used, it can be initialized to $|0\rangle$ instead. This circuit can also be
used for factoring, using the reduction given in Section 5.3.2.
The continued fraction expansion
The reduction of order-finding to phase estimation is completed by describing how to
obtain the desired answer, r, from the result of the phase estimation algorithm, ϕ ≈ s/r.
We only know ϕ to 2L + 1 bits, but we also know a priori that it is a rational number
– the ratio of two bounded integers – and if we could compute the nearest such fraction
to ϕ we might obtain r.
Remarkably, there is an algorithm which accomplishes this task efficiently, known as
the continued fractions algorithm. An example of how this works is described in Box 5.3.
The reason this algorithm satisfies our needs is the following theorem, which is proved
in Appendix 4:
Theorem 5.1: Suppose $s/r$ is a rational number such that
\[
\left|\frac{s}{r} - \varphi\right| \leq \frac{1}{2r^2}. \tag{5.48}
\]
Then $s/r$ is a convergent of the continued fraction for $\varphi$, and thus can be
computed in $O(L^3)$ operations using the continued fractions algorithm.
Since $\varphi$ is an approximation of $s/r$ accurate to $2L + 1$ bits, it follows that $|s/r - \varphi| \leq
2^{-2L-1} \leq 1/(2r^2)$, since $r \leq N \leq 2^L$. Thus, the theorem applies.
Summarizing, given $\varphi$ the continued fractions algorithm efficiently produces numbers
$s'$ and $r'$ with no common factor, such that $s'/r' = s/r$. The number $r'$ is our candidate
for the order. We can check to see whether it is the order by calculating $x^{r'} \bmod N$, and
seeing if the result is 1. If so, then $r'$ is the order of $x$ modulo $N$, and we are done!
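The post-processing just described can be sketched in a few lines. This is a minimal illustration under stated assumptions: the measured outcome is an integer $m$ with $\varphi = m/2^t$, and the helper names `continued_fraction`, `convergents` and `candidate_order` are hypothetical. The last function returns the first convergent denominator $r' \leq N$ with $x^{r'} = 1 \pmod N$, or `None` if no convergent passes the check, in which case the run has to be repeated.

```python
from fractions import Fraction

def continued_fraction(num, den):
    """Terms [a0, a1, ..., aM] of num/den, by repeatedly splitting and inverting."""
    terms = []
    while den != 0:
        a, rem = divmod(num, den)   # split into integer part and remainder
        terms.append(a)
        num, den = den, rem         # invert the fractional part
    return terms

def convergents(terms):
    """Yield the convergents [a0], [a0, a1], ... as exact fractions."""
    p_prev, q_prev = 1, 0
    p, q = terms[0], 1
    yield Fraction(p, q)
    for a in terms[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        yield Fraction(p, q)

def candidate_order(m, t, x, N):
    """First convergent denominator r' <= N of m / 2**t with x**r' = 1 (mod N)."""
    for c in convergents(continued_fraction(m, 2 ** t)):
        r_prime = c.denominator
        if r_prime <= N and pow(x, r_prime, N) == 1:
            return r_prime
    return None

print(continued_fraction(31, 13))
print(candidate_order(1536, 11, 7, 15))
```

The first call expands 31/13, and the second treats 1536/2048 as a measured phase for $x = 7$, $N = 15$.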
Box 5.3: The continued fractions algorithm
The idea of the continued fractions algorithm is to describe real numbers in terms
of integers alone, using expressions of the form
\[
[a_0, \ldots, a_M] \equiv a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\cdots + \cfrac{1}{a_M}}}}, \tag{5.49}
\]
where $a_0, \ldots, a_M$ are positive integers. (For applications to quantum computing it
is convenient to allow $a_0 = 0$ as well.) We define the $m$th convergent ($0 \leq m \leq M$)
to this continued fraction to be $[a_0, \ldots, a_m]$. The continued fractions algorithm
is a method for determining the continued fraction expansion of an arbitrary real
number. It is easily understood by example. Suppose we are trying to decompose
31/13 as a continued fraction. The first step of the continued fractions algorithm
is to split 31/13 into its integer and fractional part,
\[
\frac{31}{13} = 2 + \frac{5}{13}. \tag{5.50}
\]
Next we invert the fractional part, obtaining
\[
\frac{31}{13} = 2 + \cfrac{1}{\frac{13}{5}}. \tag{5.51}
\]
These steps (split then invert) are now applied to 13/5, giving
\[
\frac{31}{13} = 2 + \cfrac{1}{2 + \frac{3}{5}} = 2 + \cfrac{1}{2 + \cfrac{1}{\frac{5}{3}}}. \tag{5.52}
\]
Next we split and invert 5/3:
\[
\frac{31}{13} = 2 + \cfrac{1}{2 + \cfrac{1}{1 + \frac{2}{3}}} = 2 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{\frac{3}{2}}}}. \tag{5.53}
\]
The decomposition into a continued fraction now terminates, since
\[
\frac{3}{2} = 1 + \frac{1}{2} \tag{5.54}
\]
may be written with a 1 in the numerator without any need to invert, giving a final
continued fraction representation of 31/13 as
\[
\frac{31}{13} = 2 + \cfrac{1}{2 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{2}}}}. \tag{5.55}
\]
It’s clear that the continued fractions algorithm terminates after a finite number
of ‘split and invert’ steps for any rational number, since the numerators which
appear (31, 5, 3, 2, 1 in the example) are strictly decreasing. How quickly does this
termination occur? It turns out that if $\varphi = s/r$ is a rational number, and $s$ and $r$
are $L$ bit integers, then the continued fraction expansion for $\varphi$ can be computed
using $O(L^3)$ operations: $O(L)$ ‘split and invert’ steps, each using $O(L^2)$ gates for
elementary arithmetic.
Performance
How can the order-finding algorithm fail? There are two possibilities. First, the phase
estimation procedure might produce a bad estimate to $s/r$. This occurs with probability
at most $\epsilon$, and can be made small with a negligible increase in the size of the circuit.
More seriously, it might be that $s$ and $r$ have a common factor, in which case the
number $r'$ returned by the continued fractions algorithm will be a factor of $r$, and not $r$ itself.
Fortunately, there are at least three ways around this problem.
Perhaps the most straightforward way is to note that for randomly chosen $s$ in the
range 0 through $r - 1$, it’s actually pretty likely that $s$ and $r$ are co-prime, in which
case the continued fractions algorithm must return $r$. To see that this is the case, note
that by Problem 4.1 on page 638 the number of prime numbers less than $r$ is at least
$r/(2 \log r)$, and thus the chance that $s$ is prime (and therefore, co-prime to $r$) is at least
$1/(2 \log(r)) > 1/(2 \log(N))$. Thus, repeating the algorithm $2 \log(N)$ times we will, with
high probability, observe a phase $s/r$ such that $s$ and $r$ are co-prime, and therefore the
continued fractions algorithm produces $r$, as desired.
A second method is to note that if $r' \neq r$, then $r'$ is guaranteed to be a factor of $r$,
unless $s = 0$, which possibility occurs with probability $1/r \leq 1/2$, and which can be
discounted further by a few repetitions. Suppose we replace $a$ (the element whose order we
seek, called $x$ above) by $a' \equiv a^{r'} \pmod N$. Then
the order of $a'$ is $r/r'$. We can now repeat the algorithm, and try to compute the order
of $a'$, which, if we succeed, allows us to compute the order of $a$, since $r = r' \times r/r'$.
If we fail, then we obtain $r''$ which is a factor of $r/r'$, and we now try to compute the
order of $a'' \equiv (a')^{r''} \pmod N$. We iterate this procedure until we determine the order of
$a$. At most $\log(r) = O(L)$ iterations are required, since each repetition reduces the order
of the current candidate $a''^{\cdots}$ by a factor of at least two.
The third method is better than the first two methods, in that it requires only a
constant number of trials, rather than $O(L)$ repetitions. The idea is to repeat the phase
estimation-continued fractions procedure twice, obtaining $r_1', s_1'$ the first time, and $r_2', s_2'$
the second time. Provided $s_1'$ and $s_2'$ have no common factors, $r$ may be extracted by
taking the least common multiple of $r_1'$ and $r_2'$. The probability that $s_1'$ and $s_2'$ have no
common factors is given by
\[
1 - \sum_q p(q|s_1')\,p(q|s_2'), \tag{5.56}
\]
where the sum is over all prime numbers $q$, and $p(x|y)$ here means the probability of $x$
dividing $y$. If $q$ divides $s_1'$ then it must also divide the true value of $s$, $s_1$, on the first
iteration, so to upper bound $p(q|s_1')$ it suffices to upper bound $p(q|s_1)$, where $s_1$ is chosen
uniformly at random from 0 through $r - 1$. It is easy to see that $p(q|s_1) \leq 1/q$, and thus
$p(q|s_1') \leq 1/q$. Similarly, $p(q|s_2') \leq 1/q$, and thus the probability that $s_1'$ and $s_2'$ have no
common factors satisfies
\[
1 - \sum_q p(q|s_1')\,p(q|s_2') \geq 1 - \sum_q \frac{1}{q^2}. \tag{5.57}
\]
The right hand side can be lower bounded in a number of ways; a simple technique is
provided in Exercise 5.16, which gives
\[
1 - \sum_q p(q|s_1')\,p(q|s_2') \geq \frac{1}{4}, \tag{5.58}
\]
and thus the probability of obtaining the correct $r$ is at least 1/4.
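The least-common-multiple step can be explored with a small classical experiment. The sketch below is an illustration, not part of the algorithm as stated: it assumes each run yields the reduced denominator $r' = r/\gcd(s, r)$ for a uniformly random $s$, and the names `reduced_denominator`, `combine` and `success_fraction` are hypothetical. In this model $\mathrm{lcm}(r_1', r_2') = r/\gcd(s_1, s_2, r)$, so a pair of runs recovers $r$ exactly when $s_1$, $s_2$ and $r$ share no common factor.

```python
from math import gcd

def reduced_denominator(s, r):
    """Denominator r' of s/r in lowest terms (the value a run is assumed to return)."""
    return r // gcd(s, r)

def combine(r1, r2):
    """Least common multiple of the two candidate denominators."""
    return r1 * r2 // gcd(r1, r2)

def success_fraction(r):
    """Fraction of pairs (s1, s2) in 0..r-1 for which the lcm equals r."""
    hits = sum(
        combine(reduced_denominator(s1, r), reduced_denominator(s2, r)) == r
        for s1 in range(r)
        for s2 in range(r)
    )
    return hits / (r * r)

print(combine(reduced_denominator(4, 12), reduced_denominator(3, 12)))
print(success_fraction(12) >= 0.25)
```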
Exercise 5.15: Show that the least common multiple of positive integers $x$ and $y$ is
$xy/\gcd(x, y)$, and thus may be computed in $O(L^2)$ operations if $x$ and $y$ are $L$
bit numbers.
Exercise 5.16: For all $x \geq 2$ prove that $\int_x^{x+1} 1/y^2\, dy \geq 2/(3x^2)$. Show that
\[
\sum_q \frac{1}{q^2} \leq \frac{3}{2} \int_2^\infty \frac{1}{y^2}\, dy = \frac{3}{4}, \tag{5.59}
\]
and thus that (5.58) holds.
What resource requirements does this algorithm consume? The Hadamard transform
requires $O(L)$ gates, and the inverse Fourier transform requires $O(L^2)$ gates. The major
cost in the quantum circuit proper actually comes from the modular exponentiation,
which uses $O(L^3)$ gates, for a total of $O(L^3)$ gates in the quantum circuit proper. The
continued fractions algorithm adds $O(L^3)$ more gates, for a total of $O(L^3)$ gates to obtain
$r'$. Using the third method for obtaining $r$ from $r'$ we need only repeat this procedure a
constant number of times to obtain the order, $r$, for a total cost of $O(L^3)$. The algorithm
is summarized below.
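Algorithm: Quantum order-finding
Inputs: (1) A black box $U_{x,N}$ which performs the transformation
$|j\rangle|k\rangle \to |j\rangle|x^j k \bmod N\rangle$, for $x$ co-prime to the $L$-bit number $N$, (2)
$t = 2L + 1 + \left\lceil \log\left(2 + \frac{1}{2\epsilon}\right) \right\rceil$ qubits initialized to $|0\rangle$, and (3) $L$ qubits initialized
to the state $|1\rangle$.
Outputs: The least integer $r > 0$ such that $x^r = 1 \pmod N$.
Runtime: $O(L^3)$ operations. Succeeds with probability $O(1)$.
Procedure:
1. $|0\rangle|1\rangle$    (initial state)
2. $\to \frac{1}{\sqrt{2^t}} \sum_{j=0}^{2^t-1} |j\rangle|1\rangle$    (create superposition)
3. $\to \frac{1}{\sqrt{2^t}} \sum_{j=0}^{2^t-1} |j\rangle|x^j \bmod N\rangle$    (apply $U_{x,N}$)
   $\approx \frac{1}{\sqrt{r 2^t}} \sum_{s=0}^{r-1} \sum_{j=0}^{2^t-1} e^{2\pi i s j/r}|j\rangle|u_s\rangle$
4. $\to \frac{1}{\sqrt{r}} \sum_{s=0}^{r-1} |\widetilde{s/r}\rangle|u_s\rangle$    (apply inverse Fourier transform to first register)
5. $\to \widetilde{s/r}$    (measure first register)
6. $\to r$    (apply continued fractions algorithm)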
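Steps 1 to 5 of the procedure can be simulated directly for a small instance. The following sketch assumes NumPy and stores the full joint state as a dense array, which is only feasible for tiny $N$; the function name `order_finding_distribution` is hypothetical, and the instance $x = 7$, $N = 15$, $t = 11$ is the one used in Box 5.4.

```python
import numpy as np

def order_finding_distribution(x, N, t):
    """Probabilities of first-register outcomes after the inverse Fourier transform."""
    dim = 2 ** t
    L = N.bit_length()                      # qubits in the second register
    # Steps 2-3: amplitude 1/sqrt(2^t) on each |j>|x^j mod N>.
    state = np.zeros((dim, 2 ** L), dtype=complex)
    for j in range(dim):
        state[j, pow(x, j, N)] = 1 / np.sqrt(dim)
    # Step 4: inverse Fourier transform on the first register (axis 0).
    state = np.fft.fft(state, axis=0) / np.sqrt(dim)
    # Step 5: measuring the first register ignores the second, so sum over it.
    return (np.abs(state) ** 2).sum(axis=1)

probs = order_finding_distribution(7, 15, 11)
peaks = [int(m) for m in np.flatnonzero(probs > 0.01)]
print(peaks)
```

Each peak $m$ corresponds to a phase estimate $m/2^t \approx s/r$, which the continued fractions step then converts into a candidate for $r$.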
5.3.2 Application: factoring
The problem of distinguishing prime numbers from composites, and of resolving
composite numbers into their prime factors, is one of the most important and
useful in all of arithmetic. [ . . . ] The dignity of science seems to demand that
every aid to the solution of such an elegant and celebrated problem be zealously
cultivated.
– Carl Friedrich Gauss, as quoted by Donald Knuth
Given a positive composite integer $N$, what prime numbers when multiplied together
equal it? This factoring problem turns out to be equivalent to the order-finding problem
we just studied, in the sense that a fast algorithm for order-finding can easily be turned
into a fast algorithm for factoring. In this section we explain the method used to reduce
factoring to order-finding, and give a simple example of this reduction.
The reduction of factoring to order-finding proceeds in two basic steps. The first
step is to show that we can compute a factor of $N$ if we can find a non-trivial solution
$x \neq \pm 1 \pmod N$ to the equation $x^2 = 1 \pmod N$. The second step is to show that a
randomly chosen $y$ co-prime to $N$ is quite likely to have an order $r$ which is even, and
such that $y^{r/2} \neq \pm 1 \pmod N$, and thus $x \equiv y^{r/2} \pmod N$ is a non-trivial solution to
$x^2 = 1 \pmod N$. These two steps are embodied in the following theorems, whose proofs
may be found in Section A4.3 of Appendix 4.
Theorem 5.2: Suppose $N$ is an $L$ bit composite number, and $x$ is a non-trivial solution
to the equation $x^2 = 1 \pmod N$ in the range $1 \leq x \leq N$, that is, neither
$x = 1 \pmod N$ nor $x = N - 1 = -1 \pmod N$. Then at least one of
$\gcd(x - 1, N)$ and $\gcd(x + 1, N)$ is a non-trivial factor of $N$ that can be
computed using $O(L^3)$ operations.
Theorem 5.3: Suppose $N = p_1^{\alpha_1} \cdots p_m^{\alpha_m}$ is the prime factorization of an odd composite
positive integer. Let $x$ be an integer chosen uniformly at random, subject to the
requirements that $1 \leq x \leq N - 1$ and $x$ is co-prime to $N$. Let $r$ be the order of
$x$ modulo $N$. Then
\[
p\left(r \text{ is even and } x^{r/2} \neq -1 \pmod N\right) \geq 1 - \frac{1}{2^m}. \tag{5.60}
\]
Theorems 5.2 and 5.3 can be combined to give an algorithm which, with high probability, returns a non-trivial factor of any composite N . All the steps in the algorithm
can be performed efficiently on a classical computer except (so far as is known today) an
order-finding ‘subroutine’ which is used by the algorithm. By repeating the procedure
we may find a complete prime factorization of N . The algorithm is summarized below.
Algorithm: Reduction of factoring to order-finding
Inputs: A composite number $N$.
Outputs: A non-trivial factor of $N$.
Runtime: $O((\log N)^3)$ operations. Succeeds with probability $O(1)$.
Procedure:
1. If $N$ is even, return the factor 2.
2. Determine whether $N = a^b$ for integers $a \geq 1$ and $b \geq 2$, and if so
return the factor $a$ (uses the classical algorithm of Exercise 5.17).
3. Randomly choose $x$ in the range 1 to $N - 1$. If $\gcd(x, N) > 1$ then return
the factor $\gcd(x, N)$.
4. Use the order-finding subroutine to find the order $r$ of $x$ modulo $N$.
5. If $r$ is even and $x^{r/2} \neq -1 \pmod N$ then compute $\gcd(x^{r/2} - 1, N)$ and
$\gcd(x^{r/2} + 1, N)$, and test to see if one of these is a non-trivial factor,
returning that factor if so. Otherwise, the algorithm fails.
Steps 1 and 2 of the algorithm either return a factor, or else ensure that $N$ is an
odd integer with more than one prime factor. These steps may be performed using
$O(1)$ and $O(L^3)$ operations, respectively. Step 3 either returns a factor, or produces
a randomly chosen element $x$ of $\{1, 2, \ldots, N - 1\}$ that is co-prime to $N$. Step 4 calls the
order-finding subroutine, computing the order $r$ of $x$ modulo $N$. Step 5 completes the algorithm,
since Theorem 5.3 guarantees that with probability at least one-half $r$ will be even and
$x^{r/2} \neq -1 \pmod N$, and then Theorem 5.2 guarantees that either $\gcd(x^{r/2} - 1, N)$ or
$\gcd(x^{r/2} + 1, N)$ is a non-trivial factor of $N$. An example illustrating the use of this
algorithm with the quantum order-finding subroutine is shown in Box 5.4.
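The five steps of the reduction can be sketched classically as below. This is an illustration under stated assumptions: the order-finding subroutine is replaced by a brute-force stand-in (`classical_order`), which is exactly the part a quantum computer would accelerate; the perfect-power test uses floating-point roots and is only reliable for small $N$; and all function names are hypothetical.

```python
import random
from math import gcd

def classical_order(x, N):
    """Brute-force stand-in for the order-finding subroutine (step 4)."""
    r, y = 1, x % N
    while y != 1:
        y, r = (y * x) % N, r + 1
    return r

def perfect_power_base(N):
    """Return a > 1 with a**b == N for some b >= 2, or None (small N only)."""
    for b in range(2, N.bit_length() + 1):
        root = round(N ** (1.0 / b))
        for a in (root - 1, root, root + 1):
            if a > 1 and a ** b == N:
                return a
    return None

def find_factor(N, find_order=classical_order, rng=random):
    """One attempt at the reduction; returns a non-trivial factor or None on failure."""
    if N % 2 == 0:                       # step 1
        return 2
    a = perfect_power_base(N)            # step 2
    if a is not None:
        return a
    x = rng.randrange(1, N)              # step 3: x in 1..N-1
    if gcd(x, N) > 1:
        return gcd(x, N)
    r = find_order(x, N)                 # step 4
    if r % 2 == 0:                       # step 5
        y = pow(x, r // 2, N)
        if y != N - 1:
            for f in (gcd(y - 1, N), gcd(y + 1, N)):
                if 1 < f < N:
                    return f
    return None

rng = random.Random(0)
factor = None
while factor is None:                    # repeat failed attempts
    factor = find_factor(91, rng=rng)
print(factor, 91 // factor)
```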
Exercise 5.17: Suppose $N$ is $L$ bits long. The aim of this exercise is to find an
efficient classical algorithm to determine whether $N = a^b$ for some integers
$a \geq 1$ and $b \geq 2$. This may be done as follows:
(1) Show that $b$, if it exists, satisfies $b \leq L$.
(2) Show that it takes at most $O(L^2)$ operations to compute $y = \log_2 N$, $x = y/b$ for
$b \leq L$, and the two integers $u_1$ and $u_2$ nearest to $2^x$.
(3) Show that it takes at most $O(L^2)$ operations to compute $u_1^b$ and $u_2^b$ (use
repeated squaring) and check to see if either is equal to $N$.
(4) Combine the previous results to give an $O(L^3)$ operation algorithm to
determine whether $N = a^b$ for integers $a$ and $b$.
Exercise 5.18: (Factoring 91) Suppose we wish to factor $N = 91$. Confirm that
steps 1 and 2 are passed. For step 3, suppose we choose $x = 4$, which is co-prime
to 91. Compute the order $r$ of $x$ with respect to $N$, and show that
$x^{r/2} \bmod 91 = 64 \neq -1 \pmod{91}$, so the algorithm succeeds, giving
$\gcd(64 - 1, 91) = 7$.
It is unlikely that this is the most efficient method you’ve seen for factoring 91.
Indeed, if all computations had to be carried out on a classical computer, this
reduction would not result in an efficient factoring algorithm, as no efficient
method is known for solving the order-finding problem on a classical computer.
Exercise 5.19: Show that N = 15 is the smallest number for which the order-finding
subroutine is required, that is, it is the smallest composite number that is not
even or a power of some smaller integer.
Box 5.4: Factoring 15 quantum-mechanically
The use of order-finding, phase estimation, and continued fraction expansions in
the quantum factoring algorithm is illustrated by applying it to factor $N = 15$.
First, we choose a random number which has no common factors with $N$; suppose
we choose $x = 7$. Next, we compute the order $r$ of $x$ with respect to $N$, using the
quantum order-finding algorithm: begin with the state $|0\rangle|0\rangle$ and create the state
\[
\frac{1}{\sqrt{2^t}} \sum_{k=0}^{2^t-1} |k\rangle|0\rangle = \frac{1}{\sqrt{2^t}}\left[|0\rangle + |1\rangle + |2\rangle + \cdots + |2^t - 1\rangle\right]|0\rangle \tag{5.61}
\]
by applying $t = 11$ Hadamard transforms to the first register. Choosing this value
of $t$ ensures an error probability $\epsilon$ of at most 1/4. Next, compute $f(k) = x^k \bmod N$,
leaving the result in the second register,
\begin{align}
&\frac{1}{\sqrt{2^t}} \sum_{k=0}^{2^t-1} |k\rangle|x^k \bmod N\rangle \tag{5.62} \\
&= \frac{1}{\sqrt{2^t}}\left[|0\rangle|1\rangle + |1\rangle|7\rangle + |2\rangle|4\rangle + |3\rangle|13\rangle + |4\rangle|1\rangle + |5\rangle|7\rangle + |6\rangle|4\rangle + \cdots\right]. \notag
\end{align}
We now apply the inverse Fourier transform $FT^\dagger$ to the first register and measure
it. One way of analyzing the distribution of outcomes obtained is to calculate the
reduced density matrix for the first register, and apply $FT^\dagger$ to it, and calculate the
measurement statistics. However, since no further operation is applied to the second
register, we can instead apply the principle of implicit measurement (Section 4.4)
and assume that the second register is measured, obtaining a random result from 1,
7, 4, or 13. Suppose we get 4 (any of the results works); this means the state input
to $FT^\dagger$ would have been $\sqrt{\frac{4}{2^t}}\left[|2\rangle + |6\rangle + |10\rangle + |14\rangle + \cdots\right]$. After applying $FT^\dagger$
we obtain some state $\sum_\ell \alpha_\ell|\ell\rangle$, with the probability distribution
[Plot: $|\alpha_\ell|^2$ (vertical axis, 0 to 0.25) against $\ell$ (horizontal axis, 0 to 2000).]
shown for $2^t = 2048$. The final measurement therefore gives either 0, 512, 1024,
or 1536, each with probability almost exactly 1/4. Suppose we obtain $\ell = 1536$
from the measurement; computing the continued fraction expansion thus gives
$1536/2048 = 1/(1 + (1/3))$, so that 3/4 occurs as a convergent in the expansion,
giving $r = 4$ as the order of $x = 7$. By chance, $r$ is even, and moreover,
$x^{r/2} \bmod N = 7^2 \bmod 15 = 4 \neq -1 \bmod 15$, so the algorithm works: computing
the greatest common divisor $\gcd(x^2 - 1, 15) = 3$ and $\gcd(x^2 + 1, 15) = 5$ tells us
that $15 = 3 \times 5$.
5.4 General applications of the quantum Fourier transform
The main applications of the quantum Fourier transform we have described so far in
this chapter are phase estimation and order-finding. What other problems can be solved
with these techniques? In this section, we define a very general problem known as the
hidden subgroup problem, and describe an efficient quantum algorithm for solving it. This
problem, which encompasses all known ‘exponentially fast’ applications of the quantum
Fourier transform, can be thought of as a generalization of the task of finding the unknown
period of a periodic function, in a context where the structure of the domain and range
of the function may be very intricate. In order to present this problem in the most
approachable manner, we begin with two more specific applications: period-finding (of a
one-dimensional function), and discrete logarithms. We then return to the general hidden
subgroup problem. Note that the presentation in this section is rather more schematic
and conceptual than earlier sections in this chapter; of necessity, this means that the
reader interested in understanding all the details will have to work much harder!
5.4.1 Period-finding
Consider the following problem. Suppose $f$ is a periodic function producing a single
bit as output and such that $f(x + r) = f(x)$, for some unknown $0 < r < 2^L$, where
$x, r \in \{0, 1, 2, \ldots\}$. Given a quantum black box $U$ which performs the unitary transform
$U|x\rangle|y\rangle \to |x\rangle|y \oplus f(x)\rangle$ (where $\oplus$ denotes addition modulo 2) how many black
box queries and other operations are required to determine $r$? Note that in practice $U$
operates on a finite domain, whose size is determined by the desired accuracy for $r$. Here
is a quantum algorithm which solves this problem using one query, and $O(L^2)$ other
operations:
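Algorithm: Period-finding
Inputs: (1) A black box which performs the operation $U|x\rangle|y\rangle = |x\rangle|y \oplus f(x)\rangle$,
(2) a state to store the function evaluation, initialized to $|0\rangle$, and (3)
$t = O(L + \log(1/\epsilon))$ qubits initialized to $|0\rangle$.
Outputs: The least integer $r > 0$ such that $f(x + r) = f(x)$.
Runtime: One use of $U$, and $O(L^2)$ operations. Succeeds with probability $O(1)$.
Procedure:
1. $|0\rangle|0\rangle$    (initial state)
2. $\to \frac{1}{\sqrt{2^t}} \sum_{x=0}^{2^t-1} |x\rangle|0\rangle$    (create superposition)
3. $\to \frac{1}{\sqrt{2^t}} \sum_{x=0}^{2^t-1} |x\rangle|f(x)\rangle$    (apply $U$)
   $\approx \frac{1}{\sqrt{r 2^t}} \sum_{\ell=0}^{r-1} \sum_{x=0}^{2^t-1} e^{2\pi i \ell x/r}|x\rangle|\hat f(\ell)\rangle$
4. $\to \frac{1}{\sqrt{r}} \sum_{\ell=0}^{r-1} |\widetilde{\ell/r}\rangle|\hat f(\ell)\rangle$    (apply inverse Fourier transform to first register)
5. $\to \widetilde{\ell/r}$    (measure first register)
6. $\to r$    (apply continued fractions algorithm)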
The key to understanding this algorithm, which is based on phase estimation, and
is nearly identical to the algorithm for quantum order-finding, is step 3, in which we
introduce the state
$$|\hat{f}(\ell)\rangle \equiv \frac{1}{\sqrt{r}} \sum_{x=0}^{r-1} e^{-2\pi i \ell x/r} |f(x)\rangle , \tag{5.63}$$
General applications of the quantum Fourier transform
the Fourier transform of $|f(x)\rangle$. The identity used in step 3 is based on
$$|f(x)\rangle = \frac{1}{\sqrt{r}} \sum_{\ell=0}^{r-1} e^{2\pi i \ell x/r} |\hat{f}(\ell)\rangle , \tag{5.64}$$
which is easy to verify by noting that $\sum_{\ell=0}^{r-1} e^{2\pi i \ell x/r} = r$ for $x$ an integer multiple of $r$,
and zero otherwise. The approximate equality in step 3 is required because $2^t$ may not be
an integer multiple of $r$ in general (it need not be: this is taken account of by the phase
estimation bounds). By Equation (5.22), applying the inverse Fourier transform to the
first register, in step 4, gives an estimate of the phase $\ell/r$, where $\ell$ is chosen randomly.
$r$ can be efficiently obtained in the final step using a continued fraction expansion.
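The statistics of the period-finding procedure can be explored with a small classical simulation. The sketch below makes several assumptions that are not in the text: the example function is $f(x) = x \bmod r$ (so it takes more than one bit of output and is distinct within a period), the register size is $t = 8$, the bound on $r$ is $r < 8$, and Python's `Fraction.limit_denominator` stands in for the continued fractions step.

```python
import cmath
from fractions import Fraction


def outcome_probabilities(f, t):
    """Distribution of the first-register measurement after steps 1 to 4."""
    size = 2 ** t
    # Group inputs by function value: branches with different f(x) are
    # orthogonal in the second register, so their probabilities add.
    preimages = {}
    for x in range(size):
        preimages.setdefault(f(x), []).append(x)
    probs = []
    for y in range(size):
        total = 0.0
        for xs in preimages.values():
            # Amplitude of |y>|v> after the inverse Fourier transform.
            amp = sum(cmath.exp(-2j * cmath.pi * x * y / size) for x in xs) / size
            total += abs(amp) ** 2
        probs.append(total)
    return probs


r_true, t = 5, 8
probs = outcome_probabilities(lambda x: x % r_true, t)

# Take the most likely nonzero outcome as the "measured" value.
y = max(range(1, 2 ** t), key=probs.__getitem__)
estimate = Fraction(y, 2 ** t).limit_denominator(7)  # assumes r < 8
r_found = estimate.denominator
print(y, estimate, r_found)
```

Because $2^t = 256$ is not a multiple of $r = 5$, the peaks are not perfectly sharp, yet the rational approximation still recovers the denominator 5.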
Box 5.5: The shift-invariance property of the Fourier transform
The Fourier transform, Equation (5.1), has an interesting and very useful property,
known as shift invariance. Using notation which is useful in describing the general
application of this property, let us describe the quantum Fourier transform as
$$\sum_{h \in H} \alpha_h |h\rangle \to \sum_{g \in G} \tilde{\alpha}_g |g\rangle , \tag{5.65}$$
where $\tilde{\alpha}_g = \sum_{h \in H} \alpha_h \exp(2\pi i g h/|G|)$, $H$ is some subset of $G$, and $G$ indexes the
states in an orthonormal basis of the Hilbert space. For example, $G$ may be the set
of numbers from 0 to $2^n - 1$ for an $n$ qubit system. $|G|$ denotes the number of
elements in $G$. Suppose we apply to the initial state an operator $U_k$ which performs
the unitary transform
$$U_k |g\rangle = |g + k\rangle , \tag{5.66}$$
then apply the Fourier transform. The result,
$$U_k \sum_{h \in H} \alpha_h |h\rangle = \sum_{h \in H} \alpha_h |h + k\rangle \to \sum_{g \in G} e^{2\pi i g k/|G|} \tilde{\alpha}_g |g\rangle \tag{5.67}$$
has the property that the magnitude of the amplitude for $|g\rangle$ does not change, no
matter what $k$ is, that is: $|\exp(2\pi i g k/|G|)\tilde{\alpha}_g| = |\tilde{\alpha}_g|$.
In the language of group theory, $G$ is a group, $H$ a subgroup of $G$, and we say that
if a function $f$ on $G$ is constant on cosets of $H$, then the Fourier transform of $f$ is
invariant over cosets of $H$.
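Shift invariance is easy to confirm numerically. The following sketch uses the unnormalized transform written in the box, with $G = \{0, \ldots, 7\}$, an arbitrary amplitude vector, and a shift $k = 3$; all of these concrete choices are assumptions made for illustration.

```python
import cmath


def fourier(amplitudes):
    """alpha~_g = sum_h alpha_h exp(2 pi i g h / |G|), as in Box 5.5."""
    size = len(amplitudes)
    return [
        sum(a * cmath.exp(2j * cmath.pi * g * h / size) for h, a in enumerate(amplitudes))
        for g in range(size)
    ]


def shift(amplitudes, k):
    """Apply U_k: move the amplitude of |h> to |h + k mod |G|>."""
    size = len(amplitudes)
    shifted = [0j] * size
    for h, a in enumerate(amplitudes):
        shifted[(h + k) % size] = a
    return shifted


alpha = [0.5, 0.5j, 0, -0.5, 0, 0, 0.5, 0]  # arbitrary normalized amplitudes
k = 3
size = len(alpha)
before = fourier(alpha)
after = fourier(shift(alpha, k))

# Magnitudes are unchanged; only the phase exp(2 pi i g k / |G|) appears.
max_magnitude_change = max(abs(abs(b) - abs(a)) for a, b in zip(before, after))
max_phase_error = max(
    abs(after[g] - cmath.exp(2j * cmath.pi * g * k / size) * before[g])
    for g in range(size)
)
print(max_magnitude_change, max_phase_error)
```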
Why does this work? One way to understand this is to realize that (5.63) is approximately the Fourier transform over $\{0, 1, \ldots, 2^L - 1\}$ of $|f(x)\rangle$ (see Exercise 5.20), and
the Fourier transform has an interesting and very useful property, known as shift invariance, described in Box 5.5. Another is to realize that what the order-finding algorithm
does is just to find the period of the function $f(k) = x^k \bmod N$, so the ability to find the
period of a general periodic function is not unexpected. Yet another way is to realize that
the implementation of the black box $U$ is naturally done using a certain unitary operator
whose eigenvectors are precisely $|\hat{f}(\ell)\rangle$, as described in Exercise 5.21 below, so that the
phase estimation procedure of Section 5.2 can be applied.
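Exercise 5.20: Suppose $f(x + r) = f(x)$, and $0 \leq x < N$, for $N$ an integer multiple
of $r$. Compute
$$\hat{f}(\ell) \equiv \frac{1}{\sqrt{N}} \sum_{x=0}^{N-1} e^{-2\pi i \ell x/N} f(x) , \tag{5.68}$$
and relate the result to (5.63). You will need to use the fact that
$$\sum_{k \in \{0, r, 2r, \ldots, N-r\}} e^{2\pi i k \ell/N} = \begin{cases} N/r & \text{if } \ell \text{ is an integer multiple of } N/r \\ 0 & \text{otherwise.} \end{cases} \tag{5.69}$$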
Exercise 5.21: (Period-finding and phase estimation) Suppose you are given a
unitary operator $U_y$ which performs the transformation $U_y|f(x)\rangle = |f(x + y)\rangle$,
for the periodic function described above.
(1) Show that the eigenvectors of $U_y$ are $|\hat{f}(\ell)\rangle$, and calculate their eigenvalues.
(2) Show that given $|f(x_0)\rangle$ for some $x_0$, $U_y$ can be used to realize a black box
which is as useful as $U$ in solving the period-finding problem.
5.4.2 Discrete logarithms
The period-finding problem we just considered is a simple one, in that the domain and
range of the periodic function were integers. What happens when the function is more
complex? Consider the function $f(x_1, x_2) = a^{s x_1 + x_2} \bmod N$, where all the variables
are integers, and $r$ is the smallest positive integer for which $a^r \bmod N = 1$. This
function is periodic, since $f(x_1 + \ell, x_2 - \ell s) = f(x_1, x_2)$, but now the period is a 2-tuple,
$(\ell, -\ell s)$, for integer $\ell$. This may seem to be a strange function, but it is very useful
in cryptography, since determining $s$ allows one to solve what is known as the discrete
logarithm problem: given $a$ and $b = a^s$, what is $s$? Here is a quantum algorithm which
solves this problem using one query of a quantum black box $U$ which performs the unitary
transform $U|x_1\rangle|x_2\rangle|y\rangle \to |x_1\rangle|x_2\rangle|y \oplus f(x_1, x_2)\rangle$ (where $\oplus$ denotes bitwise addition modulo
2), and $O(\lceil \log r \rceil^2)$ other operations. We assume knowledge of the minimum $r > 0$ such
that $a^r \bmod N = 1$, which can be obtained using the order-finding algorithm described
previously.
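The two-dimensional periodicity of this function can be checked directly for a small case. In the sketch below the values $N = 11$, $a = 2$ and $s = 7$ are illustrative assumptions, and the order $r$ is found by brute force rather than by the quantum order-finding algorithm.

```python
def multiplicative_order(a, n):
    """Smallest r > 0 with a**r = 1 (mod n), found by brute force."""
    r, value = 1, a % n
    while value != 1:
        value = value * a % n
        r += 1
    return r


def f(x1, x2, a, b, n):
    """f(x1, x2) = b**x1 * a**x2 mod n, which equals a**(s*x1 + x2) mod n."""
    return pow(b, x1, n) * pow(a, x2, n) % n


n, a, s = 11, 2, 7
b = pow(a, s, n)
r = multiplicative_order(a, n)

# Exponents may be reduced modulo r because a**r = 1 (mod n).
periodic = all(
    f((x1 + l) % r, (x2 - l * s) % r, a, b, n) == f(x1, x2, a, b, n)
    for x1 in range(r)
    for x2 in range(r)
    for l in range(r)
)
print(b, r, periodic)
```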
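Algorithm: Discrete logarithm
Inputs: (1) A black box which performs the operation
$U|x_1\rangle|x_2\rangle|y\rangle = |x_1\rangle|x_2\rangle|y \oplus f(x_1, x_2)\rangle$, for $f(x_1, x_2) = b^{x_1} a^{x_2}$, (2) a state to store
the function evaluation, initialized to $|0\rangle$, and (3) two $t = O(\lceil \log r \rceil + \log(1/\epsilon))$
qubit registers initialized to $|0\rangle$.
Outputs: The least positive integer $s$ such that $a^s = b$.
Runtime: One use of $U$, and $O(\lceil \log r \rceil^2)$ operations. Succeeds with probability
$O(1)$.
Procedure:
1. $|0\rangle|0\rangle|0\rangle$ (initial state)
2. $\to \frac{1}{2^t} \sum_{x_1=0}^{2^t-1} \sum_{x_2=0}^{2^t-1} |x_1\rangle|x_2\rangle|0\rangle$ (create superposition)
3. $\to \frac{1}{2^t} \sum_{x_1=0}^{2^t-1} \sum_{x_2=0}^{2^t-1} |x_1\rangle|x_2\rangle|f(x_1, x_2)\rangle$ (apply $U$)
   $\approx \frac{1}{2^t \sqrt{r}} \sum_{\ell_2=0}^{r-1} \sum_{x_1=0}^{2^t-1} \sum_{x_2=0}^{2^t-1} e^{2\pi i (s \ell_2 x_1 + \ell_2 x_2)/r} |x_1\rangle|x_2\rangle|\hat{f}(s\ell_2, \ell_2)\rangle$
   $= \frac{1}{2^t \sqrt{r}} \sum_{\ell_2=0}^{r-1} \left[ \sum_{x_1=0}^{2^t-1} e^{2\pi i (s \ell_2 x_1)/r} |x_1\rangle \right] \left[ \sum_{x_2=0}^{2^t-1} e^{2\pi i (\ell_2 x_2)/r} |x_2\rangle \right] |\hat{f}(s\ell_2, \ell_2)\rangle$
4. $\to \frac{1}{\sqrt{r}} \sum_{\ell_2=0}^{r-1} |\widetilde{s\ell_2/r}\rangle|\widetilde{\ell_2/r}\rangle|\hat{f}(s\ell_2, \ell_2)\rangle$ (apply inverse Fourier transform to first two registers)
5. $\to \left( \widetilde{s\ell_2/r}, \widetilde{\ell_2/r} \right)$ (measure first two registers)
6. $\to s$ (apply generalized continued fractions algorithm)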
Again, the key to understanding this algorithm is step 3, in which we introduce the
state
$$|\hat{f}(\ell_1, \ell_2)\rangle = \frac{1}{\sqrt{r}} \sum_{j=0}^{r-1} e^{-2\pi i \ell_2 j/r} |f(0, j)\rangle , \tag{5.70}$$
the Fourier transform of $|f(x_1, x_2)\rangle$ (see Exercise 5.22). In this equation, the values of $\ell_1$
and $\ell_2$ must satisfy
$$\sum_{k=0}^{r-1} e^{2\pi i k(\ell_1/s - \ell_2)/r} = r . \tag{5.71}$$
Otherwise, the amplitude of $|\hat{f}(\ell_1, \ell_2)\rangle$ is nearly zero. The generalized continued fraction
expansion used in the final step to determine $s$ is analogous to the procedures used in
Section 5.3.1, and is left as a simple exercise for you to construct.
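One possible form of the final classical step, offered only as a sketch and not as the construction the text has in mind, exploits the fact that $r$ is already known: each measured estimate can be rounded to the nearest multiple of $1/r$, and $s$ follows from a modular inverse whenever $\ell_2$ is coprime to $r$. The function name `recover_s` and the sample estimates are assumptions made for illustration.

```python
from math import gcd


def recover_s(estimate1, estimate2, r):
    """Recover s from estimates of (s*l2 mod r)/r and l2/r, each in [0, 1).

    Returns None when l2 shares a factor with r, in which case the
    quantum part of the algorithm would have to be repeated.
    """
    c1 = round(estimate1 * r) % r  # s * l2 mod r
    c2 = round(estimate2 * r) % r  # l2
    if gcd(c2, r) != 1:
        return None
    # pow with exponent -1 computes a modular inverse (Python 3.8+).
    return c1 * pow(c2, -1, r) % r


# Example with r = 10, s = 7, l2 = 3, so s*l2 mod r = 1; small errors
# are added to mimic finite-precision phase estimates.
print(recover_s(0.104, 0.297, 10))
# With l2 = 4 the inverse does not exist and the run is discarded.
print(recover_s(0.8, 0.4, 10))
```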
Exercise 5.22: Show that
$$|\hat{f}(\ell_1, \ell_2)\rangle = \sum_{x_1=0}^{r-1} \sum_{x_2=0}^{r-1} e^{-2\pi i (\ell_1 x_1 + \ell_2 x_2)/r} |f(x_1, x_2)\rangle = \frac{1}{\sqrt{r}} \sum_{j=0}^{r-1} e^{-2\pi i \ell_2 j/r} |f(0, j)\rangle , \tag{5.72}$$
and we are constrained to have $\ell_1/s - \ell_2$ be an integer multiple of $r$ for this
expression to be non-zero.
Exercise 5.23: Compute
$$\frac{1}{r} \sum_{\ell_1=0}^{r-1} \sum_{\ell_2=0}^{r-1} e^{-2\pi i (\ell_1 x_1 + \ell_2 x_2)/r} |\hat{f}(\ell_1, \ell_2)\rangle \tag{5.73}$$
using (5.70), and show that the result is $f(x_1, x_2)$.
Exercise 5.24: Construct the generalized continued fractions algorithm needed in
step 6 of the discrete logarithm algorithm to determine $s$ from estimates of $s\ell_2/r$
and $\ell_2/r$.
Exercise 5.25: Construct a quantum circuit for the black box $U$ used in the quantum
discrete logarithm algorithm, which takes $a$ and $b$ as parameters, and performs
the unitary transform $|x_1\rangle|x_2\rangle|y\rangle \to |x_1\rangle|x_2\rangle|y \oplus b^{x_1} a^{x_2}\rangle$. How many elementary
operations are required?
5.4.3 The hidden subgroup problem
By now, a pattern should be becoming clear: if we are given a periodic function, even when
the structure of the periodicity is quite complicated, we can often use a quantum algorithm
to determine the period efficiently. Importantly, however, not all periods of periodic
functions can be determined. The general problem which defines a broad framework
for these questions can be succinctly expressed in the language of group theory (see
Appendix 2 for a quick review) as follows:
Let $f$ be a function from a finitely generated group $G$ to a finite set $X$ such that
$f$ is constant on the cosets of a subgroup $K$, and distinct on each coset. Given a
quantum black box for performing the unitary transform $U|g\rangle|h\rangle = |g\rangle|h \oplus f(g)\rangle$,
for $g \in G$, $h \in X$, and $\oplus$ an appropriately chosen binary operation on $X$, find a
generating set for $K$.
Order-finding, period-finding, discrete logarithms, and many other problems are instances of this hidden subgroup problem; some interesting ones are listed in Figure 5.5.
For G a finite Abelian group, a quantum computer can solve the hidden subgroup
problem using a number of operations polynomial in log |G|, and one use of the black
box function evaluation, using an algorithm very similar to the others in this section.
(In fact, solution for a finitely generated Abelian group is also possible, along similar
lines, but we’ll stick to the finite case here.) We shall leave detailed specification of the
algorithm to you as an exercise, which should be simple after we explain the basic idea.
Many things remain essentially the same, because finite Abelian groups are isomorphic
to products of additive groups over the integers in modular arithmetic. This means that
the quantum Fourier transform of f over G is well defined (see Section A2.3), and can
still be done efficiently. The first non-trivial step of the algorithm is to use a Fourier
transform (generalizing the Hadamard operation) to create a superposition over group
elements, which is then transformed by applying the quantum black box for f in the next
step, to give
$$\frac{1}{\sqrt{|G|}} \sum_{g \in G} |g\rangle|f(g)\rangle . \tag{5.74}$$
As before, we would now like to rewrite $|f(g)\rangle$ in the Fourier basis. We start with
$$|f(g)\rangle = \frac{1}{\sqrt{|G|}} \sum_{\ell=0}^{|G|-1} e^{2\pi i \ell g/|G|} |\hat{f}(\ell)\rangle , \tag{5.75}$$
where we have chosen exp[−2πiℓg/|G|] as a representation (see Exercise A2.13) of g ∈ G
indexed by ℓ (the Fourier transform maps between group elements and representations:
see Exercise A2.23). The key is to recognize that this expression can be simplified because
| Name | $G$ | $X$ | $K$ | Function |
|---|---|---|---|---|
| Deutsch | $\{0, 1\}, \oplus$ | $\{0, 1\}$ | $\{0\}$ or $\{0, 1\}$ | $K = \{0, 1\}$: $f(x) = 0$ or $f(x) = 1$; $K = \{0\}$: $f(x) = x$ or $f(x) = 1 - x$ |
| Simon | $\{0, 1\}^n, \oplus$ | any finite set | $\{0, s\}$, $s \in \{0, 1\}^n$ | $f(x \oplus s) = f(x)$ |
| Period-finding | $\mathbf{Z}, +$ | any finite set | $\{0, r, 2r, \ldots\}$, $r \in G$ | $f(x + r) = f(x)$ |
| Order-finding | $\mathbf{Z}, +$ | $\{a^j\}$, $j \in \mathbf{Z}_r$, $a^r = 1$ | $\{0, r, 2r, \ldots\}$, $r \in G$ | $f(x) = a^x$; $f(x + r) = f(x)$ |
| Discrete logarithm | $\mathbf{Z}_r \times \mathbf{Z}_r$, $+ \pmod{r}$ | $\{a^j\}$, $j \in \mathbf{Z}_r$, $a^r = 1$ | $(\ell, -\ell s)$, $\ell, s \in \mathbf{Z}_r$ | $f(x_1, x_2) = a^{k x_1 + x_2}$; $f(x_1 + \ell, x_2 - \ell s) = f(x_1, x_2)$ |
| Order of a permutation | $\mathbf{Z}_{2^m} \times \mathbf{Z}_{2^n}$, $+ \pmod{2^m}$ | $\mathbf{Z}_{2^n}$ | $\{0, r, 2r, \ldots\}$, $r \in X$ | $f(x, y) = \pi^x(y)$; $f(x + r, y) = f(x, y)$; $\pi$ = permutation on $X$ |
| Hidden linear function | $\mathbf{Z} \times \mathbf{Z}, +$ | $\mathbf{Z}_N$ | $(\ell, -\ell s)$, $\ell, s \in X$ | $f(x_1, x_2) = \pi(s x_1 + x_2 \bmod N)$; $\pi$ = permutation on $X$ |
| Abelian stabilizer | $(H, X)$, $H$ = any Abelian group | any finite set | $\{s \in H \mid f(s, x) = x, \forall x \in X\}$ | $f(gh, x) = f(g, f(h, x))$; $f(gs, x) = f(g, x)$ |

Figure 5.5. Hidden subgroup problems. The function $f$ maps from the group $G$ to the finite set $X$, and is
promised to be constant on cosets of the hidden subgroup $K \subseteq G$. $\mathbf{Z}_N$ represents the set $\{0, 1, \ldots, N - 1\}$ in
this table, and $\mathbf{Z}$ is the integers. The problem is to find $K$ (or a generating set for it), given a black box for $f$.
$f$ is constant and distinct on cosets of the subgroup $K$, so that
$$|\hat{f}(\ell)\rangle = \frac{1}{\sqrt{|G|}} \sum_{g \in G} e^{-2\pi i \ell g/|G|} |f(g)\rangle \tag{5.76}$$
has nearly zero amplitude for all values of $\ell$ except those which satisfy
$$\sum_{h \in K} e^{-2\pi i \ell h/|G|} = |K| . \tag{5.77}$$
If we can determine ℓ, then using the linear constraints given by this expression allows
us to determine elements of K, and since K is Abelian, this allows us to eventually
determine a generating set for the whole hidden subgroup, solving the problem.
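For a cyclic group the constraint on $\ell$ and the recovery of $K$ from it can be worked out by brute force. The sketch below assumes $G = \mathbf{Z}_{12}$ written additively and the hidden subgroup $K = \{0, 4, 8\}$; both choices, and the helper name `allowed_labels`, are illustrative and not taken from the text.

```python
import cmath


def allowed_labels(group_order, subgroup):
    """Labels l with sum over h in K of exp(-2 pi i l h / |G|) equal to |K|."""
    labels = []
    for l in range(group_order):
        total = sum(cmath.exp(-2j * cmath.pi * l * h / group_order) for h in subgroup)
        if abs(total - len(subgroup)) < 1e-9:
            labels.append(l)
    return labels


group_order = 12
hidden_subgroup = [0, 4, 8]
labels = allowed_labels(group_order, hidden_subgroup)

# Each allowed l imposes the linear constraint l*h = 0 (mod |G|) on h in K.
recovered = [
    h for h in range(group_order)
    if all(l * h % group_order == 0 for l in labels)
]
print(labels, recovered)
```

Here the allowed labels are the multiples of 3, and the elements satisfying all of the resulting constraints are exactly those of $K$.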
However, life is not so simple. An important reason why the period-finding and discrete
logarithm algorithms work is because of the success of the continued fraction expansion
in obtaining ℓ from ℓ/|G|. In those problems, ℓ and |G| are arranged to not have any
common factors, with high probability. In the general case, however, this may not be
true, since |G| is free to be a composite number with many factors, and we have no
useful prior information about ℓ.
Fortunately, this problem can be solved: as mentioned above, any finite Abelian group
$G$ is isomorphic to a product of cyclic groups of prime power order, that is, $G = \mathbf{Z}_{p_1} \times \mathbf{Z}_{p_2} \times \cdots \times \mathbf{Z}_{p_M}$, where $p_i$ are primes, and $\mathbf{Z}_{p_i}$ is the group over integers $\{0, 1, \ldots, p_i - 1\}$
with addition modulo $p_i$ being the group operation. We can thus re-express the phase
which appears in (5.75) as
$$e^{2\pi i \ell g/|G|} = \prod_{i=1}^{M} e^{2\pi i \ell'_i g_i/p_i} \tag{5.78}$$
for $g_i \in \mathbf{Z}_{p_i}$. The phase estimation procedure now gives us $\ell'_i$, from which we determine
$\ell$, and thus, sample $K$ as described above, to solve the hidden subgroup problem.
Exercise 5.26: Since $K$ is a subgroup of $G$, when we decompose $G$ into a product of
cyclic groups of prime power order, this also decomposes $K$. Re-express (5.77)
to show that determining $\ell'_i$ allows one to sample from the corresponding cyclic
subgroup $K_{p_i}$ of $K$.
Exercise 5.27: Of course, the decomposition of a general finite Abelian group G into a
product of cyclic groups of prime power order is usually a difficult problem (at
least as hard as factoring integers, for example). Here, quantum algorithms come
to the rescue again: explain how the algorithms in this chapter can be used to
efficiently decompose G as desired.
Exercise 5.28: Write out a detailed specification of the quantum algorithm to solve
the hidden subgroup problem, complete with runtime and success probability
estimates, for finite Abelian groups.
Exercise 5.29: Give quantum algorithms to solve the Deutsch and Simon problems
listed in Figure 5.5, using the framework of the hidden subgroup problem.
5.4.4 Other quantum algorithms?
One of the most intriguing aspects of this framework for describing quantum algorithms
in terms of the hidden subgroup problem is the suggestion that more difficult problems might be solvable by considering various groups G and functions f . We have only
described the solution of this problem for Abelian groups. What about non-Abelian
groups? They are quite interesting (see Appendix 2 for a discussion of general Fourier
transforms over non-Abelian groups): for example, the problem of graph isomorphism is
to determine if two given graphs are the same under some permutation of the labels of
the n vertices (Section 3.2.3). These permutations can be described as transformations
under the symmetric group Sn , and algorithms for performing fast Fourier transforms
Chapter problems
over these groups exist. However, a quantum algorithm for efficiently solving the graph
isomorphism problem remains unknown.
Even if more general cases of the hidden subgroup problem remain unsolvable by
quantum computers, having this unifying framework is useful, because it allows us to
ask questions about how one might be able to step outside its limitations. It is difficult
to believe that all fast quantum algorithms that will ever be discovered will be just ways
to solve the hidden subgroup problem. If one thinks of these problems as being based on
the coset invariance property of the Fourier transform, in searching for new algorithms,
perhaps the thing to do then is to investigate other transforms with different invariances.
Going in another direction, one might ask: what difficult hidden subgroup problems
might be efficiently solved given an arbitrary (but specified independently of the problem)
quantum state as a helper? After all, as discussed in Chapter 4, most quantum states are
actually exponentially hard to construct. Such a state might be a useful resource (a real
‘quantum oracle’), if quantum algorithms existed to utilize it to solve hard problems!
The hidden subgroup problem also captures an important constraint underlying the
class of quantum algorithms which are exponentially faster than their (known) classical
counterparts: this is a promise problem, meaning that it is of the form ‘F (X) is promised
to have such and such property: characterize that property.’ Rather disappointingly,
perhaps, we shall show at the end of the next chapter that, in solving problems without
some sort of promise, quantum computers cannot achieve an exponential speedup over
classical computers; the best speedup is polynomial. On the other hand, this gives us an
important clue as to what kinds of problems quantum computers might be good at: in
retrospect, the hidden subgroup problem might be thought of as a natural candidate for
quantum computation. What other natural problems are there? Think about it!
Problem 5.1: Construct a quantum circuit to perform the quantum Fourier transform
$$|j\rangle \longrightarrow \frac{1}{\sqrt{p}} \sum_{k=0}^{p-1} e^{2\pi i j k/p} |k\rangle \tag{5.79}$$
where $p$ is prime.
Problem 5.2: (Measured quantum Fourier transform) Suppose the quantum
Fourier transform is performed as the last step of a quantum computation,
followed by a measurement in the computational basis. Show that the
combination of quantum Fourier transform and measurement is equivalent to a
circuit consisting entirely of one qubit gates and measurement, with classical
control, and no two qubit gates. You may find the discussion of Section 4.4
useful.
Problem 5.3: (Kitaev’s algorithm) Consider the quantum circuit
[Circuit diagram: a top qubit, measured at the end, coupled to a register holding $|u\rangle$]
where $|u\rangle$ is an eigenstate of $U$ with eigenvalue $e^{2\pi i \varphi}$. Show that the top qubit is
measured to be 0 with probability $p \equiv \cos^2(\pi\varphi)$. Since the state $|u\rangle$ is unaffected
by the circuit it may be reused; if $U$ can be replaced by $U^k$, where $k$ is an
arbitrary integer under your control, show that by repeating this circuit and
increasing $k$ appropriately, you can efficiently obtain as many bits of $p$ as desired,
and thus, of $\varphi$. This is an alternative to the phase estimation algorithm.
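The claimed probability can be checked numerically under one assumption about the circuit, which is not drawn here: that the top qubit undergoes a Hadamard gate, then controls $U^k$ on $|u\rangle$, then undergoes a second Hadamard gate before measurement. The function name `prob_zero` is likewise only illustrative.

```python
import cmath
import math


def prob_zero(phi, k=1):
    """Probability that the top qubit reads 0 when U**k is controlled.

    After the first Hadamard the top qubit is (|0> + |1>)/sqrt(2); the
    controlled-U**k multiplies the |1> branch by exp(2 pi i k phi); the
    second Hadamard leaves amplitude (1 + exp(2 pi i k phi))/2 on |0>.
    """
    amp0 = (1 + cmath.exp(2j * cmath.pi * k * phi)) / 2
    return abs(amp0) ** 2


for phi in (0.0, 0.125, 0.3, 0.5):
    print(phi, prob_zero(phi), math.cos(math.pi * phi) ** 2)
```

Replacing $U$ by $U^k$ turns the probability into $\cos^2(\pi k \varphi)$, which is what makes successive bits of $\varphi$ accessible.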
Problem 5.4: The runtime bound $O(L^3)$ we have given for the factoring algorithm is
not tight. Show that a better upper bound of $O(L^2 \log L \log\log L)$ operations can
be achieved.
Problem 5.5: (Non-Abelian hidden subgroups – Research) Let f be a function
on a finite group G to an arbitrary finite range X, which is promised to be
constant and distinct on distinct left cosets of a subgroup $K$. Start with the state
$$\frac{1}{\sqrt{|G|^m}} \sum_{g_1, \ldots, g_m} |g_1, \ldots, g_m\rangle|f(g_1), \ldots, f(g_m)\rangle , \tag{5.80}$$
and prove that picking $m = 4 \log|G| + 2$ allows $K$ to be identified with
probability at least 1 − 1/|G|. Note that G does not necessarily have to be
Abelian, and being able to perform a Fourier transform over G is not required.
This result shows that one can produce (using only O(log |G|) oracle calls) a final
result in which the pure state outcomes corresponding to different possible
hidden subgroups are nearly orthogonal. However, it is unknown whether a
POVM exists or not which allows the hidden subgroup to be identified
efficiently (i.e. using poly(log |G|) operations) from this final state.
Problem 5.6: (Addition by Fourier transforms) Consider the task of constructing
a quantum circuit to compute $|x\rangle \to |x + y \bmod 2^n\rangle$, where $y$ is a fixed constant,
and $0 \leq x < 2^n$. Show that one efficient way to do this, for values of $y$ such as
1, is to first perform a quantum Fourier transform, then to apply single qubit
phase shifts, then an inverse Fourier transform. What values of $y$ can be added
easily this way, and how many operations are required?
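The Fourier-then-phase-then-inverse structure can be tried out on state vectors before any circuit is designed. The sketch below treats the phase step as a diagonal matrix with entries $e^{2\pi i y k/2^n}$ and does not attempt to decompose it into gates or count operations; the function names and the choice $n = 3$ are assumptions.

```python
import cmath


def qft(state):
    """|j> -> (1/sqrt(N)) sum_k exp(2 pi i j k / N) |k>."""
    size = len(state)
    return [
        sum(state[j] * cmath.exp(2j * cmath.pi * j * k / size) for j in range(size))
        / size ** 0.5
        for k in range(size)
    ]


def inverse_qft(state):
    size = len(state)
    return [
        sum(state[k] * cmath.exp(-2j * cmath.pi * j * k / size) for k in range(size))
        / size ** 0.5
        for j in range(size)
    ]


def add_constant(state, y):
    """Add y modulo N using a Fourier transform and a diagonal phase."""
    size = len(state)
    transformed = qft(state)
    phased = [
        amp * cmath.exp(2j * cmath.pi * y * k / size)
        for k, amp in enumerate(transformed)
    ]
    return inverse_qft(phased)


size, x, y = 8, 5, 6
basis_state = [1.0 if index == x else 0.0 for index in range(size)]
result = add_constant(basis_state, y)
target = (x + y) % size
print(target, [round(abs(amp), 6) for amp in result])
```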
History and further reading
Summary of Chapter 5: The quantum Fourier transform and its
applications
• When $N = 2^n$ the quantum Fourier transform
$$|j\rangle = |j_1, \ldots, j_n\rangle \longrightarrow \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} e^{2\pi i \frac{jk}{N}} |k\rangle \tag{5.81}$$
may be written in the form
$$|j\rangle \to \frac{1}{2^{n/2}} \left( |0\rangle + e^{2\pi i 0.j_n}|1\rangle \right) \left( |0\rangle + e^{2\pi i 0.j_{n-1} j_n}|1\rangle \right) \cdots \left( |0\rangle + e^{2\pi i 0.j_1 j_2 \ldots j_n}|1\rangle \right) , \tag{5.82}$$
and may be implemented using $\Theta(n^2)$ gates.
• Phase estimation: Let $|u\rangle$ be an eigenstate of the operator $U$ with eigenvalue
$e^{2\pi i \varphi}$. Starting from the initial state $|0\rangle^{\otimes t}|u\rangle$, and given the ability to efficiently
perform $U^{2^k}$ for integer $k$, this algorithm (shown in Figure 5.3) can be used
to efficiently obtain the state $|\tilde{\varphi}\rangle|u\rangle$, where $\tilde{\varphi}$ accurately approximates $\varphi$ to
$t - \left\lceil \log\left(2 + \frac{1}{2\epsilon}\right) \right\rceil$ bits with probability at least $1 - \epsilon$.
• Order-finding: The order of $x$ modulo $N$ is the least positive integer $r$ such that
$x^r \bmod N = 1$. This number can be computed in $O(L^3)$ operations using the
quantum phase estimation algorithm, for $L$-bit integers $x$ and $N$.
• Factoring: The prime factors of an $L$-bit integer $N$ can be determined in $O(L^3)$
operations by reducing this problem to finding the order of a random number $x$
co-prime with $N$.
• Hidden subgroup problem: All the known fast quantum algorithms can be
described as solving the following problem: Let $f$ be a function from a finitely
generated group $G$ to a finite set $X$ such that $f$ is constant on the cosets of a
subgroup $K$, and distinct on each coset. Given a quantum black box for performing
the unitary transform $U|g\rangle|h\rangle = |g\rangle|h \oplus f(g)\rangle$, for $g \in G$ and $h \in X$, find a
generating set for $K$.
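The equivalence of the sum form and the product form of the quantum Fourier transform in the first summary item can be confirmed numerically for small $n$. In this sketch the binary fraction $0.j_{n-l+1} \ldots j_n$ is computed as $(j \bmod 2^l)/2^l$; the function names and the choice $n = 3$ are assumptions made for illustration.

```python
import cmath


def qft_amplitudes(j_val, n):
    """Amplitudes of (1/sqrt(N)) sum_k exp(2 pi i j k / N) |k>, N = 2**n."""
    size = 2 ** n
    return [cmath.exp(2j * cmath.pi * j_val * k / size) / size ** 0.5 for k in range(size)]


def product_form_amplitudes(j_val, n):
    """Amplitudes of the tensor product of single-qubit states."""
    amps = [1.0 + 0j]
    for l in range(1, n + 1):
        # The l-th factor carries the binary fraction 0.j_{n-l+1}...j_n.
        frac = (j_val % 2 ** l) / 2 ** l
        qubit = [1 / 2 ** 0.5, cmath.exp(2j * cmath.pi * frac) / 2 ** 0.5]
        # Earlier factors correspond to more significant bits of k.
        amps = [a * b for a in amps for b in qubit]
    return amps


n = 3
max_difference = max(
    abs(p - q)
    for j_val in range(2 ** n)
    for p, q in zip(qft_amplitudes(j_val, n), product_form_amplitudes(j_val, n))
)
print(max_difference)
```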
History and further reading
The definition of the Fourier transform may be generalized beyond what we have considered in this chapter. In the general scenario a Fourier transform is defined on a set
of complex numbers $\alpha_g$, where the index $g$ is chosen from some group, $G$. In this
chapter we have chosen $G$ to be the additive group of integers modulo $2^n$, often denoted $\mathbf{Z}_{2^n}$. Deutsch[Deu85] showed that the Fourier transform over the group $\mathbf{Z}_2^n$ could
be implemented efficiently on a quantum computer – this is the Hadamard transform
of earlier chapters. Shor [Sho94] realized to spectacular effect that quantum computers
could efficiently implement the quantum Fourier transform over groups $\mathbf{Z}_m$ for certain
special values of $m$. Inspired by this result Coppersmith[Cop94], Deutsch (unpublished),
and Cleve (unpublished) gave the simple quantum circuits for computing the quantum
Fourier transform over $\mathbf{Z}_{2^n}$ which we have used in this chapter. Cleve, Ekert, Macchiavello
and Mosca[CEMM98] and Griffiths and Niu[GN96] independently discovered the
product formula (5.4); in fact, this result had been realized much earlier by Danielson
and Lanczos. The simplified proof starting in Equation (5.5) was suggested by Zhou.
Griffiths and Niu[GN96] are responsible for the measured quantum Fourier transform
found in Problem 5.2.
The Fourier transform over $\mathbf{Z}_{2^n}$ was generalized to obtain a Fourier transform over
an arbitrary finite Abelian group by Kitaev[Kit95], who also introduced the phase estimation procedure in the form given in Problem 5.3. Cleve, Ekert, Macchiavello and
Mosca[CEMM98] also integrated several of the techniques of Shor and Kitaev into one
nice picture, upon which Section 5.2 is based. A good description of the phase estimation
algorithm can be found in Mosca’s Ph.D. thesis[Mos99].
Shor announced the quantum order-finding algorithm in a seminal paper in 1994[Sho94],
and noted that the problems of performing discrete logarithms and factoring could be
reduced to order-finding. The final paper, including extended discussion and references,
was published in 1997[Sho97]. This paper also contains a discussion of clever multiplication methods that may be used to speed up the algorithm even further than in
our description, which uses relatively naive multiplication techniques. With these faster
multiplication methods the resources required to factor a composite integer $n$ scale as
$O(n^2 \log n \log\log n)$, as claimed in the introduction to the chapter. In 1995 Kitaev[Kit95]
announced an algorithm for finding the stabilizer of a general Abelian group, which he
showed could be used to solve discrete logarithm and factoring as special cases. In addition, this algorithm contained several elements not present in Shor’s algorithm. A good
review of the factoring algorithm was written by Ekert and Jozsa [EJ96]; also see DiVincenzo [DiV95a]. The discussion of continued fractions is based upon Chapter 10 of Hardy
and Wright[HW60]. At the time of writing, the most efficient algorithm for factoring on a classical computer is the number field sieve. This is described in a collection
edited by A. K. Lenstra and H. W. Lenstra, Jr.[LL93].
The generalization of quantum algorithms to solving the hidden subgroup problem has
been considered by many authors. Historically, Simon was first to note that a quantum
computer could find a hidden period of a function satisfying f (x⊕s) = f (x)[Sim94, Sim97].
In fact, Shor found his result by generalizing Simon’s result, and by applying a Fourier
transform over $\mathbf{Z}_N$ instead of Simon’s Hadamard transforms (a Fourier transform over
$\mathbf{Z}_2^k$). Boneh and Lipton then noted the connection to the hidden subgroup problem,
and described a quantum algorithm for solving the hidden linear function problem[BL95].
Jozsa was the first to explicitly provide a uniform description of the Deutsch–Jozsa, Simon, and Shor algorithms in terms of the hidden subgroup problem[Joz97]. Ekert and
Jozsa’s work in studying the role of the Abelian and non-Abelian Fast Fourier Transform algorithms in speedup of quantum algorithms[EJ98] has also been insightful. Our
description of the hidden subgroup problem in Section 5.4 follows the framework of
Mosca and Ekert[ME99, Mos99]. Cleve has proven that the problem of finding an order of a
permutation requires an exponential number of queries for a bounded-error probabilistic
classical computer[Cle99]. Generalizations of this method beyond Abelian groups have
been attempted by Ettinger and Høyer[EH99], by Roetteler and Beth[RB98] and Pueschel,
Roetteler, and Beth[PRB98], by Beals, who also described constructions of quantum Fourier
transforms over the symmetric group[BBC+98], and by Ettinger, Høyer, and Knill[EHK99].
These results have shown, so far, that there exists a quantum algorithm to solve the
hidden subgroup problem for non-Abelian groups using only O(log |G|) oracle calls, but
whether this can be realized in polynomial time is unknown (Problem 5.5).
6 Quantum search algorithms
Suppose you are given a map containing many cities, and wish to determine the shortest
route passing through all cities on the map. A simple algorithm to find this route is to
search all possible routes through the cities, keeping a running record of which route has
the shortest length. On a classical computer, if there are N possible routes, it obviously
takes O(N ) operations to determine the shortest route using this method. Remarkably,
there is a quantum search algorithm, sometimes known as Grover’s algorithm, which
enables this search method to be sped up substantially, requiring only $O(\sqrt{N})$ operations.
Moreover, the quantum search algorithm is general in the sense that it can be applied
far beyond the route-finding example just described to speed up many (though not all)
classical algorithms that use search heuristics.
In this chapter we explain the fast quantum search algorithm. The basic algorithm is
described in Section 6.1. In Section 6.2 we derive the algorithm from another point of
view, based on the quantum simulation algorithm of Section 4.7. Three important applications of this algorithm are also described: quantum counting in Section 6.3, speedup of
solution of NP-complete problems in Section 6.4, and search of unstructured databases
in Section 6.5. One might hope to improve upon the search algorithm to do even better
than a square root speedup but, as we show in Section 6.6, it turns out this is not possible.
We conclude in Section 6.7 by showing that this speed limit applies to most unstructured
problems.
6.1 The quantum search algorithm
Let us begin by setting the stage for the search algorithm in terms of an oracle, similar to
that encountered in Section 3.1.1. This allows us to present a very general description of
the search procedure, and a geometric way to visualize its action and see how it performs.
6.1.1 The oracle
Suppose we wish to search through a search space of N elements. Rather than search the
elements directly, we concentrate on the index to those elements, which is just a number
in the range 0 to $N - 1$. For convenience we assume $N = 2^n$, so the index can be stored
in $n$ bits, and that the search problem has exactly $M$ solutions, with $1 \leq M \leq N$. A
particular instance of the search problem can conveniently be represented by a function
f , which takes as input an integer x, in the range 0 to N − 1. By definition, f (x) = 1 if
x is a solution to the search problem, and f (x) = 0 if x is not a solution to the search
problem.
Suppose we are supplied with a quantum oracle – a black box whose internal workings
we discuss later, but which are not important at this stage – with the ability to recognize
solutions to the search problem. This recognition is signalled by making use of an oracle
qubit. More precisely, the oracle is a unitary operator, $O$, defined by its action on the
computational basis:
$$|x\rangle|q\rangle \xrightarrow{O} |x\rangle|q \oplus f(x)\rangle , \tag{6.1}$$
where $|x\rangle$ is the index register, $\oplus$ denotes addition modulo 2, and the oracle qubit $|q\rangle$ is
a single qubit which is flipped if $f(x) = 1$, and is unchanged otherwise. We can check
whether $x$ is a solution to our search problem by preparing $|x\rangle|0\rangle$, applying the oracle,
and checking to see if the oracle qubit has been flipped to $|1\rangle$.
In the quantum search algorithm it is useful to apply the oracle with the oracle qubit
initially in the state $(|0\rangle - |1\rangle)/\sqrt{2}$, just as was done in the Deutsch–Jozsa algorithm of
Section 1.4.4. If $x$ is not a solution to the search problem, applying the oracle to the state
$|x\rangle(|0\rangle - |1\rangle)/\sqrt{2}$ does not change the state. On the other hand, if $x$ is a solution to the
search problem, then $|0\rangle$ and $|1\rangle$ are interchanged by the action of the oracle, giving a
final state $-|x\rangle(|0\rangle - |1\rangle)/\sqrt{2}$. The action of the oracle is thus:
$$|x\rangle \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right) \xrightarrow{O} (-1)^{f(x)} |x\rangle \left( \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right) . \tag{6.2}$$
Notice that the state of the oracle qubit is not changed. It turns out that this remains
$(|0\rangle - |1\rangle)/\sqrt{2}$ throughout the quantum search algorithm, and can therefore be omitted
from further discussion of the algorithm, simplifying our description.
With this convention, the action of the oracle may be written:
$$|x\rangle \xrightarrow{O} (-1)^{f(x)} |x\rangle . \tag{6.3}$$
We say that the oracle marks the solutions to the search problem, by shifting the phase
of the solution. For an $N$ item search problem with $M$ solutions, it turns out that we
need only apply the search oracle $O(\sqrt{N/M})$ times in order to obtain a solution, on a
quantum computer.
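The phase-marking behaviour of the oracle can be seen in a tiny state-vector calculation. The sketch below stores a state as a dictionary from basis labels $(x, q)$ to amplitudes; this representation, the search space of size 4, and the single solution $x = 2$ are all assumptions chosen for illustration.

```python
def apply_oracle(state, f):
    """Apply O|x>|q> = |x>|q XOR f(x)> to a state {(x, q): amplitude}."""
    new_state = {}
    for (x, q), amp in state.items():
        key = (x, q ^ f(x))
        new_state[key] = new_state.get(key, 0) + amp
    return new_state


def f(x):
    """Example search function with the single solution x = 2."""
    return 1 if x == 2 else 0


s = 2 ** -0.5
signs = {}
for x in range(4):
    # Index register |x>, oracle qubit in (|0> - |1>)/sqrt(2).
    before = {(x, 0): s, (x, 1): -s}
    after = apply_oracle(before, f)
    # The state is unchanged up to an overall sign (-1)**f(x).
    signs[x] = round(after[(x, 0)] / before[(x, 0)])
print(signs)
```

Only the solution picks up the sign $-1$, while the oracle qubit itself returns to the same state.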
This discussion of the oracle without describing how it works in practice is rather
abstract, and perhaps even puzzling. It seems as though the oracle already knows the
answer to the search problem; what possible use could it be to have a quantum search
algorithm based upon such oracle consultations?! The answer is that there is a distinction
between knowing the solution to a search problem, and being able to recognize the
solution; the crucial point is that it is possible to do the latter without necessarily being
able to do the former.
A simple example to illustrate this is the problem of factoring. Suppose we have been
given a large number, $m$, and told that it is a product of two primes, $p$ and $q$ – the
same sort of situation as arises in trying to break the RSA public key cryptosystem
(Appendix 5). To determine $p$ and $q$, the obvious method on a classical computer is to
search all numbers from 2 through $m^{1/2}$ for the smaller of the two prime factors. That
is, we successively do a trial division of $m$ by each number in the range 2 to $m^{1/2}$, until
we find the smaller prime factor. The other prime factor can then be found by dividing
$m$ by the smaller prime. Obviously, this search-based method requires roughly $m^{1/2}$ trial
divisions to find a factor on a classical computer.
The quantum search algorithm can be used to speed up this process. By definition,
the action of the oracle upon input of the state $|x\rangle$ is to divide m by x, and check to see if
the division is exact, flipping the oracle qubit if this is so. Applying the quantum search
algorithm with this oracle yields the smaller of the two prime factors with high probability.
But to make the algorithm work, we need to construct an efficient circuit implementing
the oracle. How to do this is an exercise in the techniques of reversible computation.
We begin by defining the function f (x) ≡ 1 if x divides m, and f (x) = 0 otherwise;
f (x) tells us whether the trial division is successful or not. Using the techniques of
reversible computation discussed in Section 3.2.5, construct a classical reversible circuit
which takes (x, q) – representing an input register initially set to x and a one bit output
register initially set to q – to (x, q ⊕ f (x)), by modifying the usual (irreversible) classical
circuit for doing trial division. The resource cost of this reversible circuit is the same to
within a factor two as the irreversible classical circuit used for trial division, and therefore
we regard the two circuits as consuming essentially the same resources. Furthermore, the
classical reversible circuit can be immediately translated into a quantum circuit that takes
$|x\rangle|q\rangle$ to $|x\rangle|q \oplus f(x)\rangle$, as required of the oracle. The key point is that even without
knowing the prime factors of m, we can explicitly construct an oracle which recognizes
a solution to the search problem when it sees one. Using this oracle and the quantum
search algorithm we can search the range 2 to $m^{1/2}$ using $O(m^{1/4})$ oracle consultations.
That is, we need only perform the trial division roughly $m^{1/4}$ times, instead of $m^{1/2}$
times, as with the classical algorithm!
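The map $(x, q) \rightarrow (x, q \oplus f(x))$ can be sketched at the level of classical bits. This is an illustration only: it uses an ordinary modulo operation rather than a gate-level reversible circuit, and the function names and the example m = 91 are assumptions, not taken from the text.

```python
import math

def f(x, m):
    """f(x) = 1 if x divides m, and 0 otherwise (trial division)."""
    return 1 if m % x == 0 else 0

def oracle(x, q, m):
    """Classical reversible map (x, q) -> (x, q XOR f(x))."""
    return x, q ^ f(x, m)

def smaller_factor(m):
    """Classical search: consult the oracle for x = 2, ..., floor(sqrt(m))."""
    queries = 0
    for x in range(2, math.isqrt(m) + 1):
        queries += 1
        _, q = oracle(x, 0, m)
        if q == 1:                      # the oracle recognized a divisor
            return x, queries
    return None, queries

# Hypothetical example: m = 7 * 13.
print(smaller_factor(91))               # (7, 6)
```

Applying `oracle` twice returns the original pair, which is the sense in which the map is reversible. The oracle recognizes a factor without the factor being built into it.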
The factoring example is conceptually interesting but not practical: there are classical
algorithms for factoring which work much faster than searching through all possible
divisors. However, it illustrates the general way in which the quantum search algorithm
may be applied: classical algorithms which rely on search-based techniques may be sped
up using the quantum search algorithm. Later in this chapter we examine scenarios where
the quantum search algorithm offers a genuinely useful aid in speeding up the solution
of NP-complete problems.
6.1.2 The procedure
Schematically, the search algorithm operates as shown in Figure 6.1. The algorithm
proper makes use of a single n qubit register. The internal workings of the oracle, including the possibility of it needing extra work qubits, are not important to the description
of the quantum search algorithm proper. The goal of the algorithm is to find a solution
to the search problem, using the smallest possible number of applications of the oracle.
The algorithm begins with the computer in the state $|0\rangle^{\otimes n}$. The Hadamard transform
is used to put the computer in the equal superposition state,
\[
|\psi\rangle = \frac{1}{N^{1/2}} \sum_{x=0}^{N-1} |x\rangle. \tag{6.4}
\]
The quantum search algorithm then consists of repeated application of a quantum
subroutine, known as the Grover iteration or Grover operator, which we denote G. The
Grover iteration, whose quantum circuit is illustrated in Figure 6.2, may be broken up
into four steps:
(1) Apply the oracle O.
(2) Apply the Hadamard transform H ⊗n .
(3) Perform a conditional phase shift on the computer, with every computational basis
state except $|0\rangle$ receiving a phase shift of $-1$,
\[
|x\rangle \rightarrow -(-1)^{\delta_{x0}} |x\rangle. \tag{6.5}
\]
(4) Apply the Hadamard transform H ⊗n .
Exercise 6.1: Show that the unitary operator corresponding to the phase shift in the
Grover iteration is $2|0\rangle\langle 0| - I$.
Figure 6.1. Schematic circuit for the quantum search algorithm. The oracle may employ work qubits for its
implementation, but the analysis of the quantum search algorithm involves only the n qubit register.
Figure 6.2. Circuit for the Grover iteration, G.
Each of the operations in the Grover iteration may be efficiently implemented on
a quantum computer. Steps 2 and 4, the Hadamard transforms, require $n = \log(N)$
operations each. Step 3, the conditional phase shift, may be implemented using the
techniques of Section 4.3, using $O(n)$ gates. The cost of the oracle call depends upon
the specific application; for now, we merely need note that the Grover iteration requires
only a single oracle call. It is useful to note that the combined effect of steps 2, 3, and 4
is
\[
H^{\otimes n} (2|0\rangle\langle 0| - I) H^{\otimes n} = 2|\psi\rangle\langle\psi| - I, \tag{6.6}
\]
where $|\psi\rangle$ is the equally weighted superposition of states, (6.4). Thus the Grover iteration,
G, may be written $G = (2|\psi\rangle\langle\psi| - I)O$.
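The identity (6.6) can be checked numerically by applying the four steps to a state vector and comparing with $(2|\psi\rangle\langle\psi| - I)O$. The sketch below is an illustration under stated assumptions: the oracle is modelled directly as a sign flip on a set of marked indices (the convention of (6.3)), and the function names and test vector are hypothetical.

```python
import math

def hadamard(state):
    """Apply H^{(x)n} to a copy of the state with a fast Walsh-Hadamard transform."""
    v = list(state)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(len(v))
    return [a / norm for a in v]

def grover_steps(state, marked):
    """One Grover iteration written as the four steps of Section 6.1.2."""
    v = [-a if x in marked else a for x, a in enumerate(state)]  # (1) oracle O
    v = hadamard(v)                                              # (2) H^{(x)n}
    v = [a if x == 0 else -a for x, a in enumerate(v)]           # (3) 2|0><0| - I
    return hadamard(v)                                           # (4) H^{(x)n}

def grover_reflection(state, marked):
    """The same iteration written as (2|psi><psi| - I) O."""
    v = [-a if x in marked else a for x, a in enumerate(state)]
    mean = sum(v) / len(v)
    return [2 * mean - a for a in v]

# Hypothetical test vector on 3 qubits with two marked items.
state = [0.1, -0.4, 0.3, 0.2, 0.5, -0.6, 0.25, 0.15]
a = grover_steps(state, {3, 5})
b = grover_reflection(state, {3, 5})
print(max(abs(p - q) for p, q in zip(a, b)) < 1e-12)
```

The second form costs a single pass over the amplitudes, which is convenient for classical simulation of small instances.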
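Exercise 6.2: Show that the operation $(2|\psi\rangle\langle\psi| - I)$ applied to a general state
$\sum_k \alpha_k |k\rangle$ produces
\[
\sum_k \left[ -\alpha_k + 2\langle\alpha\rangle \right] |k\rangle, \tag{6.7}
\]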
where $\langle\alpha\rangle \equiv \sum_k \alpha_k / N$ is the mean value of the $\alpha_k$. For this reason,
$(2|\psi\rangle\langle\psi| - I)$ is sometimes referred to as the inversion about mean operation.
6.1.3 Geometric visualization
What does the Grover iteration do? We have noted that $G = (2|\psi\rangle\langle\psi| - I)O$. In fact, we
will show that the Grover iteration can be regarded as a rotation in the two-dimensional
space spanned by the starting vector $|\psi\rangle$ and the state consisting of a uniform superposition of solutions to the search problem. To see this it is useful to adopt the convention
that ${\sum_x}'$ indicates a sum over all x which are solutions to the search problem, and ${\sum_x}''$ indicates a sum over all x which are not solutions to the search problem. Define normalized
states
\[
|\alpha\rangle \equiv \frac{1}{\sqrt{N-M}} {\sum_x}'' |x\rangle \tag{6.8}
\]
\[
|\beta\rangle \equiv \frac{1}{\sqrt{M}} {\sum_x}' |x\rangle. \tag{6.9}
\]
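Simple algebra shows that the initial state $|\psi\rangle$ may be re-expressed as
\[
|\psi\rangle = \sqrt{\frac{N-M}{N}}\, |\alpha\rangle + \sqrt{\frac{M}{N}}\, |\beta\rangle, \tag{6.10}
\]
so the initial state of the quantum computer is in the space spanned by $|\alpha\rangle$ and $|\beta\rangle$.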
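The effect of G can be understood in a beautiful way by realizing that the oracle
operation O performs a reflection about the vector $|\alpha\rangle$ in the plane defined by $|\alpha\rangle$ and
$|\beta\rangle$. That is, $O(a|\alpha\rangle + b|\beta\rangle) = a|\alpha\rangle - b|\beta\rangle$. Similarly, $2|\psi\rangle\langle\psi| - I$ also performs a
reflection in the plane defined by $|\alpha\rangle$ and $|\beta\rangle$, about the vector $|\psi\rangle$. And the product of
two reflections is a rotation! This tells us that the state $G^k|\psi\rangle$ remains in the space spanned
by $|\alpha\rangle$ and $|\beta\rangle$ for all k. It also gives us the rotation angle. Let $\cos(\theta/2) = \sqrt{(N-M)/N}$,
so that $|\psi\rangle = \cos(\theta/2)|\alpha\rangle + \sin(\theta/2)|\beta\rangle$. As Figure 6.3 shows, the two reflections which
comprise G take $|\psi\rangle$ to
\[
G|\psi\rangle = \cos\frac{3\theta}{2}\, |\alpha\rangle + \sin\frac{3\theta}{2}\, |\beta\rangle, \tag{6.11}
\]
so the rotation angle is in fact θ. It follows that continued application of G takes the state
to
\[
G^k|\psi\rangle = \cos\left( \frac{2k+1}{2}\theta \right) |\alpha\rangle + \sin\left( \frac{2k+1}{2}\theta \right) |\beta\rangle. \tag{6.12}
\]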
Summarizing, G is a rotation in the two-dimensional space spanned by $|\alpha\rangle$ and $|\beta\rangle$,
rotating the space by θ radians per application of G. Repeated application of the Grover
iteration rotates the state vector close to $|\beta\rangle$. When this occurs, an observation in the
computational basis produces with high probability one of the outcomes superposed in
$|\beta\rangle$, that is, a solution to the search problem! An example illustrating the search algorithm
with N = 4 is given in Box 6.1.
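The rotation picture can be compared against a direct simulation: the component of $G^k|\psi\rangle$ along $|\beta\rangle$ should equal $\sin((2k+1)\theta/2)$, as in (6.12). The sketch below assumes the phase-oracle convention (6.3) and uses hypothetical function names and example sizes.

```python
import math

def grover_iterate(state, marked):
    """Apply G = (2|psi><psi| - I) O to a real state vector."""
    v = [-a if x in marked else a for x, a in enumerate(state)]
    mean = sum(v) / len(v)
    return [2 * mean - a for a in v]

def beta_amplitude(N, marked, k):
    """Component of G^k|psi> along |beta>, found by direct simulation."""
    state = [1 / math.sqrt(N)] * N
    for _ in range(k):
        state = grover_iterate(state, marked)
    return sum(state[x] for x in marked) / math.sqrt(len(marked))

def predicted(N, M, k):
    """sin((2k+1) theta / 2) from (6.12), with sin(theta/2) = sqrt(M/N)."""
    theta = 2 * math.asin(math.sqrt(M / N))
    return math.sin((2 * k + 1) * theta / 2)

# N = 4 with one solution: a single iteration reaches |beta> exactly.
print(round(beta_amplitude(4, {2}, 1), 12), round(predicted(4, 1, 1), 12))  # 1.0 1.0
```

For N = 4 and M = 1 the angle is θ = π/3, so one rotation carries the initial state, inclined at θ/2 from $|\alpha\rangle$, onto $|\beta\rangle$.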
Exercise 6.3: Show that in the $|\alpha\rangle, |\beta\rangle$ basis, we may write the Grover iteration as
\[
G = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \tag{6.13}
\]
Figure 6.3. The action of a single Grover iteration, G: the state vector is rotated by θ towards the superposition
$|\beta\rangle$ of all solutions to the search problem. Initially, it is inclined at angle θ/2 from $|\alpha\rangle$, a state orthogonal to $|\beta\rangle$.
An oracle operation O reflects the state about the state $|\alpha\rangle$, then the operation $2|\psi\rangle\langle\psi| - I$ reflects it about $|\psi\rangle$.
In the figure $|\alpha\rangle$ and $|\beta\rangle$ are lengthened slightly to reduce clutter (all states should be unit vectors). After repeated
Grover iterations, the state vector gets close to $|\beta\rangle$, at which point an observation in the computational basis
outputs a solution to the search problem with high probability. The remarkable efficiency of the algorithm occurs
because θ behaves like $\Omega(\sqrt{M/N})$, so only $O(\sqrt{N/M})$ applications of G are required to rotate the state vector
close to $|\beta\rangle$.
where θ is a real number in the range 0 to π/2 (assuming for simplicity that
$M \le N/2$; this limitation will be lifted shortly), chosen so that
\[
\sin\theta = \frac{2\sqrt{M(N-M)}}{N}. \tag{6.14}
\]
6.1.4 Performance
How many times must the Grover iteration be repeated in order to rotate $|\psi\rangle$ near $|\beta\rangle$?
The initial state of the system is $|\psi\rangle = \sqrt{(N-M)/N}\,|\alpha\rangle + \sqrt{M/N}\,|\beta\rangle$, so rotating
through $\arccos\sqrt{M/N}$ radians takes the system to $|\beta\rangle$. Let CI(x) denote the integer
closest to the real number x, where by convention we round halves down, CI(3.5) = 3,
for example. Then repeating the Grover iteration
\[
R = \mathrm{CI}\left( \frac{\arccos\sqrt{M/N}}{\theta} \right) \tag{6.15}
\]
times rotates $|\psi\rangle$ to within an angle $\theta/2 \le \pi/4$ of $|\beta\rangle$. Observation of the state in the
computational basis then yields a solution to the search problem with probability at least
one-half. Indeed, for specific values of M and N it is possible to achieve a much higher
probability of success. For example, when $M \ll N$ we have $\theta \approx \sin\theta \approx 2\sqrt{M/N}$, and
thus the angular error in the final state is at most $\theta/2 \approx \sqrt{M/N}$, giving a probability
of error of at most $M/N$. Note that R depends on the number of solutions M, but not
on the identity of those solutions, so provided we know M we can apply the quantum
search algorithm as described. In Section 6.3 we will explain how to remove even the
need for a knowledge of M in applying the search algorithm.
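As a rough numerical illustration of (6.15), the sketch below computes R and the resulting success probability from the rotation picture. The helper names are hypothetical, and the relation $\sin(\theta/2) = \sqrt{M/N}$ is taken from the definition of θ in Section 6.1.3.

```python
import math

def closest_integer(x):
    """CI(x): the integer closest to x, rounding halves down."""
    return math.ceil(x - 0.5)

def iterations(N, M):
    """R from (6.15), with sin(theta/2) = sqrt(M/N)."""
    theta = 2 * math.asin(math.sqrt(M / N))
    return closest_integer(math.acos(math.sqrt(M / N)) / theta)

def success_probability(N, M, k):
    """Probability of measuring a solution after k Grover iterations."""
    theta = 2 * math.asin(math.sqrt(M / N))
    return math.sin((2 * k + 1) * theta / 2) ** 2

# Hypothetical example: one solution among N = 1024 items.
R = iterations(1024, 1)
print(R, success_probability(1024, 1, R) > 0.999)   # 25 True
```

In this example the error probability is well below the M/N estimate quoted above for the case $M \ll N$.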
The form (6.15) is useful as an exact expression for the number of oracle calls used
to perform the quantum search algorithm, but it would be useful to have a simpler
expression summarizing the essential behavior of R. To achieve this, note from (6.15)
that $R \le \lceil \pi/2\theta \rceil$, so a lower bound on θ will give an upper bound on R. Assuming for
the moment that $M \le N/2$, we have
\[
\frac{\theta}{2} \ge \sin\frac{\theta}{2} = \sqrt{\frac{M}{N}}, \tag{6.16}
\]
from which we obtain an elegant upper bound on the number of iterations required,
\[
R \le \left\lceil \frac{\pi}{4} \sqrt{\frac{N}{M}} \right\rceil. \tag{6.17}
\]
That is, $R = O(\sqrt{N/M})$ Grover iterations (and thus oracle calls) must be performed
in order to obtain a solution to the search problem with high probability, a quadratic
improvement over the $O(N/M)$ oracle calls required classically. The quantum search
algorithm is summarized below, for the case M = 1.
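Algorithm: Quantum search
Inputs: (1) a black box oracle O which performs the transformation
$O|x\rangle|q\rangle = |x\rangle|q \oplus f(x)\rangle$, where $f(x) = 0$ for all $0 \le x < 2^n$ except $x_0$, for which
$f(x_0) = 1$; (2) n + 1 qubits in the state $|0\rangle$.
Outputs: $x_0$.
Runtime: $O(\sqrt{2^n})$ operations. Succeeds with probability $O(1)$.
Procedure:
1. $|0\rangle^{\otimes n}|0\rangle$
   (initial state)
2. $\rightarrow \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle \left[ \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right]$
   (apply $H^{\otimes n}$ to the first n qubits, and $HX$ to the last qubit)
3. $\rightarrow \left[ (2|\psi\rangle\langle\psi| - I)O \right]^R \frac{1}{\sqrt{2^n}} \sum_{x=0}^{2^n-1} |x\rangle \left[ \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right] \approx |x_0\rangle \left[ \frac{|0\rangle - |1\rangle}{\sqrt{2}} \right]$
   (apply the Grover iteration $R \approx \lceil \pi\sqrt{2^n}/4 \rceil$ times)
4. $\rightarrow x_0$
   (measure the first n qubits)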
Exercise 6.4: Give explicit steps for the quantum search algorithm, as above, but for
the case of multiple solutions (1 < M < N/2).
What happens when more than half the items are solutions to the search problem, that
is, $M \ge N/2$? From the expression $\theta = \arcsin(2\sqrt{M(N-M)}/N)$ (compare (6.14)) we
see that the angle θ gets smaller as M varies from N/2 to N. As a result, the number of
iterations needed by the search algorithm increases with M, for $M \ge N/2$. Intuitively,
this is a silly property for a search algorithm to have: we expect that it should become
easier to find a solution to the problem as the number of solutions increases. There are
at least two ways around this problem. If M is known in advance to be larger than N/2
then we can just randomly pick an item from the search space, and then check that it is
a solution using the oracle. This approach has a success probability at least one-half, and
only requires one consultation with the oracle. It has the disadvantage that we may not
know the number of solutions M in advance.
In the case where it isn’t known whether M ≥ N/2, another approach can be used.
This approach is interesting in its own right, and has a useful application to simplify the
analysis of the quantum algorithm for counting the number of solutions to the search
problem, as presented in Section 6.3. The idea is to double the number of elements in the
search space by adding N extra items to the search space, none of which are solutions.
As a consequence, less than half the items in the new search space are solutions. This is
effected by adding a single qubit $|q\rangle$ to the search index, doubling the number of items to
be searched to 2N. A new augmented oracle $O'$ is constructed which marks an item only
if it is a solution to the search problem and the extra bit is set to zero. In Exercise 6.5 you
will explain how the oracle $O'$ may be constructed using one call to O. The new search
problem has only M solutions out of 2N entries, so running the search algorithm with
the new oracle $O'$ we see that at most $R = (\pi/4)\sqrt{2N/M}$ calls to $O'$ are required, and it
follows that $O(\sqrt{N/M})$ calls to O are required to perform the search.
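The effect of doubling the search space can be illustrated with a few numbers. The sketch below is classical bookkeeping only, not the quantum construction asked for in Exercise 6.5; the function names and the example with N = 16 and M = 12 are assumptions.

```python
import math

def grover_angle(N, M):
    """theta in [0, pi/2] with sin(theta) = 2 sqrt(M (N - M)) / N, as in (6.14)."""
    return math.asin(min(1.0, 2 * math.sqrt(M * (N - M)) / N))

def augmented_oracle(f):
    """Hypothetical wrapper: mark (x, q) only if f(x) = 1 and the extra bit q is 0."""
    return lambda x, q: f(x) if q == 0 else 0

# Hypothetical instance: N = 16 items, of which M = 12 are solutions.
f = lambda x: 1 if x < 12 else 0
g = augmented_oracle(f)
marked = sum(g(x, q) for x in range(16) for q in (0, 1))
print(marked, "solutions out of", 2 * 16)        # 12 solutions out of 32

# For M >= N/2 the angle shrinks as M grows, which is the "silly" behavior.
print([round(grover_angle(16, M), 3) for M in (8, 12, 14, 16)])
```

Each evaluation of the wrapped function uses at most one evaluation of `f`, mirroring the requirement that $O'$ use one call to O.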
Exercise 6.5: Show that the augmented oracle $O'$ may be constructed using one
application of O, and elementary quantum gates, using the extra qubit $|q\rangle$.
The quantum search algorithm may be used in a wide variety of ways, some of which
will be explored in subsequent sections. The great utility of the algorithm arises because
we do not assume any particular structure to the search problems being performed. This
is the great advantage of posing the problem in terms of a ‘black box’ oracle, and we
adopt this point of view whenever convenient through the remainder of this chapter. In
practical applications, of course, it is necessary to understand how the oracle is being
implemented, and in each of the practical problems we concern ourselves with an explicit
description of the oracle implementation is given.
Exercise 6.6: Verify that the gates in the dotted box in the second figure of Box 6.1
perform the conditional phase shift operation $2|00\rangle\langle 00| - I$, up to an
unimportant global phase factor.
6.2 Quantum search as a quantum simulation
The correctness of the quantum search algorithm is easily verified, but it is by no means
obvious how one would dream up such an algorithm from a state of ignorance. In this
section we sketch a heuristic means by which one can ‘derive’ the quantum search algorithm, in the hope of lending some intuition as to the tricky task of quantum algorithm
design. As a useful side effect we also obtain a deterministic quantum search algorithm.
Because our goal is to obtain insight rather than generality, we assume for the sake of
simplicity that the search problem has exactly one solution, which we label x.
Our method involves two steps. First, we make a guess as to a Hamiltonian which
Box 6.1: Quantum search: a two-bit example
Here is an explicit example illustrating how the quantum search algorithm works
on a search space of size N = 4. The oracle, for which f (x) = 0 for all x except
x = x0 , in which case f (x0 ) = 1, can be taken to be one of the four circuits
corresponding to x0 = 0, 1, 2, or 3 from left to right, where the top two qubits carry
the query x, and the bottom qubit carries the oracle’s response. The quantum circuit
which performs the initial Hadamard transforms and a single Grover iteration G is
Initially, the top two qubits are prepared in the state $|0\rangle$, and the bottom one as
$|1\rangle$. The gates in the dotted box perform the conditional phase shift operation
$2|00\rangle\langle 00| - I$. How many times must we repeat G to obtain $x_0$? From Equation (6.15), using M = 1, we find that less than one iteration is required. It turns
out that because θ = π/3 in (6.14), only exactly one iteration is required, to perfectly obtain $x_0$, in this special case. In the geometric picture of Figure 6.3, our
initial state $|\psi\rangle = (|00\rangle + |01\rangle + |10\rangle + |11\rangle)/2$ is $30^\circ$ from $|\alpha\rangle$, and a single rotation
by $\theta = 60^\circ$ moves $|\psi\rangle$ to $|\beta\rangle$. You can confirm for yourself directly, using the
quantum circuits, that measurement of the top two qubits gives $x_0$, after using the
oracle only once. In contrast, a classical computer – or classical circuit – trying to
differentiate between the four oracles would require on average 2.25 oracle queries!
solves the search problem. More precisely, we write down a Hamiltonian H which depends on the solution x and an initial state $|\psi\rangle$ such that a quantum system evolving
according to H will change from $|\psi\rangle$ to $|x\rangle$ after some prescribed time. Once we’ve
found such a Hamiltonian and initial state, we can move on to the second step, which is
to attempt to simulate the action of the Hamiltonian using a quantum circuit. Amazingly,
following this procedure leads very quickly to the quantum search algorithm! We have
already met this two-part procedure while studying universality in quantum circuits, in
Problem 4.3, and it also serves well in the study of quantum searching.
We suppose that the algorithm starts with the quantum computer in a state $|\psi\rangle$. We’ll
tie down what $|\psi\rangle$ should be later on, but it is convenient to leave $|\psi\rangle$ undetermined
until we understand the dynamics of the algorithm. The goal of quantum searching is to
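change $|\psi\rangle$ into $|x\rangle$ or some approximation thereof. What Hamiltonians might we guess
do a good job of causing such an evolution? Simplicity suggests that we should guess
a Hamiltonian constructed entirely from the terms $|\psi\rangle$ and $|x\rangle$. Thus, the Hamiltonian
must be a sum of terms like $|\psi\rangle\langle\psi|$, $|x\rangle\langle x|$, $|\psi\rangle\langle x|$ and $|x\rangle\langle\psi|$. Perhaps the simplest
choices along these lines are the Hamiltonians:
\[
H = |x\rangle\langle x| + |\psi\rangle\langle\psi| \tag{6.18}
\]
\[
H = |x\rangle\langle\psi| + |\psi\rangle\langle x|. \tag{6.19}
\]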
It turns out that both these Hamiltonians lead to the quantum search algorithm! For now,
however, we restrict ourselves to analyzing the Hamiltonian in Equation (6.18). Recall
from Section 2.2.2 that after a time t, the state of a quantum system evolving according
to the Hamiltonian H and initially in the state $|\psi\rangle$ is given by
\[
\exp(-iHt)|\psi\rangle. \tag{6.20}
\]
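Intuitively it looks pretty good: for small t the effect of the evolution is to take $|\psi\rangle$ to
$(I - itH)|\psi\rangle = (1 - it)|\psi\rangle - it\langle x|\psi\rangle|x\rangle$. That is, the $|\psi\rangle$ vector is rotated slightly,
into the $|x\rangle$ direction. Let’s actually do a full analysis, with the goal being to determine
whether there is a t such that $\exp(-iHt)|\psi\rangle = |x\rangle$. Clearly we can restrict the analysis
to the two-dimensional space spanned by $|x\rangle$ and $|\psi\rangle$. Performing the Gram–Schmidt
procedure, we can find $|y\rangle$ such that $|x\rangle, |y\rangle$ forms an orthonormal basis for this space,
and $|\psi\rangle = \alpha|x\rangle + \beta|y\rangle$, for some α, β such that $\alpha^2 + \beta^2 = 1$, and for convenience we
have chosen the phases of $|x\rangle$ and $|y\rangle$ so that α and β are real and non-negative. In the
$|x\rangle, |y\rangle$ basis we have
\[
H = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} \alpha^2 & \alpha\beta \\ \alpha\beta & \beta^2 \end{bmatrix} = \begin{bmatrix} 1 + \alpha^2 & \alpha\beta \\ \alpha\beta & 1 - \alpha^2 \end{bmatrix} = I + \alpha(\beta X + \alpha Z). \tag{6.21}
\]
Thus
\[
\exp(-iHt)|\psi\rangle = \exp(-it) \left[ \cos(\alpha t)|\psi\rangle - i \sin(\alpha t)\,(\beta X + \alpha Z)|\psi\rangle \right]. \tag{6.22}
\]
The global phase factor $\exp(-it)$ can be ignored, and simple algebra shows that $(\beta X + \alpha Z)|\psi\rangle = |x\rangle$, so the state of the system after a time t is
\[
\cos(\alpha t)|\psi\rangle - i \sin(\alpha t)|x\rangle. \tag{6.23}
\]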
Thus, observation of the system at time $t = \pi/2\alpha$ yields the result $|x\rangle$ with probability
one: we have found a solution to the search problem! Unfortunately, the time of the
observation depends on α, the component of $|\psi\rangle$ in the $|x\rangle$ direction, and thus on x,
which is what we are trying to determine. The obvious solution is to attempt to arrange
α to be the same for all $|x\rangle$, that is, to choose $|\psi\rangle$ to be the uniform superposition state
\[
|\psi\rangle = \frac{\sum_x |x\rangle}{\sqrt{N}}. \tag{6.24}
\]
Making this choice gives $\alpha = 1/\sqrt{N}$ for all x, and thus the time of observation $t = \pi\sqrt{N}/2$ does not depend on knowing the value of x. Furthermore, the state (6.24) has
the obvious advantage that we already know how to prepare such a state by doing a
Hadamard transform.
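The claim that evolution under (6.18) carries $|\psi\rangle$ to $|x\rangle$ at time $t = \pi\sqrt{N}/2$ can be checked numerically. The sketch below evaluates $\exp(-iHt)|\psi\rangle$ with a truncated Taylor series, which is an illustrative choice made here and not a method from the text; the function names, the number of terms, and the example N = 16, x = 5 are assumptions.

```python
import math

def apply_H(v, x, N):
    """Apply H = |x><x| + |psi><psi| to v, with |psi> the uniform superposition."""
    overlap = sum(v) / math.sqrt(N)        # <psi|v>
    out = [overlap / math.sqrt(N)] * N     # |psi><psi|v>
    out[x] += v[x]                         # plus |x><x|v>
    return out

def evolve(N, x, t, terms=120):
    """exp(-iHt)|psi> via a truncated Taylor series (illustrative, small N only)."""
    psi = [complex(1 / math.sqrt(N))] * N
    result = list(psi)
    term = list(psi)
    for k in range(1, terms):
        hv = apply_H(term, x, N)
        term = [(-1j * t / k) * a for a in hv]          # (-iHt)^k / k! |psi>
        result = [r + a for r, a in zip(result, term)]
    return result

# Hypothetical example: N = 16 and solution x = 5.
N, x = 16, 5
final = evolve(N, x, math.pi * math.sqrt(N) / 2)
print(round(abs(final[x]) ** 2, 9))        # 1.0
```

At intermediate times the probability of finding $|x\rangle$ follows from (6.23) as $\alpha^2\cos^2(\alpha t) + \sin^2(\alpha t)$, which the same routine reproduces.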
We now know that the Hamiltonian (6.18) rotates the vector $|\psi\rangle$ to $|x\rangle$. Can we find
a quantum circuit to simulate the Hamiltonian (6.18), and thus obtain a quantum search
algorithm? Applying the method of Section 4.7, we see that a natural way of simulating H
is to alternately simulate the Hamiltonians $H_1 \equiv |x\rangle\langle x|$ and $H_2 \equiv |\psi\rangle\langle\psi|$ for short time
increments $\Delta t$. These Hamiltonians are easily simulated using the methods of Chapter 4,
as illustrated in Figures 6.4 and 6.5.
Exercise 6.7: Verify that the circuits shown in Figures 6.4 and 6.5 implement the
operations $\exp(-i|x\rangle\langle x|\Delta t)$ and $\exp(-i|\psi\rangle\langle\psi|\Delta t)$, respectively, with $|\psi\rangle$ as in
(6.24).
Figure 6.4. Circuit implementing the operation $\exp(-i|x\rangle\langle x|\Delta t)$ using two oracle calls.
Figure 6.5. Circuit implementing the operation $\exp(-i|\psi\rangle\langle\psi|\Delta t)$, for $|\psi\rangle$ as in (6.24).
The number of oracle calls required by the quantum simulation is determined by
how small a time-step is required to obtain reasonably accurate results. Suppose we use a
simulation step of length $\Delta t$ that is accurate to $O(\Delta t^2)$. The total number of steps required
is $t/\Delta t = \Theta(\sqrt{N}/\Delta t)$, and thus the cumulative error is $O(\Delta t^2 \times \sqrt{N}/\Delta t) = O(\Delta t \sqrt{N})$.
To obtain a reasonably high success probability we need the error to be $O(1)$, which means
that we must choose $\Delta t = \Theta(1/\sqrt{N})$ which results in a number of oracle calls that scales
like $O(N)$ – no better than the classical solution! What if we use a more accurate method
of quantum simulation, say one that is accurate to $O(\Delta t^3)$? The cumulative error in this
case is $O(\Delta t^2 \sqrt{N})$, and thus to achieve a reasonable success probability we need to choose
$\Delta t = \Theta(N^{-1/4})$, resulting in a total number of oracle calls $O(N^{3/4})$, which is a distinct
improvement over the classical situation, although still not as good as achieved by the
quantum search algorithm of Section 6.1! In general going to a more accurate quantum
simulation technique results in a reduction in the number of oracle calls required to
perform the simulation:
Exercise 6.8: Suppose the simulation step is performed to an accuracy $O(\Delta t^r)$. Show
that the number of oracle calls required to simulate H to reasonable accuracy is
$O(N^{r/2(r-1)})$. Note that as r becomes large the exponent of N approaches 1/2.
We have been analyzing the accuracy of the quantum simulation of the Hamiltonian (6.18) using general results on quantum simulation from Section 4.7. Of course, in
this instance we are dealing with a specific Hamiltonian, not the general case, which suggests that it might be interesting to calculate explicitly the effect of a simulation step of
time Δt, rather than relying on the general analysis. We can do this for any specific choice
of simulation method – it can be a little tedious to work out the effect of the simulation
step, but it is essentially a straightforward calculation. The obvious starting point is to explicitly calculate the action of the lowest-order simulation techniques, that is, to calculate
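one or both of $\exp(-i|x\rangle\langle x|\Delta t)\exp(-i|\psi\rangle\langle\psi|\Delta t)$ or $\exp(-i|\psi\rangle\langle\psi|\Delta t)\exp(-i|x\rangle\langle x|\Delta t)$.
The results are essentially the same in both instances; we will focus on the study of
$U(\Delta t) \equiv \exp(-i|\psi\rangle\langle\psi|\Delta t)\exp(-i|x\rangle\langle x|\Delta t)$. $U(\Delta t)$ clearly acts non-trivially only in the
space spanned by $|x\rangle\langle x|$ and $|\psi\rangle\langle\psi|$, so we restrict ourselves to that space, working
in the basis $|x\rangle, |y\rangle$, where $|y\rangle$ is defined as before. Note that in this representation
$|x\rangle\langle x| = (I + Z)/2 = (I + \hat{z} \cdot \vec{\sigma})/2$, where $\hat{z} \equiv (0, 0, 1)$ is the unit vector in the z
direction, and $|\psi\rangle\langle\psi| = (I + \vec{\psi} \cdot \vec{\sigma})/2$, where $\vec{\psi} = (2\alpha\beta, 0, (\alpha^2 - \beta^2))$ (recall that this is
the Bloch vector representation; see Section 4.2). A simple calculation shows that up to
an unimportant global phase factor,
\[
U(\Delta t) = \left( \cos^2\frac{\Delta t}{2} - \sin^2\frac{\Delta t}{2}\, \vec{\psi} \cdot \hat{z} \right) I
- 2i \sin\frac{\Delta t}{2} \left( \cos\frac{\Delta t}{2}\, \frac{\vec{\psi} + \hat{z}}{2} + \sin\frac{\Delta t}{2}\, \frac{\vec{\psi} \times \hat{z}}{2} \right) \cdot \vec{\sigma}. \tag{6.25}
\]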
Exercise 6.9: Verify Equation (6.25). (Hint: see Exercise 4.15.)
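Equation (6.25) implies that $U(\Delta t)$ is a rotation on the Bloch sphere about an axis of
rotation $\vec{r}$ defined by
\[
\vec{r} = \cos\frac{\Delta t}{2}\, \frac{\vec{\psi} + \hat{z}}{2} + \sin\frac{\Delta t}{2}\, \frac{\vec{\psi} \times \hat{z}}{2}, \tag{6.26}
\]
and through an angle θ defined by
\[
\cos\frac{\theta}{2} = \cos^2\frac{\Delta t}{2} - \sin^2\frac{\Delta t}{2}\, \vec{\psi} \cdot \hat{z}, \tag{6.27}
\]
which simplifies upon substitution of $\vec{\psi} \cdot \hat{z} = \alpha^2 - \beta^2 = (2/N - 1)$ to
\[
\cos\frac{\theta}{2} = 1 - \frac{2}{N} \sin^2\frac{\Delta t}{2}. \tag{6.28}
\]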
Note that $\vec{\psi} \cdot \vec{r} = \hat{z} \cdot \vec{r}$, so both $|\psi\rangle\langle\psi|$ and $|x\rangle\langle x|$ lie on the same circle of revolution
about the $\vec{r}$ axis on the Bloch sphere. Summarizing, the action of $U(\Delta t)$ is to rotate
$|\psi\rangle\langle\psi|$ about the $\vec{r}$ axis, through an angle θ for each application of $U(\Delta t)$, as illustrated
in Figure 6.6. We terminate the procedure when enough rotations have been performed
to rotate $|\psi\rangle\langle\psi|$ near to the solution $|x\rangle\langle x|$. Now initially we imagined that $\Delta t$ was small,
since we were considering the case of quantum simulation, but Equation (6.28) shows
that the smart thing to do is to choose $\Delta t = \pi$, in order to maximize the rotation angle
θ. If we do this, then we obtain $\cos(\theta/2) = 1 - 2/N$, which for large N corresponds to
$\theta \approx 4/\sqrt{N}$, and the number of oracle calls required to find the solution $|x\rangle$ is $O(\sqrt{N})$,
just as for the original quantum search algorithm.
Figure 6.6. Bloch sphere diagram showing the initial state ψ rotating around the axis of rotation r going toward the
final state ẑ.
Indeed, if we make the choice $\Delta t = \pi$, then this ‘quantum simulation’ is in fact
identical with the original quantum search algorithm, since the operators applied in
the quantum simulation are $\exp(-i\pi|\psi\rangle\langle\psi|) = I - 2|\psi\rangle\langle\psi|$ and $\exp(-i\pi|x\rangle\langle x|) =
I - 2|x\rangle\langle x|$, and up to a global phase shift these are identical to the steps making
up the Grover iteration. Viewed this way, the circuits shown in Figures 6.2 and 6.3 for
the quantum search algorithm are simplifications of the circuits shown in Figures 6.4
and 6.5 for the simulation, in the special case $\Delta t = \pi$!
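The equivalence at $\Delta t = \pi$ can be checked numerically. The sketch below uses the fact that for a projector P, $\exp(-iP\Delta t) = I + (e^{-i\Delta t} - 1)P$; the function names and test vector are assumptions, and the Grover iteration is taken in the form $(2|\psi\rangle\langle\psi| - I)O$ with O the phase oracle of (6.3).

```python
import cmath
import math

def exp_projector_x(v, x, dt):
    """exp(-i|x><x| dt) v = v + (e^{-i dt} - 1)|x><x|v>."""
    out = list(v)
    out[x] += (cmath.exp(-1j * dt) - 1) * v[x]
    return out

def exp_projector_psi(v, dt):
    """exp(-i|psi><psi| dt) v, for |psi> the uniform superposition."""
    mean = sum(v) / len(v)                     # each component of |psi><psi|v>
    shift = (cmath.exp(-1j * dt) - 1) * mean
    return [a + shift for a in v]

def simulation_step(v, x, dt):
    """U(dt) = exp(-i|psi><psi| dt) exp(-i|x><x| dt)."""
    return exp_projector_psi(exp_projector_x(v, x, dt), dt)

def grover(v, x):
    """G = (2|psi><psi| - I) O for a single marked item x."""
    w = list(v)
    w[x] = -w[x]
    mean = sum(w) / len(w)
    return [2 * mean - a for a in w]

def bloch_angle(N, dt):
    """Rotation angle theta from (6.28)."""
    return 2 * math.acos(1 - (2 / N) * math.sin(dt / 2) ** 2)

v = [0.3, -0.2, 0.5, 0.1, 0.4, -0.6, 0.2, 0.25]   # hypothetical test vector
u, g = simulation_step(v, 3, math.pi), grover(v, 3)
print(max(abs(a + b) for a, b in zip(u, g)) < 1e-12)   # U(pi) = -G
```

The two operators differ only by the global phase $-1$. For large N, `bloch_angle(N, math.pi)` approaches $4/\sqrt{N}$, in line with the estimate above.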
Exercise 6.10: Show that by choosing $\Delta t$ appropriately we can obtain a quantum
search algorithm which uses $O(\sqrt{N})$ queries, and for which the final state is $|x\rangle$
exactly, that is, the algorithm works with probability 1, rather than with some
smaller probability.
We have re-derived the quantum search algorithm from a different point of view, the
point of view of quantum simulation. Why did this approach work? Might it be used to
find other fast quantum algorithms? We can’t answer these questions in any definitive
way, but the following few thoughts may be of some interest. The basic procedure used is
four-fold: (1) specify the problem to be solved, including a description of the desired input
and output from the quantum algorithm; (2) guess a Hamiltonian to solve the problem,
and verify that it does in fact work; (3) find a procedure to simulate the Hamiltonian;
and (4) analyze the resource costs of the simulation. This is different from the more
conventional approach in two respects: we guess a Hamiltonian, rather than a quantum
circuit, and there is no analogue to the simulation step in the conventional approach. The
more important of these two differences is the first. There is a great deal of freedom in
specifying a quantum circuit to solve a problem. While that freedom is, in part, responsible
for the great power of quantum computation, it makes searching for good circuits rather
difficult. By contrast, specifying a Hamiltonian is a much more constrained problem, and
therefore affords less freedom in the solution of problems, but those same constraints
may in fact make it much easier to find an efficient quantum algorithm to solve a problem.
We’ve seen this happen for the quantum search algorithm, and perhaps other quantum
algorithms will be discovered by this method; we don’t know. What seems certain is that
this ‘quantum algorithms as quantum simulations’ point of view offers a useful alternative
viewpoint to stimulate the development of quantum algorithms.
Exercise 6.11: (Multiple solution continuous quantum search) Guess a
Hamiltonian with which one may solve the continuous time search problem in
the case where the search problem has M solutions.
Exercise 6.12: (Alternative Hamiltonian for quantum search) Suppose
\[
H = |x\rangle\langle\psi| + |\psi\rangle\langle x|. \tag{6.29}
\]
(1) Show that it takes time $O(1)$ to rotate from the state $|\psi\rangle$ to the state $|x\rangle$,
given an evolution according to the Hamiltonian H.
(2) Explain how a quantum simulation of the Hamiltonian H may be performed,
and determine the number of oracle calls your simulation technique requires
to obtain the solution with high probability.
6.3 Quantum counting
How quickly can we determine the number of solutions, M , to an N item search problem,
if M is not known in advance? Clearly, on a classical computer it takes Θ(N ) consultations
with an oracle to determine M . On a quantum computer it is possible to estimate the
number of solutions much more quickly than is possible on a classical computer by
combining the Grover iteration with the phase estimation technique based upon the
quantum Fourier transform (Chapter 5). This has some important applications. First, if
we can estimate the number of solutions quickly then it is also possible to find a solution
quickly, even if the number of solutions is unknown, by first counting the number of
solutions, and then applying the quantum search algorithm to find a solution. Second,
quantum counting allows us to decide whether or not a solution even exists, depending on
whether the number of solutions is zero, or non-zero. This has applications, for example,
to the solution of NP-complete problems, which may be phrased in terms of the existence
of a solution to a search problem.
Exercise 6.13: Consider a classical algorithm for the counting problem which samples
uniformly and independently k times from the search space, and let $X_1, \ldots, X_k$
be the results of the oracle calls, that is, $X_j = 1$ if the jth oracle call revealed a
solution to the problem, and $X_j = 0$ if the jth oracle call did not reveal a
solution to the problem. This algorithm returns the estimate $S \equiv N \times \sum_j X_j / k$
for the number of solutions to the search problem. Show that the standard
deviation in S is $\Delta S = \sqrt{M(N - M)/k}$. Prove that to obtain a probability at
least 3/4 of estimating M correctly to within an accuracy $\sqrt{M}$ for all values of
M we must have $k = \Omega(N)$.
Exercise 6.14: Prove that any classical counting algorithm with a probability at least
3/4 for estimating M correctly to within an accuracy $c\sqrt{M}$ for some constant c
and for all values of M must make $\Omega(N)$ oracle calls.
Quantum counting is an application of the phase estimation procedure of Section 5.2 to
estimate the eigenvalues of the Grover iteration G, which in turn enables us to determine
the number of solutions M to the search problem. Suppose $|a\rangle$ and $|b\rangle$ are the two
eigenvectors of the Grover iteration in the space spanned by $|\alpha\rangle$ and $|\beta\rangle$. Let θ be the
angle of rotation determined by the Grover iteration. From Equation (6.13) we see that
the corresponding eigenvalues are $e^{i\theta}$ and $e^{i(2\pi - \theta)}$. For ease of analysis it is convenient to
assume that the oracle has been augmented, as described in Section 6.1, expanding the
size of the search space to 2N, and ensuring that $\sin^2(\theta/2) = M/2N$.
The phase estimation circuit used for quantum counting is shown in Figure 6.7. The
function of the circuit is to estimate θ to m bits of accuracy, with a probability of success
at least $1 - \epsilon$. The first register contains $t \equiv m + \lceil \log(2 + 1/2\epsilon) \rceil$ qubits, as per the phase
estimation algorithm, and the second register contains n + 1 qubits, enough to implement
the Grover iteration on the augmented search space. The state of the second register
is initialized to an equal superposition of all possible inputs $\sum_x |x\rangle$ by a Hadamard
transform. As we saw in Section 6.1 this state is a superposition of the eigenstates $|a\rangle$
and $|b\rangle$, so by the results of Section 5.2 the circuit in Figure 6.7 gives us an estimate of
θ or $2\pi - \theta$ accurate to within $|\Delta\theta| \le 2^{-m}$, with probability at least $1 - \epsilon$. Furthermore,
an estimate for $2\pi - \theta$ is clearly equivalent to an estimate of θ with the same level of
accuracy, so effectively the phase estimation algorithm determines θ to an accuracy $2^{-m}$
with probability $1 - \epsilon$.
Figure 6.7. Circuit for performing approximate quantum counting on a quantum computer.
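Using the equation $\sin^2(\theta/2) = M/2N$ and our estimate for $\theta$ we obtain an estimate
of the number of solutions, $M$. How large an error, $\Delta M$, is there in this estimate?
$$\frac{|\Delta M|}{2N} = \left| \sin^2\!\left(\frac{\theta + \Delta\theta}{2}\right) - \sin^2\!\left(\frac{\theta}{2}\right) \right| \tag{6.30}$$
$$= \left( \sin\!\left(\frac{\theta + \Delta\theta}{2}\right) + \sin\!\left(\frac{\theta}{2}\right) \right) \left| \sin\!\left(\frac{\theta + \Delta\theta}{2}\right) - \sin\!\left(\frac{\theta}{2}\right) \right| . \tag{6.31}$$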
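Calculus implies that $|\sin((\theta + \Delta\theta)/2) - \sin(\theta/2)| \leq |\Delta\theta|/2$, and elementary trigonometry
that $|\sin((\theta + \Delta\theta)/2)| < \sin(\theta/2) + |\Delta\theta|/2$, so
$$\frac{|\Delta M|}{2N} < \left( 2 \sin\!\left(\frac{\theta}{2}\right) + \frac{|\Delta\theta|}{2} \right) \frac{|\Delta\theta|}{2} . \tag{6.32}$$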
Substituting $\sin^2(\theta/2) = M/2N$ and $|\Delta\theta| \leq 2^{-m}$ gives our final estimate for the error
in our estimate of $M$,
$$|\Delta M| < \left( \sqrt{2MN} + \frac{N}{2^{m+1}} \right) 2^{-m} . \tag{6.33}$$
As an example, suppose we choose $m = \lceil n/2 \rceil + 1$, and $\epsilon = 1/6$. Then we have
$t = \lceil n/2 \rceil + 3$, so the algorithm requires $\Theta(\sqrt{N})$ Grover iterations, and thus $\Theta(\sqrt{N})$ oracle
calls. By (6.33) our accuracy is $|\Delta M| < \sqrt{M/2} + 1/4 = O(\sqrt{M})$. Compare this with
Exercise 6.14, according to which it would have required $O(N)$ oracle calls to obtain a
similar accuracy on a classical computer.
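The error bound (6.33) is easy to evaluate numerically. The following sketch, with function names chosen for this example, computes the right-hand side of (6.33) and confirms that the choice $m = \lceil n/2 \rceil + 1$ with $N = 2^n$ keeps it below $\sqrt{M/2} + 1/4$ for a few sample values.

```python
import math

def delta_m_bound(M, N, m):
    """Right-hand side of (6.33): (sqrt(2MN) + N / 2^(m+1)) * 2^(-m)."""
    return (math.sqrt(2 * M * N) + N / 2 ** (m + 1)) * 2 ** (-m)

def example_bound(M, n):
    """Bound (6.33) for the choice m = ceil(n/2) + 1 with N = 2^n."""
    m = math.ceil(n / 2) + 1
    return delta_m_bound(M, 2 ** n, m)

for n in (4, 7, 10):
    for M in (0, 1, 10):
        print(n, M, round(example_bound(M, n), 4), round(math.sqrt(M / 2) + 0.25, 4))
```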
Indeed, the example just described serves double duty as an algorithm for determining
whether a solution to the search problem exists at all, that is, whether $M = 0$ or $M \neq 0$. If
$M = 0$ then we have $|\Delta M| < 1/4$, so the algorithm must produce the estimate zero with
probability at least 5/6. Conversely, if $M \neq 0$ then it is easy to verify that the estimate
for $M$ is not equal to 0 with probability at least 5/6.
Another application of quantum counting is to find a solution to a search problem
when the number M of solutions is unknown. The difficulty in applying the quantum
search algorithm as described in Section 6.1 is that the number of times to repeat the
Grover iteration, Equation (6.15), depends on knowing the number of solutions M . This
problem can be alleviated by using the quantum counting algorithm to first estimate θ
and M to high accuracy using phase estimation, and then to apply the quantum search
algorithm as in Section 6.1, repeating the Grover iteration a number of times determined
by (6.15), with the estimates for θ and M obtained by phase estimation substituted to
determine $R$. The angular error in this case is at most $(\pi/4)(1 + |\Delta\theta|/\theta)$, so choosing
$m = \lceil n/2 \rceil + 1$ as before gives an angular error at most $\pi/4 \times 3/2 = 3\pi/8$, which
corresponds to a success probability of at least $\cos^2(3\pi/8) = 1/2 - 1/(2\sqrt{2}) \approx 0.15$ for
the search algorithm. If the probability of obtaining an estimate of $\theta$ this accurate is 5/6,
as in our earlier example, then the total probability of obtaining a solution to the search
problem is $5/6 \times \cos^2(3\pi/8) \approx 0.12$, a probability which may quickly be boosted close
to 1 by a few repetitions of the combined counting–search procedure.
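The arithmetic in this paragraph can be reproduced directly. The sketch below evaluates $\cos^2(3\pi/8)$ and the combined success probability; the helper `repetitions_for` is an addition for illustration and assumes that repetitions of the counting–search procedure succeed independently.

```python
import math

angular_error = (math.pi / 4) * (3 / 2)    # 3*pi/8
p_search = math.cos(angular_error) ** 2    # success probability of the search step
p_counting = 5 / 6                         # probability that the estimate of theta is accurate enough
p_total = p_counting * p_search

def repetitions_for(target, p_single):
    """Smallest r with 1 - (1 - p_single)^r >= target, assuming independent repetitions."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_single))

print(round(p_search, 4), round(p_total, 4))
```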
6.4 Speeding up the solution of NP-complete problems
Quantum searching may be used to speed up the solution to problems in the complexity
class NP (Section 3.2.2). We already saw, in Section 6.1.1, how factoring can be sped
up; here, we illustrate how quantum search can be applied to assist the solution of the
Hamiltonian cycle problem (HC). Recall that a Hamiltonian cycle of a graph is a simple
cycle which visits every vertex of the graph. The HC problem is to determine whether a
given graph has a Hamiltonian cycle or not. This problem belongs to the class of
NP-complete problems, widely believed (but not yet proved) to be intractable on a classical
computer.
A simple algorithm to solve HC is to perform a search through all possible orderings
of the vertices:
(1) Generate each possible ordering (v1 , . . . , vn ) of vertices for the graph. Repetitions
will be allowed, as they ease the analysis without affecting the essential result.
(2) For each ordering check to see whether it is a Hamiltonian cycle for the graph. If
not, continue checking the orderings.
Since there are $n^n = 2^{n \log n}$ possible orderings of the vertices which must be searched,
this algorithm requires $2^{n \log n}$ checks for the Hamiltonian cycle property in the worst
case. Indeed, any problem in NP may be solved in a similar way: if a problem of size $n$ has
witnesses which can be specified using $w(n)$ bits, where $w(n)$ is some polynomial in $n$,
then searching through all $2^{w(n)}$ possible witnesses will reveal a solution to the problem,
if one exists.
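The two-step search above can be sketched in a few lines. This is a minimal illustration under stated assumptions: vertices are labeled $0, \ldots, n-1$, the graph is undirected with $n \geq 3$, and the names `is_hamiltonian_cycle` and `brute_force_hc` are invented for this example. Orderings with repetitions are generated, as in step (1), and rejected by the check.

```python
from itertools import product

def is_hamiltonian_cycle(ordering, edges):
    """True if the ordering visits every vertex once and consecutive vertices are adjacent."""
    n = len(ordering)
    if sorted(ordering) != list(range(n)):      # rejects orderings with repetitions
        return False
    return all(frozenset((ordering[i], ordering[(i + 1) % n])) in edges
               for i in range(n))

def brute_force_hc(n, edge_list):
    """Search all n^n orderings; return (witness or None, number of checks made)."""
    edges = {frozenset(e) for e in edge_list}
    checks = 0
    for ordering in product(range(n), repeat=n):
        checks += 1
        if is_hamiltonian_cycle(ordering, edges):
            return ordering, checks
    return None, checks

square = [(0, 1), (1, 2), (2, 3), (3, 0)]   # has a Hamiltonian cycle
star = [(0, 1), (0, 2), (0, 3)]             # has none
print(brute_force_hc(4, square))
print(brute_force_hc(4, star))
```

When no cycle exists, the number of checks equals $n^n$, which is the worst case discussed in the text.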
The quantum search algorithm may be used to speed up this algorithm by increasing
the speed of the search. Specifically, we use the algorithm described in Section 6.3 to
determine whether a solution to the search problem exists. Let $m \equiv \lceil \log n \rceil$. The search
space for the algorithm will be represented by a string of $mn$ qubits, each block of $m$
qubits being used to store the index to a single vertex. Thus we can write the computational
basis states as $|v_1, \ldots, v_n\rangle$, where each $|v_i\rangle$ is represented by the appropriate string
of $m$ qubits, for a total of $nm$ qubits. The oracle for the search algorithm must apply
the transformation:
$$O|v_1, \ldots, v_n\rangle = \begin{cases} |v_1, \ldots, v_n\rangle & \text{if } v_1, \ldots, v_n \text{ is not a Hamiltonian cycle} \\ -|v_1, \ldots, v_n\rangle & \text{if } v_1, \ldots, v_n \text{ is a Hamiltonian cycle} \end{cases} \tag{6.34}$$
Such an oracle is easy to design and implement when one has a description of the graph.
One takes a polynomial size classical circuit recognizing Hamiltonian cycles in the graph,
and converts it to a reversible circuit, also of polynomial size, computing the transformation
$(v_1, \ldots, v_n, q) \to (v_1, \ldots, v_n, q \oplus f(v_1, \ldots, v_n))$, where $f(v_1, \ldots, v_n) = 1$ if
$v_1, \ldots, v_n$ is a Hamiltonian cycle, and is 0 otherwise. Implementing the corresponding
circuit on a quantum computer with the final qubit starting in the state $(|0\rangle - |1\rangle)/\sqrt{2}$
gives the desired transformation. We won’t explicitly describe the details here, except
to note the key point: the oracle requires a number of gates polynomial in $n$, as a direct
consequence of the fact that Hamiltonian cycles can be recognized using polynomially
many gates classically. Applying the variant of the search algorithm which determines
whether a solution to the search problem exists (Section 6.3) we see that it takes
$O(2^{mn/2}) = O(2^{n \lceil \log n \rceil / 2})$ applications of the oracle to determine whether a Hamiltonian
cycle exists. When one does exist it is easy to apply the combined counting–search
algorithm to find an example of such a cycle, which can then be exhibited as a witness for
the problem.
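The step from the reversible map $(v, q) \to (v, q \oplus f(v))$ to the phase oracle (6.34) can be checked on a toy state. The sketch below represents a state as a dictionary from basis labels to amplitudes; the predicate `f` is a hypothetical stand-in for the Hamiltonian cycle check, and all names are assumptions made for this example.

```python
import math

def apply_bit_oracle(state, f):
    """Apply (v, q) -> (v, q XOR f(v)) to a state given as {(v, q): amplitude}."""
    out = {}
    for (v, q), amp in state.items():
        key = (v, q ^ f(v))
        out[key] = out.get(key, 0.0) + amp
    return out

def with_minus_ancilla(v):
    """Basis state |v> tensored with (|0> - |1>)/sqrt(2)."""
    r = 1 / math.sqrt(2)
    return {(v, 0): r, (v, 1): -r}

# Hypothetical predicate standing in for the Hamiltonian cycle check.
f = lambda v: 1 if v == (0, 1, 2, 3) else 0

marked = apply_bit_oracle(with_minus_ancilla((0, 1, 2, 3)), f)
unmarked = apply_bit_oracle(with_minus_ancilla((0, 0, 2, 3)), f)
print(marked)     # every amplitude has changed sign
print(unmarked)   # unchanged
```

Because the final qubit returns to $(|0\rangle - |1\rangle)/\sqrt{2}$ in both cases, the sign change can be attributed to the first register alone.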
To summarize:
• The classical algorithm requires $O(p(n)\, 2^{n \lceil \log n \rceil})$ operations to determine whether a
Hamiltonian cycle exists, where the polynomial factor $p(n)$ is overhead
predominantly due to the implementation of the oracle, that is, the gates checking
whether a candidate path is Hamiltonian or not. The dominant effect in determining
the resources required is the exponent in $2^{n \lceil \log n \rceil}$. The classical algorithm is
deterministic, that is, it succeeds with probability 1.
• The quantum algorithm requires $O(p(n)\, 2^{n \lceil \log n \rceil / 2})$ operations to determine whether
a Hamiltonian cycle exists. Once again, the polynomial $p(n)$ is overhead
predominantly due to implementation of the oracle. The dominant effect in
determining the resources required is the exponent in $2^{n \lceil \log n \rceil / 2}$. There is a constant
probability (say, 1/6) of error for the algorithm, which may be reduced to $1/6^r$ by $r$
repetitions of the algorithm.
• Asymptotically the quantum algorithm requires the square root of the number of
operations the classical algorithm requires.
6.5 Quantum search of an unstructured database
Suppose somebody gives you a list containing one thousand flower names, and asks
you where ‘Perth Rose’ appears on the list. If the flower appears exactly once on the
list, and the list is not ordered in any obvious way, then you will need to examine five
hundred names, on average, before you find the ‘Perth Rose’. Might it be possible to
speed up this kind of database searching using the quantum search algorithm? Indeed,
the quantum search algorithm is sometimes referred to as a database search algorithm,
but its usefulness for that application is limited, and based on certain assumptions. In this
section we take a look at how the quantum search algorithm can conceptually be used
to search an unstructured database, in a setting rather like that found on a conventional
computer. The picture we construct will clarify what resources are required to enable a
quantum computer to search classical databases.
Suppose we have a database containing $N \equiv 2^n$ items, each of length $l$ bits. We will
label these items $d_1, \ldots, d_N$. We want to determine where a particular $l$ bit string, $s$, is
in the database. A classical computer, used to solve this problem, is typically split into
two parts, illustrated in Figure 6.8. One is the central processing unit, or CPU, where
data manipulation takes place, using a small amount of temporary memory. The second
part is a large memory which stores the database in a string of $2^n$ blocks of $l$ bit cells.
The memory is assumed to be passive, in the sense that it is not capable of processing
data on its own. What is possible is to LOAD data from memory into the CPU, and STORE
data from the CPU in memory, and to do manipulations of the data stored temporarily
in the CPU. Of course, classical computers may be designed along different lines, but
this CPU–memory split is a popular and common architecture.
To find out where a given string s is in the unstructured database, the most efficient
classical algorithm is as follows. First, an n-bit index to the database elements is set up
in the CPU. We assume that the CPU is large enough to store the n ≡ ⌈log N ⌉ bit
index. The index starts out at zero, and is incremented by one on each iteration of the
algorithm. At each iteration, the database entry corresponding to the index is loaded into
the CPU, and compared to the string which is being searched for. If they are the same,
the algorithm outputs the value of the index and halts. If not, the algorithm continues
incrementing the index. Obviously, this algorithm requires that items be loaded from
memory $2^n$ times in the worst case. It is also clear that this is the most efficient possible
algorithm for solving the problem in this model of computation.
Figure 6.8. Classical database searching on a computer with distinct central processing unit (CPU) and memory.
Only two operations may be directly performed on the memory – a memory element may be LOADed into the CPU,
or an item from the CPU may be STOREd in memory.
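The classical procedure just described amounts to a linear scan that counts LOAD operations. A minimal sketch follows; the name `classical_search` and the list-of-strings memory are assumptions made for this example.

```python
def classical_search(memory, s):
    """Scan the memory cell by cell; return (index or None, number of LOADs)."""
    loads = 0
    for index in range(len(memory)):
        item = memory[index]      # LOAD the cell addressed by the index into the CPU
        loads += 1
        if item == s:             # compare with the string being searched for
            return index, loads
    return None, loads

memory = ["0110", "1010", "1111", "0001"]   # N = 4 items of l = 4 bits
print(classical_search(memory, "1111"))
print(classical_search(memory, "0000"))
```

When the string is absent or sits in the last cell, all $N$ cells are loaded, matching the worst case stated above.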
How efficiently can an analogous algorithm be implemented on a quantum computer?
And, even if a quantum speedup is possible, how useful is such an algorithm? We show
first that a speedup is possible, and then return to the question of the utility of such an
algorithm. Suppose our quantum computer consists of two units, just like the classical
computer, a CPU and a memory. We assume that the CPU contains four registers: (1)
an $n$ qubit ‘index’ register initialized to $|0\rangle$; (2) an $l$ qubit register initialized to $|s\rangle$ and
remaining in that state for the entire computation; (3) an $l$ qubit ‘data’ register initialized
to $|0\rangle$; and (4) a 1 qubit register initialized to $(|0\rangle - |1\rangle)/\sqrt{2}$.
The memory unit can be implemented in one of two ways. The simplest is a quantum
memory containing $N = 2^n$ cells of $l$ qubits each, containing the database entries $|d_x\rangle$.
The second implementation is to implement the memory as a classical memory with
$N = 2^n$ cells of $l$ bits each, containing the database entries $d_x$. Unlike a traditional classical
memory, however, it can be addressed by an index $x$ which can be in a superposition of
multiple values. This quantum index allows a superposition of cell values to be LOADed
from memory. Memory access works in the following way: if the CPU’s index register
is in the state $|x\rangle$ and the data register is in the state $|d\rangle$, then the contents $d_x$ of the
$x$th memory cell are added to the data register: $|d\rangle \to |d \oplus d_x\rangle$, where the addition is
done bitwise, modulo 2. First, let us see how this capability is used to perform quantum
search, then we shall discuss how such a memory might be physically constructed.
The key part of implementing the quantum search algorithm is realization of the oracle,
which must flip the phase of the index which locates s in the memory. Suppose the CPU
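is in the state
$$|x\rangle |s\rangle |0\rangle\, \frac{|0\rangle - |1\rangle}{\sqrt{2}} . \tag{6.35}$$
Applying the LOAD operation puts the computer in the state
$$|x\rangle |s\rangle |d_x\rangle\, \frac{|0\rangle - |1\rangle}{\sqrt{2}} . \tag{6.36}$$
Now the second and third registers are compared, and if they are the same, then a bit
flip is applied to register 4; otherwise nothing is changed. The effect of this operation is
$$|x\rangle |s\rangle |d_x\rangle\, \frac{|0\rangle - |1\rangle}{\sqrt{2}} \to \begin{cases} -|x\rangle |s\rangle |d_x\rangle\, \dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } d_x = s \\[2ex] \phantom{-}|x\rangle |s\rangle |d_x\rangle\, \dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } d_x \neq s. \end{cases} \tag{6.37}$$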
The data register is then restored to the state $|0\rangle$ by performing the LOAD operation again.
The total action of the oracle thus leaves registers 2, 3 and 4 unaffected, and unentangled
with register 1. Thus, the overall effect is to take the state of register 1 from $|x\rangle$ to $-|x\rangle$
if $d_x = s$, and to leave the register alone otherwise. Using the oracle implemented in this
way, we may apply the quantum search algorithm to determine the location of $s$ in the
database, using $O(\sqrt{N})$ LOAD operations, compared to the $N$ LOAD operations that were
required classically.
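The LOAD, compare, LOAD sequence can be simulated classically for small $N$ to see the LOAD count. The sketch below is an illustration under several assumptions: database entries are stored as integers, there is exactly one matching entry, the number of Grover iterations is taken as $\lfloor (\pi/4)\sqrt{N} \rfloor$, and each oracle call is counted as two LOAD operations. The function names are invented for this example.

```python
import math

def oracle_phase(x, s, memory):
    """Follow registers 3 and 4 through LOAD, compare, LOAD for index x; return the phase."""
    d = 0
    d ^= memory[x]                   # LOAD: |d> -> |d XOR d_x>
    phase = -1 if d == s else 1      # bit flip on (|0> - |1>)/sqrt(2) acts as a -1 phase
    d ^= memory[x]                   # second LOAD restores the data register
    assert d == 0
    return phase

def quantum_search(memory, s):
    """Simulate Grover search over the index register; return (index, probability, LOADs)."""
    N = len(memory)
    amps = [1 / math.sqrt(N)] * N
    iterations = math.floor(math.pi / 4 * math.sqrt(N))   # assumes a single solution
    loads = 0
    for _ in range(iterations):
        amps = [oracle_phase(x, s, memory) * a for x, a in enumerate(amps)]
        loads += 2                                        # two LOADs per oracle call
        mean = sum(amps) / N
        amps = [2 * mean - a for a in amps]               # inversion about the mean
    probs = [a * a for a in amps]
    best = max(range(N), key=probs.__getitem__)
    return best, probs[best], loads

memory = [3, 14, 7, 1, 9, 0, 12, 5, 10, 2, 15, 6, 4, 11, 8, 13]   # N = 16, l = 4
print(quantum_search(memory, 6))
```

For $N = 16$ the simulation uses three oracle calls, that is six LOADs on a superposition of addresses, where the classical scan may need sixteen.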
In order for the oracle to function correctly on superpositions it seems at first glance
as though the memory needs to be quantum mechanical. In fact, as we noted above, with
some caveats the memory can actually be implemented classically, which likely makes it
much more resistant to the effects of noise. But a quantum addressing scheme is still
needed; a conceptual picture illustrating how this might be done is shown in Figure 6.9.
The principle of operation is a means by which the binary encoded state of the quantum
index (where 0 to $2^n - 1$ is represented by $n$ qubits) is translated into a unary encoding
(where 0 to $2^n - 1$ is represented by the position of a single probe within $2^n$ possible
locations) which addresses the classical database. The database effects a change on a
degree of freedom within the probe which is unrelated to its position. The binary to
unary encoding is then reversed, leaving the data register with the desired contents.
Are there practical instances in which the quantum search algorithm could be useful
for searching classical databases? Two distinct points may be made. First, databases are
not ordinarily unstructured. Simple databases, like one containing flower names discussed
in the introduction to this section, may be maintained in alphabetical order, such that a
binary search can be used to locate an item in time which is O(log(N )) for an N -element
database. However, some databases may require a much more complex structure, and
although sophisticated techniques exist to optimize classical searches, given queries of a
sufficiently complex or unanticipated nature, a predetermined structure may not be of
assistance, and the problem can be regarded as being essentially the unstructured database
search problem we discussed.
Second, for a quantum computer to be able to search a classical database, a quantum
addressing scheme is required. The scheme we depicted requires O(N log N ) quantum
switches – about the same amount of hardware as would be required to store the database
itself. Presumably, these switches may someday be as simple and inexpensive as classical
memory elements, but if that is not the case, then building a quantum computer to
perform a quantum search may not be economically advantageous, compared with using
classical computing hardware distributed over the memory elements.
Given these considerations, it appears that the principal use of quantum search
algorithms will not be in searching classical databases. Rather, their use will probably be
in searching for solutions to hard problems, as discussed in the last section, such as the
Hamiltonian cycle, traveling salesman, and satisfiability problems.
Figure 6.9. Conceptual diagram of a 32 cell classical memory with a five qubit quantum addressing scheme. Each
circle represents a switch, addressed by the qubit inscribed within. For example, when $|x_4\rangle = |0\rangle$, the
corresponding switch routes the input qubit towards the left; when $|x_4\rangle = |1\rangle$ the switch routes the input qubit to
the right. If $|x_4\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, then an equal superposition of both routes is taken. The data register qubits
enter at the top of the tree, and are routed down to the database, which changes their state according to the
contents of the memory. The qubits are then routed back into a definite position, leaving them with the retrieved
information. Physically, this could be realized using, for example, single photons for the data register qubits, which
are steered using nonlinear interferometers (Chapter 7). The classical database could be just a simple sheet of
plastic in which a ‘zero’ (illustrated as white squares) transmits light unchanged, and a ‘one’ (shaded squares) shifts
the polarization of the incident light by 90°.
6.6 Optimality of the search algorithm
We have shown that a quantum computer can search $N$ items, consulting the search
oracle only $O(\sqrt{N})$ times. We now prove that no quantum algorithm can perform this
task using fewer than $\Omega(\sqrt{N})$ accesses to the search oracle, and thus the algorithm we
have demonstrated is optimal.
Suppose the algorithm starts in the state $|\psi\rangle$. For simplicity, we prove the lower
bound for the case where the search problem has a single solution, $x$. To determine $x$
we are allowed to apply an oracle $O_x$ which gives a phase shift of $-1$ to the solution
$|x\rangle$ and leaves all other states invariant, $O_x = I - 2|x\rangle\langle x|$. We suppose the algorithm
starts in a state $|\psi\rangle$ and applies the oracle $O_x$ exactly $k$ times, with unitary operations
$U_1, U_2, \ldots, U_k$ interleaved between the oracle operations. Define
$$|\psi_k^x\rangle \equiv U_k O_x U_{k-1} O_x \cdots U_1 O_x |\psi\rangle \tag{6.38}$$
$$|\psi_k\rangle \equiv U_k U_{k-1} \cdots U_1 |\psi\rangle . \tag{6.39}$$
That is, $|\psi_k\rangle$ is the state that results when the sequence of unitary operations $U_1, \ldots, U_k$
is carried out, without the oracle operations. Let $|\psi_0\rangle = |\psi\rangle$. Our goal will be to bound
the quantity
$$D_k \equiv \sum_x \| \psi_k^x - \psi_k \|^2 , \tag{6.40}$$
where we use the notation $\|\psi\|$ for $\| |\psi\rangle \|$ as a convenience to simplify formulas. Intuitively,
$D_k$ is a measure of the deviation after $k$ steps caused by the oracle, from the evolution
that would otherwise have ensued. If this quantity is small, then all the states $|\psi_k^x\rangle$ are
roughly the same, and it is not possible to correctly identify $x$ with high probability. The
strategy of the proof is to demonstrate two things: (a) a bound on $D_k$ that shows it can
grow no faster than $O(k^2)$; and (b) a proof that $D_k$ must be $\Omega(N)$ if it is to be possible to
distinguish $N$ alternatives. Combining these two results gives the desired lower bound.
First, we give an inductive proof that $D_k \leq 4k^2$. This is clearly true for $k = 0$, where
$D_k = 0$. Note that
$$D_{k+1} = \sum_x \| O_x \psi_k^x - \psi_k \|^2 \tag{6.41}$$
$$= \sum_x \| O_x (\psi_k^x - \psi_k) + (O_x - I)\psi_k \|^2 . \tag{6.42}$$
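Applying $\|b + c\|^2 \leq \|b\|^2 + 2\|b\|\,\|c\| + \|c\|^2$ with $b \equiv O_x(\psi_k^x - \psi_k)$ and
$c \equiv (O_x - I)\psi_k = -2\langle x|\psi_k\rangle |x\rangle$, gives
$$D_{k+1} \leq \sum_x \left( \| \psi_k^x - \psi_k \|^2 + 4 \| \psi_k^x - \psi_k \|\, |\langle x|\psi_k\rangle| + 4 |\langle \psi_k|x\rangle|^2 \right) . \tag{6.43}$$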
Applying the Cauchy–Schwarz inequality to the second term on the right hand side, and
noting that $\sum_x |\langle x|\psi_k\rangle|^2 = 1$ gives
$$D_{k+1} \leq D_k + 4 \left( \sum_x \| \psi_k^x - \psi_k \|^2 \right)^{1/2} \left( \sum_{x'} |\langle \psi_k|x'\rangle|^2 \right)^{1/2} + 4 \tag{6.44}$$
$$\leq D_k + 4\sqrt{D_k} + 4 . \tag{6.45}$$
By the inductive hypothesis $D_k \leq 4k^2$ we obtain
$$D_{k+1} \leq 4k^2 + 8k + 4 = 4(k + 1)^2 , \tag{6.46}$$
which completes the induction.
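The bound $D_k \leq 4k^2$ can be checked numerically for one particular algorithm. The sketch below is an illustration only: it takes every $U_j$ to be the inversion about the mean used in the Grover iteration and starts from the uniform superposition, so that the oracle-free state $\psi_k$ equals the initial state. These choices and the name `deviation` are assumptions made for this example.

```python
import math

def deviation(N, k):
    """D_k = sum_x ||psi_k^x - psi_k||^2 when every U_j is inversion about the mean."""
    psi = [1 / math.sqrt(N)] * N      # uniform start; left unchanged by the U_j alone
    total = 0.0
    for x in range(N):
        state = psi[:]
        for _ in range(k):
            state[x] = -state[x]                       # oracle O_x = I - 2|x><x|
            mean = sum(state) / N
            state = [2 * mean - a for a in state]      # U_j
        total += sum((a - b) ** 2 for a, b in zip(state, psi))
    return total

for k in range(6):
    print(k, round(deviation(16, k), 4), 4 * k * k)
```

For this choice of unitaries the bound is met with equality at $k = 1$, up to rounding, and is loose for larger $k$.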
To complete the proof we need to show that the probability of success can only be
high if $D_k$ is $\Omega(N)$. We suppose $|\langle x|\psi_k^x\rangle|^2 \geq 1/2$ for all $x$, so that an observation yields a
solution to the search problem with probability at least one-half. Replacing $|x\rangle$ by $e^{i\theta}|x\rangle$
does not change the probability of success, so without loss of generality we may assume
that $\langle x|\psi_k^x\rangle = |\langle x|\psi_k^x\rangle|$, and therefore
$$\| \psi_k^x - x \|^2 = 2 - 2|\langle x|\psi_k^x\rangle| \leq 2 - \sqrt{2} . \tag{6.47}$$
Defining $E_k \equiv \sum_x \| \psi_k^x - x \|^2$ we see that $E_k \leq (2 - \sqrt{2})N$. We are now in position to
prove that $D_k$ is $\Omega(N)$. Defining $F_k \equiv \sum_x \| x - \psi_k \|^2$ we have
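$$D_k = \sum_x \| (\psi_k^x - x) + (x - \psi_k) \|^2 \tag{6.48}$$
$$\geq \sum_x \| \psi_k^x - x \|^2 - 2 \sum_x \| \psi_k^x - x \|\, \| x - \psi_k \| + \sum_x \| x - \psi_k \|^2 \tag{6.49}$$
$$= E_k + F_k - 2 \sum_x \| \psi_k^x - x \|\, \| x - \psi_k \| . \tag{6.50}$$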
Applying the Cauchy–Schwarz inequality gives $\sum_x \| \psi_k^x - x \|\, \| x - \psi_k \| \leq \sqrt{E_k F_k}$, so
we have
$$D_k \geq E_k + F_k - 2\sqrt{E_k F_k} = \left( \sqrt{F_k} - \sqrt{E_k} \right)^2 . \tag{6.51}$$
In Exercise 6.15 you will show that $F_k \geq 2N - 2\sqrt{N}$. Combining this with the result
$E_k \leq (2 - \sqrt{2})N$ gives $D_k \geq cN$ for sufficiently large $N$, where $c$ is any constant less
than $(\sqrt{2} - \sqrt{2 - \sqrt{2}})^2 \approx 0.42$. Since $D_k \leq 4k^2$ this implies that
$$k \geq \sqrt{cN/4} . \tag{6.52}$$
Summarizing, to achieve a probability of success at least one-half for finding a solution
to the search problem we must call the oracle $\Omega(\sqrt{N})$ times.
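The constant in (6.52) can be evaluated directly. The short sketch below computes the limiting value of $c$ and the resulting lower bound $\sqrt{cN/4}$, and prints it next to $(\pi/4)\sqrt{N}$, the iteration count used for a single solution, purely for comparison. The names and the sample value $c = 0.42$ are assumptions for illustration; the bound holds only for sufficiently large $N$.

```python
import math

# Limiting constant from the text: c must be strictly less than this value.
c_limit = (math.sqrt(2) - math.sqrt(2 - math.sqrt(2))) ** 2

def query_lower_bound(N, c=0.42):
    """Lower bound (6.52) on the number of oracle calls, sqrt(c N / 4)."""
    return math.sqrt(c * N / 4)

print(round(c_limit, 4))
for n in (10, 20, 30):
    N = 2 ** n
    print(n, round(query_lower_bound(N), 1), round(math.pi / 4 * math.sqrt(N), 1))
```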
Exercise 6.15: Use the Cauchy–Schwarz inequality to show that for any normalized
state vector $|\psi\rangle$ and set of $N$ orthonormal basis vectors $|x\rangle$,
$$\sum_x \| \psi - x \|^2 \geq 2N - 2\sqrt{N} . \tag{6.53}$$
Exercise 6.16: Suppose we merely require that the probability of an error being made
is less than 1/2 when averaged uniformly over the possible values for $x$, instead
of for all values of $x$. Show that $O(\sqrt{N})$ oracle calls are still required to solve the
search problem.
This result, that the quantum search algorithm is essentially optimal, is both exciting
and disappointing. It is exciting because it tells us that for this problem, at least, we have
fully plumbed the depths of quantum mechanics; no further improvement is possible.
The disappointment arises because we might have hoped to do much better than the
square root speedup offered by the quantum search algorithm. The sort of dream result
we might have hoped for a priori is that it would be possible to search an $N$ item search
space using $O(\log N)$ oracle calls. If such an algorithm existed, it would allow us to
solve NP-complete problems efficiently on a quantum computer, since it could search
all $2^{w(n)}$ possible witnesses using roughly $w(n)$ oracle calls, where the polynomial $w(n)$
is the length of a witness in bits. Unfortunately, such an algorithm is not possible. This
is useful information for would-be algorithm designers, since it indicates that a naive
search-based method for attacking NP-complete problems is guaranteed to fail.
Venturing into the realm of opinion, we note that many researchers believe that the
essential reason for the difficulty of NP-complete problems is that their search space has
essentially no structure, and that (up to polynomial factors) the best possible method for
solving such a problem is to adopt a search method. If one takes this point of view, then
it is bad news for quantum computing, indicating that the class of problems efficiently
soluble on a quantum computer, BQP, does not contain the NP-complete problems. Of
course, this is merely opinion, and it is still possible that the NP-complete problems
contain some unknown structure that allows them to be efficiently solved on a quantum
computer, or perhaps even on a classical computer. A nice example to illustrate this
point is the problem of factoring, widely believed to be in the class NPI of problems
intermediate in difficulty between P and the NP-complete problems. The key to the
efficient quantum mechanical solution of the factoring problem was the exploitation of a
structure ‘hidden’ within the problem – a structure revealed by the reduction to
order-finding. Even with this amazing structure revealed, it has not been found possible to
exploit the structure to develop an efficient classical algorithm for factoring, although, of
course, quantum mechanically the structure can be harnessed to give an efficient factoring
algorithm! Perhaps a similar structure lurks in other problems suspected to be in NPI,
such as the graph isomorphism problem, or perhaps even in the NP-complete problems
themselves.
Exercise 6.17: (Optimality for multiple solutions) Suppose the search problem
has $M$ solutions. Show that $O(\sqrt{N/M})$ oracle applications are required to find a
solution.
6.7 Black box algorithm limits
We conclude this chapter with a generalization of the quantum search algorithm which
provides insightful bounds on the power of quantum computation. At the beginning of
the chapter, we described the search problem as finding an n-bit integer x such that
the function $f : \{0, 1\}^n \to \{0, 1\}$ evaluates to $f(x) = 1$. Related to this is the decision
problem of whether or not there exists $x$ such that $f(x) = 1$. Solving this decision
problem is equivalently difficult, and can be expressed as computing the Boolean function
$F(X) = X_0 \vee X_1 \vee \cdots \vee X_{N-1}$, where $\vee$ denotes the binary OR operation, $X_k \equiv f(k)$,
and $X$ denotes the set $\{X_0, X_1, \ldots, X_{N-1}\}$. More generally, we may wish to compute
some function other than OR. For example, F (X) could be the AND, PARITY (sum
modulo two), or MAJORITY (F (X) = 1 if and only if more Xk = 1 than not) functions.
In general, we can consider F to be any Boolean function. How fast (measured in number
of queries) can a computer, classical or quantum, compute these functions, given an oracle
for f ?
It might seem difficult to answer such questions without knowing something about the
function $f$, but in fact a great deal can be determined even in this ‘black box’ model, where
the means by which the oracle accomplishes its task is taken for granted, and complexity
is measured only in terms of the number of required oracle queries. The analysis of
the search algorithm in the previous sections demonstrated one way to approach such
problems, but a more powerful approach for obtaining query complexities is the method
of polynomials, which we now briefly describe.
Let us begin with some useful definitions. The deterministic query complexity $D(F)$
is the minimum number of oracle queries a classical computer must perform to compute
$F$ with certainty. The quantum equivalent, $Q_E(F)$, is the minimum number of oracle
queries a quantum computer requires to compute $F$ with certainty. Since a quantum
computer produces probabilistic outputs by nature, a more interesting quantity is the
bounded error complexity $Q_2(F)$, the minimum number of oracle queries a quantum
computer requires to produce an output which equals $F$ with probability at least 2/3.
(The 2/3 is an arbitrary number – the probability need only be bounded finitely away
from 1/2 in order to be boosted close to 1 by repetitions.) A related measure is the
zero-error complexity $Q_0(F)$, the minimum number of oracle queries a quantum computer
requires to produce an output which either equals $F$ with certainty, or, with probability
less than 1/2, an admission of an inconclusive result. All these bounds must hold for any
oracle function $f$ (or in other words, any input $X$ into $F$). Note that
$Q_2(F) \leq Q_0(F) \leq Q_E(F) \leq D(F) \leq N$.
The method of polynomials is based upon the properties of minimum-degree multilinear
polynomials (over the real numbers) which represent Boolean functions. All the
polynomials we shall consider below are functions of $X_k \in \{0, 1\}$ and are thus multilinear,
since $X_k^2 = X_k$. We say that a polynomial $p : \mathbb{R}^N \to \mathbb{R}$ represents $F$ if
$p(X) = F(X)$ for all $X \in \{0, 1\}^N$ (where $\mathbb{R}$ denotes the real numbers). Such a polynomial $p$
always exists, since we can explicitly construct a suitable candidate:
$$p(X) = \sum_{Y \in \{0,1\}^N} F(Y) \prod_{k=0}^{N-1} \left[ 1 - (Y_k - X_k)^2 \right] . \tag{6.54}$$
That the minimum degree $p$ is unique is left as Exercise 6.18 for the reader. The minimum
degree of such a representation for $F$, denoted as $\deg(F)$, is a useful measure of the
complexity of $F$. For example, it is known that $\deg(\mathrm{OR})$, $\deg(\mathrm{AND})$, and $\deg(\mathrm{PARITY})$
are all equal to $N$. In fact, it is known that the degree of most functions is of order $N$.
Moreover, it has also been proven that
$$D(F) \leq 2 \deg(F)^4 . \tag{6.55}$$
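For small $N$ both the construction (6.54) and the degree of the representing polynomial can be computed by brute force. The sketch below evaluates (6.54) directly and, separately, finds the multilinear coefficients by inclusion–exclusion over subsets (Möbius inversion), a standard technique that is not described in the text. All function names are assumptions made for this example.

```python
from itertools import product

def poly_654(F, X):
    """Evaluate the candidate polynomial (6.54) at the point X."""
    total = 0
    for Y in product((0, 1), repeat=len(X)):
        term = F(Y)
        for yk, xk in zip(Y, X):
            term *= 1 - (yk - xk) ** 2
        total += term
    return total

def multilinear_coefficients(F, N):
    """Nonzero coefficients of the multilinear polynomial agreeing with F on {0,1}^N.
    A monomial is labeled by the indicator vector S of the variables it contains."""
    coeffs = {}
    for S in product((0, 1), repeat=N):
        c = 0
        for T in product((0, 1), repeat=N):
            if all(t <= s for t, s in zip(T, S)):          # T is a subset of S
                c += (-1) ** (sum(S) - sum(T)) * F(T)
        if c != 0:
            coeffs[S] = c
    return coeffs

def degree(F, N):
    """Degree of the representing multilinear polynomial."""
    return max((sum(S) for S in multilinear_coefficients(F, N)), default=0)

OR = lambda X: int(any(X))
AND = lambda X: int(all(X))
PARITY = lambda X: sum(X) % 2

for name, F in (("OR", OR), ("AND", AND), ("PARITY", PARITY)):
    print(name, degree(F, 3))
```

The cost grows as $4^N$, so this is useful only as a check on very small examples.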
This result places an upper bound on the performance of deterministic classical
computation in calculating most Boolean functions. Extending this concept, if a polynomial
satisfies $|p(X) - F(X)| \leq 1/3$ for all $X \in \{0, 1\}^N$, we say $p$ approximates $F$, and $\widetilde{\deg}(F)$
denotes the minimum degree of such an approximating polynomial. Such measures are
important in randomized classical computation and, as we shall see, in describing the
quantum case. It is known that $\widetilde{\deg}(\mathrm{PARITY}) = N$,
$$\widetilde{\deg}(\mathrm{OR}) \in \Theta(\sqrt{N}) \quad \text{and} \quad \widetilde{\deg}(\mathrm{AND}) \in \Theta(\sqrt{N}) , \tag{6.56}$$
and
$$D(F) \leq 216\, \widetilde{\deg}(F)^6 . \tag{6.57}$$
The bounds of Equations (6.55) and (6.57) are only the best known at the time of writing;
their proof is outside the scope of this book, but you may find further information about
them in ‘History and further reading’. It is believed that tighter bounds are possible, but
these will be good enough for our purposes.
Exercise 6.18: Prove that the minimum degree polynomial representing a Boolean
function F (X) is unique.
Exercise 6.19: Show that $P(X) = 1 - (1 - X_0)(1 - X_1) \cdots (1 - X_{N-1})$ represents OR.
Polynomials naturally arise in describing the results of quantum algorithms. Let us
write the output of a quantum algorithm $Q$ which performs $T$ queries to an oracle $O$ as
$$\sum_{k=0}^{2^n - 1} c_k |k\rangle . \tag{6.58}$$
We will show that the amplitudes $c_k$ are polynomials of degree at most $T$ in the
variables $X_0, X_1, \ldots, X_{N-1}$. Any $Q$ can be realized using the quantum circuit shown in
Figure 6.10. The state $|\psi_0\rangle$ right before the first oracle query can be written as
$$|\psi_0\rangle = \sum_{ij} \left( a_{i0j} |i\rangle|0\rangle + a_{i1j} |i\rangle|1\rangle \right) |j\rangle , \tag{6.59}$$
where the first label corresponds to the $n$ qubit oracle query, the next to a single qubit
in which the oracle leaves its result, and the last to the $m - n - 1$ working qubits used
by $Q$. After the oracle query, we obtain the state
$$|\psi_1\rangle = \sum_{ij} \left( a_{i0j} |i\rangle|X_i\rangle + a_{i1j} |i\rangle|X_i \oplus 1\rangle \right) |j\rangle , \tag{6.60}$$
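but since $X_i$ is either 0 or 1, we can re-express this as
$$|\psi_1\rangle = \sum_{ij} \left[ \left( (1 - X_i) a_{i0j} + X_i a_{i1j} \right) |i0\rangle + \left( (1 - X_i) a_{i1j} + X_i a_{i0j} \right) |i1\rangle \right] |j\rangle . \tag{6.61}$$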
Note that in $|\psi_0\rangle$, the amplitudes of the computational basis states were of degree 0 in $X$,
while those of $|\psi_1\rangle$ are of degree 1 (linear in $X$). The important observation is that any
unitary operation which $Q$ performs before or after the oracle query cannot change the
degree of these polynomials, but each oracle call can increase the degree by at most one.
Thus, after $T$ queries, the amplitudes are polynomials of at most degree $T$. Moreover,
measuring the final output (6.58) in the computational basis produces a result $k$ with
probability $P_k(X) = |c_k|^2$, which are real-valued polynomials in $X$ of degree at most $2T$.
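The rewriting of the oracle's action as a degree-one polynomial in $X_i$, Equation (6.61), can be confirmed on a small example. The sketch below omits the working register $|j\rangle$ and uses integer, unnormalized amplitudes so that the comparison is exact; these simplifications and the function names are assumptions made for illustration.

```python
from itertools import product

def oracle_on_amplitudes(a, X):
    """Apply |i>|b> -> |i>|b XOR X_i> to amplitudes a[i][b], as in (6.60)."""
    out = [[0, 0] for _ in a]
    for i, (a0, a1) in enumerate(a):
        out[i][0 ^ X[i]] += a0
        out[i][1 ^ X[i]] += a1
    return out

def oracle_as_polynomial(a, X):
    """The same amplitudes written as degree-1 polynomials in X_i, as in (6.61)."""
    return [[(1 - X[i]) * a0 + X[i] * a1, (1 - X[i]) * a1 + X[i] * a0]
            for i, (a0, a1) in enumerate(a)]

a = [[1, 2], [3, -4]]                     # unnormalized amplitudes a[i][b]
for X in product((0, 1), repeat=len(a)):
    print(X, oracle_on_amplitudes(a, X), oracle_as_polynomial(a, X))
```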
Figure 6.10. General quantum circuit for a quantum algorithm which performs T queries to an oracle O.
U0 , U1 , . . . , UT are arbitrary unitary transforms on m qubits, and the oracle acts on n + 1 qubits.
The total probability $P(X)$ of obtaining a one as the output from the algorithm is a
sum over some subset of the polynomials $P_k(X)$, and thus also has degree at most $2T$. In
the case that $Q$ produces the correct answer with certainty we must have $P(X) = F(X)$,
and thus $\deg(F) \leq 2T$, from which we deduce
$$Q_E(F) \geq \frac{\deg(F)}{2} . \tag{6.62}$$
In the case where $Q$ produces an answer with bounded probability of error it follows that
$P(X)$ approximates $F(X)$, and thus $\widetilde{\deg}(F) \leq 2T$, from which we deduce
$$Q_2(F) \geq \frac{\widetilde{\deg}(F)}{2} . \tag{6.63}$$
Combining (6.55) and (6.62), we find that
$$Q_E(F) \geq \left[ \frac{D(F)}{32} \right]^{1/4} . \tag{6.64}$$
Similarly, combining (6.57) and (6.63), we find that
$$Q_2(F) \geq \left[ \frac{D(F)}{13\,824} \right]^{1/6} . \tag{6.65}$$
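The constants 32 and 13 824 follow from a one-line substitution in each case. As a worked check, writing (6.62) and (6.63) as $\deg(F) \leq 2Q_E(F)$ and $\widetilde{\deg}(F) \leq 2Q_2(F)$:

```latex
\begin{align*}
D(F) &\leq 2\deg(F)^4 \leq 2\,\bigl(2Q_E(F)\bigr)^4 = 32\,Q_E(F)^4
  &&\Longrightarrow\quad Q_E(F) \geq \left[\frac{D(F)}{32}\right]^{1/4}, \\
D(F) &\leq 216\,\widetilde{\deg}(F)^6 \leq 216\,\bigl(2Q_2(F)\bigr)^6 = 13\,824\,Q_2(F)^6
  &&\Longrightarrow\quad Q_2(F) \geq \left[\frac{D(F)}{13\,824}\right]^{1/6}.
\end{align*}
```

Here $2 \times 2^4 = 32$ and $216 \times 2^6 = 13\,824$.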
This means that in computing Boolean functions using a black box, quantum algorithms
may only provide a polynomial speedup over classical algorithms, at best – and even that
is not generally possible (since deg(F ) is Ω(N ) for most functions). On the other hand,
it is known that for F = OR, D(F ) = N , and the randomized classical query complexity
$R(F) \in \Theta(N)$, whereas combining (6.63) and (6.56), and the known performance of the
quantum search algorithm, shows that $Q_2(F) \in \Theta(\sqrt{N})$. This square root speedup is just
what the quantum search algorithm achieves, and the method of polynomials indicates
that the result can perhaps be generalized to a somewhat wider class of problems, but
without extra information about the structure of the black box oracle function f , no
exponential speedup over classical algorithms is possible.
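To get a feeling for how weak the general bounds (6.64) and (6.65) are compared with the square root behavior of OR, the following minimal sketch evaluates them for D(F) = N. The function names are our own, chosen only for illustration.

```python
import math

def exact_query_lower_bound(D):
    """Right-hand side of (6.64): lower bound on Q_E(F) in terms of D(F)."""
    return (D / 32) ** 0.25

def bounded_error_query_lower_bound(D):
    """Right-hand side of (6.65): lower bound on Q_2(F) in terms of D(F)."""
    return (D / 13824) ** (1 / 6)

# For F = OR we have D(F) = N; compare the general bounds with sqrt(N).
for N in (2**10, 2**20, 2**30):
    print(N,
          round(exact_query_lower_bound(N), 2),
          round(bounded_error_query_lower_bound(N), 2),
          round(math.sqrt(N), 2))
```

The general bounds grow only as the fourth and sixth roots of D(F), so they are consistent with, but much weaker than, the Θ(√N) result for OR.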
Exercise 6.20: Show that Q0 (OR) ≥ N by constructing a polynomial which
represents the OR function from the output of a quantum circuit which
computes OR with zero error.
Problem 6.1: (Finding the minimum) Suppose x_1, ..., x_N is a database of
numbers held in memory, as in Section 6.5. Show that only O(log(N)√N)
accesses to the memory are required on a quantum computer, in order to find
the smallest element on the list, with probability at least one-half.
Problem 6.2: (Generalized quantum searching) Let |ψ⟩ be a quantum state, and
define U_{|ψ⟩} ≡ I − 2|ψ⟩⟨ψ|. That is, U_{|ψ⟩} gives the state |ψ⟩ a −1 phase, and
leaves states orthogonal to |ψ⟩ invariant.
(1) Suppose we have a quantum circuit implementing a unitary operator U such
that U|0⟩^{⊗n} = |ψ⟩. Explain how to implement U_{|ψ⟩}.
(2) Let |ψ_1⟩ = |1⟩, |ψ_2⟩ = (|0⟩ − |1⟩)/√2, |ψ_3⟩ = (|0⟩ − i|1⟩)/√2. Suppose an
unknown oracle O is selected from the set U_{|ψ_1⟩}, U_{|ψ_2⟩}, U_{|ψ_3⟩}. Give a
quantum algorithm which identifies the oracle with just one application of
the oracle. (Hint: consider superdense coding.)
(3) Research: More generally, given k states |ψ_1⟩, ..., |ψ_k⟩, and an unknown
oracle O selected from the set U_{|ψ_1⟩}, ..., U_{|ψ_k⟩}, how many oracle
applications are required to identify the oracle, with high probability?
Problem 6.3: (Database retrieval) Given a quantum oracle which returns
|k, y ⊕ X_k⟩ given an n qubit query (and one scratchpad qubit) |k, y⟩, show that
with high probability, all N = 2^n bits of X can be obtained using only
N/2 + √N queries. This implies the general upper bound Q_2(F) ≤ N/2 + √N
for any F.
Problem 6.4: (Quantum searching and cryptography) Quantum searching can,
potentially, be used to speed up the search for cryptographic keys. The idea is to
search through the space of all possible keys for decryption, in each case trying
the key, and checking to see whether the decrypted message makes ‘sense’.
Explain why this idea doesn’t work for the Vernam cipher (Section 12.6). When
might it work for cryptosystems such as DES? (For a description of DES see, for
example, [MvOV96] or [Sch96a].)
Summary of Chapter 6: Quantum search algorithms
• Quantum search algorithm: For a search problem with M solutions out of
N = 2^n possibilities, prepare Σ_x |x⟩ and then repeat G ≡ H^{⊗n} U H^{⊗n} O a total
of O(√(N/M)) times, where O is the search oracle, |x⟩ → −|x⟩ if x is a solution,
no change otherwise, and U takes |0⟩ → −|0⟩ and leaves all other computational
basis states alone. Measuring yields a solution to the search problem with high
probability.
• Quantum counting algorithm: Suppose a search problem has an unknown
number M of solutions. G has eigenvalues exp(±iθ) where sin^2(θ/2) = M/N.
The Fourier transform based phase estimation procedure enables us to estimate
M to high accuracy using O(√N) oracle applications. Quantum counting, in turn,
allows us to determine whether a given search problem has any solutions, and to
find one if there are, even if the number of solutions is not known in advance.
• Polynomial bounds: For problems which are described as evaluations of total
functions F (as opposed to partial functions, or ‘promise’ problems), quantum
algorithms can give no more than a polynomial speedup over classical algorithms.
Specifically, Q_2(F) ≥ [D(F)/13 824]^{1/6}. Moreover, the performance of the quantum
search is optimal: it is Θ(√N).
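The first item of the summary can be sketched as a small classical simulation of the amplitudes. This is only an illustration under our own assumptions: the function name is hypothetical, the initial state is normalized, and the number of iterations is taken to be ⌊(π/4)√(N/M)⌋, one common choice consistent with the O(√(N/M)) scaling quoted above.

```python
import math

def grover_success_probability(n_qubits, solutions, iterations=None):
    """Simulate G = H^{(x)n} U H^{(x)n} O on real amplitudes and return
    the probability of measuring one of the solutions."""
    N = 2 ** n_qubits
    M = len(solutions)
    if iterations is None:
        # Assumed iteration count; any count near this value works well.
        iterations = math.floor(math.pi / 4 * math.sqrt(N / M))
    amp = [1 / math.sqrt(N)] * N          # uniform superposition over all x
    for _ in range(iterations):
        for x in solutions:               # oracle O: |x> -> -|x> on solutions
            amp[x] = -amp[x]
        # H U H with U = I - 2|0><0| equals I - 2|psi><psi|,
        # which maps each amplitude a_x to a_x - 2 * mean.
        mean = sum(amp) / N
        amp = [a - 2 * mean for a in amp]
    return sum(amp[x] ** 2 for x in solutions)

print(grover_success_probability(6, [42]))            # one solution among 64
print(grover_success_probability(6, [3, 17, 40, 63])) # four solutions among 64
```

The sign convention for U follows the summary and differs from the familiar 'inversion about the mean' only by an unobservable global phase.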
History and further reading
The quantum search algorithm and much of its further development and elaboration is
due to Grover[Gro96, Gro97]. Boyer, Brassard, Høyer and Tapp[BBHT98] wrote an influential
paper in which they developed the quantum search algorithm for cases where the number
of solutions M is greater than one, and outlined the quantum counting algorithm, later
developed in more detail by Brassard, Høyer, and Tapp[BHT98], and from the point of
view of phase estimation by Mosca[Mos98]. That the Grover iteration can be understood
as a product of two reflections was first pointed out in a review by Aharonov[Aha99b]. The
continuous-time Hamiltonian (6.18) was first investigated by Farhi and Gutmann[FG98],
from a rather different point of view than we take in Section 6.2. That Grover’s algorithm
is the best possible oracle-based search algorithm was proved by Bennett, Bernstein,
Brassard and Vazirani[BBBV97]. The version of this proof we have presented is based upon
that given by Boyer, Brassard, Høyer and Tapp[BBHT98]. Zalka[Zal99] has refined these
proofs to show that the quantum search algorithm is, asymptotically, exactly optimal.
The method of polynomials for bounding the power of quantum algorithms was introduced
into quantum computing by Beals, Buhrman, Cleve, Mosca, and de Wolf[BBC+98].
An excellent discussion is also available in Mosca’s Ph.D. thesis[Mos99], on which much
of the discussion in Section 6.7 is based. A number of results are quoted in that section without proof; here are the citations: Equation (6.55) is attributed to Nisan and
Smolensky in [BBC+98], but otherwise is presently unpublished, (6.56) is derived from
a theorem by Paturi[Pat92] and (6.57) is derived in [BBC+98]. A better bound than (6.65)
is given in [BBC+98], but requires concepts such as block sensitivity which are outside
the scope of this book. A completely different approach for bounding quantum black box
algorithms, using arguments based on entanglement, was presented by Ambainis[Amb00].
Problem 6.1 is due to Dürr and Høyer[DH96]. Problem 6.3 is due to van Dam[van98a].
7 Quantum computers: physical realization
Computers in the future may weigh no more than 1.5 tons.
– Popular Mechanics, forecasting the relentless march of science, 1949
I think there is a world market for maybe five computers.
– Thomas Watson, chairman of IBM, 1943
Quantum computation and quantum information is a field of fundamental interest because we believe quantum information processing machines can actually be realized in
Nature. Otherwise, the field would be just a mathematical curiosity! Nevertheless, experimental realization of quantum circuits, algorithms, and communication systems has
proven extremely challenging. In this chapter we explore some of the guiding principles and model systems for physical implementation of quantum information processing
devices and systems.
We begin in Section 7.1 with an overview of the tradeoffs in selecting a physical realization of a quantum computer. This discussion provides perspective for an elaboration of
a set of conditions sufficient for the experimental realization of quantum computation in
Section 7.2. These conditions are illustrated in Sections 7.3 through 7.7, through a series
of case studies, which consider five different model physical systems: the simple harmonic
oscillator, photons and nonlinear optical media, cavity quantum electrodynamics devices,
ion traps, and nuclear magnetic resonance with molecules. For each system, we briefly
describe the physical apparatus, the Hamiltonian which governs its dynamics, means for
controlling the system to perform quantum computation, and its principal drawbacks. We
do not go into much depth in describing the physics of these systems; as each of these is
an entire field unto itself, that would be outside the scope of this book! Instead, we
summarize just the concepts relevant to quantum computation and quantum information
such that both the experimental challenge and theoretical potential can be appreciated.
On the other hand, analyzing these systems from the standpoint of quantum information
also provides a fresh perspective which we hope you will find insightful and useful, as
it also allows strikingly simple derivations of some important physics. We conclude the
chapter in Section 7.8 by discussing aspects of some other physical systems – quantum
dots, superconducting gates, and spins in semiconductors – which are also of interest
to this field. For the benefit of the reader wishing to catch just the highlights of each
implementation, a summary is provided at the end of each section.
7.1 Guiding principles
What are the experimental requirements for building a quantum computer? The elementary units of the theory are quantum bits – two-level quantum systems; in Section 1.5
we took a brief look at why it is believed that qubits exist in Nature, and what physical
forms they may take on. To realize a quantum computer, we must not only give qubits
some robust physical representation (in which they retain their quantum properties), but
also select a system in which they can be made to evolve as desired. Furthermore, we
must be able to prepare qubits in some specified set of initial states, and to measure the
final output state of the system.
The challenge of experimental realization is that these basic requirements can often
only be partially met. A coin has two states, and makes a good bit, but a poor qubit
because it cannot remain in a superposition state (of ‘heads’ and ‘tails’) for very long.
A single nuclear spin can be a very good qubit, because superpositions of being aligned
with or against an external magnetic field can last a long time – even for days. But it
can be difficult to build a quantum computer from nuclear spins because their coupling
to the world is so small that it is hard to measure the orientation of single nuclei. The
observation that the constraints are opposing is general: a quantum computer has to be
well isolated in order to retain its quantum properties, but at the same time its qubits
have to be accessible so that they can be manipulated to perform a computation and to
read out the results. A realistic implementation must strike a delicate balance between
these constraints, so that the relevant question is not how to build a quantum computer,
but rather, how good a quantum computer can be built.
System              τ_Q (s)           τ_op (s)           n_op = λ^{-1}
Nuclear spin        10^-2 – 10^8      10^-3 – 10^-6      10^5 – 10^14
Electron spin       10^-3             10^-7              10^4
Ion trap (In+)      10^-1             10^-14             10^13
Electron – Au       10^-8             10^-14             10^6
Electron – GaAs     10^-10            10^-13             10^3
Quantum dot         10^-6             10^-9              10^3
Optical cavity      10^-5             10^-14             10^9
Microwave cavity    10^0              10^-4              10^4
Figure 7.1. Crude estimates for decoherence times τ_Q (seconds), operation times τ_op (seconds), and maximum
number of operations n_op = λ^{-1} = τ_Q/τ_op for various candidate physical realizations of interacting systems of
quantum bits. Despite the number of entries in this table, only three fundamentally different qubit representations
are given: spin, charge, and photon. The ion trap utilizes either fine or hyperfine transitions of a trapped atom
(Section 7.6), which correspond to electron and nuclear spin flips. The estimates for electrons in gold and GaAs,
and in quantum dots are given for a charge representation, with an electrode or some confined area either
containing an electron or not. In optical and microwave cavities, photons (of frequencies from gigahertz to
hundreds of terahertz) populating different modes of the cavities represent the qubit. Take these estimates with a
grain of salt: they are only meant to give some perspective on the wide range of possibilities.
What physical systems are potentially good candidates for handling quantum information? A key concept in understanding the merit of a particular quantum computer
realization is the notion of quantum noise (sometimes called decoherence), the subject of
Chapter 8: processes corrupting the desired evolution of the system. This is because the
length of the longest possible quantum computation is roughly given by the ratio of τQ ,
the time for which a system remains quantum-mechanically coherent, to τop , the time it
takes to perform elementary unitary transformations (which involve at least two qubits).
These two times are actually related to each other in many systems, since they are both
determined by the strength of coupling of the system to the external world. Nevertheless,
λ = τop /τQ can vary over a surprisingly wide range, as shown in Figure 7.1.
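The ratio n_op = τ_Q/τ_op is easy to recompute from the single-valued rows of Figure 7.1. The sketch below does so; the dictionary and function names are our own, and the nuclear spin row is omitted because it is quoted as a range.

```python
import math

# (tau_Q, tau_op) in seconds, copied from the single-valued rows of Figure 7.1.
FIGURE_7_1 = {
    "Electron spin": (1e-3, 1e-7),
    "Ion trap (In+)": (1e-1, 1e-14),
    "Electron - Au": (1e-8, 1e-14),
    "Electron - GaAs": (1e-10, 1e-13),
    "Quantum dot": (1e-6, 1e-9),
    "Optical cavity": (1e-5, 1e-14),
    "Microwave cavity": (1e0, 1e-4),
}

def max_operations(tau_q, tau_op):
    """Rough length of the longest computation, n_op = tau_Q / tau_op."""
    return tau_q / tau_op

n_op = {name: max_operations(*times) for name, times in FIGURE_7_1.items()}
for name, value in n_op.items():
    print(f"{name:18s} n_op ~ 10^{round(math.log10(value))}")
```

The spread of ten orders of magnitude between, say, the ion trap and the quantum dot entries is the 'surprisingly wide range' referred to above.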
These estimates give some insight into the merits of different possible physical realizations of a quantum information processing machine, but many other important sources
of noise and imperfections arise in actual implementations. For example, manipulations
of a qubit represented by two electronic levels of an atom by using light to cause transitions between levels would also cause transitions to other electronic levels with some
probability. These would also be considered noise processes, since they take the system
out of the two states which define the qubit. Generally speaking, anything which causes
loss of (quantum) information is a noise process – later, in Chapter 8, we discuss the
theory of quantum noise in more depth.
7.2 Conditions for quantum computation
Let us return to discuss in detail the four basic requirements for quantum computation
which were mentioned at the beginning of the previous section. These requirements are
the abilities to:
1. Robustly represent quantum information
2. Perform a universal family of unitary transformations
3. Prepare a fiducial initial state
4. Measure the output result
7.2.1 Representation of quantum information
Quantum computation is based on transformation of quantum states. Quantum bits are
two-level quantum systems, and as the simplest elementary building blocks for a quantum computer, they provide a convenient labeling for pairs of states and their physical
realizations. Thus, for example, the four states of a spin-3/2 particle, |m = +3/2⟩, |m = +1/2⟩,
|m = −1/2⟩, |m = −3/2⟩, could be used to represent two qubits.
For the purpose of computation, the crucial realization is that the set of accessible states
should be finite. The position x of a particle along a one-dimensional line is not generally
a good set of states for computation, even though the particle may be in a quantum state
|x⟩, or even some superposition Σ_x c_x |x⟩. This is because x has a continuous range
of possibilities, and the Hilbert space has infinite size, so that in the absence of noise
the information capacity is infinite. For example, in a perfect world, the entire texts
of Shakespeare could be stored in (and retrieved from) the infinite number of digits in
the binary fraction x = 0.010111011001 . . .. This is clearly unrealistic; what happens in
reality is that the presence of noise reduces the number of distinguishable states to a finite
number.
In fact, it is generally desirable to have some aspect of symmetry dictate the finiteness of
the state space, in order to minimize decoherence. For example, a spin-1/2 particle lives
in a Hilbert space spanned by the |↑⟩ and |↓⟩ states; the spin state cannot be anything
outside this two-dimensional space, and thus is a nearly ideal quantum bit when well
isolated.
If the choice of representation is poor, then decoherence will result. For example,
as described in Box 7.1, a particle in a finite square well which is just deep enough to
contain two bound states would make a mediocre quantum bit, because transitions from
the bound states to the continuum of unbound states would be possible. These would lead
to decoherence since they could destroy qubit superposition states. For single qubits, the
figure of merit is the minimum lifetime of arbitrary superposition states; a good measure,
used for spin states and atomic systems, is T_2, the (‘transverse’) relaxation time of states
such as (|0⟩ + |1⟩)/√2. Note that T_1, the (‘longitudinal’) relaxation time of the higher
energy |1⟩ state, is just a classical state lifetime, which is usually longer than T_2.
Box 7.1: Square wells and qubits
A prototypical quantum system is known as the ‘square well,’ which is a particle in
a one-dimensional box, behaving according to Schrödinger’s equation, (2.86). The
Hamiltonian for this system is H = p^2/2m + V(x), where V(x) = 0 for 0 < x < L,
and V(x) = ∞ otherwise. The energy eigenstates, expressed as wavefunctions in
the position basis, are
\[ |\psi_n\rangle = \sqrt{\frac{2}{L}} \sin\left( \frac{n\pi}{L} x \right) , \tag{7.1} \]
where n is an integer, and |ψ_n(t)⟩ = e^{−iE_n t}|ψ_n⟩, with E_n = n^2π^2/(2mL^2) (in units where ħ = 1). These
states have a discrete spectrum. In particular, suppose that we arrange matters such
that only the two lowest energy levels need be considered in an experiment. We
define an arbitrary wavefunction of interest as |ψ⟩ = a|ψ_1⟩ + b|ψ_2⟩. Since
\[ |\psi(t)\rangle = e^{-i(E_1+E_2)t/2} \left[ a e^{-i\omega t} |\psi_1\rangle + b e^{i\omega t} |\psi_2\rangle \right] , \tag{7.2} \]
where ω = (E_1 − E_2)/2, we can just forget about everything except a and b, and
write our state abstractly as the two-component vector $|\psi\rangle = \begin{bmatrix} a \\ b \end{bmatrix}$. This two-level
system represents a qubit! Does our two-level system transform like a qubit? Under
time evolution, this qubit evolves under the effective Hamiltonian H = ωZ, which
can be disregarded by moving into the rotating frame. To perform operations to
this qubit, we perturb H. Consider the effect of adding the additional term
\[ \delta V(x) = -V_0(t) \frac{9\pi^2}{16L} \left( \frac{x}{L} - \frac{1}{2} \right) \tag{7.3} \]
to V(x). In the basis of our two-level system, this can be rewritten by taking the
matrix elements V_{nm} = ⟨ψ_n|δV(x)|ψ_m⟩, giving V_11 = V_22 = 0, and V_12 = V_21 = V_0,
such that, to lowest order in V0 , the perturbation to H is H1 = V0 (t)X. This
generates rotations about the x̂ axis. Similar techniques can be used to perform
other single qubit operations, by manipulating the potential function.
This shows how a single qubit can be represented by the two lowest levels in a square
well potential, and how simple perturbations of the potential can effect computational operations on the qubit. However, perturbations also introduce higher order
effects, and in real physical systems boxes are not infinitely deep, other levels begin
to enter the picture, and our two-level approximation begins to fail. Also, in reality,
the controlling system is just another quantum system, and it couples to the one we
are trying to do quantum computation with. These problems lead to decoherence.
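The matrix elements quoted in Box 7.1 can be checked by direct numerical integration. The sketch below is only an illustration under stated assumptions: it sets L = 1 and V_0 = 1 (our choice of units; evaluated this way for general L, the off-diagonal element comes out as V_0/L), uses a simple midpoint rule, and the function names are hypothetical.

```python
import math

def psi(n, x, L=1.0):
    """Square well eigenfunction (7.1) in the position basis."""
    return math.sqrt(2 / L) * math.sin(n * math.pi * x / L)

def delta_v(x, V0=1.0, L=1.0):
    """Perturbation (7.3), with V_0 held constant in time."""
    return -V0 * (9 * math.pi ** 2 / (16 * L)) * (x / L - 0.5)

def matrix_element(n, m, V0=1.0, L=1.0, steps=20000):
    """Midpoint-rule estimate of <psi_n| delta V |psi_m> over 0 < x < L."""
    dx = L / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += psi(n, x, L) * delta_v(x, V0, L) * psi(m, x, L)
    return total * dx

print(matrix_element(1, 1), matrix_element(2, 2))  # diagonal elements, close to 0
print(matrix_element(1, 2))                        # off-diagonal element, close to V0
```

The diagonal elements vanish because δV is odd about the center of the well while |ψ_n|^2 is even, which is why the perturbation acts as a pure X term on the two-level system.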
7.2.2 Performance of unitary transformations
Closed quantum systems evolve unitarily as determined by their Hamiltonians, but to
perform quantum computation one must be able to control the Hamiltonian to effect an arbitrary selection from a universal family of unitary transformations (as described in Section 4.5). For example, a single spin might evolve under the Hamiltonian
H = Px (t)X + Py (t)Y , where P{x,y} are classically controllable parameters. From Exercise 4.10, we know that by manipulating Px and Py appropriately, one can perform
arbitrary single spin rotations.
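As a minimal sketch of this statement, consider the special case in which P_x and P_y are held constant for a time t (with ħ = 1). Then the evolution operator has the closed form exp(−i(P_x X + P_y Y)t) = cos(|P|t) I − i sin(|P|t)(P_x X + P_y Y)/|P|, where |P| = (P_x^2 + P_y^2)^{1/2}. The function names below are our own.

```python
import math

def spin_evolution(px, py, t):
    """2x2 matrix of exp(-i (px X + py Y) t) for constant px, py (hbar = 1)."""
    norm = math.hypot(px, py)
    if norm == 0:
        return [[1, 0], [0, 1]]
    c, s = math.cos(norm * t), math.sin(norm * t)
    # px X + py Y has off-diagonal entries (px - i py) and (px + i py).
    upper = -1j * s * (px - 1j * py) / norm
    lower = -1j * s * (px + 1j * py) / norm
    return [[c, upper], [lower, c]]

def apply(U, state):
    """Apply a 2x2 matrix to a single qubit state [amp0, amp1]."""
    return [U[0][0] * state[0] + U[0][1] * state[1],
            U[1][0] * state[0] + U[1][1] * state[1]]

flipped = apply(spin_evolution(math.pi / 2, 0, 1), [1, 0])  # x pulse: |0> -> -i|1>
halfway = apply(spin_evolution(0, math.pi / 4, 1), [1, 0])  # y pulse: |0> -> (|0>+|1>)/sqrt(2)
print(flipped, halfway)
```

Choosing the direction (P_x, P_y) and the product |P|t selects the axis in the x–y plane and the rotation angle; sequences of such pulses then build up arbitrary rotations.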
According to the theorems of Section 4.5, any unitary transform can be composed
from single spin operations and controlled-NOT gates, and thus realization of those two
kinds of quantum logic gates is a natural goal for experimental quantum computation.
However, implicitly required also is the ability to address individual qubits, and to apply
these gates to select qubits or pairs of qubits. This is not simple to accomplish in many
physical systems. For example, in an ion trap, one can direct a laser at one of many
individual ions to selectively excite it, but only as long as the ions are spatially separated
by a wavelength or more.
Unrecorded imperfections in unitary transforms can lead to decoherence. In Chapter 8
we shall see how the average effect of random kicks (small rotations to a single spin about
its ẑ axis) leads to loss of quantum information which is represented by the relative phases
in a quantum state. Similarly, the cumulative effect of systematic errors is decoherence,
when the information needed to be able to reverse them is lost. Furthermore, the control
parameters in the Hamiltonian are only approximately classical controls: in reality, the
controlling system is just another quantum system, and the true Hamiltonian should
include the back-action of the control system upon the quantum computer. For example,
instead of P_x(t) in the above example, one actually has a Jaynes–Cummings type atom–
photon interaction Hamiltonian (Section 7.5.2), with P_x(t) = Σ_k ω_k(t)(a_k + a_k†) or
something similar being the cavity photon field. After interacting with a qubit, a photon
can carry away information about the state of the qubit, and this is thus a decoherence
process.
Two important figures of merit for unitary transforms are the minimum achievable
fidelity F (Chapter 9), and the maximum time t_op required to perform elementary
operations such as single spin rotations or a controlled-NOT gate.
7.2.3 Preparation of fiducial initial states
One of the most important requirements for being able to perform a useful computation,
even classically, is to be able to prepare the desired input. If one has a box which can
perform perfect computations, what use is it if numbers cannot be input? With classical
machines, establishing a definite input state is rarely a difficulty – one merely sets some
switches in the desired configuration and that defines the input state. However, with
quantum systems this can be very difficult, depending on the realization of qubits.
Note that it is only necessary to be able to (repeatedly) produce one specific quantum
state with high fidelity, since a unitary transform can turn it into any other desired input
state. For example, being able to put n spins into the |00...0⟩ state is good enough. The
fact that they may not stay there for very long due to thermal heating is a problem with
the choice of representation.
Input state preparation is a significant problem for most physical systems. For example,
ions can be prepared in good input states by physically cooling them into their ground state
(Section 7.6), but this is challenging. Moreover, for physical systems in which ensembles
of quantum computers are involved, extra concerns arise. In nuclear magnetic resonance
(Section 7.7), each molecule can be thought of as a single quantum computer, and a large
number of molecules is needed to obtain a measurable signal strength. Although qubits
can remain in arbitrary superposition states for relatively long times, it is difficult to put
all of the qubits in all of the molecules into the same state, because the energy difference
ħω between the |0⟩ and |1⟩ states is much smaller than k_B T. On the other hand, simply
letting the system equilibrate establishes it in a very well-known state, the thermal one,
with the density matrix ρ ≈ e^{−H/k_B T}/Z, where Z is a normalization factor required to
maintain tr(ρ) = 1.
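To see how small the energy splitting is compared with k_B T, the sketch below evaluates the thermal populations of a single two-level spin. The numbers are assumptions chosen for illustration only: a 500 MHz transition frequency and a temperature of 300 K, with |0⟩ taken as the lower level.

```python
import math

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J / K

def thermal_populations(omega, temperature):
    """Populations (p0, p1) of a two-level system whose |1> state lies
    hbar*omega above |0>, in the thermal state e^{-H/k_B T}/Z."""
    x = HBAR * omega / (K_B * temperature)
    z = 1 + math.exp(-x)              # normalization so that p0 + p1 = 1
    return 1 / z, math.exp(-x) / z

# Assumed example values: 500 MHz transition at room temperature.
p0, p1 = thermal_populations(2 * math.pi * 500e6, 300.0)
polarization = p0 - p1
print(p0, p1, polarization)
```

Under these assumptions the population difference is only a few parts in 10^5, which illustrates why placing every molecule in the same pure state is difficult.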
Two figures of merit are relevant to input state preparation: the minimum fidelity
with which the initial state can be prepared in a given state ρin , and the entropy of
ρin . The entropy is important because, for example, it is very easy to prepare the state
ρ_in = I/2^n with high fidelity, but that is a useless state for quantum computation, since
it is invariant under unitary transforms! Ideally, the input state is a pure state, with zero
entropy. Generally, input states with non-zero entropy reduce the accessibility of the
answer from the output result.
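Both points can be illustrated with a few lines of code: the entropy of I/2^n is n bits, and conjugating I/2 by a unitary leaves it unchanged. This is a sketch using plain nested lists; the helper names are hypothetical, and the Hadamard gate is used simply as an example unitary.

```python
import math

def von_neumann_entropy(eigenvalues):
    """Entropy in bits of a density matrix with the given eigenvalues."""
    return -sum(p * math.log2(p) for p in eigenvalues if p > 0)

def conjugate(U, rho):
    """Return U rho U^dagger for square matrices given as nested lists."""
    n = len(U)
    u_rho = [[sum(U[i][k] * rho[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return [[sum(u_rho[i][k] * U[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]

s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
mixed = [[0.5, 0.0], [0.0, 0.5]]          # rho_in = I/2 for one qubit
print(conjugate(hadamard, mixed))          # still I/2, up to rounding
print(von_neumann_entropy([1.0, 0.0]))     # pure state: zero entropy
print(von_neumann_entropy([1 / 8] * 8))    # I/2^3: three bits of entropy
```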
7.2.4 Measurement of output result
What measurement capability is required for quantum computation? For the purpose of
the present discussion, let us think of measurement as a process of coupling one or more
qubits to a classical system such that after some interval of time, the state of the qubits
is indicated by the state of the classical system. For example, a qubit state a|0⟩ + b|1⟩,
represented by the ground and excited states of a two-level atom, might be measured by
pumping the excited state and looking for fluorescence. If an electrometer indicates that
fluorescence had been detected by a photomultiplier tube, then the qubit would collapse
into the |1⟩ state; this would happen with probability |b|^2. Otherwise, the electrometer
would detect no charge, and the qubit would collapse into the |0⟩ state.
An important characteristic of the measurement process for quantum computation is
the wavefunction collapse which describes what happens when a projective measurement
is performed (Section 2.2.5). The output from a good quantum algorithm is a superposition state which gives a useful answer with high probability when measured. For
example, one step in Shor’s quantum factoring algorithm is to find an integer r from
the measurement result, which is an integer close to qc/r, where q is the dimension
of a Hilbert space. The output state is actually in a nearly uniform superposition of all
possible values of c, but a measurement collapses this into a single, random integer, thus
allowing r to be determined with high probability (using a continued fraction expansion,
as was described in Chapter 5).
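The classical post-processing in this step can be sketched with Python's `fractions` module, whose `limit_denominator` method finds the closest fraction with a bounded denominator (its search is closely related to the continued fraction expansion). The function name and the numerical example (a = 2, N = 21, q = 512, for which r = 6) are our own assumptions for illustration.

```python
from fractions import Fraction

def recover_order(measured, q, a, N):
    """Guess r from a measurement result close to q*c/r.

    Returns r if the candidate passes the check a^r = 1 (mod N), else None.
    """
    # Closest fraction c/r to measured/q with denominator at most N.
    candidate = Fraction(measured, q).limit_denominator(N).denominator
    # The candidate is only a divisor of r when c and r share a factor,
    # so verify it before accepting.
    return candidate if pow(a, candidate, N) == 1 else None

# Example: the order of 2 modulo 21 is 6, and we take q = 512.
print(recover_order(85, 512, 2, 21))   # measurement near 1*512/6 -> 6
print(recover_order(427, 512, 2, 21))  # measurement near 5*512/6 -> 6
print(recover_order(171, 512, 2, 21))  # near 2*512/6 = 512/3: candidate 3 fails -> None
```

The third call shows why the procedure succeeds only with high probability rather than with certainty: when c shares a factor with r, a single measurement reveals only a divisor of r.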
Many difficulties with measurement can be imagined; for example, inefficient photon
counters and amplifier thermal noise can reduce the information obtained about measured qubit states in the scheme just described. Furthermore, projective measurements
(sometimes called ‘strong’ measurements) are often difficult to implement. They require
that the coupling between the quantum and classical systems be large, and switchable.
Measurements should not occur when not desired; otherwise they can be a decoherence
process.
Surprisingly, however, strong measurements are not necessary; weak measurements
which are performed continuously and never switched off, are usable for quantum computation.
This is made possible by completing the computation in time short compared
with the measurement coupling, and by using large ensembles of quantum computers.
These ensembles together give an aggregate signal which is macroscopically observable
and indicative of the quantum state. Use of an ensemble introduces additional problems.
For example, in the factoring algorithm, if the measurement output is q⟨c⟩/r, the algorithm would fail because ⟨c⟩, the average value of c, is not necessarily an integer (and
thus the continued fraction expansion would not be possible). Fortunately, it is possible
to modify quantum algorithms to work with ensemble average readouts. This will be
discussed further in Section 7.7.
A good figure of merit for measurement capability is the signal to noise ratio (SNR).
This accounts for measurement inefficiency as well as inherent signal strength available
from coupling a measurement apparatus to the quantum system.
7.3 Harmonic oscillator quantum computer
Before continuing on to describe a complete physical model for a realizable quantum
computer, let us pause for a moment to consider a very elementary system – the simple
harmonic oscillator – and discuss why it does not serve as a good quantum computer.
The formalism used in this example will also serve as a basis for studying other physical
systems.
7.3.1 Physical apparatus
An example of a simple harmonic oscillator is a particle in a parabolic potential well,
V(x) = mω^2x^2/2. In the classical world, this could be a mass on a spring, which oscillates
back and forth as energy is transferred between the potential energy of the spring and the
kinetic energy of the mass. It could also be a resonant electrical circuit, where the energy
sloshes back and forth between the inductor and the capacitor. In these systems, the total
energy of the system is a continuous parameter.
In the quantum domain, which is reached when the coupling to the external world
becomes very small, the total energy of the system can only take on a discrete set of
values. An example is given by a single mode of electromagnetic radiation trapped in
a high Q cavity; the total amount of energy (up to a fixed offset) can only be integer
multiples of ħω, an energy scale which is determined by the fundamental constant ħ and
the frequency of the trapped radiation, ω.
The set of discrete energy eigenstates of a simple harmonic oscillator can be labeled
as |n⟩, where n = 0, 1, ..., ∞. The relationship to quantum computation comes by
taking a finite subset of these states to represent qubits. These qubits will have lifetimes
determined by physical parameters such as the cavity quality factor Q, which can be made
very large by increasing the reflectivity of the cavity walls. Moreover, unitary transforms
can be applied by simply allowing the system to evolve in time. However, there are
problems with this scheme, as will become clear below. We begin by studying the system
Hamiltonian, then discuss how one might implement simple quantum logic gates such
as the controlled-NOT.
7.3.2 The Hamiltonian
The Hamiltonian for a particle in a one-dimensional parabolic potential is
\[ H = \frac{p^2}{2m} + \frac{1}{2} m\omega^2 x^2 , \tag{7.4} \]
where p is the particle momentum operator, m is the mass, x is the position operator, and
ω is related to the potential depth. Recall that x and p are operators in this expression
(see Box 7.2), which can be rewritten as
\[ H = \hbar\omega \left( a^\dagger a + \frac{1}{2} \right) , \tag{7.5} \]
where a† and a are creation and annihilation operators, defined as
\[ a = \frac{1}{\sqrt{2m\hbar\omega}} \left( m\omega x + ip \right) \tag{7.6} \]
\[ a^\dagger = \frac{1}{\sqrt{2m\hbar\omega}} \left( m\omega x - ip \right) . \tag{7.7} \]
The zero point energy ħω/2 contributes an unobservable overall phase factor, which can
be disregarded for our present purpose.
The eigenstates |n⟩ of H, where n = 0, 1, ..., have the properties
\[ a^\dagger a |n\rangle = n |n\rangle \tag{7.10} \]
\[ a^\dagger |n\rangle = \sqrt{n+1}\, |n+1\rangle \tag{7.11} \]
\[ a |n\rangle = \sqrt{n}\, |n-1\rangle . \tag{7.12} \]
Later, we will find it convenient to express interactions with a simple harmonic oscillator
by introducing additional terms involving a and a† , and interactions between oscillators
with terms such as a_1† a_2 + a_1 a_2†. For now, however, we confine our attention to a single
oscillator.
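Properties (7.10)–(7.12) can be explored numerically by representing a as a matrix in the basis |0⟩, ..., |d − 1⟩. This is a sketch under one important assumption: the infinite ladder of states is truncated at dimension d, so the relation [a, a†] = 1 holds on every level except the last. The helper names are our own.

```python
import math

def annihilation(d):
    """Matrix of a in the basis |0>,...,|d-1>: a|n> = sqrt(n)|n-1>."""
    a = [[0.0] * d for _ in range(d)]
    for n in range(1, d):
        a[n - 1][n] = math.sqrt(n)
    return a

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def commutator(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]

d = 6
a = annihilation(d)
a_dag = transpose(a)             # a is real, so the adjoint is the transpose
number = matmul(a_dag, a)        # a† a, diagonal with entries 0, 1, ..., d-1
comm = commutator(a, a_dag)      # identity, except for the truncated corner
print([round(number[n][n], 12) for n in range(d)])
print([round(comm[n][n], 12) for n in range(d)])
```

The last diagonal entry of the commutator equals −(d − 1) rather than 1; this is purely an artifact of cutting off the ladder and disappears in the full, infinite-dimensional oscillator.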
Exercise 7.1: Using the fact that x and p do not commute, and that in fact
[x, p] = iħ, explicitly show that a†a = H/ħω − 1/2.
Exercise 7.2: Given that [x, p] = iħ, compute [a, a†].
Exercise 7.3: Compute [H, a] and use the result to show that if |ψ⟩ is an eigenstate of
H with energy E ≥ nħω, then a^n|ψ⟩ is an eigenstate with energy E − nħω.
Exercise 7.4: Show that $|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}} |0\rangle$.
Exercise 7.5: Verify that Equations (7.11) and (7.12) are consistent with (7.10) and
the normalization condition ⟨n|n⟩ = 1.
Time evolution of the eigenstates is given by solving the Schrödinger equation, (2.86),
from which we find that the state |ψ(0)⟩ = Σ_n c_n(0)|n⟩ evolves in time to become
\[ |\psi(t)\rangle = e^{-iHt/\hbar} |\psi(0)\rangle = \sum_n c_n(0)\, e^{-in\omega t} |n\rangle . \tag{7.13} \]
We will assume for the purpose of discussion that an arbitrary state can be perfectly
prepared, and that the state of the system can be projectively measured (Section 2.2.3),
Box 7.2: The quantum harmonic oscillator
The harmonic oscillator is an extremely important and useful concept in the quantum description of the physical world, and a good way to begin to understand its
properties is to determine the energy eigenstates of its Hamiltonian, (7.4). One way
to do this is simply to solve the Schrödinger equation
\[ -\frac{\hbar^2}{2m} \frac{d^2\psi_n(x)}{dx^2} + \frac{1}{2} m\omega^2 x^2 \psi_n(x) = E\psi_n(x) \tag{7.8} \]
for ψ_n(x) and the eigenenergies E, subject to ψ(x) → 0 at x = ±∞, and ∫|ψ(x)|^2 dx = 1;
the first five solutions are sketched here:
These wavefunctions describe the probability amplitudes that a particle in the harmonic oscillator will be found at different positions within the potential.
Although these pictures may give some intuition about what a physical system
is doing in co-ordinate space, we will generally be more interested in the abstract
algebraic properties of the states. Specifically, suppose |ψ⟩ satisfies (7.8) with energy
E. Then defining operators a and a† as in (7.6)–(7.7), we find that since [H, a†] = ħωa†,
\[ H a^\dagger |\psi\rangle = \left( [H, a^\dagger] + a^\dagger H \right) |\psi\rangle = (\hbar\omega + E)\, a^\dagger |\psi\rangle , \tag{7.9} \]
that is, a†|ψ⟩ is an eigenstate of H, with energy E + ħω! Similarly, a|ψ⟩ is an
eigenstate with energy E − ħω. Because of this, a† and a are called raising and
lowering operators. It follows that (a†)^n|ψ⟩ are eigenstates for any integer n, with
energies E + nħω. There are thus an infinite number of energy eigenstates, whose
energies are equally spaced apart, by ħω. Moreover, since H is positive definite,
there must be some |ψ_0⟩ for which a|ψ_0⟩ = 0; this is the ground state – the
eigenstate of H with lowest energy. These results efficiently capture the essence of
the quantum harmonic oscillator, and allow us to use a compact notation |n⟩ for
the eigenstates, where n is an integer, and H|n⟩ = ħω(n + 1/2)|n⟩. We shall often
work with |n⟩, a, and a† in this chapter, as harmonic oscillators arise in the guise
of many different physical systems.
but otherwise, there are no interactions with the external world, so that the system is
perfectly closed.
7.3.3 Quantum computation
Suppose we want to perform quantum computation with the single simple harmonic
oscillator described above. What can be done? The most natural choice for the representation
of qubits is the energy eigenstates $|n\rangle$. This choice allows us to perform a controlled-NOT gate in the following way. Recall that this transformation performs the mapping
$$ \begin{aligned} |00\rangle_L &\to |00\rangle_L \\ |01\rangle_L &\to |01\rangle_L \\ |10\rangle_L &\to |11\rangle_L \\ |11\rangle_L &\to |10\rangle_L , \end{aligned} \tag{7.14} $$
on two qubit states (here, the subscript $L$ is used to clearly distinguish ‘logical’ states in
contrast to the harmonic oscillator basis states). Let us encode these two qubits using the
mapping
$$ \begin{aligned} |00\rangle_L &= |0\rangle \\ |01\rangle_L &= |2\rangle \\ |10\rangle_L &= (|4\rangle + |1\rangle)/\sqrt{2} \\ |11\rangle_L &= (|4\rangle - |1\rangle)/\sqrt{2} . \end{aligned} \tag{7.15} $$
Now suppose that at $t = 0$ the system is started in a state spanned by these basis states,
and we simply evolve the system forward to time $t = \pi/\omega$. This causes the energy
eigenstates to undergo the transformation $|n\rangle \to \exp(-i\pi a^\dagger a)|n\rangle = (-1)^n|n\rangle$, such that
$|0\rangle$, $|2\rangle$, and $|4\rangle$ stay unchanged, but $|1\rangle \to -|1\rangle$. As a result, we obtain the desired
controlled-NOT gate transformation.
In general, a necessary and sufficient condition for a physical system to be able to
perform a unitary transform $U$ is simply that the time evolution operator for the system,
$T = \exp(-iHt)$, defined by its Hamiltonian $H$, has nearly the same eigenvalue spectrum
as $U$. In the case above, the controlled-NOT gate was simple to implement because it
only has eigenvalues +1 and −1; it was straightforward to arrange an encoding to obtain
the same eigenvalues from the time evolution operator for the harmonic oscillator. The
Hamiltonian for an oscillator could be perturbed to realize nearly any eigenvalue spectrum, and any number of qubits could be represented by simply mapping them into the
infinite number of eigenstates of the system. This suggests that perhaps one might be
able to realize an entire quantum computer in a single simple harmonic oscillator!
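The controlled-NOT construction of Equations (7.14)–(7.15) can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original text: the dictionary representation of states and the helper names (`evolve_half_period`, `overlap`, `logical_image`) are illustrative assumptions. It applies the phase $(-1)^n$ to each oscillator level and reads off which logical state results.

```python
import math

# Logical basis states of Eq. (7.15), stored as {n: amplitude} over oscillator levels |n>.
# This dictionary encoding is an illustrative choice, not something fixed by the text.
S = 1 / math.sqrt(2)
LOGICAL = {
    "00": {0: 1.0},
    "01": {2: 1.0},
    "10": {4: S, 1: S},
    "11": {4: S, 1: -S},
}

def evolve_half_period(state):
    """Evolve to t = pi/omega: each level |n> picks up the phase (-1)**n."""
    return {n: amp * (-1) ** n for n, amp in state.items()}

def overlap(u, v):
    """Inner product <u|v> for states with real amplitudes."""
    return sum(amp * v.get(n, 0.0) for n, amp in u.items())

def logical_image(label):
    """Return the logical label whose state equals the evolved input state."""
    out = evolve_half_period(LOGICAL[label])
    for target, state in LOGICAL.items():
        if abs(overlap(state, out) - 1.0) < 1e-12:
            return target
    raise ValueError("evolved state left the logical subspace")

mapping = {label: logical_image(label) for label in LOGICAL}
print(mapping)
```

The resulting table sends the first two logical states to themselves and exchanges the last two, which is the action listed in (7.14).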
7.3.4 Drawbacks
Of course, there are many problems with the above scenario. Clearly, one will not always
know the eigenvalue spectrum of the unitary operator for a certain quantum computation,
even though one may know how to construct the operator from elementary gates. In
fact, for most problems addressed by quantum algorithms, knowledge of the eigenvalue
spectrum is tantamount to knowledge of the solution!
Another obvious problem is that the technique used above does not allow one computation to be cascaded with another, because in general, cascading two unitary transforms
results in a new transform with unrelated eigenvalues.
Finally, the idea of using a single harmonic oscillator to perform quantum computation
is flawed because it neglects the principle of digital representation of information. A
Hilbert space of $2^n$ dimensions mapped into the state space of a single harmonic oscillator
would have to allow for the possibility of states with energy $2^n\hbar\omega$. In contrast, the same
Hilbert space could be obtained by using $n$ two-level quantum systems, which have an
energy of at most $n\hbar\omega$. Similar comparisons can be made between a classical dial with
$2^n$ settings, and a register of $n$ classical bits. Quantum computation builds upon digital
computation, not analog computation.
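To make the scaling argument concrete, the short sketch below tabulates the two energy estimates quoted above, in units of $\hbar\omega$. The function names are illustrative assumptions, and the numbers simply follow the text's estimates ($2^n$ for a single oscillator, $n$ for a register of two-level systems).

```python
def max_energy_single_oscillator(n_qubits):
    """Energy scale (units of hbar*omega) when 2**n states live in one oscillator."""
    return 2 ** n_qubits

def max_energy_two_level_register(n_qubits):
    """Energy scale (units of hbar*omega) for n two-level systems."""
    return n_qubits

for n in (1, 10, 50):
    analog = max_energy_single_oscillator(n)
    digital = max_energy_two_level_register(n)
    print(f"n = {n:2d}: single oscillator {analog}, two-level register {digital}")
```

The gap grows exponentially with the number of qubits, which is the sense in which the single-oscillator encoding is analog rather than digital.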
The main features of the harmonic oscillator quantum computer are summarized below
(each system we consider will be summarized similarly, at the end of the corresponding
section). With this, we leave behind us the study of single oscillators, and turn next to
systems of harmonic oscillators, made of photons and atoms.
Harmonic oscillator quantum computer
• Qubit representation: Energy levels $|0\rangle$, $|1\rangle$, . . ., $|2^n\rangle$ of a single quantum
oscillator give $n$ qubits.
• Unitary evolution: Arbitrary transforms $U$ are realized by matching their
eigenvalue spectra to that given by the Hamiltonian $H = a^\dagger a$.
• Initial state preparation: Not considered.
• Readout: Not considered.
• Drawbacks: Not a digital representation! Also, matching eigenvalues to realize
transformations is not feasible for arbitrary U , which generally have unknown
eigenvalues.
7.4 Optical photon quantum computer
An attractive physical system for representing a quantum bit is the optical photon. Photons are chargeless particles, and do not interact very strongly with each other, or even
with most matter. They can be guided along long distances with low loss in optical fibers,
delayed efficiently using phase shifters, and combined easily using beamsplitters. Photons
exhibit signature quantum phenomena, such as the interference produced in two-slit experiments. Furthermore, in principle, photons can be made to interact with each other,
using nonlinear optical media which mediate interactions. There are problems with this
ideal scenario; nevertheless, many things can be learned from studying the components,
architecture, and drawbacks of an optical photon quantum information processor, as we
shall see in this section.
7.4.1 Physical apparatus
Let us begin by considering what single photons are, how they can represent quantum
states, and the experimental components useful for manipulating photons. The classical
behavior of phase shifters, beamsplitters, and nonlinear optical Kerr media is described.
Photons can represent qubits in the following manner. As we saw in the discussion
of the simple harmonic oscillator, the energy in an electromagnetic cavity is quantized
in units of $\hbar\omega$. Each such quantum is called a photon. It is possible for a cavity to
contain a superposition of zero or one photon, a state which could be expressed as a qubit
$c_0|0\rangle + c_1|1\rangle$, but we shall do something different. Let us consider two cavities, whose
total energy is $\hbar\omega$, and take the two states of a qubit as being whether the photon is in
one cavity ($|01\rangle$) or the other ($|10\rangle$). The physical state of a superposition would thus be
written as $c_0|01\rangle + c_1|10\rangle$; we shall call this the dual-rail representation. Note that we
shall focus on single photons traveling as a wavepacket through free space, rather than
inside a cavity; one can imagine this as having a cavity moving along with the wavepacket.
Each cavity in our qubit state will thus correspond to a different spatial mode.
One scheme for generating single photons in the laboratory is by attenuating the output
of a laser. A laser outputs a state known as a coherent state, $|\alpha\rangle$, defined as
$$ |\alpha\rangle = e^{-|\alpha|^2/2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}|n\rangle , \tag{7.16} $$
where $|n\rangle$ is an $n$-photon energy eigenstate. This state, which has been the subject of
thorough study in the field of quantum optics, has many beautiful properties which we
shall not describe here. It suffices to understand just that coherent states are naturally
radiated from driven oscillators such as a laser when pumped high above its lasing threshold. Note that the mean energy is $\langle\alpha|n|\alpha\rangle = |\alpha|^2$. When attenuated, a coherent state just
becomes a weaker coherent state, and a weak coherent state can be made to have just one
photon, with high probability.
Exercise 7.6: (Eigenstates of photon annihilation) Prove that a coherent state is
an eigenstate of the photon annihilation operator, that is, show $a|\alpha\rangle = \lambda|\alpha\rangle$ for
some constant $\lambda$.
For example, for $\alpha = \sqrt{0.1}$, we obtain the state $\sqrt{0.90}\,|0\rangle + \sqrt{0.09}\,|1\rangle + \sqrt{0.0045}\,|2\rangle + \cdots$.
Thus if light ever makes it through the attenuator, one knows it is a single photon with
probability better than 95%; the failure probability is thus 5%. Note also that 90% of the
time, no photons come through at all; this source thus has a rate of 0.1 photons per unit
time. Finally, this source does not indicate (by means of some classical readout) when a
photon has been output or not; two of these sources cannot be synchronized.
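The numbers in this example follow from the photon-number distribution implied by (7.16), $P(n) = e^{-|\alpha|^2}|\alpha|^{2n}/n!$. The sketch below, with illustrative function and variable names that are not from the text, evaluates it for $\alpha = \sqrt{0.1}$ and computes the probability that light passing the attenuator is a single photon.

```python
import math

def photon_number_probabilities(alpha, n_max):
    """P(n) = exp(-|alpha|^2) * |alpha|^(2n) / n! for n = 0..n_max (from Eq. 7.16)."""
    mean = abs(alpha) ** 2
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)]

alpha = math.sqrt(0.1)
probs = photon_number_probabilities(alpha, n_max=10)

p_empty = probs[0]                                   # no photon comes through
p_single_given_light = probs[1] / (1.0 - p_empty)    # one photon, given at least one

print(f"P(0) = {probs[0]:.4f}, P(1) = {probs[1]:.4f}, P(2) = {probs[2]:.4f}")
print(f"P(single photon | some light) = {p_single_given_light:.4f}")
```

The conditional probability comes out slightly above 95%, and the vacuum probability is close to 90%, in line with the figures quoted in the text.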
Better synchronicity can be achieved using parametric down-conversion. This involves
sending photons of frequency $\omega_0$ into a nonlinear optical medium such as KH$_2$PO$_4$ to
generate photon pairs at frequencies $\omega_1 + \omega_2 = \omega_0$. Momentum is also conserved, such
that $k_1 + k_2 = k_0$, so that when a single $\omega_2$ photon is (destructively) detected, then a single
$\omega_1$ photon is known to exist (Figure 7.2). By coupling this to a gate, which is opened
only when a single photon (as opposed to two or more) is detected, and by appropriately
delaying the outputs of multiple down-conversion sources, one can, in principle, obtain
multiple single photons propagating in time synchronously, within the time resolution of
the detector and gate.
Single photons can be detected with high quantum efficiency for a wide range of
wavelengths, using a variety of technologies. For our purposes, the most important characteristic of a detector is its capability of determining, with high probability, whether
zero or one photon exists in a particular spatial mode. For the dual-rail representation,
this translates into a projective measurement in the computational basis. In practice, imperfections reduce the probability of being able to detect a single photon; the quantum
efficiency η (0 ≤ η ≤ 1) of a photodetector is the probability that a single photon incident
on the detector generates a photocarrier pair that contributes to detector current. Other
important characteristics of a detector are its bandwidth (time responsivity), noise, and
‘dark counts’ which are photocarriers generated even when no photons are incident.
Figure 7.2. Parametric down-conversion scheme for generation of single photons.
Three of the most experimentally accessible devices for manipulating photon states are
mirrors, phase shifters and beamsplitters. High reflectivity mirrors reflect photons and
change their propagation direction in space. Mirrors with 0.01% loss are not unusual.
We shall take these for granted in our scenario. A phase shifter is nothing more than
a slab of transparent medium with index of refraction $n$ different from that of free
space, $n_0$; for example, ordinary borosilicate glass has $n \approx 1.5n_0$ at optical wavelengths.
Propagation in such a medium through a distance $L$ changes a photon’s phase by $e^{ikL}$,
where $k = n\omega/c_0$, and $c_0$ is the speed of light in vacuum. Thus, a photon propagating
through a phase shifter will experience a phase shift of $e^{i(n-n_0)L\omega/c_0}$ compared to a photon
going the same distance through free space.
Another useful component, the beamsplitter, is nothing more than a partially silvered
piece of glass, which reflects a fraction R of the incident light, and transmits 1−R. In the
laboratory, a beamsplitter is usually fabricated from two prisms, with a thin metallic layer
sandwiched in-between, schematically drawn as shown in Figure 7.3. It is convenient to
define the angle θ of a beamsplitter as cos θ = R; note that the angle parameterizes
the amount of partial reflection, and does not necessarily have anything to do with the
physical orientation of the beamsplitter. The two inputs and two outputs of this device
are related by
$$ a_{\rm out} = a_{\rm in}\cos\theta + b_{\rm in}\sin\theta \tag{7.17} $$
$$ b_{\rm out} = -a_{\rm in}\sin\theta + b_{\rm in}\cos\theta , \tag{7.18} $$
where classically we think of $a$ and $b$ as being the electromagnetic fields of the radiation at
the two ports. Note that in this definition we have chosen a non-standard phase convention
convenient for our purposes. In the special case of a 50/50 beamsplitter, $\theta = 45^\circ$.
Nonlinear optics provides one final useful component for this exercise: a material
Figure 7.3. Schematic of an optical beamsplitter, showing the two input ports, the two output ports, and the phase
conventions for a 50/50 beamsplitter (θ = π/4). The beamsplitter on the right is the inverse of the one on the left
(the two are distinguished by the dot drawn inside). The input-output relations for the mode operators a and b are
given for θ = π/4.
whose index of refraction n is proportional to the total intensity I of light going through
it:
$$ n(I) = n + n_2 I . \tag{7.19} $$
This is known as the optical Kerr effect, and it occurs (very weakly) in materials as
mundane as glass and sugar water. In doped glasses, $n_2$ ranges from $10^{-14}$ to $10^{-7}$ cm$^2$/W,
and in semiconductors, from $10^{-10}$ to $10^{2}$. Experimentally, the relevant behavior is that
when two beams of light of equal intensity are nearly co-propagated through a Kerr
medium, each beam will experience an extra phase shift of $e^{in_2IL\omega/c_0}$ compared to what
happens in the single beam case. This would be ideal if the length L could be arbitrarily
long, but unfortunately that fails because most Kerr media are also highly absorptive,
or scatter light out of the desired spatial mode. This is the primary reason why a single
photon quantum computer is impractical, as we shall discuss in Section 7.4.3.
We turn next to a quantum description of these optical components.
7.4.2 Quantum computation
Arbitrary unitary transforms can be applied to quantum information, encoded with single
photons in the $c_0|01\rangle + c_1|10\rangle$ dual-rail representation, using phase shifters, beamsplitters,
and nonlinear optical Kerr media. How this works can be understood in the following
manner, by giving a quantum-mechanical Hamiltonian description of each of these devices.
The time evolution of a cavity mode of electromagnetic radiation is modeled quantum-mechanically by a harmonic oscillator, as we saw in Section 7.3.2. $|0\rangle$ is the vacuum state,
$|1\rangle = a^\dagger|0\rangle$ is a single photon state, and in general, $|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}|0\rangle$ is an $n$-photon state,
where $a^\dagger$ is the creation operator for the mode. Free space evolution is described by the
Hamiltonian
$$ H = \hbar\omega a^\dagger a , \tag{7.20} $$
and applying (7.13), we find that the state $|\psi\rangle = c_0|0\rangle + c_1|1\rangle$ evolves in time to become $|\psi(t)\rangle = c_0|0\rangle + c_1e^{-i\omega t}|1\rangle$. Note that the dual-rail representation is convenient
because free evolution only changes $|\varphi\rangle = c_0|01\rangle + c_1|10\rangle$ by an overall phase, which is
undetectable. Thus, for that manifold of states, the evolution Hamiltonian is zero.
Phase shifter. A phase shifter P acts just like normal time evolution, but at a different
rate, and localized to only the modes going through it. That is because light slows down
in a medium with larger index of refraction; specifically, it takes $\Delta \equiv (n - n_0)L/c_0$ more
time to propagate a distance $L$ in a medium with index of refraction $n$ than in vacuum.
For example, the action of $P$ on the vacuum state is to do nothing: $P|0\rangle = |0\rangle$, but on a
single photon state, one obtains $P|1\rangle = e^{i\Delta}|1\rangle$.
P performs a useful logical operation on a dual-rail state. Placing a phase shifter in
one mode retards its phase evolution with respect to another mode, which travels the
same distance but without going through the shifter. For dual-rail states this transforms
$c_0|01\rangle + c_1|10\rangle$ to $c_0e^{-i\Delta/2}|01\rangle + c_1e^{i\Delta/2}|10\rangle$, up to an irrelevant overall phase. Recall from
Section 4.2 that this operation is nothing more than a rotation,
$$ R_z(\Delta) = e^{-iZ\Delta/2} , \tag{7.21} $$
where we take as the logical zero $|0_L\rangle = |01\rangle$ and one $|1_L\rangle = |10\rangle$, and $Z$ is the usual
Pauli operator. One can thus think of $P$ as resulting from time evolution under the
Hamiltonian
$$ H = (n_0 - n)Z , \tag{7.22} $$
where $P = \exp(-iHL/c_0)$.
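The statement that a phase shifter acts as $R_z(\Delta)$ up to an overall phase can be checked directly on the two dual-rail amplitudes. The sketch below is illustrative only: it assumes the shifter sits in the mode that is occupied in $|10\rangle$, and the function names and test amplitudes are arbitrary choices rather than anything given in the text.

```python
import cmath

def phase_shifter(c0, c1, delta):
    """Phase shifter in the mode occupied in |10>: only c1 acquires exp(i*delta)."""
    return c0, c1 * cmath.exp(1j * delta)

def rz(c0, c1, delta):
    """R_z(delta) = exp(-i Z delta / 2) acting on c0|0_L> + c1|1_L>."""
    return c0 * cmath.exp(-0.5j * delta), c1 * cmath.exp(0.5j * delta)

delta = 0.8                 # arbitrary phase shift
c0, c1 = 0.6, 0.8j          # arbitrary normalized dual-rail amplitudes

shifted = phase_shifter(c0, c1, delta)
rotated = rz(c0, c1, delta)

# The two results should differ only by the global phase exp(i*delta/2).
global_phase = cmath.exp(0.5j * delta)
residual = max(abs(s - global_phase * r) for s, r in zip(shifted, rotated))
print(residual)
```

The residual vanishes to floating-point precision, so the two operations agree once the unobservable global phase is factored out.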
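Exercise 7.7: Show that the circuit below transforms a dual-rail state by
$$ |\psi_{\rm out}\rangle = \begin{bmatrix} e^{i\pi} & 0 \\ 0 & 1 \end{bmatrix}|\psi_{\rm in}\rangle , \tag{7.23} $$
if we take the top wire to represent the $|01\rangle$ mode, and $|10\rangle$ the bottom mode,
and the boxed $\pi$ to represent a phase shift by $\pi$:
[Circuit diagram not reproduced: two wires running from $|\psi_{\rm in}\rangle$ to $|\psi_{\rm out}\rangle$, with a boxed $\pi$ on one wire.]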
Note that in such ‘optical circuits’, propagation in space is explicitly represented
by putting in lumped circuit elements such as in the above, to represent phase
evolution. In the dual-rail representation, evolution according to (7.20) changes
the logical state only by an unobservable global phase, and thus we are free to
disregard it and keep only relative phase shifts.
Exercise 7.8: Show that $P|\alpha\rangle = |\alpha e^{i\Delta}\rangle$ where $|\alpha\rangle$ is a coherent state (note that, in
general, $\alpha$ is a complex number!).
Beamsplitter. A similar Hamiltonian description of the beamsplitter also exists, but
instead of motivating it phenomenologically, let us begin with the Hamiltonian and show
how the expected classical behavior, Equations (7.17)–(7.18) arises from it. Recall that the
beamsplitter acts on two modes, which we shall describe by the annihilation (creation)
operators $a$ ($a^\dagger$) and $b$ ($b^\dagger$). The Hamiltonian is
$$ H_{\rm bs} = i\theta\left(ab^\dagger - a^\dagger b\right) , \tag{7.24} $$
and the beamsplitter performs the unitary operation
$$ B = \exp\left[\theta\left(a^\dagger b - ab^\dagger\right)\right] . \tag{7.25} $$
The transformations effected by $B$ on $a$ and $b$, which will later be useful, are found to
be
$$ BaB^\dagger = a\cos\theta + b\sin\theta \quad\text{and}\quad BbB^\dagger = -a\sin\theta + b\cos\theta . \tag{7.26} $$
We verify these relations using the Baker–Campbell–Hausdorff formula (also see Exercise 4.49)
$$ e^{\lambda G}Ae^{-\lambda G} = \sum_{n=0}^{\infty}\frac{\lambda^n}{n!}C_n , \tag{7.27} $$
where $\lambda$ is a complex number, $A$, $G$, and $C_n$ are operators, and $C_n$ is defined recursively
as the sequence of commutators $C_0 = A$, $C_1 = [G, C_0]$, $C_2 = [G, C_1]$, $C_3 = [G, C_2]$, . . .,
$C_n = [G, C_{n-1}]$. Since it follows from $[a, a^\dagger] = 1$ and $[b, b^\dagger] = 1$ that $[G, a] = -b$ and
$[G, b] = a$, for $G \equiv a^\dagger b - ab^\dagger$, we obtain for the expansion of $BaB^\dagger$ the series coefficients
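$C_0 = a$, $C_1 = [G, a] = -b$, $C_2 = [G, C_1] = -a$, $C_3 = [G, C_2] = -[G, C_0] = b$, which in
general are
$$ C_{n\ \text{even}} = i^n a \tag{7.28} $$
$$ C_{n\ \text{odd}} = i^{n+1} b . \tag{7.29} $$
From this, our desired result follows straightforwardly:
\begin{align}
BaB^\dagger &= e^{\theta G}ae^{-\theta G} \tag{7.30} \\
&= \sum_{n=0}^{\infty}\frac{\theta^n}{n!}C_n \tag{7.31} \\
&= \sum_{n\ \text{even}}\frac{(i\theta)^n}{n!}\,a + i\sum_{n\ \text{odd}}\frac{(i\theta)^n}{n!}\,b \tag{7.32} \\
&= a\cos\theta - b\sin\theta . \tag{7.33}
\end{align}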
The transform $BbB^\dagger$ is trivially found by swapping $a$ and $b$ in the above solution. Note
that the beamsplitter operator arises from a deep relationship between the beamsplitter
and the algebra of SU (2), as explained in Box 7.3.
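The series in (7.27) can be summed numerically using only the two commutators $[G, a] = -b$ and $[G, b] = a$ quoted above. The sketch below is an illustration under that assumption: it represents an operator $\alpha a + \beta b$ by the coefficient pair `(alpha, beta)`, a representation chosen here for convenience and not taken from the text, and the function names are likewise hypothetical.

```python
import math

def commutator_with_G(coeffs):
    """[G, alpha*a + beta*b] for G = a^dagger b - a b^dagger.

    Uses [G, a] = -b and [G, b] = a, so (alpha, beta) maps to (beta, -alpha).
    """
    alpha, beta = coeffs
    return (beta, -alpha)

def bch_series(theta, terms=40):
    """Sum theta^n/n! * C_n with C_0 = a and C_n = [G, C_{n-1}].

    Returns the coefficients of a and b in exp(theta G) a exp(-theta G).
    """
    c = (1.0, 0.0)  # C_0 = a
    total_a, total_b = 0.0, 0.0
    for n in range(terms):
        weight = theta ** n / math.factorial(n)
        total_a += weight * c[0]
        total_b += weight * c[1]
        c = commutator_with_G(c)
    return total_a, total_b

theta = 0.7  # arbitrary test angle
coeff_a, coeff_b = bch_series(theta)
print(coeff_a, math.cos(theta))
print(coeff_b, -math.sin(theta))
```

The summed coefficients agree with $\cos\theta$ and $-\sin\theta$, the closed form in (7.33). Replacing $\theta$ by $-\theta$ (equivalently, exchanging $B$ and $B^\dagger$) flips the sign of the $\sin\theta$ term.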
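In terms of quantum logic gates, $B$ performs a useful operation. First note that $B|00\rangle = |00\rangle$, that is, when no photons in either input mode exist, no photons will exist in
either output mode. When one photon exists in mode $a$, recalling that $|1\rangle = a^\dagger|0\rangle$, we
find that
$$ B|01\rangle = Ba^\dagger|00\rangle = Ba^\dagger B^\dagger B|00\rangle = (a^\dagger\cos\theta + b^\dagger\sin\theta)|00\rangle = \cos\theta|01\rangle + \sin\theta|10\rangle . \tag{7.34} $$
Similarly, $B|10\rangle = \cos\theta|10\rangle - \sin\theta|01\rangle$. Thus, on the $|0_L\rangle$ and $|1_L\rangle$ manifold of states,
we may write $B$ as
$$ B = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = e^{i\theta Y} . \tag{7.35} $$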
Phase shifters and beamsplitters together allow arbitrary single qubit operations to be
performed on our optical qubit. This is a consequence of Theorem 4.1 on page 175, which
states that all single qubit operations can be generated from $\hat z$-axis rotations $R_z(\alpha) = \exp(-i\alpha Z/2)$, and $\hat y$-axis rotations, $R_y(\alpha) = \exp(-i\alpha Y/2)$. A phase shifter performs $R_z$
rotations, and a beamsplitter performs $R_y$ rotations.
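As a concrete instance of the decomposition into $R_z$ and $R_y$ rotations, the sketch below checks numerically that the Hadamard matrix equals $i\,R_y(\pi/2)R_z(\pi)$. The particular angles are a choice made for this illustration, not a construction given in the text, and the helper names are hypothetical.

```python
import cmath
import math

def matmul2(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rz(angle):
    """R_z(angle) = exp(-i * angle * Z / 2)."""
    return [[cmath.exp(-0.5j * angle), 0], [0, cmath.exp(0.5j * angle)]]

def ry(angle):
    """R_y(angle) = exp(-i * angle * Y / 2)."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]

# Candidate decomposition: H = i * R_y(pi/2) * R_z(pi); the factor i is a global phase.
product = matmul2(ry(math.pi / 2), rz(math.pi))
candidate = [[1j * x for x in row] for row in product]

h = 1 / math.sqrt(2)
hadamard = [[h, h], [h, -h]]
max_error = max(abs(candidate[i][j] - hadamard[i][j])
                for i in range(2) for j in range(2))
print(max_error)
```

In the optical setting this corresponds to one phase shifter followed by one beamsplitter, with the global phase playing no observable role.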
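Exercise 7.9: (Optical Hadamard gate) Show that the following circuit acts as a
Hadamard gate on dual-rail single photon states, that is, $|01\rangle \to (|01\rangle + |10\rangle)/\sqrt{2}$
and $|10\rangle \to (|01\rangle - |10\rangle)/\sqrt{2}$ up to an overall phase:
[Circuit diagram not reproduced; it includes a phase shifter labeled $\pi$.]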
Exercise 7.10: (Mach–Zehnder interferometer) Interferometers are optical tools
used to measure small phase shifts, which are constructed from two
beamsplitters. Their basic principle of operation can be understood by this
simple exercise.
1. Note that this circuit performs the identity operation:
Box 7.3: SU (2) Symmetry and quantum beamsplitters
There is an interesting connection between the Lie group SU (2) and the algebra of
two coupled harmonic oscillators, which is useful for understanding the quantum
beamsplitter transformation. Identify
$$ a^\dagger a - b^\dagger b \to Z \tag{7.36} $$
$$ a^\dagger b \to \sigma_+ \tag{7.37} $$
$$ ab^\dagger \to \sigma_- , \tag{7.38} $$
where $Z$ is the Pauli operator, and $\sigma_\pm = (X \pm iY)/2$ are raising and lowering
operators defined in terms of Pauli $X$ and $Y$. From the commutation relations for $a$,
$a^\dagger$, $b$, and $b^\dagger$, it is easy to verify that these definitions satisfy the usual commutation
relations for the Pauli operators, (2.40). Also note that the total number operator,
$a^\dagger a + b^\dagger b$, commutes with $\sigma_z$, $\sigma_+$, and $\sigma_-$, as it should, being an invariant quantity
under rotations in the SU(2) space. Using $X = a^\dagger b + ab^\dagger$ and $Y = -i(a^\dagger b - ab^\dagger)$
in the traditional SU(2) rotation operator
$$ R(\hat n, \theta) = e^{-i\theta\,\vec\sigma\cdot\hat n/2} \tag{7.39} $$
gives us the desired beamsplitter operator when $\hat n$ is taken to be the $-\hat y$-axis.
[Circuit diagram for part 1 not reproduced.]
2. Compute the rotation operation (on dual-rail states) which this circuit
performs, as a function of the phase shift ϕ:
[Circuit diagram for part 2 not reproduced; it includes a phase shifter labeled $\varphi$.]
Exercise 7.11: What is $B|2, 0\rangle$ for $\theta = \pi/4$?
Exercise 7.12: (Quantum beamsplitter with classical inputs) What is $B|\alpha\rangle|\beta\rangle$
where $|\alpha\rangle$ and $|\beta\rangle$ are two coherent states as in Equation (7.16)? (Hint: recall
that $|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}|0\rangle$.)
Nonlinear Kerr media. The most important effect of a Kerr medium is the cross
phase modulation it provides between two modes of light. That is classically described
by the n2 term in (7.19), which is effectively an interaction between photons, mediated
by atoms in the Kerr medium. Quantum-mechanically, this effect is described by the
Hamiltonian
$$ H_{\rm xpm} = -\chi a^\dagger a b^\dagger b , \tag{7.40} $$
where $a$ and $b$ describe two modes propagating through the medium, and for a crystal of
length $L$ we obtain the unitary transform
$$ K = e^{i\chi L a^\dagger a b^\dagger b} . \tag{7.41} $$
$\chi$ is a coefficient related to $n_2$, and the third order nonlinear susceptibility coefficient
usually denoted as $\chi^{(3)}$. That the expected classical behavior arises from this Hamiltonian
is left as Exercise 7.14 for the reader.
By combining Kerr media with beamsplitters, a controlled-NOT gate can be constructed
in the following manner. For single photon states, we find that
$$ K|00\rangle = |00\rangle \tag{7.42} $$
$$ K|01\rangle = |01\rangle \tag{7.43} $$
$$ K|10\rangle = |10\rangle \tag{7.44} $$
$$ K|11\rangle = e^{i\chi L}|11\rangle , \tag{7.45} $$
and let us take $\chi L = \pi$, such that $K|11\rangle = -|11\rangle$. Now consider two dual-rail states,
that is, four modes of light. These live in a space spanned by the four basis states
$|e_{00}\rangle = |1001\rangle$, $|e_{01}\rangle = |1010\rangle$, $|e_{10}\rangle = |0101\rangle$, $|e_{11}\rangle = |0110\rangle$. Note that we have flipped
the usual order of the two modes for the first pair, for convenience (physically, the two
modes are easily swapped using mirrors). Now, if a Kerr medium is applied to act upon
the two middle modes, then we find that $K|e_i\rangle = |e_i\rangle$ for all $i$ except $K|e_{11}\rangle = -|e_{11}\rangle$.
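This is useful because the controlled-NOT operation can be factored into
$$ \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{U_{CN}} = \underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}}_{I \otimes H} \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}}_{K} \underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}}_{I \otimes H} , \tag{7.46} $$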
where H is the single qubit Hadamard transform (simply implemented with beamsplitters
and phase shifters), and K is the Kerr interaction we just considered, with χL = π. Such
an apparatus has been considered before, for constructing a reversible classical optical
logic gate, as described in Box 7.4; in the single photon regime, it also functions as a
quantum logic gate.
Summarizing, the controlled-NOT can be constructed from Kerr media, and arbitrary single
qubit operations realized using beamsplitters and phase shifters. Single photons can be
created using attenuated lasers, and detected with photodetectors. Thus, in theory, a
quantum computer can be implemented using these optical components!
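The factorization in (7.46) is easy to confirm by direct matrix multiplication. The sketch below is a plain-Python check, with matrices written as nested lists and a hypothetical helper `matmul`; it multiplies $I \otimes H$, the Kerr phase matrix $K$ for $\chi L = \pi$, and $I \otimes H$ again, then compares the product with the controlled-NOT matrix.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

s = 1 / math.sqrt(2)

# I (x) H: a Hadamard acting on the second qubit.
I_H = [[s,  s, 0,  0],
       [s, -s, 0,  0],
       [0,  0, s,  s],
       [0,  0, s, -s]]

# Kerr interaction with chi*L = pi: only |e_11> changes sign.
K = [[1, 0, 0,  0],
     [0, 1, 0,  0],
     [0, 0, 1,  0],
     [0, 0, 0, -1]]

CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]

U = matmul(matmul(I_H, K), I_H)
max_error = max(abs(U[i][j] - CNOT[i][j]) for i in range(4) for j in range(4))
print(max_error)
```

The check works because $HZH = X$: the Kerr medium supplies a controlled phase flip, and the Hadamards on either side convert it into a controlled bit flip.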
Exercise 7.13: (Optical Deutsch–Jozsa quantum circuit) In Section 1.4.4
(page 34), we described a quantum circuit for solving the one-bit Deutsch–Jozsa
problem. Here is a version of that circuit for single photon states (in the dual-rail
representation), using beamsplitters, phase shifters, and nonlinear Kerr media:
Box 7.4: The quantum optical Fredkin gate
An optical Fredkin gate can be built using two beamsplitters and a nonlinear Kerr
medium as shown in this schematic diagram:
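[Schematic not reproduced: input modes $a$, $b$, $c$ and output modes $a'$, $b'$, $c'$, with a Kerr medium between two beamsplitters.]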
This performs the unitary transform $U = B^\dagger KB$, where $B$ is a 50/50 beamsplitter,
$K$ is the Kerr cross phase modulation operator $K = e^{i\xi b^\dagger b c^\dagger c}$, and $\xi = \chi L$ is the
product of the coupling constant and the interaction distance. This simplifies to
give
\begin{align}
U &= \exp\left[i\xi c^\dagger c\,\frac{b^\dagger - a^\dagger}{\sqrt{2}}\,\frac{b - a}{\sqrt{2}}\right] \tag{7.47} \\
&= e^{i\frac{\pi}{2}b^\dagger b}\; e^{\frac{\xi}{2}c^\dagger c\,(a^\dagger b - b^\dagger a)}\; e^{-i\frac{\pi}{2}b^\dagger b}\; e^{i\frac{\xi}{2}a^\dagger a\,c^\dagger c}\; e^{i\frac{\xi}{2}b^\dagger b\,c^\dagger c} . \tag{7.48}
\end{align}
The first and third exponentials are constant phase shifts, and the last two phase
shifts come from cross phase modulation. All those effects are not fundamental,
and can be compensated for. The interesting term is the second exponential, which
defines the quantum Fredkin operator
$$ F(\xi) = \exp\left[\frac{\xi}{2}c^\dagger c\,(a^\dagger b - b^\dagger a)\right] . \tag{7.49} $$
The usual (classical) Fredkin gate operation is obtained for $\xi = \pi$, in which case
when no photons are input at $c$, then $a' = a$ and $b' = b$, but when a single photon is
input at $c$, then $a' = b$ and $b' = a$. This can be understood by realizing that $F(\xi)$ is
like a controlled-beamsplitter operator, where the rotation angle is $\xi c^\dagger c$. Note that
this description does not use the dual-rail representation; in that representation,
this Fredkin gate corresponds to a controlled-NOT gate.
[Circuit diagram for Exercise 7.13 not reproduced.]
1. Construct circuits for the four possible classical functions Uf using Fredkin
gates and beamsplitters.
2. Why are no phase shifters necessary in this construction?
3. For each Uf show explicitly how interference can be used to explain how the
quantum algorithm works.
4. Does this implementation work if the single photon states are replaced by
coherent states?
Exercise 7.14: (Classical cross phase modulation) To see that the expected
classical behavior of a Kerr medium is obtained from the definition of K,
Equation (7.41), apply it to two modes, one with a coherent state and the other
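in state $|n\rangle$; that is, show that
$$ K|\alpha\rangle|n\rangle = |\alpha e^{i\chi Ln}\rangle|n\rangle . \tag{7.50} $$
Use this to compute
\begin{align}
\rho_a &= {\rm Tr}_b\left[K|\alpha\rangle|\beta\rangle\langle\beta|\langle\alpha|K^\dagger\right] \tag{7.51} \\
&= e^{-|\beta|^2}\sum_m \frac{|\beta|^{2m}}{m!}\,|\alpha e^{i\chi Lm}\rangle\langle\alpha e^{i\chi Lm}| , \tag{7.52}
\end{align}
and show that the main contribution to the sum is for $m = |\beta|^2$.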
7.4.3 Drawbacks
The single photon representation of a qubit is attractive. Single photons are relatively
simple to generate and measure, and in the dual-rail representation, arbitrary single qubit
operations are possible. Unfortunately, interacting photons is difficult – the best nonlinear Kerr media available are very weak, and cannot provide a cross phase modulation of
π between single photon states. In fact, because a nonlinear index of refraction is usually
obtained by using a medium near an optical resonance, there is always some absorption
associated with the nonlinearity, and it can theoretically be estimated that in the best such
arrangement, approximately 50 photons must be absorbed for each photon which experiences a π cross phase modulation. This means that the outlook for building quantum
computers from traditional nonlinear optics components is slim at best.
Nevertheless, from studying this optical quantum computer, we have gained some
valuable insight into the nature of the architecture and system design of a quantum
computer. We now can see what an actual quantum computer might look like in the
laboratory (if only sufficiently good components were available to construct it), and a
striking feature is that it is constructed nearly completely from optical interferometers.
In the apparatus, information is encoded both in the photon number and the phase of
the photon, and interferometers are used to convert between the two representations.
Although it is feasible to construct stable optical interferometers, if an alternate, massive
representation of a qubit were chosen, then it could rapidly become difficult to build
stable interferometers because of the shortness of typical de Broglie wavelengths. Even
with the optical representation, the multiple interlocked interferometers which would
be needed to realize a large quantum algorithm would be a challenge to stabilize in the
laboratory.
Historically, optical classical computers were once thought to be promising replacements for electronic machines, but they ultimately failed to live up to expectations when
sufficiently nonlinear optical materials were not discovered, and when their speed and
parallelism advantages did not sufficiently outweigh their alignment and power disadvantages. On the other hand, optical communications is a vital and important area; one reason
for this is that for distances longer than one centimeter, the energy needed to transmit
a bit using a photon over a fiber is smaller than the energy required to charge a typical
50 ohm electronic transmission line covering the same distance. Similarly, optical qubits
may find a natural home in communication of quantum information, such
as in quantum cryptography, rather than in computation.
Despite the drawbacks facing optical quantum computer realizations, the theoretical
formalism which describes them is absolutely fundamental in all the other realizations
we shall study in the remainder of this chapter. In fact, you may think of what we shall
turn to next as being just another kind of optical quantum computer, but with a different
(and better!) kind of nonlinear medium.
Optical photon quantum computer
• Qubit representation: Location of single photon between two modes, $|01\rangle$ and
$|10\rangle$, or polarization.
• Unitary evolution: Arbitrary transforms are constructed from phase shifters ($R_z$
rotations), beamsplitters ($R_y$ rotations), and nonlinear Kerr media, which allow
two single photons to cross phase modulate, performing $\exp\left[i\chi L|11\rangle\langle 11|\right]$.
• Initial state preparation: Create single photon states (e.g. by attenuating laser
light).
• Readout: Detect single photons (e.g. using a photomultiplier tube).
• Drawbacks: Nonlinear Kerr media with large ratio of cross phase modulation
strength to absorption loss are difficult to realize.
7.5 Optical cavity quantum electrodynamics
Cavity quantum electrodynamics (QED) is a field of study which accesses an important
regime involving coupling of single atoms to only a few optical modes. Experimentally,
this is made possible by placing single atoms within optical cavities of very high Q; because
only one or two electromagnetic modes exist within the cavity, and each of these has a very
high electric field strength, the dipole coupling between the atom and the field is very high.
Because of the high Q, photons within the cavity have an opportunity to interact many
times with the atoms before escaping. Theoretically, this technique presents a unique
opportunity to control and study single quantum systems, opening many opportunities
in quantum chaos, quantum feedback control, and quantum computation.
In particular, single-atom cavity QED methods offer a potential solution to the dilemma
with the optical quantum computer described in the previous section. Single photons can
be good carriers of quantum information, but they require some other medium in order
to interact with each other. Because they are bulk materials, traditional nonlinear optical
Kerr media are unsatisfactory in satisfying this need. However, well isolated single atoms
might not necessarily suffer from the same decoherence effects, and moreover, they could
also provide cross phase modulation between photons. In fact, what if the state of single
photons could be efficiently transferred to and from single atoms, whose interactions could
be controlled? This potential scenario is the topic of this section.
7.5.1 Physical apparatus
The two main experimental components of a cavity QED system are the electromagnetic
cavity and the atom. We begin by describing the basic physics of cavity modes, and then
summarize basic ideas about atomic structure and the interaction of atoms with light.
Fabry–Perot cavity
The main interaction involved in cavity QED is the dipolar interaction d · E between an
electric dipole moment d and an electric field E. How large can this interaction be? It
is difficult in practice to change the size of d; however, |E| is experimentally accessible,
and one of the most important tools for realizing a very large electric field in a narrow
band of frequencies and in a small volume of space, is the Fabry–Perot cavity.
In the approximation that the electric field is monochromatic and occupies a single
spatial mode, it can be given a very simple quantum-mechanical description:
$$ \vec{E}(r) = i\,\vec{\epsilon}\, E_0 \left[ a\, e^{ikr} - a^\dagger e^{-ikr} \right] . \tag{7.56} $$
As described in Box 7.5, these approximations are appropriate for the field in a Fabry–Perot cavity. Here, $k = \omega/c$ is the spatial frequency of the light, $E_0$ is the field strength,
$\vec{\epsilon}$ is the polarization, and $r$ is the position at which the field is desired. $a$ and $a^\dagger$ are
the annihilation and creation operators, respectively, for photons in the mode, and behave as described
in Section 7.4.2. Note that the Hamiltonian governing the evolution of the field in the
cavity is simply
$$ H_{\rm field} = \hbar\omega\, a^\dagger a , \tag{7.57} $$
and this is consistent with the semiclassical notion that the energy is the volume integral
of $|E|^2$ in the cavity.
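To make the field Hamiltonian concrete, the following minimal sketch builds the annihilation operator as a matrix in a truncated photon-number basis and checks that $a^\dagger a$ is diagonal with entries $0, 1, 2, \ldots$, so that the energies are integer multiples of $\hbar\omega$. The truncation dimension `DIM`, the helper names, and the choice of units with $\hbar\omega = 1$ are assumptions made only for this illustration.

```python
import math

def annihilation(dim):
    """Matrix of a in the number basis |0>, ..., |dim-1> (truncated; an assumption)."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = math.sqrt(n)  # a|n> = sqrt(n) |n-1>
    return a

def transpose(m):
    """Transpose; equals the Hermitian conjugate here because a is real."""
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

DIM = 5                              # hypothetical cutoff on the photon number
a = annihilation(DIM)
a_dag = transpose(a)
number = matmul(a_dag, a)            # the operator a^dagger a

# Field energies in units of hbar*omega: the diagonal of a^dagger a.
energies = [number[n][n] for n in range(DIM)]
off_diagonal = max(abs(number[i][j])
                   for i in range(DIM) for j in range(DIM) if i != j)

print([round(e, 12) for e in energies])
print(off_diagonal)
```

The truncation is harmless for $a^\dagger a$ itself, but it would distort products such as $a a^\dagger$ in the highest retained level, so a larger `DIM` is advisable if the sketch is extended.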
Exercise 7.15: Plot (7.55) as a function of field detuning ϕ, for R1 = R2 = 0.9.
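As a numerical aid for this exercise, the sketch below evaluates the power ratio of (7.55), taken here to be $P_{\rm cav}/P_{\rm in} = (1-R_1)/|1 + e^{i\varphi}\sqrt{R_1 R_2}|^2$ as stated in Box 7.5, on a grid of $\varphi$ values. The function name, the grid size, and the sampling range $0 \le \varphi \le 2\pi$ are assumptions for illustration; the resulting list of points can be handed to any plotting tool.

```python
import cmath
import math

def cavity_power_ratio(phi, r1, r2):
    """P_cav / P_in from (7.55) for round-trip phase phi and reflectivities r1, r2."""
    denom = 1 + cmath.exp(1j * phi) * math.sqrt(r1 * r2)
    return (1 - r1) / abs(denom) ** 2

R = 0.9                                                # R1 = R2 = 0.9, as in the exercise
phis = [2 * math.pi * k / 200 for k in range(201)]     # assumed sampling grid
curve = [(phi, cavity_power_ratio(phi, R, R)) for phi in phis]

# Locate the resonance on the grid and compare with the 1/(1 - R) estimate.
peak_phi, peak = max(curve, key=lambda point: point[1])
print(f"peak at phi = {peak_phi:.4f}, P_cav/P_in = {peak:.4f}")
print(f"1/(1 - R) = {1 / (1 - R):.4f}")
print(f"off resonance (phi = 0): {cavity_power_ratio(0.0, R, R):.4f}")
```

With the sign convention of (7.53), the denominator is smallest when $e^{i\varphi} = -1$, so the peak on this grid falls at $\varphi = \pi$ rather than at $\varphi = 0$.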
Two-level atoms
Until this section of the chapter, we have discussed only photons, or interactions such as
the cross phase modulation between photons mediated by a semiclassical medium. Now,
let us turn our attention to atoms, their electronic structure, and their interactions with
photons. This is, of course, a very deep and well-developed field of study; we shall only
describe a small part of it that touches upon quantum computation.
The electronic energy eigenstates of an atom can be very complicated (see Box 7.6), but
for our purposes modeling an atom as having only two states is an excellent approximation.
This two-level atom approximation can be valid because we shall be concerned with the
interaction with monochromatic light and, in this case, the only relevant energy levels
are those satisfying two conditions: their energy difference matches the energy of the
incident photons, and symmetries (‘selection rules’) do not inhibit the transition. These
conditions arise from basic conservation laws for energy, angular momentum, and parity.
Energy conservation is no more than the condition that
$$ \hbar\omega = E_2 - E_1 , \tag{7.58} $$
where E2 and E1 are two eigenenergies of the atom. Angular momentum and parity
conservation requirements can be illustrated by considering the matrix element of r̂
between two orbital wavefunctions, $\langle l_1, m_1|\hat{r}|l_2, m_2\rangle$. Without loss of generality, we can
Box 7.5: The Fabry–Perot cavity
A basic component of a Fabry–Perot cavity is a partially silvered mirror, off which
incident light Ea and Eb partially reflect and partially transmit, producing the
output fields Ea′ and Eb′ . These are related by the unitary transform
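$$ \begin{pmatrix} E_{a'} \\ E_{b'} \end{pmatrix} = \begin{pmatrix} \sqrt{R} & \sqrt{1-R} \\ \sqrt{1-R} & -\sqrt{R} \end{pmatrix} \begin{pmatrix} E_a \\ E_b \end{pmatrix} , \tag{7.53} $$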
where R is the reflectivity of the mirror, and the location of the ‘−’ sign is a
convention chosen as given here for convenience.
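As a quick check that this transform is unitary, write the mirror matrix as $B$ (a label introduced only for this check). Since $B$ is real and symmetric, $B^\dagger B = B^2$, and the product can be worked out directly:

```latex
B = \begin{pmatrix} \sqrt{R} & \sqrt{1-R} \\ \sqrt{1-R} & -\sqrt{R} \end{pmatrix} ,
\qquad
B^\dagger B =
\begin{pmatrix}
R + (1-R) & \sqrt{R(1-R)} - \sqrt{R(1-R)} \\
\sqrt{R(1-R)} - \sqrt{R(1-R)} & (1-R) + R
\end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} .
```

It follows that $|E_{a'}|^2 + |E_{b'}|^2 = |E_a|^2 + |E_b|^2$, so the mirror conserves total power. The off-diagonal entries cancel only because of the relative '−' sign.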
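[Figure: a Fabry–Perot cavity formed by two mirrors, with the incident field E_in, the reflected field E_refl, the internal cavity field E_cav, and the output field E_out labeled.]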
A Fabry–Perot cavity is made from two plane parallel mirrors of reflectivities R1
and R2 , upon which light Ein is incident from the outside, as shown in the figure.
Inside the cavity, light bounces back and forth between the two mirrors, such that
the field acquires a phase shift $e^{i\varphi}$ on each round-trip; $\varphi$ is a function of the path
length and the frequency of the light. Thus, using (7.53), we find the cavity internal
field to be
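$$ E_{\rm cav} = \sum_k E_k = \frac{\sqrt{1-R_1}\, E_{\rm in}}{1 + e^{i\varphi}\sqrt{R_1 R_2}} , \tag{7.54} $$
where $E_0 = \sqrt{1-R_1}\,E_{\rm in}$, and $E_k = -e^{i\varphi}\sqrt{R_1 R_2}\,E_{k-1}$. Similarly, we find $E_{\rm out} = e^{i\varphi/2}\sqrt{1-R_2}\,E_{\rm cav}$, and $E_{\rm refl} = \sqrt{R_1}\,E_{\rm in} + \sqrt{1-R_1}\sqrt{R_2}\,e^{i\varphi}E_{\rm cav}$.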
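The closed form for the cavity field is the sum of a geometric series. Writing $q = -e^{i\varphi}\sqrt{R_1 R_2}$ (a shorthand introduced here), the recursion $E_k = q\,E_{k-1}$ gives $E_k = q^k E_0$, and because $|q| = \sqrt{R_1 R_2} < 1$ the series converges:

```latex
E_{\rm cav} = \sum_{k=0}^{\infty} q^k E_0
            = \frac{E_0}{1 - q}
            = \frac{\sqrt{1-R_1}\, E_{\rm in}}{1 + e^{i\varphi}\sqrt{R_1 R_2}} .
```

Each term in the sum is the contribution of light that has completed $k$ round-trips inside the cavity.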
One of the most important characteristics of a Fabry–Perot cavity for our purpose
is the power in the cavity internal field as a function of the input power and field
frequency,
$$ \frac{P_{\rm cav}}{P_{\rm in}} = \left| \frac{E_{\rm cav}}{E_{\rm in}} \right|^2 = \frac{1-R_1}{\left| 1 + e^{i\varphi}\sqrt{R_1 R_2} \right|^2} . \tag{7.55} $$
Two aspects are noteworthy. First, frequency selectivity is given by the fact that
ϕ = ωd/c, where d is the mirror separation, c is the speed of light, and ω is
the frequency of the field. Physically, it comes about because of constructive and
destructive interference between the cavity field and the front surface reflected
light. And second, on resonance, the cavity field achieves a maximum value which
is approximately 1/(1 − R) times the incident field. This property is invaluable for
cavity QED.
take $\hat{r}$ to be in the $\hat{x}$–$\hat{y}$ plane, such that it can be expressed in terms of spherical
harmonics (Box 7.6) as
$$ \hat{r} = \sqrt{\frac{3}{8\pi}} \left[ (-r_x + i r_y)\, Y_{1,+1} + (r_x + i r_y)\, Y_{1,-1} \right] . \tag{7.59} $$
In this basis, the relevant terms in $\langle l_1, m_1|\hat{r}|l_2, m_2\rangle$ are
$$ \int Y^*_{l_1 m_1}\, Y_{1m}\, Y_{l_2 m_2}\, d\Omega . \tag{7.60} $$
Recall that m = ±1; this integral is non-zero only when m2 −m1 = ±1 and Δl = ±1. The
first condition is the conservation of angular momentum, and the second, parity, under
the dipole approximation where $\langle l_1, m_1|\hat{r}|l_2, m_2\rangle$ becomes relevant. These conditions are
selection rules which are important in the two-level atom approximation.
Exercise 7.16: (Electric dipole selection rules) Show that (7.60) is non-zero only
when m2 − m1 = ±1 and Δl = ±1.
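A numerical sanity check of these selection rules (not a proof) can be sketched as follows. The code integrates $Y^*_{l_1 m_1} Y_{1m} Y_{l_2 m_2}$ over the sphere with a simple midpoint rule, using explicit spherical harmonics for $l \le 2$ only, and compares the non-zero results with the stated rule. The function names, the grid sizes, the restriction to $l \le 2$, and the zero threshold of $10^{-9}$ are all assumptions of this illustration.

```python
import cmath
import math

def sph_harm(l, m, theta, phi):
    """Explicit Y_lm for l <= 2 only, with the Condon-Shortley phase."""
    s, c = math.sin(theta), math.cos(theta)
    if l == 0:
        polar = 0.5 / math.sqrt(math.pi)
    elif l == 1 and m == 0:
        polar = math.sqrt(3 / (4 * math.pi)) * c
    elif l == 1:
        polar = -m * math.sqrt(3 / (8 * math.pi)) * s
    elif l == 2 and m == 0:
        polar = math.sqrt(5 / (16 * math.pi)) * (3 * c * c - 1)
    elif l == 2 and abs(m) == 1:
        polar = -m * math.sqrt(15 / (8 * math.pi)) * s * c
    elif l == 2 and abs(m) == 2:
        polar = math.sqrt(15 / (32 * math.pi)) * s * s
    else:
        raise ValueError("this sketch only covers l <= 2")
    return polar * cmath.exp(1j * m * phi)

def dipole_integral(l1, m1, m, l2, m2, n_theta=64, n_phi=8):
    """Midpoint-rule estimate of the angular integral in (7.60)."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0j
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = j * d_phi
            total += (sph_harm(l1, m1, theta, phi).conjugate()
                      * sph_harm(1, m, theta, phi)
                      * sph_harm(l2, m2, theta, phi)
                      * math.sin(theta))
    return total * d_theta * d_phi

states = [(l, m) for l in range(3) for m in range(-l, l + 1)]
violations = []
for l1, m1 in states:
    for l2, m2 in states:
        for m in (-1, 1):
            nonzero = abs(dipole_integral(l1, m1, m, l2, m2)) > 1e-9
            # Rule for a given m: m1 - m2 = m (so m2 - m1 = +-1) and |l1 - l2| = 1.
            allowed = (m1 - m2 == m) and abs(l1 - l2) == 1
            if nonzero != allowed:
                violations.append((l1, m1, m, l2, m2))

print(violations)  # an empty list means no disagreement on this grid
```

The azimuthal sum enforces the condition on $m$, while the cancellation between $\theta$ and $\pi - \theta$ enforces the parity condition; extending the check beyond $l = 2$ would require a general spherical-harmonic routine.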
In reality, light is never perfectly monochromatic; it is generated from some source
such as a laser, in which longitudinal modes, pump noise, and other sources give rise to a
finite linewidth. Also, an atom coupled to the external world never has perfectly defined
energy eigenstates; small perturbations such as nearby fluctuating electric potentials, or
even interaction with the vacuum, cause each energy level to be smeared out and become
a distribution with finite width.
Nevertheless, by choosing an atom and excitation energy carefully, and by taking
advantage of the selection rules, it is possible to arrange circumstances such that the
two-level atom approximation i