Barak Shoshany PHY 256 Lecture Notes
Barak Shoshany
1 Introduction
1.1 Course Outline
1.2 Exercises and Problems
2 Non-Technical Overview
2.1 The Failures of Classical Physics
2.1.1 Black-Body Radiation and the Ultraviolet Catastrophe
2.1.2 The Photoelectric Effect
2.1.3 The Double-Slit Experiment
2.1.4 The Stern-Gerlach Experiment
2.2 Quantum vs. Classical Mechanics
3 Mathematical Background
3.1 Complex Numbers
3.1.1 Motivation
3.1.2 Operations on Complex Numbers
3.1.3 The Complex Plane and Real 2-Vectors
3.1.4 Polar Coordinates and Complex Phases
3.2 Linear Algebra
3.2.1 Complex Vector Spaces
3.2.2 Dual Vectors, Inner Products, Norms, and Hilbert Spaces
3.2.3 Orthonormal Bases
3.2.4 Matrices and the Adjoint
3.2.5 The Outer Product
3.2.6 The Completeness Relation
3.2.7 Representing Vectors in Different Bases
3.2.8 Change of Basis
3.2.9 Multiplication and Inverse of Matrices
3.2.10 Matrices Inside Inner Products
3.2.11 Eigenvalues and Eigenvectors
3.2.12 Hermitian Matrices
3.2.13 Unitary Matrices
3.2.14 Normal Matrices
3.2.15 Representing Matrices in Different Bases
3.2.16 Diagonalizable Matrices
3.2.17 The Cauchy-Schwarz Inequality
3.3 Probability Theory
3.3.1 Random Variables and Probability Distributions
3.3.2 Conditional Probability
3.3.3 Expected Values
3.3.4 Standard Deviation
3.3.5 Normal (Gaussian) Distributions
4 The Foundations of Quantum Theory
4.1 Axiomatic Definition
4.1.1 Dimensionless and Dimensionful Constants
4.1.2 Hilbert Spaces, States, and Operators
4.1.3 Hermitian Operators and Observables
4.1.4 Probability Amplitudes
4.1.5 Superposition
4.1.6 Inner Products with Matrices, and the Expectation Value
4.1.7 Summary For Discrete Systems
4.2 Two-State Systems, Spin 1/2, and Qubits
4.2.1 The Pauli Matrices
4.2.2 Spin 1/2
4.2.3 Qubits
4.2.4 The Meaning of Superposition
4.3 Composite Systems and Quantum Entanglement
4.3.1 The Tensor Product
4.3.2 Vectors and Matrices in the Composite Hilbert Space
4.3.3 Quantum Entanglement
4.3.4 The Bell States
4.3.5 Entanglement Does Not Transmit Information
4.3.6 Bell’s Theorem and Bell’s Inequality
4.4 Non-Commuting Observables and the Uncertainty Principle
4.4.1 Commuting and Non-Commuting Observables
4.4.2 The Uncertainty Principle
4.4.3 Simultaneous Diagonalization
4.5 Dynamics, Transformations, and Measurements
4.5.1 Unitary Transformations and Evolution
4.5.2 Quantum Logic Gates
4.5.3 The Measurement Axiom (Projective)
4.5.4 Applications of the Measurement Axiom
4.5.5 The Measurement Axiom (Simplified)
4.5.6 Interpretations of Quantum Mechanics and the Measurement Problem
4.5.7 Superposition Once Again: Schrödinger’s Cat
4.6 The No-Cloning Theorem and Quantum Teleportation
4.6.1 The No-Cloning Theorem
4.6.2 Quantum Teleportation
4.7 The Foundations of Quantum Theory: Summary
5 Continuous Quantum Systems
5.1 Mathematical Preliminaries
5.1.1 Exponentials and Logarithms
5.1.2 Matrix and Operator Exponentials
5.2 Continuous Time Evolution, Hamiltonians, and the Schrödinger Equation
5.2.1 The Schrödinger Equation and Hamiltonians: Preface
5.2.2 Derivation of the Schrödinger Equation
5.2.3 Time-Independent Hamiltonians
5.2.4 Hamiltonians and Energy
5.3 Hamiltonian Mechanics and Canonical Quantization
5.3.1 A Quick Review of Classical Hamiltonian Mechanics
5.3.2 Canonical Quantization
5.4 The Harmonic Oscillator
5.4.1 The Classical Harmonic Oscillator
5.4.2 Quantizing the Harmonic Oscillator
5.4.3 The Energy Eigenstates of the Harmonic Oscillator
5.5 Wavefunctions, Position, and Momentum
5.5.1 The Position Operator
5.5.2 Wavefunctions in the Position Basis
5.5.3 The Momentum Operator
5.5.4 Quantum Interference
5.6 Solutions of the Schrödinger Equation
5.6.1 The Schrödinger Equation for a Particle
5.6.2 Separation of Variables
List of Figures
2.1 The electromagnetic spectrum of a black body
2.2 The photoelectric effect
2.3 Light waves in the double-slit experiment
2.4 Interference of two light waves
2.5 Electron interference pattern in the double-slit experiment
2.6 The Stern-Gerlach experiment
2.7 The uncertainty principle
3.1 The complex plane
3.2 The normal distribution
3.3 Probability distribution for one roll of a 6-sided die
3.4 Probability distribution for the sum of two rolls of a 6-sided die
3.5 Probability distribution for the sum of three rolls of a 6-sided die
4.1 A qubit in a superposition of |0⟩ and |1⟩
4.2 Schrödinger’s Cat
1 Introduction
such as Hilbert spaces, states, operators, observables, superposition, probability
amplitudes, and expectation values.
Then, we will begin studying simple discrete quantum systems known as qubits,
which are the quantum analogue of bits, and are used in quantum computers. We
will learn about Schrödinger’s cat, quantum entanglement, Bell’s theorem, the
uncertainty principle, unitary evolution, quantum measurements, and quantum
In the remainder of the course we will study continuous quantum systems and
related concepts, including Hamiltonians, the Schrödinger equation, canonical
quantization, the quantum harmonic oscillator, wavefunctions, quantum interference,
and solutions to the time-independent Schrödinger equation, including scattering
and tunneling in one dimension.
By the end of the course, the students should expect to have a fairly good
understanding of quantum mechanics, and to develop an intuition for this very strange
and unintuitive theory. They will also be adequately prepared to dive deeper into
the subject, whether by taking more advanced courses or by doing research.
Throughout these notes, you will find many exercises and problems.
• Exercises are usually just calculations. They are meant to verify that you
understand how to calculate things, and they are usually simple and straightforward.
• Problems are usually proof-based. They are meant to verify that you
understand the more abstract relations between the concepts we will introduce,
and they often require some thought.
2 Non-Technical Overview
should also convince you that your classical intuition must be replaced with
quantum intuition, which is what we will try to develop in this course.
[Figure 2.1: The electromagnetic spectrum of a black body. Spectral radiance
(kW · sr⁻¹ · m⁻² · nm⁻¹) versus wavelength (μm), with curves at 3000 K, 4000 K,
and 5000 K, and the classical theory curve at 5000 K.]
A black body is an object that absorbs all incoming light at all frequencies. It
absorbs it and does not reflect it – therefore, it is black. More generally, it absorbs
not just light, but all electromagnetic radiation. Black bodies also emit radiation,
due to their heat. Electromagnetic radiation has a spectrum of wavelengths of
different lengths. We are interested in predicting the amount of radiation emitted
by the black body at each wavelength, which we will refer to as the black body’s
spectrum.
One can try to use classical physics to calculate this spectrum. It turns out that the
amount of the radiation is inversely proportional to the wavelength¹. This means
that as the wavelength approaches zero, the amount of radiation approaches
infinity! This is illustrated by the black curve in figure 2.1. This result is called
the ultraviolet catastrophe, since ultraviolet light has shorter wavelengths than
visible light. Obviously, this does not fit well with experimental data, since when
we measure the total radiation emitted from a black body, we most definitely do
not measure it to be infinity!
To solve this problem, we must use quantum physics. If we assume that radiation
can only be emitted in discrete “packets” of energy called quanta, we get the
correct spectrum of radiation, which is compatible with experiment. The law
describing the amount of radiation at each wavelength is called Planck’s law. In
figure 2.1, we can see three different curves, calculated using Planck’s law, giving
the radiation spectrum at different temperatures (in Kelvin). You can see that the
total amount of radiation is no longer infinite. The quanta of electromagnetic
radiation are called photons.
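To make the comparison concrete, here is a minimal Python sketch of the two spectra. It assumes the standard textbook forms of the classical (Rayleigh-Jeans) law and of Planck's law for the spectral radiance; neither formula is written out in these notes, and the function names are ours. The last factor converts from SI units to the units used on the axis of figure 2.1.

```python
import math

# Physical constants in SI units.
H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def rayleigh_jeans(wavelength, temperature):
    """Classical spectral radiance in W / (sr * m^2 * m); grows like 1/wavelength^4."""
    return 2.0 * C * K_B * temperature / wavelength**4


def planck(wavelength, temperature):
    """Planck's law for the spectral radiance, in the same units."""
    x = H * C / (wavelength * K_B * temperature)
    return 2.0 * H * C**2 / wavelength**5 / math.expm1(x)


TO_FIGURE_UNITS = 1e-12  # W/(sr*m^2*m) -> kW/(sr*m^2*nm)
T = 5000.0               # temperature in kelvin

for microns in (0.1, 0.5, 1.0, 3.0):
    lam = microns * 1e-6  # wavelength in metres
    classical = rayleigh_jeans(lam, T) * TO_FIGURE_UNITS
    quantum = planck(lam, T) * TO_FIGURE_UNITS
    print(f"{microns:4.1f} um   classical = {classical:10.3e}   Planck = {quantum:10.3e}")
```

At long wavelengths the two expressions nearly agree, while at short wavelengths the classical value keeps growing and the Planck value falls off.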
When light hits a material, it causes the material to emit electrons. This
phenomenon is called the photoelectric effect. Using classical physics, and the
assumption that light is a wave, we can make the following predictions:
• Brighter light should have more energy, so it should cause the emitted
electrons to have more kinetic energy, and thus move faster.
• Light with higher frequency should hit the material more often, so it should
cause a higher rate of electron emission, resulting in a larger electric current.
[Footnote 1: More precisely, the power emitted per unit area per unit solid angle per
unit wavelength is proportional to 1/𝜆⁴, where 𝜆 is the wavelength... But fortunately,
we don’t need to be very precise]
• Assuming there is a certain minimum energy needed to dislodge an electron
from the material, sufficiently bright light of any frequency should cause
electron emission.
• The kinetic energy of the emitted electrons increases with frequency, not
• Electrons are emitted only when the frequency of the light exceeds a certain
threshold, regardless of how bright it is.
This is illustrated in figure 2.2, where the red light does not cause any electrons
to be emitted, but the green and blue lights do, since they have higher frequency.
Furthermore, since the blue light has higher frequency than the green light, the
kinetic energy of the emitted electrons is larger.
To explain this, we must again use quantum physics. Einstein proposed to use
the same model that Planck suggested to solve the ultraviolet catastrophe, where
light is made of discrete photons. Each photon has energy proportional to the
frequency of the light, and brighter light of the same frequency simply has more
photons, each photon still with the same amount of energy. This model fits the
experimental observations perfectly.
So in figure 2.2, making the red light brighter will increase the number of photons,
but no matter how bright it is, the individual photons it’s made of still do not have
enough energy to dislodge an electron on their own. On the other hand, each
individual photon of the green and blue lights has, on its own, enough energy
to dislodge an electron, and even if the light is very dim, the electrons will still be
emitted.
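The photon picture can be sketched numerically. The snippet below assumes Einstein's standard relation, in which an emitted electron carries at most the photon energy minus the minimum energy needed to dislodge it (the work function). The work function of 2.0 eV and the three frequencies are hypothetical values chosen only for illustration; they are not taken from figure 2.2.

```python
H = 6.62607015e-34     # Planck constant, J*s
EV = 1.602176634e-19   # one electronvolt, in joules


def max_kinetic_energy_ev(frequency_hz, work_function_ev):
    """Maximum kinetic energy (in eV) of an emitted electron, or None if none is emitted."""
    photon_energy_ev = H * frequency_hz / EV
    if photon_energy_ev <= work_function_ev:
        return None  # a single photon cannot dislodge an electron
    return photon_energy_ev - work_function_ev


WORK_FUNCTION_EV = 2.0  # hypothetical material

for name, frequency in (("red", 4.3e14), ("green", 5.5e14), ("blue", 6.7e14)):
    print(name, max_kinetic_energy_ev(frequency, WORK_FUNCTION_EV))
```

Brightness does not appear anywhere in the function: in this model it only changes how many photons arrive, not what each one can do.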
The previous two experiments may have convinced you that light is not a wave,
but a particle. But is that really the case? The double-slit experiment shows that
things are actually more complicated. In this experiment, a light beam hits a
plate with two parallel slits. Most of the light is blocked by the plate, but some
of it passes through the slits and hits a screen, creating a pattern of bright and
dark bands.
[Figure 2.5: An interference pattern created by electrons in the double-slit
experiment. Each image (from top to bottom) corresponds to a later point in time,
after more electrons have accumulated.]
This can be most naturally explained by assuming that light is not a particle, but
a wave. Each of the slits becomes the origin of a new wave, as illustrated in
figure 2.3. Each of the two waves has crests and troughs. When a crest of one
wave is at the same place as a crest of the other wave, they add up to create
a crest with double the magnitude. This is called constructive interference. On
the other hand, if a crest of one wave is at the same place as a trough of the
other wave, they cancel each other. This is called destructive interference. See
figure 2.4 for an illustration. The pattern on the screen, as seen in figure 2.3, is
a consequence of this interference.
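As a small numerical illustration of constructive and destructive interference, the sketch below adds two sine waves of amplitude 1, once with their crests aligned and once shifted by half a cycle. The sine-wave model and the function name are our own simplification, not a description of the actual experiment.

```python
import math


def superpose(phase_shift, samples=8):
    """Sum two unit-amplitude sine waves that differ by phase_shift (in radians)."""
    total = []
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        total.append(math.sin(t) + math.sin(t + phase_shift))
    return total


in_phase = superpose(0.0)          # crest meets crest
out_of_phase = superpose(math.pi)  # crest meets trough

print(max(abs(v) for v in in_phase))      # close to 2: constructive
print(max(abs(v) for v in out_of_phase))  # close to 0: destructive
```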
So the double-slit experiment seems to prove that light is a wave, in
contradiction with black-body radiation and the photoelectric effect, which seem to prove
that light is a particle. It turns out that, in fact, both are correct; the quantum
nature of light has the consequence that it sometimes behaves like a classical
wave, and other times like a classical particle. This is called wave-particle duality.
Contrary to common misconception, this doesn’t mean that light is “both a wave
and a particle”; it simply demonstrates that the classical concepts of “wave” and
“particle” are not the proper way to describe reality.
Okay, so light exhibits wave-particle duality. Maybe this makes sense. But matter,
which is a tangible thing you can touch, is definitely made of particles, right? To
check that, we can replace the beam of light with a beam of electrons. Since
we think electrons are particles, not waves, we expect to find on the screen not
an interference pattern, but just individual dots corresponding to the individual
electron particles. And this is indeed what happens, except... If we run the
experiment for some time, and let the electrons build up, then after a while we
see that an interference pattern emerges nonetheless! This is shown in figure 2.5.
What does this mean? It means that, in quantum physics, both light and matter
exhibit wave-particle duality. In classical physics, the measurement of the
position of the electron on the screen is deterministic; if we know the initial position
and velocity of the electron, then we can predict exactly where the electron lands.
In quantum physics, we instead have a probability distribution, which gives us
the probability for the electron to be measured at each particular point on the
screen. This probability distribution turns out to propagate in space like a wave,
and interfere with itself constructively and destructively on the way as a wave
does, which is what causes the interference pattern on the screen – it is actually
a pattern of probabilities! In the end, the probability is increased at some
points of the screen and reduced at others.
To clarify how the measurement of the positions of the electrons on the screen
yields a probability distribution, consider instead a 6-sided die. If you roll the
die just once or twice, you won’t have much information about the probabilities
to roll each number on the die. This is analogous to sending just a couple of
electrons through the slits. What you need to do is to roll the die a large number
of times, let’s say 6,000 times. Then you count how many times the die rolled
on each number. For example, if it rolled around 1,000 times on each number,
then you know the die is fair; but if it rolled around 2,000 times on 6 and around
800 times on every other number, then you know the die is loaded. Similarly, we
need to send a large number of electrons through the slits in order to determine
the probability distribution for their positions on the screen. It turns out that the
position of the electron is “loaded”!
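The die analogy is easy to try out. The following sketch rolls a simulated die 6,000 times, once fair and once loaded so that 6 comes up about a third of the time, matching the numbers in the example above. The seed and the function name are arbitrary choices of ours.

```python
import random
from collections import Counter


def roll_counts(weights, rolls, seed=0):
    """Roll a 6-sided die `rolls` times with the given face weights and count the outcomes."""
    rng = random.Random(seed)  # fixed seed, so the run is reproducible
    faces = [1, 2, 3, 4, 5, 6]
    return Counter(rng.choices(faces, weights=weights, k=rolls))


fair = roll_counts([1, 1, 1, 1, 1, 1], 6000)
loaded = roll_counts([2, 2, 2, 2, 2, 5], 6000)  # 6 has probability 5/15 = 1/3

for face in range(1, 7):
    print(face, fair[face], loaded[face])
```

With only a handful of rolls the two dice are hard to tell apart; the difference becomes obvious only once the counts are large.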
As an aside, in 21st century terms, the precise answer to the question “is light
a wave or a particle?” turns out to be that both of them are different aspects of
the same fundamental entity called the quantum electromagnetic field. This field
propagates from place to place like a wave, but on the other hand, if you put
enough energy into it, you can cause a quantum excitation in the field. It is this
excitation that behaves like a particle.
Moreover, it turns out that all elementary particles are quantum fields, and thus
all of them exhibit these two aspects. This is called quantum field theory. It
neatly unites quantum mechanics with special relativity, and explains elementary
particle physics with amazing accuracy – it is actually the most accurate theory in
all of science! In this course we will focus on non-relativistic quantum mechanics,
which is to quantum field theory as Newtonian physics is to special relativity.
Quantum field theory is much more complicated, and is usually only taught at
the graduate-school level.
[Figure 2.6: The Stern-Gerlach experiment.]
– into discrete spin. This seems to be a general property of most, but not all,
quantum systems: something that in classical physics was continuous turns out
to actually be discrete in quantum physics.
Finally, let me just mention that one can use spin to create qubits, or “quantum
bits”, where “spin up” represents a value of 0 and “spin down” represents a value
of 1. Because spin is a quantum quantity, it satisfies all of the weird properties
of quantum mechanics that we will discuss later. By taking advantage of these
quantum properties, we can potentially do calculations faster with a quantum
computer that uses qubits compared to a classical computer that uses classical
bits.
Figure 2.7: The uncertainty principle.
Similarly, we saw that angular momentum, which is continuous in the
classical theory, is replaced by discrete spin in the quantum theory.
3 Mathematical Background
Complex numbers are at the very core of the mathematical formulation of
quantum theory. In this section we will give a review of complex numbers and present
some definitions and results that will be used throughout the course.
3.1.1 Motivation
In real life, we only encounter real numbers. These numbers form a field, that
is, a set of elements with well-defined operations of addition, subtraction,
multiplication, and division. This field is denoted ℝ. Geometrically, we can imagine ℝ
as a 1-dimensional line, stretching from −∞ to +∞.
Unfortunately, it turns out that the field of real numbers has a serious flaw. One
can write down completely reasonable-looking quadratic equations, with only real
coefficients, which nonetheless have no solutions in ℝ. Consider the most general
quadratic equation:
$$a x^2 + b x + c = 0, \qquad a, b, c \in \mathbb{R}. \tag{3.1}$$
One can easily prove (by completing the square) that there are two potential
solutions, given by
$$x_\pm \equiv \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{3.2}$$
The number (and existence) of real solutions is thus determined by the sign of
the expression inside the square root, called the discriminant Δ ≡ 𝑏² − 4𝑎𝑐:
$$\begin{cases}
\Delta > 0: & \text{two real roots } x_\pm = \dfrac{-b \pm \sqrt{\Delta}}{2a}, \\
\Delta = 0: & \text{one real root } x = -\dfrac{b}{2a}, \\
\Delta < 0: & \text{no real roots.}
\end{cases} \tag{3.4}$$
The new field created by extending ℝ with i is the field of complex numbers,
denoted ℂ. A general complex number is written
$$z = a + \mathrm{i}\, b, \qquad z \in \mathbb{C}, \quad a, b \in \mathbb{R}, \tag{3.6}$$
where 𝑎 is called the real part and 𝑏 is called the imaginary part, both real
numbers.
Now, in the quadratic equation, having √Δ with a negative Δ is no longer a
problem, since the number i √(−Δ) squares to Δ:
$$\left( \mathrm{i} \sqrt{-\Delta} \right)^2 = \mathrm{i}^2 (-\Delta) = (-1)(-\Delta) = \Delta. \tag{3.7}$$
[Footnote: We use non-italic font exclusively for i in order to distinguish it from 𝑖,
which will be used for labels and variables. Of course, it is usually a wise idea not
to have both i and 𝑖 in the same equation in the first place, but sometimes that is
unavoidable.]
Therefore, we conclude that every quadratic equation has a solution in the field
of complex numbers⁴:
$$\begin{cases}
\Delta > 0: & \text{two real roots } x_\pm = \dfrac{-b \pm \sqrt{\Delta}}{2a}, \\
\Delta = 0: & \text{one real root } x = -\dfrac{b}{2a}, \\
\Delta < 0: & \text{two complex roots } x_\pm = -\dfrac{b}{2a} \pm \mathrm{i}\, \dfrac{\sqrt{-\Delta}}{2a}.
\end{cases} \tag{3.8}$$
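The three cases can be checked with a few lines of Python. This is only a sketch: it relies on the standard library function `cmath.sqrt`, which returns i √(−Δ) for negative Δ, and the example coefficients are our own.

```python
import cmath


def quadratic_roots(a, b, c):
    """Both roots of a*x**2 + b*x + c = 0 for real a, b, c with a != 0."""
    delta = b * b - 4 * a * c        # the discriminant
    sqrt_delta = cmath.sqrt(delta)   # equals i*sqrt(-delta) when delta < 0
    return (-b + sqrt_delta) / (2 * a), (-b - sqrt_delta) / (2 * a)


print(quadratic_roots(1, -3, 2))  # delta > 0: two real roots
print(quadratic_roots(1, 2, 1))   # delta = 0: one repeated real root
print(quadratic_roots(1, 2, 5))   # delta < 0: two complex roots
```

For the last equation, Δ = 4 − 20 = −16, so the roots are −1 ± 2 i. Real roots are still returned as complex objects, with zero imaginary part.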
$$x^2 - 6x + 25 = 0. \tag{3.9}$$
[Footnote 4: Note that real numbers are a special case of complex numbers, so the two
real roots are also two complex roots.]
[Footnote: Again, real numbers are a special case of complex numbers, so the
coefficients can be all real.]
[Footnote: Or equivalently, it has exactly 𝑛 not necessarily unique complex roots,
accounting for possible degeneracy/multiplicity. For example, for Δ = 0 the quadratic
equation has two degenerate roots, or one root of multiplicity 2.]
[Footnote: Sometimes also called purely imaginary numbers.]
Complex numbers can be added and multiplied with other complex numbers.
There is really nothing special about these operations, except that it is customary
to group the imaginary parts (i.e. anything that is a multiple of i) together and
turn i² into −1 in the final result:
$$(a + \mathrm{i}\, b) + (c + \mathrm{i}\, d) = (a + c) + \mathrm{i}\, (b + d), \tag{3.10}$$
$$(a + \mathrm{i}\, b)(c + \mathrm{i}\, d) = (ac - bd) + \mathrm{i}\, (ad + bc). \tag{3.11}$$
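To see the grouping rule at work, the sketch below stores a complex number as a pair (real part, imaginary part) and implements addition and multiplication by hand. The multiplication formula is the one obtained by expanding the brackets and replacing i² with −1. The pair representation and the function names are ours; the last line cross-checks against Python's built-in `complex` type.

```python
def add(z, w):
    """Add two complex numbers given as (real, imaginary) pairs."""
    (a, b), (c, d) = z, w
    return (a + c, b + d)


def multiply(z, w):
    """Multiply (a + ib)(c + id), expanding the brackets and replacing i*i with -1."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


z, w = (1, 2), (3, -1)
print(add(z, w))       # (4, 1)
print(multiply(z, w))  # (5, 5)

# Cross-check against Python's built-in complex type.
assert complex(*multiply(z, w)) == complex(*z) * complex(*w)
```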
Next, note that the two solutions to a quadratic equation with Δ < 0 are the same,
up to the sign of i. That is, if we replace i with − i in one of the solutions, we get
the other solution. Such numbers are called complex conjugates, and the process
of replacing i with − i is called complex conjugation. The complex conjugate of 𝑧
is denoted 𝑧∗ :
$$z = a + \mathrm{i}\, b \quad \Longrightarrow \quad z^* = a - \mathrm{i}\, b. \tag{3.12}$$
This means that the complex conjugation operation is an involution, that is, its
own inverse.
Complex conjugation allows us to write a general formula for the real or imaginary
parts of a complex number, denoted Re 𝑧 and Im 𝑧 respectively:
$$\operatorname{Re} z \equiv \frac{z + z^*}{2}, \qquad \operatorname{Im} z \equiv \frac{z - z^*}{2 \mathrm{i}}. \tag{3.14}$$
You can check that if 𝑧 = 𝑎 + i 𝑏 then we get Re 𝑧 = 𝑎 and Im 𝑧 = 𝑏, as expected.
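The same check can be done numerically. This sketch uses Python's built-in `complex` type and its `conjugate()` method; the helper names and the sample number are our own.

```python
def real_part(z):
    """Re z = (z + z*) / 2."""
    return (z + z.conjugate()) / 2


def imag_part(z):
    """Im z = (z - z*) / (2i)."""
    return (z - z.conjugate()) / 2j


z = 3 + 2j
print(z.conjugate())               # (3-2j)
print(real_part(z), imag_part(z))  # complex objects with zero imaginary part
print(z.conjugate().conjugate() == z)
```

Note that both helpers return complex objects; the built-in attributes `z.real` and `z.imag` give the same values as plain floats.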
Exercise 3.3. What are the real and imaginary parts of 4 − 7 i? What is its complex
conjugate?
Problem 3.4. If a number is the complex conjugate of itself, can you say anything
interesting about that number? What about if a number is minus the complex
conjugate of itself?
Recall that the field of real numbers ℝ is geometrically a line. The space ℝⁿ is
an 𝑛-dimensional space which is home to real 𝑛-vectors, that is, ordered lists of
𝑛 real numbers of the form (𝑣₁, …, 𝑣ₙ). In particular, ℝ² is geometrically a plane,
with vectors of the form (𝑥, 𝑦).
The complex plane ℂ is similar to ℝ², except that instead of the 𝑥 and 𝑦 axes we
have the real and imaginary axes respectively. The real unit 1, which squares
to +1, defines the positive direction of the real axis, while the imaginary unit i,
which squares to −1, defines the positive direction of the imaginary axis. This is
illustrated in figure 3.1.
[Figure 3.1: The complex plane, showing a complex number 𝑧 = 𝑎 + i 𝑏, its
magnitude 𝑟 = |𝑧|, its phase 𝜙, and its complex conjugate 𝑧∗ = 𝑎 − i 𝑏.]
Since ℂ is a plane, we can define vectors on it, just like on ℝ². A real 2-vector
(𝑎, 𝑏) is an arrow in ℝ² which points from the origin (0, 0) to the point that is 𝑎 steps
in the direction of the 𝑥 axis and 𝑏 steps in the direction of the 𝑦 axis. A complex
number 𝑧 = 𝑎 + i 𝑏 is similarly an arrow in ℂ which points from the origin 0 to the
point that is 𝑎 steps along the real axis and 𝑏 steps along the imaginary axis.
The complex conjugate 𝑧∗ = 𝑎 − i 𝑏 is obtained by replacing i with − i. Since i defines
the direction of the imaginary axis, this is equivalent to flipping the imaginary axis.
In other words, 𝑧∗ is the reflection of 𝑧 across the real axis, as shown in figure 3.1.
From the Pythagorean theorem, we know that the magnitude (or length) of the
real 2-vector (𝑎, 𝑏) is √(𝑎² + 𝑏²). The magnitude or absolute value |𝑧| of the complex
number 𝑧 = 𝑎 + i 𝑏 is also √(𝑎² + 𝑏²). (Inspect figure 3.1 to see how the Pythagorean
theorem fits in.) Furthermore, since 𝑧∗ is just a reflection of 𝑧, they both have
the same magnitude. A convenient way to calculate the magnitude of either 𝑧 or
𝑧∗ is to multiply them with each other:
$$|z|^2 = |z^*|^2 \equiv z^* z = (a + \mathrm{i}\, b)(a - \mathrm{i}\, b) = a^2 - \mathrm{i}^2 b^2 = a^2 + b^2, \tag{3.15}$$
$$|z| = |z^*| = \sqrt{a^2 + b^2}. \tag{3.16}$$
For an abstract complex number (where we don’t necessarily know the explicit
values of the real and imaginary parts) one can also write
$$|z| = |z^*| = \sqrt{(\operatorname{Re} z)^2 + (\operatorname{Im} z)^2}. \tag{3.17}$$
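As a quick numerical check of the magnitude formulas, the sketch below multiplies a sample complex number (our own choice) by its conjugate and compares the result with Python's built-in `abs`.

```python
import math

z = 3 - 4j
product = z.conjugate() * z        # z* z, a complex object with zero imaginary part
print(product)                     # (25+0j)
print(math.sqrt(product.real))     # 5.0
print(abs(z), abs(z.conjugate()))  # 5.0 5.0

# The same magnitude from the real and imaginary parts.
print(math.sqrt(z.real**2 + z.imag**2))
```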
𝑎 + i 𝑏 ⟷ (𝑎, 𝑏) . (3.18)
We have already seen that the norm operation is preserved. Similarly, addition
of complex numbers
(𝑎 + i 𝑏) + (𝑐 + i 𝑑) = (𝑎 + 𝑐) + i (𝑏 + 𝑑) . (3.19)
Problem 3.6. Show that multiplication of a vector by a real number and
reflection of a vector with respect to the 𝑥 and 𝑦 axes map to equivalent operations on
the corresponding complex numbers.
3.1.4 Polar Coordinates and Complex Phases
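$$\phi = \begin{cases}
\arctan\left(\frac{y}{x}\right) & \text{if } x > 0, \\
\arctan\left(\frac{y}{x}\right) + \pi & \text{if } x < 0 \text{ and } y \ge 0, \\
\arctan\left(\frac{y}{x}\right) - \pi & \text{if } x < 0 \text{ and } y < 0, \\
+\frac{\pi}{2} & \text{if } x = 0 \text{ and } y > 0, \\
-\frac{\pi}{2} & \text{if } x = 0 \text{ and } y < 0, \\
\text{undefined} & \text{if } x = 0 \text{ and } y = 0.
\end{cases} \tag{3.23}$$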
This function is sometimes called atan2, and it is implemented in most
programming languages, usually with the arguments in the order atan2 (𝑦, 𝑥). Note
that 𝜙 is undefined at the origin since a vector of
length zero does not point in any direction.
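The case analysis in equation (3.23) translates directly into code. The sketch below compares our own function `phase` (a name chosen here for illustration) with Python's `math.atan2` on a few sample points.

```python
import math


def phase(x, y):
    """Polar angle of the point (x, y), following the cases of equation (3.23)."""
    if x > 0:
        return math.atan(y / x)
    if x < 0 and y >= 0:
        return math.atan(y / x) + math.pi
    if x < 0 and y < 0:
        return math.atan(y / x) - math.pi
    if y > 0:
        return math.pi / 2
    if y < 0:
        return -math.pi / 2
    raise ValueError("the angle is undefined at the origin")


for x, y in [(1, 1), (-1, 1), (-1, -1), (0, 2), (0, -2), (-3, 0)]:
    # Note the argument order: math.atan2 takes y first, then x.
    assert math.isclose(phase(x, y), math.atan2(y, x))
    print((x, y), phase(x, y))
```

One difference worth knowing: `math.atan2(0, 0)` returns 0.0 instead of signalling that the angle is undefined, whereas the sketch above raises an error at the origin.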
Given that complex numbers are isomorphic to real 2-vectors, we should be able
to write complex numbers in polar coordinates as well. Looking at equation (3.21),
and replacing 𝑥 and 𝑦 with 𝑎 and 𝑏, we see that
This is illustrated in figure 3.1. In this context, the angle 𝜙 is called the complex
phase. It is of extreme importance in quantum mechanics, as we shall see.
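Here is a minimal sketch of the conversion between the two descriptions, assuming the usual polar form 𝑧 = 𝑟 e^(i 𝜙) with 𝑟 = |𝑧|. It uses the standard library functions `cmath.polar` and `cmath.exp`; the sample number is our own.

```python
import cmath
import math

z = 1 + 1j
r, phi = cmath.polar(z)  # magnitude and complex phase
print(r, phi)            # about 1.4142 and 0.7854 (that is, pi/4)

rebuilt = r * cmath.exp(1j * phi)  # back from polar form to a + ib
assert cmath.isclose(rebuilt, z)
assert math.isclose(phi, math.atan2(z.imag, z.real))
```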
Exercise 3.7. Write 2 i −3 in polar coordinates.
Problem 3.8. Prove, using Euler’s formula, that |e^(i 𝜙)| = 1, that is, the magnitude
of the complex number e^(i 𝜙) is 1. If 𝑧 = 𝑟 e^(i 𝜙), what is |𝑧|?
Problem 3.9. Prove Euler’s formula. (You may need to use some calculus.)
|Ψ⟩ ≡ ( ). (3.26)
1. Closed – the sum of two vectors is another vector in the same space:
2. Commutative – the order of vectors doesn’t matter:
3. Associative – if three vectors are added, it doesn’t matter which two are
added first:
∀ |Ψ⟩ , |Φ⟩ , |Θ⟩ ∈ 𝒱 ∶ ( |Ψ⟩ + |Φ⟩) + |Θ⟩ = |Ψ⟩ + ( |Φ⟩ + |Θ⟩) . (3.30)
5. Inverse vector – for every vector there exists another (unique) vector such
that the two vectors sum to the zero vector:
1. Closed – the product of a vector and a scalar is a vector in the same space:
5. Identity scalar or unit scalar – there is a (unique) scalar 1 which, when
multiplied by any vector, does not change it:
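$$|\Psi\rangle \equiv \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} \in \mathbb{C}^2, \quad |\Phi\rangle \equiv \begin{pmatrix} \Phi_1 \\ \Phi_2 \end{pmatrix} \in \mathbb{C}^2 \quad \Longrightarrow \quad |\Psi\rangle + |\Phi\rangle = \begin{pmatrix} \Psi_1 + \Phi_1 \\ \Psi_2 + \Phi_2 \end{pmatrix}, \tag{3.38}$$
$$|\Psi\rangle \equiv \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} \in \mathbb{C}^2, \quad \lambda \in \mathbb{C} \quad \Longrightarrow \quad \lambda |\Psi\rangle = \begin{pmatrix} \lambda \Psi_1 \\ \lambda \Psi_2 \end{pmatrix}. \tag{3.39}$$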
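Equations (3.38) and (3.39) say that both operations act component by component. As a sketch, we can represent a vector in ℂ² as a plain Python list of complex numbers; the representation, the function names, and the sample vectors are all our own choices.

```python
def add_vectors(psi, phi):
    """Componentwise sum of two complex vectors of equal dimension."""
    if len(psi) != len(phi):
        raise ValueError("vectors must have the same dimension")
    return [p + q for p, q in zip(psi, phi)]


def scale(lam, psi):
    """Multiply every component of psi by the complex scalar lam."""
    return [lam * p for p in psi]


psi = [1 + 2j, 3j]
phi = [2 - 1j, 1 + 0j]
print(add_vectors(psi, phi))  # [(3+1j), (1+3j)]
print(scale(1j, psi))         # [(-2+1j), (-3+0j)]
```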
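$$|\Psi\rangle \equiv \begin{pmatrix} 3 + \mathrm{i} \\ -9 \end{pmatrix}, \quad |\Phi\rangle \equiv \begin{pmatrix} \mathrm{i} - 1 \\ -10\, \mathrm{i} \end{pmatrix}, \quad \alpha = 7\, \mathrm{i} - 2, \quad \beta = -4 - 8\, \mathrm{i}. \tag{3.40}$$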
A dual vector is defined by writing the vector as a row instead of a column, and
replacing each component with its complex conjugate. We denote the dual vector
of |Ψ⟩ as follows:
$$\langle\Psi| = \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix}. \tag{3.41}$$
In terms of notation, there is now an opposite angle bracket ⟨ on the left of the
label, and the straight line | is on the right. Addition and multiplication by a scalar
are defined as for vectors, simply replacing columns with rows. However, you
may not add vectors and dual vectors together – adding a row to a column is
If we are given a dual vector, we can take its dual to get a “normal” (column)
vector. In this case, the operation of taking the dual involves writing the vector as
a column instead of a row and taking the complex conjugates of the components.
This means that the operation of taking the dual is an involution – taking the dual
of a vector twice gives back the same vector, since (𝑧∗)∗ = 𝑧.
Using dual vectors, we may define the inner product. This product allows us to
take a vector and a dual vector and produce a (complex) number out of them,
similarly to the dot product of real vectors¹⁰. Importantly, the inner product only
works for one vector and one dual vector, not for two vectors or two dual vectors.
To calculate it, we multiply the components of both vectors one by one and add
them up:
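$$\langle\Psi|\Phi\rangle = \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix} \begin{pmatrix} \Phi_1 \\ \Phi_2 \end{pmatrix} = \Psi_1^* \Phi_1 + \Psi_2^* \Phi_2. \tag{3.42}$$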
In bra-ket notation, vectors |Ψ⟩ are called “kets” and dual vectors ⟨Ψ| are called
“bras”. Then the notation for ⟨Ψ|Φ⟩ is called a “bra(c)ket”.
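We define the norm-squared of a vector by taking its inner product with its dual
(“squaring” it):
$$\|\Psi\|^2 \equiv \langle\Psi|\Psi\rangle = \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix} \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} = |\Psi_1|^2 + |\Psi_2|^2, \tag{3.43}$$
$$\|\Psi\| \equiv \sqrt{\|\Psi\|^2} = \sqrt{\langle\Psi|\Psi\rangle}. \tag{3.44}$$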
Observe how taking the dual of a vector generalizes taking the complex conjugate
of a number, and taking the norm of a vector generalizes taking the magnitude
of a number; indeed, for 1-dimensional vectors, these operations are the same!
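The dual, the inner product, and the norm can be sketched in a few lines of Python. Vectors are again stored as plain lists of complex numbers, so the distinction between rows and columns is not modelled here; the function names and the sample vectors are our own.

```python
import math


def dual(psi):
    """The dual (bra) of a ket: take the complex conjugate of every component."""
    return [p.conjugate() for p in psi]


def inner(psi, phi):
    """The inner product <psi|phi>: components of the dual of psi times those of phi, summed."""
    return sum(p * q for p, q in zip(dual(psi), phi))


def norm(psi):
    """The norm of psi, the square root of <psi|psi>."""
    return math.sqrt(inner(psi, psi).real)


psi = [1 + 1j, 2j]
phi = [3 + 0j, 1 - 1j]
print(inner(psi, phi))  # (1-5j)
print(norm(psi))        # sqrt(6), about 2.449
```

Here ⟨Ψ|Ψ⟩ = |1 + i|² + |2 i|² = 2 + 4 = 6, which is real and non-negative, so taking the square root is safe.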
A vector space with an inner product is called a Hilbert space, provided it is also a
complete metric space¹¹ and that the inner product satisfies the same properties
(which you will derive in problems 3.13, 3.14, and 3.15) as the standard inner
product on ℂⁿ. In particular, ℂⁿ itself is a Hilbert space, but there are many
other Hilbert spaces, some of them much more abstract. The usual notation for
a general Hilbert space is ℋ.
[Footnote 10: The dot product of the real vectors v ≡ (𝑣₁, 𝑣₂) and w ≡ (𝑤₁, 𝑤₂) in ℝ² is
defined as v ⋅ w ≡ 𝑣₁𝑤₁ + 𝑣₂𝑤₂. In principle, this definition does secretly involve a dual
(row) vector and a (column) vector, but since we do not need to take the complex
conjugate, we don’t really need to worry about dual vectors. However, it is important
to note that in real vector spaces with curvature, such as those used in general
relativity, the dot product must be replaced with a more complicated inner product
which involves the metric, and it again becomes crucial to distinguish vectors from
dual vectors – which in this context are also called contravariant and covariant
vectors respectively.]
[Footnote 11: A vector space is a complete metric space if whenever an infinite series
of vectors |Ψ𝑖⟩ converges absolutely, that is, the series of the norms of the vectors
converges:
$$\sum_{i=1}^{\infty} \|\Psi_i\| < \infty, \tag{3.45}$$
then the series of the vectors themselves converges as well, to some vector |Ψ⟩ in the
Hilbert space:
$$\sum_{i=1}^{\infty} |\Psi_i\rangle = |\Psi\rangle. \tag{3.46}$$]
$$|\Psi\rangle \equiv \begin{pmatrix} 7 + 7\, \mathrm{i} \\ -7 - 2\, \mathrm{i} \end{pmatrix}, \quad |\Phi\rangle \equiv \begin{pmatrix} -2 - 7\, \mathrm{i} \\ \mathrm{i} \end{pmatrix}. \tag{3.47}$$
Problem 3.14. Prove that ⟨Φ|Ψ⟩ = ⟨Ψ|Φ⟩∗, that is, if we swap the order of vectors
in the inner product we get the complex conjugate of the original product. Thus,
unlike the dot product, the inner product on ℂⁿ is not symmetric. However, it
is conjugate-symmetric, and in particular, the magnitude of the inner product
remains the same, since |𝑧| = |𝑧∗|.
1. They span ℂⁿ, which means that any vector |Ψ⟩ ∈ ℂⁿ can be written uniquely
as a linear combination of the basis vectors, that is, a sum of the vectors
|𝐵𝑖⟩ multiplied by some complex numbers 𝜆𝑖 ∈ ℂ:
$$|\Psi\rangle = \sum_{i=1}^{n} \lambda_i |B_i\rangle. \tag{3.49}$$
This property ensures that the basis can be used to define any single vector
in the space ℂⁿ, not just part of that space.
As a simple example, in ℝ3 the vector x̂ ≡ (1, 0, 0) pointing along the 𝑥 axis
and the vector ŷ ≡ (0, 1, 0) pointing along the 𝑦 axis span the 𝑥𝑦 plane, but
not all of ℝ3 . To get a basis for all of ℝ3 , we must add an appropriate third
vector, such as the vector ẑ ≡ (0, 0, 1) pointing along the 𝑧 axis. (But other
vectors, such as (1, 2, 3), would work as well.)
2. They are linearly independent, in that if the zero vector is a linear combination
of the basis vectors, then the coefficients in the linear combination must
all be zero:
∑ 𝜆𝑖 |𝐵𝑖 ⟩ = 0 ⟹ 𝜆𝑖 = 0, ∀𝑖. (3.50)
Linear independence means (as you will show in problem 3.17) that no vector
in the set can be written as a linear combination of the other vectors in
the set. If this were possible, then that vector would be redundant,
and we would need to remove it in order to obtain a basis.
As a simple example, the set composed of x̂, ŷ, and (1, 2, 0) is linearly dependent,
since (1, 2, 0) = x̂ + 2ŷ, but the set {x̂, ŷ, ẑ} is linearly independent.
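3. They are all orthogonal to each other, that is, the inner product of any two
different vectors evaluates to zero:
\[ \langle B_i|B_j\rangle = 0, \qquad \forall i \neq j. \tag{3.51} \]
4. They are all unit vectors, that is, they have a norm (and norm-squared) of 1:
\[ \|B_i\|^2 = \langle B_i|B_i\rangle = 1, \qquad \forall i. \tag{3.52} \]
Properties 3 and 4 can be combined into a single condition:
\[ \langle B_i|B_j\rangle = \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j, \end{cases} \tag{3.53} \]
where 𝛿𝑖𝑗 is called the Kronecker delta. If this combined property is satisfied, we
say that the vectors are orthonormal¹².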
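The combined orthonormality condition can be checked numerically by comparing every inner product ⟨𝐵𝑖|𝐵𝑗⟩ with the Kronecker delta. The following Python sketch is only an illustration: the function name `is_orthonormal`, the tolerance, and the example vectors are assumptions, not taken from the notes.

```python
import math

def inner(u, v):
    """Inner product <u|v> with the components of u conjugated."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def is_orthonormal(vectors, tol=1e-12):
    """True if <B_i|B_j> equals the Kronecker delta for every pair (i, j)."""
    for i, bi in enumerate(vectors):
        for j, bj in enumerate(vectors):
            delta = 1.0 if i == j else 0.0
            if abs(inner(bi, bj) - delta) > tol:
                return False
    return True

s = 1 / math.sqrt(2)
print(is_orthonormal([[s, s * 1j], [s, -s * 1j]]))  # orthogonal unit vectors
print(is_orthonormal([[1, 0], [1, 1]]))             # linearly independent, but not orthonormal
```

A small tolerance is used because floating-point arithmetic rarely reproduces exact values such as 1/√2.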
These requirements become much simpler in 𝑛 = 2 dimensions. An orthonormal
basis for ℂ2 is a set of 2 nonzero vectors |𝐵1 ⟩ , |𝐵2 ⟩ such that:
1. They span ℂ2 , which means that any vector |Ψ⟩ ∈ ℂ2 can be written as a
linear combination of the basis vectors:
\[ |\Psi\rangle = \lambda_1 |B_1\rangle + \lambda_2 |B_2\rangle, \qquad \lambda_1, \lambda_2 \in \mathbb{C}. \]
2. They are linearly independent, which means that we cannot write one in
terms of a scalar times the other, i.e.:
\[ |B_1\rangle \neq \lambda |B_2\rangle, \qquad \forall \lambda \in \mathbb{C}. \]
3. They are orthonormal to each other, that is, the inner product between them
evaluates to zero and both of them have unit norm:
\[ \langle B_1|B_2\rangle = 0, \qquad \|B_1\| = \|B_2\| = 1. \]
¹² Actually, bases don’t have to be orthonormal in general, but in quantum mechanics they always
are, for reasons that will become clear later.
\[ |1_1\rangle \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1_2\rangle \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{3.58} \]
We similarly define the standard basis of ℂ𝑛 for any 𝑛 in the obvious way.
Problem 3.16. Show that the standard basis vectors satisfy the properties above.
Problem 3.17. Show that linear independence means that no vector in the basis
can be written as a linear combination of the other vectors in the basis.
Problem 3.18. Any basis which is orthogonal but not orthonormal, that is, does
not satisfy property 4, can be made orthonormal by normalizing each basis vector,
that is, dividing it by its norm:
\[ |B_i\rangle \mapsto \frac{|B_i\rangle}{\|B_i\|}. \tag{3.59} \]
Show that if an orthogonal but not orthonormal basis satisfies properties 1-3,
then it still satisfies them after normalizing it in this way.
|Ψ⟩ ≡ ( ). (3.60)
2 + 2i
Normalize |Ψ⟩ and find another complex vector |Φ⟩ such that the set {|Ψ⟩ , |Φ⟩} is
a basis of ℂ2 (i.e. satisfies all of the properties above).
Problem 3.20. Find an orthonormal basis of ℂ3 which is not the standard basis
or a scalar multiple of the standard basis. Show that it is indeed an orthonormal basis.
\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad A_{11}, A_{12}, A_{21}, A_{22} \in \mathbb{C}. \tag{3.61} \]
A matrix can act on a vector to produce another vector. If it’s a ket (a vertical/column
vector), the result is another ket. If it’s a bra (a horizontal/row dual
vector), the result is another bra.
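If the matrix acts on a ket, then it must act from the left, and the element at
row 𝑖 of the resulting ket is obtained by taking the inner product of row 𝑖 of the
matrix with the ket:
\[ A|\Psi\rangle = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} = \begin{pmatrix} A_{11}\Psi_1 + A_{12}\Psi_2 \\ A_{21}\Psi_1 + A_{22}\Psi_2 \end{pmatrix}. \tag{3.62} \]
If the matrix acts on a bra, then it must act from the right, and the element at
column 𝑖 of the resulting bra is obtained by taking the inner product of column
𝑖 of the matrix with the bra:
\[ \langle\Psi|A = \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} \Psi_1^* A_{11} + \Psi_2^* A_{21} & \Psi_1^* A_{12} + \Psi_2^* A_{22} \end{pmatrix}. \tag{3.63} \]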
Note that the dual vector ⟨Ψ| 𝐴 is not the dual of the vector 𝐴 |Ψ⟩, as you can see
by taking the dual of equation (3.62). However, we can define the adjoint of a
matrix by transposing rows into columns and then taking the complex conjugate
of all the components:
\[ A^\dagger = \begin{pmatrix} A_{11}^* & A_{21}^* \\ A_{12}^* & A_{22}^* \end{pmatrix}, \tag{3.64} \]
where the notation † for the adjoint is called dagger. Then the vector dual to
𝐴 |Ψ⟩ is ⟨Ψ| 𝐴† , as you will check in problem 3.22. Actually, taking the adjoint of
a matrix is exactly the same operation as taking the dual of a vector! The only
difference is that for a matrix we have 𝑛 columns to transpose into rows, while
for a vector we only have one. Therefore, we have
\[ |\Psi\rangle^\dagger = \langle\Psi|, \qquad \langle\Psi|^\dagger = |\Psi\rangle, \tag{3.65} \]
In fact, matrices don’t have to be square, they can have a different number of rows and columns,
that is, 𝑛 × 𝑚 where 𝑛 ≠ 𝑚; but non-square matrices are generally not of much interest in quantum
mechanics.
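The relation between the adjoint of a matrix and the dual of a vector can be spot-checked on a concrete example. The sketch below is an illustration only, not a proof: matrices are nested lists, kets and bras are both stored as plain lists (so the row/column distinction lives in which helper is called), and all function names and numbers are assumptions made for this example.

```python
def adjoint(A):
    """Conjugate transpose: entry (i, j) of the result is conj(A[j][i])."""
    rows, cols = len(A), len(A[0])
    return [[complex(A[j][i]).conjugate() for j in range(rows)] for i in range(cols)]

def mat_ket(A, ket):
    """A acting from the left on a ket (column vector)."""
    return [sum(a * x for a, x in zip(row, ket)) for row in A]

def bra_mat(bra, A):
    """A acting from the right on a bra (row vector)."""
    return [sum(bra[k] * A[k][j] for k in range(len(bra))) for j in range(len(A[0]))]

def dual(vec):
    """Dual of a ket or a bra: complex-conjugate each component."""
    return [complex(x).conjugate() for x in vec]

A = [[1, 2j], [3, 4 + 1j]]
ket = [1j, 2]

left = dual(mat_ket(A, ket))            # the dual of A|psi>
right = bra_mat(dual(ket), adjoint(A))  # <psi| acting on A^dagger from the left
print(left == right)
print(adjoint(adjoint(A)) == A)         # taking the adjoint twice gives back A
```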
\[ 1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{3.67} \]
Acting with it on any vector or dual vector does not change it: 1 |Ψ⟩ = |Ψ⟩.
\[ R(\theta) \equiv \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \tag{3.68} \]
Problem 3.22. Show that the vector dual to 𝐴 |Ψ⟩ is indeed ⟨Ψ| 𝐴† .
1 + 5i 2
𝐴≡( ), ⟨Ψ| ≡ ( i −2 i −3 ) . (3.69)
3 − 7i 4 + 8i
Calculate 𝐴 |Ψ⟩ and ⟨Ψ| 𝐴† separately, and then check that they are the dual of
each other.
Problem 3.24. Show that (𝐴†)† = 𝐴. This means that the adjoint operation is an
involution, exactly like complex conjugation and taking the dual of a vector. In
fact, all three are the exact same operation. By choosing an appropriate matrix,
explain how taking the complex conjugate of a number is a special case of taking
the adjoint of a matrix.
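Problem 3.25. Show that the action of a matrix on a vector is linear, that is,
\[ A\left(\alpha|\Psi\rangle + \beta|\Phi\rangle\right) = \alpha A|\Psi\rangle + \beta A|\Phi\rangle, \qquad \forall \alpha, \beta \in \mathbb{C}. \tag{3.70} \]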
We have seen that vectors and dual vectors may be combined to generate a
complex number using the inner product. We can similarly combine a vector and
a dual vector to generate a matrix, using the outer product. Given
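\[ \langle\Psi| \equiv \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix}, \qquad |\Phi\rangle \equiv \begin{pmatrix} \Phi_1 \\ \Phi_2 \end{pmatrix}, \tag{3.71} \]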
we define the outer product as the matrix whose component at row 𝑖, column 𝑗 is
given by multiplying the component at row 𝑖 of |Φ⟩ with the component at column
𝑗 of ⟨Ψ|:
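\[ |\Phi\rangle\langle\Psi| = \begin{pmatrix} \Phi_1 \\ \Phi_2 \end{pmatrix} \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix} = \begin{pmatrix} \Psi_1^* \Phi_1 & \Psi_2^* \Phi_1 \\ \Psi_1^* \Phi_2 & \Psi_2^* \Phi_2 \end{pmatrix}. \tag{3.72} \]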
Note how when taking an inner product the straight lines | face each other: ⟨Ψ|Φ⟩,
while when taking an outer product the angle brackets ⟩⟨ face each other. This
shows some of the elegance of the Dirac notation! A bra-ket is an inner product,
while a ket-bra is an outer product.
We can assign a rank to scalars, vectors, and matrices: scalars have rank 0,
vectors have rank 1, and matrices have rank 2.
Then the inner product reduces the rank of the vectors from 1 to 0, while the
outer product increases the rank from 1 to 2.
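To make the contrast between the two products concrete, here is a short Python sketch; the helper names `outer` and `inner` and the sample vectors are assumptions made for illustration. The same pair of vectors produces a matrix through the outer product and a single number through the inner product.

```python
def outer(phi, psi):
    """Outer product |phi><psi|: entry (i, j) is phi[i] * conj(psi[j])."""
    return [[complex(p) * complex(q).conjugate() for q in psi] for p in phi]

def inner(psi, phi):
    """Inner product <psi|phi>: sum of conj(psi[i]) * phi[i]."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))

phi = [1, 1j]
psi = [2, 1 + 1j]

print(outer(phi, psi))  # a 2 x 2 matrix
print(inner(psi, phi))  # a single complex number
```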
\[ |\Psi\rangle = \begin{pmatrix} 1 \\ 2 + i \end{pmatrix}, \qquad |\Phi\rangle = \begin{pmatrix} 3 - i \\ 4i \end{pmatrix}. \tag{3.73} \]
Remember that when writing the dual vector, the components are complex conjugated.
3.2.6 The Completeness Relation
Taking the inner product of the above equation with ⟨𝐵𝑗 | and using the fact that
the basis vectors are orthonormal,
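\[ \langle B_i|B_j\rangle = \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j, \end{cases} \tag{3.75} \]
we get:
\[ \langle B_j|\Psi\rangle = \sum_{i=1}^{n} \lambda_i \langle B_j|B_i\rangle = \sum_{i=1}^{n} \lambda_i \delta_{ij} = \lambda_j, \tag{3.76} \]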
since all of the terms in the sum vanish except the one with 𝑖 = 𝑗. Therefore, the
coefficients 𝜆𝑖 in equation (3.74) are given, for any vector |Ψ⟩ and for any basis
|𝐵𝑖 ⟩, by
𝜆𝑖 = ⟨𝐵𝑖 |Ψ⟩. (3.77)
We haven’t actually done anything here; where to write the scalar, on the left or
right of the vector, is completely arbitrary – it’s just conventional to write it on
the left. Then, replacing 𝜆𝑖 with ⟨𝐵𝑖 |Ψ⟩ as per equation (3.77), we get
|Ψ⟩ = ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩. (3.79)
Note that what we did here is go from a vector |𝐵𝑖 ⟩ times a complex number
⟨𝐵𝑖 |Ψ⟩ to a matrix |𝐵𝑖 ⟩⟨𝐵𝑖 | times a vector |Ψ⟩, for each 𝑖. The fact that these two
different products are actually equal to one another (as you will prove in problem
3.28) is not at all trivial, but it is one of the main reasons we like to use bra-ket
notation! The notation now suggests (see problem 3.29) that
∑ |𝐵𝑖 ⟩⟨𝐵𝑖 | = 1, (3.81)
where |𝐵𝑖 ⟩⟨𝐵𝑖 | is the outer product defined above, and the 1 on the right-hand
side is the identity matrix. This extremely useful result is called the completeness relation.
In ℂ2 , we simply have
|𝐵1 ⟩⟨𝐵1 | + |𝐵2 ⟩⟨𝐵2 | = 1. (3.82)
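The completeness relation can be verified numerically by adding up the outer products |𝐵𝑖⟩⟨𝐵𝑖| and comparing the result with the identity matrix. The Python sketch below is an illustration under stated assumptions: the function name `completeness_sum` and the example basis (which is not one of the bases used in the exercises) are chosen for this example.

```python
import math

def outer(phi, psi):
    """Outer product |phi><psi|: entry (i, j) is phi[i] * conj(psi[j])."""
    return [[complex(p) * complex(q).conjugate() for q in psi] for p in phi]

def completeness_sum(basis):
    """Sum of the outer products |B_i><B_i| over all basis vectors."""
    n = len(basis[0])
    total = [[0j] * n for _ in range(n)]
    for b in basis:
        projector = outer(b, b)
        for i in range(n):
            for j in range(n):
                total[i][j] += projector[i][j]
    return total

s = 1 / math.sqrt(2)
basis = [[s, s * 1j], [s, -s * 1j]]  # an orthonormal basis of C^2, chosen for illustration

for row in completeness_sum(basis):
    print([complex(round(z.real, 12), round(z.imag, 12)) for z in row])
```

The off-diagonal entries cancel between the two outer products, while the diagonal entries add up to 1 (up to rounding).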
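\[ |B_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad |B_2\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \tag{3.83} \]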
first show that it is indeed an orthonormal basis, and then show that it satisfies
the completeness relation given by equation (3.82).
\[ |\Psi\rangle \equiv \begin{pmatrix} \Psi_1 \\ \vdots \\ \Psi_n \end{pmatrix}, \qquad \Psi_i \in \mathbb{C}. \tag{3.85} \]
Given an orthonormal basis |𝐵𝑖 ⟩, we have seen that we can write |Ψ⟩ as a linear
combination of the basis vectors:
|Ψ⟩ = ∑ 𝜆𝑖 |𝐵𝑖 ⟩ . (3.86)
The coefficients 𝜆𝑖 ∈ ℂ depend on |Ψ⟩ and on the basis vectors, as we showed in
equation (3.77):
𝜆𝑖 ≡ ⟨𝐵𝑖 |Ψ⟩ ⟹ |Ψ⟩ = ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩ (3.87)
With these coefficients, we can represent the vector |Ψ⟩ in the basis |𝐵𝑖 ⟩. This
representation will be a vector of the same dimension 𝑛, with the components
being the coefficients 𝜆𝑖 = ⟨𝐵𝑖 |Ψ⟩, and will be denoted as follows:
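\[ |\Psi\rangle \big|_B \equiv \begin{pmatrix} \langle B_1|\Psi\rangle \\ \vdots \\ \langle B_n|\Psi\rangle \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix}. \tag{3.88} \]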
We say that 𝜆𝑖 are the coordinates of |Ψ⟩ with respect to the basis |𝐵𝑖 ⟩.
The correct way to understand the meaning of a vector is as an abstract entity,
like an arrow in space, which does not depend on any particular basis – it is just
there. However, if we want to do concrete calculations with a vector, we must
somehow represent it numerically. This is done by choosing a basis and writing
down the coordinates of the vector in that basis.
Therefore, whenever we define a vector using its components – as we have been
doing throughout this chapter – there is always a specific basis in which the
vector is represented, with the components being the coordinates in this basis. If
no particular basis is explicitly specified, it is implied that it is the standard basis.
But no representation is better than the other; we usually choose whatever basis
is most convenient to work with. In quantum mechanics, we often choose a basis
defined by some physical observable, as we will see below.
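The idea of representing one abstract vector in different bases can be sketched in a few lines of Python. This is an illustration only: the function names `coordinates` and `from_coordinates`, the example basis, and the example vector are assumptions, not taken from the notes.

```python
import math

def inner(u, v):
    """Inner product <u|v> with the components of u conjugated."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def coordinates(psi, basis):
    """Coordinates lambda_i = <B_i|psi> of psi in an orthonormal basis."""
    return [inner(b, psi) for b in basis]

def from_coordinates(coords, basis):
    """Rebuild psi = sum_i lambda_i |B_i> in the standard representation."""
    n = len(basis[0])
    return [sum(lam * b[k] for lam, b in zip(coords, basis)) for k in range(n)]

s = 1 / math.sqrt(2)
basis = [[s, s * 1j], [s, -s * 1j]]  # illustrative orthonormal basis
psi = [2, 3 - 1j]                    # given in the standard basis

lam = coordinates(psi, basis)
print(lam)                           # the same vector, represented in the new basis
print(from_coordinates(lam, basis))  # recovers psi up to rounding
```

The two lists of numbers look different, but they describe the same vector; in particular, the sum of |𝜆𝑖|² equals the norm-squared computed in the standard basis.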
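\[ |\Psi\rangle \equiv \begin{pmatrix} 1 - 9i \\ 7i - 2 \end{pmatrix}. \tag{3.89} \]
\[ |B_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad |B_2\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \tag{3.90} \]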
Problem 3.31. Prove that the inner product (and thus also the norm) is independent
of the choice of basis. That is, for any two vectors |Ψ⟩ and |Φ⟩ and any two
bases |𝐵𝑖 ⟩ and |𝐶𝑖 ⟩,
3.2.8 Change of Basis
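\[ |\Psi\rangle \big|_B = \begin{pmatrix} \langle B_1|\Psi\rangle \\ \vdots \\ \langle B_n|\Psi\rangle \end{pmatrix} = \sum_{i=1}^{n} |B_i\rangle\langle B_i|\Psi\rangle. \tag{3.92} \]
\[ |\Psi\rangle \big|_C = \begin{pmatrix} \langle C_1|\Psi\rangle \\ \vdots \\ \langle C_n|\Psi\rangle \end{pmatrix} = \sum_{i=1}^{n} |C_i\rangle\langle C_i|\Psi\rangle. \tag{3.93} \]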
Inserting it in the middle of the inner product representing the coordinates ⟨𝐶𝑖 |Ψ⟩,
we get that for all 𝑖
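\[ \langle C_i|\Psi\rangle = \langle C_i| \left( \sum_{j=1}^{n} |B_j\rangle\langle B_j| \right) |\Psi\rangle = \sum_{j=1}^{n} \langle C_i|B_j\rangle\langle B_j|\Psi\rangle. \tag{3.95} \]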
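Again, the Dirac notation proves to be pretty convenient! This relation can be
expressed in matrix form as follows:
\[ \begin{pmatrix} \langle C_1|\Psi\rangle \\ \vdots \\ \langle C_n|\Psi\rangle \end{pmatrix} = \begin{pmatrix} \langle C_1|B_1\rangle & \cdots & \langle C_1|B_n\rangle \\ \vdots & \ddots & \vdots \\ \langle C_n|B_1\rangle & \cdots & \langle C_n|B_n\rangle \end{pmatrix} \begin{pmatrix} \langle B_1|\Psi\rangle \\ \vdots \\ \langle B_n|\Psi\rangle \end{pmatrix}, \tag{3.96} \]
or in other words,
\[ |\Psi\rangle \big|_C = P_{C \leftarrow B} \, |\Psi\rangle \big|_B, \tag{3.97} \]
where the change-of-basis matrix from |𝐵𝑖 ⟩ to |𝐶𝑖 ⟩, denoted 𝑃𝐶←𝐵 , is defined as
\[ P_{C \leftarrow B} \equiv \begin{pmatrix} \langle C_1|B_1\rangle & \cdots & \langle C_1|B_n\rangle \\ \vdots & \ddots & \vdots \\ \langle C_n|B_1\rangle & \cdots & \langle C_n|B_n\rangle \end{pmatrix}. \tag{3.98} \]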
Exercise 3.32. Consider the two bases
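\[ |B_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad |B_2\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \tag{3.99} \]
\[ |C_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ i \end{pmatrix}, \qquad |C_2\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} -i \\ -1 \end{pmatrix}. \tag{3.100} \]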
|Ψ⟩ = ( ). (3.101)
The matrix product of two matrices is another matrix. The element of that matrix
at row 𝑖, column 𝑗 is calculated by taking the inner product of row 𝑖 of the left
matrix with column 𝑗 of the right matrix:
\[ (AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}. \]
then the matrix 𝐴 is called invertible and 𝐴−1 is called its inverse matrix. Note that
(𝐴⁻¹)⁻¹ = 𝐴, so the operation of taking the inverse is an involution. Sometimes
matrices do not have an inverse; such matrices are called singular.
\[ A \equiv \begin{pmatrix} -1 & 3 \\ -6i & 2i - 1 \end{pmatrix}, \qquad B \equiv \begin{pmatrix} 9 - 8i & 7 \\ 4i & -2i \end{pmatrix}. \tag{3.104} \]
Problem 3.34. Find a general formula for the inverse of a 2 × 2 matrix by taking
\[ A \equiv \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad A^{-1} \equiv \begin{pmatrix} e & f \\ g & h \end{pmatrix}, \tag{3.105} \]
\[ A \equiv \begin{pmatrix} 1 & 2 - 4i \\ -i & -2 \end{pmatrix}. \tag{3.106} \]
Problem 3.37. Matrix multiplication is not commutative in general. That is, for
two arbitrary matrices 𝐴 and 𝐵, it is not in general true that 𝐴𝐵 = 𝐵𝐴. Find
an example of two matrices which commute, and an example of two matrices
which do not commute. In each case, show that they indeed commute or don’t commute.
\[ \lambda A = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} A. \tag{3.107} \]
Problem 3.39. Given two bases |𝐵𝑖 ⟩ and |𝐶𝑖 ⟩, show that the change-of-basis
matrix 𝑃𝐵←𝐶 is the inverse of the change-of-basis matrix in the other direction,
𝑃𝐶←𝐵 .
Since 𝐴 |Φ⟩ is itself a vector, we may calculate the inner product of that vector
with the dual vector ⟨Ψ|, which as usual gives us a complex number:
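\[ \begin{aligned} \langle\Psi|A|\Phi\rangle &= \begin{pmatrix} \Psi_1^* & \Psi_2^* \end{pmatrix} \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} \Phi_1 \\ \Phi_2 \end{pmatrix} \\ &= \Psi_1^* A_{11} \Phi_1 + \Psi_2^* A_{21} \Phi_1 + \Psi_1^* A_{12} \Phi_2 + \Psi_2^* A_{22} \Phi_2. \end{aligned} \]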
If we take the dual of 𝐴 |Φ⟩ we get ⟨Φ| 𝐴† , as you proved in problem 3.22. Thus,
inverting the order of the inner product, we get
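\[ \begin{aligned} \langle\Phi|A^\dagger|\Psi\rangle &= \begin{pmatrix} \Phi_1^* & \Phi_2^* \end{pmatrix} \begin{pmatrix} A_{11}^* & A_{21}^* \\ A_{12}^* & A_{22}^* \end{pmatrix} \begin{pmatrix} \Psi_1 \\ \Psi_2 \end{pmatrix} \\ &= \Psi_1 A_{11}^* \Phi_1^* + \Psi_2 A_{21}^* \Phi_1^* + \Psi_1 A_{12}^* \Phi_2^* + \Psi_2 A_{22}^* \Phi_2^*. \end{aligned} \]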
This is, of course, the complex conjugate of ⟨Ψ|𝐴|Φ⟩, since inverting the order of
the inner product results in the complex conjugate. In other words, we have the
relation
⟨Ψ|𝐴|Φ⟩∗ = ⟨Φ|𝐴† |Ψ⟩. (3.108)
Taking the complex conjugate reverses the order of the inner product, and also
replaces the matrix with its adjoint.
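The conjugation rule can be spot-checked on explicit numbers. The following Python sketch uses helper functions and sample values that are assumptions chosen for illustration; it computes both sides independently and compares them.

```python
def inner(u, v):
    """Inner product <u|v> with the components of u conjugated."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def mat_vec(A, v):
    """Matrix A acting from the left on the ket v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def adjoint(A):
    """Conjugate transpose of A."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]

psi = [1 + 1j, 2]
phi = [1j, 1]
A = [[1, 2j], [3, 4 + 1j]]

lhs = inner(psi, mat_vec(A, phi)).conjugate()  # <psi|A|phi>*
rhs = inner(phi, mat_vec(adjoint(A), psi))     # <phi|A^dagger|psi>
print(lhs, rhs)
```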
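\[ |\Psi\rangle = \begin{pmatrix} 5 + 2i \\ -3i \end{pmatrix}, \qquad A = \begin{pmatrix} 9 & 8i \\ 6i & 5 - 4i \end{pmatrix}, \qquad |\Phi\rangle = \begin{pmatrix} 3 + 4i \\ 2 \end{pmatrix}. \tag{3.109} \]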
If the matrix 𝐴, acting on the (nonzero) vector |Ψ⟩, results in a scalar multiple
of |Ψ⟩:
𝐴 |Ψ⟩ = 𝜆 |Ψ⟩ , 𝜆 ∈ ℂ, (3.110)
then we call |Ψ⟩ an eigenvector of 𝐴 and 𝜆 its eigenvalue. Note that |Ψ⟩ cannot
be the zero vector, but 𝜆 can be zero.
For example, if
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{3.111} \]
|Φ⟩ = ( ) (3.113)
Exercise 3.41. The matrix
\[ A \equiv \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \tag{3.115} \]
𝐴 = 𝐴† . (3.116)
Then we can take the inner product of both sides with ⟨Ψ|:
⟨Ψ|𝐴|Ψ⟩ = ⟨Ψ|𝜆|Ψ⟩ = 𝜆⟨Ψ|Ψ⟩ = 𝜆 ‖Ψ‖², (3.119)
where we were able to move 𝜆 out of the inner product because it’s just a number.
From equation (3.117), we have:
Let us take the inner product of the first equation with ⟨Φ| and of the second
equation with ⟨Ψ|:
⟨Φ|𝐴|Ψ⟩ = ⟨Φ|𝜆|Ψ⟩ = 𝜆⟨Φ|Ψ⟩, (3.122)
⟨Ψ|𝐴|Φ⟩ = ⟨Ψ|𝜇|Φ⟩ = 𝜇⟨Ψ|Φ⟩. (3.123)
From equation (3.117), the first equation is the complex conjugate of the second
equation. Since 𝜆 must be real – as we just proved – we get
𝜇⟨Ψ|Φ⟩ = (𝜆⟨Φ|Ψ⟩)∗ = 𝜆⟨Ψ|Φ⟩. (3.124)
⟨Ψ|Φ⟩ = 0. (3.125)
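Both results, real eigenvalues and orthogonal eigenvectors, can be observed on a small Hermitian example. The Python sketch below is an illustration under stated assumptions: the function `eigen_2x2` is a hypothetical helper that uses the closed-form solution of the 2 × 2 characteristic equation and requires the upper-right entry to be nonzero, and the example matrix is not taken from the notes.

```python
import cmath

def inner(u, v):
    """Inner product <u|v> with the components of u conjugated."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def eigen_2x2(A):
    """Eigenvalues and (unnormalized) eigenvectors of a 2x2 matrix with A[0][1] != 0."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    values = [(trace + root) / 2, (trace - root) / 2]
    # The first row of (A - lambda*1)|v> = 0 reads (a - lambda) v1 + b v2 = 0,
    # which is solved by v = (b, lambda - a).
    vectors = [[b, lam - a] for lam in values]
    return values, vectors

A = [[2, 1 - 1j], [1 + 1j, 3]]  # Hermitian: equal to its own conjugate transpose
values, vectors = eigen_2x2(A)

print(values)                          # both imaginary parts vanish
print(inner(vectors[0], vectors[1]))   # eigenvectors of different eigenvalues are orthogonal
```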
Problem 3.43. Let 𝐴 and 𝐵 be Hermitian matrices. Under what conditions is the
product 𝐴𝐵 Hermitian?
\[ A \equiv \begin{pmatrix} 0 & 2i \\ c & 0 \end{pmatrix}. \tag{3.126} \]
Problem 3.45. Find the most general 2 × 2 Hermitian matrix by demanding that
𝐴 = 𝐴† and finding conditions on the components of 𝐴.
3.2.13 Unitary Matrices
𝑈 −1 = 𝑈 † ⟹ 𝑈 𝑈 † = 𝑈 † 𝑈 = 1. (3.127)
since 𝑈 † 𝑈 = 1. Therefore, the inner product of these two vectors is the same
before and after acting on them with 𝑈 .
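The statement that a unitary matrix preserves inner products can be checked numerically. In the Python sketch below, the matrix `U`, the vectors, and the helper names are assumptions chosen for illustration; the code first confirms that 𝑈†𝑈 is the identity and then compares the inner product before and after acting with 𝑈.

```python
import math

def inner(u, v):
    """Inner product <u|v> with the components of u conjugated."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def mat_vec(A, v):
    """Matrix A acting from the left on the ket v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(X, Y):
    """Matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def adjoint(A):
    """Conjugate transpose of A."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]

s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]  # an example unitary matrix

psi = [1 + 2j, 3]
phi = [2, 1 - 1j]

print(mat_mul(adjoint(U), U))                  # the identity, up to rounding
print(inner(psi, phi))                         # before acting with U
print(inner(mat_vec(U, psi), mat_vec(U, phi))) # after acting with U
```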
Now, let 𝜆 be an eigenvalue of the unitary matrix 𝑈 with the eigenvector |Ψ⟩:
𝑈 −1 = 𝑈 † or 𝑈 𝑈 † = 𝑈 † 𝑈 = 1 and finding conditions on the components of 𝑈 .
Problem 3.47. Find three 2 × 2 matrices that are both Hermitian and unitary
(other than the identity matrix).
Problem 3.49. Prove that the columns of a unitary matrix, treated as kets, form
an orthonormal basis on ℂ𝑛 . Then prove that the same is true for the rows of a
unitary matrix, treated as bras.
Problem 3.50. Let 𝐴 and 𝐵 be normal matrices. Under which condition are 𝐴𝐵
and 𝐴 + 𝐵 also normal?
In section 3.2.7 we saw that vectors are abstract entities which can have different
representations in different bases. The same is true for matrices. Consider a
matrix 𝐴 and a basis |𝐵𝑖 ⟩. Inserting the completeness relation (3.81) twice, one
time on each side of 𝐴, we get:
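\[ \begin{aligned} A &= \left( \sum_{i=1}^{n} |B_i\rangle\langle B_i| \right) A \left( \sum_{j=1}^{n} |B_j\rangle\langle B_j| \right) \\ &= \sum_{i=1}^{n} \sum_{j=1}^{n} |B_i\rangle\langle B_i|A|B_j\rangle\langle B_j| \\ &= \sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij}) \, |B_i\rangle\langle B_j|, \end{aligned} \]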
(𝐴𝑖𝑗 ) ≡ ⟨𝐵𝑖 |𝐴|𝐵𝑗 ⟩ ∈ ℂ, 𝑖, 𝑗 ∈ {1, … , 𝑛} (3.134)
Inserting the completeness relation (3.81) into the coordinates twice, similarly to
what we did above, we get
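\[ \begin{aligned} \langle C_i|A|C_j\rangle &= \langle C_i| \left( \sum_{k=1}^{n} |B_k\rangle\langle B_k| \right) A \left( \sum_{\ell=1}^{n} |B_\ell\rangle\langle B_\ell| \right) |C_j\rangle \\ &= \sum_{k=1}^{n} \sum_{\ell=1}^{n} \langle C_i|B_k\rangle\langle B_k|A|B_\ell\rangle\langle B_\ell|C_j\rangle. \end{aligned} \]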
Problem 3.51. Show that this relation can be written in matrix form as follows:
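and thus the relation between the representations of 𝐴 in different bases is given by
(𝐴)𝐶 = 𝑃𝐶←𝐵 (𝐴)𝐵 𝑃𝐵←𝐶 , (3.138)
where
\[ P_{C \leftarrow B} \equiv \begin{pmatrix} \langle C_1|B_1\rangle & \cdots & \langle C_1|B_n\rangle \\ \vdots & \ddots & \vdots \\ \langle C_n|B_1\rangle & \cdots & \langle C_n|B_n\rangle \end{pmatrix} \tag{3.139} \]
is the change-of-basis matrix (3.98), and 𝑃𝐵←𝐶 = (𝑃𝐶←𝐵)⁻¹ (see problem 3.39). This is analogous to the
relation between vectors in different bases, |Ψ⟩∣𝐶 = 𝑃𝐶←𝐵 |Ψ⟩∣𝐵 .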
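A change of basis for vectors can be sketched in Python as follows. This is a numerical illustration only: the function name `change_of_basis`, the two orthonormal bases, and the vector are assumptions chosen for this example. The matrix entries are the inner products ⟨𝐶𝑖|𝐵𝑗⟩, as in the definition of the change-of-basis matrix.

```python
import math

def inner(u, v):
    """Inner product <u|v> with the components of u conjugated."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))

def change_of_basis(C, B):
    """Matrix P_{C<-B} whose entry (i, j) is <C_i|B_j>."""
    return [[inner(c, b) for b in B] for c in C]

def mat_vec(M, v):
    """Matrix M acting from the left on the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_mul(X, Y):
    """Matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

s = 1 / math.sqrt(2)
B = [[0.6, 0.8], [0.8, -0.6]]    # illustrative orthonormal basis
C = [[s, s * 1j], [s, -s * 1j]]  # another illustrative orthonormal basis
psi = [2, 3 - 1j]                # given in the standard basis

psi_B = [inner(b, psi) for b in B]  # coordinates of psi in the basis B
psi_C = [inner(c, psi) for c in C]  # coordinates of psi in the basis C

P = change_of_basis(C, B)
print(mat_vec(P, psi_B))                    # agrees with psi_C up to rounding
print(psi_C)
print(mat_mul(change_of_basis(B, C), P))    # the identity, up to rounding
```

The last line illustrates numerically that the change-of-basis matrix in the opposite direction undoes the first one.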
Problem 3.52. Let 𝑈 be a unitary matrix and let |𝐵𝑖 ⟩ be an orthonormal basis.
A. Prove that |𝐶𝑖 ⟩ ≡ 𝑈 |𝐵𝑖 ⟩ is also an orthonormal basis.
B. Prove that 𝑈 has the outer product representation
𝑈 = ∑ |𝐶𝑖 ⟩ ⟨𝐵𝑖 | . (3.140)
C. Conversely, prove that if |𝐵𝑖 ⟩ and |𝐶𝑖 ⟩ are two arbitrary orthonormal bases,
then the matrix 𝑈 defined by equation (3.140) is unitary.
A diagonal matrix is a matrix with all of its elements equal to zero except for the
elements on the diagonal, for example:
\[ D = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}. \tag{3.141} \]
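Now, consider the change-of-basis matrix (3.98), this time from the eigenbasis
|𝐵𝑖 ⟩, which is orthonormal, to the standard basis |1𝑖 ⟩:
\[ P_{1 \leftarrow B} \equiv \begin{pmatrix} \langle 1_1|B_1\rangle & \cdots & \langle 1_1|B_n\rangle \\ \vdots & \ddots & \vdots \\ \langle 1_n|B_1\rangle & \cdots & \langle 1_n|B_n\rangle \end{pmatrix}. \tag{3.143} \]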
Note that each eigenvector |𝐵𝑖 ⟩ is represented in the standard basis |1𝑖 ⟩ as follows:
\[ |B_i\rangle \big|_1 = \begin{pmatrix} \langle 1_1|B_i\rangle \\ \vdots \\ \langle 1_n|B_i\rangle \end{pmatrix}, \qquad \forall i. \tag{3.144} \]
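Hence, the columns of 𝑃1←𝐵 are in fact the eigenvectors |𝐵𝑖 ⟩ themselves, as expressed
in the standard basis:
\[ P \equiv P_{1 \leftarrow B} = \begin{pmatrix} |B_1\rangle & \cdots & |B_n\rangle \end{pmatrix}. \]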
𝐴𝑃 = 𝐴 ( |𝐵1 ⟩ ⋯ |𝐵𝑛 ⟩ )
= ( 𝐴 |𝐵1 ⟩ ⋯ 𝐴 |𝐵𝑛 ⟩ )
= ( 𝜆1 |𝐵1 ⟩ ⋯ 𝜆𝑛 |𝐵𝑛 ⟩ ) .
How did we get this? Remember that in section 3.2.9 we said that the element
of the matrix 𝐴𝑃 at row 𝑖, column 𝑗 is calculated by taking the inner product of
row 𝑖 of 𝐴 with column 𝑗 of 𝑃 . But column 𝑗 of 𝑃 is just ∣𝐵𝑗 ⟩. The product 𝐴 |𝐵𝑖 ⟩
is another ket, whose rows are obtained by taking the inner product of each row
of 𝐴 with |𝐵𝑖 ⟩ respectively. The last equality follows from equation (3.142).
Next, we write the full matrix and decompose it into two matrices:
define a new diagonal matrix, with the eigenvalues on the diagonal:
\[ D \equiv \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}, \tag{3.147} \]
𝐴𝑃 = 𝑃 𝐷. (3.148)
𝑃 −1 𝐴𝑃 = 𝐷. (3.149)
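The diagonalization procedure can be tried out on a concrete matrix. The Python sketch below is an illustration under stated assumptions: the Hermitian matrix and its normalized eigenvectors (with eigenvalues 4 and 1, which can be verified by hand) are chosen for this example, and since the eigenbasis is orthonormal the inverse of 𝑃 is computed as its adjoint.

```python
import math

def adjoint(A):
    """Conjugate transpose of A."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def mat_mul(X, Y):
    """Matrix product of X and Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 1 - 1j], [1 + 1j, 3]]  # example Hermitian matrix

# Normalized eigenvectors of A, written in the standard basis.
v1 = [(1 - 1j) / math.sqrt(6), 2 / math.sqrt(6)]   # eigenvalue 4
v2 = [(1 - 1j) / math.sqrt(3), -1 / math.sqrt(3)]  # eigenvalue 1

# The eigenvectors form the columns of P.
P = [[v1[0], v2[0]],
     [v1[1], v2[1]]]

D = mat_mul(adjoint(P), mat_mul(A, P))
for row in D:
    print([complex(round(z.real, 12), round(z.imag, 12)) for z in row])
```

The result is diagonal, with the eigenvalues appearing in the same order as the corresponding eigenvectors in the columns of 𝑃.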
\[ A = \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}. \tag{3.150} \]
Problem 3.54. Prove that the change-of-basis matrix 𝑃 ≡ 𝑃1←𝐵 as defined above,
with |𝐵𝑖 ⟩ an orthonormal eigenbasis, is unitary. This means that we can also write
𝑃 † 𝐴𝑃 = 𝐷, since 𝑃 −1 = 𝑃 † for unitary matrices.
Problem 3.55. Show that if 𝐴 is a normal matrix then it has the outer product
representation
𝐴 = ∑ 𝜆𝑖 |𝐵𝑖 ⟩ ⟨𝐵𝑖 | , (3.151)
where |𝐵𝑖 ⟩ is an orthonormal eigenbasis and 𝜆𝑖 are the eigenvalues of the eigenvectors
|𝐵𝑖 ⟩.
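The Cauchy-Schwarz inequality states that for any two vectors |Ψ⟩ and |Φ⟩, we
have
|⟨Ψ|Φ⟩| ≤ ‖Ψ‖ ‖Φ‖ . (3.152)
To prove this for |Φ⟩ ≠ 0, choose an orthonormal basis |𝐵𝑖 ⟩ whose first vector is the normalized |Φ⟩:
\[ |B_1\rangle \equiv \frac{|\Phi\rangle}{\|\Phi\|}. \tag{3.153} \]
Such a basis can always be generated using a method called the Gram-Schmidt process, which
we will not describe here.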
Then, using the completeness relation (3.81), we find:
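\[ \begin{aligned} \|\Psi\|^2 \|\Phi\|^2 &= \langle\Psi|\Psi\rangle \|\Phi\|^2 \\ &= \langle\Psi| \left( \sum_{i=1}^{n} |B_i\rangle\langle B_i| \right) |\Psi\rangle \|\Phi\|^2 \\ &= \langle\Psi| \left( |B_1\rangle\langle B_1| + \sum_{i=2}^{n} |B_i\rangle\langle B_i| \right) |\Psi\rangle \|\Phi\|^2 \\ &= \langle\Psi| \left( \frac{1}{\|\Phi\|^2} |\Phi\rangle\langle\Phi| + \sum_{i=2}^{n} |B_i\rangle\langle B_i| \right) |\Psi\rangle \|\Phi\|^2 \\ &= \left( \frac{1}{\|\Phi\|^2} \langle\Psi|\Phi\rangle\langle\Phi|\Psi\rangle + \sum_{i=2}^{n} \langle\Psi|B_i\rangle\langle B_i|\Psi\rangle \right) \|\Phi\|^2 \\ &= \left( \frac{1}{\|\Phi\|^2} |\langle\Psi|\Phi\rangle|^2 + \sum_{i=2}^{n} |\langle\Psi|B_i\rangle|^2 \right) \|\Phi\|^2 \\ &= |\langle\Psi|\Phi\rangle|^2 + \sum_{i=2}^{n} |\langle\Psi|B_i\rangle|^2 \|\Phi\|^2 \\ &\geq |\langle\Psi|\Phi\rangle|^2. \end{aligned} \]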
principle can take any real value. For simplicity, we will focus on discrete random
variables here.
A (discrete) probability distribution assigns a probability to each value of a random
variable. We denote by 𝑃 (𝑋 = 𝑥) the probability that the random variable 𝑋 will
have the value 𝑥. A probability is a number between 0 and 1, which denotes how
likely it is (in percentage) for the value to occur, so 0 means this value never
occurs and 1 (= 100%) means this value always occurs.
The probabilities for all the possible values must sum to 1, because if for example
they only sum to 0.9, this means that in 10% of the cases the random variable has
no value, which doesn’t really make sense. Also, if 𝑃 (𝑋 = 𝑥) = 0 then there must
be at least one other possible value that 𝑋 can take, since it will never evaluate
to 𝑥, and if 𝑃 (𝑋 = 𝑥) = 1 then there cannot be any other possible values that 𝑋
can take, since it always evaluates to 𝑥.
For example, for the coin toss we have
\[ P(X = 0) = \frac{1}{2}, \qquad P(X = 1) = \frac{1}{2}, \tag{3.154} \]
and for the 6sided die roll we have
\[ P(X = 1) = \frac{1}{6}, \qquad P(X = 2) = \frac{1}{6}, \qquad P(X = 3) = \frac{1}{6}, \tag{3.155} \]
\[ P(X = 4) = \frac{1}{6}, \qquad P(X = 5) = \frac{1}{6}, \qquad P(X = 6) = \frac{1}{6}. \tag{3.156} \]
Note how the probabilities sum to 1 in each case. Of course, we could also say that
maybe the coin toss results in heads only 49.9% of the time, and tails another
49.9% of the time, and the remaining 0.2% is the probability for the coin to
balance perfectly on its edge... But usually we ignore subtleties like this and
assume we have idealized coins. Similarly, we could also have a loaded coin
which lands on heads more or less frequently than it lands on tails, but usually
we assume that the coins are fair unless stated otherwise. The same discussion
applies to dice, with any number of sides: they are, by default, assumed to be
idealized and fair.
These probability distributions are uniform, since they assign the same probability
to each value of 𝑋. However, probability distributions need not be uniform. A
simple example is a loaded coin, which perhaps has
\[ P(X = 0) = \frac{1}{3}, \qquad P(X = 1) = \frac{2}{3}. \tag{3.157} \]
As a more interesting example, if we toss two fair coins 𝑋1 and 𝑋2 and define a
random variable to be the sum of the results, 𝑋 ≡ 𝑋1 + 𝑋2 , then we can get any
of the following 4 outcomes:
0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 2. (3.158)
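Each of these outcomes has the same probability,
\[ \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}, \tag{3.159} \]
but the outcome 1 appears twice; thus
\[ P(X = 0) = \frac{1}{4}, \qquad P(X = 1) = \frac{1}{2}, \qquad P(X = 2) = \frac{1}{4}. \tag{3.160} \]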
Of course, the probabilities still sum to 1.
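The counting argument for the two coins can be automated by enumerating all combinations of outcomes. The following Python sketch is an illustration only; the function name `sum_distribution` and the dictionary representation of a distribution (value mapped to probability) are assumptions. It uses exact fractions and treats the individual tosses as independent, so their probabilities multiply.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def sum_distribution(single, n):
    """Distribution of the sum of n independent copies of a random variable.

    `single` maps each possible value to its probability.
    """
    dist = defaultdict(Fraction)
    for outcome in product(single.items(), repeat=n):
        total = sum(value for value, _ in outcome)
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p  # independent events: probabilities multiply
        dist[total] += prob
    return dict(dist)

fair_coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
loaded_coin = {0: Fraction(1, 3), 1: Fraction(2, 3)}

print(sum_distribution(fair_coin, 2))    # the two fair coins discussed above
print(sum_distribution(loaded_coin, 2))  # the same sum for two loaded coins
```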
Exercise 3.59. Calculate the probability distribution for the sum of two rolls of a
6-sided die. This is known to players of role-playing games (such as Dungeons &
Dragons) as a “2d6”, where we define 𝑛d𝑁 to be the sum of 𝑛 rolls of an 𝑁-sided die.
because the total probability to get 𝑋 = 𝑥𝑖 is the sum of all the different probabilities
that involve 𝑋 = 𝑥𝑖 plus something else. To illustrate this, consider the
following random variables: whether or not you pass the course, and whether or
not you did the homework.
There are in total 4 different combinations, and their probabilities must sum to 1.
Maybe the probabilities are as follows:
                 did homework    didn’t do homework
pass                 40%                20%
don’t pass           10%                30%
Then clearly the total probability that you pass (whether or not you did the
homework) is 40% + 20% = 60%, and the total probability that you do not pass
is 10% + 30% = 40%. This is exactly what equation (3.162) means.
However, what you really want to know is the probability that you pass given that
you did the homework vs. the probability that you pass given that you did not do
the homework. This is called conditional probability. The probability for outcome
𝑋 given outcome 𝑌 is denoted 𝑃 (𝑋|𝑌 ), where | is read as “given that”. It is
related to 𝑃 (𝑋 ∩ 𝑌 ) as follows:
\[ P(X|Y) = \frac{P(X \cap Y)}{P(Y)}. \tag{3.169} \]
In other words, it is the probability that both 𝑋 and 𝑌 happened, divided by the
probability for 𝑌 to happen. Let us calculate:
\[ P(\text{pass} \mid \text{did homework}) = \frac{40\%}{40\% + 10\%} = 80\%, \tag{3.170} \]
\[ P(\text{pass} \mid \text{didn't do homework}) = \frac{20\%}{20\% + 30\%} = 40\%. \tag{3.171} \]
So you better do all the homework, because that doubles your chances of passing
the course!
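The same calculation can be written as a short Python sketch. The joint probabilities are the ones used in the text above; the dictionary layout and the function names `marginal_y` and `conditional` are assumptions made for this illustration.

```python
from fractions import Fraction

# Joint probabilities P(X and Y), keyed by (course result, homework status).
joint = {
    ("pass", "homework"): Fraction(40, 100),
    ("pass", "no homework"): Fraction(20, 100),
    ("fail", "homework"): Fraction(10, 100),
    ("fail", "no homework"): Fraction(30, 100),
}

def marginal_y(joint, y):
    """Total probability of the homework status y, summed over both course results."""
    return sum(p for (_, status), p in joint.items() if status == y)

def conditional(joint, x, y):
    """Conditional probability P(x | y) = P(x and y) / P(y)."""
    return joint[(x, y)] / marginal_y(joint, y)

print(conditional(joint, "pass", "homework"))     # 4/5, that is, 80%
print(conditional(joint, "pass", "no homework"))  # 2/5, that is, 40%
```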
Exercise 3.60. There are six more conditional probabilities that we did not calculate
here. Calculate them. What do you learn from the results?
Exercise 3.61. A test for COVID-19 has¹⁷ a 1% chance of false positive, i.e.
the result is positive but the patient isn’t actually sick, and a 1% chance of false
negative, i.e. the result is negative but the patient is actually sick. Assume that
0.1% of the population is actually sick.
¹⁷ FYI: This exercise is not based on any real data!
A. Fill in the blanks in the following table:
B. Given that you tested positive, what is the conditional probability that you
actually have COVID-19?
C. Given that you tested negative, what is the conditional probability that you
actually don’t have COVID-19?
D. Which result should you trust, a positive one or a negative one?
The expected value (or expectation value or mean) ⟨𝑋⟩ of a random variable 𝑋 is
the average over all the possible values 𝑋 can take, weighted by their assigned
probabilities:
\[ \langle X \rangle \equiv \sum_{i=1}^{N} P(X = x_i) \, x_i, \tag{3.173} \]
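\[ \langle X \rangle = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = \frac{7}{2} = 3.5. \tag{3.175} \]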
Observe that the expected value in both cases is not an actual value the random
variable can take! This is often the case with discrete random variables.
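We will now prove that the expected value is linear:
\[ \langle \alpha X \rangle = \alpha \langle X \rangle, \qquad \langle X + Y \rangle = \langle X \rangle + \langle Y \rangle. \tag{3.176} \]
The first rule is easy to prove:
\[ \begin{aligned} \langle \alpha X \rangle &= \sum_{i=1}^{N} P(\alpha X = \alpha x_i) \, (\alpha x_i) \\ &= \alpha \sum_{i=1}^{N} P(X = x_i) \, x_i \\ &= \alpha \langle X \rangle. \end{aligned} \]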
To prove the second part, let 𝑋 have 𝑁 possible values 𝑥𝑖 and let 𝑌 have 𝑀
possible values 𝑦𝑖 , as in the previous section. Then in calculating ⟨𝑋 + 𝑌 ⟩ we need
to sum over both 𝑁 and 𝑀 , to ensure we take all possible combinations of 𝑋 and
𝑌 into account. Using equation (3.162), we get:
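\[ \begin{aligned} \langle X + Y \rangle &= \sum_{i=1}^{N} \sum_{j=1}^{M} P(X = x_i, Y = y_j) \, (x_i + y_j) \\ &= \sum_{i=1}^{N} \left( \sum_{j=1}^{M} P(X = x_i, Y = y_j) \right) x_i + \sum_{j=1}^{M} \left( \sum_{i=1}^{N} P(X = x_i, Y = y_j) \right) y_j \\ &= \sum_{i=1}^{N} P(X = x_i) \, x_i + \sum_{j=1}^{M} P(Y = y_j) \, y_j \\ &= \langle X \rangle + \langle Y \rangle, \end{aligned} \]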
as we wanted to prove.
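Linearity can also be checked numerically on a small example. The Python sketch below is an illustration with assumed names and inputs: it combines a fair 6-sided die with the loaded coin from earlier, and for simplicity it assumes the two variables are independent, so the joint probability is a product (the proof above does not need this assumption).

```python
from fractions import Fraction
from itertools import product

def expected(dist):
    """Expected value of a distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())

die = {k: Fraction(1, 6) for k in range(1, 7)}
loaded_coin = {0: Fraction(1, 3), 1: Fraction(2, 3)}

# Distribution of the sum X + Y, built from all combinations of outcomes.
# Independence is assumed here, so P(X = x, Y = y) = P(X = x) * P(Y = y).
sum_dist = {}
for (x, px), (y, py) in product(die.items(), loaded_coin.items()):
    sum_dist[x + y] = sum_dist.get(x + y, Fraction(0)) + px * py

print(expected(sum_dist))                    # computed directly from the sum
print(expected(die) + expected(loaded_coin)) # computed using linearity
```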
Exercise 3.62. Calculate the expected value for the sum of two coin tosses and
for a 2d6 roll (the sum of two 6-sided dice). First, do it by defining one random
variable for the sum, calculating the probabilities, and then using the definition
of the expected value. Then, do it by considering just one coin or one 6-sided die
respectively, and use equation (3.176). Compare your results.
The standard deviation¹⁸ measures how far the outcomes are expected to be from
the expected value. To calculate the standard deviation, we take the expected
value of (𝑋 − ⟨𝑋⟩)², that is, the square of the difference between the actual value
of 𝑋 and its expected value ⟨𝑋⟩. Then, we take the square root of the result to
obtain the standard deviation Δ𝑋:
\[ \Delta X \equiv \sqrt{\left\langle (X - \langle X \rangle)^2 \right\rangle}. \tag{3.178} \]
¹⁸ By the way, the square of the standard deviation is called the variance, but it will not interest
us in this course.
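To simplify this, first we note that
\[ (X - \langle X \rangle)^2 = X^2 - 2X \langle X \rangle + \langle X \rangle^2. \tag{3.179} \]
In this formula, 𝑋² is a random variable (whose values are the squares of the
values of 𝑋), but ⟨𝑋⟩ is just a number, not a random variable. Since it is a
number, we can treat it as a random variable that only returns one value with
100% probability, which means that its expected value is the number itself. Using
the linearity of the expected value, we then get
\[ \left\langle (X - \langle X \rangle)^2 \right\rangle = \langle X^2 \rangle - 2 \langle X \rangle \langle X \rangle + \langle X \rangle^2 = \langle X^2 \rangle - \langle X \rangle^2, \tag{3.180} \]
and therefore
\[ \Delta X = \sqrt{\langle X^2 \rangle - \langle X \rangle^2}. \tag{3.181} \]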
This form is easier to do calculations with. For example, for the coin toss we have
from before
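\[ \langle X \rangle = \frac{1}{2}, \tag{3.182} \]
and we also calculate:
\[ \langle X^2 \rangle = \frac{1}{2} \cdot 0^2 + \frac{1}{2} \cdot 1^2 = \frac{1}{2}, \tag{3.183} \]
which gives us
\[ \Delta X = \sqrt{\frac{1}{2} - \frac{1}{4}} = \frac{1}{2}. \tag{3.184} \]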
This makes sense, as the two actual values of the outcomes, 0 and 1, lie exactly
1/2 away from the expected value ⟨𝑋⟩ = 1/2 in each direction. So they each
“deviate” from it by 1/2.
For the die roll, we have from before
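\[ \langle X \rangle = \frac{7}{2}, \tag{3.185} \]
and we also calculate:
\[ \langle X^2 \rangle = \frac{1}{6} \left( 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 \right) = \frac{91}{6}, \tag{3.186} \]
which gives us
\[ \Delta X = \sqrt{\frac{91}{6} - \frac{49}{4}} = \sqrt{\frac{35}{12}} \approx 1.7. \tag{3.187} \]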
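The two worked examples can be reproduced with a few lines of Python. This is a minimal sketch: the function names `expected` and `std` and the dictionary representation of a distribution are assumptions made for illustration. The standard deviation is computed as the square root of ⟨𝑋²⟩ minus the square of ⟨𝑋⟩.

```python
import math
from fractions import Fraction

def expected(dist, f=lambda x: x):
    """Expected value of f(X) for a distribution given as {value: probability}."""
    return sum(p * f(x) for x, p in dist.items())

def std(dist):
    """Standard deviation: square root of <X^2> - <X>^2."""
    mean = expected(dist)
    mean_of_square = expected(dist, lambda x: x * x)
    return math.sqrt(mean_of_square - mean ** 2)

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {k: Fraction(1, 6) for k in range(1, 7)}

print(std(coin))  # 0.5
print(std(die))   # sqrt(35/12), approximately 1.7
```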
Exercise 3.63. Calculate the standard deviation for the sum of two coin tosses
and for a 2d6 roll.
The normal (or Gaussian) distribution is depicted in figure 3.2. Unlike the distributions
we have considered so far, it is continuous; but we won’t worry about
that right now. The shape of the distribution is a “bell curve”, centered on some
mean (or expected) value 𝜇 (equal to 0 in the plot) and with a standard deviation
𝜎. The value of 𝜇 can be any real number, and 𝜎 can be any positive real number.
[Figure 3.2: the normal distribution, with the horizontal axis marked from −3𝜎 to 3𝜎.]
The “68-95-99.7 rule” tells us the fraction of outcomes which lie within 1, 2 and
3 standard deviations of the mean: roughly 68% of the outcomes lie within 1
standard deviation, 95% within 2, and 99.7% within 3.
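The percentages in the rule can be reproduced numerically. For a normal distribution, the fraction of outcomes within 𝑘 standard deviations of the mean is given by the error function evaluated at 𝑘/√2, a standard result that is not derived in these notes. The helper name `fraction_within` in this Python sketch is an assumption.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 2))  # 68.27, 95.45, 99.73
```

The result does not depend on the values of 𝜇 and 𝜎, which is why the rule can be stated without reference to them.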
The normal distribution is the most common probability distribution you will encounter
in this course, and in physics and math in general. The reason for that
is that there is a theorem, the central limit theorem, which states that whenever
we take the sum of independent random variables, the probability distribution of
the sum will gradually start to look like a normal distribution. As we add more
and more variables, the sum will get closer and closer to a normal distribution.
This can already be seen in the case of the die rolls. For a 1d6 roll we have a
uniform distribution, as depicted in figure 3.3. For a 2d6 roll, we get a triangular
distribution centered at the mean value of 7, as depicted in figure 3.4. When
solving exercise 3.59, you found that the probability for each possible combination
of die rolls is 1/6 ⋅ 1/6 = 1/36, but as for the sum of the rolls, the outcomes 2 and 12
appear only once (corresponding to 1+1 and 6+6 respectively), while the outcome
7 appears six times (corresponding to 1+6, 2+5, 3+4, 4+3, 5+2, and 6+1) and
thus has a probability of 6/36 = 1/6, and so on.
For a 3d6 roll, the sum of three rolls of a 6-sided die, as depicted in figure 3.5,
we see that the probability distribution is starting to obtain the signature “bell”
shape of the normal distribution. Its mean value is 10.5, as you can calculate
(3 × 3.5). We will get closer and closer to a normal distribution as we increase the
number of dice, that is, the 𝑛 in 𝑛d6. In the limit 𝑛 → ∞, we will precisely obtain
a normal distribution, but even for small values of 𝑛, the approximation is already
close enough for most practical purposes.
Figure 3.3: The distribution of results for one roll of a 6-sided die, also
known as 1d6. It is a uniform distribution.
Exercise 3.64. Plot the probability distributions of the sum of 𝑛 coin tosses, from
𝑛 = 1 and up to a value of 𝑛 large enough for the distribution to start looking like
a normal distribution.
Problem 3.65. Write a computer program (I recommend using either Mathematica
or Python) that will generate a plot of the probability distribution for an 𝑛d𝑠
roll with an arbitrary number of rolls 𝑛 and an arbitrary number of sides 𝑠 (where
𝑠 = 2 corresponds to a coin). It should also plot the continuous normal distribution
(with the correct mean and standard deviation) over the discrete distribution, to
check how closely they match. Generate some plots using your program, and use
them to demonstrate the central limit theorem for different values of 𝑛 and 𝑠.
Figure 3.4: The distribution of results for the sum of two rolls of a
6-sided die, also known as 2d6. It is triangular.
Figure 3.5: The distribution of results for the sum of three rolls of a
6-sided die, also known as 3d6. It is starting to obtain the “bell” shape
of a normal distribution.
Now that we have obtained the required mathematical tools, we can finally present
quantum theory! This theory provides the correct fundamental framework for virtually
all of known physics. We will see that its fundamental ingredients are Hilbert
spaces with states and operators. These universal ingredients are then used to
create particular models describing specific physical systems.
In this chapter, we will work exclusively with discrete quantum systems, which
are based on finite-dimensional Hilbert spaces. These are much simpler than
continuous quantum systems, which are based on infinite-dimensional Hilbert
spaces. In particular, the math is much simpler – just linear algebra, without any
calculus. However, it turns out that finite-dimensional Hilbert spaces are sufficient
to define all of the fundamental concepts in quantum theory, and derive almost
all of the most important results.
Consider the fine-structure constant, which represents the strength of the
electromagnetic interaction:
𝛼 ≈ 0.0073. (4.1)
This constant is not specified in any particular units, such as meters or seconds; it
is a pure number. We call such a constant dimensionless.
In contrast, some constants in physics are dimensionful. This means that their
numerical value depends on the system of units we use. For example, the speed
of light 𝑐 has the following values in different systems of units:
What this means is that the numerical value of the speed of light does not have
any physical meaning whatsoever![19] It is merely a consequence of choosing to
work with one system of units and not the other. But units are human constructs;
the universe could not care less what units humans choose to measure things
with. Therefore, none of the numbers written above have any actual meaning.
The numerical values of dimensionless constants are the only numbers that
have a physical meaning, as they do not depend on the system of units. However,
keep in mind that they are still, unavoidably, just parameters that are defined by
humans in a certain way. There’s nothing special about the number 𝛼 ≈ 0.0073
itself; we could also define another parameter 𝛽 ≡ 2𝛼 and use that in our equations
instead. So don’t try to do numerology[20] with the specific value of 𝛼! What is
important here is not the numerical value itself, but the fact that it is independent
of the choice of units.
[19] Indeed, in modern SI units the speed of light is defined to be 299,792,458 meters per second,
and this definition is used to measure the length of a meter – not the other way around.

[20] Interestingly, in the past some physicists tried to claim that 𝛼 equals exactly 1/137, but more
precise measurements revealed that this is not actually the case. Still, you will often see it written
as 1/137 for that historical reason.
For this reason, it is most natural to work in Planck units, where:
$$c = G = \hbar = \frac{1}{4\pi\varepsilon_0} = k_B = 1. \tag{4.2}$$
Here 𝑐 is the speed of light, 𝐺 is the gravitational constant used in Newtonian gravity,
ℏ is the (reduced) Planck constant used in quantum mechanics, 1/(4𝜋𝜀₀) is the
Coulomb constant used in electromagnetism, and 𝑘𝐵 is the Boltzmann constant
used in statistical mechanics.
All of these are dimensionful constants, which means we don’t really care about
their numerical values – so we might as well just set them to 1. This allows us to
simply remove them from our equations. For example, instead of writing √(ℏ𝐺/𝑐³)
– also known as the Planck length – we just write 1, and this allows us to write
the equation 𝐴 = ℏ𝐺𝛾√(𝑗(𝑗 + 1))/𝑐³ as 𝐴 = 𝛾√(𝑗(𝑗 + 1)). Much simpler, right?[21]
Planck units are commonly used when doing research in theoretical physics, because
they make equations simpler, more elegant, and less cluttered. However,
sometimes we get numerical results that we wish to convert to real-world units
such as kilograms and meters. To do this, all we need to do is to find the combination
of the constants in equation (4.2) that has the desired units. For example,
if we know that our pure number represents length, then we can multiply it by
the Planck length √(ℏ𝐺/𝑐³) ≈ 1.6 × 10⁻³⁵ meters to find its value in meters.
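The conversion for lengths can be sketched in a few lines of Python. This is only an illustration: the names `PLANCK_LENGTH`, `planck_to_meters`, and `meters_to_planck` are made up for this example, and the SI values of the constants are the CODATA 2018 values, typed in by hand.

```python
import math

# CODATA 2018 values in SI units, hard-coded so the sketch is self-contained.
HBAR = 1.054571817e-34  # reduced Planck constant, in J*s
G = 6.67430e-11         # Newtonian gravitational constant, in m^3/(kg*s^2)
C = 299_792_458.0       # speed of light, in m/s (exact by definition)

# The combination of the constants in equation (4.2) that has units of length.
PLANCK_LENGTH = math.sqrt(HBAR * G / C**3)  # in meters


def planck_to_meters(length_in_planck_units):
    """Convert a length given as a pure number (Planck units) to meters."""
    return length_in_planck_units * PLANCK_LENGTH


def meters_to_planck(length_in_meters):
    """Convert a length in meters to a pure number (Planck units)."""
    return length_in_meters / PLANCK_LENGTH


print(f"Planck length: {PLANCK_LENGTH:.2e} m")
print(f"1 meter in Planck units: {meters_to_planck(1.0):.2e}")
```

The same pattern should work for any other quantity, once the combination of constants with the right units has been found.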
Since this course is taught by a theorist, we will use Planck units exclusively. This
means that unlike in a traditional quantum mechanics course, ℏ will not appear
in any of our equations!
Exercise 4.1. Calculate your age, height, mass, and body temperature in Planck
units. For this, you will have to find combinations of the dimensionful constants
we set to 1 in equation (4.2) that give you the desired units, as we did for the
Planck length.
Recall that in section 3.2.2 we defined a Hilbert space as a vector space with an
inner product that is also a complete metric space with respect to that inner product.
Quantum theory can be defined axiomatically using the theory of Hilbert
spaces. In this chapter we will list a total of seven fundamental axioms, plus an
eighth axiom that may or may not be fundamental.
The System Axiom: A system in quantum theory is the mathematical representation
of a physical system (such as a particle) as a Hilbert space. The type and
dimension of the Hilbert space will depend on the particular system. Note that
the dimension of the Hilbert space is unrelated to the dimension of spacetime.
In the finite-dimensional case, for example when the system involves spin, the
Hilbert space will usually be ℂⁿ for some 𝑛, such as ℂ², which was used in
most of the examples above and will continue to be used below. In the infinite-dimensional
case, for example when the system involves position and momentum
(which are, in general, continuous and not discrete), the Hilbert space will usually
be a space of functions, which is much more complicated.

[21] This is the equation for the eigenvalues of the area operator in loop quantum gravity. We will
learn about operators in the next section.
The State Axiom: A state of a quantum system is a vector with unit norm in the
system’s Hilbert space, that is, a vector |Ψ⟩ which satisfies
$$\|\Psi\| = \sqrt{\langle\Psi|\Psi\rangle} = 1. \tag{4.3}$$
States represent the different configurations the system can have. It is important
to stress that only unit vectors can represent states. If for some reason we have
a vector with non-unit norm, we must normalize it (divide it by its norm) to obtain
a unit vector, which can then represent a state.
Another important aspect of states is that they are only defined up to a complex
phase. This means that, if the vector |Ψ⟩ represents a state, then all vectors of
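the form e^{i𝜙} |Ψ⟩ for 𝜙 ∈ ℝ represent the same[22] state as |Ψ⟩. Note that adding a
phase to a vector does not change the norm, since
$$\|\mathrm{e}^{\mathrm{i}\phi}\Psi\| = \sqrt{\left(\mathrm{e}^{\mathrm{i}\phi}|\Psi\rangle\right)^\dagger \mathrm{e}^{\mathrm{i}\phi}|\Psi\rangle} = \sqrt{\langle\Psi|\,\mathrm{e}^{-\mathrm{i}\phi}\mathrm{e}^{\mathrm{i}\phi}\,|\Psi\rangle} = \sqrt{\langle\Psi|\Psi\rangle} = \|\Psi\|. \tag{4.4}$$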
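Normalization and phase invariance are easy to check numerically. The following is a minimal sketch using NumPy; the helper `normalize`, the example vector, and the phase angle are arbitrary choices made for illustration.

```python
import numpy as np


def normalize(vector):
    """Divide a non-zero vector by its norm to obtain a unit vector."""
    vector = np.asarray(vector, dtype=complex)
    norm = np.linalg.norm(vector)
    if norm == 0:
        raise ValueError("the zero vector cannot represent a state")
    return vector / norm


# An arbitrary vector in C^2 with non-unit norm, normalized into a state.
psi = normalize([1 + 1j, 2])

# Multiply the state by a complex phase with an arbitrary real angle phi.
phi = 0.73
psi_with_phase = np.exp(1j * phi) * psi

norm_psi = np.linalg.norm(psi)
norm_with_phase = np.linalg.norm(psi_with_phase)
print(norm_psi, norm_with_phase)  # both equal 1 up to rounding errors
```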
where |Ψ⟩ and |Φ⟩ are vectors and 𝜆 is a scalar. In the discrete case, operators
are just matrices on ℂⁿ. In the continuous case, where the vectors are actually
functions, the operators will be derivatives acting on the functions[23]. In quantum
theory, operators transform states into other states, and they represent an action
performed on the system, such as a measurement, a transformation, or an
evolution in time.

[22] Actually, the more precise definition is that a state is a ray in a Hilbert space. Rays are defined
as equivalence classes of vectors such that a vector |Ψ⟩ is equivalent to 𝜆 |Ψ⟩ for any scalar 𝜆 ∈ ℂ.
The scalar can be separated into a polar representation, 𝜆 = 𝑟 e^{i𝜙}, as we discussed in section 3.1.4.
The 𝑟 part stretches the magnitude of the vector by a factor of 𝑟, and the e^{i𝜙} part (the phase)
rotates it by 𝜙 radians. Any vector in the same equivalence class represents the same state, so
multiplying the vector by a scalar will not change the state it represents, whatever the magnitude
and phase are. However, it is conventional to choose states to be represented specifically by a unit
vector from the equivalence class, since otherwise we would have to normalize vectors to 1 all the
time. This is also the reason we only use orthonormal bases in quantum theory.

[23] It is interesting to note that there is often still a sense in which operators and states in a
continuous Hilbert space have the equivalent of indices and elements. You will learn about this in
Let the state of a quantum system be |Ψ⟩. Once we have chosen a Hermitian operator
to represent our observable, we may obtain an orthonormal basis of states
|𝐵𝑖 ⟩ corresponding to the eigenvectors of that operator. In quantum mechanics,
these eigenvectors are called eigenstates.
The Probability Axiom: The inner product ⟨𝐵𝑖|Ψ⟩ is called the probability amplitude
to measure the eigenvalue 𝜆𝑖 corresponding to the eigenstate |𝐵𝑖⟩, given
the state |Ψ⟩. When we take the magnitude-squared of a probability amplitude,
we get the corresponding probability. Thus, the probability to measure 𝜆𝑖 is
$$|\langle B_i|\Psi\rangle|^2. \tag{4.6}$$
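As a concrete illustration, here is a minimal NumPy sketch that computes the probability amplitudes and probabilities for a given observable and state. The function name `measurement_probabilities` and the example state are assumptions made for this example; the observable is the Pauli matrix 𝜎𝑦, and the eigenstates are obtained from `numpy.linalg.eigh`, which returns the eigenvalues of a Hermitian matrix in ascending order together with orthonormal eigenvectors as columns.

```python
import numpy as np


def measurement_probabilities(observable, state):
    """Return the eigenvalues of a Hermitian observable and the probability
    to measure each of them, given a normalized state."""
    eigenvalues, eigenvectors = np.linalg.eigh(observable)
    # Column i of `eigenvectors` is the eigenstate |B_i>, so row i of the
    # conjugate transpose is the bra <B_i|, and this product collects all
    # of the probability amplitudes <B_i|Psi>.
    amplitudes = eigenvectors.conj().T @ state
    return eigenvalues, np.abs(amplitudes) ** 2


sigma_y = np.array([[0, -1j], [1j, 0]])
psi = np.array([1, 2j]) / np.sqrt(5)  # an example state, normalized to 1

eigenvalues, probabilities = measurement_probabilities(sigma_y, psi)
for eigenvalue, probability in zip(eigenvalues, probabilities):
    print(f"eigenvalue {eigenvalue:+.0f}: probability {probability:.2f}")
```

The eigenvectors returned by `eigh` come with arbitrary phases, but this does not matter, since the phase drops out when we take the magnitude-squared.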
The first four axioms that we presented here simply defined the meaning of systems,
states, operators, and observables in mathematical terms. The Probability
Axiom, on the other hand, has to do with the relations between these mathematical
structures. One can thus justifiably ask: why would this be a probability in
the first place?
Unfortunately, since this is an axiom, it cannot be derived from anything more
fundamental, such as other axioms. However, at the very least, we can verify
that it indeed behaves exactly like a probability is expected to. This follows from
the fact that
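$$\begin{aligned}
\sum_{i=1}^{n} |\langle B_i|\Psi\rangle|^2 &= \sum_{i=1}^{n} \langle B_i|\Psi\rangle^* \langle B_i|\Psi\rangle \\
&= \sum_{i=1}^{n} \langle\Psi|B_i\rangle\langle B_i|\Psi\rangle \\
&= \langle\Psi| \left(\sum_{i=1}^{n} |B_i\rangle\langle B_i|\right) |\Psi\rangle \\
&= \langle\Psi|\Psi\rangle \\
&= 1,
\end{aligned}$$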
• Taking the complex conjugate of an inner product switches the order of the
We could write down a classical theory which assigns probabilities to each measurement
outcome; but since probabilities must be real non-negative numbers,
when they are added the result is always a higher probability. Therefore, classical
probabilities interfere only constructively. In quantum theory, on the other
hand, one does not add probabilities, but probability amplitudes; and as we will
see, they can interfere with one another both constructively and destructively,
just as we discussed in section 2.1.3.
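The difference between adding probabilities and adding probability amplitudes can be seen in a toy calculation. In the sketch below, the two amplitudes and the function names `classical_total` and `quantum_total` are purely illustrative assumptions; they do not describe any particular physical system.

```python
import cmath
import math


def classical_total(p1, p2):
    """Classical rule: add the (real, non-negative) probabilities."""
    return p1 + p2


def quantum_total(a1, a2):
    """Quantum rule: add the complex amplitudes, then take the
    magnitude-squared of the sum."""
    return abs(a1 + a2) ** 2


a1 = 0.5  # the first amplitude is kept fixed
for delta in (0.0, math.pi / 2, math.pi):
    # The second amplitude has the same magnitude but a relative phase delta.
    a2 = 0.5 * cmath.exp(1j * delta)
    classical = classical_total(abs(a1) ** 2, abs(a2) ** 2)
    quantum = quantum_total(a1, a2)
    print(f"phase {delta:.2f}: classical {classical:.2f}, quantum {quantum:.2f}")
```

The classical sum does not depend on the relative phase at all, while the quantum result ranges from fully constructive (zero relative phase) to fully destructive (relative phase 𝜋).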
If the probability amplitudes for two events have opposite complex phases (for
example, one is positive and one is negative) they can even cancel each other out
completely – so that neither event happens, since their total probability amplitude
(and thus also probability) is zero! This, of course, can never happen with classical
Exercise 4.2. In the Hilbert space ℂ², consider the Hermitian operator
$$\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{4.7}$$
Find its eigenstates (make sure they are normalized to 1!) and eigenvalues. Then,
calculate the probability to measure each of the eigenvalues given that the system
is in the state
$$|\Psi\rangle \equiv \frac{1}{\sqrt{10}} \begin{pmatrix} 1 \\ 3 \end{pmatrix}. \tag{4.8}$$
4.1.5 Superposition
Remember that each coefficient ⟨𝐵𝑖 |Ψ⟩ is the probability amplitude to measure
the eigenvalue corresponding to the eigenstate |𝐵𝑖 ⟩ given that the system is in
the state |Ψ⟩. So this is a sum over the basis states |𝐵𝑖 ⟩, corresponding to the
possible measurement outcomes, with a probability amplitude attached to each
of these outcomes, which depends on the state |Ψ⟩. Such a linear combination of
states[25] is called a superposition.

[25] More generally, a superposition is any linear combination of states. The states don’t have to
be basis eigenstates and the coefficients don’t have to be probability amplitudes – but they usually
The concept of superposition is responsible for many of the weird properties of
quantum mechanics, as we will soon see. Importantly, superposition is not an
axiom, but simply an (almost trivial) mathematical property of vectors in Hilbert
spaces. This means that superposition follows automatically from the previous
axioms; it is not something that needs to be introduced separately.
You will often hear people (including physicists, if they are being sloppy) say that
superposition means that “the system is in multiple states at the same time”. For
example, it is frequently said about particles – which can be in a superposition
of eigenstates corresponding to different outcomes for the measurement of their
position – that “the particle is in multiple places at the same time”. However, this
is a common misconception – or at the very least, an overly literal interpretation
of the math.
The fact that a state |Ψ⟩ can be written in a superposition of eigenstates |𝐵𝑖 ⟩
doesn’t mean that the system is actually “in” all of these different states at once.
The system is, in fact, in only one state: the state |Ψ⟩. This state can be represented
in the eigenbasis |𝐵𝑖⟩, and doing this reveals the probability to measure
each of the eigenvalues. However, one can always find[26] an orthonormal basis
where |Ψ⟩ itself is one of the basis states – and often, this can be an eigenbasis
corresponding to another observable of the system. In that basis, the system
will not be in a superposition – it will just be in the state |Ψ⟩, with a probability
amplitude of ⟨Ψ|Ψ⟩ = 1!
So instead of saying that “the system is in all of the states |𝐵1⟩, …, |𝐵𝑛⟩ at once”, it
is more precise to say that the system is currently in the state |Ψ⟩, and a measurement
of the observable with the eigenbasis |𝐵𝑖⟩ could yield different outcomes,
with the probability amplitude for outcome number 𝑖 given by the projection[27] of
|Ψ⟩ on |𝐵𝑖⟩, calculated by taking the inner product ⟨𝐵𝑖|Ψ⟩. It sounds less cool and
mysterious, but it is more accurate and less prone to confusion and misinterpretation.
Of course, this description is too technical for the average person, which is why
physicists usually choose to just say, incorrectly, that “the system is in multiple
states at the same time”. But now that you actually know the math of quantum
theory, you should be able to understand the correct definition of superposition!
I will let you digest all of this for now, and in section 4.2.4 we will discuss an
analogy, using a concrete quantum system, that should help you understand this
[26] Using the Gram-Schmidt process mentioned in footnote (15).

[27] In ℝⁿ, the projection of v on w (or w on v) is given by the dot product v ⋅ w. Projections in ℂⁿ
generalize this concept, with the inner product replacing the dot product.
Exercise 4.3. Consider again the Hermitian operator from exercise 4.2,
$$\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{4.10}$$
Exercise 4.4. A quantum system described by the Hilbert space ℂ³ has an observable
corresponding to a Hermitian operator 𝐴 with the matrix representation
$$A \equiv \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}. \tag{4.11}$$
A. Find its eigenvalues and their corresponding eigenstates. Make sure the states
are normalized to 1.
B. Find three different states such that a measurement of the observable 𝐴
will produce the lowest eigenvalue with probability 1/7, the highest eigenvalue
with probability 2/7, and the middle eigenvalue with probability 4/7. When we
say different states, we mean that the vectors that represent them cannot be
scalar multiples of each other; recall from footnote (22) that such vectors are in
the same equivalence class, and thus represent the same state. Make sure the
states are normalized to 1.
C. Write the state
$$|\Psi\rangle \equiv \frac{1}{\sqrt{15}} \begin{pmatrix} 1 \\ -2 \\ 3 - \mathrm{i} \end{pmatrix} \tag{4.12}$$
as a superposition of eigenstates of 𝐴, and calculate the probabilities to measure
each eigenvalue of 𝐴 given that the system is in the state |Ψ⟩. Verify that the
probabilities sum to 1.
where 𝛿𝑖𝑗 is the Kronecker delta, which we defined in equation (3.53):
$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases} \tag{4.14}$$
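$$\begin{aligned}
\langle\Psi|A|\Psi\rangle &= \langle\Psi| \left(\sum_{i=1}^{n} |B_i\rangle\langle B_i|\right) A \left(\sum_{j=1}^{n} |B_j\rangle\langle B_j|\right) |\Psi\rangle \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \langle\Psi|B_i\rangle\langle B_i|A|B_j\rangle\langle B_j|\Psi\rangle \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_j \delta_{ij} \langle\Psi|B_i\rangle\langle B_j|\Psi\rangle.
\end{aligned}$$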
When taking the sum over 𝑗, the Kronecker delta 𝛿𝑖𝑗 is always 0 except when 𝑗 = 𝑖.
Therefore the sum over 𝑗 always reduces to just one element, the one where 𝑗 = 𝑖.
We get:
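$$\begin{aligned}
\langle\Psi|A|\Psi\rangle &= \sum_{i=1}^{n} \lambda_i \langle\Psi|B_i\rangle\langle B_i|\Psi\rangle \\
&= \sum_{i=1}^{n} \lambda_i \langle\Psi|B_i\rangle\langle\Psi|B_i\rangle^* \\
&= \sum_{i=1}^{n} \lambda_i |\langle\Psi|B_i\rangle|^2,
\end{aligned}$$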
where in the second line we used the fact that switching the order of the vectors
in the inner product is equivalent to taking the complex conjugate.
Recall that |⟨Ψ|𝐵𝑖⟩|² is the probability to measure the eigenvalue 𝜆𝑖 associated with
the eigenstate |𝐵𝑖⟩ given the state |Ψ⟩. Therefore, this is a sum of the possible
values of the measurement of 𝐴, weighted by their probabilities. But this is exactly
the expected value for the measurement of 𝐴, as we defined in equation (3.173).
For this reason, we sometimes simply write ⟨𝐴⟩ (the usual notation for the expected
value) instead of ⟨Ψ|𝐴|Ψ⟩, as long as it is clear that the expected value is
taken with respect to the state |Ψ⟩. If we want to specify the state explicitly, we
can also use the notation
$$\langle A\rangle_\Psi \equiv \langle\Psi|A|\Psi\rangle. \tag{4.16}$$
Note that the terms “expected value” and “expectation value” are often used
interchangeably, but the former seems to be more popular in classical probability
theory while the latter is more popular in quantum theory.
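The equality between ⟨Ψ|𝐴|Ψ⟩ and the probability-weighted sum of eigenvalues can be checked numerically. The sketch below uses NumPy; the Hermitian matrix `H`, the state `psi`, and both function names are arbitrary choices made for this illustration.

```python
import numpy as np


def expectation_direct(observable, state):
    """Compute <Psi|A|Psi> directly as a matrix product."""
    # np.vdot takes the complex conjugate of its first argument.
    return np.vdot(state, observable @ state).real


def expectation_from_probabilities(observable, state):
    """Compute the sum of the eigenvalues weighted by their probabilities."""
    eigenvalues, eigenvectors = np.linalg.eigh(observable)
    probabilities = np.abs(eigenvectors.conj().T @ state) ** 2
    return float(np.sum(eigenvalues * probabilities))


# An arbitrary 2x2 Hermitian matrix and an arbitrary normalized state.
H = np.array([[2, 1 - 1j], [1 + 1j, 3]])
psi = np.array([1, 1]) / np.sqrt(2)

direct = expectation_direct(H, psi)
weighted = expectation_from_probabilities(H, psi)
print(direct, weighted)  # the two values agree up to rounding errors
```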
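$$A \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{4.17}$$
$$|\Psi_1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \tag{4.18}$$
$$|\Psi_2\rangle = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \tag{4.19}$$
$$|\Psi_3\rangle = \frac{1}{\sqrt{13}} \begin{pmatrix} 3 \\ 2 \end{pmatrix}. \tag{4.20}$$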
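Exercise 4.6. Calculate ⟨𝐴⟩_Ψ for 𝐴 and |Ψ⟩ as defined in exercise 4.4:
$$A \equiv \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \qquad |\Psi\rangle \equiv \frac{1}{\sqrt{15}} \begin{pmatrix} 1 \\ -2 \\ 3 - \mathrm{i} \end{pmatrix}. \tag{4.21}$$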
Then, calculate the expected value explicitly as defined in equation (3.173), using
the probabilities you calculated in part (C) of exercise 4.4, and verify that you get
the same result.
To summarize, here are the axioms of quantum theory we formulated so far. Here
we formulate them specifically for discrete systems with finite-dimensional Hilbert
spaces.
1. The System Axiom: Discrete physical systems are represented by complex
𝑛-dimensional Hilbert spaces ℂⁿ, where 𝑛 depends on the specific system.
2. The State Axiom: The states of the system are represented by unit 𝑛-vectors
in the system’s Hilbert space, up to a complex phase.
3. The Operator Axiom: The operators on the system, which act on states
to produce other states, are represented by 𝑛 × 𝑛 matrices in the system’s
Hilbert space.
• Expectation Value: If the system is in the state |Ψ⟩, the expectation value
for the measurement of the observable 𝐴 is given by ⟨Ψ|𝐴|Ψ⟩.
There are some more axioms that we will add later, but first let us discuss a
concrete example of a physical quantum system and see these axioms in action.
Problem 4.7. Are these axioms enough to actually do physics? If not, what do
you think is missing and why?
a 2-dimensional Hilbert space, and is thus called a two-state system. All such
systems can also be used as qubits, or quantum bits – where one state (doesn’t
matter which one) corresponds to 0 and the other state corresponds to 1. Let us
now describe such systems in detail.
$$\sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y \equiv \begin{pmatrix} 0 & -\mathrm{i} \\ \mathrm{i} & 0 \end{pmatrix}, \qquad \sigma_z \equiv \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{4.23}$$
As the notation suggests, each matrix is associated with a spatial axis: 𝑥, 𝑦, and
𝑧. These three matrices have the following properties (here 𝑖 stands for 𝑥, 𝑦, or 𝑧):
• They are Hermitian: 𝜎𝑖† = 𝜎𝑖. This means they can represent observables.
• They are unitary: 𝜎𝑖† = 𝜎𝑖⁻¹. This means they can represent transformations.
– Since they are both Hermitian and unitary, they are their own inverse:
𝜎𝑖 = 𝜎𝑖† = 𝜎𝑖⁻¹. This means that 𝜎𝑖² = 1. A matrix which is its own inverse
is called involutory.
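$$|{+x}\rangle \equiv |{+}\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad |{-x}\rangle \equiv |{-}\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \tag{4.24}$$
$$|{+y}\rangle \equiv |{+\mathrm{i}}\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ \mathrm{i} \end{pmatrix}, \qquad |{-y}\rangle \equiv |{-\mathrm{i}}\rangle \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -\mathrm{i} \end{pmatrix}. \tag{4.25}$$
$$|{+z}\rangle \equiv |0\rangle \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |{-z}\rangle \equiv |1\rangle \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \tag{4.26}$$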
• Since the Pauli matrices are normal, the eigenstates of each matrix form an
orthonormal eigenbasis of ℂ2 . As you can see, the eigenstates of 𝜎𝑧 are just
the standard basis.
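$$|{+}\rangle = \frac{1}{\sqrt{2}} \left(|0\rangle + |1\rangle\right), \qquad |{-}\rangle = \frac{1}{\sqrt{2}} \left(|0\rangle - |1\rangle\right), \tag{4.27}$$
$$|0\rangle = \frac{1}{\sqrt{2}} \left(|{+}\rangle + |{-}\rangle\right), \qquad |1\rangle = \frac{1}{\sqrt{2}} \left(|{+}\rangle - |{-}\rangle\right). \tag{4.28}$$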
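The properties of the Pauli matrices listed above can be checked numerically with a short NumPy sketch. Keep in mind that a numerical check is not a proof, so it does not replace the problems below. The variable names and the helper `dagger` are assumptions made for this example.

```python
import numpy as np

# The Pauli matrices from equation (4.23) and the 2x2 identity matrix.
sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
identity = np.eye(2, dtype=complex)


def dagger(matrix):
    """Return the adjoint (conjugate transpose) of a matrix."""
    return matrix.conj().T


properties = {}
for name, sigma in [("x", sigma_x), ("y", sigma_y), ("z", sigma_z)]:
    properties[name] = {
        "hermitian": np.allclose(dagger(sigma), sigma),
        "unitary": np.allclose(dagger(sigma) @ sigma, identity),
        "involutory": np.allclose(sigma @ sigma, identity),
    }

# The eigenstates as listed in equations (4.24) to (4.26).
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)
plus_i = np.array([1, 1j]) / np.sqrt(2)
minus_i = np.array([1, -1j]) / np.sqrt(2)
zero = np.array([1, 0])
one = np.array([0, 1])

# Each eigenstate should be mapped to +1 or -1 times itself.
eigen_checks = [
    np.allclose(sigma_x @ plus, plus),
    np.allclose(sigma_x @ minus, -minus),
    np.allclose(sigma_y @ plus_i, plus_i),
    np.allclose(sigma_y @ minus_i, -minus_i),
    np.allclose(sigma_z @ zero, zero),
    np.allclose(sigma_z @ one, -one),
]

# The change of basis in equation (4.28).
basis_checks = [
    np.allclose((plus + minus) / np.sqrt(2), zero),
    np.allclose((plus - minus) / np.sqrt(2), one),
]

print(properties)
print(all(eigen_checks), all(basis_checks))
```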
Problem 4.8. Prove that 𝜎𝑥 , 𝜎𝑦 and 𝜎𝑧 are Hermitian.
Problem 4.10. Consider the real vector space of 2 × 2 Hermitian matrices. This is
a vector space where the vectors are Hermitian matrices and the scalars are
real numbers. Don’t get confused: in an abstract vector space, anything can be
a “vector” – including numbers, matrices, tensors of higher rank, functions, and
even weirder stuff.
A. Show that the real vector space of 2 × 2 Hermitian matrices satisfies all of the
conditions in our definition of a vector space in section 3.2.1.
B. Show that the set {1, 𝜎𝑥 , 𝜎𝑦 , 𝜎𝑧 }, composed of the identity matrix 1 and the
three Pauli matrices, is a basis of the real vector space of 2 × 2 Hermitian matrices.
(Since we haven’t defined an inner product on this space, you don’t need to show
that the basis is orthonormal.)
Recall that in section 2.1.4 we saw that, in the Stern-Gerlach experiment, the
measurement of angular momentum of a particle had only one of two discrete
results: “spin up” (if the particle is deflected up) or “spin down” (if the particle is
deflected down).
More generally, in quantum theory, every particle has a property called spin, which
is a half-integer 𝑠:
$$s \in \left\{0, \frac{1}{2}, 1, \frac{3}{2}, 2, \ldots\right\}. \tag{4.29}$$
The measurement of intrinsic angular momentum of a particle of spin 𝑠, in any
direction, always returns one of the results in the set
{−𝑠, −𝑠 + 1, … , 𝑠 − 1, 𝑠} . (4.30)
• A particle of spin 0 always has intrinsic angular momentum 0;
• A particle of spin 3/2 has intrinsic angular momentum −3/2, −1/2, +1/2, or
+3/2;
• and so on.
The particles in the Stern-Gerlach experiment have spin 1/2, where “spin up”
corresponds to intrinsic angular momentum +1/2 and “spin down” corresponds to
−1/2. Since these particles have exactly two possible states, spin up and down,
they can be represented as a two-state quantum system.
The Pauli matrix 𝜎𝑖 is a Hermitian operator, and thus it should correspond to an
observable. That observable is twice the spin in the 𝑖 direction, since the Pauli
matrices have eigenvalues ±1, but the spin should be ±1/2. It is thus customary
to define
$$S_x \equiv \frac{1}{2}\sigma_x, \qquad S_y \equiv \frac{1}{2}\sigma_y, \qquad S_z \equiv \frac{1}{2}\sigma_z, \tag{4.31}$$
such that 𝑆𝑖 is a Hermitian operator corresponding to spin ±1/2 along the 𝑖 direction.
You can check that 𝑆𝑖 have the same eigenstates as 𝜎𝑖, but they correspond
to the eigenvalues ±1/2 instead of ±1.
In problem 4.10 you proved that the set {1, 𝜎𝑥, 𝜎𝑦, 𝜎𝑧} forms a basis for the real
vector space of 2 × 2 Hermitian matrices. This means that any Hermitian operator
on the Hilbert space ℂ² can be written as a linear combination of these 4
matrices. Since Hermitian operators correspond to observables, this means that
every possible observable in ℂ² can be written in terms of the Pauli matrices
and the identity matrix.
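In particular, given a unit vector v ∈ ℝ³ pointing in an arbitrary direction in space
(the real space, not the Hilbert space!)
$$S_{\mathbf{v}} \equiv x S_x + y S_y + z S_z = \frac{1}{2} \begin{pmatrix} z & x - \mathrm{i}y \\ x + \mathrm{i}y & -z \end{pmatrix}, \tag{4.33}$$
$$|{\uparrow}\rangle \equiv \frac{1}{\sqrt{2(1+z)}} \begin{pmatrix} 1 + z \\ x + \mathrm{i}y \end{pmatrix}, \qquad |{\downarrow}\rangle \equiv \frac{1}{\sqrt{2(1-z)}} \begin{pmatrix} 1 - z \\ -x - \mathrm{i}y \end{pmatrix}. \tag{4.34}$$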
So we learn that, for a spin 1/2 particle, the measurement of intrinsic angular
momentum along any direction in space always yields one of exactly two possible
results – spin up, +1/2, or spin down, −1/2 – with the probability amplitudes
calculated using the Hermitian operator 𝑆v .
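To see equations (4.33) and (4.34) in action, here is a minimal NumPy sketch that builds 𝑆v for one particular direction and checks that |↑⟩ and |↓⟩ are normalized eigenstates with eigenvalues +1/2 and −1/2. The function names and the chosen direction are assumptions made for this example, and (𝑥, 𝑦, 𝑧) are taken to be the components of the unit vector v.

```python
import numpy as np


def spin_operator(direction):
    """Build S_v from the components (x, y, z), as in equation (4.33)."""
    x, y, z = direction
    return 0.5 * np.array([[z, x - 1j * y], [x + 1j * y, -z]])


def spin_up(direction):
    """The state |up> from equation (4.34); not defined for z = -1."""
    x, y, z = direction
    return np.array([1 + z, x + 1j * y]) / np.sqrt(2 * (1 + z))


def spin_down(direction):
    """The state |down> from equation (4.34); not defined for z = +1."""
    x, y, z = direction
    return np.array([1 - z, -x - 1j * y]) / np.sqrt(2 * (1 - z))


# An arbitrary unit vector in real space: (2^2 + 1^2 + 2^2) / 3^2 = 1.
v = np.array([2.0, -1.0, 2.0]) / 3.0

S_v = spin_operator(v)
up = spin_up(v)
down = spin_down(v)

print(np.linalg.eigvalsh(S_v))         # eigenvalues in ascending order
print(np.allclose(S_v @ up, 0.5 * up))
print(np.allclose(S_v @ down, -0.5 * down))
```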
Exercise 4.11. Show that the eigenstates |↑⟩ and |↓⟩ indeed correspond to the
eigenstates of 𝑆𝑥 , 𝑆𝑦 , and 𝑆𝑧 – except the state |1⟩ (the −1/2 eigenstate of 𝑆𝑧 ),
which results in a division by zero in the bottom component.
$$|\Psi\rangle \equiv \frac{1}{\sqrt{10}} \begin{pmatrix} 1 \\ 3 \end{pmatrix}. \tag{4.35}$$
Problem 4.13.
A. Let us define the matrix commutator (or operator commutator):
where the indices 𝑖, 𝑗, 𝑘 take the values {1, 2, 3} corresponding to {𝑥, 𝑦, 𝑧}, and 𝜖𝑖𝑗 𝑘
is the Levi-Civita symbol, defined as
By even permutation or odd permutation we mean that the permutation involves
exchanging elements an even or odd number of times. For example, (1, 3, 2) is an
odd permutation, because we exchanged elements once: 2 ↔ 3. However, (3, 1, 2)
is an even permutation, because we exchanged elements twice: 2 ↔ 3 and then
1 ↔ 3.
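The parity of a permutation can also be determined by a computer. Below is a minimal Python sketch, under the standard convention that the Levi-Civita symbol is +1 for even permutations of (1, 2, 3), −1 for odd permutations, and 0 if any index repeats. The function name `levi_civita` is an assumption made for this example. Instead of tracking individual exchanges, it counts inversions (pairs of entries that appear in the wrong order), which always has the same parity as the number of exchanges.

```python
def levi_civita(i, j, k):
    """Return the Levi-Civita symbol for indices taken from {1, 2, 3}."""
    indices = (i, j, k)
    if sorted(indices) != [1, 2, 3]:
        return 0  # a repeated index means this is not a permutation
    # Count the pairs of positions whose entries are out of order.
    inversions = sum(
        1
        for a in range(3)
        for b in range(a + 1, 3)
        if indices[a] > indices[b]
    )
    return 1 if inversions % 2 == 0 else -1


print(levi_civita(1, 3, 2))  # odd permutation
print(levi_civita(3, 1, 2))  # even permutation
```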
C. The matrix anticommutator (or operator anticommutator) is defined as follows:
$$\{A, B\} \equiv AB + BA. \tag{4.40}$$
$$\{S_i, S_j\} = \frac{1}{2}\delta_{ij}, \tag{4.41}$$
where 𝛿𝑖𝑗 is the Kronecker delta (times the identity matrix 1).
4.2.3 Qubits
A classical bit can be in one of two states: 0 or 1. A quantum bit, or qubit for
short, is instead in a superposition of two states, denoted |0⟩ and |1⟩:
$$|\Psi\rangle = a|0\rangle + b|1\rangle, \qquad |a|^2 + |b|^2 = 1, \tag{4.42}$$
Since the system has two states, it can be represented by the Hilbert space ℂ2 ,
and it is conventional to choose |0⟩ and |1⟩ to be the vectors in the standard basis,
which in this case is called the computational basis:
$$|0\rangle \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad |1\rangle \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \tag{4.44}$$
Any two-state quantum system can serve as a qubit. In fact, even systems with
more than two states can be used, as long as two of these states can be decoupled
(separated) from the rest. Some examples include:
• Any spin 1/2 particle, such as an electron, where |0⟩ and |1⟩ are the eigenstates
of the spin operator along the 𝑧 direction, 𝑆𝑧, so they represent spin
up and spin down, respectively, along that direction.
• The polarization of a photon, where |0⟩ is horizontal and |1⟩ is vertical polarization.
(In classical electromagnetism, an electromagnetic wave is composed
of oscillating electric and magnetic fields, and the polarization is the direction
of the electric field.)
Qubits are used in quantum computers as the basic units of computation, just
like bits in classical computers. Since so many different systems can be represented
mathematically in the same way, we can build quantum computers in
many different ways. We will discuss quantum computers (from the theoretical
point of view) in more detail later.
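$$|{+}\rangle = \frac{1}{\sqrt{2}} \left(|0\rangle + |1\rangle\right) = \frac{1}{\sqrt{2}} \left(\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{4.45}$$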
For simplicity, let us forget for a second that we are dealing with complex vectors,
and imagine that they are vectors in ℝ², since that is much easier to visualize; see
figure 4.1. As vectors in ℝ², the state |0⟩ = (1, 0) points east and the eigenstate
|1⟩ = (0, 1) points north. This does not mean that |+⟩ is “pointing both north and
east at the same time”. It does not, in fact, point in either of these directions;
instead, it points in a third direction, namely northeast.
In other words, if we just look at the vector represented by |+⟩, without considering
any particular basis, it is just a vector pointing in one particular direction,
and in this direction only. The superposition only exists if we insist on representing
|+⟩ in this particular eigenbasis, but there can be another eigenbasis, e.g. the
eigenbasis composed of |+⟩ itself along with |−⟩, in which |+⟩ is not in a superposition.
Figure 4.1: The eigenbasis {|0⟩, |1⟩}, in red, and the eigenbasis {|+⟩, |−⟩},
in blue. A qubit in the state |+⟩ is in a superposition of |0⟩ and |1⟩, but
this does not mean it is in the states |0⟩ and |1⟩ “at the same time” – it
is only in one state, |+⟩.

What this means is that a state only appears to be in a superposition when we
choose a particular observable and represent that state as a superposition of
eigenstates with respect to that observable. But the system itself is still in the
same state, regardless of which eigenbasis we choose. The projections of the
state of the system on the basis eigenstates give us the probability amplitudes
relevant to that measurement; for example, in figure 4.1 we see that the probability
amplitudes to measure |0⟩ or |1⟩ are both 1/√2. However, in the basis consisting
of |+⟩ and |−⟩, we instead have that the probability amplitude to measure |+⟩ is 1
and the probability amplitude to measure |−⟩ is 0.
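The amplitudes quoted above can be reproduced with a few inner products. This is a minimal NumPy sketch; the helper `amplitudes` is an assumption made for this example.

```python
import numpy as np

# The computational basis and the eigenbasis of sigma_x.
zero = np.array([1, 0])
one = np.array([0, 1])
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)


def amplitudes(state, basis):
    """Project the state on each basis vector: the inner products <B_i|Psi>."""
    # np.vdot takes the complex conjugate of its first argument.
    return np.array([np.vdot(basis_state, state) for basis_state in basis])


# The same state |+>, represented in two different bases.
in_z_basis = amplitudes(plus, [zero, one])
in_x_basis = amplitudes(plus, [plus, minus])

print(in_z_basis)  # a superposition: two equal non-zero amplitudes
print(in_x_basis)  # not a superposition: a single non-zero amplitude
```

The state is the same in both cases; only the list of numbers we use to describe it changes.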
In the specific case where the qubit is the spin of a spin-1/2 particle, we know
that if the qubit is in the state |+⟩, this means that a measurement of spin along
the 𝑥 axis will yield spin up with probability 1. We can say, if we want, that the
system is in a state of spin up along the 𝑥 axis, and this defines the state uniquely.
We also see that, in this basis, the system is not in a superposition; it is just one
However, in the basis corresponding to measurement of spin along the 𝑧 axis, we
may write the state as a superposition, |+⟩ = (|0⟩ + |1⟩) /√2. This doesn’t mean
that the qubit is in both the states |0⟩ and |1⟩ “at the same time”; it means that
it is in a state where a measurement of spin along the 𝑧 axis will yield spin up or
spin down with equal probability.
If being in the superposition (|0⟩ + |1⟩) /√2 doesn’t mean that the qubit is both |0⟩
and |1⟩ at the same time, perhaps it could mean that the qubit is either |0⟩ or
|1⟩, but we just don’t know which one it is, and when we perform a measurement
we will discover which state it was in all along? Unfortunately, that interpretation
doesn’t work either. Theories where the system is in only one particular unknown
(“hidden”) state, but we only discover which one after we measure it, are called
hidden variable theories. They are mostly thought to be incorrect, since they
violate a theorem called Bell’s theorem, which we will learn about in section 4.3.6.
Some theories of hidden variables that are compatible with Bell’s theorem do
exist, but most physicists don’t believe they could replace quantum mechanics,
because they are complicated, contrived, and non-local; the latter means that
they allow faster-than-light or instantaneous communication[29]. Indeed, some
non-local hidden variable theories, such as de Broglie–Bohm theory, require all
of the particles in the universe to be able to instantaneously communicate with
each other at all times!
So in conclusion, being in a superposition of two states doesn’t mean being in
both the first state and the second state, but also doesn’t mean being in either
the first state or the second state. Instead, we must conclude that the terms
“and” and “or” are classical terms that can only be used in a classical theory;
superposition is a new quantum term, which simply does not have any classical
Compare this with our discussion of wave-particle duality in section 2.1.3. This
duality doesn’t mean that light is “both a wave and a particle”, and it also doesn’t
mean that light is “either a wave or a particle”. What it really means is that the
classical concepts of “wave” and “particle” are not the proper way to describe reality.
Similarly, it turns out that the classical terms “and” and “or” cannot be used
to describe reality at the deepest level; for that, we need to introduce quantum

[29] This doesn’t necessarily mean the theory allows us to send information faster than light. The
components of the system can communicate with each other faster than light, but not necessarily in
a way that we can actually control or make use of. We will discuss this in more detail in section 4.3.6.
4.3 Composite Systems and Quantum Entanglement
is another Hilbert space, representing the composite system which combines the
two original systems. The dimension of the composite Hilbert space is the product
of the dimensions of the individual spaces:
However, not all states in ℋ𝐴 ⊗ ℋ𝐵 are necessarily of this form; this fact will
prove essential soon, when we discuss entanglement. Furthermore, if |𝐴𝑖 ⟩, 𝑖 ∈
{1, … , 𝑚} is an orthonormal basis of ℋ𝐴 and ∣𝐵𝑗 ⟩, 𝑗 ∈ {1, … , 𝑛} is an orthonormal
basis of ℋ𝐵 , then
and for |Ψ𝐴 ⟩ ∈ ℋ𝐴 and |Θ𝐵 ⟩ , |Ω𝐵 ⟩ ∈ ℋ𝐵 we have
In particular, notice from equation (4.50) that scalars commute with the tensor
product, so we can move them in or out of the product as we see fit – just as,
until now, we have been moving scalars in and out of inner and outer products.
Importantly, the tensor product itself is not commutative:
The order matters, since the first state must come from the first Hilbert space,
and the second state must come from the second Hilbert space – which may
be a completely different space with completely different states. For example,
in the tensor product ℂ² ⊗ ℂ³ the first state must be represented by a 2-vector
while the second state must be represented by a 3-vector – so they cannot be
Now, if 𝑂𝐴 is an operator on ℋ𝐴 and 𝑂𝐵 is an operator on ℋ𝐵 , then 𝑂𝐴 ⊗ 𝑂𝐵 is
an operator on ℋ𝐴 ⊗ ℋ𝐵 , which is defined such that each operator acts only on
the state coming from the same space as that operator:
In other words, the first operator in the product 𝑂𝐴 ⊗ 𝑂𝐵 acts only on the first
state in the product |Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩, and the second operator in the product 𝑂𝐴 ⊗ 𝑂𝐵
acts only on the second state in the product |Ψ𝐴 ⟩ ⊗ |Ψ𝐵 ⟩. This has to be the case,
since e.g. in the tensor product ℂ² ⊗ ℂ³ the first operator must be represented by
a 2 × 2 matrix and act on 2-vectors while the second operator must be represented
by a 3 × 3 matrix and act on 3-vectors. Note that, as for the tensor product of
states, not all operators in ℋ𝐴 ⊗ ℋ𝐵 are necessarily of this form.
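The rule that each operator acts only on the state from its own space can be checked numerically. The sketch below assumes that tensor products of vectors and matrices are represented by the Kronecker product, `numpy.kron`; the particular matrices and vectors are arbitrary examples in ℂ² ⊗ ℂ³, and the vectors are left unnormalized since only the algebra is being tested.

```python
import numpy as np

# An operator and a vector on C^2 (the first space).
O_A = np.array([[0, 1], [1, 0]])
psi_A = np.array([1, 2j])

# An operator and a vector on C^3 (the second space).
O_B = np.array([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
psi_B = np.array([1, 0, -1])

# Apply the composite operator to the composite vector...
composite_operator = np.kron(O_A, O_B)
left_side = composite_operator @ np.kron(psi_A, psi_B)

# ...and compare with applying each operator in its own space first.
right_side = np.kron(O_A @ psi_A, O_B @ psi_B)

print(composite_operator.shape)  # a 6 x 6 matrix, since 2 * 3 = 6
print(np.allclose(left_side, right_side))

# A sum of two products is still an operator on the composite space,
# even though it is written here as a sum rather than a single product.
sum_of_products = np.kron(O_A, O_B) + np.kron(np.eye(2), np.eye(3))
print(sum_of_products.shape)
```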
If we have two bras ⟨Ψ𝐴 | ∈ ℋ𝐴 and ⟨Ψ𝐵 | ∈ ℋ𝐵 , their tensor product ⟨Ψ𝐴 | ⊗ ⟨Ψ𝐵 | is a
bra in ℋ𝐴 ⊗ ℋ𝐵 , and the inner product of this bra with a ket of the form |Φ𝐴 ⟩ ⊗ |Φ𝐵 ⟩
in ℋ𝐴 ⊗ ℋ𝐵 is defined by taking the inner products of each bra with the ket from
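the same space:
\[ (\langle\Psi_A| \otimes \langle\Psi_B|)\,(|\Phi_A\rangle \otimes |\Phi_B\rangle) = \langle\Psi_A|\Phi_A\rangle\,\langle\Psi_B|\Phi_B\rangle. \]
The first bra acts only on the first ket and the second bra acts only on the
second ket. Once again, the inner product must work this way, since for example
in ℂ² ⊗ ℂ³ we can only take the inner product of 2-vectors with 2-vectors and
3-vectors with 3-vectors – the inner product of a 2-vector with a 3-vector is
undefined.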
Similarly, if we have two operators 𝑂𝐴 , 𝑃𝐴 ∈ ℋ𝐴 and two operators 𝑂𝐵 , 𝑃𝐵 ∈ ℋ𝐵 ,
then the composite operator 𝑂𝐴 ⊗ 𝑂𝐵 ∈ ℋ𝐴 ⊗ ℋ𝐵 acts on the composite operator
𝑃𝐴 ⊗ 𝑃𝐵 ∈ ℋ𝐴 ⊗ ℋ𝐵 in the only way that makes sense, with each operator acting
on the operator from the same space:
\[ (O_A \otimes O_B)\,(P_A \otimes P_B) = O_A P_A \otimes O_B P_B. \]
Finally, above we stated the Composite System Axiom for two quantum systems,
but we can use it recursively to define the composite Hilbert space of any number
of systems: just take the tensor product of all the Hilbert spaces together,
ℋ𝐴 ⊗ ℋ𝐵 ⊗ ℋ𝐶 ⊗ ⋯ (4.57)
Consider the tensor product ℂ^𝑚 ⊗ ℂ^𝑛. Since the dimension of this Hilbert space is
𝑚𝑛, and since in any finite-dimensional Hilbert space we know how to represent
states as vectors and operators as matrices of the same dimension as the Hilbert
space (as discussed in section 3.2.7 and section 3.2.15 respectively), we conclude
that states in ℂ^𝑚 ⊗ ℂ^𝑛 can be represented as 𝑚𝑛-vectors and operators in ℂ^𝑚 ⊗ ℂ^𝑛
can be represented as 𝑚𝑛 × 𝑚𝑛 matrices. In other words, ℂ^𝑚 ⊗ ℂ^𝑛 is isomorphic
to ℂ^{𝑚𝑛}.
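Explicitly, for two states represented by the vectors (see footnote 30)
\[
|\Psi\rangle \equiv \begin{pmatrix} \Psi_1 \\ \vdots \\ \Psi_m \end{pmatrix} \in \mathbb{C}^m, \qquad
|\Phi\rangle \equiv \begin{pmatrix} \Phi_1 \\ \vdots \\ \Phi_n \end{pmatrix} \in \mathbb{C}^n, \tag{4.58}
\]
we define the tensor product as follows:
\[
|\Psi\rangle \otimes |\Phi\rangle \equiv
\begin{pmatrix} \Psi_1 |\Phi\rangle \\ \vdots \\ \Psi_m |\Phi\rangle \end{pmatrix}
= \begin{pmatrix}
\Psi_1 \begin{pmatrix} \Phi_1 \\ \vdots \\ \Phi_n \end{pmatrix} \\
\vdots \\
\Psi_m \begin{pmatrix} \Phi_1 \\ \vdots \\ \Phi_n \end{pmatrix}
\end{pmatrix}
= \begin{pmatrix}
\Psi_1 \Phi_1 \\ \vdots \\ \Psi_1 \Phi_n \\ \vdots \\ \Psi_m \Phi_1 \\ \vdots \\ \Psi_m \Phi_n
\end{pmatrix} \in \mathbb{C}^{mn}. \tag{4.59}
\]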
For example:
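\[
\begin{pmatrix} 1 \\ 2 \end{pmatrix} \otimes \begin{pmatrix} 3 \\ 4 \end{pmatrix}
= \begin{pmatrix} 1 \cdot \begin{pmatrix} 3 \\ 4 \end{pmatrix} \\ 2 \cdot \begin{pmatrix} 3 \\ 4 \end{pmatrix} \end{pmatrix}
= \begin{pmatrix} 1 \cdot 3 \\ 1 \cdot 4 \\ 2 \cdot 3 \\ 2 \cdot 4 \end{pmatrix}
= \begin{pmatrix} 3 \\ 4 \\ 6 \\ 8 \end{pmatrix}. \tag{4.60}
\]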
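The block construction in equation (4.59) is what NumPy calls the Kronecker product. As a minimal sketch, assuming we represent states as plain NumPy arrays (the variable names are ours, not part of the notes), the example in equation (4.60) can be checked as follows:

```python
import numpy as np

# The two example vectors from equation (4.60).
psi = np.array([1, 2])
phi = np.array([3, 4])

# np.kron stacks psi[i] * phi for each i, as in equation (4.59).
composite = np.kron(psi, phi)
print(composite)

# Swapping the factors gives a different vector:
# the tensor product is not commutative.
swapped = np.kron(phi, psi)
print(swapped)
```

The second result contains the same four products in a different order, which is one concrete way to see why the order of the factors matters.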
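Similarly, for two operators represented by an 𝑚 × 𝑚 matrix 𝐴 and an 𝑛 × 𝑛 matrix 𝐵,
we define the tensor product as follows (see footnote 32):
\[
\begin{aligned}
A \otimes B &\equiv
\begin{pmatrix} A_{11} B & \cdots & A_{1m} B \\ \vdots & \ddots & \vdots \\ A_{m1} B & \cdots & A_{mm} B \end{pmatrix} \\
&= \begin{pmatrix}
A_{11} \begin{pmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & \ddots & \vdots \\ B_{n1} & \cdots & B_{nn} \end{pmatrix}
& \cdots &
A_{1m} \begin{pmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & \ddots & \vdots \\ B_{n1} & \cdots & B_{nn} \end{pmatrix} \\
\vdots & \ddots & \vdots \\
A_{m1} \begin{pmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & \ddots & \vdots \\ B_{n1} & \cdots & B_{nn} \end{pmatrix}
& \cdots &
A_{mm} \begin{pmatrix} B_{11} & \cdots & B_{1n} \\ \vdots & \ddots & \vdots \\ B_{n1} & \cdots & B_{nn} \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix}
A_{11} B_{11} & \cdots & A_{11} B_{1n} & \cdots & A_{1m} B_{11} & \cdots & A_{1m} B_{1n} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
A_{11} B_{n1} & \cdots & A_{11} B_{nn} & \cdots & A_{1m} B_{n1} & \cdots & A_{1m} B_{nn} \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
A_{m1} B_{11} & \cdots & A_{m1} B_{1n} & \cdots & A_{mm} B_{11} & \cdots & A_{mm} B_{1n} \\
\vdots & \ddots & \vdots & & \vdots & \ddots & \vdots \\
A_{m1} B_{n1} & \cdots & A_{m1} B_{nn} & \cdots & A_{mm} B_{n1} & \cdots & A_{mm} B_{nn}
\end{pmatrix} \in \mathbb{C}^{mn \times mn}.
\end{aligned}
\]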
For example:
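\[
\begin{aligned}
\begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix} \otimes \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix}
&= \begin{pmatrix}
0 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} & 1 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} \\
2 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix} & 0 \cdot \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix}
0 \cdot 3 & 0 \cdot 0 & 1 \cdot 3 & 1 \cdot 0 \\
0 \cdot 0 & 0 \cdot 4 & 1 \cdot 0 & 1 \cdot 4 \\
2 \cdot 3 & 2 \cdot 0 & 0 \cdot 3 & 0 \cdot 0 \\
2 \cdot 0 & 2 \cdot 4 & 0 \cdot 0 & 0 \cdot 4
\end{pmatrix}
= \begin{pmatrix}
0 & 0 & 3 & 0 \\
0 & 0 & 0 & 4 \\
6 & 0 & 0 & 0 \\
0 & 8 & 0 & 0
\end{pmatrix}.
\end{aligned}
\]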
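The same NumPy function handles matrices. The sketch below reproduces the 4 × 4 matrix of the example above, and then checks numerically, on random data of our own choosing (the names `O_A`, `O_B`, `psi_A`, `psi_B` are illustrative), that a product operator acts separately on each factor of a product state:

```python
import numpy as np

A = np.array([[0, 1], [2, 0]])
B = np.array([[3, 0], [0, 4]])

# Block matrix whose (i, j) block is A[i, j] * B.
AB = np.kron(A, B)
print(AB)

# Sanity check with random complex data (illustrative, not from the notes):
# a 2x2 operator and a 3x3 operator acting on C^2 (x) C^3.
rng = np.random.default_rng(seed=0)
O_A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
O_B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
psi_A = rng.normal(size=2) + 1j * rng.normal(size=2)
psi_B = rng.normal(size=3) + 1j * rng.normal(size=3)

# (O_A (x) O_B)(psi_A (x) psi_B) versus (O_A psi_A) (x) (O_B psi_B).
lhs = np.kron(O_A, O_B) @ np.kron(psi_A, psi_B)
rhs = np.kron(O_A @ psi_A, O_B @ psi_B)
print(np.allclose(lhs, rhs))
```

Note that `np.kron(O_A, O_B)` is a 6 × 6 matrix here, in line with the statement that operators on ℂ^𝑚 ⊗ ℂ^𝑛 are 𝑚𝑛 × 𝑚𝑛 matrices.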
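Exercise 4.16. For the specific |Ψ⟩, |Φ⟩, 𝐴, and 𝐵 we used above,
\[ |\Psi\rangle \equiv \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad |\Phi\rangle \equiv \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \tag{4.62} \]
\[ A \equiv \begin{pmatrix} 0 & 1 \\ 2 & 0 \end{pmatrix}, \qquad B \equiv \begin{pmatrix} 3 & 0 \\ 0 & 4 \end{pmatrix}, \tag{4.63} \]
calculate
\[ (A \otimes B)\,(|\Psi\rangle \otimes |\Phi\rangle). \tag{4.64} \]
Do so in two ways: first by calculating the tensor products 𝐴 ⊗ 𝐵 and |Ψ⟩ ⊗ |Φ⟩ and
acting with the resulting matrix on the resulting vector, and then by calculating
𝐴|Ψ⟩ and 𝐵|Φ⟩ separately and taking their tensor product.
Then compare your results and verify that they are the same.
Footnote 32: Note that the tensor product of vectors is a special case of the tensor product of matrices, with
the vectors treated as single-column matrices.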
Problem 4.17.
A. Prove that the tensor product preserves the adjoint operation on both vectors
and matrices. That is,
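\[ (|\Psi\rangle \otimes |\Phi\rangle)^\dagger = \langle\Psi| \otimes \langle\Phi|, \qquad (A \otimes B)^\dagger = A^\dagger \otimes B^\dagger. \tag{4.65} \]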
B. Prove that the tensor product of two Hermitian operators is Hermitian, and the
tensor product of two unitary operators is unitary.
Problem 4.18. Consider the tensor product ℂ𝑚 ⊗ ℂ𝑛 for arbitrary 𝑚 and 𝑛. Show
that the standard basis of ℂ𝑚 ⊗ ℂ𝑛 is obtained by taking the tensor products of
the standard basis states of ℂ𝑚 and ℂ𝑛 .
where |+⟩ and |−⟩ are the +1 and −1 eigenstates of 𝜎𝑥 respectively, and |0⟩ is the
+1 eigenstate of 𝜎𝑧 (see section 4.2.1).
Exercise 4.20.
A. Calculate the tensor product operator
𝐴 ≡ 𝑆𝑥 ⊗ 𝑆𝑧 , (4.67)
B. Calculate the tensor product state
where in each of these, the first state is the state of qubit 𝐴 and the second is
the state of qubit 𝐵. Thus |0⟩ ⊗ |0⟩ corresponds to |0⟩ for both qubits, |0⟩ ⊗ |1⟩
corresponds to |0⟩ for qubit 𝐴 and |1⟩ for qubit 𝐵, |1⟩ ⊗ |0⟩ corresponds to |1⟩ for
qubit 𝐴 and |0⟩ for qubit 𝐵, and |1⟩ ⊗ |1⟩ corresponds to |1⟩ for both qubits.
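These four eigenstates have the following representations in terms of vectors in
ℂ⁴:
\[
|0\rangle \otimes |0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad
|0\rangle \otimes |1\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \tag{4.70}
\]
\[
|1\rangle \otimes |0\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \qquad
|1\rangle \otimes |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \otimes \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \tag{4.71}
\]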
So we see that they are, in fact, just the standard basis of ℂ4 .
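As a quick numerical illustration (a sketch using NumPy's `kron`; the names `ket0`, `ket1`, and `basis` are ours), we can build the four product states and confirm that, placed side by side as columns, they form the 4 × 4 identity matrix, i.e. the standard basis of ℂ⁴:

```python
import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])

# Product states in the order |00>, |01>, |10>, |11>.
basis = [np.kron(a, b) for a in (ket0, ket1) for b in (ket0, ket1)]

# Stack them as columns; the standard basis gives the identity matrix.
stacked = np.column_stack(basis)
print(stacked)
```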
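The most general state of both qubits is described as a superposition of all possible
basis states:
\[
|\Psi\rangle = \alpha_{00} |0\rangle \otimes |0\rangle + \alpha_{01} |0\rangle \otimes |1\rangle + \alpha_{10} |1\rangle \otimes |0\rangle + \alpha_{11} |1\rangle \otimes |1\rangle
= \begin{pmatrix} \alpha_{00} \\ \alpha_{01} \\ \alpha_{10} \\ \alpha_{11} \end{pmatrix}, \tag{4.72}
\]
where 𝛼00 , 𝛼01 , 𝛼10 , 𝛼11 ∈ ℂ and, of course, the coefficients should be chosen such
that the state is normalized to 1:
\[ |\alpha_{00}|^2 + |\alpha_{01}|^2 + |\alpha_{10}|^2 + |\alpha_{11}|^2 = 1. \tag{4.73} \]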
We would now like to ask: when do the two qubits depend on each other? More
precisely, under what conditions can qubit 𝐴 be |0⟩ or |1⟩ independently of the
state of qubit 𝐵, and vice versa? As we will now see, this depends on the
coefficients 𝛼𝑖𝑗 .
A separable state is a state which can be written as just one tensor product
instead of a sum of tensor products, that is, a state of the form
\[ |\Psi\rangle = |\Psi_A\rangle \otimes |\Psi_B\rangle, \]
where |Ψ𝐴 ⟩ is the state of qubit 𝐴 and |Ψ𝐵 ⟩ is the state of qubit 𝐵. If we can write
the state in this way, then we have separated the states from one another, in
the sense that whatever value |Ψ𝐴 ⟩ has is completely independent of the value of
|Ψ𝐵 ⟩ (and vice versa). In other words, the overall state of the composite system
is just the tensor product of the independent states of the individual systems.
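A simple example of a separable state would be:
\[ |\Psi\rangle = |0\rangle \otimes |0\rangle. \tag{4.75} \]
This just means that both qubits are, with 100% probability, in the state |0⟩.
A less obvious example of a separable state is
\[ |\Psi\rangle = \frac{1}{2} \left( |0\rangle \otimes |0\rangle + |0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle \right). \tag{4.77} \]
To see that it is separable, we simplify it using the distributive property, and get:
\[ |\Psi\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right) \otimes \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right). \tag{4.78} \]
In other words, both qubits are in a state where either 0 or 1 is possible with 50%
probability, that is:
\[ |\Psi_A\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right), \qquad |\Psi_B\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right). \tag{4.79} \]
Here is an example of an entangled state:
\[ |\Psi\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle \right). \tag{4.80} \]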
No matter how much we try, we can never write it as just one tensor product; it
is always going to be the sum of two tensor products! This means that the state
of each qubit is no longer independent of the state of the other qubit. Indeed,
if qubit 𝐴 is in the state |0⟩ then qubit 𝐵 must be in the state |1⟩ (due to the
first term), and if qubit 𝐴 is in the state |1⟩ then qubit 𝐵 must be in the state |0⟩
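(due to the second term). This is precisely what it means for two systems to be
entangled.
More generally, consider again a composite system in the state
\[
|\Psi\rangle = \alpha_{00} |0\rangle \otimes |0\rangle + \alpha_{01} |0\rangle \otimes |1\rangle + \alpha_{10} |1\rangle \otimes |0\rangle + \alpha_{11} |1\rangle \otimes |1\rangle
= \begin{pmatrix} \alpha_{00} \\ \alpha_{01} \\ \alpha_{10} \\ \alpha_{11} \end{pmatrix}, \tag{4.81}
\]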
where 𝛼00 , 𝛼01 , 𝛼10 , 𝛼11 ∈ ℂ. If it is separable, then we should be able to write it in
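the form
\[ |\Psi\rangle = \left( \beta_0 |0\rangle + \beta_1 |1\rangle \right) \otimes \left( \gamma_0 |0\rangle + \gamma_1 |1\rangle \right), \tag{4.82} \]
which, after expanding the product using the distributive property, becomes
\[ |\Psi\rangle = \beta_0 \gamma_0 |0\rangle \otimes |0\rangle + \beta_0 \gamma_1 |0\rangle \otimes |1\rangle + \beta_1 \gamma_0 |1\rangle \otimes |0\rangle + \beta_1 \gamma_1 |1\rangle \otimes |1\rangle. \tag{4.83} \]
So we should have:
\[ \alpha_{ij} = \beta_i \gamma_j, \qquad i, j \in \{0, 1\}, \tag{4.84} \]
or explicitly:
\[ \alpha_{00} = \beta_0 \gamma_0, \qquad \alpha_{01} = \beta_0 \gamma_1, \qquad \alpha_{10} = \beta_1 \gamma_0, \qquad \alpha_{11} = \beta_1 \gamma_1. \]
It follows that
\[ \alpha_{00} \alpha_{11} - \alpha_{01} \alpha_{10} = \beta_0 \gamma_0 \beta_1 \gamma_1 - \beta_0 \gamma_1 \beta_1 \gamma_0 = 0. \]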
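Now, if 𝛼𝑖𝑗 are the components of a matrix (see footnote 34),
\[ \alpha = \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix}, \tag{4.88} \]
then the quantity 𝛼00 𝛼11 − 𝛼01 𝛼10 is called the determinant of the matrix, denoted
det 𝛼:
\[ \det \alpha \equiv \alpha_{00} \alpha_{11} - \alpha_{01} \alpha_{10}. \tag{4.89} \]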
We have proven that, if the composite state is separable (not entangled), then
the matrix of the coefficients has vanishing determinant. Below you will prove
that this also works in the opposite direction; thus, a composite state of two
qubits is separable if and only if det 𝛼 = 0.
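Let us check this. The state in equation (4.75) is separable, since it has
\[ \det \alpha = 1 \cdot 0 - 0 \cdot 0 = 0. \tag{4.90} \]
The state in equation (4.77) is also separable, since it has
\[ \det \alpha = \frac{1}{2} \cdot \frac{1}{2} - \frac{1}{2} \cdot \frac{1}{2} = 0. \tag{4.91} \]
However, the state in equation (4.80) is entangled, since it has
\[ \det \alpha = 0 \cdot 0 - \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} = -\frac{1}{2} \neq 0. \tag{4.92} \]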
Unfortunately, this simple rule only works for a composite system of 2 qubits.
The problem of finding whether a given state of a composite system is separable
or entangled is called the separability problem, and it is, for general states, a
difficult problem to solve!
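The determinant test for two qubits is easy to automate. The following is a minimal sketch, assuming a two-qubit state is given as the 4-vector of coefficients (𝛼00, 𝛼01, 𝛼10, 𝛼11); the function names and the numerical tolerance are our own choices, not part of the notes:

```python
import numpy as np

def coefficient_determinant(state):
    """Return alpha00*alpha11 - alpha01*alpha10 for a two-qubit state vector."""
    a00, a01, a10, a11 = np.asarray(state, dtype=complex)
    return a00 * a11 - a01 * a10

def is_separable(state, tol=1e-12):
    """Two-qubit test only: separable if and only if the determinant vanishes."""
    return abs(coefficient_determinant(state)) < tol

# The three states checked in the text.
state_475 = np.array([1, 0, 0, 0])               # |0>|0>
state_477 = np.array([1, 1, 1, 1]) / 2           # equal superposition
state_480 = np.array([0, 1, 1, 0]) / np.sqrt(2)  # (|01> + |10>)/sqrt(2)

for name, state in [("4.75", state_475), ("4.77", state_477), ("4.80", state_480)]:
    print(name, coefficient_determinant(state), is_separable(state))
```

As stressed above, this shortcut applies only to two qubits; it is not a general solution to the separability problem.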
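Footnote 34: This is actually the matrix that would be obtained if, instead of writing the composite state of
two qubits as a vector in ℂ⁴, we wrote it in terms of the outer products of the qubits, which would be a 2 × 2
matrix. Explicitly, you can check that:
\[
|0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
|0\rangle\langle 1| = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad
|1\rangle\langle 0| = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad
|1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \tag{4.86}
\]
\[
|\Psi\rangle = \alpha_{00} |0\rangle\langle 0| + \alpha_{01} |0\rangle\langle 1| + \alpha_{10} |1\rangle\langle 0| + \alpha_{11} |1\rangle\langle 1|
= \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix}. \tag{4.87}
\]
The reason we do not use the outer product representation for two-qubit states is that writing them
as vectors in ℂ⁴ allows us to act on them with operators given by 4 × 4 matrices, just as we would
act on a single qubit with operators given by 2 × 2 matrices.
Problem 4.21. Prove that, for a composite state of two qubits given by
\[ |\Psi\rangle = \alpha_{00} |0\rangle \otimes |0\rangle + \alpha_{01} |0\rangle \otimes |1\rangle + \alpha_{10} |1\rangle \otimes |0\rangle + \alpha_{11} |1\rangle \otimes |1\rangle, \tag{4.93} \]
the state is separable if
\[ \det \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix} = \alpha_{00} \alpha_{11} - \alpha_{01} \alpha_{10} = 0. \tag{4.94} \]
This is the opposite direction to what we proved above, which is that if the state
is separable, then the determinant is zero.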
Problem 4.22. Find two separable states and two entangled states of three
qubits, and prove that they are separable/entangled.
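Let us define the Bell states, also known as EPR states (see footnote 35):
\[ |\beta_{xy}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |y\rangle + (-1)^x |1\rangle \otimes |1 - y\rangle \right), \qquad x, y \in \{0, 1\}. \tag{4.95} \]
Explicitly, the four Bell states are:
\[ |\beta_{00}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |0\rangle + |1\rangle \otimes |1\rangle \right), \tag{4.96} \]
\[ |\beta_{01}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |1\rangle + |1\rangle \otimes |0\rangle \right), \tag{4.97} \]
\[ |\beta_{10}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |0\rangle - |1\rangle \otimes |1\rangle \right), \tag{4.98} \]
\[ |\beta_{11}\rangle \equiv \frac{1}{\sqrt{2}} \left( |0\rangle \otimes |1\rangle - |1\rangle \otimes |0\rangle \right). \tag{4.99} \]
It is useful to adopt a shorthand notation where we write
\[ |00\rangle \equiv |0\rangle \otimes |0\rangle, \quad |01\rangle \equiv |0\rangle \otimes |1\rangle, \quad |10\rangle \equiv |1\rangle \otimes |0\rangle, \quad |11\rangle \equiv |1\rangle \otimes |1\rangle. \tag{4.101} \]
In this notation, the Bell states are:
\[ |\beta_{00}\rangle \equiv \frac{1}{\sqrt{2}} \left( |00\rangle + |11\rangle \right), \tag{4.102} \]
\[ |\beta_{01}\rangle \equiv \frac{1}{\sqrt{2}} \left( |01\rangle + |10\rangle \right), \tag{4.103} \]
\[ |\beta_{10}\rangle \equiv \frac{1}{\sqrt{2}} \left( |00\rangle - |11\rangle \right), \tag{4.104} \]
\[ |\beta_{11}\rangle \equiv \frac{1}{\sqrt{2}} \left( |01\rangle - |10\rangle \right). \tag{4.105} \]
Footnote 35: EPR stands for Einstein, Podolsky, and Rosen.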
The Bell states have important applications in quantum information and
computation, as we will see below.
Exercise 4.23. Write down the representations of the four Bell states as
4-vectors in ℂ² ⊗ ℂ² ≃ ℂ⁴.
Problem 4.24. Prove that the four Bell states form an orthonormal basis for the
composite Hilbert space of two qubits, by showing that they span that space, are
linearly independent, are orthogonal, and are normalized to 1.
Problem 4.25. Prove that each of the four Bell states is entangled.
Exercise 4.26. Write down the four Bell states in terms of |+⟩ and |−⟩, the
eigenstates of 𝜎𝑥 . You may wish to use the shorthand notation |±±⟩ ≡ |±⟩ ⊗ |±⟩.
Now that we have rigorously defined quantum entanglement, let us debunk the
most common misconception associated with it: that quantum entanglement
allows us to transmit information, and in particular, that it allows us to do so faster
than the speed of light (or even instantaneously), in violation of relativity. This
is, in fact, not true.
To illustrate this, imagine the following scenario. Alice and Bob create an
entangled pair of qubits, for example in the Bell state
\[ |\beta_{01}\rangle \equiv \frac{1}{\sqrt{2}} \left( |01\rangle + |10\rangle \right). \tag{4.106} \]
Alice takes the first qubit in the pair, and Bob takes the second qubit. Alice then
stays on Earth, while Bob embarks on a long journey to Alpha Centauri, about
4.4 light years away. When Bob gets there, he measures his qubit. He has a 50%
chance to observe 0 and a 50% chance to observe 1. However, if he observes 0
he knows that Alice will surely observe 1 whenever she measures her qubit, and
if he observes 1 he knows that Alice will surely observe 0, since the qubits must
have opposite values.
So it seems that Bob now knows something about Alice’s qubit that he did not
know before. Furthermore, he knows that instantly – even though Alice is 4.4
light years away, and thus according to relativity, that information should have
taken at least 4.4 years to travel between them. But has any information actually
been transferred between them?
The answer is no! All Bob did was observe a random event. Bob cannot control
which value he observes when he measures the qubit, 0 or 1; he can only observe
it, and randomly get whatever he gets. He gains information about Alice’s qubit,
which is completely random, but he does not receive any specific message from
Alice, nor can he transmit any specific information to Alice by observing his qubit.
In fact, there is a theorem called the no-communication theorem which
rigorously proves that no information can be transmitted using quantum entanglement,
whether faster than light or otherwise. Whatever you measure, it must be
completely random. (Unfortunately, the proof of this theorem uses some advanced
tools that we will not learn in this course, so we will not present it here.)
The fact that a measurement of one qubit determines the measurement of another
qubit might seem like it indicates that some information must be transmitted
between the qubits themselves, so that they “know” about each other’s states.
However, there isn’t any actual need to transmit information between the two
entangled qubits in order for them to match their measurements! After all, the
entangled state does not depend on the distance between the qubits, whether in
time or in space; it is simply the combined state of the two qubits, wherever or
whenever they might be.
Consider now the following completely classical scenario. Let’s say I write 0 on
one piece of paper and 1 on another piece of paper. I then put each piece of
paper in a separate sealed envelope, and randomly give one envelope to Alice
and the other to Bob. When Bob gets to Alpha Centauri, he opens his envelope.
If he sees 0 he knows that Alice’s envelope says 1, and if he sees 1 he knows that
Alice’s envelope says 0.
Obviously, this does not allow any information to be transmitted between Alice
and Bob, nor does each envelope need to “know” what’s inside the other envelope
in order for the measurements to match. If Bob sees 0, then the piece of paper
saying 0 was inside the envelope all along, and the piece of paper saying 1 was
inside Alice’s envelope all along – and vice versa. The envelopes are classically
correlated, and nothing weird is going on. What, then, is the difference between
this classical correlation and quantum entanglement? The answer to this question
can be made precise using Bell’s theorem, which we will now formulate.
Bell’s theorem proves that the predictions of quantum theory cannot be explained
by theories of local hidden variables, which we first mentioned in section 4.2.4.
These are deterministic theories, where measurements of quantum systems such
as qubits have preexisting values. For example, if we measured 0, then the
qubit always had the value 0; we could have, in fact, predicted the exact value
0, and not just the probability to measure it (which is what quantum theory can
predict), if we knew the value of a “hidden variable” that quantum theory does
not take into account.
Local hidden variable theories are essentially no different than the envelope
scenario described above; the envelope always had the number 0 inside it, and if we
were able to look inside the envelope (at the “hidden variable”) without opening it,
we would have been able to make a deterministic prediction. In this sense, local
hidden variable theories have classical correlation, and Bell’s theorem proves that
quantum entanglement is different, and in a precise sense we will discuss below,
stronger than classical correlation.
Consider the following experiment. I prepare two qubits, and give one to Alice
and another to Bob. Alice can measure one of two different physical observables (see footnote 36)
of her qubit, 𝑄 or 𝑅, both having two possible outcomes, +1 or −1. Similarly, Bob
can measure one of two different physical observables of his qubit, 𝑆 or 𝑇 , both
having two possible outcomes, +1 or −1.
We now make two crucial assumptions:
1. Locality: Both Alice and Bob measure their qubits at the same time in
different places, so that their measurements cannot possibly disturb or
influence each other without sending information faster than light. This ensures
that the predicted probabilities for Alice’s and Bob’s measurements are
completely independent of each other. This condition puts the “local” in “local
hidden variable theory”.
2. Realism: The observables 𝑄, 𝑅, 𝑆, 𝑇 have definite values 𝑞, 𝑟, 𝑠, 𝑡, each equal
to +1 or −1, which exist whether or not they are measured. This condition puts
the “hidden variable” in “local hidden variable theory”.
Together, these two assumptions form the principle of local realism. Classical
relativity definitely satisfies this principle; there are no faster-than-light interactions,
and everything is deterministic. Local hidden variable theories also satisfy this
principle. Non-local hidden variable theories satisfy realism, but not locality.
Now, whatever the values of 𝑞, 𝑟, 𝑠, 𝑡 are, we must always have
𝑟𝑠 + 𝑞𝑠 + 𝑟𝑡 − 𝑞𝑡 = (𝑟 + 𝑞) 𝑠 + (𝑟 − 𝑞) 𝑡 = ±2. (4.107)
To see that, note that since 𝑟 = ±1 and 𝑞 = ±1, we must either have 𝑟 + 𝑞 = 0 if
they have opposite signs, or 𝑟 − 𝑞 = 0 if they have the same sign. So one of the
terms must always vanish. In the first case we have (𝑟 − 𝑞) 𝑡 = ±2 because 𝑡 = ±1
and in the second case we have (𝑟 + 𝑞) 𝑠 = ±2 because 𝑠 = ±1.
Footnote 36: Alice could take, for example, 𝑄 = 𝜎𝑧 and 𝑅 = 𝜎𝑥 – which is indeed what we will take below.
However, for our purposes, it doesn’t matter what the physical observables being measured actually
are. For that matter, the physical systems don’t need to be qubits, either; it’s just easier to talk
about qubits since they are the simplest nontrivial quantum systems. This scenario is very general,
and does not depend on any specific systems or observables, which is good since we are trying to
capture a general property of quantum theory.
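Since there are only 2⁴ = 16 possible assignments of ±1 to 𝑞, 𝑟, 𝑠, 𝑡, the claim in equation (4.107) can also be checked by brute force. A small sketch in plain Python (the variable name `values` is ours):

```python
from itertools import product

# Collect every value the expression can take over all 16 sign assignments.
values = set()
for q, r, s, t in product((+1, -1), repeat=4):
    values.add(r * s + q * s + r * t - q * t)

print(sorted(values))
```

The set contains only −2 and +2, in agreement with the argument above.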
Using this information, we can calculate the expectation value of this expression.
To do that, we assign a probability 𝑝 (𝑞, 𝑟, 𝑠, 𝑡) to each outcome of 𝑞, 𝑟, 𝑠, 𝑡. For
example, we could simply assign a uniform probability distribution, where all
probabilities are equal:
\[ p(q, r, s, t) = \frac{1}{16}, \tag{4.108} \]
for any values of 𝑞, 𝑟, 𝑠, 𝑡. However, the probability distribution can be arbitrary.
Even though we don’t know the probabilities in advance, we can nonetheless
calculate an upper bound on the expectation value:
\[
\langle RS + QS + RT - QT \rangle = \sum_{q,r,s,t} p(q, r, s, t) \left( rs + qs + rt - qt \right)
\leq 2 \sum_{q,r,s,t} p(q, r, s, t) = 2.
\]
By linearity of the expectation value, this means that ⟨𝑅𝑆⟩ + ⟨𝑄𝑆⟩ + ⟨𝑅𝑇⟩ − ⟨𝑄𝑇⟩ ≤ 2.
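To summarize, we have proven that in any locally realistic theory, the expectation
value considered here must be less than or equal to 2. This bound is a Bell
inequality; more precisely, there are many different Bell inequalities, and this
specific one is called the CHSH (Clauser-Horne-Shimony-Holt) inequality.
Now, let us assume that I prepared the two qubits in the following Bell state:
\[ |\beta_{11}\rangle = \frac{1}{\sqrt{2}} \left( |01\rangle - |10\rangle \right). \tag{4.111} \]
Alice gets the first qubit, and Bob gets the second qubit. We define the
observables 𝑄, 𝑅, 𝑆, 𝑇 in terms of the Pauli matrices. Alice measures the observables
\[ Q = \sigma_z, \qquad R = \sigma_x, \tag{4.112} \]
while Bob measures the observables
\[ S = -\frac{1}{\sqrt{2}} \left( \sigma_x + \sigma_z \right), \qquad T = -\frac{1}{\sqrt{2}} \left( \sigma_x - \sigma_z \right). \tag{4.113} \]
A direct calculation then gives the expectation values
\[ \langle RS \rangle = \langle QS \rangle = \langle RT \rangle = \frac{1}{\sqrt{2}}, \qquad \langle QT \rangle = -\frac{1}{\sqrt{2}}, \tag{4.114} \]
where we used the shorthand notation 𝑅𝑆 ≡ 𝑅 ⊗ 𝑆 and so on, and the expectation
values are calculated with respect to the state |𝛽11 ⟩. We thus get:
\[ \langle RS \rangle + \langle QS \rangle + \langle RT \rangle - \langle QT \rangle = 2\sqrt{2} \approx 2.8, \tag{4.115} \]
which exceeds the upper bound of 2 that any locally realistic theory must obey.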
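The expectation values in equation (4.114) can be reproduced numerically. The following is a sketch using NumPy, with operators on the two-qubit space built via `np.kron`; the helper `expectation` and all variable names are our own, and the observables are those of equations (4.112) and (4.113):

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# The Bell state |beta_11> = (|01> - |10>) / sqrt(2).
beta11 = (np.kron(ket0, ket1) - np.kron(ket1, ket0)) / np.sqrt(2)

# Alice's observables (first qubit) and Bob's observables (second qubit).
Q, R = sigma_z, sigma_x
S = -(sigma_x + sigma_z) / np.sqrt(2)
T = -(sigma_x - sigma_z) / np.sqrt(2)

def expectation(op_a, op_b, state):
    """<state| op_a (x) op_b |state>, real for Hermitian operators."""
    return np.vdot(state, np.kron(op_a, op_b) @ state).real

chsh = (expectation(R, S, beta11) + expectation(Q, S, beta11)
        + expectation(R, T, beta11) - expectation(Q, T, beta11))
print(chsh, 2 * np.sqrt(2))
```

Both printed numbers agree up to rounding, and both are larger than the classical bound of 2.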
Another important lesson of Bell’s theorem is that there is something
fundamentally profound and powerful about quantum entanglement, which classical
correlation does not have. This property of quantum entanglement is exactly what
makes quantum computers more powerful than classical computers, as we will
see below. It also has some other interesting applications, such as quantum
teleportation, which we will discuss later, and quantum cryptography, which we will
not discuss here (unless we have time at the end of the course).
\[ |\beta_{11}\rangle = \frac{1}{\sqrt{2}} \left( |01\rangle - |10\rangle \right). \tag{4.116} \]
Since |0⟩ and |1⟩ are the eigenstates of the observable 𝑆𝑧 corresponding to positive
and negative spin respectively along the 𝑧 direction (recall section 4.2.2), it is easy
to see that a measurement of spin along the 𝑧 direction will always yield opposite
spins for the qubits: if one qubit has positive spin in the 𝑧 direction (i.e. |0⟩), then
the other qubit must have negative spin in the 𝑧 direction (i.e. |1⟩). This state is
historically known as a spin singlet.
Now, let v ∈ ℝ³ be a unit vector pointing in some direction in space (the real space,
not the Hilbert space!). Then the observable 𝑆v defined in equation (4.33)
corresponds to a measurement of spin along the direction of v. Prove that if the
system is in the state |𝛽11 ⟩, then the measurement of spin along any direction
v will always yield opposite spins for the qubits: if one qubit has positive spin
along the direction v, then the other must have negative spin along the same
direction v.
This is remarkable, since it means that if Alice measures her qubit on Earth and Bob
measures his qubit on Alpha Centauri at the same time, and both of them
measure spin along the same direction, then somehow both qubits must “know” to
have opposite spins along this direction, no matter which direction Alice and Bob
choose.
If the operators commute, then 𝐴𝐵 = 𝐵𝐴 and thus the commutator vanishes:
[𝐴, 𝐵] = 0. Otherwise, 𝐴𝐵 ≠ 𝐵𝐴 and the commutator is nonzero: [𝐴, 𝐵] ≠ 0. The
commutator thus tells us if the operators commute or not. Note that any operator
commutes with itself: [𝐴, 𝐴] = 0 for any 𝐴.
Problem 4.29. Prove that the commutator is antisymmetric:
\[ [A, B] = -[B, A]. \]
Prove also that it satisfies the Jacobi identity:
\[ [A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0. \tag{4.124} \]
[𝐴, 𝐵] ≠ 0. (4.126)
Recall that we are using units where ℏ = 1!
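Recall that the (square of the) standard deviation Δ𝐴 of 𝐴 is given by
\[ (\Delta A)^2 = \left\langle \left( A - \langle A \rangle \right)^2 \right\rangle. \tag{4.127} \]
We have seen that expectation values in quantum theory are calculated using the
inner product “sandwich”
\[ \langle A \rangle = \langle\Psi| A |\Psi\rangle, \tag{4.128} \]
where |Ψ⟩ is the state with respect to which the expectation value is calculated.
The (square of the) standard deviation is thus
\[
(\Delta A)^2 = \langle\Psi| \left( A - \langle A \rangle \right)^2 |\Psi\rangle
= \langle\Psi| \left( A - \langle A \rangle \right) \left( A - \langle A \rangle \right) |\Psi\rangle.
\]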
and get
\[ (\Delta B)^2 = \langle b | b \rangle = \| b \|^2. \tag{4.132} \]
the fact that ⟨𝑏|𝑎⟩ = ⟨𝑎|𝑏⟩∗ .
Next, we note that
where we used the linearity of the expected value, equation (3.176). Similarly,
and so we get
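\[ (\Delta A)^2 (\Delta B)^2 \geq \left\langle \frac{1}{2\mathrm{i}} [A, B] \right\rangle^2. \tag{4.135} \]
Now, by definition, Δ𝐴 and Δ𝐵 are real and non-negative. If ⟨(1/2i) [𝐴, 𝐵]⟩ is also real,
we can take the square root (adding an absolute value, because the expectation
value itself could be negative):
\[ \Delta A \, \Delta B \geq \frac{1}{2} \left| \langle [A, B] \rangle \right|. \tag{4.136} \]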
You will show in problem 4.34 that it is indeed always real. Note that the
uncertainty relation we found still depends on the choice of state |Ψ⟩ with which to
calculate the expected values and standard deviations, but sometimes, as in the
position-momentum uncertainty relation, the same relation applies to all states.
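To make the inequality in equation (4.136) concrete, we can test it numerically for a single qubit. The sketch below picks the pair 𝜎𝑥, 𝜎𝑧 and a randomly generated normalized state; these choices, the random seed, and the helper names are ours and serve only as an illustration:

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_z = np.array([[1, 0], [0, -1]], dtype=complex)

def expectation(op, state):
    """The sandwich <state| op |state>."""
    return np.vdot(state, op @ state)

def std_dev(op, state):
    """Standard deviation of a Hermitian operator in the given state."""
    mean = expectation(op, state).real
    shifted = op - mean * np.eye(len(state))
    variance = expectation(shifted @ shifted, state).real
    return np.sqrt(max(variance, 0.0))  # guard against tiny negative rounding

# A random normalized qubit state (illustrative choice).
rng = np.random.default_rng(seed=1)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

lhs = std_dev(sigma_x, psi) * std_dev(sigma_z, psi)
comm = sigma_x @ sigma_z - sigma_z @ sigma_x
rhs = 0.5 * abs(expectation(comm, psi))
print(lhs, rhs, lhs >= rhs)
```

Changing the seed changes both sides of the inequality, which illustrates the remark that the bound generally depends on the state.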
As we will explain in more detail later, when we discuss continuous systems, the
operators 𝑥 and 𝑝 have the commutator
\[ [x, p] = \mathrm{i}. \tag{4.137} \]
By plugging this commutator into the uncertainty relation (4.136), we indeed get
the familiar result
\[ \Delta x \, \Delta p \geq \frac{1}{2}. \tag{4.138} \]
Problem 4.34. Inequalities are only defined for real numbers, not complex
numbers. Let us prove that if 𝐴 and 𝐵 are Hermitian, then ⟨[𝐴, 𝐵]⟩ must always be an
imaginary number, and thus ⟨(1/2i) [𝐴, 𝐵]⟩ is always real, so the inequality we found
is well-defined.
An anti-Hermitian operator 𝑂 is an operator which satisfies
𝑂† = −𝑂. (4.139)
Comment on the consequences of the relation you found for choices of different
states, that is, different values of 𝑎 and 𝑏.
Why is there uncertainty when two observables don’t commute? Some insight may
be gained from the fact that two Hermitian operators are simultaneously
diagonalizable if and only if they commute (see footnote 41).
Recall that in section 3.2.16 we proved that for any Hermitian matrix 𝐴 (see footnote 42) there
exists a unitary matrix 𝑃 such that
\[ P^\dagger A P = D, \tag{4.142} \]
where 𝐷 is a diagonal matrix. Furthermore, the elements on the diagonal are
none other than the eigenvalues of 𝐴. This is called diagonalizing the matrix 𝐴.
Footnote 41: This is a special case of a more general theorem: a set of diagonalizable matrices commute if
and only if they are simultaneously diagonalizable. Of course, here we are dealing specifically with
Hermitian matrices, and such matrices are always diagonalizable; furthermore, for our purposes it
is enough to talk about two matrices rather than a larger set.
Footnote 42: Or more generally for any normal matrix, which satisfies 𝐴†𝐴 = 𝐴𝐴†. As we mentioned before,
both Hermitian and unitary matrices are special cases of normal matrices.
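Now, let 𝐴1 and 𝐴2 be two Hermitian matrices. We say that 𝐴1 and 𝐴2 are
simultaneously diagonalizable if both matrices are diagonalizable using the same
unitary matrix 𝑃 :
\[ P^\dagger A_1 P = D_1, \qquad P^\dagger A_2 P = D_2, \tag{4.143} \]
where 𝐷1 and 𝐷2 are diagonal. Since 𝑃 is unitary, we can invert these relations:
\[ A_1 = P D_1 P^\dagger, \qquad A_2 = P D_2 P^\dagger. \tag{4.144} \]
The commutator of 𝐴1 and 𝐴2 is then
\[
\begin{aligned}
[A_1, A_2] &\equiv A_1 A_2 - A_2 A_1 \\
&= (P D_1 P^\dagger)(P D_2 P^\dagger) - (P D_2 P^\dagger)(P D_1 P^\dagger) \\
&= P D_1 (P^\dagger P) D_2 P^\dagger - P D_2 (P^\dagger P) D_1 P^\dagger \\
&= P D_1 D_2 P^\dagger - P D_2 D_1 P^\dagger \\
&= P (D_1 D_2 - D_2 D_1) P^\dagger \\
&= P [D_1, D_2] P^\dagger.
\end{aligned}
\]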
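However, any two diagonal matrices commute with each other. Indeed, if
\[
D_1 \equiv \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}, \qquad
D_2 \equiv \begin{pmatrix} \mu_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \mu_n \end{pmatrix}, \tag{4.145}
\]
then
\[
D_1 D_2 = D_2 D_1 = \begin{pmatrix} \lambda_1 \mu_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \mu_n \end{pmatrix}. \tag{4.146}
\]
Therefore [𝐷1 , 𝐷2 ] = 0, and we conclude that
\[ [A_1, A_2] = 0. \tag{4.147} \]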
in this case diagonalizes both matrices, has for its columns an orthonormal
eigenbasis |𝐵𝑖 ⟩:
\[ P = \begin{pmatrix} |B_1\rangle & \cdots & |B_n\rangle \end{pmatrix}. \tag{4.148} \]
By inspecting equation (4.143) and equation (4.145), we see that the basis states
|𝐵𝑖 ⟩ are eigenstates of both 𝐴1 and 𝐴2 , with the eigenvalues:
\[ A_1 |B_i\rangle = \lambda_i |B_i\rangle, \qquad A_2 |B_i\rangle = \mu_i |B_i\rangle. \]
This means that the eigenstates |𝐵𝑖 ⟩ are states where the system
simultaneously has the exact value 𝜆𝑖 for the observable 𝐴1 and the exact value 𝜇𝑖 for the
observable 𝐴2 .
Conversely, since this is an if-and-only-if relationship, if 𝐴1 and 𝐴2 don’t commute,
then one cannot find a basis of eigenstates of both observables simultaneously
(since if we found such a basis, then they would be simultaneously diagonalizable,
in contradiction). This is essentially where the uncertainty principle comes from:
if 𝐴1 and 𝐴2 don’t commute and the system is in an eigenstate of 𝐴1 , then in
general it can’t also be in an eigenstate of 𝐴2 . This means it must instead be in a
superposition of eigenstates of 𝐴2 , so there are many different possible values
for the measurement of 𝐴2 with different probabilities. So being certain of the
value of 𝐴1 means being necessarily uncertain of the exact value of 𝐴2 .
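A small numerical sketch may help here. We take 𝐴1 = 𝜎𝑥 and, as a hypothetical second observable chosen by us because it commutes with 𝜎𝑥, 𝐴2 = 2·1 + 𝜎𝑥. A single unitary 𝑃, whose columns are the normalized eigenvectors of 𝜎𝑥, diagonalizes both; the same 𝑃 fails to diagonalize 𝜎𝑧, which does not commute with 𝜎𝑥:

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=float)
sigma_z = np.array([[1, 0], [0, -1]], dtype=float)

A1 = sigma_x
A2 = 2 * np.eye(2) + sigma_x  # hypothetical observable; commutes with A1

# Columns of P are the normalized eigenvectors of sigma_x.
P = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

D1 = P.conj().T @ A1 @ P
D2 = P.conj().T @ A2 @ P
print(np.round(D1, 12))
print(np.round(D2, 12))

# sigma_z does not commute with sigma_x, and P does not diagonalize it.
not_diagonal = P.conj().T @ sigma_z @ P
print(np.round(not_diagonal, 12))
```

The diagonal entries of `D1` and `D2` are the eigenvalues 𝜆𝑖 and 𝜇𝑖 that the two observables take simultaneously in the common eigenbasis.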
Exercise 4.36.
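A. Show that the following Hermitian operator commutes with the Pauli operator
𝜎𝑥 :
\[ A \equiv \begin{pmatrix} 1 & -3 \\ -3 & 1 \end{pmatrix}, \qquad \sigma_x \equiv \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{4.150} \]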
We have covered almost all of the basic properties of quantum theory.
However, notice that so far we only talked about quantum systems that are in one
given state, and never change. In real life, physical systems change all the time,
whether it’s because some transformation was explicitly done to the system, or
simply because time has passed. To account for that in the mathematical
framework of quantum theory, let us introduce a new axiom:
The Evolution Axiom: If the system is in the state |Ψ1 ⟩ at some point in time,
and in another state |Ψ2 ⟩ at another point in time, then the two states must be
related by the action of some unitary operator 𝑈 :
\[ |\Psi_2\rangle = U |\Psi_1\rangle. \]
This has to be the case, since quantum states must have norm 1! So if we
start with a properly normalized quantum state, we end up with another properly
normalized quantum state.
Furthermore, recall that probabilities must sum to one. This means that, for an
orthonormal eigenbasis |𝐵𝑖 ⟩, we must have
\[ \sum_i \left| \langle B_i | \Psi \rangle \right|^2 = 1, \tag{4.154} \]
as we indeed proved in section 4.1.4. Again, since each of the probability
amplitudes ⟨𝐵𝑖 |Ψ⟩ is preserved by unitary evolution, we are guaranteed that the
probabilities still sum to 1 after the states have evolved.
Lastly, observe that since any unitary operator is invertible (with the inverse of
𝑈 being 𝑈 −1 = 𝑈 † ), any unitary transformation has an inverse transformation.
This means that unitary evolution is always reversible, and therefore quantum
mechanics has time-reversal symmetry: it works exactly the same forwards in
time and backwards in time.
If at time 𝑡1 the system is in the state |Ψ1 ⟩ and at time 𝑡2 > 𝑡1 the system is in
the state |Ψ2 ⟩, then they are either related by |Ψ2 ⟩ = 𝑈 |Ψ1 ⟩, evolving forward in
time, or |Ψ1 ⟩ = 𝑈 † |Ψ2 ⟩ for the same 𝑈 , evolving backwards in time. As far as
quantum mechanics is concerned, there is no distinction between the future and
the past, and everything works the same if we take 𝑡 ↦ −𝑡 so that 𝑡2 < 𝑡1 , as long
as we also replace every unitary evolution operator by its adjoint.
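The two key properties of unitary evolution, preservation of the norm and reversibility, are easy to check numerically. In the sketch below, both the unitary matrix `U` and the initial state `psi1` are arbitrary examples of our own, not taken from the notes:

```python
import numpy as np

# An example unitary matrix and an example normalized state (our choices).
U = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)
psi1 = np.array([3, 4j]) / 5

# Unitarity: U^dagger U is the identity.
print(np.allclose(U.conj().T @ U, np.eye(2)))

# Forward evolution preserves the norm.
psi2 = U @ psi1
print(np.linalg.norm(psi1), np.linalg.norm(psi2))

# Backward evolution with the adjoint recovers the initial state.
recovered = U.conj().T @ psi2
print(np.allclose(recovered, psi1))
```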
\[ |\Psi_1\rangle = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2\mathrm{i} \end{pmatrix}. \tag{4.155} \]
Which unitary operator 𝑈 was responsible for this evolution (such that |Ψ2 ⟩ =
𝑈 |Ψ1 ⟩)? What will be the state of the system after the same amount of time has
passed again (i.e. after another evolution with 𝑈 )?
In a classical computer, bits are manipulated using logic gates. In logic terms,
these gates treat 0 as “false” and 1 as “true”. Let us list some examples of logic
gates:
NOT gets a single bit as input, and outputs 1 minus that bit. In logic terms, it
outputs “true” if it gets “false” and vice versa, so the output is the negation of
the input:
Input NOT
0 1
1 0
AND gets two bits as input, and outputs 1 if both bits are 1, otherwise it outputs
0. In logic terms, it outputs “true” only if both bit A and bit B are “true”:
Input A Input B AND
0 0 0
0 1 0
1 0 0
1 1 1
OR gets two bits as input, and outputs 1 if at least one of the bits is 1, otherwise
it outputs 0. In logic terms, it outputs “true” if either bit A or bit B or both are
“true”:
Input A Input B OR
0 0 0
0 1 1
1 0 1
1 1 1
XOR (eXclusive OR, pronounced “ex or”) gets two bits as input, and outputs 1
if exactly one of the bits is 1, otherwise it outputs 0. In logic terms, it outputs
“true” if either bit A or bit B, but not both, is “true”:
Input A Input B XOR
0 0 0
0 1 1
1 0 1
1 1 0
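For readers who like to experiment, the classical gates described above can be written as one-line Python functions acting on the integers 0 and 1. This is only a sketch; the function names simply mirror the gate names:

```python
from itertools import product

def NOT(a):
    return 1 - a      # 1 minus the input bit

def AND(a, b):
    return a & b      # 1 only if both bits are 1

def OR(a, b):
    return a | b      # 1 if at least one bit is 1

def XOR(a, b):
    return a ^ b      # 1 if exactly one bit is 1

# Print the combined truth table for the two-bit gates.
print("A B AND OR XOR")
for a, b in product((0, 1), repeat=2):
    print(a, b, AND(a, b), OR(a, b), XOR(a, b))
```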
In quantum computers we have qubits instead of classical bits, and thus we must
use quantum logic gates, or quantum gates for short. Since they transform qubits
from one state to the other, quantum gates must take the form of unitary
operators, by the Evolution Axiom.
As a simple example, let us define the quantum NOT gate, which flips |0⟩ ↔ |1⟩,
just like a classical NOT gate flips 0 ↔ 1. This gate is none other than the Pauli
matrix 𝜎𝑥 , which is of course unitary:
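\[ \mathrm{NOT} \equiv X \equiv \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{4.157} \]
(The notation 𝑋 for the NOT gate is common in quantum computing.) Indeed, we
have
\[ \mathrm{NOT}\, |0\rangle = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} = |1\rangle, \tag{4.158} \]
\[ \mathrm{NOT}\, |1\rangle = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = |0\rangle. \tag{4.159} \]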
Since unitary transformations are linear, this means that for a general qubit state
we have
NOT (𝑎 |0⟩ + 𝑏 |1⟩) = 𝑎 |1⟩ + 𝑏 |0⟩ , (4.160)
where of course |𝑎|² + |𝑏|² = 1.
In classical computers there is only one nontrivial single-bit gate, the NOT gate;
the two other options would be the gate 0 ↦ 0, 1 ↦ 0 and the gate 0 ↦ 1, 1 ↦ 1,
which are trivial gates since their output is fixed and does not depend on the
input. However, in quantum computers, since qubits are in a superposition of |0⟩
and |1⟩, we have more options; in fact, we have an infinite number of possible
single-qubit gates, since any unitary operator can be a single-qubit gate.
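One example of a useful quantum gate is the 𝑍 gate, which is just the Pauli matrix
𝜎𝑧 :
\[ Z \equiv \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{4.161} \]
which leaves |0⟩ unchanged and flips the sign of |1⟩:
\[ Z |0\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = |0\rangle, \tag{4.162} \]
\[ Z |1\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix} = -|1\rangle. \tag{4.163} \]
Another useful gate is the Hadamard gate,
\[ H \equiv \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \tag{4.164} \]
which turns |0⟩ and |1⟩ (the eigenstates of 𝜎𝑧 ) into |+⟩ and |−⟩ respectively (the
eigenstates of 𝜎𝑥 ):
\[ H |0\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right) = |+\rangle, \tag{4.165} \]
\[ H |1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \left( |0\rangle - |1\rangle \right) = |-\rangle. \tag{4.166} \]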
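The single-qubit gates above are just 2 × 2 matrices, so their action is easy to reproduce. The following sketch (variable names are ours) checks the action of the NOT, 𝑍, and 𝐻 gates on the basis states, and confirms that 𝐻 is unitary:

```python
import numpy as np

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])
plus = (ket0 + ket1) / np.sqrt(2)   # |+>
minus = (ket0 - ket1) / np.sqrt(2)  # |->

X = np.array([[0, 1], [1, 0]])                 # NOT gate
Z = np.array([[1, 0], [0, -1]])                # Z gate
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate

print(np.array_equal(X @ ket0, ket1), np.array_equal(X @ ket1, ket0))
print(np.array_equal(Z @ ket0, ket0), np.array_equal(Z @ ket1, -ket1))
print(np.allclose(H @ ket0, plus), np.allclose(H @ ket1, minus))

# H is real and symmetric, so unitarity means H applied twice is the identity.
print(np.allclose(H.conj().T @ H, np.eye(2)))
```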
What about twoqubit gates? Notice that classical twobit gates such as AND, OR,
and XOR are irreversible, since if we are given the single output bit of any of
these gates, we cannot in general reconstruct the two input bits. For example,
if AND outputs 0, then the inputs could have been any of 00, 01, or 10. In
contrast, quantum gates must be represented by unitary operators, and as we
saw in section 4.5.1, unitary transformations are reversible. Thus we cannot
use AND, OR, XOR, and other irreversible logic gates in quantum computing.
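The irreversibility of these classical gates can be made concrete by listing, for each output bit, all the input pairs that produce it. The sketch below uses plain Python; the function name `preimages` is an illustrative choice.

```python
from itertools import product

# The three classical two-bit gates mentioned in the text.
gates = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}

def preimages(gate):
    """Map each output bit to the list of input pairs that produce it."""
    table = {}
    for x, y in product([0, 1], repeat=2):
        table.setdefault(gate(x, y), []).append((x, y))
    return table

for name, gate in gates.items():
    print(name, preimages(gate))
```

For every one of these gates, at least one output corresponds to more than one input pair, so the input cannot be reconstructed from the output. This is unavoidable for any map from two bits to one bit, since four inputs must share two outputs.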
We can, however, define other two-qubit quantum gates. A very useful example
is the controlled-NOT or CNOT gate. Here, the first qubit controls whether the
second qubit gets flipped or not. If the first qubit is |0⟩, then the second qubit is
unchanged; if the first qubit is |1⟩, then the second qubit is flipped |0⟩ ↔ |1⟩. So,
given an input state of two qubits, we have:
CNOT (|0⟩ ⊗ |0⟩) = |0⟩ ⊗ |0⟩ , (4.167)
CNOT (|0⟩ ⊗ |1⟩) = |0⟩ ⊗ |1⟩ , (4.168)
CNOT (|1⟩ ⊗ |0⟩) = |1⟩ ⊗ |1⟩ , (4.169)
CNOT (|1⟩ ⊗ |1⟩) = |1⟩ ⊗ |0⟩ . (4.170)
As you will verify in exercise 4.38, the CNOT gate can be represented by the
unitary matrix
$$\text{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \tag{4.171}$$
Alternatively, as you will verify in exercise 4.39, the CNOT gate can be represented
by a tensor product of outer products:
CNOT = |0⟩ ⟨0| ⊗ ( |0⟩ ⟨0| + |1⟩ ⟨1|) + |1⟩ ⟨1| ⊗ ( |0⟩ ⟨1| + |1⟩ ⟨0|) . (4.172)
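The agreement between the matrix form and the outer product form of CNOT can be checked numerically. This is a sanity check under the usual convention that the two-qubit basis is ordered |00⟩, |01⟩, |10⟩, |11⟩ (which is what `np.kron` produces); it is not a substitute for the analytic verification asked for in the exercises. The helper `outer` is a name introduced here.

```python
import numpy as np

# Basis kets as column vectors, so that outer products are matrix products.
ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)

def outer(u, v):
    """Outer product |u><v| of two column vectors."""
    return u @ v.conj().T

# Matrix form, equation (4.171).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Outer product form, equation (4.172): identity on the second qubit if the
# first is |0>, and a flip of the second qubit if the first is |1>.
identity = outer(ket0, ket0) + outer(ket1, ket1)
flip = outer(ket0, ket1) + outer(ket1, ket0)
CNOT_outer = (np.kron(outer(ket0, ket0), identity)
              + np.kron(outer(ket1, ket1), flip))

print(np.allclose(CNOT, CNOT_outer))                  # the two forms agree
print(np.allclose(CNOT.conj().T @ CNOT, np.eye(4)))   # CNOT is unitary
```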
Exercise 4.38. Verify that the matrix definition of the CNOT operator given in
equation (4.171) indeed has the action described in equations (4.167), (4.168),
(4.169), and (4.170).
Exercise 4.39. Verify that the CNOT operator has the outer product representation
CNOT = |0⟩ ⟨0| ⊗ ( |0⟩ ⟨0| + |1⟩ ⟨1|) + |1⟩ ⟨1| ⊗ ( |0⟩ ⟨1| + |1⟩ ⟨0|) . (4.173)
Problem 4.42. Show how you can generate each of the four entangled Bell states
by acting on the separable state |0⟩ ⊗ |0⟩ with various quantum gates. This means
that quantum gates can be used to generate entanglement if it’s not already
To correct that, we now replace the Probability Axiom with a new and improved
axiom, which we call the Measurement Axiom. In order to formulate it, let us
recall that in problem 3.55 you proved that if 𝐴 is normal (so in particular, if it is
Hermitian and thus an observable), then it has the outer product representation
𝐴 = ∑ 𝜆𝑖 |𝐵𝑖 ⟩ ⟨𝐵𝑖 | , (4.174)
where |𝐵𝑖 ⟩ is an orthonormal eigenbasis and 𝜆𝑖 are the eigenvalues of the
eigenstates |𝐵𝑖 ⟩. More generally, for any observable we can write
𝐴 = ∑ 𝜆𝑖 𝑃𝑖 , (4.175)
where 𝑃𝑖 is the projector onto the vector space of the eigenvectors corresponding
to the eigenvalue 𝜆𝑖 , called the eigenspace of 𝜆𝑖 (see problem 4.43). Using
projectors allows us to:
1. Deal with the case of degenerate eigenvectors, where two eigenvectors have
the same eigenvalue; so far we have always implicitly assumed that observables
do not have any degenerate eigenvectors. A trivial example of an
operator with degenerate eigenvalues is the identity matrix 1, which has
only one eigenvalue – namely, 1 – for which every vector in the space is an
eigenvector.
2. Measure only part of a composite Hilbert space, for example one qubit in a
composite system of two qubits, as we will see below.
In the simple case where there is no degeneracy of eigenvectors and the
measurement is performed on the entire Hilbert space, the projector can take the
simple form
𝑃𝑖 ≡ |𝐵𝑖 ⟩ ⟨𝐵𝑖 | , (4.176)
and we recover equation (4.174). Using projectors, we can now define a very
general Measurement Axiom, which employs so-called projective measurements.
The Measurement Axiom (Projective): Consider an observable 𝐴 of the form
𝐴 = ∑ 𝜆𝑖 𝑃𝑖 . (4.177)
If the system is in the state |Ψ⟩, then the probability to measure the eigenvalue
𝜆𝑖 is given by
⟨Ψ|𝑃𝑖 |Ψ⟩. (4.178)
The measurement yields exactly one of the eigenvalues 𝜆𝑖 , and after the
measurement, the system collapses to the state (see footnote 43)
$$|\Psi\rangle \mapsto \frac{P_i |\Psi\rangle}{\sqrt{\langle\Psi| P_i |\Psi\rangle}}. \tag{4.179}$$
Exercise 4.44. Find the eigenvalues of the CNOT operator (4.171) and their
corresponding eigenvectors and eigenspaces.
Footnote 43: Notice that the square root of the probability is not necessarily the probability amplitude. For
example, if the amplitude is i/2 then the probability is 1/4, but the square root of that is 1/2, which
is not the amplitude we started with! However, recall that the two vectors |Ψ⟩ and e^{i𝜙} |Ψ⟩, which
differ by an overall complex phase e^{i𝜙}, represent the same state. Since the square root of the
probability is the same as the amplitude up to a complex phase, dividing by i/2 or 1/2 both result
in the same state.
4.5.4 Applications of the Measurement Axiom
Let us now see some examples of the Measurement Axiom in action. First of all,
consider a qubit in the general state
|Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , |𝑎|² + |𝑏|² = 1. (4.180)
The observable corresponding to the eigenbasis |0⟩ , |1⟩ is the Pauli matrix 𝜎𝑧 , which
has the outer product representation
$$\sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = |0\rangle\langle 0| - |1\rangle\langle 1|. \tag{4.181}$$
This indeed matches the old Probability Axiom. The new part is that after the
measurement, if we measured 0, then the system will collapse to the state
$$|\Psi\rangle \mapsto \frac{P_0 |\Psi\rangle}{\sqrt{\langle\Psi| P_0 |\Psi\rangle}} = \frac{|0\rangle\langle 0|\Psi\rangle}{|a|} = \frac{a}{|a|}\,|0\rangle \simeq |0\rangle, \tag{4.186}$$
where by ≃ we mean that (𝑎/|𝑎|) |0⟩ and |0⟩ are the same state, since they only differ
by a complex phase (see footnote (43); in polar coordinates we have 𝑎 = |𝑎| e^{i𝜙}
where e^{i𝜙} is the phase of 𝑎, so if we divide 𝑎 by |𝑎| we are left with just the phase).
Note that I decided to start counting 𝑖 from 0 to 1 instead of from 1 to 2, so that the subscript
of 𝜆𝑖 will correspond to the value of the qubit. Also, recall that the eigenvalue of |0⟩ is not 0, it’s
+1, and the eigenvalue of |1⟩ is not 1, it’s −1 (see equation (4.181)); this is confusing, but
unfortunately it’s standard notation, since qubits are analogous to classical bits which have the
values 0 and 1.
Similarly, if we measured 1 then the system will collapse to the state
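$$|\Psi\rangle \mapsto \frac{P_1 |\Psi\rangle}{\sqrt{\langle\Psi| P_1 |\Psi\rangle}} = \frac{|1\rangle\langle 1|\Psi\rangle}{|b|} = \frac{b}{|b|}\,|1\rangle \simeq |1\rangle. \tag{4.187}$$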
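The projective Measurement Axiom for a single qubit can be simulated in a few lines. The following is a minimal sketch, assuming NumPy; the function `measure`, the fixed random seed, and the amplitudes 𝑎 = 0.6i, 𝑏 = 0.8 are illustrative assumptions rather than anything defined in these notes.

```python
import numpy as np

def measure(psi, projectors, rng):
    """Projective measurement: return (outcome index, collapsed state).

    The probability of outcome i is <psi|P_i|psi>, and the collapsed
    state is P_i|psi> divided by the square root of that probability.
    """
    probs = np.array([np.vdot(psi, P @ psi).real for P in projectors])
    i = rng.choice(len(projectors), p=probs / probs.sum())
    post = projectors[i] @ psi / np.sqrt(probs[i])
    return i, post

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
basis = [ket0, ket1]

# Projectors P_0 = |0><0| and P_1 = |1><1|.
P0 = np.outer(ket0, ket0.conj())
P1 = np.outer(ket1, ket1.conj())

# Illustrative normalized qubit a|0> + b|1>.
a, b = 0.6j, 0.8
psi = a * ket0 + b * ket1

prob0 = np.vdot(psi, P0 @ psi).real   # equals |a|^2
prob1 = np.vdot(psi, P1 @ psi).real   # equals |b|^2
print(prob0, prob1)

rng = np.random.default_rng(seed=0)
outcome, post = measure(psi, [P0, P1], rng)
print(outcome, post)
```

If the outcome is 0, the collapsed vector is (𝑎/|𝑎|) |0⟩ = i |0⟩, which differs from |0⟩ only by a phase and therefore represents the same state.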
Consider now the general composite state of two qubits given in equation (4.72):
|Ψ⟩ = 𝛼00 |0⟩ ⊗ |0⟩ + 𝛼01 |0⟩ ⊗ |1⟩ + 𝛼10 |1⟩ ⊗ |0⟩ + 𝛼11 |1⟩ ⊗ |1⟩ , (4.188)
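Let us first calculate the action of the operator 𝑃0 = |0⟩⟨0| ⊗ 1 on the ket |Ψ⟩:
$$\begin{aligned}
P_0 |\Psi\rangle &= \left( |0\rangle\langle 0| \otimes 1 \right) \left( \alpha_{00} |0\rangle \otimes |0\rangle + \alpha_{01} |0\rangle \otimes |1\rangle + \alpha_{10} |1\rangle \otimes |0\rangle + \alpha_{11} |1\rangle \otimes |1\rangle \right) \\
&= |0\rangle \otimes \left( \alpha_{00} \langle 0|0\rangle |0\rangle + \alpha_{01} \langle 0|0\rangle |1\rangle + \alpha_{10} \langle 0|1\rangle |0\rangle + \alpha_{11} \langle 0|1\rangle |1\rangle \right) \\
&= |0\rangle \otimes \left( \alpha_{00} |0\rangle + \alpha_{01} |1\rangle \right),
\end{aligned}$$
since |0⟩ and |1⟩ form an orthonormal basis, so ⟨0|0⟩ = ⟨1|1⟩ = 1 and ⟨0|1⟩ = ⟨1|0⟩ = 0.
Then we act with the bra ⟨Ψ| from the left:
$$\begin{aligned}
\langle\Psi| P_0 |\Psi\rangle &= \left( \alpha_{00}^* \langle 0| \otimes \langle 0| + \alpha_{01}^* \langle 0| \otimes \langle 1| + \alpha_{10}^* \langle 1| \otimes \langle 0| + \alpha_{11}^* \langle 1| \otimes \langle 1| \right) \left( |0\rangle \otimes \left( \alpha_{00} |0\rangle + \alpha_{01} |1\rangle \right) \right) \\
&= \alpha_{00}^* \langle 0| \left( \alpha_{00} |0\rangle + \alpha_{01} |1\rangle \right) + \alpha_{01}^* \langle 1| \left( \alpha_{00} |0\rangle + \alpha_{01} |1\rangle \right) \\
&= |\alpha_{00}|^2 + |\alpha_{01}|^2.
\end{aligned}$$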
Similarly, we also find that the probability to measure 1 for the first qubit is
⟨Ψ|𝑃1 |Ψ⟩ = ⟨Ψ| (|1⟩⟨1| ⊗ 1) |Ψ⟩ = |𝛼10 |² + |𝛼11 |². (4.193)
These very complicated calculations tell us what we could have just guessed from
common sense: the total probability to measure |0⟩ is the sum of the probabilities
to measure all the composite states which have |0⟩ as the state of the first qubit,
and similarly for |1⟩.
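What about collapse? If we measured 0, then the system will collapse to the state
$$|\Psi\rangle \mapsto \frac{P_0 |\Psi\rangle}{\sqrt{\langle\Psi| P_0 |\Psi\rangle}} = \frac{|0\rangle \otimes \left( \alpha_{00} |0\rangle + \alpha_{01} |1\rangle \right)}{\sqrt{|\alpha_{00}|^2 + |\alpha_{01}|^2}},$$
and if we measured 1, then the system will collapse to the state
$$|\Psi\rangle \mapsto \frac{P_1 |\Psi\rangle}{\sqrt{\langle\Psi| P_1 |\Psi\rangle}} = \frac{|1\rangle \otimes \left( \alpha_{10} |0\rangle + \alpha_{11} |1\rangle \right)}{\sqrt{|\alpha_{10}|^2 + |\alpha_{11}|^2}}.$$
Again, we could have just guessed the result: the qubit that we measured
collapses into either |0⟩ or |1⟩, while the other qubit stays in a superposition. The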
denominator is there simply to normalize the vector so it has norm 1, and can
thus represent a state.
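The same calculation can be carried out numerically for a concrete two-qubit state. This is a sketch only, assuming NumPy; the amplitudes in `alpha` are arbitrary illustrative values (normalized by the code), and the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ is the one produced by `np.kron`.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Illustrative amplitudes (alpha_00, alpha_01, alpha_10, alpha_11).
alpha = np.array([1, 2j, -1, 3], dtype=complex)
psi = alpha / np.linalg.norm(alpha)

# Projectors that measure only the first qubit: P_i = |i><i| (x) 1.
P0 = np.kron(np.outer(ket0, ket0.conj()), I2)
P1 = np.kron(np.outer(ket1, ket1.conj()), I2)

prob0 = np.vdot(psi, P0 @ psi).real   # |alpha_00|^2 + |alpha_01|^2
prob1 = np.vdot(psi, P1 @ psi).real   # |alpha_10|^2 + |alpha_11|^2

# Collapsed states after measuring 0 or 1 on the first qubit.
post0 = P0 @ psi / np.sqrt(prob0)
post1 = P1 @ psi / np.sqrt(prob1)

print(prob0, prob1)
print(post0)   # only the |00> and |01> components are nonzero
print(post1)   # only the |10> and |11> components are nonzero
```

In both collapsed states the first qubit has a definite value, while the second qubit remains in a superposition whose coefficients are the original ones, rescaled so that the norm is 1.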
Problem 4.45. Consider a composite system of three qubits. Which projectors
will you use to measure only the state of the middle qubit in the |+⟩ , |−⟩
eigenbasis? Which projectors will you use to measure only the state of the first two
qubits in the |0⟩ , |1⟩ eigenbasis?
measurements, and will be sufficient for our purposes in the rest of this course.
The Measurement Axiom (Simplified):
After the measurement, if the eigenvalue 𝜆𝑖 was measured, then the system
will collapse to the eigenstate |𝐵𝑖 ⟩:
This works the same whether the system in question is composite or not,
provided that the measurement is performed on the entire system at once.
The process described by the Measurement Axiom, where the state of the
system changes after a measurement, is what people mean when they talk about
wavefunction collapse. However, we haven’t yet defined what a “wavefunction” is.
This is because in the modern abstract formulation of quantum mechanics, which
is what we have been studying so far, states are the fundamental entities, not
wavefunctions. We will explain this in more detail when we define wavefunctions
in section 5.5.
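Problem 4.46. A composite system of two qubits is in the state
$$|\Psi\rangle = \frac{1}{\sqrt{14}} \left( 2\,|00\rangle - i\,|10\rangle + 3\,|11\rangle \right). \tag{4.198}$$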
A measurement is performed on only the first qubit, in the |+⟩ , |−⟩ eigenbasis.
For each of the two possible outcomes, what is the probability to measure that
outcome and what will be the state of the system after the measurement?
Problem 4.47. A composite system of three qubits is in the state
$$|\Psi\rangle = \frac{1}{\sqrt{35}} \left( |000\rangle + 2\,|010\rangle - 3i\,|011\rangle - 4\,|101\rangle + i\,|110\rangle + 2i\,|111\rangle \right), \tag{4.199}$$
where |000⟩ ≡ |0⟩⊗|0⟩⊗|0⟩ and so on. A measurement is performed on only the first
two qubits in the |0⟩ , |1⟩ basis. For each of the four possible outcomes, what is
the probability to measure that outcome and what will be the state of the system
after the measurement? You can either solve this problem by inspection using
the simplified axiom, or by explicit calculation using the projectors you found in
problem 4.45.
If you consider the collapse process carefully, you will realize that it is actually
incompatible with the Evolution Axiom. This is because the collapse is a type of
time evolution: the system was in the state |Ψ⟩ before the measurement, and will
be in one of the eigenstates |𝐵𝑖 ⟩ after the measurement. However, this evolution
is not unitary, because it is not invertible.
Given the probabilistic nature of the measurement, the information that the
system is currently in the eigenstate |𝐵𝑖 ⟩ is not enough to reconstruct the state |Ψ⟩ of
the system before the measurement, which was a superposition of all the
eigenstates |𝐵1 ⟩ , |𝐵2 ⟩ , … , |𝐵𝑛 ⟩. The information about the coefficients of each eigenstate
in the superposition is lost forever.
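The loss of information can be seen directly in a small numerical example: two different states collapse to the same eigenstate, and the projector that implements the collapse has no inverse. This is a sketch assuming NumPy; the helper `collapse` and the two example states are illustrative.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
P0 = np.outer(ket0, ket0.conj())   # projector |0><0|

def collapse(psi, P):
    """Apply the projector P and renormalize (assumes nonzero probability)."""
    out = P @ psi
    return out / np.linalg.norm(out)

# Two different states before the measurement.
psi_a = (ket0 + ket1) / np.sqrt(2)
psi_b = 0.6 * ket0 + 0.8 * ket1

# Both collapse to the same state |0> when the outcome is 0.
same_result = np.allclose(collapse(psi_a, P0), collapse(psi_b, P0))
print(same_result)

# The projector is singular and not unitary, so the map cannot be undone.
det_P0 = np.linalg.det(P0)
print(np.isclose(det_P0, 0))
print(np.allclose(P0.conj().T @ P0, np.eye(2)))
```

Given only the collapsed state |0⟩, there is no way to tell whether the state before the measurement was `psi_a`, `psi_b`, or any other state with a nonzero |0⟩ component.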
This incompatibility, and more generally our failure to understand the exact nature
of measurement and collapse in quantum mechanics, is called the measurement
problem. Many physicists believe that quantum theory will remain fundamentally
incomplete until we manage to solve the measurement problem, and this is an
area of active research. The current approaches towards solving this problem
largely fall into several distinct groups, which more or less coincide with specific
interpretations of quantum mechanics. Let us list some of them.
“Shut up and calculate”: This approach simply ignores the measurement
problem. It is not necessarily associated with any particular interpretation, since it
doesn’t care about trying to interpret the theory in the first place. However, one
could associate it with the Copenhagen interpretation, the earliest interpretation
of quantum mechanics, which essentially just accepts the Measurement Axiom
at face value, without attempting to explain why there is a collapse. This
interpretation regards quantum states as merely a tool to calculate probabilities, and
ignores questions like “what was the spin of the particle before I measured it”.
This approach is, by far, the most popular one among physicists, with a recent
survey indicating that around a third of physicists subscribe to the Copenhagen
interpretation and another third don’t have any preferred interpretation.
However, this definitely doesn’t mean it is the “best” approach. It is popular simply
because in practice, as long as quantum mechanics enables us to make accurate
predictions, it doesn’t matter how (or even if) the collapse happens.
The applications of quantum mechanics to theoretical, experimental, and applied
physics, as well as to other fields of science and technology, do not require us
to solve the measurement problem. However, as practical as this approach is,
adopting it means ignoring deep and fundamental questions about the nature of
reality which, if answered, could have far-reaching consequences.
There is no collapse: This approach claims that collapse does not actually
happen. The most well-known example of this approach is the Everett or
“many-worlds” interpretation, which gets rid of the collapse by considering the state of
every system to be part of a huge composite state which describes the entire
universe. Measurements then simply correspond to entangling two parts of that
composite state – the system being measured, and the observer. Instead of a
collapse, the observer is now in a superposition of having measured each eigenvalue.
For example, if I measured a qubit, I will then be in a superposition of “I
measured 0” and “I measured 1”. This process is completely unitary (and invertible),
thus there is no collapse and no incompatibility with the Evolution Axiom.
It is a common misconception that the name “many worlds” means measurements
somehow “create” new “parallel universes”, one for each measurement outcome.
What really happens is that there is just one universe, but that universe is in
a superposition of many different possibilities – the sum total of every single
superposition of every individual system since the Big Bang. For example, a toy
universe made of 𝑛 qubits will be in a superposition of 2^𝑛 different possibilities or
“parallel universes”. However, it’s important to stress that the defining property
of this interpretation is not the “many worlds” part – it is the “no collapse” part!
Let’s see how exactly this works. Say Alice is measuring a qubit. The individual
states of the qubit and Alice before the measurement are
|qubit⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , |Alice⟩ = |Alice hasn’t measured yet⟩ . (4.200)
The composite state of both of them together before the measurement is thus
|Ψ1 ⟩ ≡ |qubit⟩ ⊗ |Alice⟩ = (𝑎 |0⟩ + 𝑏 |1⟩) ⊗ |Alice hasn’t measured yet⟩ . (4.201)
Notice that |Ψ1 ⟩ is separable – it is just a tensor product of the state of the qubit
with the state of Alice, and those states are independent of each other.
After the qubit is measured, the system undergoes evolution with a unitary
operator 𝑈 into:
|Ψ2 ⟩ ≡ 𝑈 |Ψ1 ⟩ , (4.202)
|Ψ2 ⟩ = 𝑎 |0⟩ ⊗ |Alice measured 0⟩ + 𝑏 |1⟩ ⊗ |Alice measured 1⟩ . (4.203)
Intuitively, we can see that this evolution is unitary because it works similarly to a
CNOT gate; 𝑈 essentially checks the state of the qubit, and changes Alice’s state
accordingly. In problem 4.49 you will find the exact form of this unitary operator.
We can see that the new state |Ψ2 ⟩ is entangled – the states of the qubit and
Alice are now correlated.
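The analogy with the CNOT gate can be illustrated with a toy model in which Alice is replaced by a single "pointer" qubit: |0⟩ stands for the initial ready state and, after the interaction, |0⟩ and |1⟩ record the two outcomes. This two-state pointer is a simplifying assumption made here for illustration only; it is not the 3-state description of Alice used in problem 4.49. The sketch assumes NumPy, and `schmidt_rank` is a helper name introduced here.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Qubit a|0> + b|1> (illustrative amplitudes) and a pointer in its ready state.
a, b = 0.6, 0.8
qubit = np.array([a, b], dtype=complex)
pointer_ready = ket0

before = np.kron(qubit, pointer_ready)   # separable: (a|0> + b|1>) (x) |0>
after = CNOT @ before                    # a|00> + b|11>

def schmidt_rank(state):
    """Rank of the 2x2 coefficient matrix: 1 = separable, 2 = entangled."""
    return np.linalg.matrix_rank(state.reshape(2, 2))

print(schmidt_rank(before))   # separable before the interaction
print(schmidt_rank(after))    # entangled after the interaction
```

The evolution is unitary and no term of the superposition is discarded; the pointer simply becomes correlated with the qubit, as in equation (4.203).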
We can think of each term in the superposition as a different “parallel universe”
or “world”, but this isn’t quite the same as the typical (incorrect) science-fiction
treatment of the many-worlds interpretation, since the two versions of Alice, the
Alice who measured 0 and the Alice who measured 1, can never communicate
with each other, and there is no sense in which you can “travel” from one “parallel
universe” to another – since you can’t change which term in the superposition you
are in!
Crucially, notice that in the calculation we did above, there is no collapse. It
looks like there is a collapse from the point of view of each of the Alices, since
the Alice who measured 0 can only access the qubit in the state |0⟩ (with which
she is entangled) and the Alice who measured 1 can only access the qubit in the
state |1⟩. However, the overall state of the qubit and Alice (and more broadly, of
the entire universe) in fact evolves in a way that is perfectly compatible with the
Evolution Axiom, and at no point does it reduce to a single eigenstate.
This interpretation is probably the most popular among the approaches which are
not Copenhagen or “shut up and calculate”. This is perhaps due to its simplicity
– it does not introduce any new assumptions, as most other interpretations do,
and in fact it even gets rid of an assumption, namely the collapse part of the
Measurement Axiom, so it arguably makes quantum theory even simpler.
However, it has several unresolved issues. One of its main problems is that it is
unclear where exactly probabilities come from. If I split into several observers
after the measurement, and the different versions of me collectively measured
every single possible outcome of the measurement, then why is the probability for
me to find myself as one observer different from the probability to find myself as
another observer? And what does this probability have to do with the coefficients
of the superposition?
Hidden variables: This approach is associated with interpretations such as
De Broglie–Bohm theory, which we already mentioned in section 4.2.4 and
section 4.3.6 in the context of nonlocality.
To remind you, theories of hidden variables involve adding supplemental variables
which make the theory deterministic “behind the scenes”, but we can’t actually
know the values of these variables and use them to make deterministic
predictions, since they’re “hidden”. As the system is deterministic, there is no collapse.
One serious problem with this approach is, as we discussed earlier, that theories of
hidden variables tend to be complicated, and many physicists find them contrived
and ad hoc. Therefore, if we subscribe to the principle of Occam’s razor, which
states that theories with fewer assumptions should be preferred, we should discard
hidden variables in favor of simpler interpretations.
Collapse models: This approach modifies quantum mechanics by adding an
actual physical mechanism for collapse. This can be done by assuming that there
is a more general type of evolution, which is compatible with both unitary
evolution and collapse. Collapse models have the same problem as hidden variable
theories; they require additional assumptions and more complicated equations,
which are not necessarily justified except in that they give the desired results.
For example, one collapse model, the GRW model, assumes that quantum
systems collapse spontaneously – at random, without any relation to measurements.
This happens very rarely, but when you have a big enough composite system
with a very large number of subsystems, it happens frequently enough to explain
Problem 4.49. Find the unitary operator 𝑈 in equation (4.202). Treat Alice as a
3-state system with an orthonormal basis
Suppose that, inside a box, there is a cat and a qubit in the state |+⟩:
$$|+\rangle = \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right). \tag{4.207}$$
Figure 4.2: Schrödinger’s Cat. Source: Found via Google Image
Search, original source unknown.
system has many orders of magnitude more than two dimensions.
Now, even a qubit, which is described by a 2-dimensional Hilbert space, is
already extremely fragile. As soon as it interacts with the environment, it gets
entangled with it, and loses its superposition and other quantum properties in a
process called quantum decoherence. This is one of the reasons it is so hard to
build quantum computers: qubits will inevitably interact with the environment,
since they cannot be completely isolated. There is a certain time, called the
decoherence time, after which different physical realizations of qubits undergo
decoherence; the time it takes the quantum gate to operate must be shorter
than the decoherence time.
It should therefore not be a surprise that the cat, which is incredibly more
complicated, is also incredibly harder to keep in a superposition. The cat is still a
quantum system, just like anything else in the universe, but it is so complicated,
that it can’t be in arbitrary states. Instead, with almost certain probability, it will
be in one of the states |dead⟩ or |alive⟩.
Finally, let us address two common misconceptions about Schrödinger’s cat. The
first one (which is also a misconception about quantum mechanics in general) is
that a conscious observer is needed to collapse the cat into being alive or dead.
In fact, consciousness plays no role whatsoever in quantum mechanics! There is
nothing special about conscious observers that unconscious measurement devices
do not have. In both cases, the interaction of the quantum system with a larger
system – whether it’s a human or a particle detector – causes it to undergo
decoherence and appear classical.
The second misconception occurs when Schrödinger’s cat is invoked in any
situation where the state of something is unknown until it is measured. Usually this
takes the form of “Schrödinger’s X” for some X. For example, I heard the term
“Schrödinger’s millionaire” being used to describe someone who has a lottery
ticket which they have not yet checked to see if it’s the winning ticket;
therefore, that person is “both a millionaire and not a millionaire until the ticket is
checked”. However, the fact that you don’t know the state of something until
you measure it is completely trivial, and has nothing to do with Schrödinger’s cat,
or even with quantum mechanics in general. The purpose of the Schrödinger’s
cat thought experiment is to illustrate the difference between the classical and
quantum worlds.
tum state as many times as we want; all we need to do is repeat whatever process
is known to generate that state. However, if someone gives you an unknown
quantum state |Ψ⟩ and doesn’t tell you anything about it, the no-cloning theorem
states that you will never be able to make another copy of |Ψ⟩.
To prove the theorem, let us assume that we have a “copying operator” 𝑈 which
gets a tensor product of two states as input, and copies the state from the first
slot into the second slot:
𝑈 ( |Ψ⟩ ⊗ |?⟩) = |Ψ⟩ ⊗ |Ψ⟩ .
The second state |?⟩ in the input can be anything – it doesn’t matter what it was
originally, since it will be overwritten with the state |Ψ⟩ that we are copying.
We are looking for a universal copying operator, which can copy any state |Ψ⟩,
even if we don’t know in advance what the state is. If this operator only works for
a specific state |Ψ⟩, that means we must know what |Ψ⟩ is in advance, in order
to choose the specific 𝑈 that copies it. Let us use 𝑈 to copy two states, |Ψ1 ⟩ and
|Ψ2 ⟩:
𝑈 ( |Ψ1 ⟩ ⊗ |?⟩) = |Ψ1 ⟩ ⊗ |Ψ1 ⟩ , (4.210)
𝑈 ( |Ψ2 ⟩ ⊗ |?⟩) = |Ψ2 ⟩ ⊗ |Ψ2 ⟩ . (4.211)
We can take the inner product of the last two equations by turning the second
equation into a bra:
(⟨Ψ2 | ⊗ ⟨?|) 𝑈†𝑈 ( |Ψ1 ⟩ ⊗ |?⟩) = (⟨Ψ2 | ⊗ ⟨Ψ2 |) ( |Ψ1 ⟩ ⊗ |Ψ1 ⟩) .
On the left-hand side, 𝑈†𝑈 = 1 since 𝑈 is unitary, which leaves ⟨Ψ2 |Ψ1 ⟩⟨?|?⟩.
On the right-hand side, we have ⟨Ψ2 |Ψ1 ⟩⟨Ψ2 |Ψ1 ⟩ = ⟨Ψ2 |Ψ1 ⟩²:
⟨Ψ2 |Ψ1 ⟩⟨?|?⟩ = ⟨Ψ2 |Ψ1 ⟩² .
Finally, even though we haven’t specified the state |?⟩ (since we don’t care what
it is), we still know it must be normalized such that ⟨?|?⟩ = 1, since otherwise it
won’t be a proper state. Therefore, we obtain:
⟨Ψ2 |Ψ1 ⟩ = ⟨Ψ2 |Ψ1 ⟩² .
This is a quadratic equation, so it has two solutions:
• The first solution is ⟨Ψ2 |Ψ1 ⟩ = 1, in which case the states must be the same
state: |Ψ1 ⟩ = |Ψ2 ⟩. So 𝑈 is a copying operator that can only copy one specific
state, in contradiction with our requirement above that 𝑈 is universal.
• The second solution is ⟨Ψ2 |Ψ1 ⟩ = 0, in which case |Ψ1 ⟩ and |Ψ2 ⟩ must be
orthogonal. Again, this means that 𝑈 cannot be universal, since it can only
copy states that are orthogonal to a specific state, and thus we cannot clone
an unknown quantum state.
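The second solution can be illustrated with a concrete candidate for a copying operator. The CNOT gate, acting on a state and a blank qubit |0⟩, does copy the two orthogonal states |0⟩ and |1⟩, but it fails on their superpositions. The sketch below assumes NumPy; choosing CNOT and the blank state |0⟩ as the "copier" is an illustrative assumption, and `try_copy` is a name introduced here.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

def try_copy(state):
    """Attempt to copy `state` into a blank second qubit |0> using CNOT."""
    return CNOT @ np.kron(state, ket0)

# The orthogonal basis states are copied correctly.
copies_0 = np.allclose(try_copy(ket0), np.kron(ket0, ket0))
copies_1 = np.allclose(try_copy(ket1), np.kron(ket1, ket1))

# The superposition |+> is not copied: the result is the entangled state
# (|00> + |11>)/sqrt(2) rather than the product state |+> (x) |+>.
copies_plus = np.allclose(try_copy(plus), np.kron(plus, plus))

print(copies_0, copies_1, copies_plus)
```

This matches the theorem: a single unitary can copy a fixed set of mutually orthogonal states, but not an arbitrary unknown state.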
where we again used the shorthand notation |𝑥𝑦⟩ ≡ |𝑥⟩ ⊗ |𝑦⟩. Alice takes the first
qubit, Bob takes the second, and they go their separate ways. In this entangled
state, if Alice measures 0, Bob will also measure 0, and if Alice measures 1, Bob
will also measure 1.
Later, Alice receives an arbitrary qubit
|Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩ , |𝑎|² + |𝑏|² = 1, (4.218)
but she does not know the state of the qubit, that is, the coefficients 𝑎 and 𝑏.
Alice needs to transfer this unknown qubit in its entirety to Bob using only two
classical bits. This seems impossible, for two different reasons:
1. The exact state of the qubit is determined by the two arbitrary complex
numbers 𝑎 and 𝑏. Even if Alice did know the values of these numbers,
transferring that information requires much more than two classical bits – in fact,
to transmit the precise value of an arbitrary complex (or even real) number,
an infinite number of bits are required.
2. Even if Alice was somehow able to magically transmit two complex numbers
using only two classical bits, there is no way she could determine the values
of 𝑎 and 𝑏 in the first place. Any measurement that Alice makes on her qubit
will simply result in either 0 or 1; it does not tell Alice anything about the
probabilities, not to mention the probability amplitudes. To get information
about the probabilities, Alice must make a large number of measurements
(in fact, an infinite number of them, if she wants to know the precise values of
the probabilities). However, this is impossible due to the no-cloning theorem;
Alice can only measure the qubit once, and that’s it.
To make the impossible possible, Alice can use the fact that her half of the Bell
state is entangled with Bob’s half. All three qubits can be represented together
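by the composite state
$$|\gamma\rangle = \frac{1}{\sqrt{2}} \left( a\,|0\rangle + b\,|1\rangle \right) \otimes \left( |00\rangle + |11\rangle \right) = \frac{1}{\sqrt{2}} \left( a \left( |000\rangle + |011\rangle \right) + b \left( |100\rangle + |111\rangle \right) \right).$$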
Here the first qubit is the one that is to be teleported from Alice to Bob, the second
is Alice’s half of the Bell state, and the third is Bob’s half.
First, Alice sends the first qubit (the unknown qubit |Ψ⟩) and the second qubit (her
half of the Bell state) through a CNOT gate, which as you recall, flips the second
qubit only if the first qubit is |1⟩:
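$$\begin{aligned}
\text{CNOT}_{1,2}\,|\gamma\rangle &= \frac{1}{\sqrt{2}} \left( a \left( |000\rangle + |011\rangle \right) + b \left( |110\rangle + |101\rangle \right) \right) \\
&= \frac{1}{\sqrt{2}} \left( a\,|0\rangle \otimes \left( |00\rangle + |11\rangle \right) + b\,|1\rangle \otimes \left( |10\rangle + |01\rangle \right) \right).
\end{aligned}$$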
Here we used the notation CNOT1,2 to indicate that the gate only acts on qubits 1
and 2 out of the three qubits. Explicitly, this would be the tensor product of the
CNOT gate on the left with the 2 × 2 identity matrix on the right:
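$$\text{CNOT}_{1,2} \equiv \text{CNOT} \otimes 1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{4.220}$$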
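Next, she sends the first qubit through the Hadamard gate, which as you recall,
takes |0⟩ to |+⟩ ≡ (|0⟩ + |1⟩) /√2 and |1⟩ to |−⟩ ≡ (|0⟩ − |1⟩) /√2:
$$\begin{aligned}
H_1 \cdot \text{CNOT}_{1,2}\,|\gamma\rangle &= \frac{1}{\sqrt{2}} \left( a\,|+\rangle \otimes \left( |00\rangle + |11\rangle \right) + b\,|-\rangle \otimes \left( |10\rangle + |01\rangle \right) \right) \\
&= \frac{1}{2} \left( a \left( |0\rangle + |1\rangle \right) \otimes \left( |00\rangle + |11\rangle \right) + b \left( |0\rangle - |1\rangle \right) \otimes \left( |10\rangle + |01\rangle \right) \right) \\
&= \frac{1}{2} \left( a \left( |000\rangle + |011\rangle + |100\rangle + |111\rangle \right) + b \left( |010\rangle + |001\rangle - |110\rangle - |101\rangle \right) \right) \\
&= \frac{1}{2} \Big[ a \left( |00\rangle \otimes |0\rangle + |01\rangle \otimes |1\rangle + |10\rangle \otimes |0\rangle + |11\rangle \otimes |1\rangle \right) \\
&\qquad + b \left( |01\rangle \otimes |0\rangle + |00\rangle \otimes |1\rangle - |11\rangle \otimes |0\rangle - |10\rangle \otimes |1\rangle \right) \Big].
\end{aligned}$$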
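Again, the notation 𝐻1 means we act with the Hadamard gate only on the first
qubit:
$$H_1 \equiv H \otimes 1 \otimes 1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \tag{4.221}$$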
We can rearrange the transformed state as follows:
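$$\begin{aligned}
H_1 \cdot \text{CNOT}_{1,2}\,|\gamma\rangle = \frac{1}{2} \Big[ \; & |00\rangle \otimes \left( a\,|0\rangle + b\,|1\rangle \right) \\
+ \; & |01\rangle \otimes \left( a\,|1\rangle + b\,|0\rangle \right) \\
+ \; & |10\rangle \otimes \left( a\,|0\rangle - b\,|1\rangle \right) \\
+ \; & |11\rangle \otimes \left( a\,|1\rangle - b\,|0\rangle \right) \Big].
\end{aligned}$$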
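The rearranged state can be checked numerically by applying the two gates to the three-qubit state and comparing with the formula. The sketch below assumes NumPy, the basis ordering produced by `np.kron`, a Bell state of the form (|00⟩ + |11⟩)/√2 for the shared pair, and illustrative amplitudes 𝑎 = 0.6, 𝑏 = 0.8i. It stops before Bob's corrections, which are the subject of problem 4.53.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Unknown qubit (illustrative amplitudes) and the shared Bell state.
a, b = 0.6, 0.8j
psi = a * ket0 + b * ket1
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
gamma = np.kron(psi, bell)

# Gates embedded in the three-qubit space, as in (4.220) and (4.221).
CNOT12 = np.kron(CNOT, np.eye(2))
H1 = np.kron(H, np.eye(4))

state = H1 @ CNOT12 @ gamma

# The rearranged form: one term for each of Alice's outcomes 00, 01, 10, 11.
expected = 0.5 * (
    np.kron(np.kron(ket0, ket0), a * ket0 + b * ket1)
    + np.kron(np.kron(ket0, ket1), a * ket1 + b * ket0)
    + np.kron(np.kron(ket1, ket0), a * ket0 - b * ket1)
    + np.kron(np.kron(ket1, ket1), a * ket1 - b * ket0)
)
print(np.allclose(state, expected))

# Probability of each outcome of Alice's measurement on the first two qubits.
outcome_probs = np.sum(np.abs(state.reshape(4, 2)) ** 2, axis=1)
print(outcome_probs)
```

Each of the four outcomes occurs with probability 1/4 regardless of 𝑎 and 𝑏, so Alice's two classical bits by themselves reveal nothing about the teleported state.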
Finally, Alice performs a measurement on the first two qubits (the one to be
teleported, and her half of the Bell state), and obtains one of four results: 00, 01,
10, or 11. These are two classical bits, which she can then send to Bob. With
this information, Bob can read from the last equation exactly which operations
he has to perform on his qubit (which you will determine in problem 4.53) in
order to obtain the original qubit |Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩. The qubit has been successfully
teleported from Alice to Bob!
Note that since Alice measured the original qubit, it collapsed and its quantum
state has been destroyed. Therefore, quantum teleportation does not violate the
no-cloning theorem; the state of the qubit was not cloned or copied, it was just
moved from one qubit to another. Also, since Alice had to send two classical bits
to Bob – for example through a cable or radio waves – the speed of teleportation
is limited by the speed of light, and there is no violation of relativity.
Finally, since quantum teleportation requires Alice and Bob to already have one
half of an entangled pair each, and the entanglement is destroyed in the process
due to Alice’s measurement, the number of qubits they can teleport is limited by
the number of entangled pairs they have. Once they run out of entangled pairs,
they can no longer teleport any qubits until they physically exchange more
entangled pairs. This means that you can’t just establish two teleportation stations
on, say, two planets, and teleport qubits between them forever; you will have
to actually send a spaceship from one planet to the other with a fresh supply of
entangled particles every once in a while.
Problem 4.52. Quantum teleportation has been demonstrated experimentally in
many different experiments, over distances of up to 1400 km, and not just with
qubits but even with more complicated systems. Whenever a new quantum
teleportation experiment happens, articles appear in the media with sensationalist
headlines such as “scientists demonstrate teleportation is possible!” or “is
teleportation closer than we think?”, where by “teleportation” they actually mean the
science-fiction concept of “teleportation”, where a macroscopic object is sent
from one place to another without going through the space in between. Is the
word “teleportation” in “quantum teleportation” indeed justified? In what ways
is quantum teleportation the same as science-fiction teleportation, and in what
ways is it different? Think about this, and discuss with your classmates.
Problem 4.53. For each of the four results of Alice’s measurement, 00, 01, 10,
and 11, determine which unitary transformations Bob must perform on his qubit
in order to obtain the original |Ψ⟩ = 𝑎 |0⟩ + 𝑏 |1⟩.
Problem 4.54. Write a computer program that gets an arbitrary composite
state of 𝑁 qubits as input and allows the user to perform the following actions:
• Act on one or more of the qubits with a quantum gate; for example, act with
Hadamard on one qubit or with CNOT on two qubits.
Use your program to simulate quantum teleportation, and show that it indeed
2. The State Axiom: The states of the system are represented by unit
𝑛-vectors in the system’s Hilbert space, up to a complex phase.
3. The Operator Axiom: The operators on the system, which act on states
to produce other states, are represented by 𝑛 × 𝑛 matrices in the system’s
Hilbert space.
• Superposition: Any state |Ψ⟩ can be written as a linear combination of
the eigenstates |𝐵𝑖 ⟩ of an observable:
|Ψ⟩ = ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩. (4.222)
6. The Evolution Axiom: If the system is in the state |Ψ1 ⟩ at some point in
time, and in another state |Ψ2 ⟩ at another point in time, then the two states
must be related by the action of some unitary operator 𝑈 :
If the system is in the state |Ψ⟩, then the probability to measure the eigen
value 𝜆𝑖 is given by
⟨Ψ|𝑃𝑖 |Ψ⟩. (4.225)
After the measurement, if the eigenvalue 𝜆𝑖 was measured, then the system
will collapse to the state
$$|\Psi\rangle \mapsto \frac{P_i |\Psi\rangle}{\sqrt{\langle\Psi| P_i |\Psi\rangle}}. \tag{4.226}$$
If a measurement is performed only on part of a composite system,
the total probability to measure the eigenvalue 𝜆𝑖 is the sum of the
probabilities for all the possible ways in which this eigenvalue can be
measured. After the measurement, if the eigenvalue 𝜆𝑖 was measured,
then only the system we measured will collapse to the eigenstate |𝐵𝑖 ⟩,
while the other systems will stay in a superposition.
• Expectation Value: If the system is in the state |Ψ⟩, the expectation
value for the measurement of the observable 𝐴 is given by ⟨Ψ|𝐴|Ψ⟩.
• Uncertainty Principle: If two observables 𝐴 and 𝐵 don’t commute,
the standard deviations of their measurements satisfy the uncertainty
relation
$$\Delta A \, \Delta B \ge \frac{1}{2} \left| \langle [A, B] \rangle \right|. \tag{4.229}$$
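The expectation value and the uncertainty relation can be checked numerically for a qubit. The sketch below assumes NumPy and uses the relation in its standard form with a factor of 1/2 on the right-hand side, ΔAΔB ≥ ½ |⟨[A, B]⟩|; the observables 𝜎𝑥 and 𝜎𝑦, the state |0⟩, and all function names are illustrative choices.

```python
import numpy as np

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
sigma_y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def expectation(A, psi):
    """Expectation value <psi|A|psi> for a normalized state psi."""
    return np.vdot(psi, A @ psi)

def std_dev(A, psi):
    """Standard deviation sqrt(<A^2> - <A>^2) of a Hermitian observable."""
    variance = (expectation(A @ A, psi) - expectation(A, psi) ** 2).real
    return np.sqrt(max(variance, 0.0))

def uncertainty_sides(A, B, psi):
    """Return (dA * dB, |<[A, B]>| / 2) for the state psi."""
    commutator = A @ B - B @ A
    lhs = std_dev(A, psi) * std_dev(B, psi)
    rhs = 0.5 * abs(expectation(commutator, psi))
    return lhs, rhs

# For the state |0>, both sides turn out to be equal.
ket0 = np.array([1, 0], dtype=complex)
lhs, rhs = uncertainty_sides(sigma_x, sigma_y, ket0)
print(lhs, rhs)

# The inequality also holds for a few randomly chosen states.
rng = np.random.default_rng(seed=1)
random_checks = []
for _ in range(5):
    psi = rng.normal(size=2) + 1j * rng.normal(size=2)
    psi /= np.linalg.norm(psi)
    l, r = uncertainty_sides(sigma_x, sigma_y, psi)
    random_checks.append(l >= r - 1e-12)
print(all(random_checks))
```

For |0⟩ the two sides coincide, which shows that the factor of 1/2 cannot be dropped: without it, the right-hand side would exceed the left-hand side for this state.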
The mathematical framework we have defined here is not enough on its own; one
must use the framework to define different models, which map the framework to
specific physical systems. A model is a specific choice of the following ingredients:
• The states on which these operators act, which correspond to different con
figurations of the system.
In the simple case of a qubit, we saw that the Hilbert space is ℂ2 , the Hermitian
operators corresponding to observables are linear combinations of the Pauli
matrices, the unitary operators corresponding to transformations are the quantum
gates, and the states are the two possible values of the qubits, 0 and 1 (and
superpositions thereof).
Of course, not every possible model we can make will actually correspond to a
physical system that we can find in nature. However, amazingly, the opposite
statement does seem to be true: every physical system that we find in nature
(see footnote 50) can be precisely described by a model built using the ingredients
of quantum theory.
We can think of quantum theory as a sort of language. Just like English is
a language with rules such as grammar and spelling, so is quantum theory a
language with its own rules: observables must be Hermitian operators, possible
measurement results are given by the eigenvalues of these operators, and so on.
And just like we can use English to make any sentence we want, both true and
false, we can use quantum theory to make any model we want, both models that
correspond to real physical systems and those that do not.
Footnote 50: Except perhaps general relativity, but we are pretty sure that there is a quantum theory of
general relativity; we just don’t have a consistent formulation of it yet. If time permits, we will
discuss this theory – quantum gravity – at the end of this course.
Quantum mechanics is a confusing and unintuitive theory, and requires the
introduction of many new concepts. One of the main goals of this course is to
introduce quantum mechanics to students in a way that is mathematically as
simple as possible, so that they won’t have to struggle with complicated math on
top of trying to understand new physical concepts.
It is quite remarkable that we have managed to describe all of the axioms of
quantum theory, and almost all of its important aspects such as superposition,
entanglement, and the uncertainty principle, using only linear algebra – without
any calculus. Moreover, by focusing on discrete two-state systems, or qubits, we
actually managed to do everything almost exclusively in ℂ², the simplest
non-trivial complex vector space.
Unfortunately, in real life not all systems are discrete, and the time has finally
come to start introducing some calculus and talking about continuous quantum
systems, which are described by infinite-dimensional Hilbert spaces. However,
the student may take comfort in the fact that this is going to be merely a
straightforward generalization of what we’ve already learned. The only real
difference is that now states are going to be functions instead of vectors, and
operators are going to be derivatives instead of matrices.
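The exponential function of a complex number 𝑧 is defined by the power series
$$e^z \equiv \sum_{n=0}^{\infty} \frac{z^n}{n!} = 1 + z + \frac{z^2}{2} + \frac{z^3}{3!} + \cdots. \tag{5.1}$$
The complex number 𝑧 is called the exponent. If the exponent is zero, then all the
terms in the series vanish except the first one, and we get $e^0 = 1$. If the exponent
is a natural number 𝑛 ∈ ℕ, then (5.1) turns out to be the same as taking the real
number⁵¹ e ≈ 2.718 to the power of 𝑛, that is, multiplying it by itself 𝑛 times. This
can then be extended to negative integers using the formula
$$e^{-n} \equiv \frac{1}{e^n}, \tag{5.2}$$
and to rational numbers using
$$\frac{a}{b} \in \mathbb{Q} \implies e^{a/b} \equiv \sqrt[b]{e^a}. \tag{5.3}$$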
However, for arbitrary real or complex numbers, we generally use the power
series definition (5.1) directly, or an equivalent definition such as the ones you
will prove in problems 5.2 and 5.3 below.
By taking the complex conjugate of the series (5.1), we get
$$\left(e^z\right)^* = e^{z^*}, \tag{5.4}$$
and in particular, for a complex number written in polar form,
$$z = r e^{i\phi} \implies z^* = r e^{-i\phi}, \qquad r, \phi \in \mathbb{R}. \tag{5.5}$$
This indeed makes sense, as taking the complex conjugate means reflecting 𝑧
across the real line, and thus turns the angle 𝜙, which is the angle with respect
to the real line, into its negative – see 3.1.
One can also prove from the definition (5.1) that $e^{z+w} = e^z e^w$, so
$$e^{i\phi} e^{-i\phi} = e^{i\phi - i\phi} = e^0 = 1, \tag{5.6}$$
as expected.
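The properties above can be checked numerically. The following is a minimal sketch, not part of the original notes: it sums the power series (5.1) up to a fixed number of terms (the truncation at 40 terms and the sample values of `z`, `w`, and `phi` are arbitrary choices) and compares the results against each other and against Python's built-in `cmath.exp`.

```python
import cmath

def exp_series(z, terms=40):
    """Partial sum of the power series (5.1): sum over n of z**n / n!."""
    total = 0j
    term = 1 + 0j              # the n = 0 term, z**0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)    # turns z**n / n! into z**(n+1) / (n+1)!
    return total

z, w, phi = 0.3 + 1.2j, -0.7 + 0.5j, 0.9   # arbitrary sample values

# Each printed number is the size of a discrepancy and should be tiny.
print(abs(exp_series(z) - cmath.exp(z)))                            # series vs. library
print(abs(exp_series(z).conjugate() - exp_series(z.conjugate())))   # equation (5.4)
print(abs(exp_series(z + w) - exp_series(z) * exp_series(w)))       # e^(z+w) = e^z e^w
print(abs(exp_series(1j * phi) * exp_series(-1j * phi) - 1))        # equation (5.6)
```

The series converges quickly for exponents of modest size, which is why a few dozen terms suffice here; for large |𝑧| more terms would be needed.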
The exponential function is its own derivative:
$$\frac{d}{dz} e^z = e^z. \tag{5.8}$$
In fact, it can be defined using this property, as you will prove in problem 5.1.
Using the chain rule, we get the more general result
$$\frac{d}{dz} e^{\lambda z} = \left( \frac{d}{dz} (\lambda z) \right) e^{\lambda z} = \lambda e^{\lambda z}, \tag{5.9}$$
where 𝜆 is any constant complex number (i.e. independent of 𝑧).
⁵¹ But of course, in order to know the value of the number e in the first place, we need to calculate
the power series (5.1) for 𝑧 = 1!
The inverse function of the exponential is the logarithm:
$$w = e^z \iff z = \log w, \qquad e^{\log z} = \log e^z = z. \tag{5.10}$$
This is also called the natural logarithm, since it is taken with respect to the
“natural” base e ≈ 2.718. More generally, a logarithm with respect to the base 𝑏
satisfies
$$w = b^z \iff z = \log_b w, \qquad b^{\log_b z} = \log_b b^z = z. \tag{5.11}$$
The derivative of $b^z$ with respect to 𝑧 is
$$\frac{d}{dz} b^z = b^z \log_e b, \tag{5.12}$$
and the extra factor becomes 1 when 𝑏 = e, since $\log_e e = 1$. This explains why the
base e ≈ 2.718 is “natural”; it is the unique base for which the function $b^z$ is its
own derivative, without the extra factor. Sometimes the notation ln is also used
for the natural logarithm: $\ln \equiv \log_e$. Since $b = e^{\ln b}$, the power series definition
(5.1) can be used to define the exponential of any base 𝑏 with respect to arbitrary
complex numbers 𝑧 using the formula
$$b^z = \left( e^{\ln b} \right)^z = e^{z \ln b}. \tag{5.13}$$
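As a quick sanity check of (5.12) and (5.13), here is a small sketch (not from the original notes) that defines $b^z$ through $e^{z \ln b}$ and compares a finite-difference derivative with $b^z \ln b$. The helper names `power` and `derivative`, the step size `h`, and the sample point `z0` are illustrative assumptions.

```python
import cmath
import math

def power(b, z):
    """b**z for a real base b > 0, defined through (5.13) as exp(z * ln b)."""
    return cmath.exp(z * math.log(b))

def derivative(f, z, h=1e-6):
    """Central finite-difference approximation to df/dz."""
    return (f(z + h) - f(z - h)) / (2 * h)

z0 = 0.4 + 0.3j   # arbitrary sample point
for b in (2.0, math.e, 10.0):
    numeric = derivative(lambda z: power(b, z), z0)
    exact = power(b, z0) * math.log(b)     # right-hand side of (5.12)
    print(b, abs(numeric - exact))         # discrepancy, should be tiny
```

For 𝑏 = e the factor ln 𝑏 equals 1, so the derivative coincides with the function itself, which is the sense in which this base is "natural".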
prove that if 𝑓 (𝑧) is its own derivative, then it must be the exponential function,
i.e. 𝑎𝑛 = 1/𝑛!.
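Problem 5.2. The power series expansions of the trigonometric functions cos 𝑥
and sin 𝑥 are
$$\cos x \equiv \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} x^{2n} = 1 - \frac{1}{2} x^2 + \frac{1}{4!} x^4 + \cdots, \tag{5.15}$$
$$\sin x \equiv \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} x^{2n+1} = x - \frac{1}{3!} x^3 + \frac{1}{5!} x^5 + \cdots. \tag{5.16}$$
Using these series and the definition (5.1), prove Euler’s formula:
$$e^{ix} = \cos x + i \sin x. \tag{5.17}$$
As a corollary, show that
$$\cos x = \mathrm{Re}\left(e^{ix}\right) = \frac{e^{ix} + e^{-ix}}{2}, \tag{5.18}$$
$$\sin x = \mathrm{Im}\left(e^{ix}\right) = \frac{e^{ix} - e^{-ix}}{2i}. \tag{5.19}$$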
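Problem 5.3. The binomial theorem states that for 𝑥, 𝑦 ∈ ℂ and 𝑛 ∈ ℕ:
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k, \tag{5.20}$$
where the binomial coefficient is
$$\binom{n}{k} \equiv \frac{n!}{k! \, (n-k)!}, \tag{5.21}$$
so that
$$(x + y)^n = x^n + n x^{n-1} y + \frac{1}{2} n (n-1) x^{n-2} y^2 + \cdots. \tag{5.22}$$
Using the binomial theorem and the power series definition of the exponential
(5.1), prove the equivalent definition
$$e^z = \lim_{n \to \infty} \left( 1 + \frac{z}{n} \right)^n. \tag{5.23}$$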
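A numerical experiment is of course not a proof, but it can make the two statements in problems 5.2 and 5.3 more tangible. The sketch below (an illustration, with arbitrarily chosen sample values `x` and `z`) compares $e^{ix}$ with cos 𝑥 + i sin 𝑥 and shows how the approximants $(1 + z/n)^n$ approach $e^z$ as 𝑛 grows.

```python
import cmath
import math

def exp_limit(z, n):
    """The n-th approximant (1 + z/n)**n from (5.23)."""
    return (1 + z / n) ** n

x = 0.75   # arbitrary real sample value
# Euler's formula: e^(ix) against cos x + i sin x; the discrepancy should be tiny.
print(abs(cmath.exp(1j * x) - complex(math.cos(x), math.sin(x))))

z = 0.5 + 1.0j   # arbitrary complex sample value
errors = []
for n in (10, 1_000, 100_000):
    errors.append(abs(exp_limit(z, n) - cmath.exp(z)))
    print(n, errors[-1])   # the error shrinks as n grows
```

The slow convergence of the limit (compared with the rapidly converging power series) is one practical reason the series definition is usually preferred for computation.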
The matrix exponential of a square matrix 𝐴 is defined by the same power series
as in (5.1):
$$e^A \equiv \sum_{n=0}^{\infty} \frac{A^n}{n!} = 1 + A + \frac{A^2}{2} + \frac{A^3}{3!} + \cdots, \tag{5.24}$$
where 1 is now the identity matrix, and $A^n$ means the product of the matrix 𝐴
with itself 𝑛 times. Note that it satisfies
$$e^0 = 1, \tag{5.25}$$
that is, the exponential of the zero matrix is the identity matrix, in analogy with
the fact that the exponential of zero is one. It also satisfies, as you will prove in
problem 5.4,
$$\left( e^A \right)^\dagger = e^{A^\dagger}. \tag{5.26}$$
Now consider the product of two matrix exponentials, $e^A e^B$. For numbers (which
can be considered 1 × 1 matrices) we have $e^z e^w = e^{z+w}$, but to prove that, we
used the fact that numbers commute. For arbitrary 𝑛 × 𝑛 matrices 𝐴 and 𝐵, it is in
general not true that $e^A e^B = e^{A+B}$. However, in problem 5.5 you will prove that
this identity is true if [𝐴, 𝐵] = 0, that is, if 𝐴 and 𝐵 commute.
The matrix logarithm is the inverse function of the matrix exponential:
$$B = e^A \iff A = \log B.$$
Now let 𝐻 be a Hermitian matrix and 𝑡 a real number, and define
$$U \equiv e^{-iHt}. \tag{5.29}$$
Using (5.26) and $H^\dagger = H$, we have $U^\dagger = e^{iHt}$, and therefore
$$U U^\dagger = e^{-iHt} e^{iHt} = e^{-iHt + iHt} = e^0 = 1,$$
so $e^{-iHt}$ is indeed unitary. Note that here we used the fact that 𝐻 commutes with
itself, and therefore the product of the exponentials is the exponential of the sum,
as we discussed above. In fact, since all unitary matrices are invertible, and all
invertible matrices have a logarithm, any unitary matrix 𝑈 can be written as $e^{-iH}$
for some Hermitian matrix 𝐻, where
$$H = i \log U \implies U = e^{-iH}. \tag{5.32}$$
Footnote: In fact, every complex number has an infinite number of logarithms. The arbitrary complex
number $z = r e^{i\phi}$ can also be written as $z = r e^{i(\phi + 2\pi n)}$ for all integer 𝑛, since adding a multiple of
2𝜋 to the angle 𝜙 results in the same angle. Thus we have
$$\log z = \log r + \log e^{i(\phi + 2\pi n)} = \log r + i (\phi + 2\pi n),$$
where 𝑛 can be any integer. Here we used the identity log (𝑎𝑏) = log 𝑎 + log 𝑏, which follows from
the identity $e^z e^w = e^{z+w}$.
Footnote: The minus sign in (5.29) is a convention; the inverse matrix, $e^{iHt}$, is of course unitary as well.
As in the scalar case (5.9), the derivative of the matrix exponential is
$$\frac{d}{dt} e^{At} = A e^{At}, \tag{5.33}$$
where 𝐴 is any constant complex matrix (i.e. independent of 𝑡).
Everything we described here was defined for matrices; however, it actually also
applies to general operators on any Hilbert space – and in the
infinite-dimensional case it is less convenient to think about operators as
matrices, since those matrices would be infinite-dimensional as well. The operator
exponential is defined in exactly the same way as the matrix exponential, with the
identity matrix replaced by the identity operator (which does not change the state
it acts on), the power $A^n$ meaning that the operator 𝐴 is applied 𝑛 times, and so on.
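To make the matrix exponential concrete, here is a minimal sketch in plain Python (no external libraries) that sums the defining power series for small matrices stored as nested lists. It is an illustration only: the helper names, the truncation at 40 terms, and the choice of the Pauli matrices and of 𝑡 = 0.8 as test inputs are all assumptions made for this example. It checks that $e^{-iHt}$ is unitary for a Hermitian 𝐻, and that $e^A e^B = e^{A+B}$ holds for a commuting pair but fails for a non-commuting one.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def dagger(A):
    """Adjoint: transpose and complex-conjugate."""
    n = len(A)
    return [[complex(A[j][i]).conjugate() for j in range(n)] for i in range(n)]

def mat_exp(A, terms=40):
    """Partial sum of the series 1 + A + A^2/2! + ... (truncated, so approximate)."""
    total = term = identity(len(A))
    for k in range(1, terms):
        term = scale(1.0 / k, mat_mul(term, A))   # A^k / k!
        total = mat_add(total, term)
    return total

def max_diff(A, B):
    """Largest entry-wise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

sx = [[0, 1], [1, 0]]      # Pauli matrix sigma_x
sz = [[1, 0], [0, -1]]     # Pauli matrix sigma_z

U = mat_exp(scale(-0.8j, sx))                          # e^{-iHt} with H = sigma_x, t = 0.8
unitarity_gap = max_diff(mat_mul(U, dagger(U)), identity(2))
print(unitarity_gap)                                   # tiny: U is unitary

# sigma_x and sigma_z do not commute, so e^A e^B differs from e^(A+B).
noncommuting_gap = max_diff(mat_mul(mat_exp(sx), mat_exp(sz)), mat_exp(mat_add(sx, sz)))
print(noncommuting_gap)                                # clearly nonzero

# sigma_x commutes with 2 sigma_x, so here the two sides agree.
commuting_gap = max_diff(mat_mul(mat_exp(sx), mat_exp(scale(2, sx))), mat_exp(scale(3, sx)))
print(commuting_gap)                                   # tiny
```

Summing the series directly is fine for small matrices with entries of order one; numerical libraries use more robust algorithms, which are outside the scope of these notes.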
Problem 5.4. Prove that $\left( e^A \right)^\dagger = e^{A^\dagger}$.
Problem 5.5. Prove that if two matrices 𝐴 and 𝐵 commute, that is, [𝐴, 𝐵] = 0, then
$$e^A e^B = e^{A+B}. \tag{5.34}$$
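$$\exp \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix} = \begin{pmatrix} e^{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & e^{\lambda_n} \end{pmatrix}. \tag{5.35}$$
C. Using (A) and (B), prove that if 𝐴 is diagonalizable, that is, $A = P D P^{-1}$ for
some matrix 𝑃 and a diagonal matrix 𝐷 (recall section 3.2.16), then
$$e^A = P e^D P^{-1}. \tag{5.37}$$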
Problem 5.7. Find the matrix exponential $e^{-i\theta\sigma_y}$, where $\sigma_y$ is the Pauli matrix
$$\sigma_y \equiv \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}. \tag{5.38}$$
When we described the Evolution Axiom, we only talked about evolution from one
discrete point in time to another. As a first step towards quantum mechanics of
continuous systems, let us discuss time evolution with a continuous time variable.
When moving to a continuous time variable, it is useful in practice to replace the Evolution
Axiom, which is very abstract, with the Schrödinger equation, which is a concrete
differential equation that can be solved, either exactly or approximately, for a
variety of different systems. The focus then shifts from the unitary evolution
operator of the Evolution Axiom to a Hermitian operator called the Hamiltonian.
We will see below precisely how these two operators are related to each other.
To further illustrate the fact that the Evolution Axiom is more fundamental than
the Schrödinger equation, consider the fact that the Evolution Axiom is an almost
inevitable result of the mathematical framework of quantum theory – indeed, if
quantum states evolved with nonunitary operators, then probabilities would no
longer sum to 1, and the theory wouldn’t make any sense. While the Schrödinger
equation also preserves probabilities (as it must), this fact is not immediately
obvious from the form of the equation.
Let us recall the Evolution Axiom from section 4.5.1, with slightly different
notation. If the system is in the state $|\Psi(t_1)\rangle$ at time $t_1$, and in another state
$|\Psi(t_2)\rangle$ at time $t_2$, then the two states must be related by the action of some
unitary operator $U(t_2 \leftarrow t_1)$:
$$|\Psi(t_2)\rangle = U(t_2 \leftarrow t_1) |\Psi(t_1)\rangle. \tag{5.39}$$
The main difference between this formulation and the one we had for discrete
systems is that now we are letting 𝑈 be a continuous function of $t_1$ and $t_2$, so
that we can encode the unitary evolution of the system from any point in time
to any other point in time. This is very different from what we discussed in the
discrete case, where for example, a quantum gate is not a function of time – it is
the same quantum gate at all times.
However, this is still just a special case of the Evolution Axiom; the axiom simply
states that evolution between any two points in time must be encoded in some
unitary operator, but it will in general be a different operator for different start
and end times, so here we are explicitly encoding the different operators as one
universal function $U(t_2 \leftarrow t_1)$.
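In equation (5.39), if we assume that $t_2 = t_1$ (that is, no time has passed) then
we get
$$|\Psi(t_1)\rangle = U(t_1 \leftarrow t_1) |\Psi(t_1)\rangle. \tag{5.40}$$
Since this must be true for every state $|\Psi(t_1)\rangle$ and for every time $t_1$, we see⁵⁵
that if no time has passed, $U(t_1 \leftarrow t_1)$ must be the identity operator:
$$U(t_1 \leftarrow t_1) = 1.$$
Let us now assume that the system is in the state $|\Psi(t_3)\rangle$ at time $t_3$. Then from
equation (5.39) we must have on the one hand
$$|\Psi(t_3)\rangle = U(t_3 \leftarrow t_2) |\Psi(t_2)\rangle = U(t_3 \leftarrow t_2) \, U(t_2 \leftarrow t_1) |\Psi(t_1)\rangle,$$
and on the other hand
$$|\Psi(t_3)\rangle = U(t_3 \leftarrow t_1) |\Psi(t_1)\rangle.$$
Since this must be true for every state $|\Psi(t_1)\rangle$, we conclude that
$$U(t_3 \leftarrow t_1) = U(t_3 \leftarrow t_2) \, U(t_2 \leftarrow t_1).$$
In particular, if $t_3 = t_1$ we get
$$U(t_1 \leftarrow t_2) \, U(t_2 \leftarrow t_1) = 1 \implies U(t_1 \leftarrow t_2) = U^\dagger(t_2 \leftarrow t_1),$$
or in other words, evolution to the past is given by the adjoint (or inverse) of the
evolution to the future, as we discussed in section 4.5.1.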
We now change notation slightly by taking $t_1 \mapsto t_0$ and $t_2 \mapsto t$ in equation (5.39):
$$|\Psi(t)\rangle = U(t \leftarrow t_0) |\Psi(t_0)\rangle. \tag{5.47}$$
For any arbitrary time 𝑡, the evolution of the system from a fixed time $t_0$ is given
by this equation. Let us take the time derivative of the equation:
$$\frac{d}{dt} |\Psi(t)\rangle = \frac{dU(t \leftarrow t_0)}{dt} |\Psi(t_0)\rangle, \tag{5.48}$$
where we consider $|\Psi(t_0)\rangle$ to be independent of 𝑡 since $t_0$ is a fixed time. From
equation (5.47) we find, by multiplying both sides by $U^\dagger(t \leftarrow t_0)$ from the left, that
$$|\Psi(t_0)\rangle = U^\dagger(t \leftarrow t_0) |\Psi(t)\rangle. \tag{5.49}$$
We plug that into equation (5.48) and find
$$\frac{d}{dt} |\Psi(t)\rangle = \frac{dU(t \leftarrow t_0)}{dt} U^\dagger(t \leftarrow t_0) |\Psi(t)\rangle, \tag{5.50}$$
where the time derivative only acts on 𝑈 and not on 𝑈†. Now, let us define a new
operator 𝐻 called the Hamiltonian as follows:
$$H(t) \equiv i \frac{dU(t \leftarrow t_0)}{dt} U^\dagger(t \leftarrow t_0). \tag{5.51}$$
Note that 𝐻 can in general be a function of 𝑡, but it is independent of $t_0$, which
is why we called it 𝐻(𝑡) and not $H(t, t_0)$ or $H(t \leftarrow t_0)$. Also, the Hamiltonian is
Hermitian. You will prove both of these facts in problem 5.8.
In terms of the Hamiltonian, equation (5.50) becomes
$$i \frac{d}{dt} |\Psi(t)\rangle = H(t) |\Psi(t)\rangle. \tag{5.52}$$
This is the Schrödinger equation.
Problem 5.8.
A. Prove that 𝐻(𝑡) as defined in equation (5.51) is independent of $t_0$, thus
justifying the notation 𝐻(𝑡), as well as its use in the Schrödinger equation (5.52),
where 𝑡 is the only variable.
B. Prove that 𝐻(𝑡) is a Hermitian operator.
Footnote: In non-natural units, the Schrödinger equation (5.52) features the reduced Planck constant ℏ:
$$i \hbar \frac{d}{dt} |\Psi(t)\rangle = H(t) |\Psi(t)\rangle. \tag{5.53}$$
Of course, as we discussed in section 4.1.1, ℏ is dimensionful and therefore its numerical value
doesn’t matter, so we can just choose units such as the Planck units, where it simply has the value
ℏ ≡ 1.
In the Schrödinger equation, a time derivative d/d𝑡 is acting on the state |Ψ (𝑡)⟩. Therefore,
one might wonder whether d/d𝑡 is an operator on the Hilbert space. However, the answer is no.
This is because here we are dealing with nonrelativistic quantum mechanics, and nonrelativistic
theories – both classical and quantum – treat space and time differently: while 𝑥 is an operator
(as we will see below), 𝑡 is just a label. See also footnote (68).
In this section we defined a function |Ψ(𝑡)⟩, which takes some real number 𝑡 as input, and returns
some state in the Hilbert space as output. The derivative d/d𝑡 doesn’t act on the vectors in the
Hilbert space, which is what operators do; instead, it acts on this function. Therefore, d/d𝑡 is not
an operator on the Hilbert space.
To illustrate this further, consider a system with a finite Hilbert space, such as a qubit. We can
define a function |Ψ(𝑡)⟩ which returns a particular state of the qubit given a particular point 𝑡 in
time. Then d/d𝑡 would be the derivative of that function with respect to time. But as we have seen,
operators on finite Hilbert spaces take the form of matrices acting on vectors in the space. d/d𝑡 is
not a matrix, so it is not an operator on the Hilbert space – it’s just a derivative with respect to a
5.2.3 Time-Independent Hamiltonians
Let us now assume that the Hamiltonian is constant, that is, time-independent.
Although in some quantum systems the Hamiltonian does depend on time, this is
not very common; most quantum systems have time-independent Hamiltonians.
We can rewrite equation (5.51), by multiplying both sides by $-i \, U(t \leftarrow t_0)$ from
the right, as follows:
$$\frac{dU(t \leftarrow t_0)}{dt} = -i H U(t \leftarrow t_0). \tag{5.54}$$
Compare this with equation (5.33):
$$\frac{d}{dt} e^{At} = A e^{At}, \tag{5.55}$$
which we derived assuming that 𝐴 is constant. If the Hamiltonian 𝐻 is constant,
then 𝐴 ≡ −i𝐻 is also constant. In addition, we can replace 𝑡 with $t - t_0$ in the
exponent, since that does not change the derivative (because $t_0$ is constant).
Hence, we see that the solution⁵⁹ to the differential equation (5.54) is
$$U(t \leftarrow t_0) \equiv e^{-iH(t - t_0)}, \tag{5.56}$$
which also satisfies the condition that no evolution takes place if no time has passed:
$$U(t_0 \leftarrow t_0) = 1. \tag{5.57}$$
The evolution operator between any two arbitrary points in time, $t_1$ and $t_2$, is
therefore given by
$$U(t_2 \leftarrow t_1) = e^{-iH(t_2 - t_1)}. \tag{5.59}$$
It is interesting that, since 𝐻 is constant, the unitary evolution operator is not a
function of both $t_1$ and $t_2$, but only of the difference between them, $t_2 - t_1$. So for
example, the evolution from time $t_1 = 3$ to time $t_2 = 4$ and from time $t_1 = 4$ to time
$t_2 = 5$ will be given by the same unitary operator, $e^{-iH}$, since in both cases the
time difference is $t_2 - t_1 = 1$.
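The properties of the evolution operator for a constant Hamiltonian can be illustrated numerically. The sketch below is not part of the original notes: it assumes a made-up 2 × 2 Hermitian matrix `H`, computes $U(t_2 \leftarrow t_1) = e^{-iH(t_2 - t_1)}$ by summing the exponential series (truncated at 40 terms), and then checks the dependence on the time difference only, the composition of evolutions, the adjoint for backward evolution, and the recovery of 𝐻 from the definition (5.51) using a finite-difference derivative.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Truncated power series 1 + A + A^2/2! + ... for a small matrix."""
    n = len(A)
    total = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = total
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def dagger(A):
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# A hypothetical time-independent Hamiltonian (any Hermitian matrix would do).
H = [[0.5 + 0j, 0.2 - 0.1j], [0.2 + 0.1j, -0.3 + 0j]]

def U(t2, t1):
    """Evolution operator from time t1 to time t2, as in (5.59)."""
    return mat_exp([[-1j * (t2 - t1) * h for h in row] for row in H])

same_step = max_diff(U(4, 3), U(5, 4))                      # depends only on t2 - t1
composition = max_diff(U(5, 3), mat_mul(U(5, 4), U(4, 3)))  # two steps compose
backward = max_diff(U(3, 5), dagger(U(5, 3)))               # past evolution = adjoint

# Recover H from (5.51): H = i (dU/dt) U^dagger, with a central difference for dU/dt.
dt = 1e-5
dU = [[(a - b) / (2 * dt) for a, b in zip(ra, rb)]
      for ra, rb in zip(U(2 + dt, 0), U(2 - dt, 0))]
H_recovered = [[1j * x for x in row] for row in mat_mul(dU, dagger(U(2, 0)))]
recovery = max_diff(H_recovered, H)

print(same_step, composition, backward, recovery)           # all should be tiny
```

The last check is the numerical counterpart of the statement that 𝐻, unlike 𝑈, carries no dependence on the reference time $t_0$.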
𝑡1 ↦ 𝑡1 + 𝑡, 𝑡2 ↦ 𝑡2 + 𝑡, (5.60)
where 𝑡 ∈ ℝ.
C. Verify that under a time-reversal transformation
the evolution operator is replaced with its adjoint (or inverse). Thus the evolution
equation (5.39) is invariant under time reversal if we also replace 𝑈 by its adjoint.
This is an explicit example of the time-reversal symmetry of quantum mechanics,
which we discussed in section 4.5.1.
Problem 5.10. In section 4.5.2 we discussed several unitary operators which act
on qubits. For example, the quantum 𝑍 gate is given by the Pauli matrix $\sigma_z$,
$$Z \equiv \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{5.62}$$
and its action is to leave |0⟩ unchanged but flip the phase of |1⟩. Find the
Hamiltonian corresponding to this unitary evolution operator. Since this is a
discrete evolution, the time coordinate is discrete and not continuous, and we can
take the time interval to be 1. In other words, you need to find the 𝐻 in the equation
$Z = e^{-iH}$.
In problem 5.8, you proved that the Hamiltonian is a Hermitian operator.
Therefore, it should correspond to an observable. Indeed, it does; this observable
is the energy of the system. Its (real) eigenvalues $E_i$ correspond to energy
eigenstates $|E_i\rangle$ which, as usual, make up an orthonormal basis⁶⁰:
$$H |E_i\rangle = E_i |E_i\rangle.$$
$$\mathbf{p}(t, x, y, z) \equiv \begin{pmatrix} p_x \\ p_y \\ p_z \end{pmatrix}. \tag{5.65}$$
We have seen that in order to create a model for a specific physical system in
quantum theory, we must choose a specific Hilbert space with specific states
and specific operators. But how do we know which Hilbert space, states, and
operators to use for a given physical system? This is often a hard question to
answer. For example, we currently do not have a consistent and experimentally
verified quantum model for general relativity; the problem of finding such a model
is known as quantum gravity, and it is one of the hardest problems in physics.
⁶⁰ Here we used slightly different notation than usual, with the basis eigenstates being $|E_i\rangle$ instead
of $|B_i\rangle$ and the eigenvalues being $E_i$ instead of $\lambda_i$ – compare equation (3.142).
Luckily, it turns out that there is a certain prescription that allows us to take a
classical theory and turn it into a quantum theory in a straightforward way. The
properties of the classical theory will dictate the type of Hilbert space, states, and
operators we should use in the corresponding quantum theory. This process is
known as quantization. It doesn’t work for every classical theory; for example,
it doesn’t work for general relativity, which is why quantizing gravity is so hard.
However, it does work, in an experimentally verifiable way, for most classical
theories of interest.
The classical Hamiltonian is typically of the form
$$H(x, p) = T(p) + V(x), \tag{5.66}$$
where 𝑇 is the kinetic energy, which depends only on the momentum 𝑝, and 𝑉 is
the potential energy, which depends only on the position 𝑥.
Let us consider the specific case of a single particle of mass 𝑚. In Newtonian
mechanics, the particle’s momentum is defined as
$$p \equiv m v, \quad \text{where} \quad v \equiv \dot{x} \equiv \frac{dx}{dt}. \tag{5.67}$$
The kinetic energy is defined as $\frac{1}{2} m v^2$, and we can write it in terms of the
momentum as follows:
$$T = \frac{1}{2} m v^2 = \frac{1}{2m} (m v)^2 = \frac{p^2}{2m}. \tag{5.68}$$
We conclude that for a particle of mass 𝑚, the Hamiltonian will generally be of
the form
$$H = \frac{p^2}{2m} + V(x). \tag{5.69}$$
The kinetic energy of a particle will always be $p^2/2m$, but the potential energy 𝑉(𝑥)
depends on the forces acting on the particle, such as gravity or electromagnetism.
Now, let us define the Poisson brackets of two functions 𝑓, 𝑔 of position 𝑥 and
momentum 𝑝 as follows:
$$\{f, g\} \equiv \frac{\partial f}{\partial x} \frac{\partial g}{\partial p} - \frac{\partial g}{\partial x} \frac{\partial f}{\partial p}. \tag{5.70}$$
In problem 5.11 you will prove some properties of these brackets; in particular
they are antisymmetric, {𝑔, 𝑓} = −{𝑓, 𝑔}, which means that {𝑓, 𝑓} = 0 for any 𝑓.
For 𝑥 and 𝑝 themselves we have
$$\{x, p\} = \frac{\partial x}{\partial x} \frac{\partial p}{\partial p} - \frac{\partial p}{\partial x} \frac{\partial x}{\partial p} = 1, \tag{5.72}$$
since 𝑥 and 𝑝 are assumed to be independent variables, so their derivatives with
respect to each other vanish. Even though in Newtonian mechanics we define
the momentum to be $p \equiv m\dot{x}$, in Hamiltonian mechanics we “forget” about this
relation and just assume that 𝑥 and 𝑝 are two completely independent degrees of
freedom of the system, thus generalizing the concept of momentum to any kind
of system.
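The definition (5.70) translates directly into a few lines of code. The following sketch is illustrative only: it approximates the partial derivatives with central finite differences (the step size `h` is an arbitrary choice) and uses two made-up test functions `F` and `G` to check {𝑥, 𝑝} = 1, antisymmetry, and one bracket that is easy to compute by hand.

```python
def partial(f, x, p, var, h=1e-5):
    """Central finite-difference partial derivative of f(x, p) with respect to var."""
    if var == "x":
        return (f(x + h, p) - f(x - h, p)) / (2 * h)
    return (f(x, p + h) - f(x, p - h)) / (2 * h)

def poisson(f, g, x, p):
    """Poisson bracket {f, g} of (5.70), evaluated at the phase-space point (x, p)."""
    return (partial(f, x, p, "x") * partial(g, x, p, "p")
            - partial(g, x, p, "x") * partial(f, x, p, "p"))

X = lambda x, p: x               # the position itself
P = lambda x, p: p               # the momentum itself
F = lambda x, p: x**2 * p        # hypothetical test functions
G = lambda x, p: x * p**3

x0, p0 = 1.5, 0.5                # arbitrary phase-space point
print(poisson(X, P, x0, p0))                             # close to 1, as in (5.72)
print(poisson(F, G, x0, p0) + poisson(G, F, x0, p0))     # close to 0: antisymmetry
# By hand: {F, G} = (2xp)(3xp^2) - (p^3)(x^2) = 5 x^2 p^3.
print(poisson(F, G, x0, p0), 5 * x0**2 * p0**3)
```

Note that 𝑥 and 𝑝 are passed as independent arguments, mirroring the statement above that Hamiltonian mechanics treats them as independent degrees of freedom.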
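The dynamics of the system in Hamiltonian mechanics are determined as follows.
If 𝐴 is any function of 𝑥 and 𝑝, then its time derivative is given by⁶¹
$$\dot{A} \equiv \frac{dA}{dt} = \{A, H\}. \tag{5.74}$$
For 𝑥 and 𝑝 themselves, we get
$$\dot{x} \equiv \frac{dx}{dt} = \{x, H\} = \frac{\partial x}{\partial x} \frac{\partial H}{\partial p} - \frac{\partial H}{\partial x} \frac{\partial x}{\partial p} = \frac{\partial H}{\partial p}, \tag{5.75}$$
$$\dot{p} \equiv \frac{dp}{dt} = \{p, H\} = \frac{\partial p}{\partial x} \frac{\partial H}{\partial p} - \frac{\partial H}{\partial x} \frac{\partial p}{\partial p} = -\frac{\partial H}{\partial x}. \tag{5.76}$$
In other words, the evolution of each variable depends on the derivative of the
Hamiltonian with respect to the other variable. Equations (5.75) and (5.76)
are called Hamilton’s equations.
⁶¹ Here we are assuming that 𝐴 does not depend on 𝑡 explicitly, but only implicitly via its
dependence on 𝑥 and 𝑝. If 𝐴 does have explicit dependence on 𝑡, then this equation becomes
$$\frac{dA}{dt} = \{A, H\} + \frac{\partial A}{\partial t}. \tag{5.73}$$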
For a point particle with the Hamiltonian (5.69), we get
$$\dot{x} \equiv \frac{dx}{dt} = \frac{\partial}{\partial p} \left( \frac{p^2}{2m} + V(x) \right) = \frac{p}{m}, \tag{5.77}$$
$$\dot{p} \equiv \frac{dp}{dt} = -\frac{\partial}{\partial x} \left( \frac{p^2}{2m} + V(x) \right) = -\frac{\partial}{\partial x} V(x). \tag{5.78}$$
The first equation relates the two independent variables 𝑥 and 𝑝 to each other:
$p = m\dot{x}$. Of course, this is just the definition of the momentum of a particle in
Newtonian mechanics, but Hamiltonian mechanics allows us to consider more
general systems and define a generalized momentum for any kind of system. For
example, in a rotating system 𝑝 will be the angular momentum, and so on.
The second equation is Newton’s second law: the time derivative of momentum
is the force, and the force is given by minus the derivative of the potential⁶². We
can take the derivative of (5.77) and plug (5.78) into it to get
$$\ddot{x} \equiv \frac{d^2 x}{dt^2} = \frac{d\dot{x}}{dt} = \frac{1}{m} \frac{dp}{dt} = -\frac{1}{m} \frac{\partial}{\partial x} V(x). \tag{5.80}$$
Multiplying by 𝑚, we get the familiar form of Newton’s law:
$$F = m a = m \ddot{x} \equiv m \frac{d^2 x}{dt^2} = -\frac{\partial}{\partial x} V(x), \tag{5.81}$$
where 𝑎 is the acceleration.
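Hamilton's equations (5.77) and (5.78) can also be integrated numerically, one small time step after another. The sketch below is an illustration under stated assumptions rather than anything prescribed by the notes: it uses the leapfrog scheme (a standard choice that was picked here because it keeps the energy nearly constant), a finite-difference derivative of the potential, and an example quadratic potential with made-up values of the mass `m` and constant `k`. For this particular potential and these initial conditions the exact solution is 𝑥(𝑡) = cos 𝑡, which gives something to compare against.

```python
import math

def force(V, x, h=1e-6):
    """F = -dV/dx, approximated by a central finite difference."""
    return -(V(x + h) - V(x - h)) / (2 * h)

def evolve(V, m, x, p, dt, steps):
    """Integrate dx/dt = p/m, dp/dt = -dV/dx with the leapfrog scheme."""
    for _ in range(steps):
        p += 0.5 * dt * force(V, x)   # half step in momentum
        x += dt * p / m               # full step in position
        p += 0.5 * dt * force(V, x)   # second half step in momentum
    return x, p

def energy(V, m, x, p):
    """Value of the Hamiltonian (5.69) at the point (x, p)."""
    return p * p / (2 * m) + V(x)

m, k = 1.0, 1.0                       # hypothetical mass and potential strength
V = lambda x: 0.5 * k * x * x         # example potential, chosen for illustration

x0, p0 = 1.0, 0.0                     # start at rest at x = 1
x1, p1 = evolve(V, m, x0, p0, dt=0.001, steps=2000)   # evolve to t = 2

print(x1, math.cos(2.0))                               # numerical vs. exact position
print(energy(V, m, x0, p0), energy(V, m, x1, p1))      # energy before and after
```

Any other potential can be substituted for `V`; only the comparison with cos 𝑡 is specific to the quadratic example.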
• Jacobi identity:
{𝑓, {𝑔, ℎ}} + {𝑔, {ℎ, 𝑓}} + {ℎ, {𝑓, 𝑔}} = 0. (5.85)
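Recall the definition of the expectation value for the measurement of an
observable 𝐴 when the system is in the state |Ψ⟩:
$$\langle A \rangle = \langle \Psi | A | \Psi \rangle. \tag{5.86}$$
Let us take the time derivative of this, assuming that the state |Ψ⟩ depends on
time but the observable 𝐴 doesn’t (which is usually the case):
$$\frac{d \langle A \rangle}{dt} = \left( \frac{d}{dt} \langle \Psi | \right) A | \Psi \rangle + \langle \Psi | A \left( \frac{d}{dt} | \Psi \rangle \right). \tag{5.87}$$
The Schrödinger equation (5.52) can be written as
$$\frac{d}{dt} | \Psi \rangle = -i H | \Psi \rangle. \tag{5.88}$$
We can take the adjoint of this equation to get (remember that 𝐻 is Hermitian so
𝐻 = 𝐻†)
$$\frac{d}{dt} \langle \Psi | = i \langle \Psi | H. \tag{5.89}$$
Plugging into equation (5.87), we get
$$\begin{aligned}
\frac{d \langle A \rangle}{dt} &= i \langle \Psi | H A | \Psi \rangle - i \langle \Psi | A H | \Psi \rangle \\
&= -i \langle \Psi | (A H - H A) | \Psi \rangle \\
&= -i \langle \Psi | [A, H] | \Psi \rangle \\
&= -i \langle [A, H] \rangle.
\end{aligned}$$
Comparing this with the classical equation (5.74),
$$\frac{dA}{dt} = \{A, H\}, \tag{5.90}$$
we find a very interesting result: the quantum expectation value of the
observable 𝐴 evolves in time just as classical Hamiltonian mechanics predicts,
provided we relate the Poisson brackets of functions and the commutator of
operators as
$$[A, H] \equiv i \{A, H\}, \tag{5.91}$$
or more generally for any two observables 𝐴 and 𝐵,
$$[A, B] \equiv i \{A, B\}. \tag{5.92}$$
In particular, since {𝑥, 𝑝} = 1, the position and momentum operators satisfy
$$[x, p] = i. \tag{5.93}$$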
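The relation $d\langle A \rangle / dt = -i \langle [A, H] \rangle$ derived above can be tested on the simplest quantum system, a single qubit. The sketch below is an illustration, with choices that are assumptions of this example: the Hamiltonian is taken to be the Pauli matrix $\sigma_x$, the observable is $\sigma_z$, the initial state is |0⟩, the state is evolved with $e^{-iHt}$ computed from a truncated power series, and the time derivative is approximated by a central finite difference.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Truncated power series 1 + A + A^2/2! + ... for a small matrix."""
    n = len(A)
    total = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = total
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, term)]
    return total

def apply(M, v):
    """Matrix acting on a column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def expectation(A, psi):
    """<psi| A |psi> for a normalized state psi."""
    A_psi = apply(A, psi)
    return sum(complex(c).conjugate() * x for c, x in zip(psi, A_psi))

H = [[0, 1], [1, 0]]       # Hamiltonian: Pauli sigma_x (an example choice)
A = [[1, 0], [0, -1]]      # observable: Pauli sigma_z
psi0 = [1, 0]              # initial state |0>

def state(t):
    """|Psi(t)> = e^{-iHt} |Psi(0)>."""
    return apply(mat_exp([[-1j * t * h for h in row] for row in H]), psi0)

AH, HA = mat_mul(A, H), mat_mul(H, A)
commutator = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(AH, HA)]

t, dt = 0.7, 1e-5
lhs = (expectation(A, state(t + dt)) - expectation(A, state(t - dt))) / (2 * dt)
rhs = -1j * expectation(commutator, state(t))
print(lhs, rhs, abs(lhs - rhs))   # the two sides agree up to the finite-difference error
```

Both sides come out real (up to rounding), as they must for the rate of change of the expectation value of a Hermitian observable.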
What we have derived (or at least, motivated) here is called canonical
quantization. Given a classical system described by a Hamiltonian, we can turn it
into a quantum system – quantize it – by “promoting” classical functions on the
phase space, including the variables 𝑥 and 𝑝 themselves, to Hermitian operators. We
are not provided with any specific information about these operators, except that
they are Hermitian (which they must be, since in classical physics all variables
are real!) and that the quantum commutators should be related to the classical
Poisson brackets according to the prescription (5.92).
These Hermitian operators now represent observables in the quantum theory;
they have eigenstates and eigenvalues which represent possible measurement
outcomes. This means that the values of 𝑥 and 𝑝 are no longer uniquely
determined from some initial conditions, as in the classical theory; they become
probabilistic. In addition, the time evolution of the system is no longer described
by Hamilton’s equations, but rather, by the Schrödinger equation.
Note that what we did here does not constitute a proof that all classical theories
are related to quantum theories in this way. Canonical quantization merely
ensures that expectation values of the observables in the quantum theory evolve in
time in the same way as the observables in the classical theory, which is
something that we expect to be true, but it is not by itself a sufficient condition
for creating a sensible quantum theory. Indeed, there are known cases where
canonical quantization doesn’t quite work, or is at least ambiguous, because two
Poisson brackets which in the classical theory are equal to each other will have
different values in the quantum theory, generating an inconsistency⁶⁴.
Nevertheless, canonical quantization works incredibly well in the vast majority of
cases – and indeed, most classical theories, from a single point particle to very
complicated systems with many different particles and forces, can be quantized
in this way, and the results have been verified experimentally to high precision!
Just as in the case of the Schrödinger equation, in introductory quantum
mechanics courses canonical quantization is usually just presented as an arbitrary
axiom. I hope I managed to motivate it and give you some intuition as to why
classical and quantum theories are related in this way.
⁶³ With ℏ, equation (5.93) will take the form [𝑥, 𝑝] = iℏ.
⁶⁴ There are better ways than canonical quantization to turn classical theories into quantum
theories, the most popular being path integral quantization. However, these alternative
quantization methods generally require much more advanced math, so we will only discuss
canonical quantization in this course.
The Hamiltonian of the harmonic oscillator is
$$H = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 x^2. \tag{5.94}$$
We have the standard kinetic energy term $T(p) = p^2/2m$, where 𝑚 is the mass of
the particle, and the potential energy
$$V(x) \equiv \frac{1}{2} m \omega^2 x^2, \tag{5.95}$$
where 𝜔 is a numerical constant called the frequency or angular frequency,
because it represents the frequency at which the oscillator oscillates.
It is easy to find the equations of motion using Hamilton’s equations (5.75) and
(5.76). Alternatively, since this is a particle with a Hamiltonian of the standard
form (5.69), we can just use Newton’s second law (5.81) directly:
$$\frac{d^2 x}{dt^2} = -\frac{1}{m} \frac{\partial}{\partial x} V(x) = -\frac{1}{m} \frac{\partial}{\partial x} \left( \frac{1}{2} m \omega^2 x^2 \right) = -\omega^2 x. \tag{5.96}$$
To solve this differential equation, we can use the fact that
$$\frac{d}{dt} \cos t = -\sin t, \qquad \frac{d}{dt} \sin t = \cos t, \tag{5.97}$$
which means that
$$\frac{d^2}{dt^2} \cos t = -\frac{d}{dt} \sin t = -\cos t. \tag{5.98}$$
If we replace 𝑡 by 𝜔𝑡 + 𝜙, where both 𝜔 and 𝜙 are constant (independent of 𝑡), then
since
$$\frac{d}{dt} (\omega t + \phi) = \omega, \tag{5.99}$$
we get, by the chain rule, that each derivative generates a factor of 𝜔, so
$$\frac{d^2}{dt^2} \cos(\omega t + \phi) = -\omega \left( \frac{d}{dt} \sin(\omega t + \phi) \right) = -\omega^2 \cos(\omega t + \phi). \tag{5.100}$$
Therefore, this differential equation has the solution
$$x(t) = A \cos(\omega t + \phi), \tag{5.101}$$
where the integration constants 𝐴 and 𝜙 are real numbers determined by the initial
conditions. Now we see why this is called a harmonic oscillator: the position of
the particle oscillates repeatedly between +𝐴 and −𝐴 over time.
Problem 5.12. Prove that the most general solution for the classical harmonic
oscillator can also be written as
$$x(t) = B \cos(\omega t) + C \sin(\omega t),$$
where 𝐵 and 𝐶 are integration constants, or as
$$x(t) = D e^{i\omega t} + E e^{-i\omega t}, \tag{5.103}$$
where 𝐷 and 𝐸 are integration constants. All of these solutions are equivalent;
find the relationships between the integration constants {𝐴, 𝜙}, {𝐵, 𝐶}, and {𝐷, 𝐸}
– that is, write each pair in terms of another pair.
Problem 5.13. As an example of solving the equation of motion for specific initial
conditions, if the particle starts at time 𝑡 = 0 at position 𝑥(0) = 1 with velocity
$\dot{x}(0) = 0$, then from the general solution (5.101) we have
$$\dot{x}(0) = -\omega A \sin\phi = 0,$$
which we can satisfy by taking 𝜙 = 0, and then
$$x(0) = A \cos\phi = A = 1 \implies A = 1, \tag{5.105}$$
and thus the solution is
$$x(t) = \cos(\omega t). \tag{5.106}$$
Similarly, find a solution for the classical harmonic oscillator with the initial
conditions 𝑥(0) = 0 and $\dot{x}(0) = \omega$.
Problem 5.14. By plugging the general solution (5.101) into the Hamiltonian
(5.94), show that the total energy of the system is
$$H = \frac{1}{2} m \omega^2 A^2. \tag{5.107}$$
Thus the Hamiltonian is time-independent, and energy is conserved.
Let us now quantize the simple harmonic oscillator by promoting 𝑥 and 𝑝 to
operators. We are interested in finding the energy eigenstates of this quantum
system. Instead of finding them by solving a differential equation, we will use an
easier and more intuitive method. We define the ladder operators:
$$a = \sqrt{\frac{m\omega}{2}} \left( x + \frac{i}{m\omega} p \right), \qquad a^\dagger = \sqrt{\frac{m\omega}{2}} \left( x - \frac{i}{m\omega} p \right), \tag{5.108}$$
where 𝑎† is called the creation operator and 𝑎 is called the annihilation operator.
Notice that 𝑎† is indeed the adjoint of 𝑎, since the numbers 𝑚, 𝜔 are real and the
operators 𝑥, 𝑝 are Hermitian. These definitions may be inverted to get the position
and momentum operators in terms of the ladder operators:
$$x = \sqrt{\frac{1}{2m\omega}} \left( a^\dagger + a \right), \qquad p = i \sqrt{\frac{m\omega}{2}} \left( a^\dagger - a \right). \tag{5.109}$$
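Let us calculate the product $\omega a^\dagger a$:
$$\begin{aligned}
\omega a^\dagger a &= \omega \sqrt{\frac{m\omega}{2}} \left( x - \frac{i}{m\omega} p \right) \cdot \sqrt{\frac{m\omega}{2}} \left( x + \frac{i}{m\omega} p \right) \\
&= \frac{1}{2} m \omega^2 \left( x - \frac{i}{m\omega} p \right) \left( x + \frac{i}{m\omega} p \right) \\
&= \frac{1}{2} m \omega^2 \left( x^2 + \frac{i}{m\omega} x p - \frac{i}{m\omega} p x - \left( \frac{i}{m\omega} p \right)^2 \right) \\
&= \frac{1}{2} m \omega^2 \left( \frac{p^2}{m^2 \omega^2} + x^2 + \frac{i}{m\omega} [x, p] \right) \\
&= \frac{p^2}{2m} + \frac{1}{2} m \omega^2 x^2 + \frac{1}{2} i \omega [x, p].
\end{aligned}$$
Recall that in the classical theory we have {𝑥, 𝑝} = 1, so in the quantum theory we
have [𝑥, 𝑝] = i. Therefore:
$$\omega a^\dagger a = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 x^2 - \frac{1}{2} \omega. \tag{5.110}$$
Comparing this to the Hamiltonian operator (5.94):
$$H = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 x^2, \tag{5.111}$$
we see that we can write
$$H = \omega \left( a^\dagger a + \frac{1}{2} \right). \tag{5.112}$$
Finally, we define a new operator called the number operator:
$$N = a^\dagger a. \tag{5.113}$$
In terms of the number operator, the Hamiltonian becomes
$$H = \omega \left( N + \frac{1}{2} \right). \tag{5.114}$$
The Hamiltonian has been simplified considerably! Since both 𝜔 and 1/2 are just
numbers, the problem of finding the eigenvalues and eigenstates of 𝐻 now
reduces to finding the eigenvalues and eigenstates of 𝑁.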
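Problem 5.15.
A. Show that 𝑁 is Hermitian.
B. Show that if |𝑛⟩ is an eigenstate of 𝑁 with the eigenvalue 𝑛, that is,
$$N |n\rangle = n |n\rangle,$$
then |𝑛⟩ is also an eigenstate of 𝐻, with the eigenvalue 𝜔(𝑛 + 1/2).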
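Let |𝑛⟩ be a normalized eigenstate of 𝑁 with the eigenvalue 𝑛:
$$N |n\rangle = n |n\rangle. \tag{5.117}$$
Since 𝑁 is Hermitian, we know that 𝑛 must be a real number. Let us calculate the
expectation value of the observable 𝑁 with respect to the eigenstate |𝑛⟩:
$$\langle N \rangle_n = \langle n | N | n \rangle = \langle n | a^\dagger a | n \rangle = \left\| a |n\rangle \right\|^2, \tag{5.118}$$
where we used the fact that ⟨𝑛| 𝑎† is the bra of 𝑎 |𝑛⟩. On the other hand, we have
$$\langle N \rangle_n = \langle n | N | n \rangle = n \langle n | n \rangle = n, \tag{5.119}$$
where we used equation (5.117) and the fact that the state |𝑛⟩ is normalized to
1. By comparing the two equations, we see that
$$n = \left\| a |n\rangle \right\|^2 \ge 0, \tag{5.120}$$
so 𝑛 must be non-negative.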
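The number operator satisfies the following commutation relations with the
ladder operators:
$$N a - a N = [N, a] = -a, \tag{5.121}$$
$$N a^\dagger - a^\dagger N = [N, a^\dagger] = a^\dagger, \tag{5.122}$$
so we have
$$N a = a N - a = a (N - 1), \qquad N a^\dagger = a^\dagger N + a^\dagger = a^\dagger (N + 1), \tag{5.123}$$
and thus
$$N a |n\rangle = a (N - 1) |n\rangle = (n - 1) \, a |n\rangle, \tag{5.124}$$
$$N a^\dagger |n\rangle = a^\dagger (N + 1) |n\rangle = (n + 1) \, a^\dagger |n\rangle,$$
where we used equation (5.117) and the fact that since 𝑛 ± 1 is a number, it
commutes with operators and can be moved to the left. Writing this result in a
different way, we see that 𝑎 |𝑛⟩ is an eigenstate of 𝑁 with the eigenvalue 𝑛 − 1, and
𝑎† |𝑛⟩ is an eigenstate of 𝑁 with the eigenvalue 𝑛 + 1. However, these states are not
necessarily normalized. For 𝑎 |𝑛⟩, we already found in (5.118) and (5.119) that
$$\left\| a |n\rangle \right\|^2 = \langle n | a^\dagger a | n \rangle = \langle n | N | n \rangle = n. \tag{5.128}$$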
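For 𝑎† |𝑛⟩, we use the fact that the ladder operators satisfy [𝑎, 𝑎†] = 𝑎𝑎† − 𝑎†𝑎 = 1,
and thus
$$a a^\dagger = a^\dagger a + 1 = N + 1. \tag{5.130}$$
We therefore get
$$\left\| a^\dagger |n\rangle \right\|^2 = \langle n | a a^\dagger | n \rangle = \langle n | (N + 1) | n \rangle = \langle n | N | n \rangle + \langle n | n \rangle = n + 1. \tag{5.131}$$
The normalized eigenstates are now obtained, as usual, by dividing by the norm:
$$|n - 1\rangle = \frac{1}{\sqrt{n}} \, a |n\rangle, \qquad |n + 1\rangle = \frac{1}{\sqrt{n + 1}} \, a^\dagger |n\rangle. \tag{5.133}$$
Another way to write this, from a different point of view, is as the action of the
operators 𝑎 and 𝑎† on the state |𝑛⟩:
$$a |n\rangle = \sqrt{n} \, |n - 1\rangle, \qquad a^\dagger |n\rangle = \sqrt{n + 1} \, |n + 1\rangle. \tag{5.134}$$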
We see that 𝑎 reduces the eigenvalue of 𝑁 by 1, and therefore reduces the energy
by 𝜔, while 𝑎† increases the eigenvalue of 𝑁 by 1, and therefore increases the
energy by 𝜔. In other words, 𝑎† gets us to the state of next higher energy
(it “creates one quantum of energy”) while 𝑎 gets us to the state of next lower
energy (it “annihilates one quantum of energy”). This is the reason we called 𝑎†
the creation operator and 𝑎 the annihilation operator. We call them the ladder
operators because they let us “climb the ladder” of energy eigenstates.
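The action of the ladder operators in (5.134) can be written down as explicit matrices in the basis of number eigenstates. The sketch below is an illustration, not part of the original notes, and it rests on one important assumption: the infinite ladder is cut off after the first `d` states |0⟩, …, |𝑑−1⟩ so that the matrices fit in memory. As a consequence, relations such as [𝑎, 𝑎†] = 1 hold only below the cutoff, and the last diagonal entry of the commutator is an artifact of the truncation. The value of `omega` is made up for the example.

```python
import math

def annihilation(d):
    """Matrix of a in the basis |0>, ..., |d-1>, using a|n> = sqrt(n)|n-1> from (5.134)."""
    return [[math.sqrt(j) if i == j - 1 else 0.0 for j in range(d)] for i in range(d)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

d = 6            # truncation size (an assumption of this sketch)
omega = 2.0      # hypothetical angular frequency

a = annihilation(d)
a_dag = transpose(a)                 # the entries are real, so adjoint = transpose
N = mat_mul(a_dag, a)                # number operator, N = a^dagger a

number_eigenvalues = [N[n][n] for n in range(d)]
energies = [omega * (N[n][n] + 0.5) for n in range(d)]   # E_n = omega (n + 1/2)
print(number_eigenvalues)            # approximately 0, 1, ..., d-1
print(energies)                      # equally spaced, in steps of omega

aa_dag = mat_mul(a, a_dag)
commutator = [[aa_dag[i][j] - N[i][j] for j in range(d)] for i in range(d)]
print([commutator[n][n] for n in range(d)])   # approximately 1 except the last entry
```

The first column of `a` is zero, which is the matrix form of the statement that acting with 𝑎 on |0⟩ gives zero.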
Going back to the definition of the Hamiltonian in terms of the number operator,
we see that
$$H |n\rangle = \omega \left( n + \frac{1}{2} \right) |n\rangle, \tag{5.135}$$
and thus, as you proved in problem 5.15, |𝑛⟩ is an energy eigenstate with
eigenvalue
$$E_n \equiv \omega \left( n + \frac{1}{2} \right). \tag{5.136}$$
In particular, since we showed above that 𝑛 must be non-negative, and since we
now also see that it has to be an integer (as it can only be increased or decreased
by 1!), the possible eigenstates are found to be
$$|n\rangle, \qquad n \in \{0, 1, 2, 3, \ldots\}. \tag{5.137}$$
We found that the energy of the quantum harmonic oscillator is discrete, or
quantized, and the system can only have energy which differs from 𝜔/2 by equal
steps of 𝜔. The state of lowest energy, also called the ground state, is |0⟩. It has the
energy eigenvalue
$$E_0 = \frac{1}{2} \omega. \tag{5.138}$$
If we act on the ground state with the annihilation operator, we get
$$a |0\rangle = 0, \tag{5.139}$$
which is not a state, because it has norm 0 and cannot be normalized. This means
that we cannot generate states with energy lower than that of the ground state.
If we act on |0⟩ with the creation operator, we get
$$a^\dagger |0\rangle = |1\rangle.$$
We say that 𝑎†, which takes us from |0⟩ to |1⟩, excites the harmonic oscillator from
the ground state to the first excited state, which has exactly one “quantum” of
energy. The state |𝑛⟩ has exactly 𝑛 quanta, while the ground state |0⟩ has no
quanta.
As we mentioned above, the quantum harmonic oscillator may be used to de
scribe many different physical systems. In quantum field theory, the operator 𝑁
corresponds to the number of particles excited from the field. So |0⟩ is the vacuum
state, or a state with no particles65 ; |1⟩ is a state where one particle has been ex
cited from the field (e.g. one photon has been excited from the electromagnetic
field); |2⟩ is a state with two particles; and so on.
Problem 5.17. Find ⟨𝑉 ⟩ for the harmonic oscillator given that the system is in
the energy eigenstate |𝑛⟩. How is the potential energy related to the total energy?
[65] Notice that the vacuum state, despite having no particles, still has nonzero energy 𝜔/2! This
is called zero-point energy, and it is simply the energy of the field itself.
5.5 Wavefunctions, Position, and Momentum
$$\langle B_i | B_j \rangle = \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases} \tag{5.143}$$
The Kronecker delta 𝛿𝑖𝑗 has the property that, when evaluated inside a sum over
an index 𝑖, it “chooses” the term in the sum with index 𝑗:
∑ 𝑓𝑖 𝛿𝑖𝑗 = 𝑓𝑗 , (5.144)
where 𝑓𝑖 represents the terms being summed over. You don’t actually need to
evaluate the sum, since all of the terms with 𝑖 ≠ 𝑗 vanish, and you are left with
just one term, the one with 𝑖 = 𝑗.
The infinite-dimensional version of this is that for two basis states |𝑥⟩ and |𝑥′ ⟩,
where 𝑥, 𝑥′ ∈ ℝ, we have
⟨𝑥|𝑥′ ⟩ = 𝛿 (𝑥 − 𝑥′ ) , (5.145)
where 𝛿 (𝑥 − 𝑥′ ) is the Dirac delta function. This function is zero everywhere except
at 𝑥 = 𝑥′ , where it diverges. More precisely, the Dirac delta isn’t actually a
function; it is a distribution, which basically means it is only well-defined
when used inside an integral. For any function 𝑓, the Dirac delta satisfies the relation
∫ 𝑓 (𝑥) 𝛿 (𝑥 − 𝑥′ ) d𝑥 = 𝑓 (𝑥′ ) . (5.146)
In other words, when evaluated inside an integral over a variable 𝑥, the delta
function 𝛿 (𝑥 − 𝑥′ ) simply “chooses” the value of the integrand for which 𝑥 = 𝑥′ . This
is simply a generalization of the property of the Kronecker delta in equation (5.144).
You don’t need to evaluate the integral, since all of the contributions with 𝑥 ≠ 𝑥′
vanish, and you are left with just one contribution, the one with 𝑥 = 𝑥′ .
[66] Our Hilbert space is now infinite-dimensional, and a rigorous discussion of such a space requires
dealing with many mathematical subtleties, but we will mostly ignore them in this course due to
lack of time.
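The “choosing” property in equation (5.146) can be illustrated numerically by replacing the delta with a very narrow normalized Gaussian. This is a sketch under stated assumptions, not a rigorous treatment of distributions: the Gaussian stand-in, the trapezoidal integrator, and the names narrow_gaussian, integrate and sift are all ours. As the width shrinks, the integral should approach 𝑓 (𝑥′ ).

```python
import math


def narrow_gaussian(x, width):
    """Normalized Gaussian of standard deviation `width`; a stand-in for delta(x)."""
    return math.exp(-x * x / (2.0 * width * width)) / (width * math.sqrt(2.0 * math.pi))


def integrate(func, lower, upper, steps=20_000):
    """Composite trapezoidal rule on [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * h)
    return total * h


def sift(f, x_prime, width):
    """Approximate the integral of f(x) * delta(x - x_prime) over the real line."""
    # Outside +/- 10 widths the Gaussian is negligible, so we truncate there.
    return integrate(lambda x: f(x) * narrow_gaussian(x - x_prime, width),
                     x_prime - 10.0 * width, x_prime + 10.0 * width)


for width in (0.5, 0.1, 0.01):
    print(f"width = {width}: integral = {sift(math.cos, 1.0, width):.6f}, "
          f"f(x') = {math.cos(1.0):.6f}")
```

For a finite width the result differs slightly from 𝑓 (𝑥′ ), because the Gaussian averages 𝑓 over a small neighborhood; the agreement improves as the width decreases.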
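Problem 5.18. Prove the following properties of the Dirac delta function:
A. $$\int_{-\infty}^{+\infty} f(x)\, \delta(x)\, \mathrm{d}x = f(0). \tag{5.147}$$
B. $$\int_{-\infty}^{+\infty} \delta(x)\, \mathrm{d}x = 1. \tag{5.148}$$
C. $$\delta(x) = \delta(-x). \tag{5.149}$$
D. $$\delta(\lambda x) = \frac{1}{|\lambda|}\, \delta(x), \qquad \lambda \in \mathbb{R},\ \lambda \neq 0. \tag{5.150}$$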
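Problem 5.19. Let us define the Heaviside step function:
$$\Theta(x) \equiv \begin{cases} 0 & x < 0, \\ \tfrac{1}{2} & x = 0, \\ 1 & x > 0. \end{cases} \tag{5.151}$$
Prove that
$$\frac{\mathrm{d}}{\mathrm{d}x} \Theta(x) = \delta(x), \tag{5.152}$$
where 𝛿 (𝑥) is the Dirac delta function.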
Since |𝑥⟩ is an orthonormal eigenbasis, we should be able to write down any state
|Ψ⟩ as a linear combination – or superposition – of the basis eigenstates. Let us
recall that in the finite-dimensional case, with a finite basis |𝐵𝑖 ⟩, we have
|Ψ⟩ = ∑ |𝐵𝑖 ⟩⟨𝐵𝑖 |Ψ⟩. (5.153)
In section 3.2.7 we said that ⟨𝐵𝑖 |Ψ⟩ – the probability amplitudes – are the
coordinates of the representation of the vector |Ψ⟩ with respect to the basis |𝐵𝑖 ⟩, and
they can be collected into an 𝑛-dimensional vector:
$$|\Psi\rangle \Big|_{B} \equiv \begin{pmatrix} \langle B_1 | \Psi \rangle \\ \vdots \\ \langle B_n | \Psi \rangle \end{pmatrix}. \tag{5.154}$$
In the infinite-dimensional case, we simply replace the sum with an integral (and
optionally add time dependence, since we now have a continuous time coordinate):
|Ψ (𝑡)⟩ = ∫ |𝑥⟩ ⟨𝑥 | Ψ (𝑡)⟩ d𝑥. (5.155)
In this case, ⟨𝑥|Ψ (𝑡)⟩ are the coordinates of the representation of the vector |Ψ (𝑡)⟩
with respect to the basis |𝑥⟩. Since there is one coordinate for each real number
𝑥, we cannot collect them into a vector; instead, we define a function:
$$\psi(t, x) \equiv \langle x | \Psi(t) \rangle. \tag{5.156}$$
The complex-valued function 𝜓 (𝑡, 𝑥), which returns the probability amplitude to
measure the particle at position 𝑥 at time 𝑡, is called the wavefunction.
Given a wavefunction 𝜓 (𝑡, 𝑥), the probability density to find the particle at position
𝑥 at time 𝑡 is given by the magnitude squared of the probability amplitude:
$$|\psi(t, x)|^2 = |\langle x | \Psi(t) \rangle|^2. \tag{5.157}$$
The reason this is a probability density, and not a probability, is that continuous
probability distributions behave a bit differently than discrete ones. The
probability to find the particle somewhere in the real interval [𝑎, 𝑏] ⊂ ℝ at time 𝑡
is given by the integral
$$\int_a^b |\psi(t, x)|^2\, \mathrm{d}x. \tag{5.158}$$
If 𝑎 = 𝑏, then the integral evaluates to zero. This means that the probability to
find a particle at any one specific point 𝑥 is actually zero! A set containing just
one point, or even a countable number of discrete points, is a set of Lebesgue
measure zero, which means it has no length. It only makes sense to talk about
finding a particle inside an interval such as [𝑎, 𝑏] with 𝑎 ≠ 𝑏, which has nonzero
Lebesgue measure and thus nonzero length.
Also, instead of the probabilities summing to 1, we must demand that the integral
of the probability densities over the entire real line evaluates to 1:
$$\int_{-\infty}^{+\infty} |\psi(t, x)|^2\, \mathrm{d}x = 1. \tag{5.159}$$
This makes sense, because there is 100% probability to find the particle
somewhere on the real line, that is, inside the interval (−∞, +∞).
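The distinction between a probability density and a probability can be made concrete with a short numerical sketch. The Gaussian wavefunction below is a hypothetical example chosen by us (it does not appear in the text), taken at a fixed time, and SIGMA, psi and probability are illustrative names. The code integrates |𝜓|² over an interval with the trapezoidal rule.

```python
import math

SIGMA = 1.0  # width of the illustrative wave packet (an assumption)


def psi(x):
    """Hypothetical Gaussian wavefunction at a fixed time, normalized on the real line."""
    return (math.pi * SIGMA ** 2) ** -0.25 * math.exp(-x * x / (2.0 * SIGMA ** 2))


def probability(lower, upper, steps=20_000):
    """Trapezoidal estimate of the integral of |psi|^2 over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (abs(psi(lower)) ** 2 + abs(psi(upper)) ** 2)
    for k in range(1, steps):
        total += abs(psi(lower + k * h)) ** 2
    return total * h


print("P(0 <= x <= 1)     :", probability(0.0, 1.0))
print("P(x = 0.5 exactly) :", probability(0.5, 0.5))     # an interval of zero length
print("P(-10 <= x <= 10)  :", probability(-10.0, 10.0))  # effectively the whole line
```

For this particular Gaussian the interval probability can also be written in closed form as (erf(𝑏/σ) − erf(𝑎/σ))/2, which gives an independent check on the numerical integral.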
Using the wavefunction 𝜓 (𝑡, 𝑥) = ⟨𝑥 | Ψ (𝑡)⟩, we can rewrite equation (5.155) as
|Ψ (𝑡)⟩ = ∫ 𝜓 (𝑡, 𝑥) |𝑥⟩ d𝑥. (5.160)
If we are given a state |Ψ (𝑡)⟩, we can use equation (5.156) to convert it to a
wavefunction, and conversely, if we are given a wavefunction 𝜓 (𝑡, 𝑥), we can use
equation (5.160) to convert it to a state. This is, of course, a consequence of
the wavefunction being a representation of the state in a specific basis. For this
reason, you will sometimes hear the term “wavefunction” used as a synonym for
“state”; for systems where a wavefunction description exists, such as a quantized
particle, these two descriptions are equivalent.
However, it should be noted that wavefunctions are not fundamental entities in
modern quantum theory. The fundamental entities are the states, since any
quantum system has states, but only some systems have wavefunctions. For example,
there is no wavefunction for a qubit, since there are no continuous variables with
respect to which the wavefunction can be defined [67]. Even for systems that do
have wavefunctions, the description using states is more general, since a state is
independent of a basis, while a wavefunction is only defined in a particular basis.
Next, recall the completeness relation (3.81):
∑ |𝐵𝑖 ⟩⟨𝐵𝑖 | = 1. (5.161)
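In the infinite-dimensional case, the sum over the basis becomes an integral over
the position eigenstates, ∫ |𝑥⟩ ⟨𝑥| d𝑥 = 1. This relation allows us to define an
explicit inner product between states on our infinite-dimensional Hilbert space
as follows:
$$\begin{aligned}
\langle \Psi(t) | \Phi(t') \rangle &= \langle \Psi(t) | \left( \int |x\rangle \langle x|\, \mathrm{d}x \right) | \Phi(t') \rangle \\
&= \int \langle \Psi(t) | x \rangle \langle x | \Phi(t') \rangle\, \mathrm{d}x \\
&= \int \psi^*(t, x)\, \phi(t', x)\, \mathrm{d}x,
\end{aligned}$$
[67] This is why, in the discussion of the Measurement Axiom, I used the term “collapse” rather than
the more popular “wavefunction collapse”. Qubits also collapse, but they do not have wavefunctions.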
where 𝜓∗ (𝑡, 𝑥) ≡ ⟨Ψ (𝑡) |𝑥⟩ is the complex conjugate of the wavefunction for |Ψ (𝑡)⟩
defined in equation (5.156) (since as usual, switching the order of states in the
inner product turns it into its complex conjugate), and 𝜙 (𝑡′ , 𝑥) ≡ ⟨𝑥|Φ (𝑡′ )⟩ is the
wavefunction for the state |Φ (𝑡′ )⟩.
This is really nothing more than the familiar inner product we defined all the way
back in section 3.2.2, except instead of summing on the components of a vector,
we are integrating on the values of a function! The vector in the discrete case
was the representation of the state in a particular basis (such as the standard
basis), while the function in the continuous case is also the representation of the
state in a particular basis, in this case the position basis.
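To make the analogy with the finite-dimensional inner product concrete, the integral ∫ 𝜓∗ 𝜙 d𝑥 can be approximated by a sum over grid points. The sketch below is illustrative only: the two test functions are the standard textbook ground and first excited wavefunctions of the harmonic oscillator in units where 𝑚 = 𝜔 = 1 (an assumption, since the text does not write them out), and the names inner_product, psi0, psi1 and superposition are ours.

```python
import math


def inner_product(psi, phi, lower=-12.0, upper=12.0, steps=24_000):
    """Trapezoidal estimate of the integral of conj(psi(x)) * phi(x) dx."""
    h = (upper - lower) / steps

    def integrand(x):
        return complex(psi(x)).conjugate() * phi(x)

    total = 0.5 * (integrand(lower) + integrand(upper))
    for k in range(1, steps):
        total += integrand(lower + k * h)
    return total * h


def psi0(x):
    """Oscillator ground state for m = omega = 1 (assumed textbook form)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)


def psi1(x):
    """Oscillator first excited state for m = omega = 1 (assumed textbook form)."""
    return math.sqrt(2.0) * math.pi ** -0.25 * x * math.exp(-0.5 * x * x)


def superposition(x):
    """A complex combination, used to exhibit conjugate symmetry."""
    return (psi0(x) + 1j * psi1(x)) / math.sqrt(2.0)


print("<0|0>   =", inner_product(psi0, psi0))           # norm squared, close to 1
print("<0|1>   =", inner_product(psi0, psi1))           # orthogonal, close to 0
print("<1|sup> =", inner_product(psi1, superposition))  # close to  i / sqrt(2)
print("<sup|1> =", inner_product(superposition, psi1))  # close to -i / sqrt(2)
```

The last two lines show that swapping the order of the states conjugates the result, exactly as for the finite-dimensional inner product.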
Now we can see that the normalization condition in equation (5.159) simply says
that the norm of a state has to be 1, as usual:
$$\| \Psi(t) \| \equiv \sqrt{\langle \Psi(t) | \Psi(t) \rangle} = \sqrt{\int |\psi(t, x)|^2\, \mathrm{d}x} = 1. \tag{5.164}$$
Problem 5.20. The expectation value of the position, given that the state of the
system is |Ψ (𝑡)⟩, is defined as usual by
$$\langle x \rangle \equiv \langle \Psi(t) | \hat{x} | \Psi(t) \rangle. \tag{5.165}$$
By inserting the completeness relation (5.163), show that, in terms of the
wavefunction 𝜓 (𝑡, 𝑥), the expectation value of 𝑥 is
$$\langle x \rangle = \int x\, |\psi(t, x)|^2\, \mathrm{d}x. \tag{5.166}$$
As a corollary, show that
Find a value of 𝐴 for which the wavefunction is properly normalized, that is,
equation (5.159) is satisfied. Then, calculate the expectation value ⟨𝑥⟩ for this
wavefunction.
Everything that we discussed in the previous two sections also applies to the
momentum operator and its eigenstates – simply replace 𝑥 with 𝑝. This also
includes the wavefunction, which can be represented in the momentum basis as
Now, let us recall that in section 5.2 we found out that the unitary operator
responsible for shifts in time can be written as the exponential of the Hamiltonian.
This can be written in slightly different notation
From this relation, we derived the Schrödinger equation (5.52), which tells us
that the Hamiltonian – the Hermitian operator corresponding to energy – acts on
states as a time derivative:
$$H |\Psi(t)\rangle = \mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} |\Psi(t)\rangle. \tag{5.173}$$
Since the energy is just the momentum in the time direction, we expect, in
analogy, that the momentum operator will act on states as a derivative with respect to
position, and that its exponential will translate states in space. However, here
we encounter a complication: in nonrelativistic quantum mechanics, time is
considered to be just a label on the states |Ψ (𝑡)⟩, while position is the eigenvalue
of the position operator [68]. Due to this complication, we won’t give the derivation
here, but simply state the result:
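$$\langle x | \hat{p} | \Psi(t) \rangle = -\mathrm{i} \frac{\partial}{\partial x} \langle x | \Psi(t) \rangle = -\mathrm{i} \frac{\partial}{\partial x} \psi(t, x). \tag{5.174}$$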
This means that the representation of the momentum operator in the position
basis is given by the derivative with respect to position (times − i, which is a
convention). This result will be very useful in section 5.6, when we discuss solutions
to the Schrödinger equation. Equation (5.174) is often written simply as
$$\hat{p} = -\mathrm{i} \frac{\partial}{\partial x}, \tag{5.175}$$
but this is actually incorrect (or at the very least, a serious abuse of notation), since
the momentum operator is an abstract operator, and only becomes a derivative
when represented in the position basis!
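By exponentiating the momentum operator, we get the translation operator $e^{-\mathrm{i} \hat{p} a}$,
a unitary operator (as it has to be, since it must preserve norms) which translates
position eigenstates a distance 𝑎 in space:
$$e^{-\mathrm{i} \hat{p} a} |x\rangle = |x + a\rangle. \tag{5.176}$$
By taking the adjoint of this expression and acting on a state |Ψ (𝑡)⟩, we get
$$\langle x | e^{\mathrm{i} \hat{p} a} | \Psi(t) \rangle = \langle x + a | \Psi(t) \rangle = \psi(t, x + a). \tag{5.177}$$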
Therefore, the translation operator translates not only position eigenstates but
also wavefunctions.
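One way to see why the exponential of the momentum operator shifts the argument of the wavefunction is to expand it as a power series and use the position representation (5.174) for each power of the momentum operator. The following worked equation is a sketch, assuming that 𝜓 (𝑡, 𝑥) equals its Taylor series in 𝑥 and that the translation operator uses the sign convention in which $e^{-\mathrm{i} \hat{p} a}$ shifts |𝑥⟩ to |𝑥 + 𝑎⟩:

```latex
\begin{align*}
\langle x | e^{\mathrm{i} \hat{p} a} | \Psi(t) \rangle
  &= \sum_{n=0}^{\infty} \frac{(\mathrm{i} a)^n}{n!} \langle x | \hat{p}^n | \Psi(t) \rangle \\
  &= \sum_{n=0}^{\infty} \frac{(\mathrm{i} a)^n}{n!}
     \left( -\mathrm{i} \frac{\partial}{\partial x} \right)^{n} \psi(t, x) \\
  &= \sum_{n=0}^{\infty} \frac{a^n}{n!} \frac{\partial^n \psi}{\partial x^n}(t, x) \\
  &= \psi(t, x + a).
\end{align*}
```

The factors of i cancel in every term because i · (− i) = 1, and the last step is simply the Taylor expansion of 𝜓 around 𝑥.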
Problem 5.23. Calculate the expectation value of the momentum, ⟨𝑝⟩, given that
the state of the system is |Ψ (𝑡)⟩, in terms of the wavefunction 𝜓 (𝑡, 𝑥).
Let us consider the double-slit experiment, which we discussed all the way back
in section 2.1.3. Schematically, the particle’s state can be described as a
superposition of passing through slit 𝐴 and passing through slit 𝐵:
$$|\Psi\rangle = a |\Psi_A\rangle + b |\Psi_B\rangle, \qquad |a|^2 + |b|^2 = 1. \tag{5.178}$$
[68] This is, in fact, a big problem when trying to combine quantum mechanics with special relativity,
since relativity merges space and time into a 4-dimensional spacetime, and this means space and
time must be treated on equal footing. However, we won’t go into that here. See also footnote (58).
We suppress the time dependence here, for brevity. The probability amplitude to
measure the particle at the position 𝑥 is given by
$$\psi(x) \equiv \langle x | \Psi \rangle = a\, \psi_A(x) + b\, \psi_B(x). \tag{5.179}$$
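The probability density is then, as usual, the magnitude squared of the amplitude:
$$\begin{aligned}
|\psi(x)|^2 &= |a \psi_A(x) + b \psi_B(x)|^2 \\
&= \left( a^* \psi_A^*(x) + b^* \psi_B^*(x) \right) \left( a \psi_A(x) + b \psi_B(x) \right) \\
&= a^* a\, \psi_A^*(x) \psi_A(x) + b^* b\, \psi_B^*(x) \psi_B(x) + a^* b\, \psi_A^*(x) \psi_B(x) + b^* a\, \psi_B^*(x) \psi_A(x) \\
&= |a|^2 |\psi_A(x)|^2 + |b|^2 |\psi_B(x)|^2 + 2 \operatorname{Re} \left( a^* b\, \psi_A^*(x) \psi_B(x) \right).
\end{aligned}$$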
The terms |𝑎|² |𝜓𝐴 (𝑥)|² and |𝑏|² |𝜓𝐵 (𝑥)|² are never negative, for any 𝑥. However,
the third term 2 Re (𝑎∗ 𝑏 𝜓𝐴∗ (𝑥) 𝜓𝐵 (𝑥)), called the interference term or sometimes
the cross term (because it “crosses” 𝜓𝐴 and 𝜓𝐵 ), is a real number which can be
either positive or negative, depending on the specific values of 𝑎 and 𝑏, as well as
the specific position 𝑥 at which 𝜓𝐴 (𝑥) and 𝜓𝐵 (𝑥) are evaluated.
The interference term will either increase or decrease the probability to find the
particle at 𝑥. If it increases the probability, this is constructive interference, and
if it decreases the probability, this is destructive interference. This is precisely
what is responsible for the interference pattern in the double-slit experiment,
illustrated in figure 2.5; for different values of 𝑥, there will be different amounts
of constructive and destructive interference.
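The decomposition of the probability density into two direct terms and an interference term can be checked with a toy model. In the sketch below, the two slit amplitudes are modeled as plane waves with opposite transverse wave numbers; this is purely an assumption made for illustration (such waves are not normalizable, and the text does not specify 𝜓𝐴 or 𝜓𝐵 ), and the names K, psi_a, psi_b, density_terms and density are ours.

```python
import cmath
import math

K = 5.0  # hypothetical transverse wave number


def psi_a(x):
    """Toy amplitude for the path through slit A (assumed plane wave)."""
    return cmath.exp(1j * K * x)


def psi_b(x):
    """Toy amplitude for the path through slit B (assumed plane wave)."""
    return cmath.exp(-1j * K * x)


def density_terms(a, b, x):
    """Return the two direct terms and the interference term of |psi(x)|^2."""
    direct_a = abs(a) ** 2 * abs(psi_a(x)) ** 2
    direct_b = abs(b) ** 2 * abs(psi_b(x)) ** 2
    interference = 2.0 * (a.conjugate() * b * psi_a(x).conjugate() * psi_b(x)).real
    return direct_a, direct_b, interference


def density(a, b, x):
    """Probability density computed directly from the total amplitude."""
    return abs(a * psi_a(x) + b * psi_b(x)) ** 2


A = complex(1.0 / math.sqrt(2.0))
B = complex(1.0 / math.sqrt(2.0))

for x in (0.0, math.pi / (4.0 * K), math.pi / (2.0 * K)):
    terms = density_terms(A, B, x)
    print(f"x = {x:.4f}: interference = {terms[2]:+.4f}, "
          f"sum of terms = {sum(terms):.4f}, direct = {density(A, B, x):.4f}")
```

In this toy model the interference term is positive at 𝑥 = 0 (constructive) and negative at 𝑥 = π/(2𝐾) (destructive), while the two direct terms stay fixed at 1/2 each.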
$$\mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} |\Psi(t)\rangle = H |\Psi(t)\rangle. \tag{5.180}$$
For a particle, we have the Hamiltonian (5.69):
$$H = \frac{p^2}{2m} + V(x). \tag{5.181}$$
Therefore, the Schrödinger equation becomes
$$\mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} |\Psi(t)\rangle = \left( \frac{\hat{p}^2}{2m} + V(\hat{x}) \right) |\Psi(t)\rangle, \tag{5.182}$$
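where we promoted the position and momentum to operators. To find the
representation of this equation in the position basis, we multiply by ⟨𝑥| from the left:
$$\langle x | \mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} |\Psi(t)\rangle = \langle x | \left( \frac{\hat{p}^2}{2m} + V(\hat{x}) \right) |\Psi(t)\rangle. \tag{5.183}$$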
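On the left-hand side, since the position eigenstate |𝑥⟩ is independent of time, we
can move the time derivative out of the inner product:
$$\langle x | \mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} |\Psi(t)\rangle = \mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} \langle x | \Psi(t) \rangle = \mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} \psi(t, x). \tag{5.184}$$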
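On the right-hand side, since in the position representation we have
$$\hat{p} = -\mathrm{i} \frac{\partial}{\partial x}, \tag{5.185}$$
the first term will be
$$\begin{aligned}
\langle x | \frac{\hat{p}^2}{2m} |\Psi(t)\rangle &= \frac{1}{2m} \left( -\mathrm{i} \frac{\partial}{\partial x} \right)^2 \psi(t, x) \\
&= \frac{1}{2m} \left( -\mathrm{i} \frac{\partial}{\partial x} \right) \left( -\mathrm{i} \frac{\partial}{\partial x} \right) \psi(t, x) \\
&= -\frac{1}{2m} \frac{\partial^2}{\partial x^2} \psi(t, x).
\end{aligned}$$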
As for the second term, in problem 5.21 you showed that
$$\langle x | V(\hat{x}) | \Psi(t) \rangle = V(x)\, \psi(t, x).$$
In total, we get:
$$\mathrm{i} \frac{\mathrm{d}}{\mathrm{d}t} \psi(t, x) = \left( -\frac{1}{2m} \frac{\partial^2}{\partial x^2} + V(x) \right) \psi(t, x). \tag{5.187}$$
This is the Schrödinger equation in the position basis. It is a concrete differential
equation that one can solve for a variety of different potentials 𝑉 (𝑥).
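As a sanity check of equation (5.187), one can verify numerically that a known solution satisfies it. The sketch below uses finite differences and, as an assumption not taken from this section, the harmonic oscillator potential 𝑉 (𝑥) = 𝑚𝜔²𝑥²/2 together with its unnormalized ground-state wavefunction with energy 𝜔/2; the names MASS, OMEGA, potential, psi and residual are ours.

```python
import cmath

MASS = 1.0   # illustrative mass (assumption)
OMEGA = 1.0  # illustrative angular frequency (assumption)


def potential(x):
    """Assumed harmonic oscillator potential V(x) = m omega^2 x^2 / 2."""
    return 0.5 * MASS * OMEGA ** 2 * x * x


def psi(t, x):
    """Unnormalized ground-state solution with energy omega / 2."""
    return cmath.exp(-0.5j * OMEGA * t) * cmath.exp(-0.5 * MASS * OMEGA * x * x)


def residual(t, x, step=1e-3):
    """|i d(psi)/dt - (-(1/2m) d^2(psi)/dx^2 + V psi)| using central differences."""
    d_dt = (psi(t + step, x) - psi(t - step, x)) / (2.0 * step)
    d2_dx2 = (psi(t, x + step) - 2.0 * psi(t, x) + psi(t, x - step)) / step ** 2
    lhs = 1j * d_dt
    rhs = -d2_dx2 / (2.0 * MASS) + potential(x) * psi(t, x)
    return abs(lhs - rhs)


for t, x in ((0.0, 0.0), (0.7, 1.3), (2.0, -0.5)):
    print(f"t = {t}, x = {x}: residual = {residual(t, x):.2e}")
```

The residuals are not exactly zero because the derivatives are only approximated; they shrink as the step size is reduced, until floating-point rounding takes over.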
and in problem 5.23, you calculated ⟨𝑝⟩. Using equation (5.187), show that
$$\langle p \rangle = m \frac{\mathrm{d} \langle x \rangle}{\mathrm{d}t}. \tag{5.189}$$
You will have to use integration by parts, and assume [69] that 𝜓 (𝑡, 𝑥) → 0 as 𝑥 → ±∞.
[69] This is pretty much always assumed to be true about wavefunctions in quantum mechanics.
This shows that the expectation values of the position and momentum in the
quantum theory satisfy the same relation as the position and momentum in the
classical theory. Similarly, show that
$$\frac{\mathrm{d} \langle p \rangle}{\mathrm{d}t} = -\left\langle \frac{\partial V(x)}{\partial x} \right\rangle, \tag{5.190}$$
They don’t depend on 𝑡, since we are assuming the Hamiltonian doesn’t depend
on 𝑡 either, and energy is constant. Show that (for a point particle with mass 𝑚)
these wavefunctions satisfy the equation
$$\left( -\frac{1}{2m} \frac{\partial^2}{\partial x^2} + V(x) \right) \psi_i(x) = E_i\, \psi_i(x). \tag{5.193}$$
Let us assume that the wavefunction can be separated into a part which depends
only on 𝑥 and a part which depends only on 𝑡:
then the right-hand side would have to be a function of 𝑡 also, in contradiction with
our assumption that it only depends on 𝑥. This is called separation of variables.
Let 𝐸𝑖 be the constant that both sides are equal to. Then we get two equations.
The first equation will just be the eigenvalue equation (5.193), which therefore
implies that 𝐸𝑖 is the energy (and thus must be real). The other equation will be
$$\frac{\mathrm{d} \psi_t}{\mathrm{d}t} = -\mathrm{i} E_i\, \psi_t. \tag{5.196}$$
Recalling equation (5.9), we see that the solution to equation (5.196) is simply
$$\psi_t = e^{-\mathrm{i} E_i t}. \tag{5.197}$$
These are called stationary states. Since these states are energy eigenstates,
they have a well-defined energy 𝐸𝑖 .
As it turns out, since the Schrödinger equation is linear, the most general
solution to the equation is a linear combination of stationary states:
$$\psi(t, x) = \sum_i \alpha_i\, \psi_i(x)\, e^{-\mathrm{i} E_i t},$$
where 𝛼𝑖 ∈ ℂ are constant coefficients and 𝐸𝑖 are all the possible energy
eigenvalues, of which there can be infinitely many. Of course, this is nothing other
than a superposition of energy eigenstates, represented in the position basis,
and therefore the coefficients 𝛼𝑖 are none other than the probability amplitudes
to measure each energy 𝐸𝑖 given the state |Ψ (𝑡)⟩.
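A short numerical sketch may help to show how a superposition of stationary states evolves. It assumes, for illustration only, the two lowest harmonic oscillator eigenfunctions in units where 𝑚 = 𝜔 = 1 (standard textbook forms, not derived in the text), with energies 1/2 and 3/2 and equal coefficients; the names phi0, phi1, STATES, ENERGIES, ALPHAS, amplitudes, wavefunction and density are ours.

```python
import cmath
import math


def phi0(x):
    """Assumed ground-state eigenfunction (m = omega = 1)."""
    return math.pi ** -0.25 * math.exp(-0.5 * x * x)


def phi1(x):
    """Assumed first excited eigenfunction (m = omega = 1)."""
    return math.sqrt(2.0) * math.pi ** -0.25 * x * math.exp(-0.5 * x * x)


STATES = [phi0, phi1]
ENERGIES = [0.5, 1.5]  # E_i = omega (i + 1/2) with omega = 1
ALPHAS = [complex(1.0 / math.sqrt(2.0)), complex(1.0 / math.sqrt(2.0))]


def amplitudes(t):
    """Coefficients alpha_i * exp(-i E_i t) of each stationary state at time t."""
    return [alpha * cmath.exp(-1j * energy * t)
            for alpha, energy in zip(ALPHAS, ENERGIES)]


def wavefunction(t, x):
    """Linear combination of stationary states, evaluated at (t, x)."""
    return sum(amp * state(x) for amp, state in zip(amplitudes(t), STATES))


def density(t, x):
    return abs(wavefunction(t, x)) ** 2


for t in (0.0, math.pi / 2.0, math.pi):
    probs = [abs(amp) ** 2 for amp in amplitudes(t)]
    print(f"t = {t:.3f}: energy probabilities = {probs}, density at x = 1: {density(t, 1.0):.4f}")
```

The probabilities of measuring each energy do not change with time, since the time dependence of each coefficient is a pure phase. The position probability density does change, because the relative phase between the two terms oscillates with angular frequency 𝐸1 − 𝐸0 .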
In other words, the general solution to the Schrödinger equation simply amounts
to writing the state of the system as a superposition with respect to the eigenbasis
of a particular observable – the Hamiltonian. With the time dependence out of
the way, all that remains is to solve the time-independent Schrödinger equation
(5.193) for 𝜓𝑖 , and find the coefficients 𝛼𝑖 . The solution will depend on the explicit
form of the potential 𝑉 (𝑥). However, this is, of course, the hard part! Thousands
upon thousands of pages have been written in the last 100 years or so about
solutions (or even just approximations of solutions) to the Schrödinger equation
for all kinds of different potentials.
Unfortunately, our course has come to an end, and we won’t have time to work
out any specific solutions. The focus of this course has been on developing deep
intuition and conceptual understanding of quantum theory, as it is formulated in
modern 21st-century theoretical physics. For this reason, we spent the vast
majority of the course developing the entire mathematical framework of the theory
from scratch, highlighting and debunking common misconceptions, focusing on
concepts and their meaning rather than calculations, and giving examples from
discrete systems, where the math is simple, so we could concentrate our efforts
on understanding the physics without being bogged down by the math.
Still, solving the Schrödinger equation is something every physicist should know
how to do, and in the final project of the course, presented in problem 5.28, you
will find the solutions corresponding to two simple potentials, related to scattering
and tunneling of particles in one dimension.
Problem 5.26. Show that the probability density of a stationary state, as well
as the expectation value of any observable 𝐴 with respect to that state, are
independent of 𝑡.
What is the wavefunction 𝜓 (𝑡, 𝑥) at some other time 𝑡, and what is the
corresponding probability density?
Problem 5.28. (Final project) You should now have all the tools needed to
solve the Schrödinger equation for particular potentials. Solve it for the following
two simple potentials:
$$V(x) = \begin{cases} 0 & x < -a, \\ -V_0 & -a < x < a, \\ 0 & x > a. \end{cases} \tag{5.201}$$
$$V(x) = \begin{cases} 0 & x < -a, \\ +V_0 & -a < x < a, \\ 0 & x > a. \end{cases} \tag{5.202}$$
In both cases, 𝑎 and 𝑉0 are two positive numbers. Make nice plots of the potentials
and the wavefunctions. You are allowed, and even encouraged, to make use of
textbooks and online resources; however, you should write the solutions in your
own words and summarize what you learned from the results. You are also
encouraged to collaborate with classmates on this project.