Physics of Light and Optics PDF
Justin Peatross
Michael Ware
Brigham Young University
2011c Edition
March 2, 2012
Preface
This curriculum was originally developed for a senior-level optics course in the
Department of Physics and Astronomy at Brigham Young University. Topics
are addressed from a physics perspective and include the propagation of light in
matter, reflection and transmission at boundaries, polarization effects, dispersion,
coherence, ray optics and imaging, diffraction, and the quantum nature of light.
Students using this book should be familiar with differentiation, integration, and
standard trigonometric and algebraic manipulation. A brief review of complex
numbers, vector calculus, and Fourier transforms is provided in Chapter 0, but it
is helpful if students already have some experience with these concepts.
While the authors retain the copyright, we have made this book available free
of charge at optics.byu.edu. This is our contribution toward a future world with
free textbooks! The web site also provides a link to purchase bound copies of the
book for the cost of printing. A collection of electronic material related to the
text is available at the same site, including videos of students performing the lab
assignments found in the book.
The development of optics has a rich history. We have included historical
sketches for a selection of the pioneers in the field to help students appreciate
some of this historical context. These sketches are not intended to be authoritative,
the information for most individuals having been gleaned primarily from
Wikipedia.
The authors may be contacted at opticsbook@byu.edu. We enjoy hearing
reports from those using the book and welcome constructive feedback. We occasionally
revise the text. The title page indicates the date of the last revision.
We would like to thank all those who have helped improve this material. We
especially thank John Colton, Bret Hess, and Harold Stokes for their careful review
and extensive suggestions. This curriculum benefitted from a CCLI grant from
the National Science Foundation Division of Undergraduate Education (DUE-9952773).
Contents
Preface
Table of Contents
0 Mathematical Tools
0.1 Vector Calculus
0.2 Complex Numbers
0.3 Linear Algebra
0.4 Fourier Theory
Appendix 0.A Table of Integrals and Sums
Exercises
1 Electromagnetic Phenomena
1.1 Gauss’ Law
1.2 Gauss’ Law for Magnetic Fields
1.3 Faraday’s Law
1.4 Ampere’s Law
1.5 Maxwell’s Adjustment to Ampere’s Law
1.6 Polarization of Materials
1.7 The Wave Equation
Exercises
10 Diffraction
10.1 Huygens’ Principle as Formulated by Fresnel
10.2 Scalar Diffraction Theory
10.3 Fresnel Approximation
10.4 Fraunhofer Approximation
10.5 Diffraction with Cylindrical Symmetry
Appendix 10.A Fresnel-Kirchhoff Diffraction Formula
Appendix 10.B Green’s Theorem
Exercises
Index
Mathematical Tools
Our study of optics begins with Maxwell’s equations in Chapter 1. Before we start,
look over this chapter to make sure you are comfortable with the mathematical
tools we’ll be using. The vector calculus material in Section 0.1 will be used
beginning in Chapter 1, so you should review it now. In Section 0.2 we review
complex numbers. You have probably had some exposure to complex numbers,
but if you are like many students, you haven’t yet fully appreciated their usefulness.
Please be warned that your life will be much easier if you know the material
in Section 0.2 by heart. Complex notation is pervasive throughout the book,
beginning in Chapter 2.
You may safely procrastinate reviewing Sections 0.3 and 0.4 until they come
up in the book. The linear algebra refresher in Section 0.3 is useful for Chapter 4,
where we analyze multilayer coatings, and again in Chapter 6, where we discuss
polarization. Section 0.4 provides an introduction to Fourier theory. Fourier transforms
are used extensively in optics, and you should study Section 0.4 carefully
before tackling Chapter 7.

René Descartes (1596-1650, French) was born in La Haye en Touraine
(now Descartes), France. His mother died when he was an infant. His father
was a member of parliament who encouraged Descartes to become a lawyer.
Descartes graduated with a degree in law from the University of Poitiers
in 1616. In 1619, he had a series of dreams that led him to believe that he
should instead pursue science. Descartes became one of the greatest mathematicians,
physicists, and philosophers of all time. He is credited with inventing
the cartesian coordinate system, which

0.1 Vector Calculus
Each position in space corresponds to a unique vector r ≡ x x̂ + y ŷ + z ẑ, where
x̂, ŷ, and ẑ are unit vectors with length one, pointing along their respective axes.
Boldface type distinguishes a variable as a vector quantity, and the use of x̂, ŷ,
and ẑ denotes a Cartesian coordinate system. Electric and magnetic fields are
vectors whose magnitude and direction can depend on position, as denoted by
Example 0.1
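Compute the electric field at r = (2x̂ + 2ŷ + 2ẑ) Å due to a charge q positioned at
r0 = (1x̂ + 1ŷ + 2ẑ) Å.
We have
$$\mathbf{r}-\mathbf{r}_0 = \left[(2-1)\hat{\mathbf{x}} + (2-1)\hat{\mathbf{y}} + (2-2)\hat{\mathbf{z}}\right]\text{Å} = \left(1\hat{\mathbf{x}} + 1\hat{\mathbf{y}}\right)\text{Å}$$
and
$$\left|\mathbf{r}-\mathbf{r}_0\right| = \sqrt{(1)^2+(1)^2}\;\text{Å} = \sqrt{2}\;\text{Å}$$
The electric field is then
$$\mathbf{E} = \frac{q\left(1\hat{\mathbf{x}} + 1\hat{\mathbf{y}}\right)\text{Å}}{4\pi\epsilon_0\left(\sqrt{2}\;\text{Å}\right)^3}$$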
Consider the plane that contains the two vectors k and r. Call it the x′y′-plane. In
this coordinate system, the two vectors can be written as k = k cos θ x̂′ + k sin θ ŷ′ and
r = r cos α x̂′ + r sin α ŷ′, where θ and α are the respective angles that the two vectors
make with the x′-axis. The dot product gives k · r = kr (cos θ cos α + sin θ sin α).
This simplifies to k · r = kr cos φ (see (0.13)), where φ ≡ θ − α is the angle between
the vectors. Thus, the dot product between two vectors is the product of the
magnitudes of each vector times the cosine of the angle between them.
Note that the cross product results in a vector, whereas the dot product mentioned
above results in a scalar (i.e. a number with appropriate units). The resultant
vector is always perpendicular to the two vectors that are cross multiplied. If the
fingers on your right hand curl from the first vector towards the second, your
thumb will point in the direction of the result. The magnitude of the result equals
the product of the magnitudes of the constituent vectors times the sine of the
angle between them.
We label the plane containing E and B the x′y′-plane. In this coordinate system, the
two vectors can be written as E = E cos θ x̂′ + E sin θ ŷ′ and B = B cos α x̂′ + B sin α ŷ′,
where θ and α are the respective angles that the two vectors make with the x′-axis.
The cross product, according to (0.3), gives E × B = EB (cos θ sin α − sin θ cos α) ẑ′.
This simplifies to E × B = EB sin φ ẑ′ (see (0.14)), where φ ≡ α − θ is the angle measured
from E to B. The vectors E and B, which both lie in the x′y′-plane, are both
perpendicular to z′. If 0 < α − θ < π, the result E × B points in the positive z′
direction, which is consistent with the right-hand rule.
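The two geometric statements above (magnitudes times cosine for the dot product, magnitudes times sine along the right-hand-rule direction for the cross product) can be checked numerically. The following is a minimal sketch using only the Python standard library; the magnitudes and angles are arbitrary illustrative values, and the helper names `dot` and `cross` are not from the text.

```python
import math

def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    """Cross product of two 3-component vectors (right-hand rule)."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)

# Two vectors lying in a common plane, at angles theta and alpha from the
# first axis of that plane (illustrative values, not from the text).
E_mag, B_mag = 2.0, 3.0
theta, alpha = math.radians(20.0), math.radians(65.0)
E = (E_mag * math.cos(theta), E_mag * math.sin(theta), 0.0)
B = (B_mag * math.cos(alpha), B_mag * math.sin(alpha), 0.0)

angle_E_to_B = alpha - theta  # angle swept from the first vector to the second

print("dot  :", dot(E, B), "vs", E_mag * B_mag * math.cos(angle_E_to_B))
print("cross:", cross(E, B)[2], "vs", E_mag * B_mag * math.sin(angle_E_to_B))
```

Because the angle swept from the first vector to the second lies between 0 and π here, the cross product points along the positive third axis, and it is perpendicular to both input vectors.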
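$$\nabla f(x,y,z) = \frac{\partial f}{\partial x}\hat{\mathbf{x}} + \frac{\partial f}{\partial y}\hat{\mathbf{y}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}} \tag{0.4}$$
$$\nabla\cdot\mathbf{E} = \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} \tag{0.5}$$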
Example 0.2
Derive the gradient (0.4) in cylindrical coordinates defined by the transformations
x = ρ cos φ and y = ρ sin φ. (The coordinate z remains unchanged.)
2 See M. R. Spiegel, Schaum’s Outline of Advanced Mathematics for Engineers and Scientists, pp.
Solution: By inspection of Fig. 0.2, the cartesian unit vectors may be expressed as
In accordance with the rules of calculus, the needed partial derivatives expressed
in terms of the new variables are
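$$\frac{\partial}{\partial x} = \left(\frac{\partial\rho}{\partial x}\right)\frac{\partial}{\partial\rho} + \left(\frac{\partial\phi}{\partial x}\right)\frac{\partial}{\partial\phi} \quad\text{and}\quad \frac{\partial}{\partial y} = \left(\frac{\partial\rho}{\partial y}\right)\frac{\partial}{\partial\rho} + \left(\frac{\partial\phi}{\partial y}\right)\frac{\partial}{\partial\phi}$$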
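$$\nabla^2\mathbf{E} = \left(\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2}\right)\hat{\mathbf{x}} + \left(\frac{\partial^2 E_y}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_y}{\partial z^2}\right)\hat{\mathbf{y}} + \left(\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}\right)\hat{\mathbf{z}} \tag{0.9}$$
Laplacian differential operator as well as Laplace transforms are used widely in
applied mathematics. (Wikipedia)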
∇2 E ≡ ∇(∇ · E) − ∇ × (∇ × E) (0.10)
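$$\begin{aligned}
\nabla\times(\nabla\times\mathbf{E}) = {} & \left[\frac{\partial}{\partial y}\left(\frac{\partial E_y}{\partial x}-\frac{\partial E_x}{\partial y}\right) + \frac{\partial}{\partial z}\left(\frac{\partial E_z}{\partial x}-\frac{\partial E_x}{\partial z}\right)\right]\hat{\mathbf{x}} - \left[\frac{\partial}{\partial x}\left(\frac{\partial E_y}{\partial x}-\frac{\partial E_x}{\partial y}\right) - \frac{\partial}{\partial z}\left(\frac{\partial E_z}{\partial y}-\frac{\partial E_y}{\partial z}\right)\right]\hat{\mathbf{y}} \\
& + \left[-\frac{\partial}{\partial x}\left(\frac{\partial E_z}{\partial x}-\frac{\partial E_x}{\partial z}\right) - \frac{\partial}{\partial y}\left(\frac{\partial E_z}{\partial y}-\frac{\partial E_y}{\partial z}\right)\right]\hat{\mathbf{z}}
\end{aligned}$$
After adding and subtracting $\frac{\partial^2 E_x}{\partial x^2}\hat{\mathbf{x}} + \frac{\partial^2 E_y}{\partial y^2}\hat{\mathbf{y}} + \frac{\partial^2 E_z}{\partial z^2}\hat{\mathbf{z}}$ and then rearranging, we
get
$$\begin{aligned}
\nabla\times(\nabla\times\mathbf{E}) = {} & \left[\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_y}{\partial x\partial y} + \frac{\partial^2 E_z}{\partial x\partial z}\right]\hat{\mathbf{x}} + \left[\frac{\partial^2 E_x}{\partial x\partial y} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_z}{\partial y\partial z}\right]\hat{\mathbf{y}} + \left[\frac{\partial^2 E_x}{\partial x\partial z} + \frac{\partial^2 E_y}{\partial y\partial z} + \frac{\partial^2 E_z}{\partial z^2}\right]\hat{\mathbf{z}} \\
& - \left[\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2}\right]\hat{\mathbf{x}} - \left[\frac{\partial^2 E_y}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_y}{\partial z^2}\right]\hat{\mathbf{y}} - \left[\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}\right]\hat{\mathbf{z}}
\end{aligned}$$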
Schaum’s Outline of Advanced Mathematics for Engineers and Scientists, p. 154 (New York: McGraw-
Hill 1971).
The integration on the left-hand side is over the closed surface S, which contains
the volume V associated with the integration on the right-hand side. The unit
vector n̂ points outward, normal to the surface. The divergence theorem is especially
useful in connection with Gauss’ law, where the left-hand side is interpreted
as the number of field lines exiting a closed surface.
Example 0.3
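Check the divergence theorem (0.11) for the vector function F(x, y, z) = y² x̂ + xy ŷ + x²z ẑ.
Take as the volume a cube contained by the six planes x = ±1, y = ±1, and
z = ±1.
Solution: First, we evaluate the left side of (0.11) for the function:
$$\begin{aligned}
\oint_S \mathbf{F}\cdot\hat{\mathbf{n}}\,da = {} & \int_{-1}^{1}\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=1} - \int_{-1}^{1}\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=-1} + \int_{-1}^{1}\!\int_{-1}^{1} dx\,dz\,\left(xy\right)_{y=1} \\
& - \int_{-1}^{1}\!\int_{-1}^{1} dx\,dz\,\left(xy\right)_{y=-1} + \int_{-1}^{1}\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=1} - \int_{-1}^{1}\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=-1} \\
= {} & 2\int_{-1}^{1}\!\int_{-1}^{1} dx\,dy\,x^2 + 2\int_{-1}^{1}\!\int_{-1}^{1} dx\,dz\,x = 4\left.\frac{x^3}{3}\right|_{-1}^{1} + 4\left.\frac{x^2}{2}\right|_{-1}^{1} = \frac{8}{3}
\end{aligned}$$
Figure 0.3 The function F (red arrows) plotted for several points on the surface S.
Next, we evaluate the right side of (0.11):
$$\int_V \nabla\cdot\mathbf{F}\,dv = \int_{-1}^{1}\!\int_{-1}^{1}\!\int_{-1}^{1} dx\,dy\,dz\,\left[x + x^2\right] = 4\int_{-1}^{1} dx\,\left[x + x^2\right] = 4\left[\frac{x^2}{2} + \frac{x^3}{3}\right]_{-1}^{1} = \frac{8}{3}$$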
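The agreement found in Example 0.3 can also be checked by brute-force numerical integration. The sketch below approximates both sides of the divergence theorem on the cube with a midpoint rule; the grid size `n` and the helper names are illustrative choices, not part of the text.

```python
def F(x, y, z):
    """Vector field of Example 0.3: F = y^2 x_hat + x y y_hat + x^2 z z_hat."""
    return (y * y, x * y, x * x * z)

def div_F(x, y, z):
    """Analytic divergence of F: d(y^2)/dx + d(xy)/dy + d(x^2 z)/dz = x + x^2."""
    return x + x * x

def midpoints(n):
    """Midpoints and width of n equal subintervals of [-1, 1]."""
    h = 2.0 / n
    return [-1.0 + (i + 0.5) * h for i in range(n)], h

def surface_flux(n=40):
    """Midpoint-rule estimate of the closed-surface integral of F . n over the cube."""
    pts, h = midpoints(n)
    total = 0.0
    for u in pts:
        for v in pts:
            total += (F(1.0, u, v)[0] - F(-1.0, u, v)[0]) * h * h  # faces x = +1 and x = -1
            total += (F(u, 1.0, v)[1] - F(u, -1.0, v)[1]) * h * h  # faces y = +1 and y = -1
            total += (F(u, v, 1.0)[2] - F(u, v, -1.0)[2]) * h * h  # faces z = +1 and z = -1
    return total

def volume_integral(n=40):
    """Midpoint-rule estimate of the volume integral of div F over the cube."""
    pts, h = midpoints(n)
    return sum(div_F(x, y, z) for x in pts for y in pts for z in pts) * h ** 3

print(surface_flux(), volume_integral(), 8 / 3)
```

Both estimates approach 8/3 as the grid is refined; the outward normal on each "minus" face is accounted for by the subtraction inside `surface_flux`.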
The integration on the left-hand side is over an open surface S (not enclosing a
volume). The integration on the right-hand side is around the edge of the surface.
Again, n̂ is a unit vector that always points normal to the surface. The vector dℓ
points along the curve C that bounds the surface S. If the fingers of your right
hand point in the direction of integration around C , then your thumb points
in the direction of n̂. Stokes’ theorem is especially useful in connection with
Ampere’s law and Faraday’s law. The right-hand side is an integration of a field
around a loop.
hyperbolic cosines and hyperbolic sines. If φ happens to be imaginary such that
φ = iγ where γ is real, then we have
$$\sin i\gamma = \frac{e^{-\gamma} - e^{\gamma}}{2i} = i\sinh\gamma, \qquad \cos i\gamma = \frac{e^{-\gamma} + e^{\gamma}}{2} = \cosh\gamma \tag{0.17}$$
the Russian Academy were given considerable freedom to pursue scientific
questions with relatively light teaching duties. Euler spent his early career in
Russia, his mid career in Berlin, and his later career again in Russia. Euler
introduced the concept of a function.
rina Gsell, were the parents of 13 children, many of whom died in childhood.
(Wikipedia)
By expanding each function appearing in (0.15) in a Taylor’s series about the origin
we obtain
$$\begin{aligned}
\cos\phi &= 1 - \frac{\phi^2}{2!} + \frac{\phi^4}{4!} - \cdots \\
i\sin\phi &= i\phi - i\frac{\phi^3}{3!} + i\frac{\phi^5}{5!} - \cdots \\
e^{i\phi} &= 1 + i\phi - \frac{\phi^2}{2!} - i\frac{\phi^3}{3!} + \frac{\phi^4}{4!} + i\frac{\phi^5}{5!} - \cdots
\end{aligned} \tag{0.19}$$
The last line of (0.19) is seen to be the sum of the first two lines, from which Euler’s
formula directly follows.
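The series argument can be reproduced numerically. The sketch below sums the Taylor series of e^{iφ} term by term and compares its real and imaginary parts with cos φ and sin φ; the function name, the number of terms, and the test angle are illustrative assumptions.

```python
import cmath
import math

def exp_i_phi_series(phi, terms=30):
    """Partial sum of the Taylor series e^{i phi} = sum_k (i phi)^k / k!."""
    total = 0j
    term = 1 + 0j  # the k = 0 term
    for k in range(terms):
        total += term
        term *= 1j * phi / (k + 1)  # build the next term from the previous one
    return total

phi = 0.75  # arbitrary test angle in radians
series = exp_i_phi_series(phi)

print("real part:", series.real, "cos(phi):", math.cos(phi))
print("imag part:", series.imag, "sin(phi):", math.sin(phi))
print("difference from cmath.exp:", abs(series - cmath.exp(1j * phi)))
```

The real part collects the even powers (the cosine series) and the imaginary part collects the odd powers (the sine series), which is the content of Euler's formula.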
Example 0.4
Prove (0.13) and (0.14) as well as cos²φ + sin²φ = 1 by taking advantage of (0.16).
Solution: We start with (0.13). By direct application of (0.16) and some rearranging,
we have
$$\begin{aligned}
\cos\alpha\cos\beta - \sin\alpha\sin\beta &= \frac{e^{i\alpha} + e^{-i\alpha}}{2}\,\frac{e^{i\beta} + e^{-i\beta}}{2} - \frac{e^{i\alpha} - e^{-i\alpha}}{2i}\,\frac{e^{i\beta} - e^{-i\beta}}{2i} \\
&= \frac{e^{i(\alpha+\beta)} + e^{i(\alpha-\beta)} + e^{-i(\alpha-\beta)} + e^{-i(\alpha+\beta)}}{4} + \frac{e^{i(\alpha+\beta)} - e^{i(\alpha-\beta)} - e^{-i(\alpha-\beta)} + e^{-i(\alpha+\beta)}}{4} \\
&= \frac{e^{i(\alpha+\beta)} + e^{-i(\alpha+\beta)}}{2} = \cos\left(\alpha+\beta\right)
\end{aligned}$$
We can prove (0.14) using the same technique:
$$\begin{aligned}
\sin\alpha\cos\beta + \sin\beta\cos\alpha &= \frac{e^{i\alpha} - e^{-i\alpha}}{2i}\,\frac{e^{i\beta} + e^{-i\beta}}{2} + \frac{e^{i\beta} - e^{-i\beta}}{2i}\,\frac{e^{i\alpha} + e^{-i\alpha}}{2} \\
&= \frac{e^{i(\alpha+\beta)} + e^{i(\alpha-\beta)} - e^{-i(\alpha-\beta)} - e^{-i(\alpha+\beta)}}{4i} + \frac{e^{i(\alpha+\beta)} - e^{i(\alpha-\beta)} + e^{-i(\alpha-\beta)} - e^{-i(\alpha+\beta)}}{4i} \\
&= \frac{e^{i(\alpha+\beta)} - e^{-i(\alpha+\beta)}}{2i} = \sin\left(\alpha+\beta\right)
\end{aligned}$$
Brook Taylor (1685-1731, English) was born in Middlesex, England. He studied
at Cambridge as a fellow-commoner earning a bachelor degree in 1709
and a doctoral degree in 1714. Soon thereafter, he developed the branch
of mathematics known as calculus of finite differences. He used it to study
the movement of vibrating strings. As part of that work, he developed the formula
known today as Taylor's theorem, which was under-appreciated until 1772,
when French mathematician Lagrange referred to it as the main foundation of
differential calculus. (Wikipedia)
Euler’s formula, it is possible to transform any complex number a + ib into the
form ρe^{iφ}, where a, b, ρ, and φ are real. From (0.15), the required connection
between (ρ, φ) and (a, b) is
$$a + ib = \rho e^{i\phi} = \rho\cos\phi + i\rho\sin\phi$$
The real and imaginary parts of this equation must separately be equal. Thus, we
have
$$a = \rho\cos\phi, \qquad b = \rho\sin\phi \tag{0.24}$$
These equations can be inverted to yield
$$\rho = \sqrt{a^2 + b^2}, \qquad \phi = \tan^{-1}\frac{b}{a} \quad (a > 0) \tag{0.25}$$
When a < 0, we must adjust φ by π since the arctangent has a range only from
−π/2 to π/2.
The transformations in (0.24) and (0.25) have a clear geometrical interpretation
in the complex plane, and this makes it easier to remember them. They are
just the usual connections between Cartesian and polar coordinates. As seen in
Fig. 0.4, ρ is the hypotenuse of a right triangle having legs with lengths a and b,
and φ is the angle that the hypotenuse makes with the x-axis. Again, you should
be careful when a is negative since the arctangent is defined in quadrants I and
IV. An easy way to deal with the situation of a negative a is to factor the minus
sign out before proceeding (i.e. a + ib = −(−a − ib)). Then the transformation
is made on −a − ib where −a is positive. The overall minus sign out in front is
just carried along unaffected and can be factored back in at the end. Notice that
−ρe^{iφ} is the same as ρe^{i(φ±π)}.
Figure 0.4 A number in the complex plane can be represented
either by Cartesian or polar representation.
University of Pavia and later at Padua.
He was known for being eccentric and confrontational, which did not earn him
many friends. He supported himself
in part as a somewhat successful gam-
(Wikipedia)
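The sign-factoring procedure just described can be written out as a short routine. The sketch below is one possible implementation; the function name `to_polar` is hypothetical, and the handling of a = 0 (a purely imaginary number) is an added assumption that the text does not discuss.

```python
import cmath
import math

def to_polar(a, b):
    """Return (rho, phi) such that a + ib = rho * exp(i * phi)."""
    rho = math.hypot(a, b)
    if a > 0:
        phi = math.atan(b / a)  # quadrants I and IV: arctangent applies directly
    elif a < 0:
        # Factor out the minus sign: a + ib = -(-a - ib). The arctangent is taken
        # for -a - ib, where (-b)/(-a) = b/a, and the overall minus sign
        # contributes a factor exp(i*pi).
        phi = math.atan(b / a) + math.pi
    else:
        phi = math.copysign(math.pi / 2, b)  # assumption: purely imaginary input
    return rho, phi

rho, phi = to_polar(-3.0, 4.0)
print(rho, phi, math.pi - math.atan(4 / 3))
print(rho * cmath.exp(1j * phi))  # should reproduce -3 + 4i up to rounding
```

In practice, library routines such as `cmath.polar` or `math.atan2` perform the quadrant bookkeeping automatically; the explicit branch here only mirrors the hand calculation. The angle returned for a < 0 may differ from the library value by 2π, which represents the same complex number.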
Example 0.5
Write −3 + 4i in polar format.
Solution: We must be careful with the negative real part since it indicates a quadrant
(in this case II) outside of the domain of the inverse tangent (quadrants I and
IV). Best to factor the negative out and deal with it separately.
$$-3 + 4i = -(3 - 4i) = -\sqrt{3^2 + (-4)^2}\;e^{i\tan^{-1}\frac{(-4)}{3}} = e^{i\pi}\,5\,e^{-i\tan^{-1}\frac{4}{3}} = 5e^{i\left(\pi - \tan^{-1}\frac{4}{3}\right)}$$
Figure 0.5 Geometric representation of −3 + 4i
Finally, we consider the concept of a complex conjugate. The conjugate of a
complex number z = a + ib is denoted with an asterisk and amounts to changing
the sign on the imaginary part of the number:
$$z^* = (a + ib)^* \equiv a - ib \tag{0.26}$$
The complex conjugate is useful when computing the absolute value of a complex
number:
$$|z| = \sqrt{z^* z} = \sqrt{(a - ib)(a + ib)} = \sqrt{a^2 + b^2} = \rho \tag{0.27}$$
Note that the absolute value of a complex number is the same as its magnitude ρ
as defined in (0.25). The complex conjugate is also useful for eliminating complex
numbers from the denominator of expressions:
$$\frac{a + ib}{c + id} = \frac{(a + ib)(c - id)}{(c + id)(c - id)} = \frac{ac + bd + i(bc - ad)}{c^2 + d^2} \tag{0.28}$$
$$\mathrm{Re}\{z\} = \frac{1}{2}\left(z + z^*\right) \tag{0.30}$$
Notice that the expression for cos φ in (0.16) is an example of this formula. Sometimes
when a lengthy expression is added to its own complex conjugate, we let
“C.C.” represent the complex conjugate in order to avoid writing the expression
twice.
In optics we sometimes encounter a complex angle, such as Kz in (0.29). The
imaginary part of K governs exponential decay (or growth) when a light wave
propagates in an absorptive (or amplifying) medium. Similarly, when we compute
the transmission angle for light incident upon a surface beyond the critical angle
for total internal reflection, we encounter the arcsine of a number greater than
one in an effort to satisfy Snell’s law. Even though such an angle does not exist in
the physical sense, a complex value for the angle can be found, which satisfies
(0.16) and describes evanescent waves.
Ax + B y = F and Cx + Dy = G (0.31)
where x and y are variables. A set of linear equations such as (0.31) can be
expressed using matrix notation as
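$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Ax + By \\ Cx + Dy \end{bmatrix} = \begin{bmatrix} F \\ G \end{bmatrix} \tag{0.32}$$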
where the right-hand side is called the identity matrix. You can easily check that
the identity matrix leaves unchanged anything that it multiplies, and so (0.33)
simplifies to
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}\begin{bmatrix} F \\ G \end{bmatrix}$$
Once the inverse matrix is found, the matrix multiplication on the right can be
performed and the answers for x and y obtained as the upper and lower elements
of the result.
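The inverse of a 2 × 2 matrix is given by
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \frac{1}{\begin{vmatrix} A & B \\ C & D \end{vmatrix}}\begin{bmatrix} D & -B \\ -C & A \end{bmatrix} \tag{0.35}$$
where
$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} \equiv AD - CB$$
is called the determinant. We can check that (0.35) is correct by direct substitution: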
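$$\begin{aligned}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}\begin{bmatrix} A & B \\ C & D \end{bmatrix} &= \frac{1}{AD - BC}\begin{bmatrix} D & -B \\ -C & A \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix} \\
&= \frac{1}{AD - BC}\begin{bmatrix} AD - BC & 0 \\ 0 & AD - BC \end{bmatrix} \\
&= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\end{aligned} \tag{0.36}$$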
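As a concrete illustration of the procedure, the sketch below inverts a 2 × 2 matrix with the determinant formula and uses the inverse to solve a pair of linear equations. The function names and the numerical coefficients are illustrative assumptions, not taken from the text.

```python
def inverse_2x2(A, B, C, D):
    """Inverse of [[A, B], [C, D]] as a tuple (a, b, c, d), using the determinant."""
    det = A * D - C * B
    if det == 0:
        raise ValueError("matrix is singular: determinant is zero")
    return (D / det, -B / det, -C / det, A / det)

def solve_2x2(A, B, C, D, F, G):
    """Solve A x + B y = F and C x + D y = G by multiplying with the inverse matrix."""
    a, b, c, d = inverse_2x2(A, B, C, D)
    return (a * F + b * G, c * F + d * G)

# Illustrative system: 2x + y = 5 and x + 3y = 10.
x, y = solve_2x2(2.0, 1.0, 1.0, 3.0, 5.0, 10.0)
print(x, y)
```

Substituting the result back into both equations is a quick way to confirm the answer; a zero determinant signals that the two equations are not independent.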
The above review of linear algebra is very basic. In contrast, we next discuss
Sylvester’s theorem, which you probably have not previously encountered.
Sylvester’s theorem is useful when multiplying the same 2 × 2 matrix (with a
determinant of unity) together many times (i.e. raising the matrix to a power). This
situation occurs when modeling periodic multilayer mirror coatings or when
considering light rays trapped in a laser cavity as they reflect many times.
Sylvester’s Theorem:4 If the determinant of a 2×2 matrix is one (i.e. AD − BC = 1),
then
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N} = \frac{1}{\sin\theta}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} \tag{0.37}$$
where
$$\cos\theta = \frac{1}{2}(A + D) \tag{0.38}$$
James Joseph Sylvester (1814-1897, English) made fundamental contributions
to matrix theory, invariant theory, number theory, partition theory and
combinatorics. He played a leadership role in American mathematics in the
later half of the 19th century as a professor at the Johns Hopkins University
and as founder of the American Journal of Mathematics. (Wikipedia)
4 The theorem presented here is a specific case. See A. A. Tovar and L. W. Casperson, “Generalized
Sylvester theorems for periodic applications in matrix optics,” J. Opt. Soc. Am. A 12, 578-590 (1995).
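Sylvester's theorem (0.37) can be spot-checked by comparing it against repeated matrix multiplication. The sketch below does this for one illustrative unit-determinant matrix; the matrix entries, the power N, and the function names are assumptions made for the example, and the code further assumes |A + D| < 2 so that θ is real and sin θ is nonzero.

```python
import math

def matmul(M, N):
    """Product of two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def power_direct(M, N):
    """Raise M to the integer power N by repeated multiplication."""
    result = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(N):
        result = matmul(result, M)
    return result

def power_sylvester(M, N):
    """Raise a unit-determinant matrix to the power N using (0.37) and (0.38)."""
    (A, B), (C, D) = M
    theta = math.acos(0.5 * (A + D))  # assumes |A + D| < 2
    s = math.sin(theta)
    sN, sN1 = math.sin(N * theta), math.sin((N - 1) * theta)
    return (((A * sN - sN1) / s, B * sN / s),
            (C * sN / s, (D * sN - sN1) / s))

# Illustrative matrix with AD - BC = 0.48 + 0.52 = 1.
M = ((0.8, 1.3), (-0.4, 0.6))
print(power_direct(M, 7))
print(power_sylvester(M, 7))
```

For large N the closed form requires only a few trigonometric evaluations, whereas direct multiplication requires N matrix products.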
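$$\begin{aligned}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N+1} &= \frac{1}{\sin\theta}\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} \\
&= \frac{1}{\sin\theta}\begin{bmatrix} \left(A^2 + BC\right)\sin N\theta - A\sin(N-1)\theta & (AB + BD)\sin N\theta - B\sin(N-1)\theta \\ (AC + CD)\sin N\theta - C\sin(N-1)\theta & \left(D^2 + BC\right)\sin N\theta - D\sin(N-1)\theta \end{bmatrix}
\end{aligned}$$
Now we inject the condition AD − BC = 1 into the diagonal elements and obtain
$$\frac{1}{\sin\theta}\begin{bmatrix} \left(A^2 + AD - 1\right)\sin N\theta - A\sin(N-1)\theta & B\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] & \left(D^2 + AD - 1\right)\sin N\theta - D\sin(N-1)\theta \end{bmatrix}$$
and then
$$\frac{1}{\sin\theta}\begin{bmatrix} A\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] - \sin N\theta & B\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] & D\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] - \sin N\theta \end{bmatrix}$$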
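e^{imΔωt}, where m is an integer, and integrate over the function period 2π/Δω:
$$\begin{aligned}
\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)\,e^{im\Delta\omega t}\,dt &= \sum_{n=-\infty}^{\infty} c_n \int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} e^{i(m-n)\Delta\omega t}\,dt \\
&= \sum_{n=-\infty}^{\infty} c_n \left[\frac{e^{i(m-n)\Delta\omega t}}{i(m-n)\Delta\omega}\right]_{-\pi/\Delta\omega}^{\pi/\Delta\omega} \\
&= \sum_{n=-\infty}^{\infty} \frac{2\pi c_n}{\Delta\omega}\left[\frac{e^{i(m-n)\pi} - e^{-i(m-n)\pi}}{2i(m-n)\pi}\right] \\
&= \sum_{n=-\infty}^{\infty} \frac{2\pi c_n}{\Delta\omega}\,\frac{\sin\left[(m-n)\pi\right]}{(m-n)\pi}
\end{aligned} \tag{0.44}$$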
The function sin[(m − n)π] / [(m − n)π] is equal to zero for all n ≠ m, and it is
equal to one when n = m (to see this, use L’Hospital’s rule on the zero-over-zero
situation, or just go back and redo the above integral for n = m). Thus, only
one term contributes to the summation in (0.44). We now have
$$c_m = \frac{\Delta\omega}{2\pi}\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)\,e^{im\Delta\omega t}\,dt \tag{0.45}$$
from which the coefficients c n can be computed, given a function f (t ). (Note that
m is a dummy index so we can change it back to n if we like.)
This completes the circle. If we know the function f (t ), we can find the
coefficients c n via (0.45), and, if we know the coefficients c n , we can generate the
function f (t ) via (0.42). If we are feeling a bit silly, we might combine these into a
single identity:
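$$f(t) = \sum_{n=-\infty}^{\infty}\left[\frac{\Delta\omega}{2\pi}\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t')\,e^{in\Delta\omega t'}\,dt'\right]e^{-in\Delta\omega t} \tag{0.46}$$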
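The round trip between a periodic function and its coefficients can be tried out numerically. The sketch below evaluates the coefficient integral (0.45) with a midpoint rule and rebuilds the function from the series, using the sign convention of this section (coefficients found with e^{+inΔωt}, function rebuilt with e^{−inΔωt}). The test function, the value of Δω, and the helper names are illustrative assumptions.

```python
import cmath
import math

def fourier_coefficient(f, n, d_omega, samples=2000):
    """Midpoint-rule estimate of c_n = (d_omega / 2 pi) * integral of f(t) e^{i n d_omega t} dt
    over one period, from -pi/d_omega to +pi/d_omega."""
    period = 2 * math.pi / d_omega
    dt = period / samples
    total = 0j
    for k in range(samples):
        t = -period / 2 + (k + 0.5) * dt
        total += f(t) * cmath.exp(1j * n * d_omega * t) * dt
    return d_omega / (2 * math.pi) * total

def reconstruct(coeffs, t, d_omega):
    """Rebuild f(t) = sum_n c_n e^{-i n d_omega t} from a dict {n: c_n}."""
    return sum(c * cmath.exp(-1j * n * d_omega * t) for n, c in coeffs.items())

d_omega = 2.0  # illustrative fundamental frequency

def f(t):
    """Illustrative periodic test function with period 2 pi / d_omega."""
    return 1.0 + 3.0 * math.cos(d_omega * t) + 0.5 * math.sin(2 * d_omega * t)

coeffs = {n: fourier_coefficient(f, n, d_omega) for n in range(-3, 4)}
print(coeffs[0], coeffs[1], coeffs[2])
print(reconstruct(coeffs, 0.3, d_omega), f(0.3))
```

For this test function only the terms with |n| ≤ 2 are nonzero, so the truncated sum reproduces f(t) to within rounding error.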
$$f(t) = \frac{1}{2\pi}\lim_{\Delta\omega\to 0}\sum_{n=-\infty}^{\infty}\left[e^{-in\Delta\omega t}\int_{-\infty}^{\infty} f(t')\,e^{in\Delta\omega t'}\,dt'\right]\Delta\omega \tag{0.47}$$
Recall that an integral is really a summation of rectangles under a curve with finely
spaced steps:
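$$\begin{aligned}
\int_a^b g(\omega)\,d\omega &\equiv \lim_{\Delta\omega\to 0}\sum_{n=0}^{\frac{b-a}{\Delta\omega}} g(a + n\Delta\omega)\,\Delta\omega \\
&= \lim_{\Delta\omega\to 0}\sum_{n=-\frac{b-a}{2\Delta\omega}}^{\frac{b-a}{2\Delta\omega}} g\left(\frac{a+b}{2} + n\Delta\omega\right)\Delta\omega
\end{aligned} \tag{0.48}$$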
The final expression has been manipulated so that the index ranges through both
negative and positive numbers. If we set a = −b and take the limit b → ∞, then the
above expression becomes
$$\int_{-\infty}^{\infty} g(\omega)\,d\omega = \lim_{\Delta\omega\to 0}\sum_{n=-\infty}^{\infty} g(n\Delta\omega)\,\Delta\omega \tag{0.49}$$
Now, (0.47) has the same form as (0.49) if g (n∆ω) represents everything in
the square brackets of (0.47). The result is the Fourier integral theorem:
$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-i\omega t}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t')\,e^{i\omega t'}\,dt'\right]d\omega \tag{0.50}$$
The piece in brackets is called the Fourier transform, and the rest of the operation
is called the inverse Fourier transform. The Fourier integral theorem (0.50) is often
written with the following (potentially confusing) notation:
$$\begin{aligned}
f(\omega) &\equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{i\omega t}\,dt \\
f(t) &\equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(\omega)\,e^{-i\omega t}\,d\omega
\end{aligned} \tag{0.51}$$
The transform and inverse transform are also sometimes written as f(ω) ≡
F{f(t)} and f(t) ≡ F⁻¹{f(ω)}. Note that the functions f(t) and f(ω) are entirely
different, even taking on different units (e.g. the latter having extra units of
per frequency). The two functions are distinguished by their arguments, which
also have different units (e.g. time vs. frequency). Nevertheless, it is customary to
use the same letter to denote either function since they form a transform pair.
Example 0.6
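Compute the Fourier transform of $E(t) = E_0 e^{-t^2/2T^2} e^{-i\omega_0 t}$ followed by the inverse
Fourier transform.
The integration can be performed with the help of (0.55), which yields
$$E(\omega) = \frac{E_0}{\sqrt{2\pi}}\sqrt{\frac{\pi}{1/2T^2}}\;e^{-\frac{(\omega-\omega_0)^2}{4\left(1/2T^2\right)}} = T E_0\,e^{-T^2(\omega-\omega_0)^2/2}$$
Similarly, the inverse Fourier transform of the above function is
$$\begin{aligned}
E(t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left(T E_0\,e^{-T^2(\omega-\omega_0)^2/2}\right)e^{-i\omega t}\,d\omega = \frac{T E_0}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{T^2}{2}\omega^2 + \left(T^2\omega_0 - it\right)\omega - \frac{T^2}{2}\omega_0^2}\,d\omega \\
&= \frac{T E_0}{\sqrt{2\pi}}\sqrt{\frac{\pi}{T^2/2}}\;e^{\frac{\left(T^2\omega_0 - it\right)^2}{4\left(T^2/2\right)} - \frac{T^2}{2}\omega_0^2} = E_0\,e^{-t^2/2T^2}e^{-i\omega_0 t}
\end{aligned}$$
which brings us back to where we started.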
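The closed-form result of Example 0.6 can be compared with a direct numerical evaluation of the transform integral in (0.51). The sketch below uses a midpoint rule over a finite window; the parameter values, the window half-width `t_max`, and the function names are illustrative assumptions.

```python
import cmath
import math

E0, T, omega0 = 1.0, 2.0, 3.0  # illustrative amplitude, duration, carrier frequency

def E_t(t):
    """Gaussian pulse E(t) = E0 exp(-t^2 / 2T^2) exp(-i omega0 t)."""
    return E0 * math.exp(-t * t / (2 * T * T)) * cmath.exp(-1j * omega0 * t)

def fourier_transform(f, omega, t_max=40.0, samples=8000):
    """Midpoint-rule estimate of (1/sqrt(2 pi)) * integral of f(t) e^{i omega t} dt.
    The infinite limits are replaced by +/- t_max, which is adequate only when
    f has decayed to a negligible value by then."""
    dt = 2 * t_max / samples
    total = 0j
    for k in range(samples):
        t = -t_max + (k + 0.5) * dt
        total += f(t) * cmath.exp(1j * omega * t) * dt
    return total / math.sqrt(2 * math.pi)

def E_omega_exact(omega):
    """Closed-form transform: T E0 exp(-T^2 (omega - omega0)^2 / 2)."""
    return T * E0 * math.exp(-T * T * (omega - omega0) ** 2 / 2)

for w in (2.0, 3.0, 3.5):
    print(w, fourier_transform(E_t, w), E_omega_exact(w))
```

The numerical transform is centered on ω0 and narrows as T grows, in line with the usual trade-off between pulse duration and spectral width.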
in such a way as to make the integral take on the value of the function f(t). (You
can think of δ(t′ − t) dt′ as an infinitely tall and infinitely thin rectangle centered
at t′ = t with an area unity.) The integral only pays attention to the value of f(t′)
at the point t′ = t.
A remarkable attribute of the delta function can be seen from the Fourier
integral theorem. After rearranging the order of integration, the Fourier integral
theorem (0.50) can be written as
$$f(t) = \int_{-\infty}^{\infty} f(t')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t'-t)}\,d\omega\right]dt' \tag{0.53}$$
A comparison of (0.52) and (0.53) shows that you may write the delta function
as a uniform superposition of all frequency components:
$$\delta\left(t'-t\right) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t'-t)}\,d\omega \tag{0.54}$$
Example 0.7
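Use (0.54) to prove Parseval’s relation:7
$$\int_{-\infty}^{\infty}\left|f(\omega)\right|^2 d\omega = \int_{-\infty}^{\infty}\left|f(t)\right|^2 dt$$
Solution:
$$\begin{aligned}
\int_{-\infty}^{\infty}\left|f(\omega)\right|^2 d\omega &= \int_{-\infty}^{\infty} f(\omega)\,f^*(\omega)\,d\omega \\
&= \int_{-\infty}^{\infty}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{i\omega t}\,dt\right]\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f^*\left(t'\right)e^{-i\omega t'}\,dt'\right]d\omega
\end{aligned}$$
7 For a more general version of the relation, see G. B. Arfken and H. J. Weber, Mathematical
Methods for Physicists 6th ed., Sect. 15.5 (San Diego: Elsevier Academic Press 2005).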
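$$\int_{-\infty}^{\infty} e^{-ax^2+bx+c}\,dx = \sqrt{\frac{\pi}{a}}\;e^{\frac{b^2}{4a}+c} \qquad (\mathrm{Re}\{a\} > 0) \tag{0.55}$$
$$\int_{0}^{\infty} \frac{e^{iax}}{1 + x^2/b^2}\,dx = \frac{\pi|b|}{2}\,e^{-|ab|} \qquad (b > 0) \tag{0.56}$$
$$\int_{0}^{2\pi} e^{\pm ia\cos(\theta-\theta')}\,d\theta = 2\pi J_0(a) \tag{0.57}$$
$$\int_{0}^{a} J_0(bx)\,x\,dx = \frac{a}{b}\,J_1(ab) \tag{0.58}$$
$$\int_{0}^{\infty} e^{-ax^2} J_0(bx)\,x\,dx = \frac{e^{-b^2/4a}}{2a} \tag{0.59}$$
$$\int_{0}^{\infty} \frac{\sin^2(ax)}{(ax)^2}\,dx = \frac{\pi}{2a} \tag{0.60}$$
$$\int \frac{dy}{\left[y^2 + c\right]^{3/2}} = \frac{y}{c\sqrt{y^2 + c}} \tag{0.61}$$
$$\int \frac{dx}{x\sqrt{x^2 - c}} = -\frac{1}{\sqrt{c}}\,\sin^{-1}\frac{\sqrt{c}}{|x|} \tag{0.62}$$
$$\int_{0}^{\pi} \sin(ax)\sin(bx)\,dx = \int_{0}^{\pi} \cos(ax)\cos(bx)\,dx = \frac{\pi}{2}\,\delta_{ab} \qquad (a, b\ \text{integer}) \tag{0.63}$$
$$\sum_{n=0}^{N} r^n = \frac{1 - r^{N+1}}{1 - r} \tag{0.64}$$
$$\sum_{n=1}^{N} r^n = \frac{r\left(1 - r^N\right)}{1 - r} \tag{0.65}$$
$$\sum_{n=0}^{\infty} r^n = \frac{1}{1 - r} \qquad (|r| < 1) \tag{0.66}$$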
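Table entries like these are easy to spot-check numerically before relying on them. The sketch below compares the Gaussian integral (0.55), restricted here to real a, b, and c, and the finite geometric sum (0.64) against direct evaluation. The parameter values, the integration window, and the function names are illustrative assumptions.

```python
import math

def gaussian_integral_numeric(a, b, c, x_max=12.0, samples=24000):
    """Midpoint-rule estimate of the integral of exp(-a x^2 + b x + c) over the real line,
    truncated to [-x_max, x_max] (adequate when the integrand is negligible beyond that)."""
    dx = 2 * x_max / samples
    total = 0.0
    for k in range(samples):
        x = -x_max + (k + 0.5) * dx
        total += math.exp(-a * x * x + b * x + c) * dx
    return total

def gaussian_integral_table(a, b, c):
    """Closed form from (0.55): sqrt(pi / a) * exp(b^2 / 4a + c), for real a > 0."""
    return math.sqrt(math.pi / a) * math.exp(b * b / (4 * a) + c)

def geometric_sum_direct(r, N):
    """Direct evaluation of sum_{n=0}^{N} r^n."""
    return sum(r ** n for n in range(N + 1))

def geometric_sum_table(r, N):
    """Closed form from (0.64): (1 - r^(N+1)) / (1 - r), for r != 1."""
    return (1 - r ** (N + 1)) / (1 - r)

print(gaussian_integral_numeric(1.5, 0.7, -0.2), gaussian_integral_table(1.5, 0.7, -0.2))
print(geometric_sum_direct(0.9, 10), geometric_sum_table(0.9, 10))
```

The same pattern (direct numerical evaluation versus closed form) extends to the other entries, although the Bessel-function integrals would need a Bessel routine from an external library.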
Exercises
P0.2 Use the dot product (0.2) to show that the cross product E × B is per-
pendicular to E and to B.
$$\nabla_{\mathbf{r}}\frac{1}{|\mathbf{r}-\mathbf{r}'|} = -\frac{\left(\mathbf{r}-\mathbf{r}'\right)}{|\mathbf{r}-\mathbf{r}'|^3},$$
occurs.
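$$\nabla^2 = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2}{\partial\phi^2} + \frac{\partial^2}{\partial z^2}$$
Solution: (Partial)
Continuing with the approach in Example 0.2, we have
$$\begin{aligned}
\frac{\partial^2 f}{\partial x^2} &= \left(\frac{\partial^2\rho}{\partial x^2}\right)\frac{\partial f}{\partial\rho} + \frac{\partial\rho}{\partial x}\frac{\partial}{\partial\rho}\frac{\partial f}{\partial x} + \left(\frac{\partial^2\phi}{\partial x^2}\right)\frac{\partial f}{\partial\phi} + \frac{\partial\phi}{\partial x}\frac{\partial}{\partial\phi}\frac{\partial f}{\partial x} \\
&= \left(\frac{\partial^2\rho}{\partial x^2}\right)\frac{\partial f}{\partial\rho} + \frac{\partial\rho}{\partial x}\frac{\partial}{\partial\rho}\left[\left(\frac{\partial\rho}{\partial x}\right)\frac{\partial f}{\partial\rho} + \left(\frac{\partial\phi}{\partial x}\right)\frac{\partial f}{\partial\phi}\right] + \left(\frac{\partial^2\phi}{\partial x^2}\right)\frac{\partial f}{\partial\phi} + \frac{\partial\phi}{\partial x}\frac{\partial}{\partial\phi}\left[\left(\frac{\partial\rho}{\partial x}\right)\frac{\partial f}{\partial\rho} + \left(\frac{\partial\phi}{\partial x}\right)\frac{\partial f}{\partial\phi}\right]
\end{aligned}$$
and
$$\begin{aligned}
\nabla^2 f = {} & \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \\
= {} & \left(\frac{\partial^2\rho}{\partial x^2} + \frac{\partial^2\rho}{\partial y^2}\right)\frac{\partial f}{\partial\rho} + \left(\left(\frac{\partial\rho}{\partial x}\right)^2 + \left(\frac{\partial\rho}{\partial y}\right)^2\right)\frac{\partial^2 f}{\partial\rho^2} + 2\left[\left(\frac{\partial\phi}{\partial x}\right)\left(\frac{\partial\rho}{\partial x}\right) + \left(\frac{\partial\phi}{\partial y}\right)\left(\frac{\partial\rho}{\partial y}\right)\right]\frac{\partial^2 f}{\partial\phi\,\partial\rho} \\
& + \left[\left(\frac{\partial^2\phi}{\partial x^2}\right) + \left(\frac{\partial^2\phi}{\partial y^2}\right)\right]\frac{\partial f}{\partial\phi} + \left[\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2\right]\frac{\partial^2 f}{\partial\phi^2} + \frac{\partial^2 f}{\partial z^2}
\end{aligned}$$
The needed first derivatives are given in Example 0.2. The needed second derivatives are
$$\begin{aligned}
\frac{\partial^2\rho}{\partial x^2} &= \frac{1}{\sqrt{x^2+y^2}} - \frac{x^2}{\left(x^2+y^2\right)^{3/2}} = \frac{\sin^2\phi}{\rho} \\
\frac{\partial^2\phi}{\partial x^2} &= \frac{2xy}{\left(x^2+y^2\right)^2} = \frac{2\sin\phi\cos\phi}{\rho^2} \\
\frac{\partial^2\rho}{\partial y^2} &= \frac{1}{\sqrt{x^2+y^2}} - \frac{y^2}{\left(x^2+y^2\right)^{3/2}} = \frac{\cos^2\phi}{\rho} \\
\frac{\partial^2\phi}{\partial y^2} &= -\frac{2xy}{\left(x^2+y^2\right)^2} = -\frac{2\sin\phi\cos\phi}{\rho^2}
\end{aligned}$$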
Finish the derivation by substituting these derivatives into the above expression.
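Before or after finishing the algebra, the cylindrical form of the Laplacian can be sanity-checked with finite differences. The sketch below applies the Cartesian and cylindrical expressions to the same test function and compares both with the analytic answer; the test function, step size `h`, evaluation point, and helper names are illustrative assumptions.

```python
import math

def f(x, y, z):
    """Illustrative test function; its Laplacian is 6xy + 2 - x sin(z)."""
    return x ** 3 * y + math.sin(z) * x + y * y

def g(rho, phi, z):
    """The same function expressed through cylindrical coordinates."""
    return f(rho * math.cos(phi), rho * math.sin(phi), z)

def first_diff(func, args, i, h):
    """Central-difference first derivative with respect to argument i."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (func(*up) - func(*dn)) / (2 * h)

def second_diff(func, args, i, h):
    """Central-difference second derivative with respect to argument i."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (func(*up) - 2 * func(*args) + func(*dn)) / (h * h)

def laplacian_cartesian(x, y, z, h=1e-3):
    return sum(second_diff(f, (x, y, z), i, h) for i in range(3))

def laplacian_cylindrical(rho, phi, z, h=1e-3):
    """(1/rho) d/drho (rho dg/drho) + (1/rho^2) d^2g/dphi^2 + d^2g/dz^2,
    with the first term expanded as g_rhorho + g_rho / rho."""
    args = (rho, phi, z)
    return (second_diff(g, args, 0, h) + first_diff(g, args, 0, h) / rho
            + second_diff(g, args, 1, h) / rho ** 2 + second_diff(g, args, 2, h))

rho, phi, z = 1.3, 0.6, 0.4
x, y = rho * math.cos(phi), rho * math.sin(phi)
exact = 6 * x * y + 2 - x * math.sin(z)
print(laplacian_cartesian(x, y, z), laplacian_cylindrical(rho, phi, z), exact)
```

Agreement between the two numerical values does not replace the derivation, but a mismatch would point to a sign or factor error in the cylindrical expression.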
P0.11 Verify Stokes’ theorem (0.12) for the function given in Example 0.3.
Take the surface to be a square in the xy-plane contained by |x| = 1
and |y| = 1, as illustrated in Fig. 0.6.
P0.12 Verify the following vector integral theorem for the same volume used
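in Example 0.3, but with F = y²x x̂ + xy ẑ and G = x² x̂:
$$\int_V \left[\mathbf{F}\left(\nabla\cdot\mathbf{G}\right) + \left(\mathbf{G}\cdot\nabla\right)\mathbf{F}\right]dv = \oint_S \mathbf{F}\left(\mathbf{G}\cdot\hat{\mathbf{n}}\right)da$$
Figure 0.6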
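P0.13 Use the divergence theorem to show that the function in P0.5 is 4π
times the three-dimensional delta function
$$\delta^3\left(\mathbf{r}'-\mathbf{r}\right) \equiv \delta\left(x'-x\right)\delta\left(y'-y\right)\delta\left(z'-z\right)$$
$$\int_V \delta^3\left(\mathbf{r}'-\mathbf{r}\right)dv = \begin{cases} 1 & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise} \end{cases}$$
$$\oint_S \frac{\left(\mathbf{r}-\mathbf{r}'\right)}{\left|\mathbf{r}-\mathbf{r}'\right|^3}\cdot\hat{\mathbf{n}}\,da = \int_V \nabla_{\mathbf{r}}\cdot\frac{\left(\mathbf{r}-\mathbf{r}'\right)}{\left|\mathbf{r}-\mathbf{r}'\right|^3}\,dv$$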
From P0.5, the argument in the integral on the right-hand side is zero except at r = r′. Therefore,
if the volume V does not contain the point r = r′, then the result of both integrals must be zero.
Let us construct a volume between an arbitrary surface S1 containing r = r′ and S2, the surface
of a tiny sphere centered on r = r′. Since the point r = r′ is excluded by the tiny sphere, the result
of either integral in the divergence theorem is still zero. However, we have on the tiny sphere
$$\oint_{S_2} \frac{\left(\mathbf{r}-\mathbf{r}'\right)}{\left|\mathbf{r}-\mathbf{r}'\right|^3}\cdot\hat{\mathbf{n}}\,da = -\int_0^{2\pi}\!\int_0^{\pi}\left(\frac{1}{r_\epsilon^2}\right)r_\epsilon^2\sin\phi\,d\phi\,d\alpha = -4\pi$$
Therefore, for the outer surface S1 (containing r = r′) we must have the equal and opposite
result:
$$\oint_{S_1} \frac{\left(\mathbf{r}-\mathbf{r}'\right)}{\left|\mathbf{r}-\mathbf{r}'\right|^3}\cdot\hat{\mathbf{n}}\,da = 4\pi$$
This implies
$$\int_V \nabla_{\mathbf{r}}\cdot\frac{\left(\mathbf{r}-\mathbf{r}'\right)}{\left|\mathbf{r}-\mathbf{r}'\right|^3}\,dv = \begin{cases} 4\pi & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise} \end{cases}$$
The integrand exhibits the same characteristics as the delta function. Therefore,
$\nabla_{\mathbf{r}}\cdot\frac{(\mathbf{r}-\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|^3} = 4\pi\delta^3\left(\mathbf{r}-\mathbf{r}'\right)$. The delta function is defined in (0.52).
P0.16 Invert (0.15) to get both formulas in (0.16). HINT: You can get a second
equation by considering Euler’s equation with a negative angle −φ.
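P0.21 Prove that Fourier Transforms have the property of linear superposition:
$$\mathcal{F}\left\{a\,g(t) + b\,h(t)\right\} = a\,g(\omega) + b\,h(\omega)$$
P0.22 Prove $\mathcal{F}\left\{g(at)\right\} = \frac{1}{|a|}\,g\left(\frac{\omega}{a}\right)$.
P0.23 Prove $\mathcal{F}\left\{g(t-\tau)\right\} = g(\omega)\,e^{i\omega\tau}$.
P0.24 Show that the Fourier transform of $E(t) = E_0 e^{-(t/T)^2}\cos\omega_0 t$ is
$$E(\omega) = \frac{T E_0}{2\sqrt{2}}\left(e^{-\frac{(\omega+\omega_0)^2}{4/T^2}} + e^{-\frac{(\omega-\omega_0)^2}{4/T^2}}\right)$$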
P0.25 Take the inverse Fourier transform of the result in P0.24. Check that it
returns exactly the original function.
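$$\begin{aligned}
&= \sqrt{2\pi}\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(t)\,e^{i\omega t}\,dt\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} h\left(t'\right)e^{i\omega t'}\,dt' \\
&= \sqrt{2\pi}\,g(\omega)\,h(\omega)
\end{aligned}$$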
Electromagnetic Phenomena
Here E and B represent electric and magnetic fields, respectively. The charge
density ρ describes the charge per volume distributed through space.3 The current
density J describes the motion of charge density (in units of ρ times velocity). The
constant ε0 is called the permittivity, and the constant µ0 is called the permeability.
Taken together, these are known as Maxwell’s equations.
After introducing a key revision of Ampere’s law, Maxwell realized that together
these equations comprise a complete self-consistent theory of electromagnetic
phenomena. Moreover, the equations imply the existence of electromagnetic
waves, which travel at the speed of light. Since the speed of light had been
measured before Maxwell’s time, it was immediately apparent (as was already
suspected) that light is a high-frequency manifestation of the same phenomena
that govern the influence of currents and charges upon each other. Previously,
optics had been considered a topic quite separate from electricity and magnetism.
Once the connection was made, it became clear that Maxwell’s equations form
the theoretical foundations of optics, and this is where we begin our study of light.
1 In Maxwell’s original notation, this set of equations was hardly concise, written without the
convenience of modern vector notation or ∇. His formulation wouldn’t fit easily on a T-shirt!
2 See J. D. Jackson, Classical Electrodynamics, 3rd ed., p. 1 (New York: John Wiley, 1999) or the
back cover of D. J. Griffiths, Introduction to Electrodynamics, 3rd ed. (New Jersey: Prentice-Hall,
1999).
3 Later in the book we use ρ for the radius in cylindrical coordinates, not to be confused with
charge density.
vector. We have written the force in terms of an electric field E(r), which is defined
throughout space (regardless of whether a second charge q is actually present).
The permittivity ε0 amounts to a proportionality constant.
Figure 1.1 The geometry of Coulomb’s law for a point charge
The total force from a collection of charges is found by summing expression
(1.5) over all charges q′ₙ associated with their specific locations r′ₙ. If the charges
are distributed continuously throughout space, having density ρ(r′) (units of
charge per volume), the summation for finding the net electric field at r becomes
an integral:
$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\epsilon_0}\int_V \rho\left(\mathbf{r}'\right)\frac{\left(\mathbf{r}-\mathbf{r}'\right)}{\left|\mathbf{r}-\mathbf{r}'\right|^3}\,dv' \tag{1.7}$$
This three-dimensional integral4 gives the net electric field produced by the
charge density ρ distributed throughout the volume V.
Gauss’ law (1.1), the first of Maxwell’s equations, follows directly from (1.7)
with some mathematical manipulation. No new physical phenomenon is introduced
in this process.5
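$$ \nabla_{\mathbf{r}} \cdot \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \equiv 4\pi\delta^3(\mathbf{r}' - \mathbf{r}) \equiv 4\pi\,\delta(x' - x)\,\delta(y' - y)\,\delta(z' - z) \tag{1.9} $$
$$ \nabla \cdot \mathbf{E}(\mathbf{r}) = \frac{\rho(\mathbf{r})}{\epsilon_0} $$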
The (perhaps more familiar) integral form of Gauss’ law can be obtained by
integrating (1.1) over a volume V and applying the divergence theorem (0.11) to
the left-hand side:
$$ \oint_S \mathbf{E}(\mathbf{r}) \cdot \hat{\mathbf{n}} \, da = \frac{1}{\epsilon_0} \int_V \rho(\mathbf{r}) \, dv \tag{1.10} $$
This form of Gauss’ law shows that the total electric field flux extruding through a
closed surface S (i.e. the integral on the left side) is proportional to the net charge
contained within it (i.e. within volume V contained by S).
Figure 1.3 Gauss’ law in integral form relates the flux of the electric field through
a surface to the charge contained inside that surface.
Example 1.1
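Suppose we have an electric field given by E = (αx²y³ x̂ + βz⁴ ŷ) cos ωt. Use Gauss’
law (1.1) to find the charge density ρ(x, y, z, t).
Solution:
$$ \rho = \epsilon_0 \nabla \cdot \mathbf{E} = \epsilon_0 \left( \hat{\mathbf{x}} \frac{\partial}{\partial x} + \hat{\mathbf{y}} \frac{\partial}{\partial y} + \hat{\mathbf{z}} \frac{\partial}{\partial z} \right) \cdot \left( \alpha x^2 y^3 \hat{\mathbf{x}} + \beta z^4 \hat{\mathbf{y}} \right) \cos\omega t = 2\epsilon_0 \alpha x y^3 \cos\omega t $$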
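The result of Example 1.1 can be spot-checked numerically. The following is a minimal sketch, not part of the original text: it estimates ∇ · E with central differences and compares ε₀∇ · E against 2ε₀αxy³ cos ωt. The values chosen for α, β, and ω, the step size h, and all function names are illustrative assumptions.

```python
import math

# Hypothetical sample values; alpha, beta, and omega are free parameters in Example 1.1.
ALPHA, BETA, OMEGA = 2.0, 3.0, 1.5
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m


def e_field(x, y, z, t):
    """E = (alpha x^2 y^3 x_hat + beta z^4 y_hat) cos(omega t), returned as (Ex, Ey, Ez)."""
    c = math.cos(OMEGA * t)
    return (ALPHA * x**2 * y**3 * c, BETA * z**4 * c, 0.0)


def divergence(field, x, y, z, t, h=1e-5):
    """Central-difference estimate of the divergence of a vector field at one point."""
    dfx = (field(x + h, y, z, t)[0] - field(x - h, y, z, t)[0]) / (2 * h)
    dfy = (field(x, y + h, z, t)[1] - field(x, y - h, z, t)[1]) / (2 * h)
    dfz = (field(x, y, z + h, t)[2] - field(x, y, z - h, t)[2]) / (2 * h)
    return dfx + dfy + dfz


def rho_numeric(x, y, z, t):
    """Charge density from Gauss' law (1.1), using the numerical divergence."""
    return EPS0 * divergence(e_field, x, y, z, t)


def rho_analytic(x, y, z, t):
    """Closed-form answer from Example 1.1."""
    return 2 * EPS0 * ALPHA * x * y**3 * math.cos(OMEGA * t)


if __name__ == "__main__":
    point = (0.7, 1.2, -0.4, 0.3)
    print(rho_numeric(*point), rho_analytic(*point))
```

Note that the βz⁴ ŷ term does not contribute, since it has no y dependence.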
the magnetic field to arise from a distribution of moving charges described by a
current density J(r′) throughout space. The current density has units of charge
times velocity per volume (or equivalently, current per cross sectional area). The
magnetic force law analogous to Coulomb’s law is
$$ \mathbf{F} = q\mathbf{v} \times \mathbf{B} \tag{1.11} $$
6 For a derivation of Gauss’ law from Coulomb’s law that does not rely directly on the Dirac delta
function, see J. D. Jackson, Classical Electrodynamics 3rd ed., pp. 27-29 (New York: John Wiley,
1999).
1 to 100, which he did in seconds to the astonishment of his teacher. (Presumably,
Friedrich immediately realized that the numbers form fifty pairs equal to 101.)
Gauss made important advances in number theory and differential geometry. He
developed the law discussed here as one of Maxwell's equations in 1835, but it was
not published until 1867, after Gauss' death. Ironically, Maxwell was already using
Gauss' law by that time. (Wikipedia)
where
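$$ \mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_V \mathbf{J}(\mathbf{r}') \times \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \, dv' \tag{1.12} $$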
The latter equation is known as the Biot-Savart law. The permeability µ0 dictates
the strength of the magnetic field, given the current distribution.
As with Coulomb’s law, we can apply mathematics to the Biot-Savart law
to obtain another of Maxwell’s equations. Nevertheless, the essential physics
is already inherent in the Biot-Savart law.7 Using the result from P0.4, we can
rewrite (1.12) as8
$$ \mathbf{B}(\mathbf{r}) = -\frac{\mu_0}{4\pi} \int_V \mathbf{J}(\mathbf{r}') \times \nabla_{\mathbf{r}} \frac{1}{|\mathbf{r} - \mathbf{r}'|} \, dv' = \frac{\mu_0}{4\pi} \nabla \times \int_V \frac{\mathbf{J}(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, dv' \tag{1.13} $$
Since the divergence of a curl is identically zero (see P0.6), we get straight away
the second of Maxwell’s equations (1.2)
$$ \nabla \cdot \mathbf{B} = 0 $$
which is known as Gauss’ law for magnetic fields. (Two equations down; two to
go.)
The similarity between ∇ · B = 0 and ∇ · E = ρ/ε₀, Gauss’ law for electric fields,
is immediately apparent. In integral form, Gauss’ law for magnetic fields looks the
same as (1.10), only with zero on the right-hand side. If one were to imagine the
existence of magnetic monopoles (i.e. isolated north or south ‘charges’), then the
right-hand side would not be zero. The law implies that the total magnetic flux
extruding through any closed surface balances, with as many field lines pointing
inwards as pointing outwards.
Jean-Baptiste Biot (1774-1862, French) was born in Paris. He attended the École
Polytechnique where mathematician Gaspard Monge recognized his academic
potential. After graduating, Biot joined the military and then took part in an
insurrection on the side of the Royalists. He was captured, and his career might
have met a tragic ending there had Monge not successfully pleaded for his release
from jail. Biot went on to become a professor of physics at the College de France.
Among other contributions, Biot participated in the first hot-air balloon ride with
Gay-Lussac and correctly deduced that meteorites that fell on L'Aigle, France in
1803 came from space. Later Biot collaborated with the younger Felix Savart
(1791-1841) on the theory of magnetism and electrical currents. They formulated
their famous law in 1820. (Wikipedia)
Example 1.2
The field surrounding a magnetic dipole is given by
$$ \mathbf{B} = \beta \left[ 3xz\,\hat{\mathbf{x}} + 3yz\,\hat{\mathbf{y}} + \left( 3z^2 - r^2 \right) \hat{\mathbf{z}} \right] / r^5 $$
where r ≡ √(x² + y² + z²). Show that this field satisfies Gauss’ law for magnetic
fields (1.2).
7 Like Coulomb’s law, the Biot-Savart law is incomplete since it also implies an instantaneous
response of the magnetic field to a reconfiguration of the currents. The generalized version of the
Biot-Savart law, another of Jefimenko’s equations, incorporates the fact that electromagnetic news
travels at the speed of light. Ironically, Gauss’ law for magnetic fields and Maxwell’s version of
Ampere’s law, derived from the Biot-Savart law, hold perfectly whether the currents are steady or
vary in time. The Jefimenko equations, analogs of Coulomb and Biot-Savart, also embody Faraday’s
law, the only of Maxwell’s equations that cannot be derived from the usual forms of Coulomb’s law
and the Biot-Savart law. See D. J. Griffiths, Introduction to Electrodynamics, 3rd ed., Sect. 10.2.2
(New Jersey: Prentice-Hall, 1999).
8 Note that ∇ᵣ ignores the variable of integration r′.
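Solution:
$$ \nabla \cdot \mathbf{B} = \beta \left[ 3 \frac{\partial}{\partial x} \left( \frac{xz}{r^5} \right) + 3 \frac{\partial}{\partial y} \left( \frac{yz}{r^5} \right) + \frac{\partial}{\partial z} \left( \frac{3z^2}{r^5} - \frac{1}{r^3} \right) \right] $$
$$ = \beta \left[ 3 \left( \frac{z}{r^5} - \frac{5xz}{r^6} \frac{\partial r}{\partial x} \right) + 3 \left( \frac{z}{r^5} - \frac{5yz}{r^6} \frac{\partial r}{\partial y} \right) + \left( \frac{6z}{r^5} - \frac{15z^2}{r^6} \frac{\partial r}{\partial z} \right) + \frac{3}{r^4} \frac{\partial r}{\partial z} \right] $$
$$ = \beta \left[ \frac{12z}{r^5} - \frac{15z}{r^6} \left( x \frac{\partial r}{\partial x} + y \frac{\partial r}{\partial y} + z \frac{\partial r}{\partial z} \right) + \frac{3}{r^4} \frac{\partial r}{\partial z} \right] $$
The necessary derivatives are ∂r/∂x = x/√(x² + y² + z²) = x/r, ∂r/∂y = y/r, and
∂r/∂z = z/r, which lead to
$$ \nabla \cdot \mathbf{B} = \beta \left[ \frac{12z}{r^5} - \frac{15z}{r^5} + \frac{3z}{r^5} \right] = 0 $$
a circuit loop (see Fig. 1.4) induces a voltage around the loop according to
$$ \oint_C \mathbf{E} \cdot d\boldsymbol{\ell} = -\frac{\partial}{\partial t} \int_S \mathbf{B} \cdot \hat{\mathbf{n}} \, da \tag{1.14} $$
The right side describes a change in the magnetic flux through a surface and the
left side describes the voltage around the loop containing the surface.
We apply Stokes’ theorem (0.12) to the left-hand side of Faraday’s law and
obtain
$$ \int_S \left( \nabla \times \mathbf{E} \right) \cdot \hat{\mathbf{n}} \, da = -\frac{\partial}{\partial t} \int_S \mathbf{B} \cdot \hat{\mathbf{n}} \, da \quad \text{or} \quad \int_S \left( \nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} \right) \cdot \hat{\mathbf{n}} \, da = 0 \tag{1.15} $$
$$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} $$
which is the differential form of Faraday’s law (1.3) (three of Maxwell’s equations
down; one to go).
Michael Faraday (1791-1867, English) was one of the greatest experimental
physicists in history. Born on the outskirts of London, his family was not well
off, his father being a blacksmith. The young Michael Faraday only had access
ity. Given his background, Faraday's entry into the scientific community was
very gradual, from servant to assistant and eventually to director of the laboratory
at the Royal Institution. Faraday is perhaps best known for his work that
established the law of induction and for the discovery that magnetic fields can
interact with light, known as the Faraday effect. He also made many advances to
chemistry during his career including figuring out how to liquify several gases.
Faraday was a deeply religious man, serving as a Deacon in his church. (Wikipedia)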
Example 1.3
For the electric field given in Example 1.1, E = (αx²y³ x̂ + βz⁴ ŷ) cos ωt, use Faraday’s
law (1.3) to find B(x, y, z, t).
Figure 1.4 Faraday’s law.
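Solution:
$$ \frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E} = -\cos\omega t \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ \alpha x^2 y^3 & \beta z^4 & 0 \end{vmatrix} $$
$$ = -\cos\omega t \left[ \hat{\mathbf{x}} \frac{\partial}{\partial y}(0) - \hat{\mathbf{x}} \frac{\partial}{\partial z}\left( \beta z^4 \right) - \hat{\mathbf{y}} \frac{\partial}{\partial x}(0) + \hat{\mathbf{y}} \frac{\partial}{\partial z}\left( \alpha x^2 y^3 \right) + \hat{\mathbf{z}} \frac{\partial}{\partial x}\left( \beta z^4 \right) - \hat{\mathbf{z}} \frac{\partial}{\partial y}\left( \alpha x^2 y^3 \right) \right] $$
$$ = \left( 4\beta z^3 \hat{\mathbf{x}} + 3\alpha x^2 y^2 \hat{\mathbf{z}} \right) \cos\omega t $$
Integrating in time gives
$$ \mathbf{B} = \left( 4\beta z^3 \hat{\mathbf{x}} + 3\alpha x^2 y^2 \hat{\mathbf{z}} \right) \frac{\sin\omega t}{\omega} $$
plus possibly a constant field.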
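As a cross-check on Example 1.3, the sketch below evaluates ∇ × E + ∂B/∂t by central differences at a sample point and confirms that every component is numerically zero. It is an illustration only: the parameter values, the step size, and the helper names (`partial`, `curl`, `faraday_residual`) are assumptions and do not appear in the text.

```python
import math

ALPHA, BETA, OMEGA = 2.0, 3.0, 1.5  # hypothetical sample values


def e_field(x, y, z, t):
    """Electric field of Example 1.1 as (Ex, Ey, Ez)."""
    c = math.cos(OMEGA * t)
    return (ALPHA * x**2 * y**3 * c, BETA * z**4 * c, 0.0)


def b_field(x, y, z, t):
    """Magnetic field found in Example 1.3 (constant field taken to be zero)."""
    s = math.sin(OMEGA * t) / OMEGA
    return (4 * BETA * z**3 * s, 0.0, 3 * ALPHA * x**2 * y**2 * s)


def partial(field, comp, axis, p, h=1e-5):
    """Central difference of component `comp` along coordinate `axis` (0..3 = x, y, z, t)."""
    hi, lo = list(p), list(p)
    hi[axis] += h
    lo[axis] -= h
    return (field(*hi)[comp] - field(*lo)[comp]) / (2 * h)


def curl(field, p):
    """Numerical curl of a vector field at the space-time point p = (x, y, z, t)."""
    return (
        partial(field, 2, 1, p) - partial(field, 1, 2, p),
        partial(field, 0, 2, p) - partial(field, 2, 0, p),
        partial(field, 1, 0, p) - partial(field, 0, 1, p),
    )


def faraday_residual(p):
    """Components of curl E + dB/dt, which Faraday's law (1.3) requires to vanish."""
    c = curl(e_field, p)
    return tuple(c[i] + partial(b_field, i, 3, p) for i in range(3))


if __name__ == "__main__":
    print(faraday_residual((0.7, 1.2, -0.4, 0.3)))
```

Adding any time-independent field to `b_field` leaves the residual unchanged, which is the "constant field" freedom noted in the solution.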
$$ \nabla \times \mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi} \int_V \left( \mathbf{J}(\mathbf{r}') \left[ \nabla_{\mathbf{r}} \cdot \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \right] - \left[ \mathbf{J}(\mathbf{r}') \cdot \nabla_{\mathbf{r}} \right] \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \right) dv' \tag{1.17} $$
According to (1.9), the first term in the integral is 4πJ(r′)δ³(r′ − r), which is easily
integrated. To make progress on the second term, we observe that the gradient can
be changed to operate on the primed variables without affecting the final result
(i.e. ∇ᵣ → −∇ᵣ′). In addition, we take advantage of a vector integral theorem (see
P0.12) to arrive at
$$ \nabla \times \mathbf{B}(\mathbf{r}) = \mu_0 \mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi} \int_V \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \left[ \nabla_{\mathbf{r}'} \cdot \mathbf{J}(\mathbf{r}') \right] dv' + \frac{\mu_0}{4\pi} \oint_S \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \left[ \mathbf{J}(\mathbf{r}') \cdot \hat{\mathbf{n}} \right] da' \tag{1.18} $$
executed his father. In 1799, Ampère married Julie Carron, who died of illness
a few years later. These tragedies weighed heavy on Ampère throughout his life,
especially because he was away from his wife during much of their short life
together, while he worked as a professor of physics and chemistry in Bourg. After
her death, Ampère was appointed professor of mathematics at the University of
Lyon and then in 1809 at the École Polytechnique in Paris. After hearing that a
current-carrying wire could attract a compass needle in 1820, Ampère quickly
developed the theory of electromagnetism. (Wikipedia)
The last term in (1.18) vanishes if we assume that the current density J is com-
pletely contained within the volume V so that it is zero at the surface S. Thus, the
expression for the curl of B (r) reduces to
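$$ \nabla \times \mathbf{B}(\mathbf{r}) = \mu_0 \mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi} \int_V \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \left[ \nabla_{\mathbf{r}'} \cdot \mathbf{J}(\mathbf{r}') \right] dv' \tag{1.19} $$
$$ \nabla \cdot \mathbf{J} \cong 0 \quad \text{(steady-state approximation)} \tag{1.20} $$
$$ \nabla \times \mathbf{B} = \mu_0 \mathbf{J} \tag{1.21} $$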
where n̂ is the outward normal to the surface. The units on this equation are those
of current, or charge per time, leaving the volume.
Since we have considered a closed surface S, the net current leaving the enclosed
volume V must be the same as the rate at which charge within the volume vanishes:
$$ I = -\frac{\partial}{\partial t} \int_V \rho \, dv \tag{1.25} $$
Upon equating these two expressions for current, as well as applying the diver-
gence theorem (0.11) to the former, we get
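$$ \int_V \nabla \cdot \mathbf{J} \, dv = -\int_V \frac{\partial \rho}{\partial t} \, dv \quad \text{or} \quad \int_V \left( \nabla \cdot \mathbf{J} + \frac{\partial \rho}{\partial t} \right) dv = 0 \tag{1.26} $$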
Example 1.4
(a) Use Gauss’s law to find the electric field in a gap that interrupts a current-
carrying wire, as shown in Fig. 1.6.
(b) Find the strength of the magnetic field on contour C using Ampere’s law applied
to surface S₁.
(c) Show that the displacement current in the gap leads to the identical magnetic
field when using surface S₂.
Figure 1.6 Charging capacitor.
Solution: (a) We’ll assume that the cross-sectional area of the wire A is much wider
than the gap separation. Then the electric field in the gap will be uniform, and the
integral on the left-hand side of (1.10) reduces to EA since there is essentially no
field other than in the gap. If the accumulated charge on the ‘plate’ is Q, then the
right-hand side of (1.10) integrates to Q/ε₀, and the electric field turns out to be
E = Q/(ε₀A).
(b) Let the contour C be a circle at radius r. The magnetic field points around the
circumference with constant strength. The left-hand side of (1.22) becomes 2πrB
while the right-hand side is
$$ \mu_0 \int_S \mathbf{J} \cdot \hat{\mathbf{n}} \, da = \mu_0 I = \mu_0 \frac{\partial Q}{\partial t} $$
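Example 1.5
For the electric field E = (αx²y³ x̂ + βz⁴ ŷ) cos ωt (see Example 1.1) and the
associated magnetic field B = (4βz³ x̂ + 3αx²y² ẑ) sin ωt/ω (see Example 1.3), find the
current density J.
Solution:
$$ \mathbf{J} = \nabla \times \frac{\mathbf{B}}{\mu_0} - \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} = \frac{\sin\omega t}{\mu_0 \omega} \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ 4\beta z^3 & 0 & 3\alpha x^2 y^2 \end{vmatrix} + \epsilon_0 \omega \left( \alpha x^2 y^3 \hat{\mathbf{x}} + \beta z^4 \hat{\mathbf{y}} \right) \sin\omega t $$
$$ = \frac{\sin\omega t}{\mu_0 \omega} \left[ 6\alpha x^2 y \, \hat{\mathbf{x}} - 6\alpha x y^2 \hat{\mathbf{y}} + 12\beta z^2 \hat{\mathbf{y}} \right] + \epsilon_0 \omega \left( \alpha x^2 y^3 \hat{\mathbf{x}} + \beta z^4 \hat{\mathbf{y}} \right) \sin\omega t $$
$$ = \left[ \left( \epsilon_0 \omega \alpha x^2 y^3 + \frac{6\alpha x^2 y}{\mu_0 \omega} \right) \hat{\mathbf{x}} + \left( \epsilon_0 \omega \beta z^4 + \frac{12\beta z^2}{\mu_0 \omega} - \frac{6\alpha x y^2}{\mu_0 \omega} \right) \hat{\mathbf{y}} \right] \sin\omega t $$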
$$ \rho_p = -\nabla \cdot \mathbf{P} \tag{1.31} $$
The left-hand side of (1.32) is a surface integral, which after integrating gives
units of charge. Physically, it is the sum of the charges touching the inside of
surface S (multiplied by a minus since by convention dipole vectors point from
the negatively charged end of a molecule to the positively charged end). When
∇ · P is zero, there are equal numbers of positive and negative charges touching
S from within, as depicted in Fig. 1.7. When ∇ · P is not zero, the positive and
negative charges touching S are not balanced, as depicted in Fig. 1.8. Essentially,
excess charge ends up within the volume because the non-uniform alignment of
dipoles causes them to be cut preferentially at the surface.11
Since we will ignore free charges (for optical media), we write the charge
density according to (1.31) as
ρ = −∇ · P (1.33)
In summary, in electrically neutral non-magnetic media, Maxwell’s equations
(in terms of the medium polarization P) are12
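$$ \nabla \cdot \mathbf{E} = -\frac{\nabla \cdot \mathbf{P}}{\epsilon_0} \quad \text{(Gauss’s law)} \tag{1.34} $$
$$ \nabla \cdot \mathbf{B} = 0 \quad \text{(Gauss’s law for magnetism)} \tag{1.35} $$
$$ \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \quad \text{(Faraday’s law)} \tag{1.36} $$
$$ \nabla \times \frac{\mathbf{B}}{\mu_0} = \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} + \frac{\partial \mathbf{P}}{\partial t} + \mathbf{J}_\text{free} \quad \text{(Ampere’s law; fixed by Maxwell)} \tag{1.37} $$
Figure 1.8 A polarized medium with ∇ · P ≠ 0.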
cutting any dipoles. However, the function P (r) is continuous, while the figures depict crudely just
a few dipoles. In a continuous material you can’t draw a surface that avoids cutting dipoles.
12 It is not uncommon to see the macroscopic Maxwell equations written in terms of two auxiliary
fields: H and D. The field H is useful in magnetic materials. In these materials, the combination
B/μ₀ in Ampere’s law is replaced by H ≡ B/μ₀ − M, where Jₘ = ∇ × M is the current associated
with the material’s magnetization. Since we only consider nonmagnetic materials (M = 0), there
is little point in using H. The field D, called the displacement, is defined as D ≡ ε₀E + P. This
combination of E and P occurs in Coulomb’s law and Ampere’s law. For the purposes of this book,
it is conceptually more clear to retain the polarization P as a separate field in these two equations.
At first glance, Maxwell’s equations might not immediately suggest (to the
inexperienced eye) that waves are solutions. However, we can manipulate the
equations (first order differential equations that couple E to B) into the familiar
wave equation (decoupled second order differential equations for either E or B).
You should become familiar with this derivation. In what follows, we will derive
the wave equation for E. The derivation of the wave equation for B is very similar
(see problem P1.6).
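$$ \nabla \times \left( \nabla \times \mathbf{E} \right) + \frac{\partial}{\partial t} \left( \nabla \times \mathbf{B} \right) = 0 \tag{1.38} $$
We may eliminate ∇ × B by substitution from (1.4), which gives
$$ \nabla \times \left( \nabla \times \mathbf{E} \right) + \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = -\mu_0 \frac{\partial \mathbf{J}}{\partial t} \tag{1.39} $$
$$ \nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial \mathbf{J}}{\partial t} + \frac{\nabla \rho}{\epsilon_0} \tag{1.40} $$
$$ \nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial \mathbf{J}_\text{free}}{\partial t} + \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} - \frac{1}{\epsilon_0} \nabla \left( \nabla \cdot \mathbf{P} \right) \tag{1.41} $$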
The left-hand side of (1.41) is the familiar wave equation. However, the right-
hand side contains a number of source terms, which arise when various currents
and/or polarizations are present. The first term on the right-hand side of (1.41)
describes currents of free charges, which are important for determining the reflec-
tion of light from a metallic surface or for determining the propagation of light in
a plasma. The second term on the right-hand side describes dipole oscillations,
which behave similarly to currents. The final term on the right-hand side of (1.41)
is important in anisotropic media such as crystals. In this case, the polarization
P responds to the electric field along a direction not necessarily parallel to E,
due to the influence of the crystal lattice (addressed in chapter 5). In summary,
when light propagates in a material, at least one of the terms on the right-hand
side of (1.41) will be nonzero. As an example, in glass, J_free = 0 and ∇ · P = 0, but
∂²P/∂t² ≠ 0 since the medium polarization responds to the light field, giving rise
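Example 1.6
ρ = 2ε₀αxy³ cos ωt
Solution: We have
$$ \nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \left[ \alpha \left( 2y^3 + 6x^2 y \right) \hat{\mathbf{x}} + 12\beta z^2 \hat{\mathbf{y}} \right] \cos\omega t + \mu_0 \epsilon_0 \omega^2 \left( \alpha x^2 y^3 \hat{\mathbf{x}} + \beta z^4 \hat{\mathbf{y}} \right) \cos\omega t $$
Similarly,
$$ \mu_0 \frac{\partial \mathbf{J}}{\partial t} + \frac{\nabla \rho}{\epsilon_0} = \left[ \left( \mu_0 \epsilon_0 \omega^2 \alpha x^2 y^3 + 6\alpha x^2 y \right) \hat{\mathbf{x}} + \left( \mu_0 \epsilon_0 \omega^2 \beta z^4 + 12\beta z^2 - 6\alpha x y^2 \right) \hat{\mathbf{y}} \right] \cos\omega t + \left[ 2\alpha y^3 \hat{\mathbf{x}} + 6\alpha x y^2 \hat{\mathbf{y}} \right] \cos\omega t $$
$$ = \left[ \alpha \left( \mu_0 \epsilon_0 \omega^2 x^2 y^3 + 6x^2 y + 2y^3 \right) \hat{\mathbf{x}} + \beta \left( \mu_0 \epsilon_0 \omega^2 z^4 + 12z^2 \right) \hat{\mathbf{y}} \right] \cos\omega t $$
The two expressions are equivalent, and the wave equation is satisfied.13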
The magnetic field B satisfies a similar wave equation, decoupled from E (see
P1.6). However, the two waves are not independent. The fields for E and B must
be chosen to be consistent with each other through Maxwell’s equations. After
solving the wave equation (1.41) for E, one can obtain the consistent B from E via
Faraday’s law (1.36).
In vacuum all of the terms on the right-hand side in (1.41) are zero, in which
case the wave equation reduces to
$$ \nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 \quad \text{(vacuum)} \tag{1.42} $$
Solutions to this equation can take on every imaginable functional shape (speci-
fied at a given instant—the evolution thereafter being controlled by (1.42)). More-
over, since the differential equation is linear, any number of solutions can be
added together to create other valid solutions. Consider the subclass of solutions
13 The expressions in Example 1.6 hardly look like waves. The (quite unlikely) current and charge
distributions, which fill all space, would have to be artificially induced rather than arise naturally in
response to a field disturbance on a medium.
In this case, E depends on the argument û · r − ct, where û is a unit vector specifying
the direction of propagation. The shape is preserved since features occurring at a
given position recur ‘downstream’ at a distance ct after a time t. By checking this
solution in (1.42), one confirms that the speed of propagation is c (see P1.8). As
mentioned previously, one may add together any combination of solutions (even
with differing directions of propagation) to form other valid solutions.
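The claim that any shape depending only on û · r − ct satisfies (1.42) can be illustrated numerically. The sketch below is a hedged example, not the derivation requested in P1.8: it picks a Gaussian pulse as an arbitrary shape, uses normalized units with c = 1, and evaluates ∇²E − (1/c²)∂²E/∂t² with second-order central differences for one scalar component. The pulse shape, the direction û, and all names are assumptions made for illustration.

```python
import math

C = 1.0  # speed of light in normalized units (an assumption that keeps step sizes simple)
U = (1 / 3, 2 / 3, 2 / 3)  # unit vector: (1 + 4 + 4) / 9 = 1


def shape(s):
    """Arbitrary pulse shape; a Gaussian is used purely as an illustration."""
    return math.exp(-s * s)


def field(x, y, z, t):
    """One scalar component of E(u . r - c t)."""
    return shape(U[0] * x + U[1] * y + U[2] * z - C * t)


def second_derivative(f, p, axis, h=1e-3):
    """Central second difference of f along coordinate `axis` (0..3 = x, y, z, t)."""
    hi, lo = list(p), list(p)
    hi[axis] += h
    lo[axis] -= h
    return (f(*hi) - 2 * f(*p) + f(*lo)) / (h * h)


def wave_residual(p):
    """Left-hand side of (1.42) with mu0 * eps0 written as 1 / c^2."""
    laplacian = sum(second_derivative(field, p, axis) for axis in range(3))
    return laplacian - second_derivative(field, p, 3) / C**2


if __name__ == "__main__":
    print(wave_residual((0.3, -0.2, 0.5, 0.1)))
```

The residual is limited only by the finite-difference truncation error; replacing `shape` with another smooth function should leave it similarly small.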
Exercises
P1.1 Consider an infinitely long hollow cylinder with inner radius a and
outer radius b as shown in Fig. 1.9. Assume that the cylinder has a
charge density ρ = k/s² for a < s < b and no charge elsewhere, where s
is the radial distance from the axis of the cylinder. Use Gauss’s Law in
integral form to find the electric field produced by this charge for each
of the three regions: s < a, a < s < b, and s > b.
HINT: For each region first draw an appropriate ‘Gaussian surface’ and
integrate the charge density over the volume to figure out the enclosed
charge. Then use Gauss’s law in integral form and the symmetry of the
problem to solve for the electric field.
Figure 1.9 A charged cylinder with charge located between a and b.
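$$ \mathbf{B}(\mathbf{r}, t) = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} \cos\left( \mathbf{k} \cdot \mathbf{r} - \omega t + \phi \right) $$
is consistent with (1.3).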
P1.3 A conducting cylinder with the same geometry as P1.1 carries a current
density J = k/s ẑ along the axis of the cylinder for a < s < b, where s is
the radial distance from the axis of the cylinder. Using Ampere’s Law in
integral form, find the magnetic field due to this current. Find the field
for each of the three regions: s < a, a < s < b, and s > b.
HINT: For each region first draw an appropriate ‘Amperian loop’ and
integrate the current density over the surface to figure out how much
current passes through the loop. Then use Ampere’s law in integral
form and the symmetry of the problem to solve for the magnetic field.
P1.5 Check that the E and B fields in P1.2 satisfy the rest of Maxwell’s
equations (1.1), (1.2), and (1.4). What are the implications for J and ρ?
P1.6 Derive the wave equation for the magnetic field B in vacuum (i.e. J = 0
and ρ = 0).
P1.7 Show that the magnetic field in P1.2 is consistent with the wave equa-
tion derived in P1.6.
P1.8 Verify that E(û · r − ct) satisfies the vacuum wave equation (1.42), where
E has an arbitrary functional form.
(e) Use (1.34) to show that E₀ and û must be perpendicular to each
other in vacuum.
L1.10 Measure the speed of light using a rotating mirror. Provide an estimate
of the experimental uncertainty in your answer (not the percentage
error from the known value). (video)
Figure 1.10 shows a simplified geometry for the optical path for light
in this experiment. Laser light from A reflects from a rotating mirror
at B towards C. The light returns to B, where the mirror has rotated,
sending the light to point D. Notice that a mirror rotation of θ deflects
the beam by 2θ.
Figure 1.10 Geometry for lab 1.10.
[Figure: laser, collimation telescope, rotating mirror, long corridor, and retro-reflecting mirror. The front of the laser can serve as a screen for returning light.]
P1.11 Ole Roemer made the first successful measurement of the speed of light
in 1676 by observing the orbital period of Io, a moon of Jupiter with a
period of 42.5 hours. When Earth is moving toward Jupiter, the period
is measured to be shorter than 42.5 hours because light indicating the
end of the moon’s orbit travels less distance than light indicating the
beginning. When Earth is moving away from Jupiter, the situation is
reversed, and the period is measured to be longer than 42.5 hours.
(a) If you were to measure the time for 40 observed orbits of Io when
Earth is moving directly toward Jupiter and then several months later
measure the time for 40 observed orbits when Earth is moving directly
away from Jupiter, what would you expect the difference between these
two measurements to be? Take the Earth’s orbital radius to be 1.5 × 10¹¹ m.
To simplify the geometry, just assume that Earth moves directly toward
or away from Jupiter over the entire 40 orbits (see Fig. 1.12).
(b) Roemer did the experiment described in part (a), and experimentally
measured a 22 minute difference. What speed of light would one
deduce from that value?
Figure 1.12 Geometry for P1.11
Ole Roemer (1644-1710, Danish) was a man of many interests. In addition to
measuring the speed of light, he created a temperature scale which with slight
modification became the Fahrenheit scale, introduced a system of standard
weights and measures, and was heavily involved in civic affairs (city planning,
etc.). Scientists initially became interested in Io's orbit because its eclipse
(when it went behind Jupiter) was an event that could be seen from many
places on earth. By comparing accurate measurements of the local time when Io
was eclipsed by Jupiter at two remote places on earth, scientists in the 1600s
were able to determine the longitude difference between the two places.
P1.12 In an isotropic medium (i.e. ∇ · P = 0), the polarization can often be
written as a function of the electric field: P = ε₀χ(E)E, where χ(E) =
χ₁ + χ₂E + χ₃E² ···. The higher order coefficients in the expansion (i.e.
χ₂, χ₃, ...) are typically small, so only the first term is important at low
intensities. The field of nonlinear optics deals with intense light-matter
interactions, where the higher order terms of the expansion become
important. This can lead to phenomena such as harmonic generation.
Starting with Maxwell’s equations, derive the wave equation for nonlinear
optics in an isotropic medium:
$$ \nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \left( 1 + \chi_1 \right) \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \epsilon_0 \frac{\partial^2 \left[ \left( \chi_2 E + \chi_3 E^2 + \cdots \right) \mathbf{E} \right]}{\partial t^2} + \mu_0 \frac{\partial \mathbf{J}}{\partial t} $$
We retain the possibility of current here since, for example, in a gas
some of the molecules might ionize in the presence of a strong field,
giving rise to currents.
Chapter 2 Plane Waves and Refractive Index
We are interested in solutions to (2.1) that have the functional form (see P1.9)
$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \cos\left( \mathbf{k} \cdot \mathbf{r} - \omega t + \phi \right) \tag{2.2} $$
Here φ represents an arbitrary (constant) phase term. The vector k, called the
wave vector, may be written as
$$ \mathbf{k} \equiv k \hat{\mathbf{u}} = \frac{2\pi}{\lambda_\text{vac}} \hat{\mathbf{u}} \quad \text{(vacuum)} \tag{2.3} $$
where k has units of inverse length, û is a unit vector defining the direction of
propagation, and λvac is the length by which r must vary (in the direction of û) to
cause the cosine to go through a complete cycle. This distance is known as the
(vacuum) wavelength. The frequency of oscillation is related to the wavelength via
$$ \omega = \frac{2\pi c}{\lambda_\text{vac}} \quad \text{(vacuum)} \tag{2.4} $$
The frequency ω has units of radians per second. Frequency is also often
expressed as ν ≡ ω/2π in units of inverse seconds or Hz. Notice that k and ω cannot
be chosen independently; the wave equation requires them to be related through
the dispersion relation
$$ k = \frac{\omega}{c} \quad \text{(vacuum)} \tag{2.5} $$
Typical values for λvac are given in Fig. 2.1. Sometimes the spatial period of the
wave is expressed as 1/λvac, in units of cm⁻¹, called the wave number.
Figure 2.1 The electromagnetic spectrum (frequency in Hz): AM radio, FM radio,
microwave, radar, infrared, visible, ultraviolet, X-rays, gamma rays.
A magnetic wave accompanies any electric wave, and it obeys a similar wave
equation (see P1.6). The magnetic wave corresponding to (2.2) is
$$ \mathbf{B}(\mathbf{r}, t) = \mathbf{B}_0 \cos\left( \mathbf{k} \cdot \mathbf{r} - \omega t + \phi \right), \tag{2.6} $$
In order to satisfy Faraday’s law (1.3), the arguments of the cosine in (2.2) and
(2.6) must be identical. Therefore, in vacuum the electric and magnetic fields
travel in phase. In addition, Faraday’s law requires (see P1.2)
$$ \mathbf{B}_0 = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} \tag{2.7} $$
The above cross product means that B₀ is perpendicular to both E₀ and k. Meanwhile,
Gauss’ law ∇ · E = 0 forces k to be perpendicular to E₀. It follows that the
magnitudes of the fields are related through B₀ = kE₀/ω or B₀ = E₀/c, in view of
(2.5).
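To make relations (2.3), (2.4), and (2.7) concrete, the following sketch computes k, ω, and B₀ for a vacuum plane wave from its wavelength, propagation direction, and electric field amplitude. The function name `plane_wave` and the sample inputs (a 632.8 nm wavelength and a 100 V/m field along x̂) are illustrative assumptions, not values from the text.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plane_wave(lambda_vac, u_hat, e0):
    """Return (k_vec, omega, b0) for a vacuum plane wave.

    lambda_vac: vacuum wavelength in m; u_hat: unit propagation vector;
    e0: electric field amplitude in V/m (should be perpendicular to u_hat).
    """
    k_mag = 2 * math.pi / lambda_vac                       # (2.3)
    omega = 2 * math.pi * C / lambda_vac                   # (2.4)
    k_vec = tuple(k_mag * u for u in u_hat)
    b0 = tuple(comp / omega for comp in cross(k_vec, e0))  # (2.7)
    return k_vec, omega, b0


if __name__ == "__main__":
    k_vec, omega, b0 = plane_wave(632.8e-9, (0.0, 0.0, 1.0), (100.0, 0.0, 0.0))
    print("k =", k_vec, "rad/m")
    print("omega =", omega, "rad/s")
    print("B0 =", b0, "T")
```

For propagation along ẑ with E₀ along x̂, the magnetic amplitude comes out along ŷ with magnitude E₀/c, in line with the statement that B₀ = E₀/c.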
The influence of the magnetic field only becomes important (in comparison
to the electric field) for charged particles moving near the speed of light. This
typically takes place only for extremely intense lasers (> 10¹⁸ W/cm², see P2.12)
where the electric field is sufficiently strong to cause electrons to oscillate with
velocities near the speed of light. We will be interested in optics problems that take
place at far less intensity where the effects of the magnetic field can typically be
safely ignored. Throughout the remainder of this book, we will focus our attention
mainly on the electric field with the understanding that we can at any time deduce
the (less important) magnetic field from the electric field via Faraday’s law.
Figure 2.2 depicts the electric field (2.2) and the associated magnetic field
(2.6) like transverse waves on a string. However, they are actually large planar
sheets of uniform field strengths (difficult to draw) that move in the direction of k.
The name plane wave is given since a constant argument in (2.2) at any moment
describes a plane, which is perpendicular to k. A plane wave fills all space and
may be thought of as a series of infinite sheets, each with a different uniform field
strength, moving in the k direction.
At this point, we rewrite our plane wave solution using complex number notation.
Although this change in notation will not make the task at hand any easier
(and may even appear to complicate things), we introduce it here in preparation
for later sections, where it will save considerable labor. (For a review of complex
notation, see section 0.2.)
Figure 2.2 Depiction of electric and magnetic fields associated with a plane wave.
Using complex notation we rewrite (2.2) as
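$$ \mathbf{E}(\mathbf{r}, t) = \text{Re}\left\{ \tilde{\mathbf{E}}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \right\} \tag{2.8} $$
$$ \tilde{\mathbf{E}}_0 \equiv \mathbf{E}_0 e^{i\phi} \tag{2.9} $$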
1 We have assumed that each vector component of the field propagates with the same phase. To
Example 2.1
Verify that the complex plane wave (2.10) is a solution to the wave equation (2.1).
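$$ \nabla^2 \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} = \mathbf{E}_0 \left[ \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \right] e^{i(k_x x + k_y y + k_z z - \omega t)} = -\mathbf{E}_0 \left( k_x^2 + k_y^2 + k_z^2 \right) e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} = -k^2 \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.11} $$
$$ \frac{1}{c^2} \frac{\partial^2}{\partial t^2} \left( \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \right) = -\frac{\omega^2}{c^2} \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.12} $$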
Upon insertion into (2.1) we obtain the vacuum dispersion relation (2.5), which
specifies the connection between the wavenumber k and the frequency ω, empha-
sizing that k and ω cannot be chosen independently.
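$$ \nabla^2 \mathbf{E} - \epsilon_0 \mu_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} \tag{2.13} $$
Since we are considering sinusoidal waves, we consider solutions of the form
$$ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}, \qquad \mathbf{P} = \mathbf{P}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.14} $$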
other. This phase discrepancy is most pronounced for materials that absorb
energy at the plane wave frequency.
Substitution of the trial solutions (2.14) into (2.13) yields
where n and κ are respectively the real and imaginary parts of the index. (Note
that κ is not k.) According to (2.17), the magnitude of the wave vector is also
complex according to
$$ k = \frac{\mathcal{N} \omega}{c} = \frac{(n + i\kappa)\,\omega}{c} \tag{2.19} $$
The use of complex index of refraction only makes sense in the context of complex
representation of plane waves.
The complex index N takes into account absorption as well as the usual
oscillatory behavior of the wave. We see this by explicitly placing (2.19) into
(2.14):
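$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{-\text{Im}\{\mathbf{k}\} \cdot \mathbf{r}} e^{i(\text{Re}\{\mathbf{k}\} \cdot \mathbf{r} - \omega t)} = \mathbf{E}_0 e^{-\frac{\kappa\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r}} e^{i\left( \frac{n\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r} - \omega t \right)} \tag{2.20} $$
As before, û is a real unit vector specifying the direction of k. Again, when looking
at (2.20), by special agreement in advance, we should just think of the real part,
namely6
$$ \mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{-\frac{\kappa\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r}} \cos\left( \frac{n\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r} - \omega t + \phi \right) \tag{2.21} $$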
5 Electrodynamics books often use the electric displacement D ≡ ε₀E + P = εE. See M. Born and
E. Wolf, Principles of Optics, 7th ed., p. 3 (Cambridge University Press, 1999). The permittivity
ε encapsulates the constitutive relation that connects P with E. In a linear medium we have
ε ≡ ε₀(1 + χ), so that the index of refraction is given by N = √(ε/ε₀).
6 For the sake of simplicity in writing (2.21) we assume linearly polarized light. That is, all vector
components of E0 have the same complex phase φ. We will consider other possibilities, such as
circularly polarized light, in chapter 6.
where an overall phase φ was formerly held in the complex vector Ẽ₀. (The tilde
had been suppressed.) Figure 2.3 shows a graph of (2.21). The imaginary part of
the index κ causes the wave to decay as it travels. The real part of the index n is
associated with the oscillations of the wave. By inspection of the cosine argument
in (2.21), we see that the speed of the (diminishing) sinusoidal wave fronts is
c/n(ω). It is apparent that n(ω) is the ratio of the speed of the light in vacuum to
the speed of the wave in the material.
In a dielectric, the vacuum relations (2.3) and (2.4) are modified to read
$$ \text{Re}\{\mathbf{k}\} \equiv \frac{2\pi}{\lambda} \hat{\mathbf{u}}, \tag{2.23} $$
where
$$ \lambda \equiv \lambda_\text{vac} / n. \tag{2.24} $$
While the frequency ω is the same, whether in a material or in vacuum, the
wavelength λ varies with the real part of the index n.
Figure 2.3 Electric field of a decaying plane wave. For convenience in plotting, the
direction of propagation is chosen to be in the z direction (i.e. û = ẑ).
Example 2.2
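When n = 1.5, κ = 0.1, and ν = 5 × 10¹⁴ Hz, find (a) the wavelength inside the
material, and (b) the propagation distance over which the amplitude of the wave
diminishes by the factor e⁻¹ (called the skin depth).
Solution: (a)
$$ \lambda = \frac{\lambda_\text{vac}}{n} = \frac{2\pi c}{n\omega} = \frac{c}{n\nu} = \frac{3 \times 10^8 \text{ m/s}}{1.5 \left( 5 \times 10^{14} \text{ Hz} \right)} = 400 \text{ nm} $$
(b)
$$ e^{-\frac{\kappa\omega}{c} z} = e^{-1} \quad \Rightarrow \quad z = \frac{c}{\kappa\omega} = \frac{c}{2\pi\kappa\nu} = \frac{3 \times 10^8 \text{ m/s}}{2\pi (0.1) \left( 5 \times 10^{14} \text{ Hz} \right)} = 950 \text{ nm} $$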
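The arithmetic in Example 2.2 is easy to reproduce. The sketch below uses the rounded value c = 3 × 10⁸ m/s that appears in the example; the function names are hypothetical. Evaluated this way, part (b) comes out at roughly 955 nm, which the example quotes as 950 nm.

```python
import math

C = 3.0e8  # m/s, the rounded value used in Example 2.2


def wavelength_in_material(nu, n):
    """lambda = lambda_vac / n, with lambda_vac = c / nu, following (2.24) and (2.4)."""
    return C / (n * nu)


def skin_depth(nu, kappa):
    """Distance z over which exp(-kappa * omega * z / c) falls to 1/e."""
    omega = 2 * math.pi * nu
    return C / (kappa * omega)


if __name__ == "__main__":
    nu, n, kappa = 5e14, 1.5, 0.1
    print("wavelength in material (nm):", wavelength_in_material(nu, n) * 1e9)
    print("skin depth (nm):", skin_depth(nu, kappa) * 1e9)
```

Note that the skin depth depends only on κ and the frequency, while the wavelength inside the material depends only on n.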
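$$ (n + i\kappa)^2 = n^2 - \kappa^2 + i\,2n\kappa = 1 + \text{Re}\{\chi\} + i\,\text{Im}\{\chi\} = 1 + \chi \tag{2.25} $$
The real parts and the imaginary parts in the above equation are separately equal:
$$ n^2 - \kappa^2 = 1 + \text{Re}\{\chi\} \quad \text{and} \quad 2n\kappa = \text{Im}\{\chi\} \tag{2.26} $$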
$$ \kappa = \text{Im}\{\chi\} / 2n \tag{2.27} $$
When this is substituted into the first equation of (2.26) we get a quadratic in n 2
$$ n^4 - \left( 1 + \text{Re}\{\chi\} \right) n^2 - \frac{\left( \text{Im}\{\chi\} \right)^2}{4} = 0 \tag{2.28} $$
The positive7 real root to this equation is
$$ n = \sqrt{ \frac{ \left( 1 + \text{Re}\{\chi\} \right) + \sqrt{ \left( 1 + \text{Re}\{\chi\} \right)^2 + \left( \text{Im}\{\chi\} \right)^2 } }{2} } \tag{2.29} $$
The imaginary part of the index is then obtained from (2.27).
When absorption is small we can neglect the imaginary part of χ(ω), and
(2.29) reduces to
$$ n(\omega) = \sqrt{1 + \chi(\omega)} \quad \text{(negligible absorption)} \tag{2.30} $$
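Equations (2.27) and (2.29) translate directly into a few lines of code. The sketch below computes n and κ from a complex susceptibility and compares them with the principal complex square root of 1 + χ, per (2.25). The function name and the sample value χ = 1.24 + 0.3i are assumptions chosen for illustration; that value corresponds to n = 1.5 and κ = 0.1, since (1.5 + 0.1i)² = 2.24 + 0.3i.

```python
import cmath
import math


def index_from_susceptibility(chi):
    """Return (n, kappa) for a complex susceptibility chi using (2.29) and (2.27)."""
    re, im = chi.real, chi.imag
    # (2.29): positive real root of the quadratic in n^2
    n = math.sqrt(((1 + re) + math.hypot(1 + re, im)) / 2)
    # (2.27): the imaginary part of the index follows from 2 n kappa = Im{chi}
    kappa = im / (2 * n)
    return n, kappa


if __name__ == "__main__":
    chi = 1.24 + 0.3j  # hypothetical sample value
    n, kappa = index_from_susceptibility(chi)
    print("n =", n, "kappa =", kappa)
    print("principal sqrt(1 + chi) =", cmath.sqrt(1 + chi))
```

When Im{χ} is set to zero the function returns n = √(1 + χ) and κ = 0, which is the negligible-absorption limit (2.30).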
tween the electric field E₀ and the polarization P₀) and hence the index of
refraction. The model assumes that all atoms (or molecules) in the medium are
identical, each with one (or a few) active electrons responding to the external
field. The atoms are uniformly distributed throughout space with N identical
active electrons per volume (units of number per volume). The polarization of
the material is then
$$ \mathbf{P} = N q_e \mathbf{r}_\text{micro} \tag{2.31} $$
Recall that polarization has units of dipoles per volume. Each dipole has strength
qₑ r_micro, where r_micro is a microscopic displacement of the electron from
equilibrium.
Kaiser, whose niece Hendrik married. Hendrik was persuaded to become a
physicist and wrote a doctoral dissertation entitled On the theory of reflection
and refraction of light, in which he refined Maxwell's electromagnetic theory.
Lorentz correctly hypothesized that the atoms were composed of charged
particles, and that their movement was the source of light. He also derived the
transformations of space and time, later used in Einstein's theory of relativity.
Lorentz won the Nobel prize in 1902 for his contributions to electromagnetic
theory. (Wikipedia)
7 It is possible to have n < 0 for so-called metamaterials, not considered here.
The displacement of the electron obeys Newton's equation of motion, with three forces acting on the electron:
$$ m_e \ddot{\mathbf{r}}_{\rm micro} = q_e \mathbf{E} - m_e \gamma\, \dot{\mathbf{r}}_{\rm micro} - k_{\rm Hooke}\, \mathbf{r}_{\rm micro} \tag{2.32} $$
The electric field pulls on the electron with force $q_e \mathbf{E}$.⁸ A drag force (or friction) $-m_e \gamma \dot{\mathbf{r}}_{\rm micro}$ opposes the electron motion and accounts for absorption of energy. Without this term, it is only possible to describe optical index at frequencies away from where absorption takes place. Finally, $-k_{\rm Hooke} \mathbf{r}_{\rm micro}$ is a force accounting for the fact that the electron is bound to the nucleus. This restoring force can be thought of as an effective spring that pulls the displaced electron back towards equilibrium with a force proportional to the amount of displacement, so this term is essentially the familiar Hooke’s law. With some rearranging, (2.32) can be written as
$$ \ddot{\mathbf{r}}_{\rm micro} + \gamma\, \dot{\mathbf{r}}_{\rm micro} + \omega_0^2\, \mathbf{r}_{\rm micro} = \frac{q_e}{m_e} \mathbf{E} \tag{2.33} $$
where $\omega_0 \equiv \sqrt{k_{\rm Hooke}/m_e}$ is the natural oscillation frequency (or resonant frequency) associated with the electron mass and the ‘spring constant.’
Figure 2.4 A distorted electronic cloud becomes a dipole. (Panel label: In an electric field.)
There is a subtle problem with our analysis, which we will continue to neglect
in this section, but which should be mentioned. The field E in (2.32) is the net
field, which is influenced by the presence of all of the dipoles. The actual field that
a dipole ‘feels’, however, does not include its own field. That is, we should remove
from E the field produced by each dipole in its own vicinity. This significantly
modifies the result if the density of the material is sufficiently high. This effect is
described by the Clausius-Mossotti formula, which is treated in appendix 2.B.
In accordance with our examination of a single sinusoidal wave, we insert
(2.14) into (2.33) and obtain
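$$ \ddot{\mathbf{r}}_{\rm micro} + \gamma\, \dot{\mathbf{r}}_{\rm micro} + \omega_0^2\, \mathbf{r}_{\rm micro} = \frac{q_e}{m_e} \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.34} $$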
Note that within a given atom the excursions of rmicro are so small that k·r remains
essentially constant, since k·r varies with displacements on the scale of an optical
8 The electron also experiences a force due to the magnetic field of the light, $\mathbf{F} = q_e \mathbf{v}_{\rm micro} \times \mathbf{B}$, but this force is tiny for typical optical fields.
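$$ \mathbf{r}_{\rm micro} = \left(\frac{q_e}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.35} $$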
The electron position rmicro oscillates (not surprisingly) with the same frequency
ω as the driving electric field. This solution illustrates the convenience of com-
plex notation. The imaginary part in the denominator implies that the electron
oscillates with a phase different from the electric field oscillations; the damping
term γ (the imaginary part in the denominator) causes the two to be out of phase
somewhat. The complex algebra in (2.35) accomplishes quite easily what would
otherwise be cumbersome (i.e. working out a trigonometric phase).
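To make the phase statement concrete, the following sketch evaluates the complex factor in (2.35) below, at, and above resonance. It is an illustration under stated assumptions: the function names are hypothetical, and the numerical values of ω0, γ, and the field strength are sample values chosen only for demonstration. With the e^{−iωt} convention used here, a positive phase of the response factor means the oscillation lags the driving force; the negative electron charge contributes an additional overall sign that is kept separate.

```python
import cmath
import math

Q_E = -1.602e-19  # electron charge (C)
M_E = 9.109e-31   # electron mass (kg)


def lorentz_response(omega, omega0, gamma):
    """Factor 1 / (omega0^2 - i*omega*gamma - omega^2) appearing in (2.35)."""
    return 1.0 / (omega0**2 - 1j * omega * gamma - omega**2)


def displacement_amplitude(E0, omega, omega0, gamma):
    """Complex amplitude of r_micro that multiplies exp(i(k.r - omega*t))."""
    return (Q_E / M_E) * E0 * lorentz_response(omega, omega0, gamma)


omega0 = 6.0e15        # sample resonance frequency (rad/s)
gamma = omega0 / 5.0   # sample damping rate (1/s)
for omega in (0.1 * omega0, omega0, 10.0 * omega0):
    lag = cmath.phase(lorentz_response(omega, omega0, gamma))
    size = abs(displacement_amplitude(1.0e4, omega, omega0, gamma))
    print(f"omega/omega0 = {omega / omega0:5.1f}: |r| = {size:.3e} m, phase = {lag:.3f} rad")
```

Far below resonance the response is nearly in phase with the force, at resonance it lags by a quarter cycle, and far above resonance it approaches a half-cycle lag.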
We are now able to write the polarization in terms of the electric field. By
substituting (2.35) into (2.31) and rearranging, we obtain
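$$ \mathbf{P} = \epsilon_0 \left(\frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2}\right) \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.36} $$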
$$ \chi(\omega) = \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.38} $$
The index of refraction is then found by substituting the susceptibility (2.38) into (2.18). The real and imaginary parts of the index are solved by equating separately the real and imaginary parts of (2.18), namely
$$ (n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.39} $$
A graph of n and κ is given in Fig. 2.5.
Figure 2.5 Real and imaginary parts of the index for a single Lorentz oscillator dielectric with ωp = 10γ.
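The curves of n and κ for a single Lorentz oscillator can be tabulated numerically from (2.39). The sketch below works in units where γ = 1 and uses ωp = 10γ as in Fig. 2.5; the choice ω0 = 100γ and the function name are assumptions made only for illustration.

```python
import cmath


def lorentz_index(omega, omega0, gamma, omega_p):
    """Return (n, kappa) for a single Lorentz oscillator, from (2.39)."""
    chi = omega_p**2 / (omega0**2 - 1j * omega * gamma - omega**2)
    root = cmath.sqrt(1.0 + chi)  # principal root: n >= 0
    return root.real, root.imag


gamma = 1.0            # frequency unit
omega_p = 10.0 * gamma
omega0 = 100.0 * gamma  # assumed resonance location

# Scan the detuning (omega - omega0) / gamma from -10 to 10.
table = []
for step in range(-10, 11):
    omega = omega0 + step * gamma
    n, kappa = lorentz_index(omega, omega0, gamma, omega_p)
    table.append((step, n, kappa))
    print(f"{step:4d}  n = {n:.4f}  kappa = {kappa:.4f}")
```

In this scan κ is largest near resonance, while n rises above its off-resonance value just below ω0 and dips just above it.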
Most materials actually have more than one species of active electron, and
different active electrons behave differently. The generalization of (2.39) in this
case is
$$ (n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \sum_j \frac{f_j\, \omega_{p\,j}^2}{\omega_{0\,j}^2 - i\omega\gamma_j - \omega^2} \tag{2.40} $$
where f j is the aptly named oscillator strength for the j th species of active electron.
Each species also has its own plasma frequency ωp j , natural frequency ω0 j , and
damping coefficient γ j .
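The sum in (2.40) translates directly into code. In this sketch each species is a tuple (f, ωp, ω0, γ); the tuple layout, the function names, and the sample parameter values are illustrative assumptions, not data for any real material.

```python
import cmath


def susceptibility(omega, oscillators):
    """Sum the Lorentz terms of (2.40); each oscillator is (f, omega_p, omega0, gamma)."""
    return sum(
        f * wp**2 / (w0**2 - 1j * omega * g - omega**2)
        for (f, wp, w0, g) in oscillators
    )


def index(omega, oscillators):
    """Return (n, kappa) from (n + i*kappa)^2 = 1 + chi."""
    root = cmath.sqrt(1.0 + susceptibility(omega, oscillators))
    return root.real, root.imag


# Two made-up species, in arbitrary frequency units.
species_a = (0.6, 10.0, 100.0, 1.0)
species_b = (0.4, 10.0, 150.0, 2.0)
n, kappa = index(120.0, [species_a, species_b])
print(n, kappa)
```

Because the susceptibility is a plain sum, the contribution of each species can be inspected separately before taking the square root.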
9 In a plasma, charges move freely so that both the Hooke restoring force and the dragging term
We will include the current density Jfree while setting the medium polarization P
to zero. The wave equation is
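$$ \nabla^2 \mathbf{E} - \epsilon_0 \mu_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial \mathbf{J}_{\rm free}}{\partial t} \tag{2.42} $$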
We assume that the current is made up of individual electrons traveling with
velocity vmicro :
$$ \mathbf{J}_{\rm free} = N q_e \mathbf{v}_{\rm micro} \tag{2.43} $$
As before, N is the number density of free electrons (in units of number per vol-
ume). Recall that current density Jfree has units of charge times velocity per volume
10 G. Burns, Solid State Physics, Sect. 9-5 (Orlando: Academic Press, 1985).
(or current per cross sectional area), so (2.43) may be thought of as a definition of
current density in a fundamental sense.
Again, the electrons satisfy Newton’s equation of motion, similar to (2.32) except
without a restoring force:
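$$ \mathbf{v}_{\rm micro} \equiv \dot{\mathbf{r}}_{\rm micro} = \left(\frac{q_e}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.45} $$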
where again we assume that the electron oscillation excursions described by rmicro
are small compared to the wavelength so that r can be treated as a constant in
(2.44). The current density (2.43) in terms of the electric field is then
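$$ \mathbf{J}_{\rm free} = \left(\frac{N q_e^2}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.46} $$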
We substitute this together with the electric field into the wave equation (2.42) and
get
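$$ -k^2 \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\omega^2}{c^2} \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -i\omega\mu_0 \left(\frac{N q_e^2}{m_e}\right) \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.47} $$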
This simplifies down to the dispersion relation
$$ k^2 = \frac{\omega^2}{c^2} \left(1 - \frac{\omega_p^2}{i\gamma\omega + \omega^2}\right) \tag{2.48} $$
which agrees with (2.41). We have made the substitution $\omega_p^2 = N q_e^2/\epsilon_0 m_e$ in accordance with (2.37). As usual, $k^2 = \frac{\omega^2}{c^2}(1 + \chi) = \frac{\omega^2}{c^2}(n + i\kappa)^2$, so the susceptibility and the index may be extracted from (2.48).
Note that in the low-frequency limit (i.e. ω ≪ γ), the current density (2.46) reduces to Ohm’s law J = σE, where $\sigma = N q_e^2/m_e\gamma$ is the DC conductivity. In the high-frequency limit (i.e. ω ≫ γ), the behavior changes over to that of a free plasma, where collisions, which are responsible for resistance, become less important since the excursions of the electrons during oscillations become very small. This formula captures the general behavior of metals, but actual values of the index vary from this somewhat (see P2.6).
In either the conductor or dielectric model, the damping term removes energy from electron oscillations. The damping term gives rise to an imaginary part of the index, which causes an exponential attenuation of the plane wave as it propagates.
Figure 2.7 The electrons in a conductor can easily move in response to the applied field.
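The two limits described above can be seen numerically. The sketch below evaluates the frequency-dependent conductivity implied by (2.46) and the index implied by (2.48). The electron density N and collision rate γ are round illustrative numbers (assumptions, not data for a particular metal), and the function names are hypothetical.

```python
import cmath

Q_E = 1.602e-19   # magnitude of the electron charge (C); only q_e^2 enters
M_E = 9.109e-31   # electron mass (kg)
EPS0 = 8.854e-12  # permittivity of vacuum (F/m)


def drude_conductivity(omega, N, gamma):
    """sigma(omega) such that J = sigma * E, read off from (2.46)."""
    return N * Q_E**2 / (M_E * (gamma - 1j * omega))


def drude_index(omega, N, gamma):
    """Return (n, kappa) from (n + i*kappa)^2 = 1 - wp^2 / (i*gamma*omega + omega^2)."""
    wp2 = N * Q_E**2 / (EPS0 * M_E)
    root = cmath.sqrt(1.0 - wp2 / (1j * gamma * omega + omega**2))
    return root.real, root.imag


N = 1.0e28      # assumed free-electron density (m^-3)
gamma = 1.0e14  # assumed collision rate (1/s)

sigma_dc = drude_conductivity(0.0, N, gamma)
sigma_hf = drude_conductivity(1000.0 * gamma, N, gamma)
print("DC conductivity:", sigma_dc.real, "S/m")
print("High-frequency conductivity:", sigma_hf)
print("Index below plasma frequency:", drude_index(1.0e15, N, gamma))
print("Index above plasma frequency:", drude_index(1.0e16, N, gamma))
```

At zero frequency the conductivity is real (Ohm's law); at ω ≫ γ it is almost purely imaginary, so the current is nearly a quarter cycle out of phase with the field and little energy is dissipated.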
2.5 Poynting’s Theorem
Until now, we have described light as the propagation of an electromagnetic
disturbance. However, we typically observe light by detecting absorbed energy
rather than the field amplitude directly. In this section we examine the connection
between propagating electromagnetic fields (such as the plane waves discussed
in this chapter) and the energy transported by such fields.
In the late 1800s John Poynting developed (from Maxwell’s equations) the the-
oretical foundation that describes light energy transport. You should appreciate
and remember the ideas involved, especially the definition and meaning of the
Poynting vector, even if you forget the specifics of its derivation.
We require just two of Maxwell’s Equations: (1.3) and (1.4). We take the dot product
of B/µ0 with the first equation and the dot product of E with the second equation.
Then by subtracting the second equation from the first we obtain
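$$ \frac{\mathbf{B}}{\mu_0} \cdot (\nabla \times \mathbf{E}) - \mathbf{E} \cdot \left(\nabla \times \frac{\mathbf{B}}{\mu_0}\right) + \epsilon_0 \mathbf{E} \cdot \frac{\partial \mathbf{E}}{\partial t} + \frac{\mathbf{B}}{\mu_0} \cdot \frac{\partial \mathbf{B}}{\partial t} = -\mathbf{E} \cdot \mathbf{J} \tag{2.49} $$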
The first two terms can be simplified using the vector identity P0.8. The next two terms are the time derivatives of $\epsilon_0 E^2/2$ and $B^2/2\mu_0$, respectively. The relation (2.49) then becomes
$$ \nabla \cdot \left(\mathbf{E} \times \frac{\mathbf{B}}{\mu_0}\right) + \frac{\partial}{\partial t}\left(\frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0}\right) = -\mathbf{E} \cdot \mathbf{J} \tag{2.50} $$
This is Poynting’s theorem. Each term in this equation has units of power per volume.
John Henry Poynting (1852–1914, English) was the youngest son of a Unitarian minister who operated a school near Manchester England where John received his childhood education. He later attended Owen's College in Manchester and then went on to Cambridge University where he distinguished himself in mathematics and worked under James Maxwell in the Cavendish Laboratory. Poynting joined the faculty of the University of Birmingham (then called Mason Science College) where he was a professor of physics from 1880 until his death. Besides developing his famous theorem on the conservation of energy in electromagnetic fields, he performed innovative measurements of Newton's gravitational constant and discovered that the Sun's radiation draws in small particles towards it, the Poynting-Robertson effect. Poynting was the principal author of a multi-volume undergraduate physics textbook, which was in wide use until the 1930s. (Wikipedia)
It is conventional to write Poynting’s theorem as follows:¹¹
$$ \nabla \cdot \mathbf{S} + \frac{\partial}{\partial t}\left(u_{\rm field} + u_{\rm medium}\right) = 0 \tag{2.51} $$
where
$$ \mathbf{S} \equiv \mathbf{E} \times \frac{\mathbf{B}}{\mu_0} \tag{2.52} $$
is called the Poynting vector, which has units of power per area, called irradiance. The expression
$$ u_{\rm field} \equiv \frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0} \tag{2.53} $$
is the energy per volume stored in the electric and magnetic fields. Derivations of
the electric field energy density and the magnetic field energy density are given in
Appendices 2.C and 2.D. (See (2.79) and (2.86).) The derivative
$$ \frac{\partial u_{\rm medium}}{\partial t} \equiv \mathbf{E} \cdot \mathbf{J} \tag{2.54} $$
11 See D. J. Griffiths, Introduction to Electrodynamics, 3rd ed., Sect. 8.1.2 (New Jersey: Prentice-Hall,
1999).
describes the power per volume delivered to the medium from the field. Equa-
tion (2.54) is reminiscent of the familiar circuit power law, Power = Voltage ×
Current. Power is delivered when a charged particle traverses a distance while
experiencing a force. This happens when currents flow in the presence of electric
fields.
Poynting’s theorem is essentially a statement of the conservation of energy,
where S describes the flow of energy. To appreciate this, consider Poynting’s
theorem (2.51) integrated over a volume V (enclosed by surface S). If we also
apply the divergence theorem (0.11) to the term involving ∇ · S we obtain
$$ \oint_S \mathbf{S} \cdot \hat{\mathbf{n}}\, da = -\frac{\partial}{\partial t} \int_V \left(u_{\rm field} + u_{\rm medium}\right) dv \tag{2.55} $$
Notice that the volume integral over energy densities u field and u medium gives
the total energy stored in V , whether in the form of electromagnetic field energy
density or as energy density that has been given to the medium. The integration
of the Poynting vector over the surface gives the net Poynting vector flux directed
outward. Equation (2.55) indicates that the outward Poynting vector flux matches
the rate that total energy disappears from the interior of V . Conversely, if the
Poynting vector is directed inward (negative), then the net inward flux matches
the rate that energy increases within V . The vector S defines the flow of energy
through space. Its units of power per area are just what is needed to describe the
brightness of light impinging on a surface.
Example 2.3
(a) Find the Poynting vector S and energy density u field for the plane wave field E =
x̂E 0 cos (kz − ωt ) traveling in vacuum. (b) Check that S and u field satisfy Poynting’s
theorem.
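Solution: (a)
$$ \mathbf{B} = \frac{\hat{\mathbf{z}} k \times \hat{\mathbf{x}} E_0}{\omega} \cos(kz - \omega t) = \hat{\mathbf{y}} \frac{k E_0}{\omega} \cos(kz - \omega t) $$
$$ \mathbf{S} = \frac{\mathbf{E} \times \mathbf{B}}{\mu_0} = \hat{\mathbf{x}} E_0 \cos(kz - \omega t) \times \hat{\mathbf{y}} \frac{k E_0}{\omega \mu_0} \cos(kz - \omega t) = \hat{\mathbf{z}}\, c\, \epsilon_0 E_0^2 \cos^2(kz - \omega t) $$
$$ u_{\rm field} = \frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0} = \frac{\epsilon_0 E_0^2}{2} \cos^2(kz - \omega t) + \frac{k^2 E_0^2}{2\mu_0 \omega^2} \cos^2(kz - \omega t) = \epsilon_0 E_0^2 \cos^2(kz - \omega t) $$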
Notice that S = cu. The energy density traveling at speed c gives rise to the power
per area passing a surface (perpendicular to z).
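(b) We have
$$ \nabla \cdot \mathbf{S} = c\, \epsilon_0 E_0^2 \frac{\partial}{\partial z} \cos^2(kz - \omega t) = -2k c\, \epsilon_0 E_0^2 \cos(kz - \omega t) \sin(kz - \omega t) $$
whereas
$$ \frac{\partial u_{\rm field}}{\partial t} = \epsilon_0 E_0^2 \frac{\partial}{\partial t} \cos^2(kz - \omega t) = 2\omega\, \epsilon_0 E_0^2 \cos(kz - \omega t) \sin(kz - \omega t) $$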
Poynting’s theorem (2.50) is satisfied since ω = kc.
It is common to replace the rapidly oscillating function cos2 (kz − ωt ) with its time
average 1/2, but this would have inhibited our ability to take the above derivatives.
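The cancellation found in part (b) can also be confirmed numerically with finite differences. The sketch below is an illustration only: the function names, the 500 nm wavelength, and the step size are arbitrary choices, and the derivatives are approximated by central differences rather than taken analytically.

```python
import math

EPS0 = 8.854e-12  # permittivity of vacuum (F/m)
C = 2.998e8       # speed of light in vacuum (m/s)


def poynting_z(z, t, E0, k):
    """z component of S for E = x E0 cos(kz - wt) in vacuum."""
    omega = k * C
    return C * EPS0 * E0**2 * math.cos(k * z - omega * t) ** 2


def energy_density(z, t, E0, k):
    """Field energy density u_field for the same plane wave."""
    omega = k * C
    return EPS0 * E0**2 * math.cos(k * z - omega * t) ** 2


def poynting_residual(z, t, E0, k, dphi=1e-4):
    """Central-difference estimate of dS_z/dz + du_field/dt (should vanish)."""
    omega = k * C
    hz = dphi / k       # spatial step giving a phase change dphi
    ht = dphi / omega   # time step giving the same phase change
    dS_dz = (poynting_z(z + hz, t, E0, k) - poynting_z(z - hz, t, E0, k)) / (2 * hz)
    du_dt = (energy_density(z, t + ht, E0, k) - energy_density(z, t - ht, E0, k)) / (2 * ht)
    return dS_dz + du_dt


k = 2 * math.pi / 500e-9            # wavenumber for an assumed 500 nm wave
E0 = 1.0                            # field amplitude (V/m)
scale = 2 * k * C * EPS0 * E0**2    # size of each individual term
residual = poynting_residual(0.3 / k, 0.2 / (k * C), E0, k)
print("residual / scale =", residual / scale)
```

The residual is compared against the size of the individual terms, since each term alone is far from zero at the chosen point.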
$$ \mathbf{B}(\mathbf{r}, t) = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.56} $$
When k is complex, B is out of phase with E, and this occurs when absorption
takes place. When there is no absorption, then k is real, and B and E carry the
same complex phase.
Before computing the Poynting vector (2.52), which involves multiplication,
we must remember our unspoken agreement that only the real parts of the fields
are relevant. We necessarily remove the imaginary parts before multiplying (see
(0.22)). To obtain the real parts of the fields, we add their respective complex
conjugates and divide the result by 2 (see (0.30)). The real field associated with
the plane-wave electric field is
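$$ \mathbf{E}(\mathbf{r}, t) = \frac{1}{2}\left[\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \tag{2.57} $$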
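and the real field associated with (2.56) is
$$ \mathbf{B}(\mathbf{r}, t) = \frac{1}{2}\left[\frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^* \times \mathbf{E}_0^*}{\omega} e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \tag{2.58} $$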
We have merely exercised our previous (conspiratorial) agreement that only the
real parts of (2.39) and (2.56) are to be retained.
Now we are ready to calculate the Poynting vector. The algebra is a little messy
in general, so we restrict the analysis to the case of an isotropic medium for the
sake of simplicity.
2.6 Irradiance of a Plane Wave 59
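$$
\begin{aligned}
\mathbf{S} &\equiv \mathbf{E} \times \frac{\mathbf{B}}{\mu_0} \\
&= \frac{1}{2}\left[\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \times \frac{1}{2\mu_0}\left[\frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^* \times \mathbf{E}_0^*}{\omega} e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \\
&= \frac{1}{4\mu_0}\left[\frac{\mathbf{E}_0 \times (\mathbf{k} \times \mathbf{E}_0)}{\omega} e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{E}_0^* \times (\mathbf{k} \times \mathbf{E}_0)}{\omega} e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} + \frac{\mathbf{E}_0 \times (\mathbf{k}^* \times \mathbf{E}_0^*)}{\omega} e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} + \frac{\mathbf{E}_0^* \times (\mathbf{k}^* \times \mathbf{E}_0^*)}{\omega} e^{-2i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \\
&= \frac{1}{4\mu_0}\left[\frac{k}{\omega}\, \mathbf{E}_0 \times (\hat{\mathbf{u}} \times \mathbf{E}_0)\, e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k^*}{\omega}\, \mathbf{E}_0 \times (\hat{\mathbf{u}} \times \mathbf{E}_0^*)\, e^{-2\frac{\kappa\omega}{c} \hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.}\right]
\end{aligned}
\tag{2.59}
$$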
The letters ‘C.C.’ stand for the complex conjugate of what precedes in the square
brackets. The direction of k is specified with the real unit vector û. We have also
used (2.19) to rewrite i (k − k∗ ) as −2 (κω/c) û.
The assumption of an isotropic medium (not a crystal) means that ∇ · E(r, t ) = 0
and therefore û · E0 = 0. We can use this fact together with the BAC-CAB rule P0.3
to reduce the above expression to
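$$ \mathbf{S} = \frac{\hat{\mathbf{u}}}{4\mu_0}\left[\frac{k}{\omega} (\mathbf{E}_0 \cdot \mathbf{E}_0)\, e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k^*}{\omega} \left(\mathbf{E}_0 \cdot \mathbf{E}_0^*\right) e^{-2\frac{\kappa\omega}{c} \hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.}\right] \tag{2.60} $$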
The final expression shows that (in an isotropic medium) the flow of energy is in
the direction of û (or k). This agrees with our intuition that energy flows in the
direction that the wave propagates.
Very often, we are interested in the time-average of the Poynting vector, denoted by $\langle \mathbf{S} \rangle_t$. There are no electronics that can keep up with the rapid oscillation of visible light (i.e. > 10^14 Hz). Therefore, what is always measured is the time-averaged absorption of energy. Under time averaging, the first term in (2.60)
vanishes since it rapidly oscillates positive and negative. The time-averaged
Poynting vector (including the term C.C.) becomes
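$$
\begin{aligned}
\langle \mathbf{S} \rangle_t &= \frac{\hat{\mathbf{u}}}{4\mu_0} \frac{k + k^*}{\omega} \left(\mathbf{E}_0 \cdot \mathbf{E}_0^*\right) e^{-2\frac{\kappa\omega}{c} \hat{\mathbf{u}}\cdot\mathbf{r}} \\
&= \hat{\mathbf{u}}\, \frac{n \epsilon_0 c}{2} \left(|E_{0x}|^2 + |E_{0y}|^2 + |E_{0z}|^2\right) e^{-2\frac{\kappa\omega}{c} \hat{\mathbf{u}}\cdot\mathbf{r}}
\end{aligned}
\tag{2.61}
$$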
We have used (2.19) to rewrite $k + k^*$ as $2(n\omega/c)$. We have also used (1.43) to rewrite $1/\mu_0 c$ as $\epsilon_0 c$.
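As a numerical illustration of (2.61), the sketch below evaluates the magnitude of the time-averaged Poynting vector for a given complex field amplitude and propagation distance. The function name and the sample inputs are assumptions made for illustration; `distance` stands for û·r, the distance traveled along the propagation direction.

```python
import math

EPS0 = 8.854e-12  # permittivity of vacuum (F/m)
C = 2.998e8       # speed of light in vacuum (m/s)


def time_averaged_irradiance(E0, n, kappa, omega, distance):
    """Magnitude of <S>_t from (2.61).

    E0 is a sequence of three (possibly complex) field components in V/m.
    """
    field_sq = sum(abs(component) ** 2 for component in E0)
    decay = math.exp(-2.0 * kappa * omega * distance / C)
    return 0.5 * n * EPS0 * C * field_sq * decay


omega = 2.0 * math.pi * 5e14                     # sample optical frequency (rad/s)
I_vacuum = time_averaged_irradiance((1.0, 0.0, 0.0), 1.0, 0.0, omega, 0.0)
print("1 V/m linearly polarized wave in vacuum:", I_vacuum, "W/m^2")

# Circular polarization: complex components, same total |E0|^2.
s = 1.0 / math.sqrt(2.0)
I_circular = time_averaged_irradiance((s, 1j * s, 0.0), 1.0, 0.0, omega, 0.0)
print("circularly polarized wave, same amplitude:", I_circular, "W/m^2")
```

Note that the irradiance decays twice as fast with distance as the field amplitude, because it depends on the square of the field.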
The expression (2.61) is formally called the irradiance (with the direction û
included). However, we often speak of the intensity of a field I , which amounts to
the same thing, but without regard for the direction û. The definition of intensity
is thus less specific, and it can be applied, for example, to standing waves where
the net irradiance is technically zero (i.e. counter-propagating plane waves with
zero net energy flow). Nevertheless, atoms in standing waves ‘feel’ the oscillating
field. In general, the intensity is written as
$$ I = \frac{n \epsilon_0 c}{2}\, \mathbf{E}_0 \cdot \mathbf{E}_0^* = \frac{n \epsilon_0 c}{2} \left(|E_{0x}|^2 + |E_{0y}|^2 + |E_{0z}|^2\right) \tag{2.62} $$
Radiant Power (of a source): Electromagnetic energy. Units: W = J/s
Radiant Solid-Angle Intensity
tion without ruining your dark-adapted vision. For example, an airplane can have red illumination
on the instrument panel without interfering with a pilot’s ability to achieve full dark-adapted vision
to see things outside the cockpit.
2.A Radiometry, Photometry, and Color 61
Photometric units, which may seem a little obscure, were first defined in terms of an actual candle with prescribed dimensions made from whale tallow. The basic unit of luminous power is called the lumen, defined to be (1/683) W of light with wavelength λvac = 555 nm, the peak of the eye’s bright-light response. More radiant power is required to achieve the same number of lumens for wavelengths away from the center of the eye’s spectral response. Photometric units are often used to characterize room lighting as well as photographic, projection, and display equipment. For example, both a 60 W incandescent bulb and a 13 W compact fluorescent bulb emit a little more than 800 lumens of light. The difference in photometric output versus radiometric output reflects the fact that most of the energy radiated from an incandescent bulb is emitted in the infrared, where our eyes are not sensitive. Table 2.2 gives the names of the various photometric quantities, which parallel the entries for radiometric quantities in Table 2.1. We include a variety of units that are sometimes encountered.

Luminous Power (of a source): Visible light energy emitted per time from a source. Units: lumens (lm), lm = (1/683) W @ 555 nm
Luminous Solid-Angle Intensity (of a source): Luminous power per steradian emitted from a point-like source. Units: candelas (cd), cd = lm/sr
Luminance (of a source): Luminous solid-angle intensity per projected area of an extended source. (The projected area foreshortens by cos θ, where θ is the observation angle relative to the surface normal.) Units: cd/cm² = stilb, cd/m² = nit, lambert = 3183 nit, footlambert = 3.4 nit
Luminous Emittance or Exitance (from a source): Luminous power emitted per unit surface area of an extended source. Units: lm/cm²
Illuminance (to a receiver): Incident luminous power delivered per area to a receiver. Units: lux; lm/m² = lux, lm/cm² = phot, lm/ft² = footcandle
Table 2.2 Photometric quantities and units.

Cones come in three varieties, each of which is sensitive to light in different wavelength bands. Figure 2.9 plots the normalized sensitivity curves¹³ for short (S), medium (M), and long (L) wavelength cones. Because your brain gets separate signals from each type of cone, this system gives you the ability to measure basic information about the spectral content of light. We interpret this spectral information as the color of the light. When the three types of cones are stimulated equally the light appears white, and when they are stimulated differently the light appears colored. Light with different spectral distributions can produce the exact same color sensation, so our perception of color only gives very general information about the spectral content of light. For example, light coming from a television has a different spectral composition than the light incident on the camera that recorded the image, but both can produce the same color sensation. This ambiguity can lead to a potentially dangerous situation in the lab because lasers from 670 nm to 800 nm all appear the same color. (They all stimulate the L and M cones in essentially the same ratio.) However, your eye’s response falls off quickly in the near-infrared, so a dangerous 800 nm high-intensity beam can appear about the same brightness as an innocuous 670 nm laser pointer.
Figure 2.9 Normalized cone sensitivity functions for the S, M, and L cones, plotted against wavelength (nm) from 400 to 800 nm.
13 A. Stockman, L. Sharpe, and C. Fach, “The spectral sensitivity of the human short-wavelength cones,” Vision Research, 39, 2901-2927 (1999); A. Stockman, and L. Sharpe, “Spectral sensitivities of the middle- and long-wavelength sensitive cones derived from measurements in observers of known genotype,” Vision Research, 40, 1711-1737 (2000).
14 The methods we use to represent color are very much tied to human physiology. Other species have photoreceptors that sense different wavelength ranges or do not sense color at all. For instance, Papilio butterflies have six types of cone-like photoreceptors and certain types of shrimp have twelve. Reptiles have four-color vision for visible light, and pit vipers (a subgroup of snakes) have an additional set of “eyes” that look like pits on the front of their face. These pits are essentially pinhole cameras sensitive to infrared light, and give these reptiles crude night-vision capabilities. (Not surprisingly, pit vipers hunt most actively at night time.) On the other hand, some insects can perceive markings on flowers that are only visible in the ultraviolet. Each of these species would
Because we have three types of cones, our perception of color can be well-represented using a three-dimensional vector space referred to as a color space.¹⁴ A color space can be defined in terms of three “basis” light sources
referred to as primaries. Different colors (i.e. the “vectors” in the color space) are created by mixing the primary light in different ratios. If we had three primaries that separately stimulated each type of cone (S, M, and L), we could recreate any color sensation exactly by mixing those primaries. However, by inspecting Fig. 2.9 you can see that this ideal set of primaries cannot be found because of the overlap between the S, M, and L curves. Any light that will stimulate one type of cone will also stimulate another. This overlap makes it impossible to display every possible color with three primaries. (It is nevertheless possible to quantify all colors with three primaries, even if the primaries can’t display the colors; we’ll see how shortly.) The range of colors that can be displayed with a given set of primaries is referred to as the gamut of that color space. As your experience with computers suggests, we are able to engineer devices with a very broad gamut, but there are always colors that cannot be displayed.
The CIE1931 RGB¹⁵ color space is a very commonly encountered color space based on a series of experiments performed by W. David Wright and John Guild in 1931. In these experiments, test subjects were asked to match the color of a monochromatic test light source by mixing monochromatic primaries at 700 nm (R), 546.1 nm (G), and 435.8 nm (B). The relative amount of R, G, and B light required to match the color at each test wavelength was recorded as the color matching functions $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$, shown in Fig. 2.10. Note that the color matching functions sometimes go negative. This is most noticeable for $\bar{r}(\lambda)$, but all three have negative values. These negative values indicate that the test color was outside the gamut of the primaries (i.e. the color of the test source could not be matched by adding primaries). In these cases, the observers matched the test light as closely as possible by mixing primaries, and then they added some of the primary light to the test light until the colors matched. The amount of primary light that had to be added to the test light was recorded as a negative number. In this way they were able to quantify the color, even though it couldn’t be displayed using their primaries.
Figure 2.10 The CIE 1931 RGB color matching functions, plotted against test wavelength (nm) from 400 to 700 nm.
It turns out that the eye responds essentially linearly with respect to color
perception. That is, if an observer perceives one light source to have components
(R 1 ,G 1 , B 1 ) and another light source to have components (R 2 ,G 2 , B 2 ), a mixture
of the two lights will have components (R 1 + R 2 ,G 1 + G 2 , B 1 + B 2 ). This linearity
allows us to calculate the color components of an arbitrary light source with
spectrum I (λ) by integrating the spectrum against the color matching functions:
$$ R = \int I(\lambda)\, \bar{r}\, d\lambda \qquad G = \int I(\lambda)\, \bar{g}\, d\lambda \qquad B = \int I(\lambda)\, \bar{b}\, d\lambda \tag{2.63} $$
If R, G, or B turn out to be negative for a given I (λ), then that color of light
falls outside the gamut of these particular primaries. However, the negative
coordinates still provide a valid abstract representation of that color.
find the color spaces we use to record and recreate color sensations very inaccurate.
15 This is not the RGB space you may have probably used on a computer—that space is referred
The RGB color space is an additive color model, where the primaries are added
together to produce color and the absence of light gives black. Subtractive color
models produce color using a background that reflects all visible light equally so
that it appears white (e.g. a piece of paper or canvas) and then placing absorbing
pigments over the background to remove portions of the reflected spectrum.
Some color spaces use four basis vectors. For example, color printers use
the subtractive CMYK color space (Cyan, Magenta, Yellow, and Black), and some
television manufacturers add a fourth type of primary (usually yellow) to their
display. The fourth basis vector increases the range of colors that can be displayed
by these systems (i.e. it increases the gamut). However, the fourth basis vector
makes the color space overdetermined and only helps in displaying colors—we
can abstractly represent all colors using just three coordinates (in an appropriately
chosen basis).
Example 2.4
The CIE1931 XYZ color space is derived from the CIE1931 RGB space by the trans-
formation
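$$ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \frac{1}{0.17697} \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0.00 & 0.01 & 0.99 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{2.64} $$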
where X , Y , and Z are the color coordinates in the new basis. The matrix elements
in (2.64) were carefully chosen to give this color space some desirable properties:
none of the new coordinates (X , Y , or Z ) are ever negative; the Y gives the pho-
tometric brightness of the light and the X and Z coordinates describe the color
part (i.e. the chromaticity) of the light; and the coordinates (1/3,1/3,1/3) give the
color white. The XYZ coordinates do not represent new primaries, but rather linear
combinations of the original primaries. Find the representation in the CIE1931
RGB basis for each of the basis vectors in the XYZ space.
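One way to approach this example is to invert the matrix in (2.64): the columns of the inverse give the RGB coordinates of the unit X, Y, and Z vectors. The sketch below does this with a hand-written 3 × 3 inverse (via the adjugate) so that it needs no external libraries; the helper names are illustrative and not part of the text.

```python
SCALE = 1.0 / 0.17697
RGB_TO_XYZ = [
    [0.49 * SCALE, 0.31 * SCALE, 0.20 * SCALE],
    [0.17697 * SCALE, 0.81240 * SCALE, 0.01063 * SCALE],
    [0.00 * SCALE, 0.01 * SCALE, 0.99 * SCALE],
]


def inverse_3x3(m):
    """Invert a 3x3 matrix using the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adjugate = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[entry / det for entry in row] for row in adjugate]


def matmul_3x3(p, q):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)] for r in range(3)]


XYZ_TO_RGB = inverse_3x3(RGB_TO_XYZ)

# Column j of XYZ_TO_RGB holds the (R, G, B) coordinates of the j-th XYZ basis vector.
for j, name in enumerate("XYZ"):
    rgb = [XYZ_TO_RGB[row][j] for row in range(3)]
    print(name, "basis vector in RGB coordinates:", rgb)
```

Negative entries in a column indicate that the corresponding XYZ basis vector lies outside the gamut of the RGB primaries, consistent with the statement that the XYZ coordinates do not represent new primaries.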
is the macroscopic field in the medium, which includes a contribution from all of
the dipoles. To avoid double-counting the dipole’s own field, we should replace E
with
Eactual ≡ E − Edipole (2.65)
and write
q e rmicro = αEactual (2.66)
That is, we ought not to allow the dipole’s own field to act on itself as we previously
(inadvertently) did. Here Edipole is the average field that a dipole contributes to its
quota of space in the material.
Since N is the number of dipoles per volume, each dipole occupies a volume
1/N . As will be shown below, the average field due to a dipole16 centered in such
a volume (symmetrically chosen) is
$$ \mathbf{E}_{\rm dipole} = -\frac{N q_e \mathbf{r}_{\rm micro}}{3\epsilon_0} \tag{2.67} $$
Substitution of (2.67) and (2.66) into (2.65) yields
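$$ \mathbf{E}_{\rm actual} = \mathbf{E} + \frac{N \alpha \mathbf{E}_{\rm actual}}{3\epsilon_0} \;\Rightarrow\; \mathbf{E}_{\rm actual} = \frac{\mathbf{E}}{1 - \dfrac{N\alpha}{3\epsilon_0}} \tag{2.68} $$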
senting their influence with the macroscopic field. However, if they are symmetrically distributed
the result is the same. See J. D. Jackson, Classical Electrodynamics, 3rd ed., Sect. 4.5 (New York: John
Wiley, 1999).
17 This form of Clausius-Mossotti relation, in terms of the refractive index, was renamed the
Lorentz-Lorenz formula, but probably undeservedly so, since it is essentially the same formula.
2.B Clausius-Mossotti Relation 65
Example 2.5
Xenon vapor at STP (density 4.46 × 10^−5 mol/cm³) has index n = 1.000702 measured at wavelength 589 nm. Use (a) the Clausius-Mossotti relation (2.70) and (b) the uncorrected formula (i.e. numerator only) to predict the index for liquid xenon with density 2.00 × 10^−2 mol/cm³. Compare with the measured value of n = 1.332.¹⁸
Solution: At the low density, we may safely neglect the correction in the denominator of (2.71) and simply write $N_{\rm atm}\alpha/\epsilon_0 = 1.000702^2 - 1 = 1.404 \times 10^{-3}$. The liquid density $N_{\rm liquid}$ is $2.00 \times 10^{-2}/4.46 \times 10^{-5} = 449$ times greater. Therefore, $N_{\rm liquid}\alpha/\epsilon_0 = 449 \times 1.404 \times 10^{-3} = 0.630$. (a) According to Clausius-Mossotti (2.71), the index is
$$ n = \sqrt{1 + \frac{0.630}{1 - 0.630/3}} = 1.341 $$
(b) On the other hand, without the correction in the denominator, we get
$$ n = \sqrt{1 + 0.630} = 1.277 $$
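The steps of Example 2.5 can be reproduced in a short script. This is a sketch that follows the expressions displayed in the solution; the function and variable names are illustrative, and x stands for the dimensionless combination Nα/ε0.

```python
import math


def index_clausius_mossotti(x):
    """Index with the local-field correction: n = sqrt(1 + x / (1 - x/3))."""
    return math.sqrt(1.0 + x / (1.0 - x / 3.0))


def index_uncorrected(x):
    """Index without the correction in the denominator: n = sqrt(1 + x)."""
    return math.sqrt(1.0 + x)


n_vapor = 1.000702
density_vapor = 4.46e-5    # mol/cm^3
density_liquid = 2.00e-2   # mol/cm^3

x_vapor = n_vapor**2 - 1.0                        # correction negligible at low density
x_liquid = x_vapor * density_liquid / density_vapor

n_cm = index_clausius_mossotti(x_liquid)
n_plain = index_uncorrected(x_liquid)
print(f"N alpha / eps0 (liquid) = {x_liquid:.3f}")
print(f"Clausius-Mossotti: n = {n_cm:.3f}; uncorrected: n = {n_plain:.3f}")
```

The corrected prediction lands much closer to the measured liquid index of 1.332 than the uncorrected one, which is the point of the example.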
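$$ \mathbf{E} = \frac{q_e}{4\pi\epsilon_0} \frac{\mathbf{r} - \hat{\mathbf{z}} d/2}{\left|\mathbf{r} - \hat{\mathbf{z}} d/2\right|^3} - \frac{q_e}{4\pi\epsilon_0} \frac{\mathbf{r} + \hat{\mathbf{z}} d/2}{\left|\mathbf{r} + \hat{\mathbf{z}} d/2\right|^3} $$
We wish to compute the average field within a cubic volume $V = L^3$ that symmetrically encompasses the dipole.¹⁹ We take the volume dimension L to be large compared to the dipole dimension d. Integrating the field over this volume yields
$$ \int_V \mathbf{E}\, dv = -\hat{\mathbf{z}} \frac{q_e}{2\pi\epsilon_0} \int_{-L/2}^{L/2} dx \int_{-L/2}^{L/2} dy \left[\frac{1}{\sqrt{x^2 + y^2 + (L - d)^2/4}} - \frac{1}{\sqrt{x^2 + y^2 + (L + d)^2/4}}\right] $$
Figure 2.11 The field lines surrounding a dipole.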
of Xenon Liquid and Vapour,” J. Phys. B: At. Mol. Phys. 1, 449-457 (1968).
19 Authors often obtain the same result using a spherical volume with the (usually unmentioned)
conceptual awkwardness that spheres cannot be closely packed to form a macroscopic medium
without introducing voids.
The terms multiplying x̂ and ŷ vanish since they involve odd functions integrated
over even limits on either x or y, respectively. On the remaining term, the integra-
tion on z has been executed. Before integrating the remaining expression over x
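and y, we make the following approximation based on L ≫ d:
$$
\begin{aligned}
\frac{1}{\sqrt{x^2 + y^2 + (L \pm d)^2/4}} &\cong \frac{1}{\sqrt{x^2 + y^2 + L^2/4}}\, \frac{1}{\sqrt{1 \pm \dfrac{Ld/2}{x^2 + y^2 + L^2/4}}} \\
&\cong \frac{1}{\sqrt{x^2 + y^2 + L^2/4}} \left[1 \mp \frac{Ld/4}{x^2 + y^2 + L^2/4}\right]
\end{aligned}
$$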
which will make integration considerably easier.20 Then integration over the y
dimension brings us to21
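The final integral is the same as twice the integral from 0 to L/2. Then, with x > 0, we can employ the variable change $s = x^2 + L^2/4 \Rightarrow 2\,dx = ds/\sqrt{s - L^2/4}$ and obtain
$$ \int \mathbf{E}\, dv = -\hat{\mathbf{z}} \frac{q_e d}{4\pi\epsilon_0} \int_{L^2/4}^{L^2/2} \frac{L^2\, ds}{s \sqrt{s^2 - L^4/16}} = -\hat{\mathbf{z}} \frac{q_e d}{4\pi\epsilon_0} \frac{4\pi}{3} $$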
Reinstalling rmicro = ẑd and dividing by the volume 1/N , allotted to individual
dipoles, brings us to the anticipated result (2.67).
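$$ \mathbf{E} = \frac{q_e}{4\pi\epsilon_0 r^3} \left[\frac{\mathbf{r} - \hat{\mathbf{z}} d/2}{\left[1 - \hat{\mathbf{z}}\cdot\hat{\mathbf{r}}\, \frac{d}{r} + \frac{d^2}{4r^2}\right]^{3/2}} - \frac{\mathbf{r} + \hat{\mathbf{z}} d/2}{\left[1 + \hat{\mathbf{z}}\cdot\hat{\mathbf{r}}\, \frac{d}{r} + \frac{d^2}{4r^2}\right]^{3/2}}\right] \cong \frac{q_e d}{4\pi\epsilon_0 r^3} \left[3\hat{\mathbf{r}}\, (\hat{\mathbf{z}}\cdot\hat{\mathbf{r}}) - \hat{\mathbf{z}}\right] $$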
charge, or volts) describes the potential energy that a charge would experience
if placed at any given point in the field. The electric field and the potential are
connected through
E (r) = −∇φ (r) (2.73)
The energy U necessary to assemble a distribution of charges (owing to attraction
or repulsion) can be written in terms of a summation over all of the charges (or
charge density ρ (r)) located within the potential:
$$ U = \frac{1}{2} \int_V \phi(\mathbf{r})\, \rho(\mathbf{r})\, dv \tag{2.74} $$
We consider the potential to arise from the charges themselves. The factor 1/2
is necessary to avoid double counting. To appreciate this factor consider just
two point charges: We only need to count the energy due to one charge in the
presence of the other’s potential to obtain the energy required to bring the charges
together.
A substitution of (1.1) for ρ (r) into (2.74) gives
$$ U = \frac{\epsilon_0}{2} \int_V \phi(\mathbf{r})\, \nabla \cdot \mathbf{E}(\mathbf{r})\, dv \tag{2.75} $$
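$$ U = \frac{\epsilon_0}{2} \int_V \nabla \cdot \left[\phi(\mathbf{r})\, \mathbf{E}(\mathbf{r})\right] dv - \frac{\epsilon_0}{2} \int_V \mathbf{E}(\mathbf{r}) \cdot \nabla\phi(\mathbf{r})\, dv \tag{2.76} $$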
An application of the divergence theorem (0.11) on the first integral and a substi-
tution of (2.73) into the second integral yields
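$$ U = \frac{\epsilon_0}{2} \oint_S \phi(\mathbf{r})\, \mathbf{E}(\mathbf{r}) \cdot \hat{\mathbf{n}}\, da + \frac{\epsilon_0}{2} \int_V \mathbf{E}(\mathbf{r}) \cdot \mathbf{E}(\mathbf{r})\, dv \tag{2.77} $$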
where
$$ u_E(\mathbf{r}) \equiv \frac{\epsilon_0 E^2}{2} \tag{2.79} $$
is interpreted as the energy density of the electric field.
As in (2.74), the factor 1/2 is necessary to avoid double counting the influence of
the currents on each other.
Under the assumption of steady currents (no variations in time), we may
substitute Ampere’s law (1.21) into (2.81), which yields
$$ U = \frac{1}{2\mu_0} \int_V \left[\nabla \times \mathbf{B}(\mathbf{r})\right] \cdot \mathbf{A}(\mathbf{r})\, dv \tag{2.82} $$
Next we employ the vector identity P0.8 from which the previous expression
becomes
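$$ U = \frac{1}{2\mu_0} \int_V \mathbf{B}(\mathbf{r}) \cdot \left[\nabla \times \mathbf{A}(\mathbf{r})\right] dv - \frac{1}{2\mu_0} \int_V \nabla \cdot \left[\mathbf{A}(\mathbf{r}) \times \mathbf{B}(\mathbf{r})\right] dv \tag{2.83} $$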
Upon substituting (2.80) into the first integral and applying the divergence
theorem (0.11) on the second integral, this expression for total energy becomes
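U = \frac{1}{2\mu_0} \int_V \mathbf{B}(\mathbf{r})\cdot\mathbf{B}(\mathbf{r})\,dv - \frac{1}{2\mu_0} \oint_S \left[\mathbf{A}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\right]\cdot\hat{n}\,da    (2.84)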
where
u_B(\mathbf{r}) \equiv \frac{B^2}{2\mu_0}    (2.86)
is the energy density for a magnetic field.
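As a quick numerical check of (2.79) and (2.86), the following Python sketch evaluates both energy densities for a plane wave in vacuum, using the plane-wave relation B = E/c. The field strength E0 and the function names are illustrative assumptions, not values or notation taken from the derivation above.

```python
import math

# SI constants (assumed CODATA values, quoted to the digits shown)
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
MU_0 = 1.25663706212e-6        # vacuum permeability, H/m
C = 299792458.0                # speed of light in vacuum, m/s


def electric_energy_density(e_field):
    """u_E = epsilon_0 E^2 / 2, following (2.79). E in V/m, result in J/m^3."""
    return 0.5 * EPSILON_0 * e_field**2


def magnetic_energy_density(b_field):
    """u_B = B^2 / (2 mu_0), following (2.86). B in T, result in J/m^3."""
    return b_field**2 / (2.0 * MU_0)


# Illustrative field amplitude (an assumption chosen only for this example)
E0 = 1.0e4        # V/m
B0 = E0 / C       # plane wave in vacuum: B = E/c

u_e = electric_energy_density(E0)
u_b = magnetic_energy_density(B0)
print(f"u_E = {u_e:.6e} J/m^3")
print(f"u_B = {u_b:.6e} J/m^3")
```

The two densities agree to within rounding because ε0 µ0 c² = 1, so a vacuum plane wave carries equal electric and magnetic energy.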
23 J. R. Reitz, F. J. Milford, and R. W. Christy, Foundations of Electromagnetic Theory 3rd ed., Sect.
Exercises
n^2 = 1 + \frac{A\,\lambda_{vac}^2}{\lambda_{vac}^2 - \lambda_{0,vac}^2}
from (2.39) for a gas with negligible absorption (i.e. γ ≅ 0, valid far
from resonance ω0 ), where λ0,vac corresponds to frequency ω0 and A is
a constant. Many materials (e.g. glass, air) have strong resonances in
the ultraviolet. In such materials, do you expect the index of refraction
for blue light to be greater than that for red light? Make a sketch of n as
a function of wavelength for visible light down to the ultraviolet (where
λ0,vac is located).
P2.3 In the Lorentz model, take N = 10^28 m^−3 for the density of bound
electrons in an insulator (note that N is number per volume, not just
number), and a single transition at ω0 = 6 × 10^15 rad/sec (in the UV),
and damping γ = ω0/5 (quite broad). Assume E0 is 10^4 V/m.
For three frequencies ω = ω0 − 2γ, ω = ω0, and ω = ω0 + 2γ find the magnitude
and phase (relative to the phase of E0 e^{i(k·r−ωt)}) of the following
quantities. Give correct SI units with each quantity. You don’t need to
worry about vector directions.
(a) The charge displacement amplitude rmicro (2.35)
(b) The polarization P(ω)
(c) The susceptibility χ(ω). What would the susceptibility be for twice
the E-field strength as before?
For the following no phase is needed:
(d) Find n and κ at the three frequencies. You will have to solve for the
real and imaginary parts of (n + iκ)^2 = 1 + χ(ω).
(e) Find the three speeds of light in terms of c. Find the three wave-
lengths λ.
(f) Find how far light penetrates into the material before only 1/e of the
amplitude of E remains. Find how far light penetrates into the material
before only 1/e of the intensity I remains.
P2.4 (a) Use a computer graphing program and the Lorentz model to plot n
and κ as a function of frequency ω for a dielectric (i.e. obtain graphs
such as the ones in Fig. 2.5). Use these parameters to keep things
P2.6 Use (2.27), (2.29), and (2.48) to estimate the index of silver at λ =
633 nm. The density of free electrons in silver is N = 5.86 × 10^28 m^−3 and
the DC conductivity is σ = 6.62 × 10^7 C^2/(J · m · s). Compare with the
actual index given in P2.5.
Answer: n + i κ = 0.02 + i 4.50
P2.7 The dielectric model and the conductor model give identical results
for n in the case of a low-density plasma where there is no restoring
force (i.e. ω0 = 0) and no dragging term (i.e. γ = 0). Use this to model
the ionosphere (the uppermost part of the atmosphere that is ionized
by solar radiation to form a low-density plasma).
(a) If the index of refraction of the ionosphere is n = 0.9 for an FM
station at ν = ω/2π = 100 MHz, calculate the number of free electrons
per cubic meter.
(b) What is the complex refractive index of the ionosphere for radio
waves at 1160 kHz (KSL radio station)? Is this frequency above or below
the plasma frequency? Assume the same density of free electrons as in
part (a).
For your information, AM radio reflects better than FM radio from the
ionosphere (like visible light from a metal mirror). At night, the lower
layer of the ionosphere goes away so that AM radio waves reflect from
a higher layer.
P2.9 In the case of a linearly-polarized plane wave, where the phase of each
vector component of E0 is the same, re-derive (2.61) directly from the
real field (2.21). For simplicity, you may ignore absorption (i.e. κ ≅ 0).
HINT: The time-average of cos^2(k · r − ωt + φ) is 1/2.
P2.10 (a) Find the intensity (in W/cm^2) produced by a short laser pulse (linearly
polarized) with duration ∆t = 2.5 × 10^−14 s and energy E = 100 mJ,
focused in vacuum to a round spot with radius r = 5 µm.
(b) What is the peak electric field (in V/Å)?
HINT: The SI units of electric field are N/C = V/m.
(c) What is the peak magnetic field (in T = kg/(s · C))?
P2.11 (a) What is the intensity (in W/cm^2) on the retina when looking directly
at the sun? Assume that the eye’s pupil has a radius r_pupil = 1 mm.
Take the Sun’s irradiance at the earth’s surface to be 1.4 kW/m^2, and
neglect refractive index (i.e. set n = 1). HINT: The Earth-Sun distance
is d_o = 1.5 × 10^8 km and the pupil-retina distance is d_i = 22 mm. The
radius of the Sun r_Sun = 7.0 × 10^5 km is de-magnified on the retina
according to the ratio d_i/d_o.
(b) What is the intensity at the retina when looking directly into a
1 mW HeNe laser? Assume that the smallest radius of the laser beam
is r_waist = 0.5 mm positioned d_o = 2 m in front of the eye, and that the
entire beam enters the pupil. Compare with part (a).
P2.12 Show that the magnetic field of an intense laser with λ = 1 µm becomes
important for a free electron oscillating in the field at intensities above
10^18 W/cm^2. This marks the transition to relativistic physics. Nevertheless,
for convenience, use classical physics in making the estimate.
HINT: At lower intensities, the oscillating electric field dominates, so
the electron motion can be thought of as arising solely from the electric
field. Use this motion to calculate the magnetic force on the moving
electron, and compare it to the electric force. The forces become
comparable at 10^18 W/cm^2.
P2.13 The CIE1931 RGB color matching functions r̄(λ), ḡ(λ), and b̄(λ) can be
transformed using (2.64) to obtain color matching functions for the
XYZ basis: x̄(λ), ȳ(λ), and z̄(λ), plotted in Fig. 2.12. As with the RGB
color matching functions, the XYZ color matching functions can be
used to calculate the color coordinates in the XYZ basis for an arbitrary
spectrum:
X = \int I(\lambda)\,\bar{x}\,d\lambda \qquad Y = \int I(\lambda)\,\bar{y}\,d\lambda \qquad Z = \int I(\lambda)\,\bar{z}\,d\lambda    (2.87)
(The function ȳ(λ) was chosen to be exactly the photopic response curve
(shown in Fig. 2.8), so that Y describes the photometric brightness of
the light.)
Obtain a copy of the XYZ color matching functions from www.cvrl.org
and calculate the XYZ color coordinates for the spectrum
I(\lambda) = I_0\,e^{-(\lambda - 500\,\mathrm{nm})^2}
Figure 2.12 Color matching functions for the CIE XYZ color space (horizontal axis: wavelength in nm, 400 to 700).
P2.14 The color space you’ve probably encountered most is sRGB, used to rep-
resent color on computer displays. The sRGB coordinates are related
to the XYZ coordinates by the transformation
Chapter 3 Reflection and Refraction
material on the right as depicted in Fig. 3.1. When a plane wave traveling in
the direction ki is incident on the boundary from the left, it gives rise to a reflected
plane wave traveling in the direction kr and a transmitted plane wave traveling in the
direction kt . The incident and reflected waves exist only to the left of the material
interface, and the transmitted wave exists only to the right of the interface. The
angles θi , θr , and θt give the angles that each respective wave vector (ki , kr , and
kt ) makes with the normal to the interface.
For simplicity, we’ll assume that both of the materials are isotropic here.
(Chapter 5 discusses refraction for anisotropic materials.) In this case, ki , kr , and
kt all lie in a single plane, referred to as the plane of incidence (i.e. the plane
represented by the surface of this page). We are free to orient our coordinate
system in many different ways (and every textbook seems to do it differently!).2
We choose the y–z plane to be the plane of incidence, with the z-direction normal
to the interface and the x-axis pointing into the page.
The electric field vector for each plane wave is confined to a plane perpendic-
ular to its wave vector. We are free to decompose the field vector into arbitrary
components as long as they are perpendicular to the wave vector. It is customary
to choose one of the electric field vector components to be that which lies within
the plane of incidence. We call this p-polarized light, where p stands for parallel to
the plane of incidence. The remaining electric field vector component is directed
normal to the plane of incidence and is called s-polarized light. The s stands for
senkrecht, a German word meaning perpendicular.
Using this system, we can decompose the electric field vector Ei into its
p-polarized component E_i^(p) and its s-polarized component E_i^(s), as depicted in
Fig. 3.1. The s component E_i^(s) is represented by the tail of an arrow pointing
into the page, or the x-direction in our convention. The other fields Er and Et
are similarly split into s and p components as indicated in Fig. 3.1. All field
components are considered to be positive when they point in the direction of
their respective arrows.3 Note that the s-polarized components are parallel for
all three plane waves, whereas the p-polarized components are not (except at
normal incidence) because each plane wave travels in a different direction.
Figure 3.1 Incident, reflected, and transmitted plane wave fields at a material interface. (Axis labels: z-axis; x-axis directed into page.)
By inspection of Fig. 3.1, we can write the various wave vectors in terms of the
ŷ and ẑ unit vectors:
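\mathbf{k}_i = k_i \left(\hat{y}\sin\theta_i + \hat{z}\cos\theta_i\right)
\mathbf{k}_r = k_r \left(\hat{y}\sin\theta_r - \hat{z}\cos\theta_r\right)    (3.1)
\mathbf{k}_t = k_t \left(\hat{y}\sin\theta_t + \hat{z}\cos\theta_t\right)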
Also by inspection of Fig. 3.1 (following the conventions for the electric fields
indicated by the arrows), we can write the incident, reflected, and transmitted
2 For example, our convention is different than that used by E. Hecht, Optics, 3rd ed., Sect. 4.6.2
Each field has the form (2.8), and we have utilized the k-vectors (3.1) in the
exponents of (3.2).
Figure 3.2 Animation of s- and p-polarized fields incident on an interface as the angle of incidence is varied.
Now we are ready to connect the fields on one side of the interface to the
fields on the other side. This is done using boundary conditions. As explained
in appendix 3.A, Maxwell’s equations require that the components of E that are
parallel to the interface must be the same on either side of the boundary. In
our coordinate system, the x̂ and ŷ components are parallel to the interface, and
z = 0 defines the interface. This means that at z = 0 the x̂ and ŷ components
of the combined incident and reflected fields must equal the corresponding
components of the transmitted field:
\left[E_i^{(p)}\hat{y}\cos\theta_i + \hat{x}E_i^{(s)}\right] e^{i(k_i y\sin\theta_i - \omega_i t)} + \left[E_r^{(p)}\hat{y}\cos\theta_r + \hat{x}E_r^{(s)}\right] e^{i(k_r y\sin\theta_r - \omega_r t)} = \left[E_t^{(p)}\hat{y}\cos\theta_t + \hat{x}E_t^{(s)}\right] e^{i(k_t y\sin\theta_t - \omega_t t)}    (3.3)
Since this equation must hold for all conceivable values of t and y, we are com-
pelled to set all the phase factors in the complex exponentials equal to each other.
The time portion of the phase factors requires the frequency of all waves to be the
same:
ωi = ωr = ωt ≡ ω (3.4)
(We could have guessed that all frequencies would be the same; otherwise wave
fronts would be annihilated or created at the interface.) Similarly, equating the
spatial terms in the exponents of (3.3) requires
k_i\sin\theta_i = k_r\sin\theta_r = k_t\sin\theta_t    (3.5)
Now recall from (2.19) the relations k_i = k_r = n_i ω/c and k_t = n_t ω/c. With these
relations, (3.5) yields the law of reflection
\theta_r = \theta_i    (3.6)
and Snell’s law
n_i\sin\theta_i = n_t\sin\theta_t    (3.7)
The three angles θi , θr , and θt are not independent. The reflected angle matches
the incident angle, and the transmitted angle obeys Snell’s law. The phenomenon
of refraction refers to the fact that θi and θt are different.
Willebrord Snell (or Snellius) (1580–1626, Dutch) was an astronomer and mathematician born in Leiden, Netherlands. In 1613 he succeeded his father as professor of mathematics at the University of Leiden. He was an accomplished mathematician, developing a new method for calculating π as well as an improved method for measuring the circumference of the earth. He is most famous for his rediscovery of the law of refraction in 1621. (The law was known (in table form) to the ancient Greek mathematician Ptolemy, to Persian engineer Ibn Sahl (900s), and to Polish philosopher Witelo (1200s).) Snell authored several books, including one on trigonometry, published a year after his death. (Wikipedia)
Because the exponents are all identical, (3.3) reduces to two relatively simple
equations (one for each dimension, x̂ and ŷ):
E_i^{(s)} + E_r^{(s)} = E_t^{(s)}    (3.8)
and
\left(E_i^{(p)} + E_r^{(p)}\right)\cos\theta_i = E_t^{(p)}\cos\theta_t    (3.9)
We have derived these equations from the boundary condition (3.52) on the
parallel component of the electric field. This set of equations has four unknowns
(E_r^(p), E_r^(s), E_t^(p), and E_t^(s)), assuming that we pick the incident fields. We require
two further equations to solve the system. These are obtained using the separate
boundary condition on the parallel component of magnetic fields given in (3.56)
(also discussed in appendix 3.A).
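The law of reflection (3.6) and Snell’s law (3.7) are easy to evaluate numerically. The following Python sketch solves (3.7) for the transmitted angle; the function names and the air-to-glass example values (n_i = 1, n_t = 1.5, θi = 30°) are illustrative assumptions.

```python
import math


def reflected_angle(theta_i):
    """Law of reflection (3.6): the reflected angle equals the incident angle."""
    return theta_i


def transmitted_angle(n_i, n_t, theta_i):
    """Solve Snell's law (3.7) for theta_t. Angles are in radians.

    Raises ValueError when (n_i/n_t) sin(theta_i) exceeds one, in which
    case no real transmitted angle exists.
    """
    s = n_i * math.sin(theta_i) / n_t
    if abs(s) > 1.0:
        raise ValueError("no real transmitted angle for these inputs")
    return math.asin(s)


theta_i = math.radians(30.0)
theta_r = reflected_angle(theta_i)
theta_t = transmitted_angle(1.0, 1.5, theta_i)
print(f"theta_r = {math.degrees(theta_r):.3f} deg")
print(f"theta_t = {math.degrees(theta_t):.3f} deg")
```

Light entering the denser medium bends toward the normal, so θt comes out smaller than θi here.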
From Faraday’s law (1.3), we have for a plane wave (see (2.56))
\mathbf{B} = \frac{\mathbf{k}\times\mathbf{E}}{\omega} = \frac{n}{c}\,\hat{u}\times\mathbf{E}    (3.10)
where û ≡ k/k is a unit vector in the direction of k. We have also utilized (2.19)
for a real index. This expression is useful for writing Bi , Br , and Bt in terms of the
electric field components that we have already introduced. When injecting (3.1)
and (3.2) into (3.10), the incident, reflected, and transmitted magnetic fields turn
out to be
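\mathbf{B}_i = \frac{n_i}{c}\left[-\hat{x}E_i^{(p)} + E_i^{(s)}\left(-\hat{z}\sin\theta_i + \hat{y}\cos\theta_i\right)\right] e^{i[k_i(y\sin\theta_i + z\cos\theta_i) - \omega_i t]}
\mathbf{B}_r = \frac{n_r}{c}\left[\hat{x}E_r^{(p)} + E_r^{(s)}\left(-\hat{z}\sin\theta_r - \hat{y}\cos\theta_r\right)\right] e^{i[k_r(y\sin\theta_r - z\cos\theta_r) - \omega_r t]}    (3.11)
\mathbf{B}_t = \frac{n_t}{c}\left[-\hat{x}E_t^{(p)} + E_t^{(s)}\left(-\hat{z}\sin\theta_t + \hat{y}\cos\theta_t\right)\right] e^{i[k_t(y\sin\theta_t + z\cos\theta_t) - \omega_t t]}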
Next, we apply the boundary condition (3.56), namely that the components of B
parallel to the interface (i.e. in the x̂ and ŷ dimensions) are the same4 on either
side of the plane z = 0. Since we already know that the exponents are all equal
and that θr = θi and n i = n r , the boundary condition gives
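\frac{n_i}{c}\left[-\hat{x}E_i^{(p)} + E_i^{(s)}\hat{y}\cos\theta_i\right] + \frac{n_i}{c}\left[\hat{x}E_r^{(p)} - E_r^{(s)}\hat{y}\cos\theta_i\right] = \frac{n_t}{c}\left[-\hat{x}E_t^{(p)} + E_t^{(s)}\hat{y}\cos\theta_t\right]    (3.12)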
As before, (3.12) reduces to two relatively simple equations (one for the x̂ dimension
and one for the ŷ dimension):
n_i\left(E_i^{(p)} - E_r^{(p)}\right) = n_t E_t^{(p)}    (3.13)
and
n_i\left(E_i^{(s)} - E_r^{(s)}\right)\cos\theta_i = n_t E_t^{(s)}\cos\theta_t    (3.14)
These two equations together with (3.8) and (3.9) allow us to solve for the reflected
fields Er and transmitted fields Et for the s and p polarization components. However,
(3.8), (3.9), (3.13), and (3.14) are not yet in their most convenient form.
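To see that the boundary conditions really do determine the unknown fields, the s-polarized pair (3.8) and (3.14) can be treated as a 2×2 linear system and solved directly. The Python sketch below does this with Cramer’s rule; it assumes real indices and a real transmitted angle, and the function name and example values are illustrative only.

```python
import math


def solve_s_polarization(n_i, n_t, theta_i, e_i=1.0):
    """Solve (3.8) and (3.14) for the reflected and transmitted s-fields.

    With a = n_i cos(theta_i) and b = n_t cos(theta_t), the unknowns
    (E_r, E_t) satisfy
        -E_r +   E_t = E_i        from (3.8)
       a E_r + b E_t = a E_i      from (3.14)
    """
    theta_t = math.asin(n_i * math.sin(theta_i) / n_t)   # Snell's law (3.7)
    a = n_i * math.cos(theta_i)
    b = n_t * math.cos(theta_t)
    # Cramer's rule for the matrix [[-1, 1], [a, b]]
    det = -b - a
    e_r = (e_i * b - a * e_i) / det
    e_t = (-a * e_i - a * e_i) / det
    return e_r, e_t


# Air to glass at normal incidence (illustrative values)
e_r, e_t = solve_s_polarization(1.0, 1.5, 0.0)
print(f"E_r/E_i = {e_r:+.3f}, E_t/E_i = {e_t:.3f}")

# Oblique incidence: the solution should still satisfy (3.8)
e_r45, e_t45 = solve_s_polarization(1.0, 1.5, math.radians(45.0))
print(f"E_i + E_r - E_t at 45 deg = {1.0 + e_r45 - e_t45:.2e}")
```

The same approach applies to the p-polarized pair (3.9) and (3.13).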
4 We assume the permeability µ0 is the same everywhere (no magnetic effects).
3.2 The Fresnel Coefficients
p-polarized light (see P3.1).
he was eight years old, but by age sixteen he excelled and entered the École Polytechnique where he earned distinction. As a young man, Fresnel began a successful career as an engineer, but he lost his post in 1814 when Napoleon returned to power. (Fresnel had supported the Bourbons.) This difficult year was when Fresnel turned his attention to optics. Fresnel became a major proponent of the wave theory of light and four years later wrote a paper on diffraction for which he was awarded a prize by the French Academy of Sciences. A year later he was appointed commissioner of lighthouses, which motivated the invention of the Fresnel lens (still used in many commercial applications). Fresnel was underappreciated before his untimely death from tuberculosis.
Example 3.1
Calculate the ratio of transmitted field to the incident field and the ratio of the
reflected field to incident field for s-polarized light.
Solution: We use (3.8)
E_i^{(s)} + E_r^{(s)} = E_t^{(s)}
and (3.14), which with the help of Snell’s law is written
E_i^{(s)} - E_r^{(s)} = \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\,E_t^{(s)}    (3.15)
If we add these two equations, we get
The ratios of the reflected and transmitted field components to the incident
field components are specified by the following coefficients, called the Fresnel
coefficients:
All of the above forms of the Fresnel coefficients are potentially useful, depending
on the problem at hand. Remember that the angles in the coefficients are not
independently chosen, but are subject to Snell’s law (3.7). (The right-most expression
for each coefficient is obtained from the first form using Snell’s law.)
The Fresnel coefficients pin down the electric field amplitudes on the two
sides of the boundary. They also keep track of phase shifts at a boundary. In
Fig. 3.3 we have plotted the Fresnel coefficients for the case of an air-glass
interface. Notice that the reflection coefficients are sometimes negative in this plot,
which corresponds to a phase shift of π upon reflection (remember e^{iπ} = −1).
Later we will see that when absorbing materials are encountered, more
complicated phase shifts can arise due to the complex index of refraction.
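The sign behavior described above can be tabulated with a few lines of Python. The sketch below writes the Fresnel coefficients in terms of refractive indices; this index form is an assumption on our part, chosen to agree with the sine and cosine forms used for (4.1) and with the normal-incidence limits quoted in P3.2. The function name and the air-glass values (n_i = 1, n_t = 1.5) are illustrative.

```python
import math


def fresnel_coefficients(n_i, n_t, theta_i):
    """Return (r_s, t_s, r_p, t_p) for real indices and a real theta_t."""
    theta_t = math.asin(n_i * math.sin(theta_i) / n_t)   # Snell's law (3.7)
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    r_s = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
    t_s = 2.0 * n_i * ci / (n_i * ci + n_t * ct)
    r_p = (n_i * ct - n_t * ci) / (n_i * ct + n_t * ci)
    t_p = 2.0 * n_i * ci / (n_i * ct + n_t * ci)
    return r_s, t_s, r_p, t_p


for deg in (0, 30, 60, 80):
    r_s, t_s, r_p, t_p = fresnel_coefficients(1.0, 1.5, math.radians(deg))
    print(f"{deg:2d} deg: r_s={r_s:+.3f}  t_s={t_s:.3f}  "
          f"r_p={r_p:+.3f}  t_p={t_p:.3f}")
```

For this interface r_s stays negative at every angle, while r_p starts negative and changes sign at larger angles.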
because the two electric fields are orthogonal and cannot interfere with each
other. The total reflected intensity is therefore
I_r^{(total)} = I_r^{(s)} + I_r^{(p)} = R_s I_i^{(s)} + R_p I_i^{(p)}    (3.23)
where the incident intensity is given by (2.62):
I_i^{(total)} = I_i^{(s)} + I_i^{(p)} = \frac{1}{2} n_i \epsilon_0 c \left[\left|E_i^{(s)}\right|^2 + \left|E_i^{(p)}\right|^2\right]    (3.24)
Since intensity is power per area, we can rewrite (3.23) as incident and
reflected power:
P_r^{(total)} = P_r^{(s)} + P_r^{(p)} = R_s P_i^{(s)} + R_p P_i^{(p)}    (3.25)
Using this expression and requiring that energy be conserved (i.e. P_i^(total) = P_r^(total) +
P_t^(total)), we find that the portion of the power that transmits is
P_t^{(total)} = \left(P_i^{(s)} + P_i^{(p)}\right) - \left(P_r^{(s)} + P_r^{(p)}\right) = (1 - R_s)\,P_i^{(s)} + (1 - R_p)\,P_i^{(p)}    (3.26)
From this expression we see that the transmittance (i.e. the fraction of the light
that transmits) for either polarization is
T_s \equiv 1 - R_s \quad\text{and}\quad T_p \equiv 1 - R_p    (3.27)
Figure 3.4 shows typical reflectance and transmittance values for an air-glass
interface.
Figure 3.4 The reflectance and transmittance plotted versus θi for the case of an air-glass interface with n_i = 1 and n_t = 1.5.
You might be surprised at first to learn that
T_s \neq |t_s|^2 \quad\text{and}\quad T_p \neq |t_p|^2    (3.28)
However, recall that the transmitted intensity (in terms of the transmitted fields)
depends also on the refractive index. The Fresnel coefficients t_s and t_p relate the
bare electric fields to each other, whereas the transmitted intensity (similar to
(3.24)) is
I_t^{(total)} = I_t^{(s)} + I_t^{(p)} = \frac{1}{2} n_t \epsilon_0 c \left[\left|E_t^{(s)}\right|^2 + \left|E_t^{(p)}\right|^2\right]    (3.29)
Therefore, we expect T s and T p to depend on the ratio of the refractive indices n t
and n i as well as on the squares of t s and t p .
There is another more subtle reason for the inequalities in (3.28). Consider
a lateral strip of light associated with a plane wave incident upon the material
interface in Fig. 3.5. Upon refraction into the second medium, the strip is seen
to change its width by the factor cos θt / cos θi . This is a geometrical effect, owing
to the change in propagation direction at the interface. The change in direction
alters the intensity (power per area) but not the power. In computing the trans-
mittance, we must remove this geometrical effect from the ratio of the intensities,
which leads to the following transmittance coefficients:
T_s = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\,|t_s|^2
(valid when no total internal reflection)    (3.30)
T_p = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\,|t_p|^2
Figure 3.5 Light refracting into a surface
Note that (3.30) is valid only if a real angle θt exists; it does not hold when the
incident angle exceeds the critical angle for total internal reflection, discussed in
section 3.5. In that situation, we must stick with (3.27).
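A short numerical check makes the role of the extra factor in (3.30) concrete. The Python sketch below computes the reflectance as |r|^2 and the transmittance from (3.30) for an air-glass interface and confirms that they sum to one, whereas |t|^2 alone does not equal the transmittance. The index form of the Fresnel coefficients, the function name, and the example angle are assumptions made for illustration.

```python
import math


def reflectance_and_transmittance(n_i, n_t, theta_i):
    """Return (R_s, T_s, R_p, T_p, t_s, t_p) for real indices below theta_c."""
    theta_t = math.asin(n_i * math.sin(theta_i) / n_t)   # Snell's law (3.7)
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    r_s = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
    t_s = 2.0 * n_i * ci / (n_i * ci + n_t * ct)
    r_p = (n_i * ct - n_t * ci) / (n_i * ct + n_t * ci)
    t_p = 2.0 * n_i * ci / (n_i * ct + n_t * ci)
    # Factor in (3.30): index ratio times the change in strip width
    factor = (n_t * ct) / (n_i * ci)
    return r_s**2, factor * t_s**2, r_p**2, factor * t_p**2, t_s, t_p


R_s, T_s, R_p, T_p, t_s, t_p = reflectance_and_transmittance(
    1.0, 1.5, math.radians(45.0))
print(f"R_s + T_s = {R_s + T_s:.12f}")
print(f"R_p + T_p = {R_p + T_p:.12f}")
print(f"|t_s|^2 = {t_s**2:.4f} versus T_s = {T_s:.4f}")
```

Energy conservation (3.27) and the geometric form (3.30) therefore give the same transmittance whenever θt is real.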
Example 3.2
Show analytically for p-polarized light that R p + Tp = 1, where R p is given by (3.22)
and T p is given by (3.30).
Then
(3.31) into Snell’s law (3.7), we can solve for the incident angle θi that gives rise to
this special circumstance:
n_i\sin\theta_i = n_t\sin\left(\frac{\pi}{2} - \theta_i\right) = n_t\cos\theta_i    (3.32)
The angle that satisfies this equation, in terms of the refractive indices, is
readily found to be
\theta_B = \tan^{-1}\frac{n_t}{n_i}    (3.33)
We have replaced the specific θi with θB in honor of Sir David Brewster who first
discovered the phenomenon. The angle θB is called Brewster’s angle. At Brewster’s
angle, no p-polarized light reflects (see L3.4). Physically, the p-polarized light
cannot reflect because kr and kt are perpendicular. A reflection would require
the microscopic dipoles at the surface of the second material to radiate along
their axes, which they cannot do. Maxwell’s equations ‘know’ about this, and so
everything is nicely consistent.
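Brewster’s angle (3.33) is simple to verify numerically. The Python sketch below computes θB for an air-glass interface, checks that the reflected and transmitted directions are perpendicular (θB + θt = 90°), and confirms that r_p vanishes there. The index form used for r_p, the function names, and the value n_t = 1.5 are illustrative assumptions.

```python
import math


def brewster_angle(n_i, n_t):
    """Brewster's angle (3.33), in radians."""
    return math.atan(n_t / n_i)


def r_p(n_i, n_t, theta_i):
    """p-polarized reflection coefficient for real indices (assumed index form)."""
    theta_t = math.asin(n_i * math.sin(theta_i) / n_t)   # Snell's law (3.7)
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    return (n_i * ct - n_t * ci) / (n_i * ct + n_t * ci)


n_i, n_t = 1.0, 1.5
theta_b = brewster_angle(n_i, n_t)
theta_t = math.asin(n_i * math.sin(theta_b) / n_t)
print(f"theta_B = {math.degrees(theta_b):.2f} deg")
print(f"theta_B + theta_t = {math.degrees(theta_b + theta_t):.6f} deg")
print(f"r_p at theta_B = {r_p(n_i, n_t, theta_b):.2e}")
```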
3.5 Total Internal Reflection
Figure 3.6 The intensity radiation pattern of an oscillating dipole as a function of angle. Note that the dipole does not radiate along the axis of oscillation, giving rise to Brewster’s angle for reflection.
From Snell’s law (3.7), we can compute the transmitted angle in terms of the
incident angle:
\theta_t = \sin^{-1}\left(\frac{n_i}{n_t}\sin\theta_i\right)    (3.34)
The angle θt is real only if the argument of the inverse sine is less than or equal to
one. If n_i > n_t, we can find a critical angle at which the argument begins to exceed
one:
\theta_c \equiv \sin^{-1}\frac{n_t}{n_i}    (3.35)
When θi > θc , then there is total internal reflection and we can directly show that
R_s = 1 and R_p = 1 (see P3.9).5 To demonstrate this, one computes the Fresnel
coefficients (3.18) and (3.20) while employing the following substitutions:
\sin\theta_t = \frac{n_i}{n_t}\sin\theta_i \quad \text{(Snell's law)}    (3.36)
and
\cos\theta_t = i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1} \quad (\theta_i > \theta_c)    (3.37)
(see P0.19).
In this case, θt is a complex number. However, we do not assign geometrical
significance to it in terms of any direction. Actually, we don’t even need to know
the value for θt ; we need only the values for sin θt and cos θt , as specified in (3.36)
and (3.37). Even though sin θt is greater than one and cos θt is imaginary, we can
use their values to compute r s , r p , t s , and t p . (Complex notation is wonderful!)
5 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 1.5.4 (Cambridge University Press, 1999).
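Upon substitution of (3.36) and (3.37) into the Fresnel reflection coefficients
(3.18) and (3.20) we obtain
r_s = \frac{\frac{n_i}{n_t}\cos\theta_i - i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\frac{n_i}{n_t}\cos\theta_i + i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \quad (\theta_i > \theta_c)    (3.38)
and
r_p = -\frac{\cos\theta_i - i\frac{n_i}{n_t}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\cos\theta_i + i\frac{n_i}{n_t}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \quad (\theta_i > \theta_c)    (3.39)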
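These Fresnel coefficients can be manipulated (see P3.9) into the forms
r_s = \exp\left\{-2i\tan^{-1}\left[\frac{n_t}{n_i\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\right]\right\} \quad (\theta_i > \theta_c)    (3.40)
and
r_p = -\exp\left\{-2i\tan^{-1}\left[\frac{n_i}{n_t\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\right]\right\} \quad (\theta_i > \theta_c)    (3.41)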
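The substitutions (3.36) and (3.37) translate directly into complex arithmetic. The Python sketch below evaluates (3.38) and (3.39) for glass to air at θi = 45° (the values n_i = 1.5 and n_t = 1 are borrowed from P3.12 purely as an illustration), then compares the results with the polar forms (3.40) and (3.41). Function names are hypothetical.

```python
import cmath
import math


def tir_coefficients(n_i, n_t, theta_i):
    """r_s and r_p beyond the critical angle, from (3.38) and (3.39)."""
    ratio = n_i / n_t
    root = math.sqrt(ratio**2 * math.sin(theta_i)**2 - 1.0)  # real if theta_i > theta_c
    cos_t = 1j * root                                           # (3.37)
    ci = math.cos(theta_i)
    r_s = (ratio * ci - cos_t) / (ratio * ci + cos_t)
    r_p = -(ci - ratio * cos_t) / (ci + ratio * cos_t)
    return r_s, r_p


def tir_polar_forms(n_i, n_t, theta_i):
    """The pure-phase expressions (3.40) and (3.41)."""
    root = math.sqrt((n_i / n_t)**2 * math.sin(theta_i)**2 - 1.0)
    ci = math.cos(theta_i)
    r_s = cmath.exp(-2j * math.atan(n_t * root / (n_i * ci)))
    r_p = -cmath.exp(-2j * math.atan(n_i * root / (n_t * ci)))
    return r_s, r_p


n_i, n_t = 1.5, 1.0
theta_c = math.asin(n_t / n_i)            # critical angle (3.35)
theta_i = math.radians(45.0)              # larger than theta_c for these indices
r_s, r_p = tir_coefficients(n_i, n_t, theta_i)
r_s_polar, r_p_polar = tir_polar_forms(n_i, n_t, theta_i)

print(f"theta_c = {math.degrees(theta_c):.2f} deg")
print(f"|r_s| = {abs(r_s):.12f}, |r_p| = {abs(r_p):.12f}")
print(f"phase of r_s = {math.degrees(cmath.phase(r_s)):.3f} deg")
print(f"phase of r_p = {math.degrees(cmath.phase(r_p)):.3f} deg")
```

Both magnitudes equal one, so R_s = R_p = 1, while the two polarizations pick up different phases on reflection.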
Figure 3.8 plots the evanescent wave described by (3.42) along with the associ-
ated incident wave. The phase of the evanescent wave indicates that it propagates
parallel to the boundary (in the y-dimension). Its strength decays exponentially
away from the boundary (in the z-dimension). We leave the calculation of t s and
t p as an exercise (P3.10).
r_s = |r_s|\,e^{i\phi_s}    (3.47)
and
r_p = |r_p|\,e^{i\phi_p}    (3.48)
We refrain from putting (3.45) and (3.46) into this form using the general
expressions; we would get a big mess. It is a good idea to let your calculator or
a computer do it after a specific value for N ≡ n + iκ is chosen. An important
point to notice is that the phases upon reflection can be very different for s- and
p-polarization components (i.e. φp and φs can be very different). This is true in
general, even when the reflectivity is high (i.e. |r_s| and |r_p| on the order of unity).
Brewster’s angle exists also for surfaces with complex refractive index. However,
in general the expressions (3.46) and (3.48) do not go to zero at any incident
angle θi . Rather, the reflection of p-polarized light can go through a minimum at
some angle θi , which we refer to as Brewster’s angle (see Fig. 3.9). This minimum
is best found numerically since the general expression for |r_p| in terms of n and κ
and as a function of θi can be unwieldy.
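A numerical search of this kind takes only a few lines. The Python sketch below assumes incidence from vacuum onto a material with complex index N = n + iκ and uses the real-index Fresnel forms with N and a complex cos θt substituted in; we take this to be equivalent to (3.45) and (3.46), which is an assumption. It scans the incident angle for the minimum of |r_p| and prints the polar forms (3.47) and (3.48) there. The silver values are borrowed from P3.15 as an illustration, and the function names and 0.1° scan step are arbitrary choices.

```python
import cmath
import math


def complex_index_reflection(n_complex, theta_i):
    """Return (r_s, r_p) for light incident from vacuum on index N = n + i*kappa."""
    ci = math.cos(theta_i)
    # Complex cos(theta_t) from Snell's law with n_i = 1 (principal square root)
    ct = cmath.sqrt(1.0 - (math.sin(theta_i) / n_complex)**2)
    r_s = (ci - n_complex * ct) / (ci + n_complex * ct)
    r_p = (ct - n_complex * ci) / (ct + n_complex * ci)
    return r_s, r_p


N_SILVER = complex(0.13, 4.0)   # illustrative values from P3.15

# Scan 0.0 to 89.9 degrees in 0.1 degree steps for the minimum of |r_p|
best_deg, best_abs = 0.0, float("inf")
for step in range(0, 900):
    deg = step / 10.0
    _, r_p = complex_index_reflection(N_SILVER, math.radians(deg))
    if abs(r_p) < best_abs:
        best_deg, best_abs = deg, abs(r_p)

r_s_min, r_p_min = complex_index_reflection(N_SILVER, math.radians(best_deg))
r_p_normal = complex_index_reflection(N_SILVER, 0.0)[1]
print(f"minimum |r_p| = {best_abs:.4f} at theta_i = {best_deg:.1f} deg")
print(f"r_s = {abs(r_s_min):.4f} exp(i {cmath.phase(r_s_min):+.4f})")
print(f"r_p = {abs(r_p_min):.4f} exp(i {cmath.phase(r_p_min):+.4f})")
```

For a good reflector such as silver the minimum is shallow: |r_p| dips only slightly below its normal-incidence value rather than dropping to zero.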
in the material with index n 2 . We have assumed that the rectangle is small enough
that the fields are uniform within the half rectangle on either side of the boundary.
Next, we shrink the loop down until it has zero surface area by letting the
lengths ℓ1 and ℓ2 go to zero. In this situation, the right-hand side of Faraday’s law
(3.49) goes to zero
\int_S \mathbf{B}\cdot\hat{n}\,da \to 0    (3.51)
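\oint_C \mathbf{B}\cdot d\boldsymbol{\ell} = \mu_0 \int_S \left(\mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right)\cdot\hat{n}\,da    (3.53)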
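As before, we are able to perform the path integration on the left-hand side for
the geometry depicted in the figure, which gives
\oint \mathbf{B}\cdot d\boldsymbol{\ell} = B_{1\parallel}d - B_{1\perp}\ell_1 - B_{2\perp}\ell_2 - B_{2\parallel}d + B_{2\perp}\ell_2 + B_{1\perp}\ell_1 = \left(B_{1\parallel} - B_{2\parallel}\right)d    (3.54)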
The notation for parallel and perpendicular components on either side of the
interface is similar to that used in (3.50).
Again, we can shrink the loop down until it has zero surface area by letting the
lengths ℓ1 and ℓ2 go to zero. In this situation, the right-hand side of (3.53) goes to
zero (ignoring the possibility of surface currents):
\int_S \left(\mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right)\cdot\hat{n}\,da \to 0    (3.55)
8 This form can be obtained from (1.4) by integration over the surface S in Fig. 3.10 and applying
Exercises
P3.1 Derive the Fresnel coefficients (3.20) and (3.21) for p-polarized light.
P3.2 Verify that the alternative forms given in (3.18)–(3.21) are equivalent
(given Snell’s law). Show that at normal incidence (i.e. θi = θt = 0)
the Fresnel coefficients reduce to
\lim_{\theta_i\to 0} r_s = \lim_{\theta_i\to 0} r_p = -\frac{n_t - n_i}{n_t + n_i} \quad\text{and}\quad \lim_{\theta_i\to 0} t_s = \lim_{\theta_i\to 0} t_p = \frac{2n_i}{n_t + n_i}
P3.3 Undoubtedly the most important interface in optics is when air meets
glass. Use a computer to make the following plots for this interface as a
function of the incident angle. Use n i = 1 for air and n t = 1.6 for glass.
Explicitly label Brewster’s angle on all of the applicable graphs.
(a) r p and t p (plot together on same graph)
(b) R p and T p (plot together on same graph)
(c) r s and t s (plot together on same graph)
(d) R s and T s (plot together on same graph)
L3.4 (a) In the laboratory, measure the reflectance for both s and p polarized
light from a flat glass surface at about ten points. You can normalize
the detector by placing it in the incident beam of light before the glass
surface. Especially watch for Brewster’s angle (described in section 3.4).
Figure 3.11 illustrates the experimental setup.
[Figure 3.11: laser, polarizer, uncoated glass on rotation stage, and high-sensitivity detector; slide the detector with the beam.]
Take the index of refraction for glass to be n t = 1.54 and the index for air
to be one. Plot this theoretical calculation as a smooth line on a graph.
Plot your experimental data from (a) as points on this same graph (not
points connected by lines).
P3.8 Diamonds have an index of refraction of n = 2.42 which allows total
internal reflection to occur at relatively shallow angles of incidence. Gem
cutters choose facet angles that ensure most of the light entering the
top of the diamond will reflect back out to give the stone its expensive
sparkle. One such cut, the “Eulitz Brilliant” cut, is shown in Fig. 3.13.
[Figure 3.13 labels: A, B; angles 33.4°, 18.0°, 40.5°, 50.6°, 40.5°]
(a) What is the critical angle for diamond?
(b) One way to spot fake diamonds is by noticing reduced brilliance in
(c) What is the phase shift due to reflection for s-polarized light at the
first internal reflection depicted in the figure (incident angle 40.5°) in
diamond? What is the phase shift in fused quartz?
P3.9 Derive (3.40) and (3.41) and show that R s = 1 and R p = 1. HINT: See
problem P0.15.
P3.10 Compute t_s and t_p in the case of total internal reflection. Put your
answer in polar form (i.e. t = |t| e^{iφ}).
P3.12 Light (λvac = 500 nm) reflects internally from a glass surface (n = 1.5)
surrounded by air. The incident angle is θi = 45°. An evanescent wave
travels parallel to the surface on the air side. At what distance from the
surface is the amplitude of the evanescent wave 1/e of its value at the
surface?
P3.15 The complex index for silver is given by n = 0.13 and κ = 4.0. Find r_s
and r_p when reflecting from vacuum (n = 1, κ = 0) at θi = 80° and put
them into the forms (3.47) and (3.48).
Figure 3.14 Geometry for P3.15
9 Are you surprised that the real part of the index can be less than one?
Chapter 4 Multiple Parallel Interfaces
middle region.1
As of yet, we do not know the amplitudes or phases of the net forward and net
backward traveling plane waves in the middle layer. We denote them by E_{1→}^(s) (forward) and
E_{1←}^(s) (backward) or by E_{1→}^(p) and E_{1←}^(p), separated into their s and p components as usual. Similarly,
E_{0←}^(s) and E_{0←}^(p) as well as E_{2→}^(s) and E_{2→}^(p) are understood to include light that ‘leaks’
through the boundaries from the middle region. Thus, we need only concern
ourselves with the five plane waves depicted in Fig. 4.1.
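The various plane-wave fields are connected to each other at the boundaries
via the single-boundary Fresnel coefficients (3.18)–(3.21). At the first surface we
define
r_s^{0\to1} \equiv \frac{\sin\theta_1\cos\theta_0 - \sin\theta_0\cos\theta_1}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1} \qquad r_p^{0\to1} \equiv \frac{\cos\theta_1\sin\theta_1 - \cos\theta_0\sin\theta_0}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}
t_s^{0\to1} \equiv \frac{2\sin\theta_1\cos\theta_0}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1} \qquad t_p^{0\to1} \equiv \frac{2\cos\theta_0\sin\theta_1}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}    (4.1)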
The notation 0 → 1 indicates the first surface from the perspective of starting
on the incident side and propagating towards the middle layer. The Fresnel
coefficients for the backward traveling light approaching the first interface from
within the middle layer are given by
where 1 → 0 again indicates connections at the first interface, but from the
perspective of beginning inside the middle layer. Finally, the single-boundary coeffi-
cients for light approaching the second interface are
The forward-traveling wave in the middle region arises from both a transmission
of the incident wave and a reflection of the backward-traveling wave in the
middle region at the first interface. Using the Fresnel coefficients, we can write
E_{1→}^(s) as the sum of fields arising from E_{0→}^(s) and E_{1←}^(s) as follows:
E_{1\to}^{(s)} = t_s^{0\to1} E_{0\to}^{(s)} + r_s^{1\to0} E_{1\leftarrow}^{(s)}    (4.4)
The factors t_s^{0→1} and r_s^{1→0} are the single-boundary Fresnel coefficients selected
appropriately from (4.1). Similarly, the overall reflected field E_{0←}^(s) is given by the
reflection of the incident field and the transmission of the backward-traveling
field in the middle region according to
E_{0\leftarrow}^{(s)} = r_s^{0\to1} E_{0\to}^{(s)} + t_s^{1\to0} E_{1\leftarrow}^{(s)}    (4.5)
Note that E_{2→}^(s) stands for the transmitted field at the point (y, z) = (0, d); its local
phase can be built into its definition, so there is no need to write an explicit phase.
The backward-traveling plane wave in the middle region arises from the
reflection of the forward-traveling plane wave in that region:
E_{1\leftarrow}^{(s)}\,e^{-ik_1 d\cos\theta_1} = r_s^{1\to2} E_{1\to}^{(s)}\,e^{ik_1 d\cos\theta_1}    (4.7)
As before, E_{1←}^(s) is referenced to the origin (y, z) = (0, 0). Therefore, the factor
e^{i k_{1←}·r} = e^{−i k_1 d cos θ1} is needed at (y, z) = (0, d).
The relations (4.4)–(4.7) permit us to find overall transmission and reflection
coefficients for the two-interface problem.
Example 4.1
Derive the transmission coefficient that connects the final transmitted field to the
incident field for the double-interface problem according to t_s^tot ≡ E_{2→}^(s)/E_{0→}^(s).
2 In the middle region, k_{1→} = k_1 (ŷ sin θ1 + ẑ cos θ1) and k_{1←} = k_1 (ŷ sin θ1 − ẑ cos θ1).
4.2 Two-Interface Transmittance at Sub Critical Angles
E_{1\to}^{(s)} = \frac{E_{2\to}^{(s)}}{t_s^{1\to2}}\,e^{-ik_1 d\cos\theta_1}    (4.8)
E_{1\leftarrow}^{(s)} = E_{2\to}^{(s)}\,\frac{r_s^{1\to2}}{t_s^{1\to2}}\,e^{ik_1 d\cos\theta_1}    (4.9)
Next, substituting both (4.8) and (4.9) into (4.4) yields the connection we seek
between the incident and transmitted fields:
\frac{E_{2\to}^{(s)}}{t_s^{1\to2}}\,e^{-ik_1 d\cos\theta_1} = t_s^{0\to1} E_{0\to}^{(s)} + r_s^{1\to0} E_{2\to}^{(s)}\,\frac{r_s^{1\to2}}{t_s^{1\to2}}\,e^{ik_1 d\cos\theta_1}    (4.10)
The coefficient t_s^tot derived in Example 4.1 connects the amplitude and phase
of the incident field to the amplitude and phase of the transmitted field in a
manner similar to the single-boundary Fresnel coefficients. The numerator of
(4.11) reminds us of the physics of the situation: the field transmits through the
first interface, acquires a phase due to propagating through the middle layer, and
transmits through the second interface. The denominator of (4.11) modifies the
result to account for feedback from multiple reflections in the middle region.3
The overall reflection coefficient is found to be (see P4.1)
Again the equation reminds us of the basic physics, and we did not completely
simplify the expression to make this more apparent. There is an initial reflection
from the first interface. That light is joined by light that transmits through the first
interface (looking at only the numerator of the second term), propagates through
the middle layer, reflects from the second interface, propagates back through the
middle layer, and transmits back through the first interface. The denominator of
the second term accounts for the effects of multiple-reflection feedback.
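As an illustration, the two overall coefficients described above can be evaluated numerically. This is a minimal sketch, not the authors' code: the function names are hypothetical, the single-boundary coefficients are written in the equivalent index form $r_s = (n_i\cos\theta_i - n_t\cos\theta_t)/(n_i\cos\theta_i + n_t\cos\theta_t)$, and the reflection formula is assembled directly from the verbal description (initial reflection plus the transmit, propagate, reflect, propagate, transmit path, divided by the feedback denominator).

```python
import cmath
import math

def fresnel_s(n_i, n_t, cos_i, cos_t):
    """Single-boundary s-polarization Fresnel coefficients, returned as (r, t)."""
    denom = n_i * cos_i + n_t * cos_t
    return (n_i * cos_i - n_t * cos_t) / denom, 2 * n_i * cos_i / denom

def cos_in_layer(n0, theta0, n):
    """cos(theta) in a medium of index n from Snell's law (imaginary if evanescent)."""
    s = n0 * math.sin(theta0) / n
    return cmath.sqrt(1 - s * s)

def double_interface_s(n0, n1, n2, theta0, d, lam_vac):
    """Return (t_tot, r_tot) for s-polarized light crossing a layer of thickness d."""
    c0, c1, c2 = (cos_in_layer(n0, theta0, n) for n in (n0, n1, n2))
    r01, t01 = fresnel_s(n0, n1, c0, c1)
    r10, t10 = fresnel_s(n1, n0, c1, c0)
    r12, t12 = fresnel_s(n1, n2, c1, c2)
    # One-way propagation factor exp(i k1 d cos(theta1)) in the middle layer
    phase = cmath.exp(1j * 2 * math.pi * n1 / lam_vac * d * c1)
    feedback = 1 - r10 * r12 * phase ** 2       # multiple-reflection denominator
    t_tot = t01 * phase * t12 / feedback
    r_tot = r01 + t01 * phase * r12 * phase * t10 / feedback
    return t_tot, r_tot
```

For real indices and angles, a useful sanity check is that $|r^{\mathrm{tot}}|^2$ plus $(n_2\cos\theta_2/n_0\cos\theta_0)\,|t^{\mathrm{tot}}|^2$ comes out equal to one.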
3 For an alternative approach arriving at the same result via an infinite geometric series, see M. Born and E.
Wolf, Principles of Optics, 7th ed., Sect. 7.6.1 (Cambridge University Press, 1999) or G. R. Fowles,
Introduction to Modern Optics, 2nd ed., Sect. 4.1 (New York: Dover, 1975).
Since the transmission coefficient (4.11) has a simpler form than the reflection coefficient (4.12),
it is easier to calculate the total transmittance $T_s^{\mathrm{tot}}$ and obtain the reflectance, if desired,
from the relationship
$$T_s^{\mathrm{tot}} + R_s^{\mathrm{tot}} = 1 \tag{4.13}$$
When the transmitted angle θ2 is real, we may write the fraction of the transmitted
power as in (3.30):
where
$$T_s^{\max} \equiv \frac{T_s^{0\to1}\,T_s^{1\to2}}{\left(1 - \sqrt{R_s^{1\to0}\,R_s^{1\to2}}\right)^2} \tag{4.16}$$
The quantity $T_s^{\max}$ is the maximum possible transmittance of power through the
two surfaces. The single-interface transmittances ($T_s^{0\to1}$ and $T_s^{1\to2}$) and reflectances
($R_s^{1\to0}$ and $R_s^{1\to2}$) are calculated from the single-interface Fresnel coefficients in
the usual way as described in Chapter 3. The numerator of $T_s^{\max}$ represents the
combined transmittances for the two interfaces without considering feedback
due to multiple reflections. The denominator enhances this value to account for
reinforcing feedback in the middle layer.
The phase delay experienced by the plane wave in the middle region is described
by Φ. The term $2k_1 d\cos\theta_1$ represents the phase delay acquired during
round-trip propagation in the middle region. The terms $\delta_{r_s^{1\to0}}$ and $\delta_{r_s^{1\to2}}$ account
for possible phase shifts upon reflection from each interface. They are defined
indirectly from the single-boundary Fresnel reflection coefficients:
$$r_s^{1\to0} = \left|r_s^{1\to0}\right| e^{i\delta_{r_s^{1\to0}}} \quad\text{and}\quad r_s^{1\to2} = \left|r_s^{1\to2}\right| e^{i\delta_{r_s^{1\to2}}} \tag{4.19}$$
If all the indices in the double-boundary system are real, then $\delta_{r_s^{1\to0}}$ and $\delta_{r_s^{1\to2}}$
can only be zero or π (i.e. the coefficients can only be positive or negative real
numbers).
4 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 7.6.1 (Cambridge University Press, 1999).
$F_s$ is called the coefficient of finesse (not to be confused with the reflecting finesse
defined in section 4.6), which determines how strongly the transmittance is
influenced when Φ is varied (for example, through varying $d$ or the wavelength
$\lambda_{\mathrm{vac}}$).
Example 4.2
Consider a ‘beam splitter’ designed for s-polarized light incident on a substrate of
glass (n = 1.5) at 45° as shown in Fig. 4.2. A thin coating of zinc sulfide (n = 2.32)
is applied to the front of the glass to cause about half of the light to reflect. A
magnesium fluoride (n = 1.38) coating is applied to the back surface of the glass to
minimize reflections at that surface.5 Each coating constitutes a separate double-interface
problem. The front coating is deferred to problem P4.5. In this example,
find the highest transmittance possible through the antireflection film at the back
of the ‘beam splitter’ and the smallest possible $d_2$ that accomplishes this for light
with wavelength $\lambda_{\mathrm{vac}}$ = 633 nm.
Figure 4.2 Side view of a beamsplitter.
Solution: For the back coating, we have $n_0 = 1.5$, $n_1 = 1.38$, and $n_2 = 1$. We can
find $\theta_0$ and $\theta_1$ from $\theta_2 = 45^\circ$ using Snell’s law
$$n_1\sin\theta_1 = \sin\theta_2 \;\Rightarrow\; \theta_1 = \sin^{-1}\!\left(\frac{\sin 45^\circ}{1.38}\right) = 30.82^\circ$$
$$n_0\sin\theta_0 = \sin\theta_2 \;\Rightarrow\; \theta_0 = \sin^{-1}\!\left(\frac{\sin 45^\circ}{1.5}\right) = 28.13^\circ$$
Next we calculate the single-boundary Fresnel coefficients:
$$\delta_{r_s^{1\to0}} = \pi, \qquad \delta_{r_s^{1\to2}} = 0$$
5 We ignore possible feedback between the front and rear coatings. Since the antireflection
films are usually imperfect, beam splitter substrates are often slightly wedged so that unwanted
reflections from the second surface travel in a different direction.
$$T_s^{\mathrm{tot}} = \frac{0.960}{1 + 0.0570\,\sin^2\!\left(\dfrac{2k_1 d_2\cos\theta_1 + \pi}{2}\right)}$$
The maximum transmittance occurs when the sine is zero. In that case, $T_s^{\mathrm{tot}}$ =
0.960, meaning that 96% of the light is transmitted. We find the thickness by setting
the argument of the sine to π:
$$2k_1 d_2\cos\theta_1 + \pi = 2\pi$$
$$d_2 = \frac{\lambda_{\mathrm{vac}}}{4 n_1\cos\theta_1} = \frac{633\ \mathrm{nm}}{4(1.38)\cos 30.82^\circ} = 134\ \mathrm{nm}$$
Without the coating (i.e. $d_2 = 0$), the transmittance through the back surface
would be 0.908, so the coating does give an improvement.
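The numbers in Example 4.2 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable names are made up, the reflection coefficient is written in index form, and the coefficient of finesse is taken to be $F_s = 4\sqrt{R_s^{1\to0}R_s^{1\to2}}/(1-\sqrt{R_s^{1\to0}R_s^{1\to2}})^2$, a form that is assumed here because its defining equation is not reproduced above.

```python
import math

n0, n1, n2 = 1.5, 1.38, 1.0           # glass, MgF2 coating, air
theta2 = math.radians(45.0)
lam_vac = 633.0                       # nm

# Snell's law gives the angles inside the coating and inside the glass
theta1 = math.asin(n2 * math.sin(theta2) / n1)
theta0 = math.asin(n2 * math.sin(theta2) / n0)

def r_s(n_i, n_t, th_i, th_t):
    """Single-boundary s-polarized reflection coefficient (real angles)."""
    a, b = n_i * math.cos(th_i), n_t * math.cos(th_t)
    return (a - b) / (a + b)

r10 = r_s(n1, n0, theta1, theta0)     # negative, so its phase is pi
r12 = r_s(n1, n2, theta1, theta2)     # positive, so its phase is 0
R10, R12 = r10 ** 2, r12 ** 2
T01, T12 = 1 - R10, 1 - R12           # lossless interfaces, R01 = R10

root = math.sqrt(R10 * R12)
T_max = T01 * T12 / (1 - root) ** 2   # equation (4.16)
F = 4 * root / (1 - root) ** 2        # assumed form of the finesse coefficient
d2 = lam_vac / (4 * n1 * math.cos(theta1))   # thinnest coating, in nm
T_uncoated = T_max / (1 + F)          # d2 = 0 makes sin^2(pi/2) = 1

print(round(T_max, 3), round(F, 3), round(d2), round(T_uncoated, 3))
```

The printed values should agree with the 0.960, 0.057, 134 nm and 0.908 quoted in the example to within rounding.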
Note that beyond the critical angle, sin θ1 is greater than one. We illustrate how to
apply (4.14) via a specific example:
Example 4.3
Calculate the transmittance of p-polarized light through the region between two
closely spaced 45◦ right prisms, as shown in Fig. 4.4, as a function of λvac and
the prism spacing d . Take the index of refraction of the prisms to be n = 1.5
surrounded by index n = 1, and use θ0 = θ2 = 45◦ . Neglect possible reflections
from the exterior surfaces of the prisms.
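$$\left|t_p^{1\to2}\right|^2 = \left|\frac{2\cos\theta_1\sin\theta_2}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}\right|^2 = \left|\frac{2\,(i\,0.3536)\,\frac{1}{\sqrt{2}}}{\frac{1}{\sqrt{2}}\,\frac{1}{\sqrt{2}} + (i\,0.3536)(1.061)}\right|^2 = 0.640$$
For the last step in the $r_p^{1\to2}$ calculation, see problem P0.15. Also note that $r_p^{1\to2} = r_p^{1\to0} = -r_p^{0\to1}$ since $n_0 = n_2$. We also need
$$k_1 d\cos\theta_1 = \frac{2\pi}{\lambda_{\mathrm{vac}}}\,d\cos\theta_1 = \frac{2\pi}{\lambda_{\mathrm{vac}}}\,d\,(i\,0.3536) = i\,2.22\left(\frac{d}{\lambda_{\mathrm{vac}}}\right)$$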
We are now ready to compute the total transmittance (4.14). The factors out in
front reduce to one since $\theta_0 = \theta_2$ and $n_0 = n_2$, and we have
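$$\begin{aligned}
T_p^{\mathrm{tot}} &= \frac{\left|t_p^{0\to1}\right|^2\left|t_p^{1\to2}\right|^2}{\left|e^{-i k_1 d\cos\theta_1} - r_p^{1\to0}\,r_p^{1\to2}\,e^{i k_1 d\cos\theta_1}\right|^2} \\
&= \frac{(5.76)(0.640)}{\left|e^{-i\left[i\,2.22\left(\frac{d}{\lambda_{\mathrm{vac}}}\right)\right]} - e^{-i\,1.287}\,e^{-i\,1.287}\,e^{i\left[i\,2.22\left(\frac{d}{\lambda_{\mathrm{vac}}}\right)\right]}\right|^2} \\
&= \frac{3.69}{\left(e^{2.22\frac{d}{\lambda_{\mathrm{vac}}}} - e^{-2.22\frac{d}{\lambda_{\mathrm{vac}}} - i\,2.574}\right)\left(e^{2.22\frac{d}{\lambda_{\mathrm{vac}}}} - e^{-2.22\frac{d}{\lambda_{\mathrm{vac}}} + i\,2.574}\right)} \\
&= \frac{3.69}{e^{4.44\frac{d}{\lambda_{\mathrm{vac}}}} + e^{-4.44\frac{d}{\lambda_{\mathrm{vac}}}} - 2\,\dfrac{e^{i\,2.574} + e^{-i\,2.574}}{2}} \\
&= \frac{3.69}{e^{4.44\frac{d}{\lambda_{\mathrm{vac}}}} + e^{-4.44\frac{d}{\lambda_{\mathrm{vac}}}} - 2\cos(2.574)} \\
&= \frac{3.69}{e^{4.44\frac{d}{\lambda_{\mathrm{vac}}}} + e^{-4.44\frac{d}{\lambda_{\mathrm{vac}}}} + 1.69}
\end{aligned} \tag{4.22}$$
Figure 4.5 Plot of (4.22)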
Figure 4.5 shows a plot of the transmittance (4.22) calculated in Example 4.3.
Notice that the transmittance is 100% when the two prisms are brought together,
as expected ($T_p^{\mathrm{tot}}(d/\lambda_{\mathrm{vac}} = 0) = 1$). When the prisms are about a wavelength apart,
the transmittance is significantly reduced, and as the distance gets large compared
to a wavelength, the transmittance quickly goes to zero ($T_p^{\mathrm{tot}}(d/\lambda_{\mathrm{vac}} \gg 1) \approx 0$).
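The closed form (4.22) can be checked against a direct evaluation with a complex $\cos\theta_1$. The sketch below is illustrative: the function names are hypothetical, the p-polarized Fresnel coefficients are written in their equivalent index form, and the positive imaginary root is chosen for $\cos\theta_1$ so that the field in the gap decays.

```python
import cmath
import math

def tunneling_transmittance_p(d_over_lam, n_prism=1.5, theta0_deg=45.0):
    """Transmittance of p-polarized light across an air gap between two prisms."""
    n0 = n2 = n_prism
    n1 = 1.0
    c0 = c2 = math.cos(math.radians(theta0_deg))
    sin1 = n0 * math.sin(math.radians(theta0_deg)) / n1   # exceeds 1 past the critical angle
    c1 = cmath.sqrt(1 - sin1 ** 2)                         # purely imaginary, +i root

    def r_p(n_i, n_t, cos_i, cos_t):
        return (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)

    def t_p(n_i, n_t, cos_i, cos_t):
        return 2 * n_i * cos_i / (n_i * cos_t + n_t * cos_i)

    t01, t12 = t_p(n0, n1, c0, c1), t_p(n1, n2, c1, c2)
    r10, r12 = r_p(n1, n0, c1, c0), r_p(n1, n2, c1, c2)
    beta = 2 * math.pi * n1 * d_over_lam * c1              # k1 d cos(theta1), imaginary here
    denom = cmath.exp(-1j * beta) - r10 * r12 * cmath.exp(1j * beta)
    # The prefactor n2 cos(theta2) / (n0 cos(theta0)) equals one for identical prisms
    return abs(t01) ** 2 * abs(t12) ** 2 / abs(denom) ** 2

def closed_form(d_over_lam):
    """Last line of (4.22), using the rounded constants from the example."""
    x = 4.44 * d_over_lam
    return 3.69 / (math.exp(x) + math.exp(-x) + 1.69)

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, tunneling_transmittance_p(x), closed_form(x))
```

The two columns differ only through the rounding of 3.69, 4.44 and 1.69 in the text.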
4.4 Fabry-Perot
In the 1890s, Charles Fabry realized that a double interface could be used to
distinguish wavelengths of light that are very close together. He and a talented
experimentalist colleague, Alfred Perot, constructed an instrument and began
to use it to make measurements on various spectral sources. The Fabry-Perot
instrument6 consists of two identical (parallel) surfaces separated by spacing $d$.
We can use our analysis in section 4.2 to describe this instrument. For simplicity,
we choose the refractive index before the initial surface and after the final surface
to be the same (i.e. $n_0 = n_2$). We assume that the transmission angles are such
that total internal reflection is avoided. The transmission through the device
depends on the exact spacing between the two surfaces, the reflectivity of the
surfaces, as well as on the wavelength of the light.
If the spacing $d$ separating the two parallel surfaces is adjustable (scanned),
the instrument is called a Fabry-Perot interferometer. If the spacing is fixed while
the angle of the incident light is varied, the instrument is called a Fabry-Perot
etalon. An etalon can therefore be as simple as a piece of glass with parallel
surfaces.
6 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 7.6.2 (Cambridge University Press, 1999).

Maurice Paul Auguste Charles Fabry (1867-1945, French) was born in Marseille,
France. At age 18, he entered the École Polytechnique in Paris where he
studied for two years. Following that, he spent a number of years teaching state
secondary school while simultaneously working on a doctoral dissertation on
interference phenomena. After completing his doctorate, he began working
as a lecturer and laboratory assistant at the University of Marseille where a
decade later he was appointed a professor of physics. Soon after his arrival
to the University of Marseille, Fabry began a long and fruitful collaboration
with Alfred Perot (1863-1925). Fabry focused on theoretical analysis and
measurements while his colleague did the design work and construction of their
new interferometer, which they continually improved over the years. During his
career, Fabry made significant contributions to spectroscopy and astrophysics
and is credited with co-discovery of the ozone layer. See J. F. Mulligan, Who
were Fabry and Perot?, Am. J. Phys. 66, 797-802 (1998).
In principle, these equations should be evaluated for either s- or p-polarized light.
However, a Fabry-Perot interferometer or etalon is usually operated near normal
incidence so that there is little difference between the two polarizations.
When using a Fabry-Perot instrument, one observes the transmittance $T^{\mathrm{tot}}$ as
the parameter Φ is varied. The parameter Φ can be varied by altering $d$, $\theta_1$, or λ:
$$\Phi = \frac{4\pi n_1 d}{\lambda_{\mathrm{vac}}}\cos\theta_1 + \delta_r \tag{4.26}$$

where he began his collaboration with Fabry. Perot contributed his considerable
talent of instrument fabrication to the endeavor. Perot spent much of his
later career making precision astronomical and solar measurements. See J. F.
Mulligan, Who were Fabry and Perot?,
To increase the sensitivity of the instrument, it is desirable to have the transmittance
$T^{\mathrm{tot}}$ vary strongly when Φ is varied. By inspection of (4.23), we see that $T^{\mathrm{tot}}$
varies most strongly if the finesse coefficient $F$ is large. We achieve a large finesse
coefficient by increasing the reflectance $R$.
The basic setup of a Fabry-Perot instrument is shown in Fig. 4.6. In order to
achieve a relatively high reflectivity $R$ (and therefore large $F$), special coatings can
be applied to the surfaces, for example, a thin layer of silver to achieve a partial
reflection, say 90%. Typically, two glass substrates are separated by distance $d$,
with the coated surfaces facing each other as shown in the figure. The substrates
are aligned so that the interior surfaces are parallel to each other. It is typical for
each substrate to be slightly wedge-shaped so that unwanted reflections from the
outer surfaces do not interfere with the double boundary situation between the
two plates.
Figure 4.6 Typical Fabry-Perot setup. If the spacing d is variable, it is called an
interferometer; otherwise, it is called an etalon.
Technically, each coating constitutes its own double-boundary problem (or
multiple-boundary as the case may be). We can ignore this detail and simply
think of the overall setup as a single two-interface problem. Regardless of the
details of the coatings, we can say that each coating has a certain reflectance $R$
and transmittance $T$. However, as light goes through a coating, it can also be
attenuated because of absorption. In this case, we have
$$R + T + A = 1 \tag{4.27}$$
4.5 Setup of a Fabry-Perot Instrument
Figure 4.8 shows the typical experimental setup for a Fabry-Perot interferometer.
A collimated beam of light is sent through the instrument. The beam is aligned so
that it is normal to the surfaces. It is critical for the two surfaces of the interferometer
to be extremely close to parallel. When aligned correctly, the transmission
of a collimated beam will ‘blink’ all together as the spacing $d$ is changed (by tiny
amounts). A mechanical actuator can be used to vary the spacing between the
plates while the transmittance is observed on a detector. To make the alignment
of the instrument somewhat less critical, a small aperture can be placed in front
Figure 4.8 Setup for a Fabry-Perot interferometer.
plate separation. After all, to see fringes, we just need to cause Φ in (4.23) to vary
in some way. According to (4.26), we can do that as easily by varying $\theta_1$ as we can
by varying $d$. One way to obtain a range of angles is to observe light from a ‘point
source’, as depicted in Fig. 4.10. Different portions of the beam go through the
device at different angles. When aligned straight on, the transmitted light forms a
‘bull’s-eye’ pattern on a screen.
Figure 4.10 A diverging monochromatic beam traversing a Fabry-Perot etalon.
(The angle of divergence is exaggerated.)
In Fig. 4.11 we graph the transmittance $T^{\mathrm{tot}}$ (4.23) as a function of angle
(holding $\lambda_{\mathrm{vac}}$ = 500 nm and $d$ = 1 cm fixed). Since $\cos\theta_1$ is not a linear function,
the spacing of the peaks varies with angle. As $\theta_1$ increases from zero, the cosine
steadily decreases, causing Φ to decrease. Each time Φ decreases by 2π we get a
new peak. Not surprisingly, only a modest change in angle is necessary to cause
the transmittance to vary from maximum to minimum, or vice versa.
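To get a feel for how quickly the fringes accumulate with angle, the phase (4.26) can be evaluated directly. The sketch below is illustrative: the function names are made up, the transmittance is taken to have the form $T_{\max}/(1 + F\sin^2(\Phi/2))$ (the same shape as the expression in Example 4.2, assumed here for the Fabry-Perot case), and the fringe count assumes that Φ is a multiple of 2π at normal incidence.

```python
import math

def phase(theta1, d, lam_vac, n1=1.0, delta_r=0.0):
    """Phase parameter Phi of (4.26), in radians."""
    return 4 * math.pi * n1 * d * math.cos(theta1) / lam_vac + delta_r

def transmittance(phi, F, T_max=1.0):
    """Assumed fringe shape: T_max / (1 + F sin^2(Phi / 2))."""
    return T_max / (1 + F * math.sin(phi / 2) ** 2)

def peaks_passed(theta_max, d, lam_vac, n1=1.0):
    """Number of whole 2*pi decreases of Phi between 0 and theta_max.

    Uses (Phi(0) - Phi(theta)) / (2 pi) = 2 n1 d (1 - cos(theta)) / lam_vac,
    which avoids subtracting two very large phases.
    """
    return math.floor(2 * n1 * d * (1 - math.cos(theta_max)) / lam_vac)

d, lam_vac = 1e-2, 500e-9           # 1 cm spacing, 500 nm light
for deg in (0.25, 0.5, 1.0):
    print(deg, peaks_passed(math.radians(deg), d, lam_vac))
```

Because $1 - \cos\theta_1$ grows roughly as $\theta_1^2/2$ at small angles, the count grows quadratically with angle, which is why the rings crowd together away from the center.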
The bull’s-eye pattern in Fig. 4.10 can be understood as the curve in Fig. 4.11
rotated about a circle. Depending on the exact spacing between the plates, the
radii (or angles) where the fringes occur can be different. For example, the center
spot could be dark.
Figure 4.11 Transmittance through a Fabry-Perot etalon (F = 10) as the angle $\theta_1$
is varied. It is assumed that the distance d is chosen such that Φ is a multiple of
2π when the angle is zero.
Spectroscopic samples often are not compact point-like sources. Rather, they
are extended diffuse sources. The point-source setup shown in Fig. 4.10 won’t
work for extended sources unless all of the light at the sample is blocked except
for a tiny point. This is impractical if there remains insufficient illumination at
the final screen for observation.
In order to preserve as much light as possible, we can sandwich the etalon
between two lenses. We place the diffuse source at the focal plane of the first lens.
We place the screen at the focal plane of the second lens. This causes an image of
the source to appear on the screen.7 Each point of the diffuse source is mapped
to a corresponding point on the screen. Moreover, the light associated with any
particular point of the source travels as a unique collimated beam in the region
between the lenses. Each collimated beam traverses the etalon with a unique
angle. Thus, light associated with each emission point traverses the etalon with
higher or lower transmittance, according to the differing angles. The result is that
a bull’s-eye pattern becomes superimposed on the image of the diffuse source.
The lens and retina of your eye can be used for the final lens and screen.
4.6 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument
Thus far, we have examined how the transmittance through a Fabry-Perot instrument
varies with surface separation $d$ and angle $\theta_1$. However, the main purpose
of a Fabry-Perot instrument is to measure small changes in the wavelength of
light, which similarly affect the value of Φ (see (4.26)).8
7 If the diffuse source has the shape of Mickey Mouse, then an image of Mickey Mouse appears
$$\Phi_0 = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0} + \delta_r \tag{4.28}$$
which we previously supposed is an integer times 2π. At a new wavelength (all
else remaining the same) we have
$$\Phi = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0 + \Delta\lambda} + \delta_r \tag{4.29}$$
The change in wavelength ∆λ is usually very small compared to $\lambda_0$, so we can
represent the denominator with the first two terms of a Taylor-series expansion:
$$\frac{1}{\lambda_0 + \Delta\lambda} = \frac{1}{\lambda_0\left(1 + \Delta\lambda/\lambda_0\right)} \cong \frac{1 - \Delta\lambda/\lambda_0}{\lambda_0} \tag{4.30}$$
Then the difference between $\Phi_0$ and Φ can be rewritten as
$$\Delta\Phi \equiv \Phi_0 - \Phi = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0^2}\,\Delta\lambda \tag{4.31}$$
If the change in wavelength is enough to cause ∆Φ = 2π, the fringes in Fig. 4.13
shift through a whole period, and the picture looks the same.
Figure 4.13 Transmittance as the spacing d is varied for two different wavelengths
(F = 100). The solid line plots the transmittance of light with a wavelength of $\lambda_0$,
and the dashed line plots the transmittance of a wavelength shorter than $\lambda_0$.
Note that the fringes shift positions for different wavelengths.
This brings up an important limitation of the instrument. If the fringes shift
by too much, we might become confused as to what exactly has changed, owing
to the periodic nature of the fringes. If two wavelengths aren’t sufficiently close,
the fringes of one wavelength may be shifted past several fringes of the other
wavelength, and we will not be able to tell by how much they differ.
This introduces the concept of free spectral range, which is the wavelength
change ∆λFSR that causes the fringes to shift through one period. We find this by
setting (4.31) equal to 2π. After rearranging, we get
$$\Delta\lambda_{\mathrm{FSR}} = \frac{\lambda_{\mathrm{vac}}^2}{2 n_1 d_0\cos\theta_1} \tag{4.32}$$
The free spectral range tends to be extremely narrow; a Fabry-Perot instrument is
not well suited for measuring wavelength ranges wider than this. In summary, the
free spectral range is the largest change in wavelength permissible while avoiding
confusion. To convert this wavelength difference $\Delta\lambda_{\mathrm{FSR}}$ into a corresponding
frequency difference, one differentiates $\nu = c/\lambda_{\mathrm{vac}}$ to get
$$\left|\Delta\nu_{\mathrm{FSR}}\right| = \frac{c\,\Delta\lambda_{\mathrm{FSR}}}{\lambda_{\mathrm{vac}}^2} \tag{4.33}$$
Example 4.4
A Fabry-Perot interferometer has plate spacing d 0 = 1 cm and index n 1 = 1. If it
is used in the neighborhood of λvac = 500 nm, find the free spectral range of the
instrument.
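The free spectral range for these parameters follows directly from (4.32) and (4.33). The sketch below is illustrative; the function names are made up, and normal incidence ($\cos\theta_1 = 1$) is assumed because the example does not specify an angle.

```python
import math

C = 299_792_458.0   # speed of light in vacuum, m/s

def free_spectral_range(lam_vac, d0, n1=1.0, theta1=0.0):
    """Wavelength free spectral range, equation (4.32), in meters."""
    return lam_vac ** 2 / (2 * n1 * d0 * math.cos(theta1))

def fsr_frequency(lam_vac, dlam_fsr):
    """Corresponding frequency interval, equation (4.33), in Hz."""
    return C * dlam_fsr / lam_vac ** 2

dlam = free_spectral_range(500e-9, 1e-2)     # d0 = 1 cm, n1 = 1
dnu = fsr_frequency(500e-9, dlam)
print(dlam * 1e9, "nm")                      # about 0.0125 nm
print(dnu / 1e9, "GHz")                      # about 15 GHz
```

The wavelength value matches the 0.0125 nm used later in (4.41).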
Solving (4.35) for $\Phi_{\mathrm{FWHM}}$, we see that this equation requires
$$F\sin^2\!\left(\frac{\Phi_{\mathrm{FWHM}}}{4}\right) = 1 \tag{4.36}$$
where we have taken advantage of the fact that $\Phi_0$ is a multiple of 2π. Next,
we suppose that $\Phi_{\mathrm{FWHM}}$ is rather small so that we may represent the sine by its
argument:
$$\Phi_{\mathrm{FWHM}} \cong \frac{4}{\sqrt{F}} \tag{4.37}$$
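The ratio of the period between peaks 2π to the width $\Phi_{\mathrm{FWHM}}$ of individual peaks
is called the reflecting finesse (or just finesse):
$$f \equiv \frac{2\pi}{\Phi_{\mathrm{FWHM}}} = \frac{\pi\sqrt{F}}{2} \tag{4.38}$$
$$\Delta\lambda_{\mathrm{FWHM}} = \frac{\Delta\lambda_{\mathrm{FSR}}}{f} = \frac{\lambda_{\mathrm{vac}}^2}{\pi n_1 d_0\cos\theta_1\sqrt{F}} \tag{4.39}$$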
As a final note, the ratio of $\lambda_0$ to $\Delta\lambda_{\min}$, where $\Delta\lambda_{\min}$ is the minimum change
of wavelength that the instrument can distinguish in the neighborhood of $\lambda_0$, is
called the resolving power. For a Fabry-Perot instrument it is
$$RP \equiv \frac{\lambda_0}{\Delta\lambda_{\mathrm{FWHM}}} \tag{4.40}$$
Fabry-Perot instruments tend to have very high resolving powers since they respond
to very small differences in wavelength.
Example 4.5
If the Fabry-Perot interferometer in Example 4.4 has reflectivity R = 0.85, find the
finesse, the minimum distinguishable wavelength separation, and the resolving
power.
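$$\Delta\lambda_{\mathrm{FWHM}} = \frac{\Delta\lambda_{\mathrm{FSR}}}{f} = \frac{0.0125\ \mathrm{nm}}{19} = 0.00065\ \mathrm{nm} \tag{4.41}$$
The instrument can distinguish two wavelengths separated by this tiny amount,
which gives an impressive resolving power of
$$RP = \frac{\lambda_{\mathrm{vac}}}{\Delta\lambda_{\mathrm{FWHM}}} = \frac{500\ \mathrm{nm}}{0.00065\ \mathrm{nm}} = 772{,}000$$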
For comparison, the resolving power of a typical grating spectrometer is much less
(a few thousand). However, a grating spectrometer has the advantage that it can
simultaneously observe wavelengths over hundreds of nanometers, whereas the
Fabry-Perot instrument is confined to the extremely narrow free spectral range.
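The chain of quantities in Example 4.5 is short enough to script. The sketch below is illustrative: the variable names are made up, and the finesse coefficient is taken to be $F = 4R/(1-R)^2$ for two identical surfaces, which is an assumption here since its defining equation is not reproduced in the surrounding text. The remaining steps use (4.38), (4.39) and (4.40).

```python
import math

R = 0.85                          # reflectance of each surface
lam_vac = 500e-9                  # m
dlam_fsr = 1.25e-11               # m, free spectral range from Example 4.4

F = 4 * R / (1 - R) ** 2          # assumed form of the finesse coefficient
f = math.pi * math.sqrt(F) / 2    # reflecting finesse, equation (4.38)
dlam_fwhm = dlam_fsr / f          # smallest resolvable separation, equation (4.39)
RP = lam_vac / dlam_fwhm          # resolving power, equation (4.40)

print(round(f, 1), dlam_fwhm * 1e9, round(RP))
```

Carrying the unrounded finesse through the calculation is what gives a resolving power near 772,000; dividing 500 nm by the rounded 0.00065 nm would give a slightly smaller number.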
4.7 Multilayer Coatings
In each layer, only two plane waves exist, each of which is composed of light
arising from the many possible bounces from various layer interfaces. The arrows
pointing right indicate plane wave fields in individual layers that travel roughly
in the forward (incident) direction, and the arrows pointing left indicate plane
wave fields that travel roughly in the backward (reflected) direction. In the final
region, there is only one plane wave traveling with a forward direction ($E_{N+1}^{(p)\rightarrow}$),
which gives the overall transmitted field.
As we have studied in Chapter 3 (see (3.9) and (3.13)), the boundary conditions
for the parallel components of the E field and for the parallel components of the
B field lead respectively to
$$\cos\theta_0\left(E_0^{(p)\rightarrow} + E_0^{(p)\leftarrow}\right) = \cos\theta_1\left(E_1^{(p)\rightarrow} + E_1^{(p)\leftarrow}\right) \tag{4.43}$$
and
$$n_0\left(E_0^{(p)\rightarrow} - E_0^{(p)\leftarrow}\right) = n_1\left(E_1^{(p)\rightarrow} - E_1^{(p)\leftarrow}\right) \tag{4.44}$$
Similar equations give the field connection for s-polarized light (see (3.8) and
(3.14)).
We have applied these boundary conditions at the first interface only. Of
course there are many more interfaces in the multilayer. For the connection
between the $j$th layer and the next, we may similarly write
$$\cos\theta_j\left(E_j^{(p)\rightarrow}e^{i k_j\ell_j\cos\theta_j} + E_j^{(p)\leftarrow}e^{-i k_j\ell_j\cos\theta_j}\right) = \cos\theta_{j+1}\left(E_{j+1}^{(p)\rightarrow} + E_{j+1}^{(p)\leftarrow}\right) \tag{4.45}$$
and
$$n_j\left(E_j^{(p)\rightarrow}e^{i k_j\ell_j\cos\theta_j} - E_j^{(p)\leftarrow}e^{-i k_j\ell_j\cos\theta_j}\right) = n_{j+1}\left(E_{j+1}^{(p)\rightarrow} - E_{j+1}^{(p)\leftarrow}\right) \tag{4.46}$$
Here we have set the origin within each layer at the left surface. Then when
making the connection with the subsequent layer at the right surface, we must
specifically take into account the phase $\mathbf{k}_j^{\rightarrow}\cdot\left(\ell_j\hat{\mathbf{z}}\right) = k_j\ell_j\cos\theta_j$. This corresponds
to the phase acquired by the plane wave field in traversing the layer with thickness
$\ell_j$. The right-hand sides of (4.45) and (4.46) need no phase adjustment since the
$(j+1)$th field is evaluated on the left side of its layer.
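and
$$n_N\left(E_N^{(p)\rightarrow}e^{i k_N\ell_N\cos\theta_N} - E_N^{(p)\leftarrow}e^{-i k_N\ell_N\cos\theta_N}\right) = n_{N+1}\,E_{N+1}^{(p)\rightarrow} \tag{4.48}$$
where
$$\beta_j \equiv \begin{cases} 0 & j = 0 \\ k_j\ell_j\cos\theta_j & 1 \le j \le N \end{cases} \tag{4.50}$$
and
$$E_{N+1}^{(p)\leftarrow} \equiv 0 \tag{4.51}$$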
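(It would be good to take a moment to convince yourself that this set of matrix
equations properly represents (4.43)–(4.48) before proceeding.) We rewrite (4.49)
as
$$\begin{pmatrix} E_j^{(p)\rightarrow} \\ E_j^{(p)\leftarrow} \end{pmatrix} = \begin{pmatrix} \cos\theta_j\,e^{i\beta_j} & \cos\theta_j\,e^{-i\beta_j} \\ n_j\,e^{i\beta_j} & -n_j\,e^{-i\beta_j} \end{pmatrix}^{-1} \begin{pmatrix} \cos\theta_{j+1} & \cos\theta_{j+1} \\ n_{j+1} & -n_{j+1} \end{pmatrix} \begin{pmatrix} E_{j+1}^{(p)\rightarrow} \\ E_{j+1}^{(p)\leftarrow} \end{pmatrix} \tag{4.52}$$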
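Keep in mind that (4.52) represents a distinct matrix equation for each different
$j$. We can substitute the $j = 1$ equation into the $j = 0$ equation to get
$$\begin{pmatrix} E_0^{(p)\rightarrow} \\ E_0^{(p)\leftarrow} \end{pmatrix} = \begin{pmatrix} \cos\theta_0 & \cos\theta_0 \\ n_0 & -n_0 \end{pmatrix}^{-1} M_1^{(p)} \begin{pmatrix} \cos\theta_2 & \cos\theta_2 \\ n_2 & -n_2 \end{pmatrix} \begin{pmatrix} E_2^{(p)\rightarrow} \\ E_2^{(p)\leftarrow} \end{pmatrix} \tag{4.53}$$
where we have grouped the matrices related to the $j = 1$ layer together via
$$M_1^{(p)} \equiv \begin{pmatrix} \cos\theta_1 & \cos\theta_1 \\ n_1 & -n_1 \end{pmatrix} \begin{pmatrix} \cos\theta_1\,e^{i\beta_1} & \cos\theta_1\,e^{-i\beta_1} \\ n_1\,e^{i\beta_1} & -n_1\,e^{-i\beta_1} \end{pmatrix}^{-1} \tag{4.54}$$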
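We can continue to substitute into this equation progressively higher order equations
(i.e. for $j = 2$, $j = 3$, ...) until we reach the $j = N$ layer. All together this will
give
$$\begin{pmatrix} E_0^{(p)\rightarrow} \\ E_0^{(p)\leftarrow} \end{pmatrix} = \begin{pmatrix} \cos\theta_0 & \cos\theta_0 \\ n_0 & -n_0 \end{pmatrix}^{-1} \left(\prod_{j=1}^{N} M_j^{(p)}\right) \begin{pmatrix} \cos\theta_{N+1} & \cos\theta_{N+1} \\ n_{N+1} & -n_{N+1} \end{pmatrix} \begin{pmatrix} E_{N+1}^{(p)\rightarrow} \\ 0 \end{pmatrix} \tag{4.55}$$
where the matrices related to the $j$th layer are grouped together according to
$$\begin{aligned} M_j^{(p)} &\equiv \begin{pmatrix} \cos\theta_j & \cos\theta_j \\ n_j & -n_j \end{pmatrix} \begin{pmatrix} \cos\theta_j\,e^{i\beta_j} & \cos\theta_j\,e^{-i\beta_j} \\ n_j\,e^{i\beta_j} & -n_j\,e^{-i\beta_j} \end{pmatrix}^{-1} \\ &= \begin{pmatrix} \cos\beta_j & -i\sin\beta_j\cos\theta_j/n_j \\ -i\,n_j\sin\beta_j/\cos\theta_j & \cos\beta_j \end{pmatrix} \end{aligned} \tag{4.56}$$
The matrix inversion in the first line was performed using (0.35). The symbol Π
signifies the product of the matrices with the lowest subscripts on the left:
$$\prod_{j=1}^{N} M_j^{(p)} \equiv M_1^{(p)} M_2^{(p)} \cdots M_N^{(p)} \tag{4.57}$$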
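As a finishing touch, we divide (4.55) by the incident field $E_0^{(p)\rightarrow}$ as well as perform
the matrix inversion on the right-hand side to obtain
$$\begin{pmatrix} 1 \\ E_0^{(p)\leftarrow}\big/E_0^{(p)\rightarrow} \end{pmatrix} = A^{(p)} \begin{pmatrix} E_{N+1}^{(p)\rightarrow}\big/E_0^{(p)\rightarrow} \\ 0 \end{pmatrix} \tag{4.58}$$
where
$$A^{(p)} \equiv \begin{pmatrix} a_{11}^{(p)} & a_{12}^{(p)} \\ a_{21}^{(p)} & a_{22}^{(p)} \end{pmatrix} = \frac{1}{2 n_0\cos\theta_0} \begin{pmatrix} n_0 & \cos\theta_0 \\ n_0 & -\cos\theta_0 \end{pmatrix} \left(\prod_{j=1}^{N} M_j^{(p)}\right) \begin{pmatrix} \cos\theta_{N+1} & 0 \\ n_{N+1} & 0 \end{pmatrix} \tag{4.59}$$
In the final matrix in (4.59) we have replaced the entries in the right column with
zeros. This is permissible since it operates on a column vector with zero in the
bottom component.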
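Equation (4.58) represents two equations, which must be solved simultaneously
to find the ratios $E_0^{(p)\leftarrow}/E_0^{(p)\rightarrow}$ and $E_{N+1}^{(p)\rightarrow}/E_0^{(p)\rightarrow}$. Once the matrix $A^{(p)}$ is computed,
this is a relatively simple task:
$$t_p \equiv \frac{E_{N+1}^{(p)\rightarrow}}{E_0^{(p)\rightarrow}} = \frac{1}{a_{11}^{(p)}} \qquad \text{(Multilayer)} \tag{4.60}$$
$$r_p \equiv \frac{E_0^{(p)\leftarrow}}{E_0^{(p)\rightarrow}} = \frac{a_{21}^{(p)}}{a_{11}^{(p)}} \qquad \text{(Multilayer)} \tag{4.61}$$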
The convenience of this notation lies in the fact that we can deal with an
arbitrary number of layers $N$ with varying thickness and index. The essential
information for each layer is contained succinctly in its respective 2 × 2 characteristic
matrix $M$. To find the overall effect of the many layers, we need only
multiply the matrices for each layer together to find $A$, from which we compute
the reflection and transmission coefficients for the whole system.
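The procedure in (4.56) through (4.61) translates almost line by line into code. The following is a minimal sketch under stated assumptions, not a reference implementation: the function names and the list-based interface are hypothetical, only p-polarization is handled, and the angle in each layer is obtained from Snell's law with a complex square root.

```python
import cmath
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]

def multilayer_p(n, ell, theta0, lam_vac):
    """Return (t_p, r_p) for a stack.

    n   = [n0, n1, ..., nN, nN+1]  indices, incident medium first, substrate last
    ell = [l1, ..., lN]            layer thicknesses, same length unit as lam_vac
    """
    cosines = [cmath.sqrt(1 - (n[0] * math.sin(theta0) / nj) ** 2) for nj in n]
    prod = [[1, 0], [0, 1]]
    for nj, cj, lj in zip(n[1:-1], cosines[1:-1], ell):
        beta = 2 * math.pi * nj / lam_vac * lj * cj            # equation (4.50)
        m = [[cmath.cos(beta), -1j * cmath.sin(beta) * cj / nj],
             [-1j * nj * cmath.sin(beta) / cj, cmath.cos(beta)]]  # equation (4.56)
        prod = matmul(prod, m)                                 # lowest subscript on the left
    left = [[n[0], cosines[0]], [n[0], -cosines[0]]]
    right = [[cosines[-1], 0], [n[-1], 0]]
    a = matmul(matmul(left, prod), right)                      # equation (4.59), unscaled
    scale = 2 * n[0] * cosines[0]
    a11, a21 = a[0][0] / scale, a[1][0] / scale
    return 1 / a11, a21 / a11                                  # equations (4.60), (4.61)

# Example call: two layers on glass, 40 degree incidence from air
t, r = multilayer_p([1.0, 2.32, 1.38, 1.5], [50e-9, 120e-9], math.radians(40), 633e-9)
print(abs(r) ** 2)
```

With an empty layer list the function reduces to the single-boundary Fresnel coefficients, which is a convenient first check before adding layers.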
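The derivation for s-polarized light is similar to the above derivation for p-polarized
light. The equation corresponding to (4.58) for s-polarized light turns
out to be
$$\begin{pmatrix} 1 \\ E_0^{(s)\leftarrow}\big/E_0^{(s)\rightarrow} \end{pmatrix} = A^{(s)} \begin{pmatrix} E_{N+1}^{(s)\rightarrow}\big/E_0^{(s)\rightarrow} \\ 0 \end{pmatrix} \tag{4.62}$$
where
$$A^{(s)} \equiv \begin{pmatrix} a_{11}^{(s)} & a_{12}^{(s)} \\ a_{21}^{(s)} & a_{22}^{(s)} \end{pmatrix} = \frac{1}{2 n_0\cos\theta_0} \begin{pmatrix} n_0\cos\theta_0 & 1 \\ n_0\cos\theta_0 & -1 \end{pmatrix} \left(\prod_{j=1}^{N} M_j^{(s)}\right) \begin{pmatrix} 1 & 0 \\ n_{N+1}\cos\theta_{N+1} & 0 \end{pmatrix} \tag{4.63}$$
and
$$M_j^{(s)} = \begin{pmatrix} \cos\beta_j & -i\sin\beta_j/(n_j\cos\theta_j) \\ -i\,n_j\cos\theta_j\sin\beta_j & \cos\beta_j \end{pmatrix} \tag{4.64}$$
The transmission and reflection coefficients are found (as before) from
$$t_s \equiv \frac{E_{N+1}^{(s)\rightarrow}}{E_0^{(s)\rightarrow}} = \frac{1}{a_{11}^{(s)}} \qquad \text{(Multilayer)} \tag{4.65}$$
$$r_s \equiv \frac{E_0^{(s)\leftarrow}}{E_0^{(s)\rightarrow}} = \frac{a_{21}^{(s)}}{a_{11}^{(s)}} \qquad \text{(Multilayer)} \tag{4.66}$$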
4.8 Repeated Multilayer Stacks
Sometimes multilayer coatings are made with repeated stacks of layers. In
general, if the same series of layers in (4.69) is repeated many times, say $q$ times,
Sylvester’s theorem (see appendix 0.3) can come in handy:
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{q} = \frac{1}{\sin\theta} \begin{pmatrix} A\sin q\theta - \sin\left(q-1\right)\theta & B\sin q\theta \\ C\sin q\theta & D\sin q\theta - \sin\left(q-1\right)\theta \end{pmatrix} \tag{4.67}$$
where
$$\cos\theta \equiv \frac{1}{2}\left(A + D\right). \tag{4.68}$$
This formula relies on the condition $AD - BC = 1$, which is true for matrices of
the form (4.56) and (4.64) or any product of them. Here, $A$, $B$, $C$, and $D$ represent
the elements of a matrix composed of a block of matrices corresponding to a
repeated pattern within the stack.
Figure 4.16 A repeated multilayer structure with alternating high and low indexes
where each layer is a quarter wavelength in thickness. This structure can achieve
very high reflectance.
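Sylvester's theorem is easy to check numerically against repeated multiplication. The sketch below is illustrative only: the function names are made up, and the formula is evaluated with a complex arccosine so that it also covers blocks for which $(A + D)/2$ lies outside the interval from −1 to 1. It fails when $\sin\theta = 0$, a special case that is not handled here.

```python
import cmath

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]

def matpow(m, q):
    """q-th power of a 2x2 matrix by repeated multiplication."""
    result = [[1, 0], [0, 1]]
    for _ in range(q):
        result = matmul(result, m)
    return result

def sylvester_power(m, q):
    """q-th power of a 2x2 matrix with unit determinant via (4.67) and (4.68)."""
    (A, B), (C, D) = m
    theta = cmath.acos((A + D) / 2)
    s = cmath.sin(theta)                      # must be nonzero
    sq, sq1 = cmath.sin(q * theta), cmath.sin((q - 1) * theta)
    return [[(A * sq - sq1) / s, B * sq / s],
            [C * sq / s, (D * sq - sq1) / s]]

def layer(beta, n):
    """Normal-incidence characteristic matrix of one layer, as in (4.56)."""
    return [[cmath.cos(beta), -1j * cmath.sin(beta) / n],
            [-1j * n * cmath.sin(beta), cmath.cos(beta)]]

block = matmul(layer(0.7, 2.32), layer(1.1, 1.38))   # one high/low pair
print(matpow(block, 5)[0][0], sylvester_power(block, 5)[0][0])
```

For small $q$ repeated multiplication is just as convenient; the closed form is mainly useful for seeing how the result scales with $q$.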
In general, high-reflection coatings are designed with alternating high and
low refractive indices. For high reflectivity, each layer should have a quarter-wave
thickness. Since the layers alternate high and low indices, at every other
boundary there is a phase shift of π upon reflection from the interface. Hence,
the quarter wavelength spacing is appropriate to give constructive interference in
the reflected direction.
Example 4.6
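Derive the reflection and transmission coefficients for p-polarized light interacting
with a high reflector constructed using a λ/4 stack.
$$M_j^{(p)} = \begin{pmatrix} 0 & -i\cos\theta_j/n_j \\ -i\,n_j/\cos\theta_j & 0 \end{pmatrix}$$
The matrices for a high and a low refractive index layer are multiplied together in
the usual manner. Each layer pair takes the form
$$\begin{pmatrix} 0 & -\dfrac{i\cos\theta_H}{n_H} \\ -\dfrac{i\,n_H}{\cos\theta_H} & 0 \end{pmatrix} \begin{pmatrix} 0 & -\dfrac{i\cos\theta_L}{n_L} \\ -\dfrac{i\,n_L}{\cos\theta_L} & 0 \end{pmatrix} = \begin{pmatrix} -\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L} & 0 \\ 0 & -\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H} \end{pmatrix}$$
$$\begin{pmatrix} -\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L} & 0 \\ 0 & -\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H} \end{pmatrix}^{q} = \begin{pmatrix} \left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q} & 0 \\ 0 & \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q} \end{pmatrix}$$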
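With $A^{(p)}$ in hand, we can now calculate the transmission coefficient from (4.60)
$$t_p = \frac{2}{\left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0}} \qquad (\lambda/4\ \text{stack, p-polarized}) \tag{4.70}$$
[Figure: $t$ and $r$ plotted versus $q$.]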
Exercises
P4.3 Verify that when $\theta_1$ and $\theta_2$ are real, (4.14) simplifies to
(4.15).
P4.4 A light wave impinges at normal incidence on a thin glass plate with
index n and thickness d .
(a) Show that the transmittance through the plate as a function of
wavelength is
$$T^{\mathrm{tot}} = \frac{1}{1 + \dfrac{\left(n^2 - 1\right)^2}{4n^2}\sin^2\!\left(\dfrac{2\pi n d}{\lambda_{\mathrm{vac}}}\right)}$$
HINT: Find
$$r^{1\to2} = r^{1\to0} = -r^{0\to1} = \frac{n - 1}{n + 1}$$
and then use
$$T^{0\to1} = 1 - R^{0\to1}, \qquad T^{1\to2} = 1 - R^{1\to2}$$
(b) If n = 1.5, what are the maximum and minimum transmittances
through the plate?
(c) If the plate thickness is d = 150 µm, what wavelengths transmit with
maximum efficiency? Express your answer as a formula involving an
integer j .
P4.5 Show that the maximum reflectance possible from the front coating in
Example 4.2 is 46%. Find the smallest possible d 1 that accomplishes
this for light with wavelength λvac = 633 nm.
P4.6 Re-compute (4.22) in the case of s-polarized light. Write the result in
the same form as the last expression in (4.22).
Figure 4.19
(a) Use a computer to plot the transmittance through the gap (i.e. the
result of P4.6) as a function of separation d (normal to gap surface).
Neglect reflections from other surfaces of the prisms.
(b) Measure the transmittance of the microwaves through the prisms
as a function of spacing d (normal to the surface) and superimpose the
results on the graph of part (a). Figure 4.18 shows a plot of typical data
taken with this setup. (video)
Figure 4.18 Theoretical vs. measured microwave transmission through wax prisms,
plotted against separation (cm). Mismatch is presumably due to imperfections
in microwave collimation and/or extraneous reflections.
P4.9 Generate a plot like Fig. 4.11(a), showing the fringes you get in a Fabry-
Perot etalon when θ1 is varied. Let Tmax = 1, F = 10, λ = 500 nm,
d = 1 cm, and n 1 = 1.
(a) Plot T vs. θ1 over the angular range used in Fig. 4.11(a).
(c) Suppose d was slightly different, say 1.00001 cm. Make a plot of T
vs θ1 for this situation.
P4.10 Consider the configuration depicted in Fig. 4.10, where the center of the
diverging light beam ($\lambda_{\mathrm{vac}}$ = 633 nm) approaches the plates at normal
incidence. Suppose that the spacing of the plates (near d = 0.5 cm) is
just right to cause a bright fringe to occur at the center. Let $n_1 = 1$. Find
the angle for the $m$th circular bright fringe surrounding the central spot
(the 0th fringe corresponding to the center). HINT: $\cos\theta \cong 1 - \theta^2/2$. The
answer has the form $a\sqrt{m}$; find the value of $a$.
Figure 4.20
L4.12 Use the same Fabry-Perot etalon to observe the Zeeman splitting of the
yellow line λ = 587.4 nm emitted by a krypton lamp when a magnetic
field is applied. As the line splits and moves through half of the free
spectral range, the peak of the decreasing wavelength and the peak of
the increasing wavelength meet on the screen. When this happens, by
how much has each wavelength shifted? (video)
Figure 4.21
Exercises for 4.7 Multilayer Coatings
P4.14 Show that (4.65) for a single layer (i.e. two interfaces), is equivalent to
(4.11). WARNING: This is more work than it may appear at first.
P4.15 (a) What should be the thickness of the high and the low index layers in
a periodic high-reflector mirror? Let the light be p-polarized and strike
the mirror surface at 45°. Take the indices of the layers to be $n_H = 2.32$
and $n_L = 1.38$, deposited on a glass substrate with index n = 1.5. Let
the wavelength be $\lambda_{\mathrm{vac}}$ = 633 nm.
(b) Find the reflectance R with 1, 2, 4, and 8 periods in the high-low
stack.
P4.16 Find the high-reflector matrix for s-polarized light that corresponds to
(4.69).
P4.17 Design an anti-reflection coating for use in air (assume the index of air
is 1):
(a) Show that for normal incidence and λ/4 films (thickness = 1/4 the
wavelength of light inside the material), the reflectance of a single-layer
($n_1$) coating on glass is
$$R = \left(\frac{n_g - n_1^2}{n_g + n_1^2}\right)^2$$
(b) Show that for a two-coating setup (air-$n_1$-$n_2$-glass; $n_1$ and $n_2$ are
each a λ/4 film),
$$R = \left(\frac{n_2^2 - n_g n_1^2}{n_2^2 + n_g n_1^2}\right)^2$$
(c) If $n_g = 1.5$, and you have a choice of these common coating materials:
ZnS (n = 2.32), CeF3 (n = 1.63), and MgF2 (n = 1.38), find the
combination that gives you the lowest R for part (b). (Be sure to specify
which material is $n_1$ and which is $n_2$.) What R does this combination
give?
To this point, we have considered only isotropic media where the susceptibility
χ(ω) (and hence the index of refraction) is the same for all propagation directions
and polarizations. In anisotropic materials, such as crystals, it is possible for
light to experience a different index of refraction depending on the orientation
(i.e. polarization) of the electric field E. This difference in the index of refraction
occurs when the direction and strength of the induced dipoles depends on the
lattice structure of the material in addition to the propagating field.1 The unique
properties of anisotropic materials make them important elements in many
optical systems.
In section 5.1 we discuss how to connect E and P in anisotropic media using a
susceptibility tensor. In section 5.2 we apply Maxwell’s equations to a plane wave
traveling in a crystal. The analysis leads to Fresnel’s equation, which relates the
components of the k-vector to the components of the susceptibility tensor. In
section 5.3 we apply Fresnel’s equation to a uniaxial crystal (e.g. quartz, sapphire)
where $\chi_x = \chi_y \neq \chi_z$. In the context of a uniaxial crystal, we show that the Poynting
vector and the k-vector are generally not parallel.
More than a century before Fresnel, Christiaan Huygens successfully described
birefringence in crystals using the idea of elliptical wavelets. His method gives
the direction of the Poynting vector associated with the extraordinary ray in a
crystal. It was Huygens who coined the term ‘extraordinary’ since one of the
rays in a birefringent material appeared not to obey Snell’s law. Actually, the
k-vector always obeys Snell’s law, but in a crystal, the k-vector points in a different
direction than the Poynting vector, which delivers the energy seen by an observer.
Huygens’ approach is outlined in Appendix 5.D.
NaCl) are highly symmetric and respond to electric fields the same in any direction.
118 Chapter 5 Propagation in Anisotropic Media
However, at low intensities the response of materials is still linear (or propor-
tional) to the strength of the electric field. The linear constitutive relation which
connects P to E in a crystal can be expressed in its most general form as
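\[
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
= \epsilon_0
\begin{pmatrix}
\chi_{xx} & \chi_{xy} & \chi_{xz} \\
\chi_{yx} & \chi_{yy} & \chi_{yz} \\
\chi_{zx} & \chi_{zy} & \chi_{zz}
\end{pmatrix}
\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix}
\tag{5.1}
\]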
The matrix in (5.1) is called the susceptibility tensor. To visualize the behavior
of electrons in such a material, we imagine each electron bound as though by
tiny springs with different strengths in different dimensions to represent the
anisotropy (see Fig. 5.1). When an external electric field is applied, the electron
experiences a force that moves it from its equilibrium position. The ‘springs’
(actually the electric force from ions bound in the crystal lattice) exert a restoring
force, but the restoring force is not equal in all directions; the electron tends to
move more along the dimension of the weaker spring. The displaced electron
creates a microscopic dipole, but the asymmetric restoring force causes P to be in
a direction different than E as depicted in Fig. 5.2.
Figure 5.1 A physical model of an electron bound in a crystal lattice with the
coordinate system specially chosen along the principal axes so that the
susceptibility tensor takes on a simple form.
To understand the geometrical interpretation of the many coefficients $\chi_{ij}$,
assume, for example, that the electric field is directed along the x-axis (i.e.
$E_y = E_z = 0$) as depicted in Fig. 5.2. In this case, the three equations encapsulated in
(5.1) reduce to
\[
\begin{aligned}
P_x &= \epsilon_0 \chi_{xx} E_x \\
P_y &= \epsilon_0 \chi_{yx} E_x \\
P_z &= \epsilon_0 \chi_{zx} E_x
\end{aligned}
\]
Notice that the coefficient $\chi_{xx}$ connects the strength of P in the x̂ direction with
the strength of E in that same direction, just as in the isotropic case. The other two
coefficients ($\chi_{yx}$ and $\chi_{zx}$) describe the amount of polarization P produced in the
ŷ and ẑ directions by the electric field component in the x-dimension. Likewise,
the other coefficients with mixed subscripts in (5.1) describe the contribution to
P in one dimension made by an electric field component in another dimension.
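A short numerical illustration of this point, offered as a sketch only: the tensor entries below are hypothetical values chosen for illustration (they are not taken from any real crystal), and NumPy is assumed to be available. With E along x, the polarization picks out the first column of the tensor and is tilted away from E.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)

# Hypothetical real, symmetric susceptibility tensor (illustrative values only)
chi = np.array([[1.20, 0.15, 0.00],
                [0.15, 1.50, 0.10],
                [0.00, 0.10, 1.80]])

E = np.array([1.0, 0.0, 0.0])   # field along x, so E_y = E_z = 0
P = EPS0 * chi @ E              # constitutive relation (5.1)

# With E along x, P/eps0 is the first column: chi_xx, chi_yx, chi_zx
print(P / EPS0)

# Angle between P and E; it is nonzero whenever chi_yx or chi_zx is nonzero
cos_angle = P @ E / (np.linalg.norm(P) * np.linalg.norm(E))
angle_deg = np.degrees(np.arccos(cos_angle))
print(f"angle between P and E: {angle_deg:.2f} degrees")
```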
As you might imagine, working with nine susceptibility coefficients can get
complicated. Fortunately, we can greatly reduce the complexity of the description
by a judicious choice of coordinate system. In Appendix 5.A we explain how
conservation of energy requires that the susceptibility tensor (5.1) for typical
non-absorbing crystals be real and symmetric (i.e. $\chi_{ij} = \chi_{ji}$).2
Figure 5.2 The applied field E and the induced polarization P in
general are not parallel in a crystal lattice.
Appendix 5.B shows that, given a real symmetric tensor, it is always possible
to choose a coordinate system for which off-diagonal elements vanish. This is
true even if the lattice planes in the crystal are not mutually orthogonal (e.g.
rhombus, hexagonal, etc.). We will imagine that this rotation of coordinates
has been accomplished. In other words, we can let the crystal itself dictate the
orientation of the coordinate system, aligned to the principal axes of the crystal
for which the off-diagonal elements of (5.1) are zero.
2 By ‘typical’ we mean that the crystal does not exhibit optical activity. Optically active crystals
have a complex susceptibility tensor, even when no absorption takes place. Conservation of energy
in this more general case requires that the susceptibility tensor be Hermitian ($\chi_{ij} = \chi_{ji}^*$).
5.2 Plane Wave Propagation in Crystals
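With the coordinate system aligned to the principal axes, the constitutive
relation for a non-absorbing crystal simplifies to
\[
\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
= \epsilon_0
\begin{pmatrix}
\chi_x & 0 & 0 \\
0 & \chi_y & 0 \\
0 & 0 & \chi_z
\end{pmatrix}
\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix}
\tag{5.2}
\]
By assumption, $\chi_x$, $\chi_y$, and $\chi_z$ are all real. (We have dropped the double subscript;
$\chi_x$ stands for $\chi_{xx}$, etc.)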
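One way to find such principal axes numerically is an eigen-decomposition. The sketch below is not the rotation-angle construction of Appendix 5.B; it swaps in NumPy's `numpy.linalg.eigh` routine, and the tensor values are hypothetical. The rows of `R` are the principal axes, and the diagonal of the rotated tensor plays the role of $\chi_x$, $\chi_y$, $\chi_z$ in (5.2).

```python
import numpy as np

# Hypothetical real, symmetric susceptibility tensor in an arbitrary frame
chi = np.array([[1.20, 0.15, 0.00],
                [0.15, 1.50, 0.10],
                [0.00, 0.10, 1.80]])

# eigh is meant for symmetric matrices; columns of `axes` are orthonormal eigenvectors
chi_principal, axes = np.linalg.eigh(chi)

# Rows of R are the principal axes expressed in the original frame
R = axes.T.copy()
if np.linalg.det(R) < 0:
    R[0] *= -1.0            # flip one axis so R is a proper rotation (det = +1)

chi_rotated = R @ chi @ R.T  # tensor expressed in the principal-axis frame
print(np.round(chi_rotated, 12))
print(chi_principal)         # the three diagonal entries
```

Any ordering or sign choice of the axes gives an equally valid principal frame; only the set of diagonal values is fixed by the tensor.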
We immediately notice the following peculiarity: From its definition, the Poynting
vector S ≡ E × B/µ0 is perpendicular to both E and B, and by (5.6) the k-vector is
perpendicular to B. However, by (5.5) the k-vector is not necessarily perpendicular
to E, since in general k · E ≠ 0 if P points in a direction other than E. Therefore,
k and S are not necessarily parallel in a crystal. In other words, the flow of energy
and the direction of the phase-front propagation can be different in anisotropic
media.
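Our main goal here is to relate the k-vector to the susceptibility parameters $\chi_x$,
$\chi_y$, and $\chi_z$. To do this, we plug our trial plane-wave fields into the wave equation
(1.41). Under the assumption $\mathbf{J}_{\rm free} = 0$, we have
\[
\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2}
= \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} + \nabla \left( \nabla \cdot \mathbf{E} \right)
\tag{5.7}
\]
We begin by substituting the trial solutions (5.4) into the wave equation (5.7). After
carrying out the derivatives we find
\[
k^2 \mathbf{E} - \omega^2 \mu_0 \left( \epsilon_0 \mathbf{E} + \mathbf{P} \right) = \mathbf{k} \left( \mathbf{k} \cdot \mathbf{E} \right)
\tag{5.8}
\]
Inserting the constitutive relation (5.3) for crystals into (5.8) yields
\[
k^2 \mathbf{E} - \omega^2 \mu_0 \epsilon_0 \left[ (1+\chi_x) E_x \hat{\mathbf{x}} + (1+\chi_y) E_y \hat{\mathbf{y}} + (1+\chi_z) E_z \hat{\mathbf{z}} \right] = \mathbf{k} \left( \mathbf{k} \cdot \mathbf{E} \right)
\tag{5.9}
\]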
This relationship is unwieldy because of the mix of electric field components that
appear in the expression. This was not a problem when we investigated isotropic
materials for which the k-vector is perpendicular to E, making the right-hand side
of the equations zero. However, there is a trick for dealing with this.
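Relation (5.9) actually contains three equations, one for each dimension. Explicitly,
these equations are
\[ \left[ k^2 - \frac{\omega^2}{c^2} (1+\chi_x) \right] E_x = k_x \left( \mathbf{k} \cdot \mathbf{E} \right) \tag{5.10} \]
\[ \left[ k^2 - \frac{\omega^2}{c^2} (1+\chi_y) \right] E_y = k_y \left( \mathbf{k} \cdot \mathbf{E} \right) \tag{5.11} \]
and
\[ \left[ k^2 - \frac{\omega^2}{c^2} (1+\chi_z) \right] E_z = k_z \left( \mathbf{k} \cdot \mathbf{E} \right) \tag{5.12} \]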
We have replaced the constants $\mu_0 \epsilon_0$ with $1/c^2$ in accordance with (1.43). We
multiply (5.10)–(5.12) respectively by $k_x$, $k_y$, and $k_z$ and also move the factor in
square brackets in each equation to the denominator on the right-hand side. Then
by adding the three equations together we get
\[
\frac{k_x^2 \left( \mathbf{k} \cdot \mathbf{E} \right)}{k^2 - \omega^2 (1+\chi_x)/c^2}
+ \frac{k_y^2 \left( \mathbf{k} \cdot \mathbf{E} \right)}{k^2 - \omega^2 (1+\chi_y)/c^2}
+ \frac{k_z^2 \left( \mathbf{k} \cdot \mathbf{E} \right)}{k^2 - \omega^2 (1+\chi_z)/c^2}
= k_x E_x + k_y E_y + k_z E_z = \mathbf{k} \cdot \mathbf{E}
\tag{5.13}
\]
Now k · E appears in every term and can be divided away. This gives the dispersion
relation (unencumbered by field components):
\[
\frac{k_x^2}{k^2 c^2/\omega^2 - (1+\chi_x)}
+ \frac{k_y^2}{k^2 c^2/\omega^2 - (1+\chi_y)}
+ \frac{k_z^2}{k^2 c^2/\omega^2 - (1+\chi_z)}
= \frac{\omega^2}{c^2}
\tag{5.14}
\]
The dispersion relation (5.14) allows us to find a suitable k, given values for ω,
χx , χ y , and χz . Actually, it only restricts the magnitude of k; we must still decide
on a direction for the wave to travel (i.e. we must choose the ratios between k x , k y ,
and $k_z$). To remind ourselves of this fact, we introduce a unit vector that points in
the direction of k:
\[
\mathbf{k} = k_x \hat{\mathbf{x}} + k_y \hat{\mathbf{y}} + k_z \hat{\mathbf{z}}
= k \left( u_x \hat{\mathbf{x}} + u_y \hat{\mathbf{y}} + u_z \hat{\mathbf{z}} \right) = k \hat{\mathbf{u}}
\tag{5.15}
\]
With this unit vector inserted, the dispersion relation (5.14) for plane waves in a
crystal becomes
\[
\frac{u_x^2}{k^2 c^2/\omega^2 - (1+\chi_x)}
+ \frac{u_y^2}{k^2 c^2/\omega^2 - (1+\chi_y)}
+ \frac{u_z^2}{k^2 c^2/\omega^2 - (1+\chi_z)}
= \frac{\omega^2}{k^2 c^2}
\tag{5.16}
\]
We may define refractive index as the ratio of the speed of light in vacuum
c to the speed of phase propagation in a material ω/k (see P1.9). The relation
introduced for isotropic media (i.e. (2.19) for real index) remains appropriate.
That is
\[ n = \frac{kc}{\omega} \tag{5.17} \]
This familiar relationship between k and ω, in the case of a crystal, depends on
the direction of propagation in accordance with (5.16).
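Inspired by (2.30), we will find it helpful to introduce several refractive-index
parameters:
\[
n_x \equiv \sqrt{1+\chi_x}, \qquad
n_y \equiv \sqrt{1+\chi_y}, \qquad
n_z \equiv \sqrt{1+\chi_z}
\tag{5.18}
\]
With these definitions and (5.17), the dispersion relation (5.16) becomes
\[
\frac{u_x^2}{n^2 - n_x^2} + \frac{u_y^2}{n^2 - n_y^2} + \frac{u_z^2}{n^2 - n_z^2} = \frac{1}{n^2}
\tag{5.19}
\]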
This is called Fresnel’s equation (not to be confused with the Fresnel coefficients
studied in chapter 3). The relationship contains the yet unknown index n that
varies with the direction of the k-vector (i.e. the direction of the unit vector û).
After multiplying through by all of the denominators (and after a fortuitous
cancelation owing to $u_x^2 + u_y^2 + u_z^2 = 1$), Fresnel’s equation (5.19) can be rewritten
as a quadratic in $n^2$. The two solutions are
\[ n^2 = \frac{B \pm \sqrt{B^2 - 4AC}}{2A} \tag{5.20} \]
where
\[ A \equiv u_x^2 n_x^2 + u_y^2 n_y^2 + u_z^2 n_z^2 \tag{5.21} \]
\[ B \equiv u_x^2 n_x^2 \left( n_y^2 + n_z^2 \right) + u_y^2 n_y^2 \left( n_x^2 + n_z^2 \right) + u_z^2 n_z^2 \left( n_x^2 + n_y^2 \right) \tag{5.22} \]
\[ C \equiv n_x^2 n_y^2 n_z^2 \tag{5.23} \]
The upper and lower signs (+ and −) in (5.20) give two positive solutions for n 2 .
The positive square root of these solutions yields two physical values for n. It turns
out that each of the two values for n is associated with a polarization direction
of the electric field, given a propagation direction k. A broader analysis carried
out in appendix 5.C renders the orientation of the electric fields, whereas here we
only show how to find the two values of n. We refer to the two indices as the slow
and fast index, since the waves associated with each propagate at speed v = c/n.
In the special cases of propagation along one of the principal axes of the
crystal, the index n takes on two of the values n x , n y , or n z , depending on which
are orthogonal to the direction of propagation.
Example 5.1
Calculate the two possible values for the index of refraction when k is in the ẑ
direction (in the crystal principal frame).
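Solution: With û = ẑ we have $u_x = u_y = 0$ and $u_z = 1$, so (5.21)–(5.23) reduce to
$A = n_z^2$, $B = n_z^2 (n_x^2 + n_y^2)$, and $C = n_x^2 n_y^2 n_z^2$.
Inserting these expressions into (5.20), we find the two values for the index:
\[ n = n_x, \; n_y \]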
The index n x is experienced by light whose electric field points in the x-dimension,
and the index n y is experienced by light whose electric field points in the y-
dimension (see appendix 5.C ).
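The quadratic solution (5.20) with coefficients (5.21)–(5.23) is easy to evaluate numerically. The following is a minimal sketch assuming NumPy; the function names and the principal indices used here are hypothetical choices for illustration. It reproduces the result of Example 5.1 for û = ẑ and then checks that both roots for a generic direction satisfy Fresnel's equation (5.19).

```python
import numpy as np

def fresnel_indices(u, nx, ny, nz):
    """Return (n_fast, n_slow) from the quadratic (5.20) with (5.21)-(5.23)."""
    ux2, uy2, uz2 = np.asarray(u, dtype=float) ** 2
    A = ux2 * nx**2 + uy2 * ny**2 + uz2 * nz**2
    B = (ux2 * nx**2 * (ny**2 + nz**2)
         + uy2 * ny**2 * (nx**2 + nz**2)
         + uz2 * nz**2 * (nx**2 + ny**2))
    C = nx**2 * ny**2 * nz**2
    root = np.sqrt(max(B**2 - 4 * A * C, 0.0))  # clip tiny negative round-off
    return np.sqrt((B - root) / (2 * A)), np.sqrt((B + root) / (2 * A))

def fresnel_residual(n, u, nx, ny, nz):
    """Left side minus right side of Fresnel's equation (5.19)."""
    u = np.asarray(u, dtype=float)
    n_axes = np.array([nx, ny, nz])
    return np.sum(u**2 / (n**2 - n_axes**2)) - 1 / n**2

nx, ny, nz = 1.5, 1.6, 1.7   # hypothetical principal indices

# Example 5.1: propagation along z should return n_x and n_y
n_along_z = fresnel_indices([0, 0, 1], nx, ny, nz)
print(n_along_z)

# A generic direction: both roots should satisfy (5.19)
u = np.array([1.0, 2.0, 2.0]) / 3.0
n_fast, n_slow = fresnel_indices(u, nx, ny, nz)
print(n_fast, n_slow)
print(fresnel_residual(n_fast, u, nx, ny, nz), fresnel_residual(n_slow, u, nx, ny, nz))
```

The residual check is skipped for û = ẑ because (5.19) then contains a zero-over-zero term, which is why the quadratic form (5.20) is the more convenient one to evaluate.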
Before moving on, let us briefly summarize what has been accomplished so
far. Given values for χx , χ y , and χz associated with light in a crystal at a given
frequency, you can define the indices n x , n y , and n z , according to (5.18). Next, a
direction for the k-vector is chosen (i.e. u x , u y , and u z ). This direction generally
has two values for the index of refraction associated with it, found using Fresnel’s
equation (5.20). Each index is associated with a specific polarization direction
for the electric field as outlined in appendix 5.C. Every propagation direction û
has its own natural set of polarization components for the electric field. The two
polarization components travel at different speeds, even though the frequency is
the same. This is known as birefringence.
Figure 5.3 Spherical coordinates.
5.3 Biaxial and Uniaxial Crystals 123
\[
\cos\theta = \pm \frac{n_x}{n_y} \sqrt{\frac{n_z^2 - n_y^2}{n_z^2 - n_x^2}}
\quad \text{(Optic axes directions, biaxial crystal)}
\tag{5.25}
\]
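As a numerical check of (5.25), the sketch below computes the optic-axis angle and confirms that the discriminant $B^2 - 4AC$ of (5.20) vanishes in that direction, so the two indices coincide. It assumes NumPy, the ordering $n_x < n_y < n_z$, and the potassium niobate indices quoted in the caption of Figure 5.4; the function names are hypothetical.

```python
import numpy as np

def optic_axis_angle(nx, ny, nz):
    """Polar angle (radians, from z) of an optic axis in the x-z plane, per (5.25).

    Assumes nx < ny < nz so that the square root is real.
    """
    cos_theta = (nx / ny) * np.sqrt((nz**2 - ny**2) / (nz**2 - nx**2))
    return np.arccos(cos_theta)

def fresnel_abc(theta, nx, ny, nz):
    """Coefficients (5.21)-(5.23) for a direction in the x-z plane (phi = 0)."""
    ux2, uz2 = np.sin(theta) ** 2, np.cos(theta) ** 2
    A = ux2 * nx**2 + uz2 * nz**2
    B = ux2 * nx**2 * (ny**2 + nz**2) + uz2 * nz**2 * (nx**2 + ny**2)
    C = nx**2 * ny**2 * nz**2
    return A, B, C

nx, ny, nz = 2.22, 2.35, 2.41           # values from the Figure 5.4 caption
theta_axis = optic_axis_angle(nx, ny, nz)
A, B, C = fresnel_abc(theta_axis, nx, ny, nz)

print(f"optic axis at theta = {np.degrees(theta_axis):.2f} degrees from z")
print("relative discriminant:", (B**2 - 4 * A * C) / B**2)  # should be ~0
print("common index:", np.sqrt(B / (2 * A)))                 # should equal n_y
```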
While finding the direction of the optic axes in a biaxial crystal is not too bad, an
expression for the two indices of refraction is messy. The smaller value is commonly
referred to as the ‘fast’ index and the larger value the ‘slow’ index. Figure 5.4
shows the two refractive indices (i.e. the solutions to Fresnel’s equation (5.20)) for
a biaxial crystal plotted with color shading on the surface of a sphere. Each point
on the sphere represents a different θ and φ. The two optic axes are apparent
in the plot of the difference between $n_{\rm slow}$ and $n_{\rm fast}$. When propagating in these
directions, either polarization experiences the same index. For the remainder of
this chapter, we will focus on the simpler case of uniaxial crystals.
Figure 5.4 The fast and slow refractive indices (and their difference) as a function of
direction for potassium niobate (KNbO3) at λ = 500 nm ($n_x = 2.22$, $n_y = 2.35$,
and $n_z = 2.41$).
In uniaxial crystals two of the coefficients $\chi_x$, $\chi_y$, and $\chi_z$ are the same. In
this case, there is only one optic axis for the crystal (hence the name uniaxial).
By convention, in uniaxial crystals we label the dimension that has the unique
susceptibility as the z-axis (i.e. $\chi_x = \chi_y \neq \chi_z$). This makes the z-axis the optic axis.
The unique index of refraction is called the extraordinary index
\[ n_z = n_e \tag{5.26} \]
and the other index is called the ordinary index
\[ n_x = n_y = n_o \tag{5.27} \]
These names were coined by Huygens, one of the early scientists to study light
in crystals (see appendix 5.D). A uniaxial crystal with n e > n o is referred to as a
positive crystal, and one with n e < n o is referred to as a negative crystal.
To calculate the index of refraction for a wave propagating in a uniaxial crystal,
we use definitions (5.26) and (5.27) along with the spherical representation of û
(5.24) in Fresnel’s equation (5.20) to find the following two values for n (see P5.4):
\[ n = n_o \quad \text{(uniaxial crystal)} \tag{5.28} \]
and
\[
n = n_e(\theta) \equiv \frac{n_o n_e}{\sqrt{n_o^2 \sin^2\theta + n_e^2 \cos^2\theta}}
\quad \text{(uniaxial crystal)}
\tag{5.29}
\]
The index n e (θ) in (5.29) is also commonly referred to as the extraordinary index
along with the constant n e = n z . While this has the potential for some confusion,
the practice is so common that we will perpetuate it here. We will write n e (θ)
when the angle dependent quantity specified by (5.29) is required, and write n e
in formulas where the constant (5.26) is called for (as in the right hand side of
(5.29)). Notice that n e (θ) depends only on θ (the polar angle measured from the
optic axis ẑ) and not φ (the azimuthal angle). Figure 5.5 shows the two refractive
indices (5.28) and (5.29) as a function of θ and φ. Since $n_e(\theta)$ has no φ dependence
and $n_o$ is constant, the variation is much simpler than for the biaxial case.
As outlined in appendix 5.C, the index n o corresponds to an electric field
component that points perpendicular to the plane containing û and ẑ (e.g. if
û is in the x-z plane, n o is associated with light polarized in the y-dimension).
On the other hand, the index n e (θ) corresponds to field polarization that lies
within the plane containing û and ẑ. In this case, the polarization component
is directed partially along the optic axis (i.e. it has a z-component). That is why
(5.29) gives for the refractive index a mixture of $n_o$ and $n_e$. If θ = 0, then the
k-vector is directed exactly along the optic axis, and $n_e(\theta)$ reduces to $n_o$ so that
both polarization components experience the same index $n_o$.
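The angle dependence in (5.29) can be tabulated in a few lines. This is a minimal sketch assuming NumPy and the beta barium borate indices quoted in the caption of Figure 5.5; the function name `n_e_theta` is a hypothetical label. It shows the limits discussed above: $n_e(\theta)$ equals $n_o$ along the optic axis and $n_e$ perpendicular to it.

```python
import numpy as np

def n_e_theta(theta, n_o, n_e):
    """Angle-dependent extraordinary index from (5.29); theta in radians from the optic axis."""
    return n_o * n_e / np.sqrt(n_o**2 * np.sin(theta)**2 + n_e**2 * np.cos(theta)**2)

n_o, n_e = 1.68, 1.56                      # BBO values from the Figure 5.5 caption
theta_deg = np.array([0.0, 30.0, 60.0, 90.0])
values = n_e_theta(np.radians(theta_deg), n_o, n_e)

for t, n in zip(theta_deg, values):
    print(f"theta = {t:5.1f} deg   n_e(theta) = {n:.4f}")
```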
Figure 5.5 The extraordinary and ordinary refractive indices (and their difference) as a
function of direction for beta barium borate (BBO) at λ = 500 nm ($n_o = 1.68$
and $n_e = 1.56$).
5.4 Refraction at a Uniaxial Crystal Surface
Next we consider refraction as light enters a uniaxial crystal. Snell’s law (3.7)
describes the connection between the k-vectors incident upon and transmitted
through the surface. We must consider separately the portion of the light that
experiences the ordinary index from the portion that experiences the extraordinary
index. Because of the different indices, the ordinary and extraordinary polarized
light refract into the crystal at two different angles; they travel at two different
velocities in the crystal; and they have two different wavelengths in the crystal.
5.5 Poynting Vector in a Uniaxial Crystal 125
If we assume that the index outside of the crystal is one, Snell’s law for the
ordinary polarization is
\[ \sin\theta_i = n_o \sin\theta_t \tag{5.30} \]
where $n_o$ is the ordinary index inside the crystal. The extraordinary polarized
light also obeys Snell’s law, but now the index of refraction in the crystal depends
on direction of propagation inside the crystal relative to the optic axis. Snell’s law
for the extraordinary polarization is
\[ \sin\theta_i = n_e(\theta') \sin\theta_t \tag{5.31} \]
where θ′ is the angle between the optic axis inside the crystal and the direction of
propagation in the crystal (given by $\theta_t$ in the plane of incidence). When the optic
axis is at an arbitrary angle with respect to the surface the relationship between
θ′ and $\theta_t$ is cumbersome. We will examine Snell’s law only for the specific case
when the optic axis is perpendicular to the crystal surface, for which $\theta_t = \theta'$.
Example 5.2
Derive Snell’s law for a uniaxial crystal with optic axis perpendicular to the surface.
Solution: Refer to Fig. 5.6. With the optic axis perpendicular to the surface, if the
light hits the crystal surface straight on, the index of refraction is $n_o$, regardless of
the orientation of polarization since θ′ = 0. When the light strikes the surface at an
angle, s-polarized light continues to experience the index $n_o$, while p-polarized
light experiences the extraordinary index $n_e(\theta)$.3
When we insert (5.29) into Snell’s law (5.31) with θ′ = $\theta_t$, the expression can be
inverted to find the transmitted angle $\theta_t$ in terms of $\theta_i$ (see P5.5):
\[
\tan\theta_t = \frac{n_e \sin\theta_i}{n_o \sqrt{n_e^2 - \sin^2\theta_i}}
\quad \text{(extraordinary polarized, optic axis } \perp \text{ surface)}
\tag{5.32}
\]
As strange as this formula may appear, it is Snell’s law, but with an angularly
dependent index.
Figure 5.6 Propagation of light in a uniaxial crystal with its optic axis
perpendicular to the surface. (Axis labels: x-axis directed into page, y-axis, z-axis.)
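To see that (5.32) really is Snell's law in disguise, one can compute $\theta_t$ from (5.32) and then check that $\sin\theta_i = n_e(\theta_t)\sin\theta_t$ holds with the index from (5.29). The sketch below assumes NumPy, air outside the crystal, the BBO indices from the Figure 5.5 caption, and a hypothetical 40° incident angle; the function names are illustrative.

```python
import numpy as np

def n_e_theta(theta, n_o, n_e):
    """Angle-dependent extraordinary index (5.29)."""
    return n_o * n_e / np.sqrt(n_o**2 * np.sin(theta)**2 + n_e**2 * np.cos(theta)**2)

def theta_t_extraordinary(theta_i, n_o, n_e):
    """Transmitted k-vector angle from (5.32); optic axis perpendicular to the surface."""
    s = np.sin(theta_i)
    return np.arctan(n_e * s / (n_o * np.sqrt(n_e**2 - s**2)))

n_o, n_e = 1.68, 1.56                  # BBO values from the Figure 5.5 caption
theta_i = np.radians(40.0)             # hypothetical incident angle

theta_t_e = theta_t_extraordinary(theta_i, n_o, n_e)   # p-polarized (extraordinary)
theta_t_o = np.arcsin(np.sin(theta_i) / n_o)           # s-polarized (ordinary)

# Snell's law with the angle-dependent index should be satisfied
lhs = np.sin(theta_i)
rhs = n_e_theta(theta_t_e, n_o, n_e) * np.sin(theta_t_e)
print(np.degrees(theta_t_o), np.degrees(theta_t_e), lhs - rhs)
```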
is specific to the orientation of the optic axis in this example. For arbitrary orientations of the
optic axis with respect to the surface, the ordinary and extraordinary components will generally be
mixtures of s and p polarized light.
giving rise to two different images. This phenomenon is one of the more com-
monly observed manifestations of birefringence. Since the Poynting vector dic-
tates the direction of energy flow, it is the direction of S that determines the
separation of the double image seen when looking through a birefringent crystal.
Snell’s law dictates the connection between the directions of the incident
and transmitted k-vectors. The Poynting vector S for purely ordinary polarized
light points in the same direction as the k-vector, so the direction of energy flow
for ordinary polarized light also obeys Snell’s law. However, for extraordinary
polarized light, the Poynting vector S is not parallel to k (recall the discussion in
connection with (5.5) and (5.6)). Thus, the energy flow associated with extraordi-
nary polarized light does not obey Snell’s law. When Christiaan Huygens saw this
in the 1600s, he exclaimed “how extraordinary!” Huygens’ method for describing
the phenomenon is outlined in appendix 5.D.
To analyze this situation, it is necessary to derive an expression for extraordi-
nary polarized light similar to Snell’s law, but which applies to S rather than to k.
This describes the direction that the energy associated with extraordinary rays
takes upon entering the crystal. To calculate the direction that the extraordinary
polarized S takes upon entering a crystal, we first calculate the direction of k
inside the crystal using Snell’s law (5.31). Then we use the expression (5.62) for E
along with B = (k × E)/ω, to evaluate S = E × B/µ0 . In general, this process is best
done numerically, since Snell’s law (5.31) for extraordinary polarized light usually
does not have simple analytic solutions.
Example 5.3
Find a relationship between direction of the Poynting Vector in a uniaxial crystal
and the angle of incidence in the special case where the optic axis is perpendicular
to the surface.
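\[
\begin{aligned}
\epsilon_0 \mathbf{E} + \mathbf{P}
&= \epsilon_0 \left[ (1+\chi_x) E_x \hat{\mathbf{x}} + (1+\chi_y) E_y \hat{\mathbf{y}} + (1+\chi_z) E_z \hat{\mathbf{z}} \right] \\
&= \epsilon_0 \left( n_o^2 E_x \hat{\mathbf{x}} + n_o^2 E_y \hat{\mathbf{y}} + n_e^2 E_z \hat{\mathbf{z}} \right)
\end{aligned}
\tag{5.33}
\]
Let the k-vector lie in the y-z plane. We may write it as
$\mathbf{k} = k \left( \hat{\mathbf{y}} \sin\theta_t + \hat{\mathbf{z}} \cos\theta_t \right)$.
Then the ordinary component of the field points in the x-direction, while the
extraordinary component lies in the y-z plane.
Equation (5.33) requires
\[
\begin{aligned}
\mathbf{k} \cdot \left( \epsilon_0 \mathbf{E} + \mathbf{P} \right)
&= \epsilon_0 k \left( n_o^2 E_y \sin\theta_t + n_e^2 E_z \cos\theta_t \right) \\
&= 0
\end{aligned}
\tag{5.34}
\]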
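Therefore, the y and z components of the extraordinary field are related through
\[ E_z = -\frac{n_o^2 E_y}{n_e^2} \tan\theta_t \tag{5.35} \]
so that the extraordinary field can be written as
\[
\mathbf{E} = E_y \left( \hat{\mathbf{y}} - \hat{\mathbf{z}} \frac{n_o^2}{n_e^2} \tan\theta_t \right) e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}
\quad \text{(extraordinary polarized)}
\tag{5.36}
\]
The associated magnetic field is
\[
\begin{aligned}
\mathbf{B} = \frac{\mathbf{k} \times \mathbf{E}}{\omega}
&= \frac{k \left( \hat{\mathbf{y}} \sin\theta_t + \hat{\mathbf{z}} \cos\theta_t \right) \times E_y \left( \hat{\mathbf{y}} - \hat{\mathbf{z}} \frac{n_o^2}{n_e^2} \tan\theta_t \right)}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \\
&= -\hat{\mathbf{x}} \frac{k E_y}{\omega} \left( \frac{n_o^2}{n_e^2} \sin\theta_t \tan\theta_t + \cos\theta_t \right) e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}
\quad \text{(extraordinary polarized)}
\end{aligned}
\tag{5.37}
\]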
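The time-averaged Poynting vector then becomes
\[
\begin{aligned}
\langle \mathbf{S} \rangle_t
&= \left\langle \mathrm{Re}\{\mathbf{E}\} \times \mathrm{Re}\left\{ \frac{\mathbf{B}}{\mu_0} \right\} \right\rangle_t \\
&= -\frac{k |E_y|^2}{\mu_0 \omega} \left( \hat{\mathbf{y}} - \hat{\mathbf{z}} \frac{n_o^2}{n_e^2} \tan\theta_t \right) \times \left( \frac{n_o^2}{n_e^2} \sin\theta_t \tan\theta_t + \cos\theta_t \right) \hat{\mathbf{x}} \left\langle \cos^2\left( \mathbf{k}\cdot\mathbf{r} - \omega t + \phi_{E_y} \right) \right\rangle_t \\
&= \frac{k |E_y|^2}{2 \mu_0 \omega} \left( \frac{n_o^2}{n_e^2} \sin\theta_t \tan\theta_t + \cos\theta_t \right) \left( \hat{\mathbf{z}} + \hat{\mathbf{y}} \frac{n_o^2}{n_e^2} \tan\theta_t \right)
\quad \text{(extraordinary polarized)}
\end{aligned}
\tag{5.38}
\]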
Let us label the direction of the Poynting vector with the angle $\theta_S$. By definition,
the tangent of this angle is the ratio of the two vector components of S:
\[
\tan\theta_S \equiv \frac{S_y}{S_z} = \frac{n_o^2}{n_e^2} \tan\theta_t
\quad \text{(extraordinary polarized)}
\tag{5.39}
\]
While the k-vector is characterized by the angle $\theta_t$, the Poynting vector is
characterized by the angle $\theta_S$. Combining (5.32) and (5.39), we can connect $\theta_S$ to the
incident angle $\theta_i$:
\[
\tan\theta_S = \frac{n_o \sin\theta_i}{n_e \sqrt{n_e^2 - \sin^2\theta_i}}
\quad \text{(extraordinary polarized)}
\tag{5.40}
\]
As we noted in the last example, we have the case where ordinary polarized light is
s-polarized light, and extraordinary polarized light is p-polarized light due to our
specific choice of orientation for the optic axis in this section. In general, the s- and
p-polarized portions of the incident light can each give rise to both extraordinary
and ordinary rays.
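The difference between the k-vector direction and the energy-flow direction can be made concrete with a few numbers. The sketch below assumes NumPy, the same geometry as this example (optic axis perpendicular to the surface, air outside), the BBO indices from the Figure 5.5 caption, and a hypothetical 40° incident angle. It evaluates (5.32) and (5.40) and confirms that they are linked by (5.39).

```python
import numpy as np

def theta_t_extraordinary(theta_i, n_o, n_e):
    """Direction of the extraordinary k-vector inside the crystal, from (5.32)."""
    s = np.sin(theta_i)
    return np.arctan(n_e * s / (n_o * np.sqrt(n_e**2 - s**2)))

def theta_s_extraordinary(theta_i, n_o, n_e):
    """Direction of the extraordinary Poynting vector inside the crystal, from (5.40)."""
    s = np.sin(theta_i)
    return np.arctan(n_o * s / (n_e * np.sqrt(n_e**2 - s**2)))

n_o, n_e = 1.68, 1.56            # BBO values from the Figure 5.5 caption
theta_i = np.radians(40.0)       # hypothetical incident angle

theta_t = theta_t_extraordinary(theta_i, n_o, n_e)
theta_s = theta_s_extraordinary(theta_i, n_o, n_e)

print(f"k-vector angle:        {np.degrees(theta_t):.3f} deg")
print(f"Poynting vector angle: {np.degrees(theta_s):.3f} deg")
print(f"walk-off angle:        {np.degrees(theta_s - theta_t):.3f} deg")
```

For a negative crystal such as this one ($n_e < n_o$), (5.39) implies that S is tilted farther from the optic axis than k; for a positive crystal the tilt goes the other way.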
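\[
N q_e \begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \frac{\epsilon_0}{q_e}
\begin{pmatrix}
\chi_{xx} & \chi_{xy} & \chi_{xz} \\
\chi_{yx} & \chi_{yy} & \chi_{yz} \\
\chi_{zx} & \chi_{zy} & \chi_{zz}
\end{pmatrix}
\begin{pmatrix} F_x \\ F_y \\ F_z \end{pmatrix}
\tag{5.41}
\]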
The column vector on the left represents the components of the displacement
r. We next invert (5.41) to find the force of the electric field on an electron as a
function of its displacement4
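\[
\begin{pmatrix} F_x \\ F_y \\ F_z \end{pmatrix}
=
\begin{pmatrix}
k_{xx} & k_{xy} & k_{xz} \\
k_{yx} & k_{yy} & k_{yz} \\
k_{zx} & k_{zy} & k_{zz}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
\tag{5.42}
\]
where
\[
\begin{pmatrix}
k_{xx} & k_{xy} & k_{xz} \\
k_{yx} & k_{yy} & k_{yz} \\
k_{zx} & k_{zy} & k_{zz}
\end{pmatrix}
\equiv \frac{N q_e^2}{\epsilon_0}
\begin{pmatrix}
\chi_{xx} & \chi_{xy} & \chi_{xz} \\
\chi_{yx} & \chi_{yy} & \chi_{yz} \\
\chi_{zx} & \chi_{zy} & \chi_{zz}
\end{pmatrix}^{-1}
\tag{5.43}
\]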
The total work done on an electron in moving it to its displaced position is
given by
\[ W = \int_{\rm path} \mathbf{F}(\mathbf{r}') \cdot d\mathbf{r}' \tag{5.44} \]
While there are many possible paths for getting the electron to any specific dis-
placement (each path specified by a different history of the electric field), the
work done along any of these paths must be the same if the system is conservative
(i.e. no absorption). For example, for a final displacement of r = x x̂ + y ŷ we could
have the following two paths:
(Diagram: Path 1 runs from (0,0,0) along x and then along y to (x,y,0); Path 2 runs
from (0,0,0) along y and then along x to (x,y,0).)
4 This inversion assumes the field changes slowly so the forces on the electron are always es-
sentially balanced. This is not true for optical fields, but the proof gives the right flavor for why
conservation of energy results in the symmetry. A more formal proof that doesn’t make this as-
sumption can be found in Principles of Optics, 7th Ed., Born and Wolf, pp. 790-791.
5.B Rotation of Coordinates 129
We can use (5.42) in (5.44) to calculate the total work done on the electron
along path 1:
\[
\begin{aligned}
W &= \int_0^x F_x(x', y' = 0, z' = 0)\, dx' + \int_0^y F_y(x' = x, y', z' = 0)\, dy' \\
&= \int_0^x k_{xx} x'\, dx' + \int_0^y \left( k_{yx} x + k_{yy} y' \right) dy' \\
&= \frac{k_{xx}}{2} x^2 + k_{yx} x y + \frac{k_{yy}}{2} y^2
\end{aligned}
\]
If we take path 2, the total work is
\[
\begin{aligned}
W &= \int_0^y F_y(x' = 0, y', z' = 0)\, dy' + \int_0^x F_x(x', y' = y, z' = 0)\, dx' \\
&= \int_0^y k_{yy} y'\, dy' + \int_0^x \left( k_{xx} x' + k_{xy} y \right) dx' \\
&= \frac{k_{yy}}{2} y^2 + k_{xy} x y + \frac{k_{xx}}{2} x^2
\end{aligned}
\]
Since the work must be the same for these two paths, we clearly have k x y = k y x .
Similar arguments for other pairs of dimensions ensure that the matrix of k
coefficients is symmetric. From linear algebra, we learn that if the inverse of a
matrix is symmetric then the matrix itself is also symmetric. When we combine
this result with the definition (5.43), we see that the assumption of no absorption
requires the susceptibility matrix to be symmetric.
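The path argument can be checked numerically by integrating the force (5.42) along the two paths. This is a sketch under stated assumptions: NumPy is available, the spring-constant matrices are hypothetical values, and the helper `segment_work` uses a midpoint sum (exact here because the force is linear in position). A symmetric matrix gives the same work along both paths, while an asymmetric one gives a difference of $(k_{yx} - k_{xy})\,xy$.

```python
import numpy as np

def segment_work(k, start, end, steps=200):
    """Work done by F = k r along a straight segment, using a midpoint sum."""
    start, end = np.asarray(start, dtype=float), np.asarray(end, dtype=float)
    dr = (end - start) / steps
    mids = start + (np.arange(steps)[:, None] + 0.5) * dr  # midpoint of each step
    forces = mids @ k.T                                    # F = k r at each midpoint
    return float(np.sum(forces @ dr))

def work_path_1(k, x, y):
    """(0,0,0) -> (x,0,0) -> (x,y,0)."""
    return segment_work(k, [0, 0, 0], [x, 0, 0]) + segment_work(k, [x, 0, 0], [x, y, 0])

def work_path_2(k, x, y):
    """(0,0,0) -> (0,y,0) -> (x,y,0)."""
    return segment_work(k, [0, 0, 0], [0, y, 0]) + segment_work(k, [0, y, 0], [x, y, 0])

# Hypothetical spring-constant matrices (arbitrary units)
k_sym = np.array([[2.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.5]])
k_asym = np.array([[2.0, 0.3, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.5]])

x, y = 1.0, 2.0
print(work_path_1(k_sym, x, y), work_path_2(k_sym, x, y))    # equal
print(work_path_1(k_asym, x, y), work_path_2(k_asym, x, y))  # differ by (k_yx - k_xy) x y
```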
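\[ \mathbf{P} = \epsilon_0 \chi \mathbf{E} \tag{5.45} \]
where
\[
\mathbf{E} \equiv \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix}
\qquad
\mathbf{P} \equiv \begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix}
\qquad
\chi \equiv \begin{pmatrix}
\chi_{xx} & \chi_{xy} & \chi_{xz} \\
\chi_{xy} & \chi_{yy} & \chi_{yz} \\
\chi_{xz} & \chi_{yz} & \chi_{zz}
\end{pmatrix}
\tag{5.46}
\]
Our task is to find a new coordinate system x′, y′, and z′ for which the susceptibility
tensor is diagonal. That is, we want to choose x′, y′, and z′ such that
\[ \mathbf{P}' = \epsilon_0 \chi' \mathbf{E}' \tag{5.47} \]
where
\[
\mathbf{E}' \equiv \begin{pmatrix} E'_{x'} \\ E'_{y'} \\ E'_{z'} \end{pmatrix}
\qquad
\mathbf{P}' \equiv \begin{pmatrix} P'_{x'} \\ P'_{y'} \\ P'_{z'} \end{pmatrix}
\qquad
\chi' \equiv \begin{pmatrix}
\chi'_{x'x'} & 0 & 0 \\
0 & \chi'_{y'y'} & 0 \\
0 & 0 & \chi'_{z'z'}
\end{pmatrix}
\tag{5.48}
\]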
To arrive at the new coordinate system, we are free to make pure rotation trans-
formations. In a manner similar to (6.29), a rotation through an angle γ about the
z-axis, followed by a rotation through an angle β about the resulting y-axis, and
finally a rotation through an angle α about the new x-axis, can be written as
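\[
\begin{aligned}
\mathbf{R} &\equiv \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha \\ 0 & -\sin\alpha & \cos\alpha \end{pmatrix}
\begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}
\begin{pmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} \\
&= \begin{pmatrix}
\cos\beta\cos\gamma & \cos\beta\sin\gamma & \sin\beta \\
-\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta \\
\sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma & -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma & \cos\alpha\cos\beta
\end{pmatrix}
\end{aligned}
\tag{5.49}
\]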
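\[
\mathbf{E} = \mathbf{R}^{-1} \mathbf{E}', \qquad \mathbf{P} = \mathbf{R}^{-1} \mathbf{P}'
\tag{5.51}
\]
where
\[
\begin{aligned}
\mathbf{R}^{-1} &= \begin{pmatrix}
\cos\beta\cos\gamma & -\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma \\
\cos\beta\sin\gamma & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma \\
\sin\beta & \sin\alpha\cos\beta & \cos\alpha\cos\beta
\end{pmatrix} \\
&= \begin{pmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{pmatrix} = \mathbf{R}^T
\end{aligned}
\tag{5.52}
\]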
Note that the inverse of the rotation matrix is the same as its transpose, an impor-
tant feature that we exploit in what follows.
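The composite rotation (5.49) and the property $\mathbf{R}^{-1} = \mathbf{R}^T$ can be verified numerically. The sketch below assumes NumPy; the angles and the symmetric tensor are hypothetical values, and `rotation_matrix` is an illustrative name. It also confirms that $\mathbf{R}\chi\mathbf{R}^{-1}$ stays symmetric when χ is symmetric.

```python
import numpy as np

def rotation_matrix(alpha, beta, gamma):
    """Product of the three elementary rotations in (5.49): R = Rx(alpha) Ry(beta) Rz(gamma)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rx = np.array([[1, 0, 0], [0, ca, sa], [0, -sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    Rz = np.array([[cg, sg, 0], [-sg, cg, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

alpha, beta, gamma = 0.3, -0.7, 1.1      # hypothetical angles in radians
R = rotation_matrix(alpha, beta, gamma)

# The inverse equals the transpose, as stated in (5.52)
print(np.allclose(np.linalg.inv(R), R.T))

# A symmetric tensor stays symmetric under the similarity transformation
chi = np.array([[1.20, 0.15, 0.00],
                [0.15, 1.50, 0.10],
                [0.00, 0.10, 1.80]])     # hypothetical symmetric tensor
chi_prime = R @ chi @ R.T
print(np.allclose(chi_prime, chi_prime.T))
```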
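Upon inserting (5.51) into (5.45) we have
\[ \mathbf{R}^{-1} \mathbf{P}' = \epsilon_0 \chi \mathbf{R}^{-1} \mathbf{E}' \tag{5.53} \]
or
\[ \mathbf{P}' = \epsilon_0 \mathbf{R} \chi \mathbf{R}^{-1} \mathbf{E}' \tag{5.54} \]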
5.C Electric Field in Crystals 131
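From this equation we see that the new susceptibility tensor we seek for (5.47) is
\[
\begin{aligned}
\chi' &\equiv \mathbf{R} \chi \mathbf{R}^{-1} \\
&= \begin{pmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{pmatrix}
\begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{xy} & \chi_{yy} & \chi_{yz} \\ \chi_{xz} & \chi_{yz} & \chi_{zz} \end{pmatrix}
\begin{pmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{pmatrix} \\
&= \begin{pmatrix}
\chi'_{x'x'} & \chi'_{x'y'} & \chi'_{x'z'} \\
\chi'_{x'y'} & \chi'_{y'y'} & \chi'_{y'z'} \\
\chi'_{x'z'} & \chi'_{y'z'} & \chi'_{z'z'}
\end{pmatrix}
\end{aligned}
\tag{5.55}
\]
We have expressly indicated that the off-diagonal terms of χ′ are symmetric (i.e.
$\chi'_{ij} = \chi'_{ji}$). This can be verified by performing the multiplication in (5.55). It is a
consequence of χ being symmetric and $\mathbf{R}^{-1}$ being equal to $\mathbf{R}^T$.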
The three off-diagonal elements of χ0 (appearing both above and below the
diagonal) are found by performing the matrix multiplication in the second line
of (5.55). The specific expressions for these three elements are not particularly
enlightening. The important point is that we can make all three of them equal to
zero since we have three degrees of freedom in the angles α, β, and γ. Although
we do not expressly solve for the angles, we have demonstrated that it is always
possible to set
\[
\begin{aligned}
\chi'_{x'y'} &= 0 \\
\chi'_{x'z'} &= 0 \\
\chi'_{y'z'} &= 0
\end{aligned}
\tag{5.56}
\]
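\[
\begin{pmatrix}
\frac{\omega^2}{c^2}(1+\chi_x) - k_y^2 - k_z^2 & k_x k_y & k_x k_z \\
k_x k_y & \frac{\omega^2}{c^2}(1+\chi_y) - k_x^2 - k_z^2 & k_y k_z \\
k_x k_z & k_y k_z & \frac{\omega^2}{c^2}(1+\chi_z) - k_x^2 - k_y^2
\end{pmatrix}
\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0
\tag{5.57}
\]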
slightly nicer:
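\[
\begin{pmatrix}
\frac{n_x^2}{n^2} - u_y^2 - u_z^2 & u_x u_y & u_x u_z \\
u_x u_y & \frac{n_y^2}{n^2} - u_x^2 - u_z^2 & u_y u_z \\
u_x u_z & u_y u_z & \frac{n_z^2}{n^2} - u_x^2 - u_y^2
\end{pmatrix}
\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0
\tag{5.58}
\]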
For (5.58) to have a non-trivial solution (i.e. non zero fields), the determinant
of the matrix must be zero. Imposing this requirement is an equivalent way to
derive Fresnel’s equation (5.19) for n.
Given a direction for û and a value for n (from Fresnel’s equation), we can use
(5.58) to determine the direction of the electric field associated with that index.
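It is left as an exercise to show that when all three components are nonzero (i.e.
$u_x \neq 0$, $u_y \neq 0$, and $u_z \neq 0$), the appropriate field direction for a value of n is given
by
\[
\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix}
\propto
\begin{pmatrix}
\dfrac{u_x}{n^2 - n_x^2} \\
\dfrac{u_y}{n^2 - n_y^2} \\
\dfrac{u_z}{n^2 - n_z^2}
\end{pmatrix}
\tag{5.59}
\]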
This is a proportionality rather than an equation because Maxwell’s equations
only specify the direction of E—we are free to choose the amplitude. Because
Fresnel’s equation gives two values for n, (5.59) specifies two distinct polarization
components associated with each propagation direction û. These polarization
components form a natural basis for describing light propagation in a crystal.
When light is composed of a mixture of these two polarizations, the two polariza-
tion components experience different indices of refraction.
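The relationship between (5.58) and (5.59) can be checked numerically: for each of the two indices from (5.20), the field direction (5.59) should be annihilated by the matrix in (5.58). The sketch below assumes NumPy, hypothetical principal indices, and a propagation direction with all three components nonzero; the function names are illustrative. It also evaluates the dot product of the two displacement directions (with $D_i \propto n_i^2 E_i$ in the principal frame), which should vanish.

```python
import numpy as np

def fresnel_n(u, n_axes):
    """Both roots n of (5.20) for unit vector u and principal indices n_axes."""
    u2, m2 = np.asarray(u) ** 2, np.asarray(n_axes) ** 2
    A = np.sum(u2 * m2)
    B = np.sum(u2 * m2 * (np.sum(m2) - m2))
    C = np.prod(m2)
    root = np.sqrt(max(B**2 - 4 * A * C, 0.0))
    return np.sqrt((B - root) / (2 * A)), np.sqrt((B + root) / (2 * A))

def field_direction(u, n, n_axes):
    """Unit vector along E from (5.59); requires all components of u to be nonzero."""
    e = np.asarray(u) / (n**2 - np.asarray(n_axes) ** 2)
    return e / np.linalg.norm(e)

def matrix_558(u, n, n_axes):
    """Matrix of (5.58): u_i u_j off the diagonal, n_i^2/n^2 - (1 - u_i^2) on it."""
    u = np.asarray(u)
    return np.outer(u, u) + np.diag(np.asarray(n_axes) ** 2 / n**2 - 1.0)

n_axes = np.array([1.5, 1.6, 1.7])       # hypothetical principal indices
u = np.array([1.0, 2.0, 2.0]) / 3.0      # direction with all components nonzero

indices = fresnel_n(u, n_axes)
fields = [field_direction(u, n, n_axes) for n in indices]
residuals = [np.linalg.norm(matrix_558(u, n, n_axes) @ e) for n, e in zip(indices, fields)]
d_dot = np.dot(n_axes**2 * fields[0], n_axes**2 * fields[1])

print(indices)
print(fields)
print(residuals, d_dot)   # all should be ~0
```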
If any of the components of û (i.e. u x , u y , or u z ) is precisely zero, the corre-
sponding entry in (5.59) yields a zero-over-zero situation. This happens when
at least one of the dimensions in (5.58) becomes decoupled from the others. In
these cases, you can re-solve (5.58) for the polarization directions as in the
following example.
Example 5.4
Determine the directions of the two polarization components associated with light
propagating in the û = ẑ direction. (Compare with Example 5.1.)
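\[
\begin{pmatrix}
\frac{n_x^2}{n^2} - 1 & 0 & 0 \\
0 & \frac{n_y^2}{n^2} - 1 & 0 \\
0 & 0 & \frac{n_z^2}{n^2}
\end{pmatrix}
\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0
\tag{5.60}
\]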
Notice that all three dimensions are decoupled in this system (i.e. there are no
off-diagonal terms). In Example 5.1 we found that the two values of n associated
with û = ẑ are n x and n y . If we use n = n x in our set of equations, we have
\[
\begin{pmatrix}
0 & 0 & 0 \\
0 & \frac{n_y^2}{n_x^2} - 1 & 0 \\
0 & 0 & \frac{n_z^2}{n_x^2}
\end{pmatrix}
\begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0
\]
We can use (5.59) to study the behavior of polarization direction as the direction
of propagation varies. Figure 5.7 shows plots of the polarization direction (i.e.
To find the directions of the electric field for light that experiences the normal
index of refraction in a uniaxial crystal, we insert $n = n_o$ into the requirement
(5.58), and solve for the allowed fields (see P5.9) to find
\[
\mathbf{E}_o(\hat{\mathbf{u}}) \propto \begin{pmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{pmatrix}
\tag{5.61}
\]
This field component is associated with the ordinary wave because just as in an
isotropic medium such as glass, the index of refraction for light with this polarization
does not vary with θ. The polarization component associated with $n_e(\theta)$ is
Figure 5.7 Polarization direction associated with the two values of n in potassium
niobate (KNbO3) at λ = 500 nm ($n_x = 2.22$, $n_y = 2.34$, and $n_z = 2.41$) and φ = π/4.
Panels: (a) polarization direction for slow index; (b) polarization direction for fast
index. Frame (c) shows the angle between the two polarization components.
6 The two components of the electric displacement vector $\mathbf{D} = \epsilon_0 \mathbf{E} + \mathbf{P}$ remain perpendicular.
by the hypotenuse of the right triangle seen in Fig. 5.8. Let the point where the
wave front touches the ellipse be denoted by $(y, z) = (z \tan\theta_S, z)$. The slope (rise
over run) of the line that connects these two points is then
\[ \frac{dz}{dy} = -\frac{z}{ct/\sin\theta_i - z \tan\theta_S} \tag{5.65} \]
At the point where the wave front touches the ellipse (i.e., $(y, z) = (z \tan\theta_S, z)$), the
slope of the ellipse is
\[
\frac{dz}{dy} = \frac{-y n_e^2}{n_o c t \sqrt{1 - \dfrac{y^2}{(ct/n_e)^2}}}
= -\frac{n_e^2 y}{n_o^2 z} = -\frac{n_e^2}{n_o^2} \tan\theta_S
\tag{5.66}
\]
We would like these two slopes to be the same. We therefore set them equal to
each other:
\[
-\frac{n_e^2}{n_o^2} \tan\theta_S = -\frac{z}{ct/\sin\theta_i - z \tan\theta_S}
\quad \Rightarrow \quad
\frac{ct}{z} \frac{n_e^2 \tan\theta_S}{n_o^2 \sin\theta_i} = \frac{n_e^2}{n_o^2} \tan^2\theta_S + 1
\tag{5.67}
\]
\[
\frac{ct}{z} = n_o \sqrt{\frac{n_e^2}{n_o^2} \tan^2\theta_S + 1}
\tag{5.68}
\]
This agrees with (5.40) as anticipated. Again, Huygens’ approach obtained the
correct direction of the Poynting vector associated with the extraordinary wave.
Exercises
P5.1 Solve Fresnel’s equation (5.19) to find the two values of n associated
with a given û. Show that both solutions yield a positive index of
refraction.
HINT: Show that (5.19) can be manipulated into the form
\[
\begin{aligned}
0 ={}& \left[ \left( u_x^2 + u_y^2 + u_z^2 \right) - 1 \right] n^6 \\
&+ \left[ \left( n_x^2 + n_y^2 + n_z^2 \right) - u_x^2 \left( n_y^2 + n_z^2 \right) - u_y^2 \left( n_x^2 + n_z^2 \right) - u_z^2 \left( n_x^2 + n_y^2 \right) \right] n^4 \\
&- \left[ \left( n_x^2 n_y^2 + n_x^2 n_z^2 + n_y^2 n_z^2 \right) - u_x^2 n_y^2 n_z^2 - u_y^2 n_x^2 n_z^2 - u_z^2 n_x^2 n_y^2 \right] n^2 + n_x^2 n_y^2 n_z^2
\end{aligned}
\]
P5.2 Suppose you have a crystal with $n_x = 1.5$, $n_y = 1.6$, and $n_z = 2.0$. Use
Fresnel’s equation to determine what the two indices of refraction are
for a k-vector in the crystal along the $\hat{\mathbf{u}} = (\hat{\mathbf{x}} + 2\hat{\mathbf{y}} + 3\hat{\mathbf{z}})/\sqrt{14}$ direction.
P5.3 Given that the optic axes are in the x-z plane, show that the directions
of the optic axes are given by (5.25).
HINT: The two indices are the same when B 2 − 4AC = 0. You will want
to use polar coordinates for the direction unit vector, as in (5.24). Set
φ = 0 so you are in the x-z plane. Use sin2 θ + cos2 θ = 1 to get an
equation that only has cosine terms and solve for cos2 θ.
P5.4 Use definitions (5.26) and (5.27) along with the spherical representa-
tion of û (5.24) in Fresnel’s equation (5.20) to calculate the two values
for the index in a uniaxial crystal (i.e. (5.28) and (5.29)).
HINT: First show that
$$ A = n_o^2\sin^2\theta + n_e^2\cos^2\theta $$
$$ B = n_o^2 n_e^2 + n_o^4\sin^2\theta + n_e^2 n_o^2\cos^2\theta $$
$$ C = n_o^4 n_e^2 $$
P5.6 Suppose you have a quartz plate (a uniaxial crystal) with its optic axis
oriented perpendicular to the surfaces. The indices of refraction for
quartz are n o = 1.54424 and n e = 1.55335. A plane wave with wave-
length λvac = 633 nm passes through the plate. After emerging from
the crystal, there is a phase difference ∆ between the two polarization
components of the plane wave, and this phase difference depends on
incident angle θi . Use a computer to plot ∆ as a function of incident
angle from zero to 90◦ for a plate with thickness d = 0.96 mm .
HINT: For s-polarized light, show that the number of wavelengths that
fit in the plate is $\dfrac{d}{(\lambda_{\rm vac}/n_o)\cos\theta_t^{(s)}}$. For p-polarized light, show that the
number of wavelengths that fit in the plate and the extra leg δ outside
of the plate (see Fig. 5.9) is $\dfrac{d}{(\lambda_{\rm vac}/n_p)\cos\theta_t^{(p)}} + \dfrac{\delta}{\lambda_{\rm vac}}$, where
$$ \delta = d\left[\tan\theta_t^{(s)} - \tan\theta_t^{(p)}\right]\sin\theta_i $$
Figure 5.9 Diagram for P5.6.
L5.7 In the laboratory, send a HeNe laser (λvac = 633 nm) through two
crossed polarizers, oriented at 45° and 135°. Place the quartz plate
described in P5.6 between the polarizers on a rotation stage. Now
equal amounts of s- and p-polarized light strike the crystal as it is
rotated from normal incidence.
(Figure labels: Laser, Dim spots, Bright spots)
If the phase shift between the two paths discussed in P5.6 is an odd
integer times π, the polarization direction of the light transmitted
through the crystal is rotated by 90°, and the maximum transmission
through the second polarizer results. (In this configuration, the crystal
acts as a half wave plate, which we discuss in Chapter 6.) If the phase
shift is an even integer times π, then the polarization is rotated by 180°
and minimum transmission through the second polarizer results. Plot
these measured maximum and minimum points on your computer-
generated graph of the previous problem.
Figure 5.10 Plot for P5.6 and L5.7.
P5.10 Show that the electric field for extraordinary polarized light Ee (û) in a
uniaxial crystal is not perpendicular to k (i.e. û), but that it is perpen-
dicular to the ordinary polarization component Eo (û).
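The polynomial in the hint for P5.1 lends itself to a short numerical sketch. Because û is a unit vector, the n⁶ term vanishes and the remaining expression is a quadratic in n² with coefficients A, B, and C read off from the square brackets. The code below assumes that Fresnel's equation (5.19) has the form u_x²/(n² − n_x²) + u_y²/(n² − n_y²) + u_z²/(n² − n_z²) = 1/n², which is the form that reproduces the hint polynomial when cleared of denominators; the function names are hypothetical, and the numbers are those quoted in P5.2.

```python
import math

def fresnel_indices(u, n_xyz):
    """Two roots n of the quadratic A n^4 - B n^2 + C = 0 from the P5.1 hint."""
    ux2, uy2, uz2 = (c * c for c in u)
    nx2, ny2, nz2 = (n * n for n in n_xyz)
    # Coefficients copied from the bracketed terms of the hint polynomial.
    A = (nx2 + ny2 + nz2) - ux2 * (ny2 + nz2) - uy2 * (nx2 + nz2) - uz2 * (nx2 + ny2)
    B = (nx2 * ny2 + nx2 * nz2 + ny2 * nz2) - ux2 * ny2 * nz2 - uy2 * nx2 * nz2 - uz2 * nx2 * ny2
    C = nx2 * ny2 * nz2
    root = math.sqrt(B * B - 4.0 * A * C)
    return math.sqrt((B - root) / (2.0 * A)), math.sqrt((B + root) / (2.0 * A))

def fresnel_residual(n, u, n_xyz):
    """Left side minus right side of the assumed form of Fresnel's equation."""
    return sum(c * c / (n * n - ni * ni) for c, ni in zip(u, n_xyz)) - 1.0 / (n * n)

# Direction and principal indices quoted in P5.2.
norm = math.sqrt(14.0)
u = (1.0 / norm, 2.0 / norm, 3.0 / norm)
n_xyz = (1.5, 1.6, 2.0)

n_low, n_high = fresnel_indices(u, n_xyz)
print(n_low, n_high)
print(fresnel_residual(n_low, u, n_xyz), fresnel_residual(n_high, u, n_xyz))
```

Printing the residuals is a quick way to confirm that both roots satisfy the original equation and not just the manipulated polynomial.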
Review, Chapters 1–5
To prepare for an exam, you should understand the following questions and
problems thoroughly enough to be able to work them without referring back to
previous chapters.
R4 T or F: The real part of the refractive index cannot be less than one.
R8 T or F: The critical angle for total internal reflection exists on both sides
of a material interface.
R10 T or F: From any given location beneath a (smooth flat) surface of water,
it is possible to see objects positioned anywhere above the water.
R13 T or F: For incident angles beyond the critical angle for total internal
reflection, the Fresnel coefficients t s and t p are both zero.
R15 T or F: For a given incident angle and value of n, there is only one
single-layer coating thickness d that will minimize reflections.
R17 T or F: As light enters a crystal, the Poynting vector always obeys Snell’s
law.
R18 T or F: As light enters a crystal, the k-vector does not obey Snell’s law
for the extraordinary wave.
Problems
(a) Find what each of these equations reduces to when θi = 0. Give your
answer in terms of n i and n t .
(b) What percent of light (intensity) reflects from a glass surface (n =
1.5) when light enters from air (n = 1) at normal incidence?
(c) What percent of light reflects from a glass surface when light exits
into air at normal incidence?
R22 Light goes through a glass prism with optical index n = 1.55. The light
enters at Brewster’s angle and exits at normal incidence as shown in
Fig. 5.13.
(a) Derive and calculate Brewster’s angle θB. You may use the results of
R20 (c).
Figure 5.13
(b) Calculate φ.
(c) What percent of the light (power) goes all the way through the prism
if it is p-polarized? You may use the Fresnel coefficients given in R21.
(d) What percent for s-polarized light?
R23 A 45◦ - 90◦ - 45◦ prism is a good device for reflecting a beam of light
parallel to the initial beam (see Fig. 5.14). The exiting beam will be
parallel to the entering beam even when the incoming beam is not
normal to the front surface (although it needs to be in the plane of the
drawing).
(a) How large an angle θ can be tolerated before there is no longer total
internal reflection at both interior surfaces? Assume n = 1 outside of
the prism and n = 1.5 inside.
(b) If the light enters and leaves the prism at normal incidence, what
will the difference in phase be between the s and p-polarizations? You
may use the Fresnel coefficients given in R21.
Figure 5.14
R24 A thin glass plate with index n = 1.5 is oriented at Brewster’s angle so
that p-polarized light with wavelength λvac = 500 nm goes through
with 100% transmittance.
(a) What is the minimum thickness that will make the reflection of
s-polarized light be maximum?
(b) What is the total transmittance $T_s^{\rm tot}$ for this thickness assuming
s-polarized light?
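$$ \Delta\lambda_{\rm FSR} = \frac{\lambda^2}{2nd\cos\theta} $$
$$ \Delta\lambda_{\rm FWHM} = \frac{\lambda^2}{\pi\sqrt{F}\,nd\cos\theta} $$
where $F \equiv \dfrac{4R}{(1-R)^2}$.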
(c) Derive the reflecting finesse f = ∆λFSR /∆λFWHM .
R26 For a Fabry-Perot etalon, let R = 0.90, λvac = 500 nm, n = 1, and d =
5.0 mm.
(a) Suppose that a maximum transmittance occurs at the angle θ = 0.
What is the nearest angle where the transmittance will be half of the
maximum transmittance? You may assume that $\cos\theta \cong 1 - \theta^2/2$.
(b) You desire to use a Fabry-Perot etalon to view the light from a large
diffuse source rather than a point source. Draw a diagram depicting
where lenses should be placed, indicating relevant distances. Explain
briefly how it works.
R27 You need to make an antireflective coating for a glass lens designed to
work at normal incidence.
The matrix equation relating the incident field to the reflected and
transmitted fields (at normal incidence) is
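$$ \begin{bmatrix} 1 \\ n_0 \end{bmatrix} + \begin{bmatrix} 1 \\ -n_0 \end{bmatrix}\frac{E_0^{\leftarrow}}{E_0^{\rightarrow}} = \begin{bmatrix} \cos k_1\ell & \dfrac{-i}{n_1}\sin k_1\ell \\ -i n_1\sin k_1\ell & \cos k_1\ell \end{bmatrix}\begin{bmatrix} 1 \\ n_2 \end{bmatrix}\frac{E_2^{\rightarrow}}{E_0^{\rightarrow}} $$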
where θ is the angle made with the optic axis. At the frequency of a
ruby laser, KDP has indices n o (ω) = 1.505 and n e (ω) = 1.465. At the
frequency of the second harmonic, the indices are n o (2ω) = 1.534 and
n e (2ω) = 1.487.
Selected Answers
R28: 51.12◦ .
Chapter 6
Polarization of Light
When the direction of the electric field of light oscillates in a regular, predictable
fashion, we say that the light is polarized. Polarization describes the direction
of the oscillating electric field, a distinct concept from dipoles per volume in a
material P – also called polarization. In this chapter, we develop a formalism for
describing polarized light and the effect of devices that modify polarization. If the
electric field oscillates in a plane, we say that it is linearly polarized. The electric
field can also spiral around while a plane wave propagates, and this is called
elliptical polarization. There is a convenient way for keeping track of polarization
using a two-dimensional Jones vector.
Many devices, such as polarizers and wave plates, can affect polarization. Their
effects on a light field can be represented by 2 × 2 Jones matrices that operate on
the Jones vector representing the light. A Jones matrix can describe, for example,
a linear polarizer oriented at an arbitrary angle with respect to the coordinate
system. Likewise, a Jones matrix can describe the manner in which a wave plate
introduces a relative phase between two components of the electric field. A wave
plate can be used to convert, for example, linearly polarized light into circularly
polarized light.
Figure 6.1 Animation showing different polarization states of light.
In this chapter, we will also see how reflection and transmission at a material
interface influence field polarization. The Fresnel coefficients studied in chapters
3 and 4 can be conveniently incorporated into the 2×2 matrix formulation for
handling polarization. As we saw previously, the amount of light reflected from a
surface depends on the type of polarization, s or p. In addition, upon reflection,
s-polarized light can acquire a phase lag or phase advance relative to p-polarized
light. This is especially true at metal surfaces, which have complex indices of
refraction. Ellipsometry, outlined in appendix 6.A, is the science of characterizing
optical properties of materials through an examination of these effects.
Throughout this chapter, we consider light to have well characterized polar-
ization. However, most common sources of light (e.g. sunlight or a light bulb)
have an electric-field direction that varies rapidly and randomly. Such sources
are commonly referred to as unpolarized. It is common to have a mixture of
unpolarized and polarized light, called partially polarized light. The Jones vector
$$ \mathbf{E}(z,t) = \left(E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}}\right)e^{i(kz-\omega t)} \tag{6.2} $$
As always, only the real part of (6.2) is physically relevant. The complex amplitudes
of E x and E y keep track of the phase of the oscillating field components. In
general the complex phases of E x and E y can differ, so that the wave in one of the
dimensions lags or leads the wave in the other dimension.
The relationship between E x and E y describes the polarization of the light.
For example, if E y is zero, the plane wave is said to be linearly polarized along the
x-dimension. Linearly polarized light can have any orientation in the x–y plane,
and it occurs whenever E x and E y have the same complex phase (or a phase
differing by an integer times π). For our purposes, we will take the x-dimension
to be horizontal and the y-dimension to be vertical unless otherwise noted.
As an example, suppose E y = i E x , where E x is real. The y-component of the
field is then out of phase with the x-component by the factor i = e i π/2 . Taking the
real part of the field (6.2) we get
$$ \begin{aligned} \mathbf{E}(z,t) &= {\rm Re}\left[E_x e^{i(kz-\omega t)}\right]\hat{\mathbf{x}} + {\rm Re}\left[e^{i\pi/2}E_x e^{i(kz-\omega t)}\right]\hat{\mathbf{y}} \\ &= E_x\cos(kz-\omega t)\,\hat{\mathbf{x}} + E_x\cos(kz-\omega t+\pi/2)\,\hat{\mathbf{y}} \qquad \text{(left circular)} \\ &= E_x\left[\cos(kz-\omega t)\,\hat{\mathbf{x}} - \sin(kz-\omega t)\,\hat{\mathbf{y}}\right] \end{aligned} \tag{6.3} $$
In this example, the field in the y-dimension lags behind the field in the x-dimension
by a quarter cycle. That is, the behavior seen in the x-dimension happens in the
y-dimension a quarter cycle later. The field never goes to zero simultaneously in both
dimensions. In fact, in this example the strength of the electric field is constant, and
it rotates in a circular pattern in the x-y dimensions. For this reason, this type of
field is called circularly polarized. Figure 6.2 graphically shows the two linear
polarized pieces in (6.3) adding to make circularly polarized light.
Figure 6.2 The combination of two orthogonally polarized plane waves that are out of
phase results in elliptically polarized light. Here we have left circularly polarized
light created as specified by (6.3).
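The claim that the field strength stays constant while the field direction rotates can be checked with a few lines of code. This is a minimal sketch that evaluates the real part of (6.2) for E_y = iE_x at the fixed plane z = 0 over one period; the amplitude of 1 and the twelve sample times are arbitrary choices, and `real_field` is a hypothetical helper name.

```python
import cmath
import math

def real_field(E_x, E_y, phase):
    """Real part of (E_x, E_y) * exp(i*phase), where phase stands for kz - wt."""
    factor = cmath.exp(1j * phase)
    return (E_x * factor).real, (E_y * factor).real

E_x = 1.0         # real amplitude in arbitrary units (assumed value)
E_y = 1j * E_x    # the example's choice E_y = i E_x

# Watch the field at the fixed plane z = 0 over one period, so phase = -wt.
omega_t = [2.0 * math.pi * k / 12 for k in range(12)]
samples = [real_field(E_x, E_y, -wt) for wt in omega_t]
magnitudes = [math.hypot(ex, ey) for ex, ey in samples]

for wt, (ex, ey), mag in zip(omega_t, samples, magnitudes):
    print(f"wt = {wt:5.3f}   E = ({ex:+.3f}, {ey:+.3f})   |E| = {mag:.3f}")
```

At z = 0 the components come out as (cos ωt, sin ωt), so the tip of the field vector moves around a circle of fixed radius as time advances.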
6.2 Jones Vectors for Representing Polarization
1941-1956. He also contributed greatly to the development of infrared detectors.
He was an avid train enthusiast, and even wrote papers on railway engineering.
See J. Opt. Soc. Am. 63, 519-522 (1972). Also see SPIE oemagazine, p. 52 (Aug. 2004).
1 E. Hecht, Optics, 3rd ed., Sect. 8.12.2 (Massachusetts: Addison-Wesley, 1998).
$$ E_{\rm eff} \equiv \sqrt{|E_x|^2 + |E_y|^2}\;e^{i\phi_x} \tag{6.6} $$
$$ A \equiv \frac{|E_x|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{6.7} $$
$$ B \equiv \frac{|E_y|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{6.8} $$
$$ \delta \equiv \phi_y - \phi_x \tag{6.9} $$
Please notice that A and B are real non-negative dimensionless numbers that
satisfy $A^2 + B^2 = 1$. If $E_y$ is zero, then B = 0 and everything is well-defined. On the
other hand, if $E_x$ happens to be zero, then its phase $e^{i\phi_x}$ is indeterminate. In this
case we let $E_{\rm eff} = |E_y|e^{i\phi_y}$, B = 1, and δ = 0.
The overall field strength $E_{\rm eff}$ is often unimportant in a discussion of polarization.
It represents the strength of an effective linearly polarized field that would
correspond to the same intensity as (6.4). Specifically, from (6.5) and (2.62) we
have
$$ I = \langle S\rangle_t = \frac{1}{2}nc\epsilon_0\,\mathbf{E}\cdot\mathbf{E}^* = \frac{1}{2}nc\epsilon_0\,|E_{\rm eff}|^2 \tag{6.10} $$
The phase of $E_{\rm eff}$ represents an overall phase shift that one can trivially adjust by
physically moving the light source (a laser, say) forward or backward by a fraction
of a wavelength.
The portion of (6.5) that is relevant to our discussion of polarization is the
vector $A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}}$, referred to as the Jones vector. This vector contains the essential
information regarding field polarization. Notice that the Jones vector is a kind
of unit vector, in that $(A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}})\cdot(A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}})^* = 1$. (The asterisk represents
the complex conjugate.) When writing a Jones vector we dispense with the x̂ and
ŷ notation and organize the components into a column vector (for later use in
matrix algebra) as follows:
$$ \begin{bmatrix} A \\ Be^{i\delta} \end{bmatrix} \tag{6.11} $$
This vector can describe the polarization state of any plane wave field. Table 6.1
lists some Jones vectors representing various polarization states.
Table 6.1 Jones Vectors for several common polarization states.
Linearly polarized along x: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$
Linearly polarized along y: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$
Linearly polarized at angle α (measured from the x-axis): $\begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$
Right circularly polarized: $\dfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -i \end{bmatrix}$
Left circularly polarized: $\dfrac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ i \end{bmatrix}$
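The definitions (6.6) through (6.9) translate directly into code. The following sketch splits a pair of complex field amplitudes into E_eff and the Jones parameters A, B, and δ, including the special case E_x = 0 described in the text. The function names are hypothetical, and setting δ = 0 when E_y = 0 is a choice made here for definiteness (the text only notes that B = 0 in that case).

```python
import cmath
import math

def jones_from_field(E_x, E_y):
    """Return (E_eff, A, B, delta) following (6.6)-(6.9)."""
    if E_x == 0 and E_y == 0:
        raise ValueError("no field: the polarization is undefined")
    if E_x == 0:
        # The phase of E_x is indeterminate; use the convention from the text.
        return complex(E_y), 0.0, 1.0, 0.0
    norm = math.hypot(abs(E_x), abs(E_y))
    phi_x = cmath.phase(E_x)
    E_eff = norm * cmath.exp(1j * phi_x)
    A = abs(E_x) / norm
    B = abs(E_y) / norm
    # delta is irrelevant when B = 0; zero is an arbitrary choice made here.
    delta = cmath.phase(E_y) - phi_x if E_y != 0 else 0.0
    return E_eff, A, B, delta

def jones_vector(A, B, delta):
    """Column vector (6.11) as a two-element list."""
    return [A, B * cmath.exp(1j * delta)]

# Example: E_y = i E_x, the left-circular case of (6.3), with an assumed amplitude of 2.
E_eff, A, B, delta = jones_from_field(2.0, 2.0j)
vec = jones_vector(A, B, delta)
print(E_eff, A, B, delta, vec)
```

Multiplying the Jones vector by E_eff recovers the original pair of amplitudes, which is a convenient self-check when building these by hand.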
6.3 Elliptically Polarized Light
In general, the Jones vector (6.11) represents a polarization state between linear
and circular. This ‘between’ state is known as elliptically polarized light. As
the wave travels, the field vector makes a spiral motion. If we observe the field
vector at a point as the field goes by, the field vector traces out an ellipse oriented
perpendicular to the direction of travel (i.e. in the x–y plane). One of the axes of
the ellipse occurs at the angle
$$ \alpha = \frac{1}{2}\tan^{-1}\left(\frac{2AB\cos\delta}{A^2 - B^2}\right) \tag{6.12} $$
with respect to the x-axis (see P6.8). This angle sometimes corresponds to the
minor axis and sometimes to the major axis of the ellipse, depending on the exact
values of A, B , and δ. The other axis of the ellipse (major or minor) then occurs at
α ± π/2 (see Fig. 6.3).
We can deduce whether (6.12) corresponds to the major or minor axis of the
ellipse by comparing the strength of the electric field when it spirals through the
direction specified by α and when it spirals through α ± π/2. The strength of the
electric field at α is given by (see P6.8)
$$ E_\alpha = |E_{\rm eff}|\sqrt{A^2\cos^2\alpha + B^2\sin^2\alpha + AB\cos\delta\sin 2\alpha} \qquad (E_{\rm max}\ \text{or}\ E_{\rm min}) \tag{6.13} $$
and the strength of the field when it spirals through the orthogonal direction
(α ± π/2) is given by
$$ E_{\alpha\pm\pi/2} = |E_{\rm eff}|\sqrt{A^2\sin^2\alpha + B^2\cos^2\alpha - AB\cos\delta\sin 2\alpha} \qquad (E_{\rm max}\ \text{or}\ E_{\rm min}) \tag{6.14} $$
After computing (6.13) and (6.14), we decide which represents E min and which
E max according to
E max ≥ E min (6.15)
We could predict in advance which of (6.13) or (6.14) corresponds to the major
axis and which corresponds to the minor axis. However, making this prediction is
as complicated as simply evaluating (6.13) and (6.14) and determining which is
greater.
Elliptically polarized light is often characterized by the ellipticity, given by the
ratio of the minor axis to the major axis:
$$ e \equiv \frac{E_{\rm min}}{E_{\rm max}} \tag{6.16} $$
The ellipticity e ranges between zero (corresponding to linearly polarized light)
and one (corresponding to circularly polarized light). Finally, the helicity or
handedness of elliptically polarized light is as follows (see P6.2):
$$ 0 < \delta < \pi \quad\rightarrow\quad \text{left-handed helicity} \tag{6.17} $$
$$ \pi < \delta < 2\pi \quad\rightarrow\quad \text{right-handed helicity} \tag{6.18} $$
Figure 6.3 The electric field of elliptically polarized light traces an ellipse in the
plane perpendicular to its propagation direction. The two plots are for different values
of A, B, and δ. The angle α can describe the major axis (top) or the minor axis
(bottom), depending on the values of these parameters.
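The recipe of this section (compute α, evaluate the field strength along α and along α ± π/2, sort them, then form the ellipticity and read off the helicity) can be sketched in a few lines. The code is an illustration, not the authors' implementation; the function name and return layout are hypothetical. It uses `math.atan2` in place of the plain inverse tangent in (6.12) so that A = B does not cause a division by zero; the two can differ by π/2 in α, which only changes which axis of the ellipse is labeled α.

```python
import math

def ellipse_parameters(A, B, delta, E_eff=1.0):
    """Axis angle, axis lengths, ellipticity, and helicity from (6.12)-(6.18)."""
    alpha = 0.5 * math.atan2(2.0 * A * B * math.cos(delta), A * A - B * B)
    cross = A * B * math.cos(delta) * math.sin(2.0 * alpha)
    # max(0, ...) guards against tiny negative round-off for linear polarization.
    E_alpha = abs(E_eff) * math.sqrt(
        max(0.0, A * A * math.cos(alpha) ** 2 + B * B * math.sin(alpha) ** 2 + cross))
    E_perp = abs(E_eff) * math.sqrt(
        max(0.0, A * A * math.sin(alpha) ** 2 + B * B * math.cos(alpha) ** 2 - cross))
    E_max, E_min = max(E_alpha, E_perp), min(E_alpha, E_perp)   # (6.15)
    ellipticity = E_min / E_max                                 # (6.16)
    d = delta % (2.0 * math.pi)
    if A == 0 or B == 0:
        helicity = "linear"
    elif 0.0 < d < math.pi:
        helicity = "left"       # (6.17)
    elif math.pi < d < 2.0 * math.pi:
        helicity = "right"      # (6.18)
    else:
        helicity = "linear"     # delta = 0 or pi
    return alpha, E_max, E_min, ellipticity, helicity

s = 1.0 / math.sqrt(2.0)
linear_30 = ellipse_parameters(math.cos(math.pi / 6), math.sin(math.pi / 6), 0.0)
left_circ = ellipse_parameters(s, s, math.pi / 2)
right_circ = ellipse_parameters(s, s, 3 * math.pi / 2)
print(linear_30, left_circ, right_circ)
```

The three test states are taken from Table 6.1: light linearly polarized at 30° should give an ellipticity near zero with α near 30°, and the two circular states should give an ellipticity near one.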
6.4 Linear Polarizers and Jones Matrices
In 1928, Edwin Land invented Polaroid at the age of nineteen. He did it by stretch-
ing a polymer sheet and infusing it with iodine. The stretching caused the polymer
chains to align along a common direction, whereupon the sheet was cemented
to a substrate. The infusion of iodine caused the individual chains to become
conductive, like microscopic wires.
When light impinges upon a Polaroid sheet, the component of electric field
that is parallel to the polymer chains causes a current Jfree to oscillate in that
dimension. The resistance to the current quickly dissipates the energy (i.e. the re-
fractive index is complex) and the light is absorbed. The thickness of the Polaroid
sheet is chosen sufficiently large to ensure that virtually none of the light with
electric field component oscillating along the chains makes it through the device.
The component of electric field that is orthogonal to the polymer chains
encounters electrons that are essentially bound to the narrow width of individual
polymer molecules. For this polarization component, the wave passes through
the material much like it does through typical dielectrics such as glass (i.e. the
refractive index is real). Today, there is a wide variety of technologies for making
polarizers, many very different from Polaroid.
Figure 6.4 Light transmitting through a Polaroid sheet. The conducting polymer chains
run vertically in this drawing, and light polarized along the chains is absorbed. Light
polarized perpendicular to the polymer chains passes through the polarizer. (Figure
labels: Arbitrary incident polarization, Transmission Axis, Transmitted polarization
component.)
A polarizer can be represented as a 2×2 matrix that operates on Jones vectors.2
The function of a polarizer is to pass only the component of electric field that
is oriented along the polarizer transmission axis. Thus, if a polarizer is oriented
with its transmission axis along the x-dimension, then only the x-component
of polarization transmits; the y-component is killed. If the polarizer is oriented
with its transmission axis along the y-dimension, then only the y-component of
the field transmits, and the x-component is killed. These two scenarios can be
represented with the following Jones matrices:
$$ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \qquad \text{(polarizer with transmission along x-axis)} \tag{6.19} $$
$$ \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{(polarizer with transmission along y-axis)} \tag{6.20} $$
These matrices operate on any Jones vector representing the polarization of
incident light. The result gives the Jones vector for the light exiting the polarizer.
Example 6.1
Use the Jones matrix (6.19) to calculate the effect of a horizontal polarizer on
light that is initially horizontally polarized, vertically polarized, and arbitrarily
polarized.
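A minimal sketch of the calculation this example asks for is given below. It multiplies the matrix (6.19) onto the three Jones vectors and also reports |A′|² + |B′|², the squared magnitude of each output vector. The particular values A = 0.6, B = 0.8, δ = π/3 for the "arbitrary" state are an assumption chosen only so that A² + B² = 1; the helper names are hypothetical.

```python
import cmath
import math

def apply(M, v):
    """Multiply a 2x2 matrix (nested lists) onto a two-component Jones vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def squared_magnitude(v):
    """|A'|^2 + |B'|^2 for a Jones vector."""
    return abs(v[0]) ** 2 + abs(v[1]) ** 2

P_X = [[1, 0], [0, 0]]          # horizontal polarizer, (6.19)

horizontal = [1, 0]
vertical = [0, 1]
A, B, delta = 0.6, 0.8, math.pi / 3          # assumed arbitrary state
arbitrary = [A, B * cmath.exp(1j * delta)]

out_h = apply(P_X, horizontal)
out_v = apply(P_X, vertical)
out_a = apply(P_X, arbitrary)

for name, out in [("horizontal", out_h), ("vertical", out_v), ("arbitrary", out_a)]:
    print(name, out, squared_magnitude(out))
```

Horizontally polarized light passes unchanged, vertically polarized light is extinguished, and the arbitrary state keeps only its x-component, so its output vector is no longer normalized to one.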
While you might readily agree that the matrices given in (6.19) and (6.20)
can be used to get the right result for light traversing a horizontal or a vertical
polarizer, you probably aren’t very impressed as of yet. In the next few sections,
we will derive Jones matrices for a number of optical elements that can modify
polarization: polarizers at arbitrary angle, wave plates at arbitrary angle, and
reflection or transmission at an interface. Table 6.2 shows Jones matrices for
each of these devices. Before deriving these specific Jones matrices, however, we
take a moment to appreciate why the Jones matrix formulation is useful.
The real power of the formalism becomes clear as we consider situations
where light encounters multiple polarization elements in sequence. In these
situations, we use a product of Jones matrices to represent the effect of the compound
systems. We can represent this situation by
$$ \begin{bmatrix} A' \\ B' \end{bmatrix} = J_{\rm system}\begin{bmatrix} A \\ Be^{i\delta} \end{bmatrix} \tag{6.21} $$
and B′ will turn out to be complex. However, if desired they can be changed into
where $J_n$ is the matrix for the nth polarizing optical element encountered in the
system. Notice that the matrices operate on the Jones vector in the order that
the light encounters the devices. Therefore, the matrix for the first device ($J_1$) is
written on the right, and so on until the last device encountered, which is written
on the left, farthest from the Jones vector.
When part of the light is absorbed by passing through one or more polarizers
in a system, the Jones vector of the exiting light does not necessarily remain
normalized to magnitude one (see Example 6.1). Since the components of a Jones
vector represent the electric field, we find the factor by which the intensity of the
light decreases by dotting the vector with its complex conjugate. In accordance
with (6.10), the intensity of the exiting light is
$$ \begin{aligned} I &= \frac{1}{2}nc\epsilon_0\,|E_{\rm eff}|^2\left(A'\hat{\mathbf{x}} + B'e^{i\delta'}\hat{\mathbf{y}}\right)\cdot\left(A'\hat{\mathbf{x}} + B'e^{i\delta'}\hat{\mathbf{y}}\right)^* \\ &= \frac{1}{2}nc\epsilon_0\,|E_{\rm eff}|^2\left(|A'|^2 + |B'|^2\right) \end{aligned} \tag{6.23} $$
Table 6.2 Common Jones Matrices. The angle θ is measured with respect to the x-axis
and specifies the transmission axis of a linear polarizer or the fast axis of a wave plate.
Linear polarizer: $\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$
Half wave plate: $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$
Quarter wave plate: $\begin{bmatrix} \cos^2\theta + i\sin^2\theta & (1-i)\sin\theta\cos\theta \\ (1-i)\sin\theta\cos\theta & \sin^2\theta + i\cos^2\theta \end{bmatrix}$
Transmission through an interface: $\begin{bmatrix} t_p & 0 \\ 0 & t_s \end{bmatrix}$
Notice that the intensity is attenuated by the factor $|A'|^2 + |B'|^2$ after propagating
through the system. Recall that $E_{\rm eff}$ represents the effective strength of the field
before it enters the polarizer (or other device), so that the initial Jones vector is
normalized to one (see (6.10)). As a reminder, we normally remove an overall
phase factor from the Jones vector so that A′ is real and non-negative, and we
choose δ′ so that B′ is real and non-negative. However, if we don’t bother doing
this, the absolute value signs on A′ and B′ in (6.23) ensure that we get the correct
value for intensity.
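The ordering rule for compound systems and the attenuation factor |A′|² + |B′|² can be illustrated with two ideal polarizers. The sketch below builds the system matrix by multiplying each new element onto the left of the running product, so the first element met by the light ends up rightmost. The polarizer matrix is the "Linear polarizer" entry of Table 6.2; the helper names and the choice of 45° and 90° polarizers acting on horizontally polarized light are assumptions for illustration.

```python
import math

def matmul(M, N):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def polarizer(theta):
    """Linear polarizer with transmission axis at angle theta (Table 6.2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c], [s * c, s * s]]

def system_matrix(elements):
    """elements are listed in the order the light meets them; returns J_N ... J_2 J_1."""
    J = [[1.0, 0.0], [0.0, 1.0]]
    for M in elements:
        J = matmul(M, J)      # each later element multiplies from the left
    return J

def attenuation(v):
    """|A'|^2 + |B'|^2, the factor by which the intensity drops."""
    return abs(v[0]) ** 2 + abs(v[1]) ** 2

horizontal = [1.0, 0.0]
p45, p90 = polarizer(math.pi / 4), polarizer(math.pi / 2)

out_45_then_90 = apply(system_matrix([p45, p90]), horizontal)
out_90_then_45 = apply(system_matrix([p90, p45]), horizontal)
print(attenuation(out_45_then_90), attenuation(out_90_then_45))
```

Swapping the two polarizers changes the result from one quarter of the incident intensity to essentially zero, which is why the order of the matrices matters.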
Let the transmission axis of the polarizer be specified by the unit vector ê1
and the absorption axis of the polarizer be specified by ê2 (orthogonal to the
transmission axis). The vector ê1 is oriented at an angle θ from the x-axis, as
shown in Fig. 6.6. We need to write the electric field components in terms of the
new basis specified by ê1 and ê2. By inspection of the geometry, the x-y unit
vectors are connected to the new coordinate system via:
Figure 6.5 Light transmitting through a polarizer oriented with transmission axis at
angle θ from x-axis.
where
$$ \begin{aligned} E_1 &\equiv E_x\cos\theta + E_y\sin\theta \\ E_2 &\equiv -E_x\sin\theta + E_y\cos\theta \end{aligned} \tag{6.27} $$
Now we introduce the effect of the polarizer on the field: E 1 is transmitted
unaffected, while E 2 is extinguished. To account for the effect of the device, we
multiply E 2 by a parameter ξ. In the case of the polarizer, ξ is zero, but when we
consider wave plates we will use other values for ξ. After traversing the polarizer,
the field becomes
$$ \mathbf{E}_{\rm after}(z,t) = \left(E_1\hat{\mathbf{e}}_1 + \xi E_2\hat{\mathbf{e}}_2\right)e^{i(kz-\omega t)} \tag{6.28} $$
Figure 6.6 Electric field components written in the ê1–ê2 basis.
We now have the field after the polarizer, but it would be nice to rewrite it in terms
of the original x–y basis. By inverting (6.25), or again by inspection of Fig. 6.6, we
see that
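$$ \begin{aligned} \hat{\mathbf{e}}_1 &= \cos\theta\,\hat{\mathbf{x}} + \sin\theta\,\hat{\mathbf{y}} \\ \hat{\mathbf{e}}_2 &= -\sin\theta\,\hat{\mathbf{x}} + \cos\theta\,\hat{\mathbf{y}} \end{aligned} \tag{6.29} $$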
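Substitution of these relationships into (6.28) together with the definitions (6.27)
for $E_1$ and $E_2$ yields
$$ \begin{aligned} \mathbf{E}_{\rm after}(z,t) ={}& \left[E_x\left(\cos^2\theta + \xi\sin^2\theta\right) + E_y\left(\sin\theta\cos\theta - \xi\sin\theta\cos\theta\right)\right]e^{i(kz-\omega t)}\,\hat{\mathbf{x}} \\ &+ \left[E_x\left(\sin\theta\cos\theta - \xi\sin\theta\cos\theta\right) + E_y\left(\sin^2\theta + \xi\cos^2\theta\right)\right]e^{i(kz-\omega t)}\,\hat{\mathbf{y}} \end{aligned} \tag{6.30} $$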
Notice that if ξ = 1 (i.e. no polarizer), then we get back exactly what we started
with (i.e. (6.30) reduces to (6.24)).
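To get to the Jones matrix for the polarizer, we note that (6.30) is a linear mixture
of $E_x$ and $E_y$ which can be represented with matrix algebra. If we represent
the electric field as a two-dimensional column vector with its x component in the
top and its y component in the bottom (like a Jones vector), then we can rewrite
(6.30) as
$$ \mathbf{E}_{\rm after}(z,t) = \begin{bmatrix} \cos^2\theta + \xi\sin^2\theta & \sin\theta\cos\theta - \xi\sin\theta\cos\theta \\ \sin\theta\cos\theta - \xi\sin\theta\cos\theta & \sin^2\theta + \xi\cos^2\theta \end{bmatrix}\begin{bmatrix} E_x \\ E_y \end{bmatrix}e^{i(kz-\omega t)} \tag{6.31} $$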
The matrix here is a properly normalized Jones matrix, even though we did not
bother factoring out E eff to make a properly normalized Jones vector, as specified
in (6.5). We can now write down the Jones matrix for a polarizer by inserting ξ = 0
into the matrix:
$$ \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} \qquad \text{(polarizer with transmission axis at angle θ)} \tag{6.32} $$
Notice that when θ = 0 this matrix reduces to that of a horizontal polarizer (6.19),
and when θ = π/2, it reduces to that of a vertical polarizer (6.20).
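The derivation above can be cross-checked numerically. The sketch below computes the field after the device in two ways: directly in the ê1–ê2 basis using (6.27), (6.28), and (6.29), and with a 2×2 matrix whose entries are worked out here by collecting the E_x and E_y terms of that substitution. The two should agree for any θ and ξ, and setting ξ = 0 should reproduce (6.19), (6.20), and (6.32). The function names and the test amplitudes are hypothetical.

```python
import math

def general_matrix(theta, xi):
    """2x2 matrix for a device that multiplies the e2 component by xi."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c + xi * s * s, (1 - xi) * s * c],
            [(1 - xi) * s * c, s * s + xi * c * c]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def direct_basis_change(theta, xi, E_x, E_y):
    """Same calculation done step by step with (6.27), (6.28), and (6.29)."""
    c, s = math.cos(theta), math.sin(theta)
    E_1 = E_x * c + E_y * s            # component along the transmission axis
    E_2 = -E_x * s + E_y * c           # component along the orthogonal axis
    # Rebuild x and y components from e1 = (c, s) and e2 = (-s, c).
    return [E_1 * c - xi * E_2 * s, E_1 * s + xi * E_2 * c]

# Arbitrary complex test amplitudes and device parameters (assumed values).
E_x, E_y = 0.3 + 0.4j, -0.2 + 0.9j
theta, xi = 0.7, 0.5 - 0.25j

via_matrix = apply(general_matrix(theta, xi), [E_x, E_y])
via_basis = direct_basis_change(theta, xi, E_x, E_y)
print(via_matrix, via_basis)
print(general_matrix(0.0, 0.0), general_matrix(math.pi / 2, 0.0))
```

Keeping ξ as a parameter means the same function covers the polarizer (ξ = 0), the no-device case (ξ = 1), and any other factor applied to the ê2 component.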
6.6 Jones Matrices for Wave Plates
When a plane wave passes through a wave plate, the component of the electric
field oriented along the fast axis travels faster than its orthogonal counterpart,
which introduces a relative phase between the two polarization components.
As light passes through a wave plate of thickness d , the phase difference that
accumulates between the fast and the slow polarization components is
$$ k_{\rm slow}d - k_{\rm fast}d = \frac{2\pi d}{\lambda_{\rm vac}}\left(n_{\rm slow} - n_{\rm fast}\right) \tag{6.33} $$
By adjusting the thickness of the wave plate, one can introduce any desired phase
difference.
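To get a feel for the thicknesses involved, (6.33) can be inverted for d. The sketch below does this for a quarter-cycle phase difference using the quartz indices and HeNe wavelength quoted in P5.6. Treating n_o as the fast index and n_e as the slow index for light traveling perpendicular to the optic axis is an assumption of this illustration, and the function names are hypothetical.

```python
import math

def phase_difference(d, n_slow, n_fast, lambda_vac):
    """Phase accumulated between slow and fast components, equation (6.33)."""
    return 2.0 * math.pi * d * (n_slow - n_fast) / lambda_vac

def thickness_for_phase(phase, n_slow, n_fast, lambda_vac):
    """Invert (6.33) for the plate thickness d."""
    return phase * lambda_vac / (2.0 * math.pi * (n_slow - n_fast))

# Values quoted in P5.6 (quartz, HeNe laser); the fast/slow assignment is assumed.
n_fast, n_slow = 1.54424, 1.55335
lambda_vac = 633e-9   # meters

d_quarter = thickness_for_phase(math.pi / 2, n_slow, n_fast, lambda_vac)
print(f"quarter-cycle thickness: {d_quarter * 1e6:.2f} micrometers")
print(phase_difference(d_quarter, n_slow, n_fast, lambda_vac))
```

Adding any whole number of wavelengths of path difference gives the same relative phase, so thicker plates with the same effect are also possible.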
The most common types of wave plates are the quarter-wave plate and the
half-wave plate. The quarter-wave plate introduces a phase difference of
This describes a relative phase delay for the light emerging with polarization along
the slow axis. Substituting (6.36) into (6.30) yields the Jones matrix for a quarter
wave plate:
For the half-wave plate, the appropriate factor applied to the slow axis is
Remember that θ refers to the angle that the fast axis makes with respect to the
x-axis.
Before moving on, consider the following two examples that illustrate how
wave plates are often used:
Example 6.2
Calculate the Jones matrix for a quarter-wave plate at θ = 45◦ , and determine its
effect on horizontally polarized light.
Solution: At θ = 45°, the Jones matrix for the quarter-wave plate (6.37) reduces to
$$ \frac{e^{i\pi/4}}{\sqrt{2}}\begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \qquad \text{(quarter-wave plate, fast axis at θ = 45°)} \tag{6.40} $$
The overall phase factor $e^{i\pi/4}$ in front is not important since it merely accompanies
the overall phase of the beam, which can be adjusted arbitrarily by moving the
light source forwards or backwards through a fraction of a wavelength.
Now we calculate the effect of the quarter-wave plate (oriented at θ = 45°) operating
on horizontally polarized light:
$$ \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -i \end{bmatrix} \tag{6.41} $$
Figure 6.8 Animation showing effects of polarizers and wave plates on polarized light.
The previous example shows that a quarter-wave plate (properly oriented) can
turn linearly polarized light into right-circularly polarized light (see Table 6.1).
On the other hand, as seen in the next example, a half-wave plate can rotate the
polarization angle of linearly polarized light by varying degrees while preserving
the linear polarization.
Example 6.3
Calculate the effect of a half wave plate at an arbitrary θ on horizontally polarized
light.
The resulting Jones vector describes linearly polarized light at an angle of α = 2θ from
the x-axis (see Table 6.1).
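Both examples can be reproduced with a few lines of code. The sketch below applies a quarter-wave plate at 45° and a half-wave plate at an arbitrary angle to horizontally polarized light. The half-wave matrix is the Table 6.2 entry; the quarter-wave matrix is written in the general-angle form that reduces to (6.40) at θ = 45°, which is an assumption about the form of (6.37). The helper names and the angle θ = 0.3 rad are hypothetical.

```python
import cmath
import math

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def quarter_wave_plate(theta):
    """Quarter-wave plate with fast axis at theta (assumed general-angle form)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c + 1j * s * s, (1 - 1j) * s * c],
            [(1 - 1j) * s * c, s * s + 1j * c * c]]

def half_wave_plate(theta):
    """Half-wave plate with fast axis at theta (Table 6.2)."""
    return [[math.cos(2 * theta), math.sin(2 * theta)],
            [math.sin(2 * theta), -math.cos(2 * theta)]]

def remove_overall_phase(v):
    """Rotate the phase so the first nonzero component is real and positive."""
    ref = v[0] if v[0] != 0 else v[1]
    factor = cmath.exp(-1j * cmath.phase(ref))
    return [component * factor for component in v]

horizontal = [1.0, 0.0]

# Example 6.2: quarter-wave plate at 45 degrees.
after_qwp = remove_overall_phase(apply(quarter_wave_plate(math.pi / 4), horizontal))

# Example 6.3: half-wave plate at an arbitrary angle (0.3 rad chosen here).
theta = 0.3
after_hwp = apply(half_wave_plate(theta), horizontal)

print(after_qwp)
print(after_hwp, [math.cos(2 * theta), math.sin(2 * theta)])
```

After the overall phase is removed, the quarter-wave result matches the right-circular entry of Table 6.1, and the half-wave result has components (cos 2θ, sin 2θ).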
multiplies the horizontal component of the field, and r s multiplies the vertical
component of the field. Especially in the case of reflection from an absorbing
surface such as a metal, the phases of the two polarization components can vary
markedly (see P6.11). Thus, linearly polarized light containing both s- and p-
components in general becomes elliptically polarized when reflected from such a
surface. When light undergoes total internal reflection, again the phases of the s-
and p-components differ markedly, which can cause linearly polarized light to
become elliptically polarized (see P6.12).
Transmission through a material interface can also influence the polarization
of the field, although typically to a lesser degree. However, there is no handedness
inversion, since the light continues on in a forward sense. The Jones matrix for
transmission is
$$ \begin{bmatrix} t_p & 0 \\ 0 & t_s \end{bmatrix} \qquad \text{(Jones matrix for transmission)} \tag{6.44} $$
here. The rotation can be accomplished by multiplying the following matrix onto
the incident Jones vector:
$$ \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \qquad \text{(rotation of coordinates through an angle θ)} \tag{6.45} $$
This is understood as a rotation about the z-axis. The angle of rotation θ is
chosen such that the rotated x-axis lies in the plane of incidence for the mirror.
When such a reorientation of coordinates is necessary, the two orthogonal field
components in the initial coordinate system are stirred together to form the field
components in the new system. This does not change the intrinsic characteristics
of the polarization, just its representation.
Figure 6.10 If the plane of incidence does not coincide for successive elements in an
optical system, a rotation matrix must be applied to rotate the x-axis to the plane of
incidence before computing the effect of each element.
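A short numerical check makes the "representation only" point concrete. In the sketch below, the rotation matrix (6.45) turns light linearly polarized at angle α into light at angle α − θ in the new coordinates, leaves the magnitude of a circular Jones vector unchanged, and, when wrapped around the x-axis polarizer (6.19), reproduces the angled polarizer (6.32). Helper names and the angle values are assumptions chosen for illustration.

```python
import math

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def rotation(theta):
    """Rotation of coordinates through an angle theta, equation (6.45)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def polarizer(theta):
    """Polarizer with transmission axis at theta, equation (6.32)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c], [s * c, s * s]]

alpha, theta = 0.9, 0.35     # arbitrary angles in radians (assumed values)

# Linear light at alpha looks like linear light at alpha - theta in the rotated frame.
rotated_linear = apply(rotation(theta), [math.cos(alpha), math.sin(alpha)])

# The magnitude of a circular Jones vector is unchanged by the rotation.
r = 1.0 / math.sqrt(2.0)
rotated_circular = apply(rotation(theta), [r, 1j * r])
norm_circular = abs(rotated_circular[0]) ** 2 + abs(rotated_circular[1]) ** 2

# Rotate into the polarizer frame, apply (6.19), and rotate back.
P_X = [[1.0, 0.0], [0.0, 0.0]]
sandwiched = matmul(rotation(-theta), matmul(P_X, rotation(theta)))

print(rotated_linear, norm_circular)
print(sandwiched, polarizer(theta))
```

The last comparison suggests a general pattern: the matrix of any element at angle θ can be obtained from its θ = 0 form by rotating in, applying the element, and rotating back.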
do not try to extract the helicity of the light, but only the ellipticity. In this case
only polarizers are needed, which can be made to work over a wide range of
wavelengths. If, in addition, a variety of incident angles are measured, it is possible
to extract detailed information about the optical constants n and κ and the
thicknesses of possibly many layers of materials influencing the reflection.
Commercial ellipsometers4 typically employ two polarizers, one before and
one after the sample, where s and p-polarized reflections take place. The first
polarizer ensures that linearly polarized light arrives at the test surface (polarized
at angle α to give both s and p-components). The Jones matrix for the test surface
reflection is given by (6.43), and the Jones matrix for the analyzing polarizer
oriented at angle θ is given by (6.32). The Jones vector for the light arriving at the
detector is then
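$$ \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}\begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix}\begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix} = \begin{bmatrix} -r_p\cos\alpha\cos^2\theta + r_s\sin\alpha\sin\theta\cos\theta \\ -r_p\cos\alpha\sin\theta\cos\theta + r_s\sin\alpha\sin^2\theta \end{bmatrix} \tag{6.46} $$
and the intensity arriving at the detector is
$$ \begin{aligned} I &\propto \left|-r_p\cos\alpha\cos^2\theta + r_s\sin\alpha\cos\theta\sin\theta\right|^2 + \left|-r_p\cos\alpha\cos\theta\sin\theta + r_s\sin\alpha\sin^2\theta\right|^2 \\ &= \left|r_p\right|^2\cos^2\alpha\cos^2\theta + \left|r_s\right|^2\sin^2\alpha\sin^2\theta - \frac{r_p r_s^* + r_s r_p^*}{4}\sin 2\alpha\sin 2\theta \end{aligned} \tag{6.47} $$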
For ellipsometry measurements, it is customary to express the ratio of Fresnel
coefficients as
$$ r_p/r_s \equiv \tan\Psi\,e^{i\Delta} \tag{6.48} $$
In this case, the intensity may be shown to be proportional to (see problem P6.13)
I ∝ 1 − η sin 2θ + ξ cos 2θ (6.49)
where
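$$ \eta \equiv \frac{2\tan\Psi\cos\Delta\tan\alpha}{\tan^2\Psi + \tan^2\alpha} \qquad\text{and}\qquad \xi \equiv \frac{\tan^2\Psi - \tan^2\alpha}{\tan^2\Psi + \tan^2\alpha} \tag{6.50} $$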
In commercial ellipsometers, the angle θ of the analyzing polarizer often rotates at
a high speed, and the time dependence of the light reaching a detector is analyzed.
From this type of measurement, the coefficients η and ξ can be extracted with
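high precision. Then equations (6.50) can be inverted (see problem P6.13) to
reveal
$$ \tan\Psi = \sqrt{\frac{1+\xi}{1-\xi}}\,\left|\tan\alpha\right| \qquad\text{and}\qquad \cos\Delta = \frac{\eta}{\sqrt{1-\xi^2}}\,{\rm sign}(\alpha) \tag{6.51} $$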
From a series of these types of measurements, it is possible to extract the values
of n and κ for materials from the expressions for r s and r p (with the aid of a
computer!). A more extensive series of such measurements is needed in the case
of multilayers with varying thicknesses.
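The measurement chain described in this appendix can be mimicked in a short simulation. The sketch below picks a pair of complex reflection coefficients, computes the detector intensity from the Jones vector (6.46) as the analyzer angle θ steps through half a turn, extracts η and ξ by projecting onto sin 2θ and cos 2θ as suggested by (6.49), and then applies (6.51). The values of r_p, r_s, and α are made up for illustration and do not describe any real material; the function names and the projection scheme are assumptions of this sketch, not a description of how a commercial instrument works.

```python
import cmath
import math

def detector_intensity(theta, alpha, r_p, r_s):
    """Squared magnitude of the Jones vector on the right side of (6.46)."""
    c, s = math.cos(theta), math.sin(theta)
    top = -r_p * math.cos(alpha) * c * c + r_s * math.sin(alpha) * s * c
    bottom = -r_p * math.cos(alpha) * s * c + r_s * math.sin(alpha) * s * s
    return abs(top) ** 2 + abs(bottom) ** 2

def extract_eta_xi(alpha, r_p, r_s, n=16):
    """Project I(theta) onto 1, sin(2 theta), cos(2 theta), following (6.49)."""
    thetas = [math.pi * k / n for k in range(n)]
    values = [detector_intensity(t, alpha, r_p, r_s) for t in thetas]
    mean = sum(values) / n
    eta = -2.0 * sum(v * math.sin(2 * t) for v, t in zip(values, thetas)) / (n * mean)
    xi = 2.0 * sum(v * math.cos(2 * t) for v, t in zip(values, thetas)) / (n * mean)
    return eta, xi

def invert(eta, xi, alpha):
    """Equation (6.51): recover tan(Psi) and cos(Delta)."""
    tan_psi = math.sqrt((1 + xi) / (1 - xi)) * abs(math.tan(alpha))
    cos_delta = eta / math.sqrt(1 - xi * xi) * math.copysign(1.0, alpha)
    return tan_psi, cos_delta

# Made-up reflection coefficients and polarizer angle (assumptions).
r_p = 0.6 * cmath.exp(1.1j)
r_s = 0.9 * cmath.exp(2.6j)
alpha = math.radians(30.0)

eta, xi = extract_eta_xi(alpha, r_p, r_s)
tan_psi, cos_delta = invert(eta, xi, alpha)
ratio = r_p / r_s
print(tan_psi, abs(ratio))
print(cos_delta, math.cos(cmath.phase(ratio)))
```

Note that only cos Δ is recovered this way, so the sign of Δ remains undetermined from a single polarizer-only measurement of this kind.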
4 See Spectroscopic Ellipsometry Tutorial at J. A. Woollam Co.
6.B Partially Polarized Light
he became a professor of mathematics at Cambridge where he later worked with James Clerk Maxwell and Lord Kelvin to form the Cambridge School of Mathematical Physics. Stokes was a powerful mathematician as well as good experimentalist, often testing his theoretical solutions in the laboratory. In addition to his contributions to optics, Stokes made important contributions to

The main characteristic of unpolarized light is that it cannot be extinguished
by a single polarizer (even in combination with a wave plate). Moreover, the
transmission of unpolarized light through an ideal polarizer is always 50%. On the
other hand, polarized light (be it linearly, circularly, or elliptically polarized) can
always be represented by a Jones vector, and it is always possible to extinguish
polarized light with a wave plate and a single polarizer.
We may introduce the degree of polarization as the fraction of the intensity
that is polarized:
\[
\xi_{\rm pol} \equiv \frac{I_{\rm pol}}{I_{\rm pol} + I_{\rm un}} \tag{6.53}
\]
The degree of polarization takes on values between zero and one. Thus, if the light
is completely unpolarized (such that I pol = 0), the degree of polarization is zero,
and if the beam is fully polarized (such that I un = 0), the degree of polarization is
one.
A Stokes vector, which characterizes a partially polarized beam, is written as
\[
\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}
\]
The parameter
\[
S_0 \equiv \frac{I}{I_{\rm in}} \tag{6.54}
\]
5 E. Hecht, Optics, 3rd ed., Sect. 8.12.1 (Massachusetts: Addison-Wesley, 1998).
\[
S_1 \equiv \frac{2 I_{\rm hor}}{I_{\rm in}} - S_0 \tag{6.55}
\]
Here, I hor represents the amount of light detected if an ideal linear polarizer is
placed with its axis aligned horizontally directly in front of the detector (inserted
where the light is characterized). S_1 ranges between negative one and one, reaching
one when the light is linearly polarized horizontally and negative one when it is
linearly polarized vertically. If the light has been attenuated, it may still be perfectly horizontally
polarized even if S_1 has a magnitude less than one. (Alternatively, you might wish
to examine S_1/S_0, which is guaranteed to range between negative one and one.)
The parameter S 2 describes how much the light looks linearly polarized along
the diagonals. It is given by
\[
S_2 \equiv \frac{2 I_{45^\circ}}{I_{\rm in}} - S_0 \tag{6.56}
\]
Similar to the previous case, I 45◦ represents the amount of light detected if an
ideal linear polarizer is placed with its axis at 45◦ directly in front of the detector
(inserted where the light is characterized). As before, S 2 ranges between negative
one and one, taking on extremes when the light is linearly polarized either at 45◦
or 135◦ .
Finally, S 3 characterizes the extent to which the beam is either right or left
circularly polarized:
\[
S_3 \equiv \frac{2 I_{\text{r-cir}}}{I_{\rm in}} - S_0 \tag{6.57}
\]
Here, I r-cir represents the amount of light detected if an ideal right-circular po-
larizer is placed directly in front of the detector. A right-circular polarizer is
one that passes right-handed polarized light, but blocks left handed polarized
light. One way to construct such a polarizer is a quarter wave plate, followed
by a linear polarizer with the transmission axis aligned 45◦ from the wave-plate
fast axis, followed by another quarter wave plate at −45◦ from the polarizer (see
P6.14).6 Again, this parameter ranges between negative one and one, reaching
one for right-circular polarization and negative one for left-circular polarization.
Importantly, if any of the parameters S 1 , S 2 , or S 3 take on their extreme values
(i.e. a magnitude equal to S 0 ), the other two parameters necessarily equal zero. As
an example, if a beam is linearly horizontally polarized with I = I in , then we have
6 The final quarter wave plate is to put the light back into the original circular state – not needed
Example 6.4
Find the Stokes parameters for perfectly polarized light, represented by an arbitrary
Jones vector \(\begin{bmatrix} A \\ B \end{bmatrix}\) where A and B are complex.7 Depending on the values A and B,
Solution: The input intensity of this polarized beam is I_in = I_pol = |A|² + |B|², according
to Eq. (6.23), where we absorb the factor ½ ε₀ c |E_eff|² into |A|² and |B|²
for convenience. The Jones vector for the light that passes through a horizontal
polarizer is
\[
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}
=
\begin{bmatrix} A \\ 0 \end{bmatrix}
\]
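which gives a measured intensity of I_hor = |A|². Similarly, the Jones vector when
the beam is passed through a polarizer oriented at 45° is
\[
\frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}
=
\frac{A+B}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
leading to an intensity of
\[
I_{45^\circ} = \frac{|A+B|^2}{2} = \frac{|A|^2 + |B|^2 + A^*B + AB^*}{2}
\]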
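Finally, the Jones vector for light passing through a right-circular polarizer (see
P6.14) is
\[
\frac{1}{2}\begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}
=
\frac{A+iB}{2}\begin{bmatrix} 1 \\ -i \end{bmatrix}
\]
giving an intensity of
\[
I_{\text{r-cir}} = \frac{|A+iB|^2}{2} = \frac{|A|^2 + |B|^2 + i\,(A^*B - AB^*)}{2}
\]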
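Thus, the Stokes parameters become
\[
\begin{aligned}
S_0 &= \frac{|A|^2 + |B|^2}{I_{\rm in}} = 1 \\
S_1 &= \frac{2|A|^2}{I_{\rm in}} - \frac{|A|^2 + |B|^2}{I_{\rm in}} = \frac{|A|^2 - |B|^2}{I_{\rm in}} \\
S_2 &= \frac{|A|^2 + |B|^2 + A^*B + AB^*}{I_{\rm in}} - \frac{|A|^2 + |B|^2}{I_{\rm in}} = \frac{A^*B + AB^*}{I_{\rm in}} \\
S_3 &= \frac{|A|^2 + |B|^2 + i\,(A^*B - AB^*)}{I_{\rm in}} - \frac{|A|^2 + |B|^2}{I_{\rm in}} = i\,\frac{A^*B - AB^*}{I_{\rm in}}
\end{aligned}
\]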
\[
S_0 = \frac{I}{I_{\rm in}} = \frac{I_{\rm pol} + I_{\rm un}}{I_{\rm in}} \tag{6.58}
\]
and in the other cases the unpolarized portion of the light does not contribute to
the Stokes parameters. Half of the unpolarized light survives any of the test filters,
which cancels neatly with the unpolarized portion of S 0 in Eqs. (6.55)–(6.57).
With the aid of the results in Example 6.4, a completely general form of the
Stokes vector may then be written as
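\[
\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}
= \frac{1}{I_{\rm in}}
\begin{bmatrix} I_{\rm pol} + I_{\rm un} \\ |A|^2 - |B|^2 \\ A^*B + AB^* \\ i\,(A^*B - AB^*) \end{bmatrix}
\tag{6.59}
\]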
where the Jones vector for the polarized portion of the light is
\[
\begin{bmatrix} A \\ B \end{bmatrix}
\]
Again, we have hidden the factor ½ ε₀ c |E_eff|² for the polarized portion of the light
inside |A|² and |B|².
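We would like to express the degree of polarization in terms of the Stokes
parameters. We first note that the quantity \(\sqrt{S_1^2 + S_2^2 + S_3^2}\) can be expressed as
\[
\begin{aligned}
\sqrt{S_1^2 + S_2^2 + S_3^2}
&= \sqrt{\left(\frac{|A|^2 - |B|^2}{I_{\rm in}}\right)^2 + \left(\frac{A^*B + AB^*}{I_{\rm in}}\right)^2 + \left(\frac{i\,(A^*B - AB^*)}{I_{\rm in}}\right)^2} \\
&= \frac{|A|^2 + |B|^2}{I_{\rm in}} \\
&= \frac{I_{\rm pol}}{I_{\rm in}}
\end{aligned}
\tag{6.61}
\]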
Substituting (6.58) and (6.61) into the expression for the degree of polarization
(6.53) yields
\[
\xi_{\rm pol} \equiv \frac{1}{S_0}\sqrt{S_1^2 + S_2^2 + S_3^2} \tag{6.62}
\]
If the light is polarized such that it perfectly transmits through or is perfectly
extinguished by one of the three test polarizers associated with S 1 , S 2 , or S 3 , then
the degree of polarization will be unity. Obviously, it is possible to have pure
polarization states that are not aligned with the axes of any one of these test
polarizers. In this situation, the degree of polarization is still one, although the
values S 1 , S 2 , and S 3 may all three contribute to (6.62).
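As a concrete check of (6.59) and (6.62), the short sketch below builds a Stokes vector from a Jones vector plus an unpolarized contribution and evaluates the degree of polarization. The function names and sample amplitudes are illustrative only, and we assume an unattenuated beam so that I_in = I_pol + I_un.

```python
import math

def stokes(A, B, I_un=0.0):
    """Stokes vector (6.59) for Jones vector (A, B) plus unpolarized intensity I_un."""
    I_pol = abs(A) ** 2 + abs(B) ** 2
    I_in = I_pol + I_un                      # assumption: no attenuation, so S0 = 1
    S0 = (I_pol + I_un) / I_in
    S1 = (abs(A) ** 2 - abs(B) ** 2) / I_in
    S2 = (A.conjugate() * B + A * B.conjugate()).real / I_in
    S3 = (1j * (A.conjugate() * B - A * B.conjugate())).real / I_in
    return S0, S1, S2, S3

def degree_of_polarization(S):
    """Degree of polarization from (6.62)."""
    S0, S1, S2, S3 = S
    return math.sqrt(S1 ** 2 + S2 ** 2 + S3 ** 2) / S0

# A pure state that is not aligned with any of the three test polarizers.
pure = stokes(0.8 + 0.1j, 0.3 - 0.5j)
# Equal parts polarized and unpolarized light (hypothetical intensities).
mixed = stokes(1 + 0j, 1j, I_un=2.0)
```

Here `degree_of_polarization(pure)` comes out as one even though S_1, S_2, and S_3 are all nonzero, while the mixed beam gives one half.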
Finally, it is possible to represent polarizing devices as matrices that operate
on the Stokes vectors in much the same way that Jones matrices operate on
Jones vectors. Since Stokes vectors are four-dimensional, the matrices used are
four-by-four. These are known as Mueller matrices.8
Hans Mueller (Swiss) was a shepherd until his late teens. As a physics professor at MIT, he built on the work of Stokes and in 1943 formulated a matrix method for manipulating Stokes vectors. He was an engaging lecturer into the 1950s and was known for his exciting demonstrations. He was a student of Arnold Sommerfeld, and did seminal work on ferroelectricity (he is reported to have coined the term). See Laszlo Tisza, Adventures of a Theoretical Physicist, Part II: America, Phys. Perspect. 11, 120-168 (2009).

We know that 50% of the unpolarized light transmits through a polarizer,
ending up as polarized light with Jones vector
\[
\begin{bmatrix} A'_1 \\ B'_1 \end{bmatrix}
= \sqrt{\frac{I_{\rm un}}{2}}
\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}
\]
(see table 6.1). As usual, let θ give the angle of the transmission axis relative to the
horizontal. The Jones matrix (6.32) acts on the polarized portion of the light as
follows:
\[
\begin{bmatrix} A'_2 \\ B'_2 \end{bmatrix}
=
\begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}
= \left[A\cos\theta + B\sin\theta\right]
\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}
\]
One might be tempted to add \(\begin{bmatrix} A'_1 \\ B'_1 \end{bmatrix}\) and \(\begin{bmatrix} A'_2 \\ B'_2 \end{bmatrix}\), but this would be wrong, since
the two beams are not coherent. As mentioned previously, unpolarized light
necessarily contains multiple frequencies, and so the fields from the polarized and
unpolarized beams destructively interfere as often as they constructively interfere.
In this case, we simply add intensities rather than fields. That is, we have
\[
\begin{aligned}
|A'|^2 = |A'_1|^2 + |A'_2|^2
&= \left[\frac{I_{\rm un}}{2} + |A\cos\theta + B\sin\theta|^2\right]\cos^2\theta \\
&= \left[\frac{I_{\rm un}}{2} + |A|^2\cos^2\theta + |B|^2\sin^2\theta + \left(A^*B + AB^*\right)\sin\theta\cos\theta\right]\cos^2\theta \\
&= I_{\rm in}\left[\frac{S_0}{2} + S_1\frac{\cos 2\theta}{2} + S_2\frac{\sin 2\theta}{2}\right]\cos^2\theta
\end{aligned}
\]
Similarly,
\[
|B'|^2 = |B'_1|^2 + |B'_2|^2 = I_{\rm in}\left[\frac{S_0}{2} + S_1\frac{\cos 2\theta}{2} + S_2\frac{\sin 2\theta}{2}\right]\sin^2\theta
\]
Since the light has gone through a linear polarizer, we are guaranteed that A′ and
B′ have the same phase. Therefore, A′*B′ = A′B′* = |A′||B′|. In view of (6.59), these
results lead to
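\[
\begin{aligned}
S'_0 &= \frac{|A'|^2 + |B'|^2}{I_{\rm in}} = \frac{S_0}{2} + S_1\frac{\cos 2\theta}{2} + S_2\frac{\sin 2\theta}{2} \\
S'_1 &= \frac{|A'|^2 - |B'|^2}{I_{\rm in}} = \left[\frac{S_0}{2} + S_1\frac{\cos 2\theta}{2} + S_2\frac{\sin 2\theta}{2}\right]\left(\cos^2\theta - \sin^2\theta\right) \\
&= S_0\frac{\cos 2\theta}{2} + S_1\frac{\cos^2 2\theta}{2} + S_2\frac{\sin 4\theta}{4} \\
S'_2 &= \frac{|A'||B'| + |A'||B'|}{I_{\rm in}} = 2\left[\frac{S_0}{2} + S_1\frac{\cos 2\theta}{2} + S_2\frac{\sin 2\theta}{2}\right]\cos\theta\sin\theta \\
&= S_0\frac{\sin 2\theta}{2} + S_1\frac{\sin 4\theta}{4} + S_2\frac{\sin^2 2\theta}{2} \\
S'_3 &= i\,\frac{|A'||B'| - |A'||B'|}{I_{\rm in}} = 0
\end{aligned}
\]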
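\[
\begin{bmatrix} S'_0 \\ S'_1 \\ S'_2 \\ S'_3 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix}
1 & \cos 2\theta & \sin 2\theta & 0 \\
\cos 2\theta & \cos^2 2\theta & \tfrac{1}{2}\sin 4\theta & 0 \\
\sin 2\theta & \tfrac{1}{2}\sin 4\theta & \sin^2 2\theta & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}
\]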
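The polarizer Mueller matrix above is easy to exercise numerically. The sketch below (helper names are our own) applies it to unpolarized light and to horizontally polarized light meeting a vertical polarizer; plain Python lists stand in for matrices to keep the example dependency-free.

```python
import math

def mueller_polarizer(theta):
    """4x4 Mueller matrix of an ideal linear polarizer with axis at theta (radians)."""
    c2, s2, s4 = math.cos(2 * theta), math.sin(2 * theta), math.sin(4 * theta)
    rows = [
        [1.0, c2,       s2,       0.0],
        [c2,  c2 * c2,  0.5 * s4, 0.0],
        [s2,  0.5 * s4, s2 * s2,  0.0],
        [0.0, 0.0,      0.0,      0.0],
    ]
    return [[0.5 * x for x in row] for row in rows]

def apply_mueller(M, S):
    """Matrix-vector product M S for a Stokes vector S."""
    return [sum(m * s for m, s in zip(row, S)) for row in M]

unpolarized = [1.0, 0.0, 0.0, 0.0]
horizontal = [1.0, 1.0, 0.0, 0.0]
out_unpol = apply_mueller(mueller_polarizer(math.radians(30)), unpolarized)
out_crossed = apply_mueller(mueller_polarizer(math.pi / 2), horizontal)
```

Half of the unpolarized intensity survives and emerges fully polarized, while the crossed configuration returns a Stokes vector that vanishes to rounding error.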
The Mueller matrix for a half wave plate is worked out below. The Mueller
matrix for a quarter wave plate is deferred to problem P6.15.
We know that all of the light transmits through the wave plate. This immediately
gives
\[
S'_0 = S_0
\]
The wave plate does nothing to unpolarized light. On the other hand, the polarized
portion of the light is influenced by the wave plate as follows (see (6.39)):
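\[
\begin{bmatrix} A' \\ B' \end{bmatrix}
=
\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}
\begin{bmatrix} A \\ B \end{bmatrix}
=
\begin{bmatrix} A\cos 2\theta + B\sin 2\theta \\ A\sin 2\theta - B\cos 2\theta \end{bmatrix}
\]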
As usual, θ is the angle of the fast axis relative to the horizontal. (As expected,
|A′|² + |B′|² = |A|² + |B|²; the intensity of the light is unaltered.) Using (6.59) we get
\[
\begin{aligned}
S'_1 &= \frac{|A'|^2 - |B'|^2}{I_{\rm in}} = \frac{|A\cos 2\theta + B\sin 2\theta|^2 - |A\sin 2\theta - B\cos 2\theta|^2}{I_{\rm in}} \\
&= \frac{\left(|A|^2 - |B|^2\right)\cos 4\theta + \left(A^*B + AB^*\right)\sin 4\theta}{I_{\rm in}} = S_1\cos 4\theta + S_2\sin 4\theta
\end{aligned}
\]
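\[
\begin{aligned}
S'_2 &= \frac{A'^*B' + A'B'^*}{I_{\rm in}} \\
&= \frac{\left(A^*\cos 2\theta + B^*\sin 2\theta\right)\left(A\sin 2\theta - B\cos 2\theta\right)}{I_{\rm in}}
 + \frac{\left(A\cos 2\theta + B\sin 2\theta\right)\left(A^*\sin 2\theta - B^*\cos 2\theta\right)}{I_{\rm in}} \\
&= \frac{|A|^2 - |B|^2}{I_{\rm in}}\sin 4\theta - \frac{AB^* + A^*B}{I_{\rm in}}\cos 4\theta = S_1\sin 4\theta - S_2\cos 4\theta
\end{aligned}
\]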
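\[
\begin{aligned}
S'_3 &= i\,\frac{A'^*B' - A'B'^*}{I_{\rm in}} \\
&= i\,\frac{\left(A^*\cos 2\theta + B^*\sin 2\theta\right)\left(A\sin 2\theta - B\cos 2\theta\right)}{I_{\rm in}}
 - i\,\frac{\left(A\cos 2\theta + B\sin 2\theta\right)\left(A^*\sin 2\theta - B^*\cos 2\theta\right)}{I_{\rm in}} \\
&= -i\,\frac{A^*B - AB^*}{I_{\rm in}} = -S_3
\end{aligned}
\]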
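\[
\begin{bmatrix} S'_0 \\ S'_1 \\ S'_2 \\ S'_3 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos 4\theta & \sin 4\theta & 0 \\
0 & \sin 4\theta & -\cos 4\theta & 0 \\
0 & 0 & 0 & -1
\end{bmatrix}
\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}
\]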
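One way to gain confidence in the half-wave-plate Mueller matrix is to compare it against the Jones calculation it was derived from. The sketch below does this for an arbitrary fully polarized input; the function names and the sample amplitudes are illustrative choices, not part of the text.

```python
import math

def stokes(A, B):
    """Stokes parameters (6.59) of a fully polarized beam with Jones vector (A, B)."""
    I_in = abs(A) ** 2 + abs(B) ** 2
    S1 = (abs(A) ** 2 - abs(B) ** 2) / I_in
    S2 = (A.conjugate() * B + A * B.conjugate()).real / I_in
    S3 = (1j * (A.conjugate() * B - A * B.conjugate())).real / I_in
    return [1.0, S1, S2, S3]

def half_wave_jones(A, B, theta):
    """Half-wave plate acting on a Jones vector, fast axis at theta (see (6.39))."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return A * c + B * s, A * s - B * c

def half_wave_mueller(S, theta):
    """The same wave plate acting on a Stokes vector."""
    c4, s4 = math.cos(4 * theta), math.sin(4 * theta)
    S0, S1, S2, S3 = S
    return [S0, S1 * c4 + S2 * s4, S1 * s4 - S2 * c4, -S3]

A, B, theta = 0.8 + 0.1j, 0.3 - 0.5j, math.radians(20)   # hypothetical input
via_jones = stokes(*half_wave_jones(A, B, theta))
via_mueller = half_wave_mueller(stokes(A, B), theta)
```

The two routes should give the same Stokes vector to within rounding error.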
Exercises
P6.2 Prove that if 0 < δ < π, the helicity is left-handed, and if π < δ < 2π the
helicity is right-handed.
HINT: Write the relevant real field associated with (6.5)
where φ is the phase of E eff . Freeze time at, say, t = φ/ω. Determine the
field at z = 0 and at z = λ/4 (a quarter cycle), say. If E (0, t ) × E (λ/4, t )
points in the direction of k, then the helicity matches that of a wood
screw.
HINT: Linearly polarized light contains equal amounts of right and left
circularly polarized light. Consider
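\[
\frac{1}{2}\begin{bmatrix} 1 \\ i \end{bmatrix} + \frac{e^{i\phi}}{2}\begin{bmatrix} 1 \\ -i \end{bmatrix}
\]
where φ is the phase delay of the right circular polarization. Show that
this can be written as
\[
e^{i\delta}\begin{bmatrix} \cos\phi/2 \\ \sin\phi/2 \end{bmatrix}
\]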
sin α
where α is the angle of linearly polarized light (see table 6.1).
P6.4 For the following cases, what is the orientation of the major axis, and
what is the ellipticity of the light? Case I: A = B = 1/√2; δ = 0; Case II:
A = B = 1/√2; δ = π/2; Case III: A = B = 1/√2; δ = π/4.
P6.5 (a) Suppose that linearly polarized light is oriented at an angle α with
respect to the horizontal axis (x-axis) (see table 6.1). What fraction of
the original intensity gets through a vertically oriented polarizer?
(b) If the original light is right-circularly polarized, what fraction of the
original intensity gets through the same polarizer?
P6.7 (a) Suppose that linearly polarized light is oriented at an angle α with
respect to the horizontal or x-axis. What fraction of the original inten-
sity emerges from a polarizer oriented with its transmission at angle θ
from the x-axis?
Answer: cos2 (θ − α); compare with P6.5.
(b) If the original light is right circularly polarized, what fraction of the
original intensity emerges from the same polarizer?
HINT: A polarizer alone can reveal the direction of the major and minor
axes and the ellipticity, but it does not reveal the helicity. Use a quarter-
wave plate (oriented at a special angle θ) to convert the unknown
elliptically polarized light into linearly polarized light. A subsequent
polarizer can then extinguish the light, from which you can determine
the Jones vector of the light coming through the wave plate. This must
equal the original (unknown) Jones vector (6.11) operated on by the
wave plate (6.37). As you solve the matrix equation, it is helpful to note
that the inverse of (6.37) is its own complex conjugate.
Figure 6.13 Geometry for P6.11
P6.12 Calculate the angle θ to cut the glass in a Fresnel rhomb such that after
the two internal reflections there is a phase difference of π/2 between
the two polarization states. The rhomb then acts as a quarter wave
plate.
HINT: You need to find the phase difference between (3.40) and (3.41).
Set the difference equal to π/4 for each bounce. The equation you get
does not have a clean analytic solution, but you can plot it to find a
numerical solution.
Answer: There are two angles that work: θ ≅ 50° and θ ≅ 53°.
P6.13 Derive (6.49) and (6.51), often used for ellipsometry measurements.
HINT: Using sin²θ = (1 − cos 2θ)/2 and cos²θ = (1 + cos 2θ)/2, first show
\[
I \propto 1 - \frac{\left(r_p r_s^* + r_s r_p^*\right)\tan\alpha \,/\, |r_s|^2}{|r_p|^2/|r_s|^2 + \tan^2\alpha}\,\sin 2\theta
 + \frac{|r_p|^2/|r_s|^2 - \tan^2\alpha}{|r_p|^2/|r_s|^2 + \tan^2\alpha}\,\cos 2\theta
\]
Chapter 7
Superposition of Quasi-Parallel Plane Waves
ment of the center of the wave packet. For narrowband packets (i.e. packets
comprised of a narrow range of frequencies and hence long duration), the packet
tends to maintain its shape (with some spreading) while propagating at the group
velocity. On the other hand, broadband pulses (i.e. packets comprised of many
frequencies and possibly of short duration) tend to distort severely while propagating
in materials. Nevertheless, the group velocity tracks the center of the
pulse.
It turns out that group velocity can become superluminal when significant
absorption and/or amplification of the light pulse is involved. This is no cause
for alarm (nor is it cause for an abundance of gee-whiz papers on the subject).
Absorption and amplification can cause a pulse to appear to move unexpectedly
fast through a reshaping effect. Group velocity, or rather its inverse group delay,
takes this into account, which makes it remarkably general. In such a scenario,
energy can be lost from the back of a pulse or perhaps added to an already-present
forward portion of a pulse such that the average pulse position appears to advance
abruptly. When all energy is accounted for (both the energy in the medium and in
the light pulse), however, nothing advances faster than the universal speed limit
c. Appendix 7.B gives a good look under the hood at how a medium exchanges
energy with a pulse to produce these eye-catching effects.

Sir Isaac Newton (1643–1727, English) was born in Lincolnshire, England three months after the death of his father who was a farmer. Newton spent much of his childhood with his maternal grandmother, after his mother remarried. (Newton did not like his stepfather.) In his teenage years, Newton's mother tried to persuade him to take up farming, but his love for education won out. He became the top-ranked student and was admitted into Trinity College, Cambridge at age 18. Newton was influenced by the works of Descartes, Copernicus, Galileo, and Kepler. Upon graduation four years later, the university closed for two years because of a plague. Newton's return to farm life coincided with a remarkable period when he first developed ideas on calculus, gravitation, and optics. Newton later returned to Cambridge where he spent his extraordinarily prolific career and became the first scientist to be knighted. In optics, Newton advanced the ray theory of light and image formation. He showed that `white' light is comprised of many colors and that the amount of refraction depends on color. He built the first reflecting telescope, which avoids chromatic aberration. Newton advocated against the wave theory of light in favor of his `corpuscular' theory. (Imagining that by this Newton foresaw the quantized nature of light energy gives too much credit!) (Wikipedia)

7.1 Intensity of Superimposed Plane Waves

We can construct arbitrary waveforms by adding together many plane waves with
different propagation directions, amplitudes, phases, frequencies and polarizations.
Consider the following discrete sum of plane waves:
\[
\mathbf{E}(\mathbf{r},t) = \sum_j \mathbf{E}_j\, e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} \tag{7.1}
\]
The corresponding magnetic field according to (2.56) is
\[
\mathbf{B}(\mathbf{r},t) = \sum_j \mathbf{B}_j\, e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)}
 = \sum_j \frac{\mathbf{k}_j\times\mathbf{E}_j}{\omega_j}\, e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} \tag{7.2}
\]
As usual, the (time- and space-independent) individual field components E_j
contain both amplitude and phase information for each plane wave.
The Poynting vector (2.52) associated with the fields (7.1) and (7.2) is
\[
\begin{aligned}
\mathbf{S}(\mathbf{r},t) &= \mathrm{Re}\{\mathbf{E}(\mathbf{r},t)\}\times\frac{\mathrm{Re}\{\mathbf{B}(\mathbf{r},t)\}}{\mu_0} \\
&= \sum_{j,m}\frac{1}{\omega_m\mu_0}\,\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\times\left(\mathbf{k}_m\times\mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\right)
\end{aligned}
\tag{7.3}
\]
where we have assumed that the km vectors are real. (Recall the conspiracy that
only the real parts of the fields are relevant – crucial when multiplying.) The above
expression is cumbersome because of the many cross terms that arise when
For simplicity, we assume that all vectors k j are real. If the wave vectors are
complex, the result is essentially the same, but, as in (2.62), the field amplitudes E j
correspond to local amplitudes (adjusted for absorption or amplification during
prior propagation). We apply the BAC-CAB rule (P0.3) to (7.3) and obtain
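\[
\begin{aligned}
\mathbf{S}(\mathbf{r},t) = \sum_{j,m}\frac{1}{\omega_m\mu_0}\Big[\,
&\mathbf{k}_m\left(\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\cdot\mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\right) \\
&- \mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\left(\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\cdot\mathbf{k}_m\right)\Big]
\end{aligned}
\tag{7.4}
\]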
The last term in (7.4) can be dismissed if all k-vectors are approximately parallel to
each other, in which case all of the km are essentially perpendicular to each of the
E j . We will make this rather stringent assumption and kill the last line in (7.4). The
magnitude of the Poynting vector then becomes (with the help of (0.30))
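\[
\begin{aligned}
S(\mathbf{r},t) &= \sum_{j,m}\frac{k_m}{\omega_m\mu_0}\,
\frac{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)} + \mathbf{E}_j^* e^{-i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}}{2}
\cdot
\frac{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)} + \mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}}{2} \\
&= \sum_{j,m}\frac{k_m}{4\omega_m\mu_0}\Big\{
\mathbf{E}_j\cdot\mathbf{E}_m\, e^{i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]}
+ \mathbf{E}_j^*\cdot\mathbf{E}_m^*\, e^{-i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]} \\
&\qquad\qquad\quad
+ \mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}
+ \mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}\Big\}
\end{aligned}
\quad\text{(parallel k-vectors)}
\tag{7.5}
\]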
The terms involving (ω j + ωm )t oscillate rapidly and time-average to zero. By
comparison, the terms involving (ω j − ωm )t oscillate slowly (especially when the
ω j are all in the neighborhood of the ωm ) or not at all when j = m. We retain the
slower fluctuations and discard the rapid oscillations. For purposes of computing
the intensity (as opposed to determining phase changes with propagation) we can
approximate the index as a constant, and write k_m/(ω_m μ₀) ≈ n ε₀ c. (We seldom
measure intensity inside of materials anyway.) With these simplifications, (7.5)
becomes
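\[
\begin{aligned}
\langle S(\mathbf{r},t)\rangle_{\rm osc}
&= \frac{n\epsilon_0 c}{2}\sum_{j,m}\frac{\mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]} + \mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}}{2} \\
&= \frac{n\epsilon_0 c}{2}\,\mathrm{Re}\left\{\sum_j \mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\cdot\sum_m \mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\} \\
&= \frac{n\epsilon_0 c}{2}\,\mathrm{Re}\left\{\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t)\right\}
\end{aligned}
\quad\text{(parallel k-vectors)}
\tag{7.6}
\]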
As we previously studied (see P1.9), the velocities of the wave crests for these two
waves are
\[
v_{p1} = \omega_1/k_1 \quad\text{and}\quad v_{p2} = \omega_2/k_2 \tag{7.9}
\]
These are known as the phase velocities of the individual plane waves.
Figure 7.1 Animation showing superposition of two plane waves (electric fields) with different frequencies and traveling at different speeds.
7.2 Group vs. Phase Velocity: Sum of Two Plane Waves
Next consider a composite wave created from the superposition of the above
two plane waves:
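\[
\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\, e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + \mathbf{E}_0\, e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \tag{7.10}
\]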
The two plane waves interfere, producing regions of higher and lower intensity
that move in time. Remarkably, these intensity peaks can propagate at speeds
quite different from either of the phase velocities in (7.9). The intensity (7.7) for
the field (7.10) is computed as follows:
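\[
\begin{aligned}
I(\mathbf{r},t) &= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)}\right]\left[e^{-i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + e^{-i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)}\right] \\
&= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[2 + e^{i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]} + e^{-i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]}\right] \\
&= n\epsilon_0 c\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[1 + \cos\left[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t\right]\right] \\
&= n\epsilon_0 c\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[1 + \cos\left(\Delta\mathbf{k}\cdot\mathbf{r}-\Delta\omega\, t\right)\right]
\end{aligned}
\tag{7.11}
\]
where
\[
\begin{aligned}
\Delta\mathbf{k} &\equiv \mathbf{k}_2 - \mathbf{k}_1 \\
\Delta\omega &\equiv \omega_2 - \omega_1
\end{aligned}
\tag{7.12}
\]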
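The algebra leading to (7.11) can be checked numerically in one dimension. The sketch below compares the intensity computed directly from the complex field (7.10) with the closed-form cosine expression, in units where nε₀c = 1. The wave numbers and frequencies are hypothetical values chosen only for illustration.

```python
import cmath
import math

k1, w1 = 10.0, 10.0     # hypothetical wave number and frequency of wave 1
k2, w2 = 11.0, 10.5     # hypothetical wave number and frequency of wave 2
E0 = 1.0

def intensity_direct(z, t):
    """(1/2) E E* for the two-wave field (7.10), in units where n*eps0*c = 1."""
    E = E0 * (cmath.exp(1j * (k1 * z - w1 * t)) + cmath.exp(1j * (k2 * z - w2 * t)))
    return 0.5 * abs(E) ** 2

def intensity_beat(z, t):
    """Closed form (7.11) in the same units."""
    return E0 ** 2 * (1 + math.cos((k2 - k1) * z - (w2 - w1) * t))

# The cosine argument stays fixed along z = (dw/dk) t, so an intensity
# peak that starts at z = 0, t = 0 is found at this position later on.
envelope_speed = (w2 - w1) / (k2 - k1)
```

Evaluating `intensity_beat(envelope_speed * t, t)` for any t returns the peak value 2E₀², which shows that the interference pattern is carried along at ∆ω/∆k rather than at either phase velocity in (7.9).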
The darker line in Fig. 7.2 shows the intensity computed with (7.11). Keep in
mind that this intensity is averaged over rapid oscillations. For comparison, the
lighter line shows the Poynting flux with the rapid oscillations retained, according
to (7.5). It is left as an exercise (see P7.3) to show that the rapid-oscillation peaks
in Fig. 7.2 move with a phase velocity derived from the average k and average ω
of the two plane waves:
\[
v_p \equiv \frac{\mathrm{ave}\{\omega\}}{\mathrm{ave}\{k\}} \tag{7.13}
\]
(Fig. 7.2 axes: intensity versus position.)
An examination of the cosine argument in (7.11) reveals that the time-averaged
curve in Fig. 7.2 (solid) travels with speed
Example 7.1
Determine the phase velocity and group velocity for the superposition of two plane
waves in a plasma (see P2.7).
The function E (r, ω), called the spectrum, has units of field per frequency. Essen-
tially, it gives the amplitude and phase of each plane wave that makes up the over-
all waveform. It includes any spatially dependent factors such as exp {i k (ω) · r}.
We distinguish the spectrum E (r, ω) from the wholly separate function E(r, t ) by
its argument (i.e. ω instead of t ). (Sorry for using E for both functions, but this
is standard notation.) The operation (7.18) is called an inverse Fourier transform
as outlined in section 0.4; actually, it would be a good idea to review section 0.4
thoroughly. Now. Why haven’t you turned to section 0.4 yet? The factor 1/√(2π) is
introduced to match our Fourier-transform convention. Regardless of what the
function is called, please notice that (7.18) merely sums together a range of plane
waves in much the same way that our earlier discrete summation (7.1) does.
If we already have/know a waveform E(r, t ), one might wonder what plane
waves should be added together in order to construct it. Equation (7.18) can be
inverted, which remarkably has a very similar form:
\[
\mathbf{E}(\mathbf{r},\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},t)\, e^{i\omega t}\, dt \tag{7.19}
\]
This operation is called the Fourier transform. It is used to generate the spectrum
E (r, ω) from the field E(r, t ) in much the same way that (7.18) is used to generate
the field E(r, t ) from the spectrum E (r, ω).
Although only the real part of E(r, t ) is physically relevant, we can continue
our habit of working with the complex field and taking the real part of E (r, t ) at
our leisure.1 In fact, we will find it advantageous to work with the complex field
instead of only the real part. We will not run into trouble as long as we remember
never to discard the imaginary part of E (r, ω), only the imaginary part of E (r, t ).
The intensity formula (7.7) remains useful for continuous superpositions of
plane waves (i.e. a field defined by the inverse Fourier transform (7.18)):
\[
I(\mathbf{r},t) \equiv \frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t) \tag{7.20}
\]
Remember, this formula specifically requires the fields to be in complex for-
mat, and it takes care of the time-average over rapid oscillations automatically.2
Moreover, the above expression for I (r, t ) assumes that all relevant k-vectors are
essentially parallel.
Similarly, we will define the power spectrum produced from E (r, ω), which we
write as
\[
I(\mathbf{r},\omega) \equiv \frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega) \tag{7.21}
\]
1 Since Fourier transforms are linear, one can take the Fourier transform of the real and imaginary
parts of a field separately. Appropriate modifications to E (r, ω) in the frequency domain will not
cause the two parts to become mingled. Upon taking the inverse Fourier transform to obtain E(r, t )
again, the original real part remains purely real, and the original imaginary part remains purely
imaginary.
2 To use this expression there needs to be a sufficient number of oscillations within the waveform
The power spectrum I (r, ω) is what one observes when the waveform is sent into
a spectral analyzer or spectrometer. We must apologize again for the potentially
confusing notation (in wide usage): I (r, ω) is not the Fourier transform of I (r, t )!
They are defined exclusively through (7.20) and (7.21).
Parseval’s theorem (see Example 0.7) imposes an interesting connection be-
tween the time-integral of the intensity and the frequency-integral of the power
spectrum:
\[
\int_{-\infty}^{\infty} I(\mathbf{r},t)\, dt = \int_{-\infty}^{\infty} I(\mathbf{r},\omega)\, d\omega \tag{7.22}
\]
With the above formalities out of the way, we will illustrate the use of Fourier
transforms through some examples.
Example 7.2
Find E(r, ω) associated with the field
\[
\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0(\mathbf{r})\, e^{-t^2/2T^2}\, e^{-i\omega_0 t} \tag{7.23}
\]
The real part of this field is shown in Fig. 7.3 for two different durations T. The
intensity profile computed by (7.20) is shown in Fig. 7.4.
Figure 7.3 Real part of electric field (7.23) with T = 4π/ω₀ and T = 10π/ω₀, where 2π/ω₀ is the period of the carrier frequency.
Solution: The argument r is unimportant to our calculation. It merely specifies
that we are considering the field at the point r. We compute the Fourier transform
as follows:
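\[
\begin{aligned}
\mathbf{E}(\mathbf{r},\omega) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0(\mathbf{r})\, e^{-t^2/2T^2}\, e^{-i\omega_0 t}\, e^{i\omega t}\, dt \\
&= \frac{\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-t^2/2T^2 + i(\omega-\omega_0)t}\, dt
\end{aligned}
\tag{7.24}
\]
This integral can be performed with the help of (0.55), and we obtain
\[
\mathbf{E}(\mathbf{r},\omega) = T\,\mathbf{E}_0(\mathbf{r})\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}} \tag{7.25}
\]
Notice that E(r, ω) has units of field multiplied by time, or in other words, field per
frequency.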
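For readers who like to see the transform confirmed numerically, the sketch below evaluates (7.19) for the field (7.23) with a simple trapezoid rule and compares it with the closed form (7.25). We set E₀ = 1 and pick hypothetical values of T and ω₀; the grid size and integration span are likewise arbitrary choices.

```python
import cmath
import math

def spectrum_numeric(w, T, w0, n=4001, span=10.0):
    """Trapezoid-rule estimate of (7.19) for the field (7.23) with E0 = 1."""
    dt = 2 * span * T / (n - 1)
    total = 0j
    for i in range(n):
        t = -span * T + i * dt
        f = cmath.exp(-t * t / (2 * T * T) - 1j * w0 * t + 1j * w * t)
        total += f * (0.5 if i in (0, n - 1) else 1.0)   # half weight at the ends
    return total * dt / math.sqrt(2 * math.pi)

def spectrum_exact(w, T, w0):
    """Closed form (7.25) with E0 = 1."""
    return T * math.exp(-T * T * (w - w0) ** 2 / 2)

T, w0 = 2.0, 5.0                       # hypothetical duration and carrier frequency
numeric = spectrum_numeric(5.3, T, w0)
exact = spectrum_exact(5.3, T, w0)
```

The numerical result is complex in general, but for this symmetric pulse its imaginary part vanishes to rounding error and its real part matches (7.25).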
Example 7.3
Check Parseval’s theorem for the field and spectrum in Example 7.2.
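\[
\begin{aligned}
\int_{-\infty}^{\infty} I(\mathbf{r},t)\, dt &= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\int_{-\infty}^{\infty} e^{-t^2/T^2}\, dt \\
&= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\, T\sqrt{\pi}
\end{aligned}
\]
where we have used (0.55) to perform the integration. This result has units of
energy per area. It is the energy per area absorbed by a detector after the pulse has
concluded. The frequency integration in (7.22) yields
\[
\begin{aligned}
\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\, d\omega &= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\, T^2\int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2}\, d\omega \\
&= \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\, T^2\,\frac{\sqrt{\pi}}{T}
\end{aligned}
\]
which is the same answer.
Figure 7.5 Spectral components (7.25) of the fields in Fig. 7.3 with T = 4π/ω₀ and T = 10π/ω₀, where 2π/ω₀ is the period of the carrier frequency.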
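The same check can be carried out by brute-force integration. In the sketch below we set E₀ = 1 and nε₀c/2 = 1 and integrate both sides of (7.22) with a trapezoid rule; T, ω₀, and the integration limits are hypothetical choices made only for this illustration.

```python
import math

T, w0 = 2.0, 5.0      # hypothetical duration and carrier frequency

def trapezoid(f, a, b, n=20001):
    """Composite trapezoid rule for f on [a, b] with n sample points."""
    h = (b - a) / (n - 1)
    return h * (sum(f(a + i * h) for i in range(n)) - 0.5 * (f(a) + f(b)))

# Intensity (7.20) and power spectrum (7.21) for Example 7.2, with E0 = 1.
time_side = trapezoid(lambda t: math.exp(-t * t / (T * T)), -12 * T, 12 * T)
freq_side = trapezoid(lambda w: T * T * math.exp(-T * T * (w - w0) ** 2),
                      w0 - 12 / T, w0 + 12 / T)
expected = T * math.sqrt(math.pi)
```

Both integrals agree with each other and with T√π, in line with the analytic result of Example 7.3.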
Example 7.4
Take the inverse Fourier transform of (7.25) to recover the original waveform (7.23).
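This integral can be performed with the help of (0.55), which gives
\[
\begin{aligned}
\mathbf{E}(\mathbf{r},t) &= \frac{T\,\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\sqrt{\frac{\pi}{T^2/2}}\; e^{\frac{\left(T^2\omega_0 - it\right)^2}{4\left(T^2/2\right)} - \frac{T^2\omega_0^2}{2}} \\
&= \mathbf{E}_0(\mathbf{r})\, e^{-t^2/2T^2}\, e^{-i\omega_0 t}
\end{aligned}
\]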
Since only the real part of the time profile E(r, t ) is physically relevant, you
might be curious about how the Fourier transform of the real part of the field
compares with that of the complex version of the field that we have been using.
Indeed, there are situations where it is more appropriate to use the real version
of the field rather than its complex form. For example, if a waveform includes
multiple propagation directions or if a waveform contains only a few cycles, then
the motivation/interpretation behind (7.20) and the convenience of the complex
format begins to wane.
Example 7.5
Take the Fourier transform of just the real part of waveform (7.23).
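\[
\begin{aligned}
\mathbf{E}_r(\mathbf{r},t) &= \frac{\mathbf{E}(\mathbf{r},t) + \mathbf{E}^*(\mathbf{r},t)}{2} \\
&= e^{-t^2/2T^2}\,\frac{\mathbf{E}_0(\mathbf{r})\, e^{-i\omega_0 t} + \mathbf{E}_0^*(\mathbf{r})\, e^{i\omega_0 t}}{2}
\end{aligned}
\tag{7.27}
\]
If E₀(r) is real, then this field can be written as E₀(r) e^{−t²/2T²} cos(ω₀t). The Fourier
transform (7.19) yields (see P0.24)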
From the above example, you might notice that the transform of the real
part of a field tends to be more cumbersome than the transform of the entire
complex field. For the real field, both positive and negative frequency components
contribute to the overall spectrum.3 Moreover, the Fourier transform of a real
function Er (r, t ) obeys the following symmetry relation:
The spectrum in Fig. 7.7 obeys this symmetry relation, whereas the Fourier trans-
form of the complex field depicted in Fig. 7.5 does not.
3 Essentially, the spectrum of the complex representation of the field can be understood to be
twice the spectrum of the real representation, but plotted only for the positive frequencies.
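7.4 Packet Propagation and Group Delay
\[
\begin{aligned}
\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r},\omega)\, e^{-i\omega t}\, d\omega \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\, e^{i(\mathbf{k}(\omega)\cdot\Delta\mathbf{r} - \omega t)}\, d\omega
\end{aligned}
\tag{7.31}
\]
\[
\mathbf{E}(0,\omega) = T\,\mathbf{E}_0\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}
\]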
4 See J. D. Jackson, Classical Electrodynamics, 3rd ed., Sect. 7.8 (New York: John Wiley, 1999).
To find the field downstream we invoke (7.30), assuming k(ω) = k_vac(ω) ẑ =
(ω/c) ẑ, which gives the appropriate phase shift for each plane wave component:
\[
\mathbf{E}(z,\omega) = \mathbf{E}(0,\omega)\, e^{i\mathbf{k}(\omega)\cdot\Delta\mathbf{r}} = T\,\mathbf{E}_0\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\, e^{i\frac{\omega}{c}z}
\]
We compute the final waveform using (7.31) and obtain
\[
\mathbf{E}(z,t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0\, T\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\, e^{i\frac{\omega}{c}z}\, e^{-i\omega t}\, d\omega
 = \mathbf{E}_0\, e^{-\frac{(t-z/c)^2}{2T^2}}\, e^{-i\omega_0(t-z/c)}
\tag{7.32}
\]
Not surprisingly, after traveling a distance z through vacuum, the pulse looks
identical to the original pulse, only delayed by time z/c.
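\[
\mathbf{k}\cdot\Delta\mathbf{r} \cong \left[\left.\mathbf{k}\right|_{\omega_0} + \left.\frac{\partial\mathbf{k}}{\partial\omega}\right|_{\omega_0}(\omega-\omega_0) + \frac{1}{2}\left.\frac{\partial^2\mathbf{k}}{\partial\omega^2}\right|_{\omega_0}(\omega-\omega_0)^2 + \cdots\right]\cdot\Delta\mathbf{r}
\tag{7.33}
\]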
and assumed that the imaginary part of k is roughly constant near ω₀ so that t′
is real. Then the integral in (7.34) is recognized as the Fourier transform of the
original pulse with a new time argument:
\[
\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, t) = \mathbf{E}\left(\mathbf{r}_0, t - t'\right)\, e^{i(\mathbf{k}(\omega_0)\cdot\Delta\mathbf{r} - \omega_0 t')}
\tag{7.36}
\]
7.5 Quadratic Dispersion
Notice that (7.32) from Example 7.6 agrees with this result, since kvac (ω0 ) · ∆r =
ω0 z/c. The second factor in (7.36) merely gives an overall phase shift due to prop-
agation. The phase shift is dictated by the phase velocity of the carrier frequency
(see (7.9)):
ω0
v p (ω0 ) = (7.37)
k (ω0 )
Otherwise (7.36) is unaltered except for a delay t 0 , the time required for the pulse
to traverse the displacement ∆r.
The function ∂Re k/∂ω · ∆r is known as the group delay function, and in (7.35)
\[
v_g^{-1}(\omega_0) = \left.\frac{\partial\,\mathrm{Re}\{k(\omega)\}}{\partial\omega}\right|_{\omega_0}
\tag{7.38}
\]
Group delay (or group velocity) essentially tracks the center of the packet.
In our derivation we have assumed that the phase delay k(ω)·∆r could be well-
represented by the first two terms of the expansion (7.33). While this assumption
gives results that are often useful, higher-order terms can also play a role. In
section 7.5 we’ll find that the next term in the expansion controls the rate at which
the wave packet spreads as it travels. We should also note that there are times
when the expansion (7.33) fails to converge (when ω0 is near a resonance of the
medium), and the above expansion approach is not valid. We’ll analyze pulse
propagation in these sticky situations in section 7.6.
\[
k(\omega)\, z \cong k_0 z + v_g^{-1}(\omega-\omega_0)\, z + \alpha\,(\omega-\omega_0)^2 z + \cdots
\tag{7.39}
\]
where
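\[
k_0 \equiv k(\omega_0) = \frac{\omega_0\, n(\omega_0)}{c} \tag{7.40}
\]
\[
v_g^{-1} \equiv \left.\frac{\partial k}{\partial\omega}\right|_{\omega_0} = \frac{n(\omega_0)}{c} + \frac{\omega_0\, n'(\omega_0)}{c} \tag{7.41}
\]
\[
\alpha \equiv \frac{1}{2}\left.\frac{\partial^2 k}{\partial\omega^2}\right|_{\omega_0} = \frac{n'(\omega_0)}{c} + \frac{\omega_0\, n''(\omega_0)}{2c} \tag{7.42}
\]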
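To see (7.41) in action, the sketch below evaluates it for an assumed index model, the collisionless plasma n(ω) = √(1 − ω_p²/ω²), for which k = √(ω² − ω_p²)/c and hence v_g = cn and v_p = c/n. The plasma frequency, the carrier frequency, and the finite-difference step are hypothetical values chosen for illustration.

```python
import math

c = 299_792_458.0          # speed of light in m/s
wp = 1.0e15                # hypothetical plasma frequency in rad/s

def n(w):
    """Assumed index model: collisionless plasma, valid for w > wp."""
    return math.sqrt(1 - (wp / w) ** 2)

def inverse_group_velocity(w, h=1e-4):
    """Right-hand side of (7.41), with n'(w) from a central difference."""
    dw = h * w
    dn = (n(w + dw) - n(w - dw)) / (2 * dw)
    return n(w) / c + w * dn / c

w0 = 2.0e15                # hypothetical carrier frequency in rad/s
v_g = 1 / inverse_group_velocity(w0)
v_p = c / n(w0)
```

For this model the group velocity comes out below c while the phase velocity exceeds it, and their product equals c².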
As before, we have supposed that the imaginary part of the index is negligible.
Unfortunately, we can’t calculate a general formula for the effect of quadratic
dispersion on an arbitrary initial pulse. However, we can get a general idea for
how quadratic dispersion works by considering the specific example of a Gaussian
pulse.
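The coefficients (7.40)–(7.42) are easy to evaluate once n(ω) and its derivatives are known. The following is a minimal sketch, assuming a made-up polynomial index model (the numbers `N0`, `N1`, `N2` are hypothetical and do not describe a real glass) and units of nm and fs. It cross-checks the closed-form coefficients against finite differences of k(ω) = ωn(ω)/c.

```python
C = 299.792458  # speed of light in nm/fs

def expansion_coefficients(n, dn, d2n, omega0):
    """Return (k0, 1/v_g, alpha) following (7.40)-(7.42)."""
    k0 = omega0 * n(omega0) / C
    inv_vg = n(omega0) / C + omega0 * dn(omega0) / C
    alpha = dn(omega0) / C + omega0 * d2n(omega0) / (2 * C)
    return k0, inv_vg, alpha

# Hypothetical smooth index model (illustrative numbers only)
OMEGA0 = 2.355                    # rad/fs, roughly an 800 nm carrier
N0, N1, N2 = 1.5, 0.01, 0.004

def n(w):
    return N0 + N1 * (w - OMEGA0) + 0.5 * N2 * (w - OMEGA0) ** 2

def dn(w):
    return N1 + N2 * (w - OMEGA0)

def d2n(w):
    return N2

k0, inv_vg, alpha = expansion_coefficients(n, dn, d2n, OMEGA0)

# Cross-check with central finite differences of k(w) = w n(w) / c
def k(w):
    return w * n(w) / C

h = 1e-3
inv_vg_fd = (k(OMEGA0 + h) - k(OMEGA0 - h)) / (2 * h)
alpha_fd = 0.5 * (k(OMEGA0 + h) - 2 * k(OMEGA0) + k(OMEGA0 - h)) / h ** 2

print(k0, inv_vg, alpha)
```

With a measured or tabulated index, `n`, `dn`, and `d2n` would be replaced by fits to the data; the finite-difference check works the same way.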
Example 7.7
A Gaussian waveform similar to that in Example 7.6 propagates through a piece of
glass with thickness ∆r = z. Compute the waveform exiting the glass.
Solution: Again, the Fourier transform of the Gaussian pulse before propagation
is given by (7.25):
$$\mathbf{E}(0,\omega) = T\,\mathbf{E}_0\, e^{-\frac{T^2(\omega-\omega_0)^2}{2}}$$
Figure 7.8 A 25 fs pulse traversing an ℓ = 1 cm piece of BK7 glass (figure labels: 25 fs, 56 fs).
With the aid of expansion (7.39), the inverse Fourier transform (7.31) (which yields
the pulse after propagation) becomes
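$$\begin{aligned}
\mathbf{E}(z,t) &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \mathbf{E}_0 T e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\, e^{i k_0 z + i v_g^{-1}(\omega-\omega_0)z + i\alpha(\omega-\omega_0)^2 z}\, e^{-i\omega t}\, d\omega \\
&= \frac{T\,\mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\left(T^2/2 - i\alpha z\right)(\omega-\omega_0)^2}\, e^{i v_g^{-1}(\omega-\omega_0)z - i(\omega-\omega_0)t}\, d\omega
\end{aligned} \tag{7.43}$$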
The above integral can be performed with the aid of (0.55). The result is
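$$\begin{aligned}
\mathbf{E}(z,t) &= \frac{T\,\mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}} \sqrt{\frac{\pi}{\frac{T^2}{2}\left(1 - i2\alpha z/T^2\right)}}\; \exp\left[-\frac{(t - z/v_g)^2}{4\,\frac{T^2}{2}\left(1-i2\alpha z/T^2\right)}\right] \\
&= \mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}\, \frac{e^{\frac{i}{2}\tan^{-1}\left(2\alpha z/T^2\right)}}{\sqrt[4]{1+\left(2\alpha z/T^2\right)^2}}\, \exp\left[-\frac{(t-z/v_g)^2\left(1+i2\alpha z/T^2\right)}{2T^2\left(1+\left(2\alpha z/T^2\right)^2\right)}\right]
\end{aligned} \tag{7.45}$$
which can be written more compactly as
$$\mathbf{E}(z,t) = \frac{\mathbf{E}_0\, e^{i(k_0 z-\omega_0 t)}}{\sqrt[4]{1+\Phi^2(z)}}\; e^{\frac{i}{2}\tan^{-1}\Phi(z)}\; e^{-\frac{i}{2}\Phi(z)\frac{(t-z/v_g)^2}{\tilde{T}^2(z)}}\; e^{-\frac{(t-z/v_g)^2}{2\tilde{T}^2(z)}} \tag{7.46}$$
where
$$\Phi(z) \equiv \frac{2\alpha}{T^2}\,z \tag{7.47}$$
and
$$\tilde{T}(z) \equiv T\sqrt{1+\Phi^2(z)} \tag{7.48}$$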
We can immediately make a few observations about (7.46). First, note that
at z = 0 (i.e. zero thickness of glass), (7.46) reduces to the input pulse
$\mathbf{E}(0,t) = \mathbf{E}_0 e^{-t^2/2T^2} e^{-i\omega_0 t}$, as it should. Secondly, the peak of the pulse moves at speed v_g
since the factor $e^{-(t-z/v_g)^2/2\tilde{T}^2(z)}$ controls the pulse amplitude, while the other
terms (multiplied by i) in the exponent of (7.46) merely alter the phase. Also
note that the duration of the pulse increases and its peak intensity decreases as it
travels, since T̃(z) increases with z. In P7.8 we will find that (7.46) also predicts
that for large z, the field of the spread-out pulse oscillates less rapidly at the
beginning of the pulse than at the end (assuming α > 0). This phenomenon, known as
pulse chirping, means that red frequencies get ahead of blue frequencies during
propagation since the red frequencies experience a lower index of refraction.
Figure 7.9 Animation of a Gaussian-envelope pulse (electric field) undergoing dispersion during transit.
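The broadening predicted by (7.47) and (7.48) can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in the text: that the 25 fs label in Figure 7.8 refers to the intensity FWHM, and that BK7 has a group-velocity dispersion of roughly 44.6 fs²/mm near 800 nm (an approximate literature value, so that α is half of it).

```python
import math

def stretched_duration(T, alpha, z):
    """Return (Phi, T_tilde) from (7.47) and (7.48)."""
    phi = 2.0 * alpha * z / T ** 2
    return phi, T * math.sqrt(1.0 + phi ** 2)

# Assumed inputs (not taken from the text)
FWHM_IN = 25.0      # fs, intensity full width at half maximum
GVD = 44.6          # fs^2/mm, d2k/dw2 = 2*alpha for BK7 near 800 nm
Z = 10.0            # mm, i.e. 1 cm of glass

# Intensity ~ exp(-t^2/T^2), so FWHM = 2*sqrt(ln 2)*T
T = FWHM_IN / (2.0 * math.sqrt(math.log(2.0)))
phi, T_tilde = stretched_duration(T, GVD / 2.0, Z)
fwhm_out = FWHM_IN * T_tilde / T

print(f"Phi = {phi:.2f}, output FWHM = {fwhm_out:.1f} fs")
```

Under these assumptions the output duration comes out near 55 fs, close to the 56 fs label in Figure 7.8. Because Φ scales as 1/T², halving the input duration quadruples Φ, which is why short pulses spread so much more readily.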
While Example 7.7 is worked out for the specific case of a Gaussian pulse,
the results are qualitatively similar for all pulses. The exact details vary with
pulse shape, but all short pulses eventually broaden and chirp as they propagate
through a dispersive medium such as glass. Higher-order terms in the expansion
(7.33) that were neglected cause additional spreading, chirping, and other
deformations to the pulses as they propagate. The influence of each order becomes
progressively more cumbersome to study analytically. In that case, it is easier to
perform the inverse Fourier transform numerically; there is no need to resort to
the expansion of k (ω) if the integration is done numerically.
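As a rough illustration of the numerical route, the sketch below evaluates the inverse transform (7.43) by direct summation for the envelope of a Gaussian pulse and compares the resulting duration with (7.48). The values of `T` and `alpha_z` are arbitrary assumptions, and the quadratic phase is used only so that the answer can be checked; any sampled k(ω) could be substituted for the `phase` factor.

```python
import numpy as np

def propagate_envelope(T, alpha_z, tau, n_omega=1201, span=9.0):
    """Numerically evaluate the inverse transform (7.43) for the envelope.

    tau is the retarded time t - z/v_g. The carrier factor
    exp(i(k0 z - w0 t)) is omitted since it does not affect the intensity.
    """
    Omega = np.linspace(-span / T, span / T, n_omega)   # omega - omega0
    d_omega = Omega[1] - Omega[0]
    spectrum = T * np.exp(-0.5 * (T * Omega) ** 2)      # (7.25) with E0 = 1
    phase = np.exp(1j * alpha_z * Omega ** 2)           # quadratic term of (7.39)
    kernel = np.exp(-1j * np.outer(tau, Omega))         # exp(-i Omega tau)
    return kernel @ (spectrum * phase) * d_omega / np.sqrt(2 * np.pi)

T = 15.0            # fs (assumed)
alpha_z = 223.0     # fs^2 (assumed value of alpha * z)
tau = np.linspace(-300.0, 300.0, 801)

field = propagate_envelope(T, alpha_z, tau)
intensity = np.abs(field) ** 2

# For intensity ~ exp(-tau^2 / T_tilde^2) the variance is T_tilde^2 / 2
variance = np.sum(tau ** 2 * intensity) / np.sum(intensity)
T_numeric = np.sqrt(2.0 * variance)

phi = 2.0 * alpha_z / T ** 2
T_analytic = T * np.sqrt(1.0 + phi ** 2)
print(T_numeric, T_analytic)
```

Direct summation is used here instead of an FFT to keep the sign and normalization conventions of (7.43) explicit; for long records an FFT would be the more economical choice.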
Figure 7.11 Transit time defined as the difference between arrival time at two points.
describe accurately the phase delay k (ω) · ∆r. Moreover, if the bandwidth of the
waveform is wider than the spectral resonance of the medium, the series alto-
gether fails to converge. These difficulties have led to the traditional viewpoint
that group velocity loses meaning for broadband waveforms near a resonance. In
this section, we study a broader context for group velocity (or rather its inverse,
group delay d k/d ω), which is always valid, even for broadband pulses where the
expansion (7.33) utterly fails. The analysis avoids the expansion and so is not
restricted to a narrowband context.
We are interested in the arrival time of a waveform (or pulse) to a point, say,
where a detector is located. The definition of the arrival time of pulse energy
need only involve the Poynting flux (or the intensity), since it alone is responsible
for energy transport. To deal with arbitrary broadband pulses, the arrival time
should avoid presupposing a specific pulse shape, since the pulse may evolve
in complicated ways during propagation. For example, the pulse peak or the
midpoint on the rising edge of a pulse are poor indicators of arrival time if the
pulse contains multiple peaks or a long and non-uniform rise time.
For the reasons given, we use a time expectation integral (or time ‘center-of-
mass’) to describe the arrival time of a pulse:
$$\langle t\rangle_{\mathbf{r}} \equiv \frac{\int_{-\infty}^{\infty} t\, I(\mathbf{r},t)\, dt}{\int_{-\infty}^{\infty} I(\mathbf{r},t)\, dt} \tag{7.49}$$
For simplification, we have assumed that the light travels in a uniform direction
by using intensity rather than the Poynting vector.
Consider a pulse as it travels from point r0 to point r = r0 + ∆r in a homoge-
neous medium. The difference in arrival times at the two points is
$$\Delta t \equiv \langle t\rangle_{\mathbf{r}} - \langle t\rangle_{\mathbf{r}_0} \tag{7.50}$$
The pulse shape can evolve in complicated ways between the two points, spread-
ing with different portions being absorbed (or amplified) during transit as de-
picted in Fig. 7.11. Nevertheless, (7.50) renders an unambiguous time interval
between the passage of the pulse center at each point.
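The expectation integral (7.49) translates directly into a weighted average over a sampled intensity profile. The sketch below uses made-up pulse shapes on a uniform time grid (arbitrary units) to show that the definition remains unambiguous even for a double-peaked pulse.

```python
import numpy as np

def arrival_time(t, intensity):
    """Time 'center-of-mass' of a sampled intensity profile, per (7.49)."""
    return np.sum(t * intensity) / np.sum(intensity)

t = np.linspace(-50.0, 150.0, 4001)

# Hypothetical profiles: a single Gaussian and a two-peaked pulse
single = np.exp(-((t - 20.0) / 5.0) ** 2)
double = np.exp(-((t - 10.0) / 5.0) ** 2) + 0.5 * np.exp(-((t - 70.0) / 8.0) ** 2)

t_single = arrival_time(t, single)
t_double = arrival_time(t, double)
print(t_single, t_double)
```

The transit time (7.50) is then just the difference between two such averages taken at the two points of interest.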
7.6 Generalized Context for Group Delay 187
This difference in arrival time can be shown to consist of two terms (see
P7.11):6
$$\Delta t = \Delta t_G(\mathbf{r}) + \Delta t_R(\mathbf{r}_0) \tag{7.51}$$
The first term, called the net group delay, dominates if the field waveform is
initially symmetric in time (e.g. an unchirped Gaussian). It amounts to a spectral
average of the group delay function taken with respect to the spectral content of
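the pulse arriving at the final point r = r0 + ∆r:
$$\Delta t_G(\mathbf{r}) = \frac{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\left(\frac{\partial\,\mathrm{Re}\,\mathbf{k}}{\partial\omega}\cdot\Delta\mathbf{r}\right) d\omega}{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\, d\omega} \tag{7.52}$$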
where I (r, ω) is given in (7.21). The two curves in Fig. 7.12 show I (r0 , ω) (before
propagation) and I (r, ω) (after propagation) for an initially Gaussian pulse. As
seen in (7.52), the pulse travel time depends on the spectral shape of the pulse at
the end of propagation.
Note the close resemblance between the formulas (7.49) and (7.52). Both are
expectation integrals. The former is executed as a ‘center-of-mass’ integral on
time; the latter is executed in the frequency domain on ∂Re k · ∆r/∂ω, the group
delay function (7.38). The group delay at every frequency present in the pulse
influences the result. If the pulse has a narrow bandwidth in the neighborhood
of ω0, the integral reduces to ∂Re k/∂ω|ω0 · ∆r, in agreement with (7.38) (see P7.9).
The net group delay depends only on the spectral content of the pulse, independent
of its temporal organization (i.e., the phase of E (r, ω) has no influence). Only
the real part of the k-vector plays a direct role in (7.52).
Figure 7.12 Normalized power spectrum of a broadband pulse before and after propagation through an absorbing medium with the complex index shown in Fig. 7.10. The absorption line eats a hole in the spectrum.
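The spectral average (7.52) can be sketched numerically as follows. The index model, the units (c = 1), and the two Gaussian spectra are hypothetical choices made only for illustration; the point is that a narrow spectrum recovers the group delay at ω0, while a broad spectrum samples the group delay function over its whole width.

```python
import numpy as np

def net_group_delay(omega, spectrum, re_k, dr):
    """Spectral average of the group delay function, per (7.52).

    omega    : uniformly spaced angular frequencies
    spectrum : power spectrum I(r, omega) at the final point
    re_k     : Re k(omega) sampled on the same grid
    dr       : propagation distance
    """
    group_delay = np.gradient(re_k, omega) * dr
    return np.sum(spectrum * group_delay) / np.sum(spectrum)

C = 1.0                                   # units chosen so that c = 1
omega0 = 10.0
omega = np.linspace(5.0, 15.0, 20001)
delta = omega - omega0

# Hypothetical index with linear and quadratic dispersion
index = 1.5 + 0.02 * delta + 0.01 * delta ** 2
re_k = omega * index / C

narrow = np.exp(-(20.0 * delta) ** 2)     # narrowband power spectrum
broad = np.exp(-(1.0 * delta) ** 2)       # broadband power spectrum

delay_narrow = net_group_delay(omega, narrow, re_k, 1.0)
delay_broad = net_group_delay(omega, broad, re_k, 1.0)
print(delay_narrow, delay_broad)
```

For this model the group delay function is 1.7 + 0.24δ + 0.03δ² with δ = ω − ω0, so the narrowband result sits at about 1.7 while the broadband average picks up the extra curvature contribution.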
The second term in (7.51) is the reshaping delay ∆t R . It represents a delay
that arises solely from a reshaping of the spectral amplitude. Often this term is
negligible. The term takes into account how the pulse time center-of-mass shifts
as portions of the spectrum are removed (or added), as illustrated in Fig. 7.13. It
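is computed at r0 before propagation takes place:7
$$\Delta t_R(\mathbf{r}_0) \equiv \langle t\rangle_{\mathbf{r}_0\,\mathrm{altered}} - \langle t\rangle_{\mathbf{r}_0} \tag{7.53}$$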
Here ⟨t⟩_r0 represents the usual arrival time of the pulse at the initial point r0,
according to (7.49). The intensity at this point is associated with a field E (r0, t)
whose spectrum is E (r0, ω). On the other hand, ⟨t⟩_r0 altered is the arrival time of
a pulse with modified spectrum $\mathbf{E}(\mathbf{r}_0,\omega)\,e^{-\mathrm{Im}\,\mathbf{k}\cdot\Delta\mathbf{r}}$. Notice that $\mathbf{E}(\mathbf{r}_0,\omega)\,e^{-\mathrm{Im}\,\mathbf{k}\cdot\Delta\mathbf{r}}$ is
still evaluated at the initial point r0. Only the spectral amplitude (not the phase)
is modified, according to what is anticipated to be lost (or gained) during the trip.
Figure 7.13 The center of a chirped pulse can shift owing to the reshaping effect when spectrum is removed.
In contrast to the net group delay, the reshaping delay is sensitive to how a pulse
6 M. Ware, S. A. Glasgow, and J. Peatross, “The Role of Group Velocity in Tracking Field Energy in
net group delay should be computed with the initial rather than final spectrum.
188 Chapter 7 Superposition of Quasi-Parallel Plane Waves
Example 7.8
Find the time required for a Gaussian pulse (7.23) to traverse a slab of absorbing
material (neglecting possible surface reflections). Let the material response be
given by the Lorentz model described in section 2.2, with the carrier frequency
of the pulse ω0 coinciding with the material resonance frequency. Let the slab
have thickness ∆r = cγ⁻¹/10 and absorption strength ω_p² = 10γ.
Solution: The spectrum of the initially Gaussian pulse is given by (7.25), and its
power spectrum is8
$$I(\mathbf{r}_0,\omega) \propto e^{-T^2(\omega-\omega_0)^2}$$
After propagating from r0 to r = r0 + ∆r, the power spectrum becomes
$$I(\mathbf{r},\omega) \propto e^{-T^2(\omega-\omega_0)^2}\, e^{-2\frac{\kappa(\omega)\,\omega}{c}\Delta r}$$
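The net group delay (7.52) is then
$$\Delta t_G(\mathbf{r}) = \frac{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\left(\frac{\partial(\omega n/c)}{\partial\omega}\,\Delta r\right) d\omega}{\int_{-\infty}^{\infty} I(\mathbf{r},\omega)\, d\omega} = \frac{\Delta r}{c}\, \frac{\int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2}\, e^{-2\frac{\kappa\omega}{c}\Delta r}\left(n + \omega\frac{\partial n}{\partial\omega}\right) d\omega}{\int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2}\, e^{-2\frac{\kappa\omega}{c}\Delta r}\, d\omega}$$
The index of refraction n + iκ is given by (2.39) (see also (2.27) and (2.29)). Since
the expressions for n and κ are complicated, the integration in the above formula
must be performed numerically.
Figure 7.14 Animation comparing narrowband vs. broadband Gaussian pulses traversing an absorbing slab (green stripe) on resonance. Note the logarithmic scale. See Example 7.8.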
The result when T = T1 = 10γ⁻¹/√2 (narrowband) is
The narrowband pulse (with duration T1 ) in Example 7.8 traverses the ab-
sorbing medium superluminally (i.e. faster than c). The negative transit time
means that the ‘center-of-mass’ of the exiting pulse emerges even before the
‘center-of-mass’ of the entering pulse reaches the medium! On the other hand,
the broadband pulse (with the shorter duration T2 ) has a large positive delay time,
indicating that the exiting pulse emerges subluminally.
8 In general, one should write ω̄0 to distinguish the carrier frequency of the pulse from the
resonance frequency of the material ω0; in practice, these are often different.
Figure 7.14 shows the intensity profiles for these two pulses as they traverse
the absorption slab, calculated with the aid of (7.31). By eye, one can see how
the centers of the two pulses are either advanced or delayed as they go through
the absorption medium. In both cases, the pulse that emerges is well within
the envelope of the original pulse propagated forward at c. In the case of the
broadband pulse, the absorption peak eats a hole in the center of the spectrum
as shown in Fig. 7.12, causing the emerging pulse to be distorted in time. The
analysis in this section predicts the center of pulses, whereas to see the shape of
pulses one needs to calculate (7.31).
Figure 7.15 Delay as a function of pulse duration.
The results for the two pulse durations in Example 7.8 indicate a trend. Su-
perluminal behavior only occurs for long boring pulses. In the case of a single
absorption resonance, this comes with a severe cost of attenuation. Figure 7.15
shows the delay time as a function of pulse duration. As the injected pulse be-
comes more sharply defined in time, the superluminal behavior does not persist.
Sharply defined waveforms (i.e. broadband) cannot propagate superluminally
precisely because much of their bandwidth lies away from the frequencies with
superluminal group delays.
We should mention that superluminal propagation cannot persist for indefinite
distances since the medium eventually removes the superluminal spectral
components through absorption (or else adds subluminal spectral components
in the case of amplification). This limits the amount that a pulse center can be
advanced to the scale of the pulse’s own duration.
Figure 7.16 Narrowband pulse traversing an absorbing medium.
As we saw for the absorption situation, the exiting pulse is tiny and resides
well within the original envelope of the pulse propagated forward at speed c,
as depicted in Fig. 7.16. Without the absorbing material in place, the signal
would be detectable just as early. This statement is also true for any spectral
behavior of a medium, including amplifying media. One can use the Lorentz
model (2.40) to describe an amplifying medium with a negative oscillator strength
f. Figure 7.17 shows narrowband and broadband pulses traversing an amplifying
medium. In this case, superluminal behavior occurs for spectra nearby but not
on an amplifying resonance. If the pulse is too broadband, its spectrum will be
amplified, which adds slower components to the overall group delay.
Figure 7.17 Animation comparing narrowband vs. broadband Gaussian pulses traversing an amplifying slab (green stripe) slightly off resonance.

Appendix 7.A Pulse Chirping in a Grating Pair

Grating pairs can be used to introduce large amounts of dispersion into a light
pulse. Gratings are especially useful for amplification of ultrashort laser pulses,
where laser pulses are first stretched in time before amplification (to prevent
damage to the amplifier) and then compressed back to short duration just before
the experiment (called chirped pulse amplification). Diffraction from a grating
causes each k-vector to travel at a different angle. A second grating parallel to the
first can realign all of the k-vectors to be parallel to each other. Since laser beams
are not infinitely wide, the light is typically sent through the grating pair twice
to undo the tendency of the different frequency components becoming laterally
separated. In the present analysis, we will consider an infinitely wide plane wave
pulse incident upon a grating. The scenario is depicted in Fig. 7.19: A short plane
wave pulse strikes the grating at an angle, and a spreading pulse emerges.
Consider a plane-wave pulse that ricochets between a pair of parallel grating
surfaces. Although different k-vectors point with different angles, they are all
straightened out upon diffracting from the second grating. For simplicity, we will
consider a pulse just before the first bounce and just after the second bounce,
even though we are interested in the dispersion that takes place between the
gratings. Therefore, we can consider all k-vectors as being parallel with each
other.
Consider a plane wave incident on a grating at an incident angle θi with
respect to the grating normal (aligned with the x-axis in our coordinate system)
as depicted in Fig. 7.18. The plane wave diffracts from the first grating at an angle
θr (also referenced from the grating normal). This angle is governed by the grating
diffraction formula9
$$\theta_r(\omega) = \sin^{-1}\left(\frac{2\pi c}{\omega d} - \sin\theta_i\right) \tag{7.54}$$
where d is the grating groove spacing. By examining the geometry of the figure,
we see that the reflected k-vector is given by k = (x̂ cos θr + ŷ sin θr) ω/c.
Figure 7.18 Direction of k-vector between parallel gratings (top view). Grating rulings run in and out of the page.
Suppose we know the pulse at a point r0 on the first grating. Next we choose a
point r0 + ∆r on the second grating where we will determine the outgoing pulse.
Since we are considering an infinitely wide plane-wave pulse, it doesn’t matter
where we choose that point as long as it lies on the surface of the second grating.
The waveform will be the same everywhere along the surface of the second grating;
only its arrival time will trivially differ. For convenience, we might as well take the
second point to be r0 + ∆r = r0 + L x̂ as shown in Fig. 7.18.
The phase delay needed for (7.30) becomes
$$\mathbf{k}(\omega)\cdot\Delta\mathbf{r} = \frac{L\omega}{c}\cos\theta_r \tag{7.55}$$
We will express this as a Taylor-series expansion similar to (7.39) so that we can
perform the inverse Fourier transform analytically. We will approximate (7.55) as
$$\mathbf{k}(\omega)\cdot\Delta\mathbf{r} \cong k_0 L + v_g^{-1}(\omega-\omega_0)L + \alpha(\omega-\omega_0)^2 L + \cdots \tag{7.56}$$
so that we can take advantage of formula (7.46). To calculate the terms in this
expansion we will need the derivative of θr:
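$$\begin{aligned}
\frac{d\theta_r}{d\omega} &= \frac{1}{\sqrt{1-\left(\frac{2\pi c}{\omega d} - \sin\theta_i\right)^2}}\left(-\frac{2\pi c}{\omega^2 d}\right) = \frac{1}{\sqrt{1-\sin^2\theta_r}}\left(-\frac{2\pi c}{\omega^2 d}\right) \\
&= -\frac{2\pi c}{\omega^2 d\cos\theta_r} = -\frac{\sin\theta_i + \sin\theta_r}{\omega\cos\theta_r}
\end{aligned} \tag{7.57}$$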
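$$\begin{aligned}
\frac{d\mathbf{k}}{d\omega}\cdot\Delta\mathbf{r} &= \frac{L}{c}\left(\cos\theta_r - \omega\sin\theta_r\frac{d\theta_r}{d\omega}\right) \\
&= \frac{L}{c}\left(\cos\theta_r + \sin\theta_r\,\frac{\sin\theta_i+\sin\theta_r}{\cos\theta_r}\right) \\
&= \frac{L}{c}\left(\frac{1+\sin\theta_r\sin\theta_i}{\cos\theta_r}\right)
\end{aligned} \tag{7.58}$$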
and
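$$k_0 \equiv \mathbf{k}\big|_{\omega_0}\cdot\frac{\Delta\mathbf{r}}{L} = \frac{\omega_0}{c}\cos\theta_r(\omega_0) \tag{7.60}$$
$$v_g^{-1} \equiv \left.\frac{d\mathbf{k}}{d\omega}\right|_{\omega_0}\cdot\frac{\Delta\mathbf{r}}{L} = \left.\frac{1+\sin\theta_r\sin\theta_i}{c\cos\theta_r}\right|_{\omega_0} \tag{7.61}$$
$$\alpha \equiv \frac{1}{2}\left.\frac{d^2\mathbf{k}}{d\omega^2}\right|_{\omega_0}\cdot\frac{\Delta\mathbf{r}}{L} = -\left.\frac{(\sin\theta_i+\sin\theta_r)^2}{2c\,\omega\cos^3\theta_r}\right|_{\omega_0} \tag{7.62}$$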
In the case of a Gaussian pulse, we can employ (7.46), where L takes the place of
z, and k0, v_g⁻¹ and α are defined by (7.60) – (7.62). The duration of the pulse is
controlled by (7.62) and the spacing between the gratings L.
Figure 7.19 Animation showing a short plane-wave pulse diffracting from a grating positioned along the left edge of the frame.
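The grating-pair coefficients can be checked numerically against the phase delay (7.55). The sketch below assumes an 800 nm carrier, a 1200 groove/mm grating, and a 30° incidence angle (all hypothetical values chosen only so that the diffraction angle is real) and works in units of nm and fs. It compares the closed-form expressions (7.61) and (7.62) with finite differences of (7.55).

```python
import numpy as np

C = 299.792458  # speed of light in nm/fs

def theta_r(omega, d, theta_i):
    """Diffraction angle from (7.54)."""
    return np.arcsin(2 * np.pi * C / (omega * d) - np.sin(theta_i))

def phase_delay(omega, d, theta_i, L):
    """k . dr from (7.55)."""
    return L * omega / C * np.cos(theta_r(omega, d, theta_i))

def grating_coefficients(omega0, d, theta_i):
    """(1/v_g, alpha) following (7.61) and (7.62)."""
    th = theta_r(omega0, d, theta_i)
    s = np.sin(theta_i) + np.sin(th)
    inv_vg = (1 + np.sin(th) * np.sin(theta_i)) / (C * np.cos(th))
    alpha = -s ** 2 / (2 * C * omega0 * np.cos(th) ** 3)
    return inv_vg, alpha

# Hypothetical setup
omega0 = 2 * np.pi * C / 800.0      # rad/fs for an 800 nm carrier
d = 1e6 / 1200.0                    # groove spacing in nm
theta_i = np.radians(30.0)
L = 1.0                             # grating separation in nm (scales out)

inv_vg, alpha = grating_coefficients(omega0, d, theta_i)

# Finite-difference derivatives of the phase delay (7.55)
h = 1e-3
f_plus = phase_delay(omega0 + h, d, theta_i, L)
f_zero = phase_delay(omega0, d, theta_i, L)
f_minus = phase_delay(omega0 - h, d, theta_i, L)
inv_vg_fd = (f_plus - f_minus) / (2 * h) / L
alpha_fd = 0.5 * (f_plus - 2 * f_zero + f_minus) / h ** 2 / L
print(inv_vg, alpha)
```

Note that α is negative: a grating pair supplies dispersion of the opposite sign to that of ordinary glass, which is why it can compress a pulse that has been chirped by material dispersion.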
Appendix 7.B Causality and Exchange of Energy with the Medium
The group delay function indicates the average arrival of field energy to a point.
Since this is only part of the whole energy story, there is no problem when it
becomes superluminal. The overly rapid appearance of electromagnetic energy at
one point and its simultaneous disappearance at another point merely indicates
an exchange of energy between the electric field and the medium.10
We should not be dazzled by the magician who invites the audience to look
only at the field energy while energy transfers into and out of the ‘unwatched’
domain of the medium. Extra field energy seems to appear ‘prematurely’ down-
stream only if there is already non-zero field energy downstream to stimulate a
10 M. Ware, S. A. Glasgow, and J. Peatross, “Energy Transport in Linear Dielectrics,” Opt. Express 9,
519-532 (2001).
transfer of energy from the medium. The actual transport of energy is strictly
bounded by c; superluminal propagation of a signal front is impossible.
In accordance with Poynting’s theorem (2.51), the total energy density stored
in an electromagnetic field and in a medium is given by
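$$u(\mathbf{r},t) = u_{\mathrm{field}}(\mathbf{r},t) + u_{\mathrm{med}}(\mathbf{r},t) + u(\mathbf{r},-\infty) \tag{7.63}$$
where
$$u_{\mathrm{med}}(\mathbf{r},t) = \int_{-\infty}^{t} \mathbf{E}(\mathbf{r},t')\cdot\frac{\partial\mathbf{P}(\mathbf{r},t')}{\partial t'}\, dt' \tag{7.64}$$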
The expression (7.63) for the energy density includes all (relevant) forms of energy,
including a non-zero integration constant u (r, −∞) corresponding to energy
stored in the medium before the arrival of any pulse (important in the case of an
amplifying medium). u field (r, t ) and u med (r, t ) are both zero before the arrival of
the pulse (i.e. at t = −∞). In addition, u field (r, t ), given by (2.53), returns to zero
after the pulse has passed (i.e. at t = +∞).
As u med increases, the energy in the medium increases. Conversely, as u med
decreases, the medium surrenders energy to the electromagnetic field. While it is
possible for u med to become negative, the combination u med + u (−∞) (i.e. the net
energy in the medium) can never go negative since a material cannot surrender
more energy than it has to begin with.
Poynting’s theorem (2.51) has the form of a continuity equation which when
integrated spatially over a small volume V yields
$$\oint_A \mathbf{S}\cdot d\mathbf{a} = -\frac{\partial}{\partial t}\int_V u\, dV \tag{7.65}$$
where the left-hand side has been transformed into a surface integral (via the
divergence theorem (0.11)) representing the power leaving the volume. Let the
volume be small enough to take S to be uniform throughout V .
We can define an energy transport velocity (directed along S) as the effective
speed at which all of the energy density would need to travel in order to achieve
the Poynting flux:
$$\mathbf{v}_E \equiv \frac{\mathbf{S}}{u} \tag{7.66}$$
Note that this ratio of the Poynting flux to the energy density has units of velocity.
When the total energy density u is used in computing (7.66), the energy transport
velocity has a fictitious nature; it is not the actual velocity of the total energy
(since part is stationary), but rather the effective velocity necessary to achieve
the same energy transport that the electromagnetic flux alone delivers. If we
reduce the denominator to the subset of the energy that can move, namely u field ,
the Cauchy-Schwarz inequality (i.e. α² + β² ≥ 2αβ) ensures an energy transport
velocity v_E that remains strictly bounded by the speed of light in vacuum c. The total
energy density u is at least as great as the field energy density u field . Hence, this
strict luminality is maintained.
Centroid of Energy
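$$\langle\mathbf{v}_E\rangle = \frac{\partial\langle\mathbf{r}\rangle}{\partial t} \quad\text{where}\quad \langle\mathbf{r}\rangle \equiv \frac{\int \mathbf{r}\,u\, d^3r}{\int u\, d^3r} \tag{7.69}$$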
The latter expression represents the ‘center-of-mass’ or centroid of the total en-
ergy in the system, which is guaranteed to evolve strictly luminally since vE is
everywhere luminal.11
We can use this to express u med in terms of the electric field and material suscepti-
bility.
11 Although (7.69) guarantees that the centroid of the total energy moves strictly luminally, there is
no such limitation on the centroid of field energy alone. The steps leading to (7.69) are not possible
if u field is used in place of u. Explicitly, that is
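$$\left\langle\frac{\mathbf{S}}{u_{\mathrm{field}}}\right\rangle \neq \frac{\partial}{\partial t}\,\frac{\int \mathbf{r}\,u_{\mathrm{field}}\, d^3r}{\int u_{\mathrm{field}}\, d^3r}$$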
As was pointed out, the left-hand side is strictly luminal. However, the right-hand side can easily
exceed c as the medium exchanges energy with the field. In an amplifying medium, for example, the
rapid appearance of a pulse downstream can occur when the leading portion of a pulse stimulates
energy already present in the medium to convert to the form of field energy. Group velocity is
related to this method of accounting, which is why it also can become superluminal.
The field E(r, t ) can be expressed as an inverse Fourier transform (7.18). Similarly,
the polarization P can be written as12
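$$\mathbf{P}(\mathbf{r},t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{P}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega \;\Rightarrow\; \frac{\partial\mathbf{P}(\mathbf{r},t)}{\partial t} = \frac{-i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\mathbf{P}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega \tag{7.71}$$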
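$$u_{\mathrm{med}}(\mathbf{r},\infty) = \int_{-\infty}^{\infty}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},\omega')\,e^{-i\omega' t'}\,d\omega'\right]\cdot\left[\frac{-i\epsilon_0}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\,e^{-i\omega t'}\,d\omega\right]dt' \tag{7.72}$$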
where we have incorporated (7.70) and evaluated u med after the pulse is over at
t = ∞. We may change the order of integration and write
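$$u_{\mathrm{med}}(\mathbf{r},\infty) = -i\epsilon_0\int_{-\infty}^{\infty}d\omega\,\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\int_{-\infty}^{\infty}d\omega'\,\mathbf{E}(\mathbf{r},\omega')\,\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-i(\omega+\omega')t'}\,dt' \tag{7.73}$$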
The final integral is a delta function similar to (0.54), which allows
the middle integral also to be performed. The expression for u_med then reduces to
$$u_{\mathrm{med}}(\mathbf{r},\infty) = -i\epsilon_0\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}(\mathbf{r},-\omega)\,d\omega \tag{7.74}$$
In this derivation, we take E(r, t ) and P(r, t ) to be real functions, so we can employ
the symmetry (7.29) along with
Then we obtain
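$$u_{\mathrm{med}}(\mathbf{r},\infty) = \epsilon_0\int_{-\infty}^{\infty}\omega\,\mathrm{Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega)\,d\omega \tag{7.75}$$
The expression (7.75) describes the net energy density transferred to a point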
in the medium after all action has finished (i.e. at t = ∞). It involves the power
spectrum of the pulse. We can modify this formula in an intuitive way so that it
describes the transfer of energy density to the medium for any time during the
pulse.
Since the medium is unable to anticipate the spectrum of the entire pulse
before experiencing it, the material responds to the pulse according to the history
of the field up to each instant. In particular, the material has to be prepared for
12 We assume that the real forms of the fields in the time domain are used for the sake of this
multiplication.
the possibility of an abrupt cessation of the pulse at any moment, in which case
all exchange of energy with the medium immediately ceases. In this extreme sce-
nario, there is no possibility for the medium to recover from previously incorrect
attenuation or amplification, so it must have gotten it right already.
If the pulse were in fact to abruptly terminate at a given instant, it would
not be necessary to integrate the Fourier transform (7.19) beyond the
termination time t after which all contributions are zero. Causality requires that
the medium be indifferent to whether a pulse actually terminates if that possibility
lies in the future. Therefore, (7.75) can apply for any time t (not just for t = ∞)
if the spectrum (7.19) is evaluated just for that portion of the field previously
experienced by the medium (up to time t ).
The following is then an exact representation for the energy density (7.64)
transferred to the medium:
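$$u_{\mathrm{med}}(\mathbf{r},t) = \epsilon_0\int_{-\infty}^{\infty}\omega\,\mathrm{Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}_t(\mathbf{r},\omega)\cdot\mathbf{E}_t^*(\mathbf{r},\omega)\,d\omega \tag{7.76}$$
where
$$\mathbf{E}_t(\mathbf{r},\omega) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}\mathbf{E}(\mathbf{r},t')\,e^{i\omega t'}\,dt' \tag{7.77}$$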
This time dependence enters only through Et (r, ω) · E∗t (r, ω), known as the instan-
taneous power spectrum.
The expression (7.76) gives physical insight into the manner in which causal
dielectric materials exchange energy with different parts of an electromagnetic
pulse. Since the function E t (ω) is the Fourier transform of the pulse truncated
at the current time t and set to zero thereafter, it can include many frequency
components that are not present in the pulse taken in its entirety. This explains
why the medium can respond differently to the front of a pulse compared to the
back. Even though absorption or amplification resonances may lie outside of
the spectral envelope of a pulse taken in its entirety, the instantaneous spectrum
on a portion of the pulse can momentarily lap onto or off of resonances in the
medium.
Figure 7.20 Real and imaginary parts of the refractive index for an amplifying medium.
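The widening of the instantaneous spectrum on the leading edge of a pulse can be seen in a small numerical sketch of (7.77). The Gaussian envelope, the truncation times, and the frequency window below are arbitrary assumptions; frequencies are measured from the carrier, and time is in units of the pulse duration T.

```python
import numpy as np

def instantaneous_spectrum(envelope, t_cut, Omega, t_start, dt):
    """|E_t(omega)|^2 from (7.77), with omega measured from the carrier."""
    t = np.arange(t_start, t_cut, dt)                 # history up to t_cut
    kernel = np.exp(1j * np.outer(Omega, t))          # exp(i Omega t')
    E_t = kernel @ envelope(t) * dt / np.sqrt(2 * np.pi)
    return np.abs(E_t) ** 2

def wing_fraction(power, Omega, edge):
    """Fraction of spectral power farther than `edge` from the carrier."""
    return np.sum(power[np.abs(Omega) > edge]) / np.sum(power)

T = 1.0

def envelope(t):
    return np.exp(-t ** 2 / (2 * T ** 2))

Omega = np.linspace(-20.0 / T, 20.0 / T, 801)

# Spectrum seen by the medium early on the rising edge and after the pulse
early = instantaneous_spectrum(envelope, -2.0 * T, Omega, -8.0 * T, T / 200)
late = instantaneous_spectrum(envelope, 6.0 * T, Omega, -8.0 * T, T / 200)

frac_early = wing_fraction(early, Omega, 3.0 / T)
frac_late = wing_fraction(late, Omega, 3.0 / T)
print(frac_early, frac_late)
```

The abrupt truncation at the current time is what throws power into the spectral wings; once the whole pulse has passed, the spectrum collapses back to the narrow Gaussian of the complete pulse.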
In view of (7.76) and (7.77) it is straightforward to predict when the electro-
magnetic energy of a pulse will exhibit superluminal or subluminal behavior. In
section 7.5, we saw that this behavior is controlled by the group velocity function.
However, with (7.76) and (7.77), it is not necessary to examine the group velocity
directly, but only the imaginary part of the susceptibility χ (r, ω).
If the entire pulse passing through point r has a spectrum in the neighborhood
of an amplifying resonance, but not on the resonance, superluminal behavior
can result. The instantaneous spectrum during the front portion of the pulse is
generally wider and can therefore lap onto the nearby gain peak. The medium
accordingly amplifies this perceived spectrum, and the front of the pulse grows.
The energy is then returned to the medium from the latter portion of the pulse
as the instantaneous spectrum narrows and withdraws from the gain peak. The
effect is not only consistent with the principle of causality, it is a direct and general
consequence of causality as demonstrated by (7.76) and (7.77).
As an illustration, consider the broadband waveform with T2 = γ⁻¹/√2
described in Example 7.8. Consider an amplifying medium with index shown in
Fig. 7.20 with the amplifying resonance (negative oscillator strength) set on the
frequency ω0 = ω̄0 + 2γ, where ω̄0 is the carrier frequency. Thus, the resonance
structure is centered a modest distance above the carrier frequency, and there is
only minor spectral overlap between the pulse and the resonance structure.
Superluminal behavior can occur in amplifying materials when the forward
edge of a narrow-band pulse receives extra amplification. Fig. 7.21 shows how the
early portion of a pulse has a wide instantaneous spectrum computed by (7.77)
that can lap onto the amplifying resonance. As the wings grow and access the
neighboring resonance, the pulse extracts more energy from the medium. As the
wings diminish, the pulse surrenders much of that energy back to the medium,
which shifts the center of the pulse forward.
In this appendix we have indirectly proven that a sharply defined signal edge
cannot propagate faster than c. If a signal edge begins abruptly at time t0, the
instantaneous spectrum E_t(ω) clearly remains identically zero until that time. In
other words, no energy may be exchanged with the medium until the field energy
from the pulse arrives. Since, as was pointed out in connection with (7.66), the
Cauchy-Schwarz inequality prevents the field energy from traveling faster than c,
at no point in the medium can a signal front exceed c.
Figure 7.21 Animation of a narrowband pulse traversing an amplifying medium off resonance. The black dot shows the movement of the center of all energy. The red line inside the medium shows the energy held in that medium, which cannot go negative. The lower figure shows the instantaneous spectrum of the pulse at the front of the medium relative to the narrow amplifying resonance.

Appendix 7.C Kramers-Kronig Relations

In the late 1920s, Ralph Kronig and Hendrik Kramers independently discovered
a remarkable relationship between the real and imaginary parts of a material’s
susceptibility χ (ω). Recall that the susceptibility as defined in (2.16) relates the
polarization of a material to the field that stimulates the medium:
$$P(\omega) = \epsilon_0\,\chi(\omega)\,E(\omega) \tag{7.78}$$
They made an argument based on causality (i.e. effect cannot precede cause),
which allows one to obtain the real part of χ (ω) from the imaginary part of χ (ω),
if it is known for all ω. Similarly, one can obtain the imaginary part of χ (ω) from
the real part of χ (ω). We develop the Kramers-Kronig formulas below.13
We can replace E (ω) in (7.78) with the Fourier transform of E (t ) in accordance
with (7.19). In addition, we take the inverse Fourier transform (7.18) of both sides
of (7.78) and obtain
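$$P(t) = \frac{\epsilon_0}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\chi(\omega)\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}E(t')\,e^{i\omega t'}\,dt'\right]e^{-i\omega t}\,d\omega \tag{7.79}$$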
13 See J. D. Jackson, Classical Electrodynamics, 3rd ed., Sect. 7.10 (New York: John Wiley, 1999).
Also B. Y.-K. Hu, “Kramers-Kronig in two lines,” Am. J. Phys. 57, 821 (1989).
$$P(t) = \frac{\epsilon_0}{2\pi}\int_{-\infty}^{\infty}E(t')\left[\int_{-\infty}^{\infty}\chi(\omega)\,e^{-i\omega(t-t')}\,d\omega\right]dt' \tag{7.80}$$
Now for the causality argument: The polarization of the medium P (t) cannot
depend on the field E (t′) at future times t′ > t. Therefore the expression in square
brackets must be identically zero unless t − t′ > 0. This places a restriction on the
functional form of χ (ω) as we shall see.
The causality argument comes explicitly into play when we employ the fol-
lowing integral formula:14
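$$e^{-i\omega(t-t')} = \mathrm{sign}\{t-t'\}\,\frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega'(t-t')}}{\omega-\omega'}\,d\omega' \tag{7.81}$$
Apparently, we require the positive sign since
$$\mathrm{sign}\{t-t'\} \equiv \begin{cases} +1 & (t > t') \\ -1 & (t < t') \end{cases}$$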
Upon substitution of (7.81) into (7.80) and after changing the order of integra-
tion within the square brackets we obtain
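$$P(t) = \frac{\epsilon_0}{2\pi}\int_{-\infty}^{\infty}E(t')\left[\int_{-\infty}^{\infty}\left(\frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{\chi(\omega)}{\omega-\omega'}\,d\omega\right)e^{-i\omega'(t-t')}\,d\omega'\right]dt' \tag{7.82}$$
The expression in square brackets must agree with that of (7.80), which (after
exchanging the labels ω and ω′) requires
$$\chi(\omega) = \frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{\chi(\omega')}{\omega'-\omega}\,d\omega' \tag{7.83}$$
or
$$\mathrm{Re}\,\chi(\omega) + i\,\mathrm{Im}\,\chi(\omega) = \frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{\mathrm{Re}\,\chi(\omega') + i\,\mathrm{Im}\,\chi(\omega')}{\omega'-\omega}\,d\omega' \tag{7.84}$$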
Finally, equating separately the real and imaginary parts of the above equation
yields
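$$\mathrm{Re}\,\chi(\omega) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\mathrm{Im}\,\chi(\omega')}{\omega'-\omega}\,d\omega' \quad\text{and}\quad \mathrm{Im}\,\chi(\omega) = -\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\mathrm{Re}\,\chi(\omega')}{\omega'-\omega}\,d\omega' \tag{7.85}$$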
14 This integral, which is a specific instance of Cauchy’s theorem, is tricky because it involves two
diverging pieces, to either side of the singularity ω = ω′. The divergences have opposite sign so that
they cancel. The integration must approach the singularity in the same manner from either side, in
which case the result is called the principal value. In practical terms, if the integral is performed
numerically, the sampling of points should straddle the singularity symmetrically; other sampling
schemes can change the result dramatically and give an incorrect answer.
These are known as the Kramers-Kronig relations on real and imaginary parts of
χ.15 If the real part of χ is known at all frequencies, we can use the Kramers-Kronig
relations to generate the imaginary part, and vice versa. We see that the real and
imaginary parts of χ cannot be chosen independently, if we are to respect the
principle of causality.
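A numerical check of the first relation in (7.85) illustrates the symmetric sampling needed for the principal value. The sketch below assumes a Lorentz-type susceptibility with made-up parameters (ω0 = 10, γ = 1, ω_p² = 1 in arbitrary units); the function names and the integration window are likewise illustrative choices.

```python
import numpy as np

def chi_lorentz(omega, omega0=10.0, gamma=1.0, wp2=1.0):
    """Lorentz-type susceptibility (assumed form, exp(-i omega t) convention)."""
    return wp2 / (omega0 ** 2 - omega ** 2 - 1j * gamma * omega)

def kk_real_from_imag(im_chi, omega, half_width=200.0, step=0.01):
    """Principal-value estimate of Re chi(omega) from Im chi, per (7.85).

    Sample points sit at omega +/- (j + 1/2) * step, so they straddle the
    singularity symmetrically and the divergent pieces cancel in pairs.
    """
    n = int(round(half_width / step))
    offsets = (np.arange(-n, n) + 0.5) * step          # omega' - omega
    return np.sum(im_chi(omega + offsets) / offsets) * step / np.pi

omega = 9.5
re_numeric = kk_real_from_imag(lambda w: chi_lorentz(w).imag, omega)
re_exact = chi_lorentz(omega).real
print(re_numeric, re_exact)
```

Shifting the sample points so that they no longer straddle ω symmetrically (for example, using offsets at (j + 0.25)·step) leaves an uncancelled contribution near the singularity and spoils the agreement.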
Example 7.9
Show that the expression in square brackets of (7.80) is zero when t′ > t, if χ (ω)
satisfies the Kramers-Kronig relations (7.85).
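$$\begin{aligned}
\int_{-\infty}^{\infty}\chi(\omega)\,e^{-i\omega(t-t')}\,d\omega &= \int_{-\infty}^{\infty}\mathrm{Re}\,\chi(\omega)\,e^{-i\omega(t-t')}\,d\omega + i\int_{-\infty}^{\infty}\mathrm{Im}\,\chi(\omega)\,e^{-i\omega(t-t')}\,d\omega \\
&= \int_{-\infty}^{\infty}\mathrm{Re}\,\chi(\omega)\,e^{-i\omega(t-t')}\,d\omega + i\int_{-\infty}^{\infty}\left[-\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\mathrm{Re}\,\chi(\omega')}{\omega'-\omega}\,d\omega'\right]e^{-i\omega(t-t')}\,d\omega \\
&= \int_{-\infty}^{\infty}\mathrm{Re}\,\chi(\omega)\,e^{-i\omega(t-t')}\,d\omega + \int_{-\infty}^{\infty}\mathrm{Re}\,\chi(\omega')\left[\frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega(t-t')}}{\omega'-\omega}\,d\omega\right]d\omega'
\end{aligned} \tag{7.86}$$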
where we have invoked the Kramers-Kronig relation for $\mathrm{Im}\,\chi(\omega)$ (7.85) and
interchanged the order of integration in the final expression. Since we are specifically
considering future times $t' > t$, we have by (7.81)
\[
\frac{1}{i\pi}\int_{-\infty}^{\infty} \frac{e^{-i\omega(t-t')}}{\omega'-\omega}\,d\omega
= -e^{-i\omega'(t-t')}
\]
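Hence
\[
\int_{-\infty}^{\infty} \chi(\omega)\, e^{-i\omega(t-t')}\,d\omega
= \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega)\, e^{-i\omega(t-t')}\,d\omega
- \int_{-\infty}^{\infty} \mathrm{Re}\,\chi(\omega')\, e^{-i\omega'(t-t')}\,d\omega'
= 0 \tag{7.87}
\]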
Finally, it is worth noting that the Kramers-Kronig relations also apply to the
real and imaginary parts of the index of refraction:16
\[
n(\omega) - 1 = \frac{1}{\pi}\int_{-\infty}^{\infty}
\frac{\kappa(\omega')}{\omega'-\omega}\,d\omega'
\quad\text{and}\quad
\kappa(\omega) = -\frac{1}{\pi}\int_{-\infty}^{\infty}
\frac{n(\omega')-1}{\omega'-\omega}\,d\omega' \tag{7.88}
\]
15 As with (7.81), the principal value of the integral must be calculated. If the integral is performed
numerically, the sampling of points should straddle the singularity symmetrically. Separately, the
integral on each side of $\omega' = \omega$ diverges, but with opposite sign.
One can use the Kramers-Kronig relations to find the real part of the index from
a measurement of absorption, if the measurement is done over a broad enough
range of the spectrum. This is the most useful form of the Kramers-Kronig relations.
It is sometimes convenient to multiply the numerator and denominator inside
the integrands of (7.88) by $\omega' + \omega$. Then noting that n is an even function and
κ is an odd function allows us to dismiss either $\omega'$ or $\omega$ in the numerator and
integrate17 over positive frequencies only:
\[
n(\omega) - 1 = \frac{2}{\pi}\int_{0}^{\infty}
\frac{\omega'\kappa(\omega')}{\omega'^2-\omega^2}\,d\omega'
\quad\text{and}\quad
\kappa(\omega) = -\frac{2\omega}{\pi}\int_{0}^{\infty}
\frac{n(\omega')-1}{\omega'^2-\omega^2}\,d\omega' \tag{7.89}
\]
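The step from (7.88) to (7.89) can be written out explicitly for the first relation.
The lines below are a sketch of that step; they use only the symmetry stated in the
text, $\kappa(-\omega') = -\kappa(\omega')$, and all integrals are understood as
principal values.

```latex
\[
\begin{aligned}
n(\omega) - 1
&= \frac{1}{\pi}\int_{-\infty}^{\infty}
   \frac{\kappa(\omega')\,(\omega'+\omega)}{(\omega'-\omega)(\omega'+\omega)}\,d\omega' \\
&= \frac{1}{\pi}\int_{-\infty}^{\infty}
   \frac{\omega'\kappa(\omega')}{\omega'^2-\omega^2}\,d\omega'
 + \frac{\omega}{\pi}\int_{-\infty}^{\infty}
   \frac{\kappa(\omega')}{\omega'^2-\omega^2}\,d\omega' \\
&= \frac{2}{\pi}\int_{0}^{\infty}
   \frac{\omega'\kappa(\omega')}{\omega'^2-\omega^2}\,d\omega'
\end{aligned}
\]
```

The second integral in the middle line has an odd integrand and vanishes over the
symmetric range, while the first has an even integrand and equals twice its value
over positive frequencies. The relation for κ follows in the same way, with the roles
of the even and odd pieces exchanged because $n(\omega') - 1$ is even.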
16 This follows from Cauchy’s theorem since the index (subtract one) is the square root of χ (ω).
The Kramers-Kronig relations for χ (ω) guarantee that χ (ω) has no poles in the upper half complex
plane, when ω is considered (for mathematical purposes) to be a complex variable. Taking the
square root does not introduce poles into the upper half plane.
17 The integrals (7.88) and (7.89) diverge to either side of $\omega' = \omega$, but with opposite sign. Again,
the principal value of the integral is required, which means a numeric grid should straddle the
singularity symmetrically.
Exercises
P7.2 Equation (7.7) implies that there is no interference between fields that
are polarized along orthogonal dimensions. That is, the intensity of
Exercises for 7.2 Group vs. Phase Velocity: Sum of Two Plane Waves
\[
\bar{k} \equiv \frac{k_1 + k_2}{2} \quad\text{and}\quad \bar{\omega} \equiv \frac{\omega_1 + \omega_2}{2}
\]
P7.7 The intensity of a Gaussian laser pulse has a FWHM duration $T_{\rm FWHM} = 25$ fs
with carrier frequency $\omega_0$ corresponding to $\lambda_{\rm vac} = 800$ nm. The
pulse goes through a lens of thickness $\ell = 1$ cm (laser quality glass type
BK7) with index of refraction given approximately by
\[
n(\omega) \cong 1.4948 + 0.016\,\frac{\omega}{\omega_0}
\]
\[
T = \frac{T_{\rm FWHM}}{2\sqrt{\ln 2}}
\]
(see P7.6).
P7.8 If the pulse defined in (7.46) travels through the material for a very long
distance z such that $\tilde{T}(z) \to T\,\Phi(z)$ and $\tan^{-1}\Phi(z) \to \pi/2$, show that
the instantaneous frequency of the pulse is
\[
\omega_0 + \frac{t - 2z/v_g}{4\alpha z}
\]
COMMENT: As the wave travels, the earlier part of the pulse oscillates
more slowly than the later part. This is called chirp, and it means that
the red frequencies get ahead of the blue ones since they experience a
lower index.
P7.10 When the spectrum is very broad the reshaping delay (7.53) also tends
to zero and can be ignored. Show that when the spectrum is extremely
broad, the net group delay reduces to
\[
\lim_{T \to 0} \Delta t_G(\mathbf{r}) = \frac{\Delta r}{c}
\]
assuming k and ∆r are parallel. This implies that a sharply defined
signal cannot travel faster than c.
HINT: The real index of refraction n goes to unity far from resonance,
and the imaginary part κ goes to zero.
Coherence Theory
Coherence theory is the study of correlations that exist between different parts of
a light field. In temporal coherence theory, we focus on the correlation between
the fields at different times, E(r, t ) and E(r, t + τ). In spatial coherence theory,
we focus on the correlations between fields at different spatial locations, E(r, t )
and E(r + ∆r, t ). Because light oscillations are too fast to resolve directly, we
usually need to study optical coherence using interference techniques. In these
techniques, light from different times or places in the light field is brought
together at a detection point. If the two fields have a high degree of coherence,
they consistently interfere either constructively or destructively at the detection
point. If the two fields are not coherent, the interference at the detection point
rapidly fluctuates between constructive and destructive interference, so that a
time-averaged signal does not show interference.
You are probably already familiar with two instruments that measure coher-
ence: the Michelson interferometer, which measures temporal coherence, and
Young’s two-slit interferometer, which measures spatial coherence. Your pre-
liminary understanding of these instruments was probably gained in terms of
single-frequency plane waves, which are perfectly coherent for all separations in
time and space. In this chapter, we build on that foundation and derive descrip-
tions that are appropriate when light with imperfect coherence is sent through
these instruments. We also discuss a practical application known as Fourier spec-
troscopy (Section 8.4) which allows us to measure the spectrum of light using a
Michelson interferometer rather than a grating spectrometer.
8.1 Michelson Interferometer
A Michelson interferometer employs a 50:50 beamsplitter to divide an initial
beam into two identical beams and then delays one beam with respect to the
other before bringing them back together (see Fig. 8.1). Depending on the relative
path difference d (roundtrip by our convention) between the two arms of the
system, the light can interfere constructively or destructively in the direction of
the detector. The relative path difference d introduces a relative time delay τ,
defined by τ ≡ d/c.
Figure 8.1 Michelson interferometer.
If the input light is a plane wave, the net field at the detector consists of the
field coming from one arm of the interferometer, $\mathbf{E}_0 e^{i(kz-\omega t)}$, added to the field
coming from the other arm, $\mathbf{E}_0 e^{i(kz-\omega(t-\tau))}$. These two fields are identical except
for the delay τ. The intensity seen at the detector as a function of path difference
is computed to be
\[
\begin{aligned}
I_{\rm tot}(\tau) &= \frac{c\epsilon_0}{2}
\left[\mathbf{E}_0 e^{i(kz-\omega t)} + \mathbf{E}_0 e^{i(kz-\omega(t-\tau))}\right] \cdot
\left[\mathbf{E}_0 e^{i(kz-\omega t)} + \mathbf{E}_0 e^{i(kz-\omega(t-\tau))}\right]^* \\
&= \frac{c\epsilon_0}{2}\left[2\mathbf{E}_0\cdot\mathbf{E}_0^* + 2\mathbf{E}_0\cdot\mathbf{E}_0^*\cos(\omega\tau)\right] \\
&= 2I_0\left[1+\cos(\omega\tau)\right] \qquad \text{(Plane Wave Input)}
\end{aligned} \tag{8.1}
\]
where $I_0 \equiv \frac{c\epsilon_0}{2}\mathbf{E}_0\cdot\mathbf{E}_0^*$ is the intensity from one beam alone (when the other arm of
the interferometer is blocked). This formula is probably familiar. It describes how
the intensity at the detector oscillates between zero and four times the intensity
of the beam from one arm when the other is blocked,1 as plotted in Fig. 8.2.
Figure 8.2 The intensity seen at the detector of a Michelson interferometer with a
plane-wave input. Because the plane wave is infinitely coherent, the output oscillates
forever in both directions. Energy is conserved, so when the intensity at the detector
is zero, all of the input light is being sent back on the input arm of the interferometer.
When light containing a continuous band of frequencies is sent through the
interferometer, (8.1) no longer holds. Instead of repeating indefinitely, the oscillations
in the intensity at the detector become less pronounced as τ increases. The
concept of temporal coherence describes how fast fringe visibility diminishes as
delay is introduced in an arm of the Michelson interferometer. The less coherent
the light source, the faster the fringes die out as τ is increased. To model this
behavior, we need to expand our analysis beyond (8.1).
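Before moving on, the plane-wave result (8.1) can be checked numerically by adding
the two arm fields first and only then forming the intensity. The sketch below is
illustrative: the function names and the field amplitude and wavelength in the demo
are assumptions chosen here, and the fields are treated as scalars with a common
polarization.

```python
import numpy as np

C = 299_792_458.0          # speed of light in vacuum (m/s)
EPS0 = 8.8541878128e-12    # vacuum permittivity (F/m)


def intensity_from_fields(e0, omega, tau):
    """Add the two arm fields, then form (c eps0 / 2) |E_tot|^2."""
    # The common factor exp[i(kz - omega t)] has unit modulus and drops out.
    e_tot = e0 * (1.0 + np.exp(1j * omega * tau))
    return 0.5 * C * EPS0 * np.abs(e_tot) ** 2


def intensity_formula(e0, omega, tau):
    """Closed form (8.1): 2 I0 [1 + cos(omega tau)]."""
    i0 = 0.5 * C * EPS0 * abs(e0) ** 2
    return 2.0 * i0 * (1.0 + np.cos(omega * tau))


if __name__ == "__main__":
    e0 = 100.0                              # assumed field amplitude (V/m)
    omega = 2.0 * np.pi * C / 633e-9        # assumed HeNe wavelength
    tau = np.linspace(0.0, 1e-14, 11)       # round-trip delays (s)
    print(intensity_from_fields(e0, omega, tau))
    print(intensity_formula(e0, omega, tau))
```

The two functions agree for every delay, with a maximum of four times the
single-arm intensity at τ = 0 and a zero at ωτ = π.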
Consider an arbitrary waveform E(t ) (comprised of many frequency compo-
nents) that has traveled through the first arm of a Michelson interferometer to
arrive at the detector in Fig. 8.1. Again, E(t ) is the value of the field at the detector
when the second arm is blocked. The beam that travels through the second arm
of the interferometer is identical, but delayed by the round-trip delay τ: E (t − τ).
The total field at the detector is the sum of these two fields:
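\[
\mathbf{E}_{\rm tot}(t,\tau) = \mathbf{E}(t) + \mathbf{E}(t-\tau) \tag{8.2}
\]
The total intensity $I_{\rm tot}$ at the detector is found using (7.21) with n = 1:
\[
\begin{aligned}
I_{\rm tot}(t,\tau) &= \frac{c\epsilon_0}{2}\,\mathbf{E}_{\rm tot}(t,\tau)\cdot\mathbf{E}^*_{\rm tot}(t,\tau) \\
&= \frac{c\epsilon_0}{2}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t) + \mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)
 + \mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t) + \mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t-\tau)\right] \\
&= I(t) + I(t-\tau) + \frac{c\epsilon_0}{2}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)
 + \mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t)\right] \\
&= I(t) + I(t-\tau) + c\epsilon_0\,\mathrm{Re}\left\{\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\right\}
\end{aligned} \tag{8.3}
\]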
The function I (t ) corresponds to the intensity of one of the beams arriving at the
detector while the opposite path of the interferometer is blocked.
1 Keep in mind that if a 50:50 beam splitter is used, then the intensity arriving at the detector
from one arm alone (with the other arm blocked) is one fourth of the original beam, since the light
meets the beam splitter twice.
For now we treat E (t ) as a pulse with a finite duration and energy to simplify
the math. Later we illustrate how to adapt this analysis for continuous light
sources. In (8.3) we have retained the t dependence of I tot (t , τ) in addition to the
dependence on the path delay τ. This allows for pulses with arbitrary duration
and shape. The rapid oscillations of the light are automatically averaged away in
I (t ) since we used (7.21), but the slowly varying envelope of the pulse is retained.
For a pulsed source, the physical signal from a Michelson interferometer is
proportional to the total amount of pulse energy arriving at the detector as a
function of τ.2 This physical signal, which we’ll denote by Sig(τ), is proportional
to the total energy per area, or fluence, accumulated at the detector:
\[
\mathrm{Sig}(\tau) \propto \int_{-\infty}^{\infty} I_{\rm tot}(t,\tau)\,dt \tag{8.4}
\]
The proportionality constant will depend on the area of the beam, as well as the
units with which the detector reports Sig(τ) (volts, etc.). We can manipulate the
fluence integral in (8.4) into a more useful form that will make the coherence
properties more evident.
Albert Abraham Michelson (1852-1931, United States) was born in Poland, but he
immigrated to the US with his parents and grew up in the rough mining towns of
California and Nevada where his father was a
pulse since a detector cannot keep up with temporal variations on such a rapid time scale. For
longer pulses, it may be necessary to force the integration.
3 Note that the second integral is insensitive to τ since a change of variables $t' = t - \tau$ converts it
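simplifies to $\sqrt{2\pi}\,\mathbf{E}(\omega)\cdot\mathbf{E}^*(\omega) = \sqrt{2\pi}\,2I(\omega)/c\epsilon_0$. Then with the aid of (8.6) and (8.7),
the overall fluence (8.5) becomes
\[
\int_{-\infty}^{\infty} I_{\rm tot}(t,\tau)\,dt = 2\mathcal{E}\left[1 + \mathrm{Re}\,\frac{1}{\mathcal{E}}
\int_{-\infty}^{\infty} I(\omega)\,e^{-i\omega\tau}\,d\omega\right] \tag{8.8}
\]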
With (8.8), we can rewrite the physical signal (8.4) in the more useful form
\[
\mathrm{Sig}(\tau) \propto 2\mathcal{E}\left[1 + \mathrm{Re}\,\gamma(\tau)\right] \tag{8.9}
\]
where the dependence on the path delay τ is entirely contained in the degree of
coherence function γ(τ):4
\[
\gamma(\tau) \equiv \frac{\int_{-\infty}^{\infty} I(\omega)\,e^{-i\omega\tau}\,d\omega}
{\int_{-\infty}^{\infty} I(\omega)\,d\omega} \tag{8.10}
\]
The denominator of (8.10) was rewritten with the help of Parseval’s theorem
$\mathcal{E} \equiv \int_{-\infty}^{\infty} I(t)\,dt = \int_{-\infty}^{\infty} I(\omega)\,d\omega$. Remarkably, the signal out of the Michelson
interferometer does not depend on the phase of $\mathbf{E}(\omega)$. It depends only on the amount
of light associated with each frequency through $I(\omega) \equiv \frac{\epsilon_0 c}{2}\mathbf{E}(\omega)\cdot\mathbf{E}^*(\omega)$.
We could have derived (8.9) using another strategy, which may seem more intuitive
than the approach above. Equation (8.1) gives the intensity at the detector when a
single plane wave of frequency ω goes through the interferometer. Now suppose
that a waveform composed of many frequencies is sent through the interferometer.
The intensity associated with each frequency acts independently, obeying (8.1)
individually.
The total energy (per area) accumulated at the detector is then a linear superposi-
tion of the spectral intensities of all frequencies present:
\[
\int_{-\infty}^{\infty} I_{\rm tot}(\omega,\tau)\,d\omega
= \int_{-\infty}^{\infty} 2I(\omega)\left[1+\cos(\omega\tau)\right]d\omega \tag{8.11}
\]
While this procedure may seem obvious, the fact that we can do it is remarkable!
Remember that it is usually the fields that we must add together before finding the
intensity of the resulting superposition. The formula (8.11) with its superposition
of intensities relies on the fact that the different frequencies inside the interferom-
eter when time-averaged (over all time) do not interfere. Certainly, the fields at
different frequencies do interfere (or beat in time). However, they constructively
interfere as often as they destructively interfere, and in a time-averaged picture it
is as though the individual frequency components transmit independently. Again,
4 M. Born and E. Wolf, Principles of Optics, 7th ed., p. 570 (Cambridge University Press, 1999).
This is the same as (8.8) since we can replace cos(ωτ) with $\mathrm{Re}\{e^{-i\omega\tau}\}$, and we can
apply Parseval’s theorem (8.6) to the other integrals. Thus, the above arguments
lead to (8.9) and (8.10).
Example 8.1
Compute the output signal when a Gaussian pulse with spectrum (7.25) is sent
into a Michelson interferometer.
The degree of coherence (8.10) is then
\[
\begin{aligned}
\gamma(\tau) &= \frac{T}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-T^2(\omega-\omega_0)^2}\,e^{-i\omega\tau}\,d\omega \\
&= \frac{T}{\sqrt{\pi}}\int_{-\infty}^{\infty}
e^{-T^2\omega^2 + (2T^2\omega_0 - i\tau)\omega - T^2\omega_0^2}\,d\omega
= \frac{T}{\sqrt{\pi}}\sqrt{\frac{\pi}{T^2}}\;
e^{\frac{(2T^2\omega_0 - i\tau)^2}{4T^2} - T^2\omega_0^2} \\
&= e^{-\frac{\tau^2}{4T^2}}\,e^{-i\omega_0\tau}
\end{aligned}
\]
Formula (0.55) was used to complete the integration. According to (8.9), the signal
at the detector is then
\[
\mathrm{Sig}(\tau) \propto 2\mathcal{E}\left[1 + \mathrm{Re}\,\gamma(\tau)\right]
= 2\mathcal{E}\left[1 + e^{-\frac{\tau^2}{4T^2}}\cos(\omega_0\tau)\right]
\]
Figure 8.3 shows this signal for a given T. As delay is added (or subtracted), the
output signal oscillates. Eventually enough delay is introduced such that the
very short pulses no longer interfere (arriving sequentially), and the output signal
becomes steady.
Figure 8.3 The output or signal from a Michelson interferometer for light with a
Gaussian spectrum.
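The closed form found in Example 8.1 can be compared against a direct numerical
evaluation of (8.10). The following is a minimal sketch under stated assumptions:
the frequency grid, the dimensionless values T = 1 and ω0 = 20, and the function
names are illustrative choices made here, and the integrals are approximated by
simple Riemann sums on a uniform grid.

```python
import numpy as np


def degree_of_coherence(omega, spectrum, tau):
    """Evaluate (8.10) on a uniform frequency grid with a Riemann sum."""
    tau = np.atleast_1d(tau)
    phase = np.exp(-1j * np.outer(tau, omega))       # e^{-i omega tau}
    return (phase @ spectrum) / np.sum(spectrum)     # grid spacing cancels


def michelson_signal(gamma):
    """Signal (8.9) in units of twice the fluence: 1 + Re{gamma}."""
    return 1.0 + gamma.real


if __name__ == "__main__":
    T, omega0 = 1.0, 20.0                            # assumed, dimensionless
    omega = np.linspace(omega0 - 10.0 / T, omega0 + 10.0 / T, 2001)
    spectrum = np.exp(-T**2 * (omega - omega0) ** 2)  # Gaussian I(omega)
    tau = np.linspace(-6.0 * T, 6.0 * T, 301)
    gamma = degree_of_coherence(omega, spectrum, tau)
    analytic = np.exp(-tau**2 / (4.0 * T**2)) * np.exp(-1j * omega0 * tau)
    print(np.max(np.abs(gamma - analytic)))          # largest deviation
    print(michelson_signal(gamma)[:5])
```

Because only the ratio of the two sums enters, any overall scale on the spectrum
drops out, which mirrors the remark that the signal depends on I(ω) but not on the
phase of the field.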
5 Technically, the output intensity is one fourth this, but our calculation of the degree of coher-
The coherence length is the distance that light travels in this time:
\[
\ell_c \equiv c\,\tau_c \tag{8.14}
\]
\[
V(\tau) = \left|\gamma(\tau)\right| \tag{8.16}
\]
Note that the fringe visibility depends only upon the frequency content of the
light without regard to whether the frequency components are organized into a
short pulse or a longer time pattern.
6 M. Born and E. Wolf, Principles of Optics, 7th ed., p. 570 (Cambridge University Press, 1999).
8.3 Temporal Coherence of Continuous Sources 209
Example 8.2
Find the fringe visibility and the coherence time for the Gaussian pulse studied in
Example 8.1.
This is shown as the dashed line in Fig. 8.4. As expected, the fringe visibility dies
off as delay τ gets farther from the origin, the point where the interferometer arms
are equidistant. From (8.13) the coherence time is
\[
\tau_c = \int_{-\infty}^{\infty} \left|\gamma(\tau)\right|^2 d\tau
= \int_{-\infty}^{\infty} e^{-\frac{\tau^2}{2T^2}}\,d\tau = \sqrt{2\pi}\,T
\]
which is the delay necessary to cause the fringes to substantially diminish.
Figure 8.4 $\mathrm{Re}\{\gamma(\tau)\}$ (solid) and $|\gamma(\tau)|$ (dashed) for a light pulse with a
Gaussian spectrum as in examples 8.1 and 8.2.
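The coherence time of Example 8.2 can also be obtained numerically, which is handy
for spectra that have no closed-form transform. The sketch below is illustrative
and rests on assumptions made here: the function names, the dimensionless grids,
and the second routine, which uses the standard Parseval identity
$\int|\gamma|^2 d\tau = 2\pi\int I^2 d\omega / (\int I\,d\omega)^2$ as a shortcut.
That identity is not derived in the text above.

```python
import numpy as np


def coherence_time_direct(omega, spectrum, tau):
    """tau_c as the integral of |gamma(tau)|^2 over tau, with gamma from (8.10)."""
    gamma = np.exp(-1j * np.outer(tau, omega)) @ spectrum / np.sum(spectrum)
    return np.sum(np.abs(gamma) ** 2) * (tau[1] - tau[0])


def coherence_time_parseval(omega, spectrum):
    """Same quantity from the spectrum alone: 2 pi * int I^2 / (int I)^2."""
    d_omega = omega[1] - omega[0]
    return 2.0 * np.pi * np.sum(spectrum**2) / (np.sum(spectrum) ** 2 * d_omega)


if __name__ == "__main__":
    T, omega0 = 1.0, 20.0                            # assumed, dimensionless
    omega = np.linspace(omega0 - 8.0 / T, omega0 + 8.0 / T, 801)
    spectrum = np.exp(-T**2 * (omega - omega0) ** 2)
    tau = np.linspace(-8.0 * T, 8.0 * T, 801)
    print(coherence_time_direct(omega, spectrum, tau))
    print(coherence_time_parseval(omega, spectrum))
    print(np.sqrt(2.0 * np.pi) * T)                  # analytic value from Example 8.2
```

Multiplying either result by c gives the coherence length (8.14) once T is expressed
in seconds.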
The duration T must be large enough to average over any fluctuations that are
present in the light source. The average in (8.17) should not be used on a pulsed
light source since the result would depend on the duration T of the temporal
window.
For a continuous light source, the signal at the detector (8.9) becomes
\[
\mathrm{Sig}(\tau) \propto 2\langle I(t)\rangle_t\left[1 + \mathrm{Re}\,\gamma(\tau)\right]
\qquad \text{(continuous source)} \tag{8.18}
\]
Although technically the integrals used in (8.10) to compute γ (τ) also diverge
in the case of continuous light, the numerator and the denominator diverge in
the same way. Therefore, we may renormalize I (ω) in any way we like to deal
with this problem. Both the numerator and denominator of (8.10) contain I (ω),
so regardless of how large I (ω) is or what units the measurement gives (volts
or whatever), we can just plug the instrument reading directly into (8.10). The
units in the numerator and denominator cancel so that γ (τ) always remains
dimensionless. Once we have the degree of coherence function γ(τ), we can
calculate the coherence time and fringe visibility just as we did for pulsed sources.
Typically, the signal comes in the form of a voltage or a current from a sensor.
However, the signal can easily be normalized to the beam fluence. In particular,
for large τ the fringe visibility goes to zero (i.e. γ (τ) = 0), and the normalized
signal must approach
\[
\lim_{\tau\to\infty} \mathrm{Sig}(\tau) = 2\mathcal{E} = 2\int_{-\infty}^{\infty} I(t)\,dt \tag{8.20}
\]
We will assume that this normalization has taken place and write (8.19) as an
equality.
Given our measurement of Sig(τ), we would like to find the power spectrum
I (ω). Unfortunately, I (ω) is buried within an integral in (8.19). However, since the
integral looks like an inverse Fourier transform of I (ω), we will be able to extract
the desired spectrum after some manipulation. This procedure for extracting I (ω)
from an interferometric measurement is known as Fourier spectroscopy.7
Extracting I (ω)
The left-hand side is known since it is the measured data, and a computer can be
employed to take the Fourier transform of it. The first term on the right-hand side
is the Fourier transform of a constant:
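\[
F\{2\mathcal{E}\} = 2\mathcal{E}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega\tau}\,d\tau
= 2\mathcal{E}\sqrt{2\pi}\,\delta(\omega) \tag{8.22}
\]
Notice that (8.22) is zero everywhere except where ω = 0, where a spike occurs.
This represents the DC component of $F\{\mathrm{Sig}(\tau)\}$.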
7 J. Peatross and S. Bergeson, “Fourier Spectroscopy of Ultrashort Laser Pulses,” Am. J. Phys. 74,
842-845 (2006).
8 This is weird since normally we take Fourier transforms on fields rather than expressions
involving intensity!
8.5 Young’s Two-Slit Setup and Spatial Coherence 211
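\[
\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} I(\omega')\,e^{-i\omega'\tau}\,d\omega'\right] e^{i\omega\tau}\,d\tau
+ \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} I(\omega')\,e^{i\omega'\tau}\,d\omega'\right] e^{i\omega\tau}\,d\tau
\]
which we rearrange to
\[
\sqrt{2\pi}\left[\int_{-\infty}^{\infty} I(\omega')\left(\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i(\omega'-\omega)\tau}\,d\tau\right)d\omega'
+ \int_{-\infty}^{\infty} I(\omega')\left(\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i(\omega'+\omega)\tau}\,d\tau\right)d\omega'\right]
\]
From (0.52) we note that the terms in parentheses are delta functions, so we have
\[
\sqrt{2\pi}\left[\int_{-\infty}^{\infty} I(\omega')\,\delta(\omega'-\omega)\,d\omega'
+ \int_{-\infty}^{\infty} I(\omega')\,\delta(\omega'+\omega)\,d\omega'\right]
\]
The remaining frequency integrals can then be easily performed to obtain our final
form:
\[
F\left\{2\,\mathrm{Re}\int_{-\infty}^{\infty} I(\omega)\,e^{-i\omega\tau}\,d\omega\right\}
= \sqrt{2\pi}\left[I(\omega) + I(-\omega)\right] \tag{8.23}
\]
\[
\frac{F\{\mathrm{Sig}(\tau)\}}{\sqrt{2\pi}} = 2\mathcal{E}\,\delta(\omega) + I(\omega) + I(-\omega) \tag{8.24}
\]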
The Fourier transform of the measured signal is seen to contain three terms, one
of which is the power spectrum I(ω) that we are after. Fortunately, when graphed
as a function of ω (shown in Fig. 8.5), the three terms on the right-hand side
typically do not overlap. As a reminder, the measured signal as a function of τ
looks something like that in Fig. 8.3. The oscillation frequency of the fringes lies
in the neighborhood of ω0. The procedure to obtain I(ω) is (1) record Sig(τ); (2)
if desired, normalize by its value at large τ; (3) take its Fourier transform; and (4)
extract the curve at positive frequencies.
Figure 8.5 A graphical depiction of $F\{\mathrm{Sig}(\tau)\}/\sqrt{2\pi}$.
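The four-step procedure above maps directly onto a discrete Fourier transform. The
sketch below is one possible implementation, not the authors' code: the function
name, the synthetic Gaussian test signal, and the sampling parameters are
assumptions made for illustration. It relies on NumPy's `ifft`, which uses the
$e^{+i\omega\tau}$ kernel of (8.22), and rescales the result so that it
approximates $F\{\mathrm{Sig}(\tau)\}/\sqrt{2\pi}$ in (8.24).

```python
import numpy as np


def recover_spectrum(tau, sig):
    """Return (omega, F{Sig}/sqrt(2 pi)) with the convention of (8.22).

    Assumes an even number of uniformly spaced delays with tau[n // 2] == 0.
    """
    n = tau.size
    d_tau = tau[1] - tau[0]
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=d_tau)
    # ifft carries the kernel e^{+i omega tau} and a factor 1/n; ifftshift moves
    # tau = 0 to index 0 so that no spurious linear phase appears.
    transform = np.fft.ifft(np.fft.ifftshift(sig)) * n * d_tau / (2.0 * np.pi)
    # The model signal is real and even in tau, so the transform is real.
    return omega, transform.real


if __name__ == "__main__":
    T, omega0 = 1.0, 20.0                   # assumed, dimensionless
    n, d_tau = 1024, 0.05
    tau = (np.arange(n) - n // 2) * d_tau   # step (1): "recorded" delays
    fluence = np.sqrt(np.pi) / T            # integral of exp(-T^2 (w - w0)^2)
    sig = 2.0 * fluence * (1.0 + np.exp(-tau**2 / (4.0 * T**2)) * np.cos(omega0 * tau))
    omega, spec = recover_spectrum(tau, sig)          # step (3)
    keep = omega > 5.0                                # step (4): positive side
    error = spec[keep] - np.exp(-T**2 * (omega[keep] - omega0) ** 2)
    print(np.max(np.abs(error)))
```

The delay step must be small enough that π/Δτ exceeds the highest frequency present,
and the scan must extend far enough that the fringes have died out; otherwise the
three terms of (8.24) are distorted by aliasing or truncation.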
Figure 8.6 A point source produces coherent (locked phases) light. When this light
traverses two slits and arrives at a screen, it produces a fringe pattern.
Depending on the coherence of the light entering each slit, the fringe pattern
observed can exhibit good or poor visibility. Just as the Michelson interferometer
is sensitive to the spectral content of light, the Young’s two-slit setup is sensitive
to the spatial extent of the light source illuminating the two slits. For example, if
light from a distant star (restricted by a filter to a narrow spectral range) is used to
illuminate a double-slit setup, the resulting interference pattern appearing on a
subsequent screen shows good or poor fringe visibility depending on the angular
width of the star. Michelson was the first to use this type of setup to measure the
angular width of stars.
Light emerging from a single ideal point source has wave fronts that are
spatially uniform in a lateral sense (see Fig. 8.6). Such wave fronts are said to be
spatially coherent, even if the temporal coherence is not perfect (i.e. if a range
of frequencies is present). When spatially coherent light illuminates a Young’s
two-slit setup, fringes of maximum visibility are seen at a distant screen, meaning
the fringes vary between a maximum intensity and zero.
Consider a Young’s two-slit setup illuminated by a single point source. We
represent the fields on a subsequent screen that transmit through each slit,
respectively, as $\mathbf{E}_0 e^{i(kd_1-\omega t)}$ and $\mathbf{E}_0 e^{i(kd_2-\omega t)}$. We have assumed that the slits are
equidistant from the point source and that the two fields at the screen are
identical other than for their phases. In close analogy with (8.1), the resulting
intensity pattern on a far-away screen is
\[
I_{\rm tot}(h) = 2I_0\left[1+\cos(kd_2-kd_1)\right] = 2I_0\left[1+\cos\left(khy/D\right)\right] \tag{8.25}
\]
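As a small illustration of (8.25), the sketch below evaluates the fringe pattern and
the spacing between adjacent bright fringes, which follows from setting
$kh\,\Delta y/D = 2\pi$, so that $\Delta y = \lambda D/h$. The function names and
the numbers in the demo are assumptions chosen here for illustration.

```python
import numpy as np


def young_intensity(y, h, D, wavelength, i0=1.0):
    """Far-screen pattern (8.25): 2 I0 [1 + cos(k h y / D)]."""
    k = 2.0 * np.pi / wavelength
    return 2.0 * i0 * (1.0 + np.cos(k * h * y / D))


def fringe_spacing(h, D, wavelength):
    """Distance between adjacent maxima, from k h (delta y) / D = 2 pi."""
    return wavelength * D / h


if __name__ == "__main__":
    h, D, wavelength = 0.25e-3, 1.5, 633e-9      # assumed values (m)
    dy = fringe_spacing(h, D, wavelength)
    y = np.array([0.0, 0.5 * dy, dy])
    print(dy)                                    # fringe spacing (m)
    print(young_intensity(y, h, D, wavelength))  # bright, dark, bright
```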
Notice the close similarity between (8.25) and the output from a Michelson
interferometer for a plane wave (8.1). We will consider h (the separation of
the slits) to be the counterpart of τ (the delay introduced by moving a mirror in
the Michelson interferometer). To obtain the final expression in (8.25) we made
Figure 8.7 Light from an extended source is only partially coherent. Fringes are still
possible, but they exhibit less contrast.
and
\[
d_2(y) = \sqrt{\left(y+h/2\right)^2 + D^2}
= D\sqrt{1+\frac{\left(y+h/2\right)^2}{D^2}}
\cong D\left[1+\frac{\left(y+h/2\right)^2}{2D^2}+\cdots\right] \tag{8.27}
\]
Instead, in this section we consider a source such as the surface of a star (filtered to a narrow
frequency range). See appendix 8.B for more discussion.
10 The results can be generalized to a two-dimensional source.
11 Random phase fluctuations necessarily imply some frequency bandwidth, however small.
screen at $y$, both originating from the point $y'_j$, but traveling respectively through
the two different slits. We assume that these fields have the same polarization,
and we will suppress the vectorial nature of the fields. For simplicity, we assume
the two fields have the same (real) amplitude at the screen $E_0(y'_j)$. Thus, we write
the two fields as
\[
E_1(y'_j) = E_0(y'_j)\,e^{i\left\{k\left[r_1(y'_j)+d_1(y)\right]-\omega t+\phi(y'_j)\right\}} \tag{8.28}
\]
and
\[
E_2(y'_j) = E_0(y'_j)\,e^{i\left\{k\left[r_2(y'_j)+d_2(y)\right]-\omega t+\phi(y'_j)\right\}} \tag{8.29}
\]
We have explicitly included an arbitrary phase $\phi(y'_j)$, which we will take to be
different for each point source.
We now set about finding the cumulative field at y arising from the many
points indexed by the subscript j . The total field on the screen at point y is
\[
E_{\rm tot}(h) = \sum_j \left[E_1(y'_j) + E_2(y'_j)\right] \tag{8.30}
\]
We have introduced
\[
\gamma(h) \equiv \frac{e^{-i\frac{khy}{D}}\sum_j I(y'_j)\,e^{-i\frac{khy'_j}{R}}}{\sum_j I(y'_j)} \tag{8.35}
\]
which is known as the degree of coherence. It controls the fringe pattern seen at
the screen.
We can generalize (8.34) so that it applies to the case of a continuous distribution
of light as opposed to a collection of discrete point sources. In Appendix 8.A
we show how summations in (8.34) and (8.35) become integrals over the source
intensity distribution, and we write
where
\[
\gamma(h) \equiv \frac{e^{-i\frac{khy}{D}}\int_{-\infty}^{\infty} I(y')\,e^{-i\frac{khy'}{R}}\,dy'}
{\int_{-\infty}^{\infty} I(y')\,dy'} \tag{8.37}
\]
and very good fringe visibility results. γ (h) dictates the degree of spatial coherence
in much the same way that γ (τ) dictates the degree of temporal coherence. Notice
the close similarity between (8.37) and (8.10).
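To see (8.37) in action, the sketch below evaluates it for a source of uniform
brightness and width a, for which the integral gives
$|\gamma(h)| = |\mathrm{sinc}(kha/2R)|$ with $\mathrm{sinc}(\alpha) = \sin\alpha/\alpha$.
This is an illustrative sketch under assumptions made here: the function names, the
midpoint sampling of the source, and the numerical values are not taken from the text.

```python
import numpy as np


def spatial_coherence(h, y_src, i_src, wavelength, R, y=0.0, D=1.0):
    """Evaluate (8.37) with a Riemann sum over source samples (y_src, i_src)."""
    k = 2.0 * np.pi / wavelength
    weights = i_src * np.exp(-1j * k * h * y_src / R)
    return np.exp(-1j * k * h * y / D) * np.sum(weights) / np.sum(i_src)


def uniform_source(width, samples=2000):
    """Midpoint samples of a source with constant brightness and given width."""
    dy = width / samples
    y_src = -width / 2.0 + (np.arange(samples) + 0.5) * dy
    return y_src, np.ones(samples)


if __name__ == "__main__":
    wavelength, R, h = 633e-9, 1.0, 1.0e-3       # assumed values (m)
    for width in (1.0e-4, 2.0e-4, wavelength * R / h):
        y_src, i_src = uniform_source(width)
        gamma = spatial_coherence(h, y_src, i_src, wavelength, R)
        # np.sinc(x) is sin(pi x)/(pi x), so k h a / (2 R) = pi * h a / (lambda R)
        print(width, abs(gamma), abs(np.sinc(h * width / (wavelength * R))))
```

The visibility falls as the source widens and first vanishes when a = λR/h, the
spatial counterpart of fringes dying out with increasing delay in the Michelson case.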
As the slit separation h increases, the fringe visibility
\[
V(h) = \left|\gamma(h)\right| \tag{8.38}
\]
\[
h_c \equiv 2\int_{0}^{\infty} \left|\gamma(h)\right|^2 dh \tag{8.39}
\]
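\[
\begin{aligned}
\sum_j E_1(y'_j) &\to \int_{-\infty}^{\infty} E_1(y')\,dy'
&&\text{and}& \sum_m E_1(y'_m) &\to \int_{-\infty}^{\infty} E_1(y'')\,dy'' \\
\sum_j E_2(y'_j) &\to \int_{-\infty}^{\infty} E_2(y')\,dy'
&&\text{and}& \sum_m E_2(y'_m) &\to \int_{-\infty}^{\infty} E_2(y'')\,dy''
\end{aligned} \tag{8.40}
\]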
Rather than deal with a time average of randomly varying phases, we will instead
work with a linear superposition of all conceivable phase factors. That is, we will
write the phase $\phi(y')$ as $Ky'$, where K is a parameter with units of inverse length,
which we allow to take on all possible real values with uniform likelihood. The
way we modify (8.32) for the continuous case is then
\[
\left\langle e^{i\left[\phi(y'_j)-\phi(y'_m)\right]}\right\rangle_t = \delta_{j,m}
\;\to\; \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{iK(y'-y'')}\,dK = \delta(y''-y') \tag{8.41}
\]
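\[
I_{\rm tot}(h) = 2\int_{-\infty}^{\infty} I(y')\,dy'
+ 2\,\mathrm{Re}\; e^{-i\frac{khy}{D}}\int_{-\infty}^{\infty} I(y')\,e^{-i\frac{khy'}{R}}\,dy' \tag{8.43}
\]
where
\[
I(y') \equiv \frac{1}{2}\epsilon_0 c\left|E(y')\right|^2 \tag{8.44}
\]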
8.B Van Cittert-Zernike Theorem 217
For $I_{\rm tot}$ to have normal units of intensity, $I(y')$ must have units of intensity per
length of source, implying that $E(y')$ has units of field per square root of length.
Hence, $\int_{-\infty}^{\infty} I(y')\,dy'$ is the intensity at the screen caused by the entire extended
source when only one slit is open. We see that (8.43) is equivalent to (8.36) and
(8.37).
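\[
\begin{aligned}
I_{\rm tot}(h) \propto{}& \left|\int_{-\infty}^{\infty}\left[\left|E(y')\right| e^{i\phi(y')+i\frac{ky'^2}{2R}}\right] e^{-i\frac{khy'}{2R}}\,dy'\right|^2
+ \left|\int_{-\infty}^{\infty}\left[\left|E(y')\right| e^{i\phi(y')+i\frac{ky'^2}{2R}}\right] e^{i\frac{khy'}{2R}}\,dy'\right|^2 \\
&+ 2\,\mathrm{Re}\; e^{i\frac{khy}{D}}
\int_{-\infty}^{\infty}\left[\left|E(y')\right| e^{i\phi(y')+i\frac{ky'^2}{2R}}\right] e^{-i\frac{khy'}{2R}}\,dy'
\left\{\int_{-\infty}^{\infty}\left[\left|E(y')\right| e^{i\phi(y')+i\frac{ky'^2}{2R}}\right] e^{i\frac{khy'}{2R}}\,dy'\right\}^*
\end{aligned} \tag{8.45}
\]
where we have employed (8.26) and (8.27) and similar expressions involving R
and $y'$.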
The first term on the right-hand side of (8.45) is the intensity on the screen
when the lower slit is covered. The second term is the intensity on the screen
when the upper slit is covered. The last term is the interference term, which
modifies the sum of the individual intensities when both slits are uncovered.
Notice the occurrence of Fourier transforms (over position) on the quantities
inside of the square brackets. Later, when we study diffraction theory, we will
recognize these transforms as determining the strength of fields impinging on
the individual slits. This corresponds to a major difference between a spatially
coherent source and a random-phase source. With the random-phase source, the
slits are always illuminated with the same strength regardless of the separation.
However, with a coherent source, ‘beaming’ can occur such that the strength as
well as phase of the field at each slit depends on the slit separation.
A beautiful simplification occurs when the phase of the emitted light has the
following distribution:
\[
\phi(y') = -\frac{ky'^2}{2R} \qquad \text{(converging spherical wave)} \tag{8.46}
\]
Equation (8.46) is not as arbitrary as it may first appear. This particular phase
is an approximation to a concave spherical wave front converging to the center
between the two slits. This type of wave front is created when a plane wave passes
through a lens. With the special phase (8.46), the intensity (8.45) reduces to
\[
\begin{aligned}
I_{\rm tot}(h) \propto{}& \left|\int_{-\infty}^{\infty}\left|E(y')\right| e^{-i\frac{khy'}{2R}}\,dy'\right|^2
+ \left|\int_{-\infty}^{\infty}\left|E(y')\right| e^{i\frac{khy'}{2R}}\,dy'\right|^2 \\
&+ 2\,\mathrm{Re}\; e^{i\frac{khy}{D}}\left[\int_{-\infty}^{\infty}\left|E(y')\right| e^{-i\frac{khy'}{2R}}\,dy'\right]^2
\end{aligned}
\qquad \text{(converging spherical wave)} \tag{8.47}
\]
and the magnitude of the degree of coherence $V = \left|\gamma(h/2)\right|$ from (8.37). Again,
this corresponds to the field that goes through the upper slit, when it is positioned
at h/2, and which impinges on the screen. Let this field be denoted by $|E_1(h/2)|$.
The field strength when the single slit is positioned at h compared to that when it
is positioned at zero is
\[
\left|\frac{E_1(h)}{E_1(0)}\right| =
\frac{\left|\int_{-\infty}^{\infty}\left|E(y')\right| e^{-i\frac{khy'}{R}}\,dy'\right|}
{\int_{-\infty}^{\infty}\left|E(y')\right| dy'}
\qquad \text{(converging spherical wave assumption)} \tag{8.49}
\]
This looks very much like $\left|\gamma(h)\right|$ of (8.37) except that the magnitude of the field
12 M. Born and E. Wolf, Principles of Optics, 7th ed., p. 574 (Cambridge University Press, 1999).
Exercises
rise to fringes are due entirely to changes in φ and that $|\gamma|$ is a slowly
varying function in comparison to the oscillations.
(b) What is the coherence time τc of the light in P8.4?
P8.2 (a) Show that the fringe visibility of a Gaussian spectral distribution
(see Example 8.2) goes from 1 to $e^{-\pi/2} = 0.21$ as the round-trip path in
one arm of the instrument is extended by a coherence length.
(b) Find the FWHM bandwidth in wavelength $\Delta\lambda_{\rm FWHM}$ in terms of the
coherence length $\ell_c$ and the center wavelength $\lambda_0$.
HINT: First determine $\Delta\omega_{\rm FWHM}$, defined to be the width of I(ω) at half
of its peak. To convert to a wavelength difference, use
$\omega = \frac{2\pi c}{\lambda} \Rightarrow \Delta\omega_{\rm FWHM} \cong -\frac{2\pi c}{\lambda_0^2}\Delta\lambda_{\rm FWHM}$.
You can ignore the minus sign; it simply
means that wavelength decreases as frequency increases.
P8.3 Show that $\mathrm{Re}\{\gamma(\tau)\}$ defined in (8.10) reduces to $\cos(\omega_0\tau)$ in the case of
a plane wave $E(t) = E_0 e^{i(k_0 z-\omega_0 t)}$ being sent through a Michelson interferometer.
In other words, the output intensity from the interferometer
reduces to
\[
I = 2I_0\left[1+\cos(\omega_0\tau)\right]
\]
as you already expect.
HINT: Don’t be afraid of delta functions. After integration, the left-over
delta functions cancel.
P8.4 Light emerging from a dense hot gas has a collisionally broadened
power spectrum described by the Lorentzian function
\[
I(\omega) = \frac{I(\omega_0)}{1+\left(\frac{\omega-\omega_0}{\Delta\omega_{\rm FWHM}/2}\right)^2}
\]
Perform the inverse Fourier transform on the field and find how the
intensity of the light looks as a function of time.
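HINT:
\[
\int_{-\infty}^{\infty} \frac{e^{-iax}}{x+\beta}\,dx =
\begin{cases} -2i\pi\,e^{ia\beta} & \text{if } a > 0 \\ 0 & \text{if } a < 0 \end{cases}
\qquad \left(\mathrm{Im}\,\beta > 0\right)
\]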
P8.7 (a) A point source with wavelength λ = 500 nm illuminates two parallel
slits separated by h = 1.0 mm. If the screen is D = 2 m away, what is
the separation between the diffraction peaks on the screen? Make a
sketch.
13 J. Peatross and S. Bergeson, “Fourier Spectroscopy of Ultrashort Laser Pulses,” Am. J. Phys. 74,
842-845 (2006).
(b) A thin piece of glass with thickness d = 0.01 mm and index n = 1.5 is
placed in front of one of the slits. By how many fringes does the pattern
at the screen move?
HINT: Add $\Delta\phi$ to $k(d_2 - d_1)$ in (8.25), where $\Delta\phi \equiv \phi_2 - \phi_1$ is the relative
phase between the two paths. Compare the phase of the light when
traversing the glass versus traversing an empty region of the same
thickness.
L8.8 (a) Carefully measure the separation of a double slit in the lab (h ∼
0.1 mm separation) by shining a HeNe laser (λ = 633 nm) through it
and measuring the diffraction peak separations on a distant wall (say,
2 m from the slits).
HINT: For better accuracy, measure across several fringes and divide.
Figure 8.9 (Labels: laser, rotating diffuser to create phase variation, single slit
of width a, double slit of separation h, filter, CCD camera.)
(b) Create an extended light source with a HeNe laser using a time-
varying diffuser followed by an adjustable single slit. (The diffuser
must rotate rapidly to create random time variation of the phase at
each point as would occur automatically for a natural source such
as a star.) Place the double slit at a distance of R ≈ 100 cm after the
first slit. (Take note of the exact value of R, as you will need it for the
next problem.) Use a lens to image the diffraction pattern that would
have appeared on a far-away screen into a video camera. Observe
the visibility of the fringes. Adjust the width of the source with the
single slit until the visibility of the fringes disappears. After making the
source wide enough to cause the fringe pattern to degrade, measure
the single slit width a by shining a HeNe laser through it and observing
the diffraction pattern on the distant wall. (video)
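HINT: As we will study later, a single slit of width a produces an intensity
pattern on a screen a distance L away described by
\[
I(x) = I_{\rm peak}\,\mathrm{sinc}^2\left(\frac{\pi a}{\lambda L}x\right)
\]
where $\mathrm{sinc}(\alpha) \equiv \frac{\sin\alpha}{\alpha}$ and $\lim_{\alpha\to 0}\frac{\sin\alpha}{\alpha} = 1$.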
NOTE: It would have been nicer to vary the separation of the two slits
to determine the width of a fixed source. However, because it is hard to
make an adjustable double slit, we varied the size of the source until
the spatial coherence of the light matched the slit separation.
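\[ \gamma(h) = \frac{\int_{-a/2}^{a/2} I_0 \exp\left[ -ikh \left( \frac{y'}{R} + \frac{y}{D} \right) \right] dy'}{\int_{-a/2}^{a/2} I_0\, dy'} = \frac{e^{-ikh\frac{y}{D}} \int_{-a/2}^{a/2} e^{-ikh\frac{y'}{R}}\, dy'}{a} = \frac{e^{-ikh\frac{y}{D}}}{a} \left[ \frac{e^{-ikh\frac{y'}{R}}}{-ikh/R} \right]_{-a/2}^{a/2} \]
\[ = e^{-ikh\frac{y}{D}}\, \frac{e^{-ikh\frac{a}{2R}} - e^{ikh\frac{a}{2R}}}{-2ikh\frac{a}{2R}} = e^{-ikh\frac{y}{D}}\, \mathrm{sinc}\left( \frac{kha}{2R} \right) \]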
Note that
\[ \int_0^{\infty} \frac{\sin^2 \alpha x}{(\alpha x)^2}\, dx = \frac{\pi}{2\alpha} \]
Review, Chapters 6–8
R30 T or F: The integral of I (t ) over all t equals the integral of I (ω) over all
ω.
R33 T or F: The group velocity of light never exceeds the phase velocity.
R38 T or F: The Fourier transform (or inverse Fourier transform if you prefer)
of I (ω) is proportional to the degree of temporal coherence.
R40 T or F: The Young’s two-slit setup is ideal for measuring the temporal
coherence of light.
Problems
R42 (a) Horizontally polarized light enters a system and first travels through
a horizontal and then a vertical polarizer in series. What is the Jones
vector of the transmitted field?
(b) Now a polarizer at 45° is inserted between the two polarizers in the
system described in (a). What is the Jones vector of the transmitted
field? How does the final intensity compare to initial intensity?
(c) Now a quarter wave plate with a fast-axis angle at 45° is inserted
between the two polarizers (instead of the polarizer of part (b)). What
is the Jones vector of the transmitted field? How does the final intensity
compare to initial intensity?
Figure 8.10 [labels: horizontal polarizer, vertical polarizer]
R43 (a) Find the Jones matrix for a half wave plate with its fast axis making an
arbitrary angle θ with the x-axis.
HINT: Project an arbitrary polarization with $E_x$ and $E_y$ onto the fast
and slow axes of the wave plate. Shift the slow axis phase by π, and then
project the field components back onto the horizontal and vertical axes.
The answer is
\[ \begin{bmatrix} \cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta \end{bmatrix} \]
Figure 8.11 Polarizing Elements
R44 (a) What is the spectral content (i.e., I(ω)) of a square laser pulse
\[ E(t) = \begin{cases} E_0 e^{-i\omega_0 t}, & |t| \le \tau/2 \\ 0, & |t| > \tau/2 \end{cases} \]
where in this case E 0 has units of E-field per frequency. Make a sketch
of I (t ), indicating the location of the first zeros.
(c) If E (ω) is known (any arbitrary function, not the same as above), and
the light goes through a material of thickness ℓ and index of refraction
n (ω), how would you find the form of the pulse E (t ) after passing
through the material? Please set up the integral.
HINT:
\[ \delta(t' - t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i\omega(t' - t)}\, d\omega \]
coherence is
\[ \gamma(\tau) = \int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega \Bigg/ \int_{-\infty}^{\infty} I(\omega)\, d\omega \]
Find the fringe visibility V ≡ (I max − I min )/(I max + I min ) as a function of τ
(i.e. the round-trip delay due to moving one of the mirrors).
R47 Light emerging from a point travels by means of two very narrow slits
to a point y on a screen. The intensity at the screen arising from a point
source at position y′ is found to be
\[ I_{\text{screen}}(y', h) = 2 I(y') \left\{ 1 + \cos\left[ kh \left( \frac{y}{D} + \frac{y'}{R} \right) \right] \right\} \]
where an approximation has restricted us to small angles.
(a) Now, suppose that I(y′) characterizes emission from a wider source
with randomly varying phase across its width. Write down an expression
(in integral form) for the resulting intensity at the screen:
\[ I_{\text{screen}}(h) \equiv \int_{-\infty}^{\infty} I_{\text{screen}}(y', h)\, dy' \]
(b) Assume that the source has an emission distribution with the form
$I(y') = (I_0/\Delta y')\, e^{-y'^2/\Delta y'^2}$. What is the function γ(h) where the intensity
is written $I_{\text{screen}}(h) = 2\sqrt{\pi}\, I_0 \left[ 1 + \mathrm{Re}\,\gamma(h) \right]$?
HINT:
\[ \int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\, dx = \sqrt{\frac{\pi}{A}}\; e^{B^2/4A + C}, \qquad \mathrm{Re}\{A\} > 0. \]
Selected Answers
Light as Rays
So far in our study of optics, we have described light in terms of waves, which sat-
isfy Maxwell’s equations. However, as you are probably aware, in many situations
light can be thought of as rays pointing along the direction of wave propagation. A
ray picture is useful when one is interested in the macroscopic flow of light energy,
but rays fail to reveal fine details, in particular wave and diffraction phenomena.
For example, simple ray theory suggests that a lens can focus light down to a point.
However, if a beam of light were concentrated onto a true point, the intensity
would be infinite! Nevertheless, ray theory is useful for predicting where a focus
occurs. It is also useful for describing imaging properties of optical systems (e.g.
lenses and mirrors).
Beginning in section 9.3 we study the details of ray theory and the imaging
properties of optical systems. First, however, we examine the justification for ray
theory starting from Maxwell’s equations. In the short-wavelength limit, Maxwell’s
equations give rise to the eikonal equation, which governs the direction of rays
in a medium with an index of refraction that varies with position. The German
word ‘eikonal’ comes from the Greek ‘εικων’ from which the modern word ‘icon’
derives. The eikonal equation therefore has a descriptive title since it controls the
formation of images. Although we will not use the eikonal equation extensively,
we will show how it embodies the underlying justification for ray theory. As will be
apparent in its derivation, the eikonal equation relies on an approximation that
the features of interest in the light distribution are large relative to the wavelength
of the light.
The eikonal equation describes the direction of ray propagation, even in com-
plicated situations such as desert mirages where air is heated near the ground and
has a different index than the air farther from the ground. Rays of light from the
sky that initially are directed toward the ground can be bent such that they travel
parallel to or even up from the ground, owing to the inhomogeneous refractive
index. The eikonal equation can also be used to deduce Fermat’s principle, which
in short says that light travels from point A to point B following a path that takes
the minimum time. This principle can be used, for example, to ‘derive’ Snell’s law.
Of course Fermat asserted his principle more than a century before Maxwell’s
\[ \frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \] (9.1)
where
\[ k_{\text{vac}} = \frac{\omega}{c} = \frac{2\pi}{\lambda_{\text{vac}}} \] (9.4)
Here R (r) is a real scalar function (which depends on position) having the dimen-
sion of length. By taking R (r) to be real, we do not take into account possible
absorption or amplification in the medium. Even though the trial solution (9.3)
looks somewhat like a plane wave,1 the function R (r) accommodates wave fronts
that can be curved or distorted as depicted in Fig. 9.1. At any given instant t, the
curved surfaces described by R(r) = constant can be interpreted
as wave fronts of the solution. The wave fronts travel in the direction for which
R(r) varies the fastest. This direction is aligned with ∇R(r), which lies in the
direction perpendicular to surfaces of constant phase.
The substitution of the trial solution (9.3) into the wave equation (9.2) gives
\[ \frac{1}{k_{\text{vac}}^2} \nabla^2 \left[ \mathbf{E}_0(\mathbf{r})\, e^{i k_{\text{vac}} R(\mathbf{r})} \right] + n^2(\mathbf{r})\, \mathbf{E}_0(\mathbf{r})\, e^{i k_{\text{vac}} R(\mathbf{r})} = 0 \] (9.5)
where we have divided each term by $e^{-i\omega t}$.
Figure 9.1 Wave fronts (i.e. surfaces of constant phase given by R(r)) distributed throughout space in the presence of a spatially inhomogeneous refractive index. The gradient of R gives the direction of travel for a wavefront.
Computing the Laplacian in (9.5)
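The gradient of the x component of the field is
\[ \nabla \left[ E_{0x}(\mathbf{r})\, e^{i k_{\text{vac}} R(\mathbf{r})} \right] = [\nabla E_{0x}(\mathbf{r})]\, e^{i k_{\text{vac}} R(\mathbf{r})} + i k_{\text{vac}} E_{0x}(\mathbf{r}) [\nabla R(\mathbf{r})]\, e^{i k_{\text{vac}} R(\mathbf{r})} \]
The Laplacian of the x component is
\[ \nabla \cdot \nabla \left[ E_{0x}(\mathbf{r})\, e^{i k_{\text{vac}} R(\mathbf{r})} \right] = \left\{ \nabla^2 E_{0x}(\mathbf{r}) - k_{\text{vac}}^2 E_{0x}(\mathbf{r}) [\nabla R(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + i k_{\text{vac}} E_{0x}(\mathbf{r}) \nabla^2 R(\mathbf{r}) + 2 i k_{\text{vac}} [\nabla E_{0x}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] \right\} e^{i k_{\text{vac}} R(\mathbf{r})} \]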
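Upon combining the result for each vector component of E0(r), the required spatial
derivative can be written as
\[ \nabla^2 \left[ \mathbf{E}_0(\mathbf{r})\, e^{i k_{\text{vac}} R(\mathbf{r})} \right] = \Big( \nabla^2 \mathbf{E}_0(\mathbf{r}) - k_{\text{vac}}^2 \mathbf{E}_0(\mathbf{r}) [\nabla R(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + i k_{\text{vac}} \mathbf{E}_0(\mathbf{r}) \nabla^2 R(\mathbf{r}) \]
\[ \qquad + 2 i k_{\text{vac}} \left\{ \hat{x} [\nabla E_{0x}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + \hat{y} [\nabla E_{0y}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] + \hat{z} [\nabla E_{0z}(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] \right\} \Big)\, e^{i k_{\text{vac}} R(\mathbf{r})} \]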
1 If the index is spatially independent (i.e. n (r) → n), then (9.3) reduces to the usual plane-wave
solution of the wave equation. In this case, we have R (r) = k · r/k vac and the field amplitude
becomes constant (i.e. E0 (r) → E0 ).
After performing the Laplacian and after some rearranging, (9.5) becomes
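\[ \left[ \nabla R(\mathbf{r}) \cdot \nabla R(\mathbf{r}) - n^2(\mathbf{r}) \right] \mathbf{E}_0(\mathbf{r}) = \frac{\nabla^2 \mathbf{E}_0(\mathbf{r})}{k_{\text{vac}}^2} + \frac{i}{k_{\text{vac}}} \mathbf{E}_0(\mathbf{r}) \nabla^2 R(\mathbf{r}) + \frac{2i}{k_{\text{vac}}} \left[ \hat{x}\, \nabla E_{0x}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) + \hat{y}\, \nabla E_{0y}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) + \hat{z}\, \nabla E_{0z}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) \right] \] (9.6)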
Don’t be afraid; at this point we are ready to make an important approxima-
tion. We take the limit of a very short wavelength (i.e. 1/k vac = λvac /2π → 0), and
the entire right-hand side of (9.6) vanishes. (Thank goodness!) With it we lose
the effects of diffraction. We also lose surface reflections at abrupt index changes
unless specifically considered. This approximation works best in situations where
only macroscopic features are of concern.
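Our wave equation has been simplified to
\[ [\nabla R(\mathbf{r})] \cdot [\nabla R(\mathbf{r})] = n^2(\mathbf{r}) \] (9.7)
Written another way, this equation is
\[ \nabla R(\mathbf{r}) = n(\mathbf{r})\, \hat{s}(\mathbf{r}) \] (9.8)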
where ŝ is a unit vector pointing in the direction ∇R (r), the direction normal to
wave front surfaces. Equation (9.8) is called the eikonal equation.2
Example 9.1
Suppose that a region of air above the desert on a hot day has an index of refraction
that varies with height y according to $n(y) = n_0 \sqrt{1 + y^2/h^2}$. Verify that
$R(x, y) = n_0 \left( x \pm y^2/2h \right)$ is a solution to the eikonal equation. (See problem P9.1 for a more
\[ \nabla R \cdot \nabla R = n_0 \left( \hat{x} \pm \hat{y}\, y/h \right) \cdot n_0 \left( \hat{x} \pm \hat{y}\, y/h \right) = n_0^2 \left( 1 + y^2/h^2 \right) = n^2(y) \]
\[ \hat{s}(h) = \frac{\hat{x} \pm \hat{y}}{\sqrt{2}} \qquad \hat{s}(h/2) = \frac{\hat{x} \pm \hat{y}/2}{\sqrt{5/4}} \qquad \hat{s}(h/4) = \frac{\hat{x} \pm \hat{y}/4}{\sqrt{17/16}} \]
2 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 3.1.1 (Cambridge University Press, 1999).
9.2 Fermat’s Principle 231
These are represented in Fig. 9.2. In a desert mirage, light from the sky can appear
to come from a lower position. We can determine a path for the rays by setting
d y/d x equal to the slope of ŝ:
\[ \frac{dy}{dx} = \pm\frac{y}{h} \quad \Rightarrow \quad y = y_0\, e^{\pm(x - x_0)/h} \]
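The check in Example 9.1 can also be carried out numerically. The following Python sketch is illustrative only: the values n0 = 1 and h = 1 and all function names are assumptions made for the demonstration, not part of the text. It estimates ∇R by central differences, evaluates ∇R·∇R − n², and provides the ray path y = y0 exp(±(x − x0)/h).

```python
import math

n0, h = 1.0, 1.0  # assumed illustrative values, not taken from the text


def n(y):
    """Index profile n(y) = n0 * sqrt(1 + y^2/h^2)."""
    return n0 * math.sqrt(1 + y**2 / h**2)


def R(x, y, sign=+1):
    """Trial solution R(x, y) = n0 * (x +/- y^2 / (2h))."""
    return n0 * (x + sign * y**2 / (2 * h))


def eikonal_residual(x, y, sign=+1, eps=1e-5):
    """Return grad(R).grad(R) - n^2, with the gradient from central differences."""
    dRdx = (R(x + eps, y, sign) - R(x - eps, y, sign)) / (2 * eps)
    dRdy = (R(x, y + eps, sign) - R(x, y - eps, sign)) / (2 * eps)
    return dRdx**2 + dRdy**2 - n(y) ** 2


def ray_height(x, y0, x0=0.0, sign=+1):
    """Ray path y = y0 * exp(+/-(x - x0)/h), which follows from dy/dx = +/- y/h."""
    return y0 * math.exp(sign * (x - x0) / h)


# The residual should be zero (to rounding error) at every height.
for y in (h, h / 2, h / 4):
    print(y, eikonal_residual(0.3, y, +1), eikonal_residual(0.3, y, -1))
```

The residual vanishes to within rounding error for both signs, which is the numerical counterpart of the algebraic verification above.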
3 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 3.3.2 (Cambridge University Press, 1999).
4 The curl of a gradient is identically zero for any function.
Equation (9.10) states that the integration of n ŝ · dℓ around a closed loop is always
zero. If we consider a closed loop composed of a path from point A to point B and
then a different path from point B back to point A again, the integrals for the two
legs always cancel, even while holding one path fixed while varying the other. This
means
\[ \int_A^B n\, \hat{s} \cdot d\boldsymbol{\ell} \quad \text{is independent of path from } A \text{ to } B. \] (9.11)
Now consider a path from A to B that is parallel to ŝ, as depicted in Fig. 9.3. In
this case, the cosine in the dot product is always one. If we choose some other
path that connects A and B, the cosine associated with the dot product is less than
one at most points along that path, whereas the result of the integral is the same.
Therefore, if we artificially remove the dot product from the integral (i.e. exclude
the cosine factor), the result of the integral will exceed the true value unless the
path chosen follows the direction of ŝ (i.e. the path that corresponds to the one
that light rays actually follow).
In mathematical form, this argument can be expressed as
\[ \int_A^B n\, \hat{s} \cdot d\boldsymbol{\ell} = \min \left\{ \int_A^B n\, d\ell \right\} \] (9.12)
The integral on the right is called the optical path length (OPL) between points A
and B:
\[ OPL|_A^B \equiv \int_A^B n\, d\ell \] (9.13)
Figure 9.3 A ray of light leaving point A arriving at B.
The conclusion is that the true path that light follows between two points (i.e.
the one that stays parallel to ŝ) is the one with the shortest optical path length.
The index n may vary with position and therefore can be different for each of the
incremental distances dℓ.
Fermat’s principle is usually stated in terms of the time it takes light to travel
between points. The travel time ∆t depends not only on the path taken by the
light but also on the velocity of the light v (r), which varies spatially with the
refractive index:
\[ \Delta t|_A^B = \int_A^B \frac{d\ell}{v(\mathbf{r})} = \int_A^B \frac{d\ell}{c/n(\mathbf{r})} = \frac{OPL|_A^B}{c} \] (9.14)
To find the correct path for the light ray that leaves point A and crosses point
B, we need only minimize the optical path length between the two points. Mini-
mizing the optical path length is equivalent to minimizing the time of travel since
it differs from the time of travel only by the constant c. The optical path length
is not the actual distance that the light travels; it is proportional to the number
of wavelengths that fit into that distance (see (2.24)). Thus, as the wavelength
shortens due to a higher index of refraction, the optical path length increases.
The correct ray traveling from A to B does not necessarily follow a straight line
but can follow a complicated curve according to how the index varies.
An imaging situation occurs when many paths from point A to point B have
the same optical path length. An example of this occurs when a lens causes an
image to form. In this case all rays leaving point A (on an object) and traveling
through the system to point B (on the image) experience equal optical path
lengths. This situation is depicted in Fig. 9.4. Note that while the rays traveling
through the center of the lens have a shorter geometric path length, they travel
through more material so that the optical path length is the same for all rays.
Figure 9.4 Rays of light leaving point A with the same optical path length to B.
To summarize Fermat’s principle, of the many rays that might emanate from
a point A, the ray that crosses a second point B is the one that follows the shortest
optical path length. If many rays tie for having the shortest optical path, we
say that an image of point A forms at point B. It should be noted that Fermat’s
principle, as we have written it, does not work for anisotropic media such as
crystals where n depends on the direction of a ray as well as on its location (see
P9.4).
Example 9.2
Use Fermat’s principle to derive Snell’s law.
Solution: Consider the many rays of light that leave point A seen in Fig. 9.5. Only
one of the rays passes through point B. Within each medium we expect the light to
travel in a straight line since the index is uniform. However, at the boundary we
must allow for bending since the index changes.
The optical path length between points A and B may be written
\[ OPL = n_i \sqrt{x_i^2 + y_i^2} + n_t \sqrt{x_t^2 + y_t^2} \] (9.15)
We need to minimize this optical path length to find the correct one according to
Fermat’s principle.
Since points A and B are fixed, we may regard $x_i$ and $x_t$ as constants. The distances
$y_i$ and $y_t$ are not constants although the combination
\[ y_{\text{tot}} = y_i + y_t \] (9.16)
Notice that
\[ \sin\theta_i = \frac{y_i}{\sqrt{x_i^2 + y_i^2}} \quad \text{and} \quad \sin\theta_t = \frac{y_t}{\sqrt{x_t^2 + y_t^2}} \] (9.19)
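The minimization in Example 9.2 can be mimicked numerically. The sketch below is a rough illustration under assumed values (indices 1.0 and 1.5, unit distances); the function names and the ternary search are choices made here, not the book's method. It minimizes the optical path length (9.15) over y_i with y_tot held fixed and then compares n_i sin θ_i with n_t sin θ_t using (9.19).

```python
import math


def opl(y_i, n_i, n_t, x_i, x_t, y_tot):
    """Optical path length (9.15) with y_t = y_tot - y_i, per (9.16)."""
    y_t = y_tot - y_i
    return n_i * math.hypot(x_i, y_i) + n_t * math.hypot(x_t, y_t)


def minimize_opl(n_i, n_t, x_i, x_t, y_tot, iters=200):
    """Ternary search for the y_i in [0, y_tot] that minimizes the OPL.

    The OPL is a convex function of y_i, so this simple search is adequate.
    """
    lo, hi = 0.0, y_tot
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if opl(m1, n_i, n_t, x_i, x_t, y_tot) < opl(m2, n_i, n_t, x_i, x_t, y_tot):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Assumed illustrative geometry and indices.
n_i, n_t = 1.0, 1.5
x_i, x_t, y_tot = 1.0, 1.0, 1.0

y_i = minimize_opl(n_i, n_t, x_i, x_t, y_tot)
y_t = y_tot - y_i
sin_i = y_i / math.hypot(x_i, y_i)  # (9.19)
sin_t = y_t / math.hypot(x_t, y_t)  # (9.19)
print(n_i * sin_i, n_t * sin_t)  # the two products should agree closely
```

At the minimizing y_i the two products agree to within the accuracy of the search, which is the content of Snell's law.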
Example 9.3
Use Fermat’s principle to derive the equation of curvature for a reflective surface
that causes all rays leaving one point to image to another. Do the calculation in
two dimensions rather than in three.5
Solution: We adopt the convention that the origin is halfway between the points,
which are separated by a distance 2a, as shown in Fig. 9.6. If the points are to
image to each other, Fermat’s principle requires that the total path length be a
constant; call it b. By inspection of the figure, we see that the length of the path
(which reflects once) from one point to the other is
\[ \sqrt{(x + a)^2 + y^2} + \sqrt{(x - a)^2 + y^2} = b \] (9.21)
To get (9.21) into a more recognizable form, we isolate the first square root and
square both sides of the equation, which gives
\[ (x + a)^2 + y^2 = b^2 + (x - a)^2 + y^2 - 2b \sqrt{(x - a)^2 + y^2} \]
After squaring the two binomial terms, some nice cancellations occur, and we get
\[ 4ax - b^2 = -2b \sqrt{(x - a)^2 + y^2} \]
Squaring both sides again and collecting terms gives
\[ \left( 16a^2 - 4b^2 \right) x^2 - 4b^2 y^2 = 4a^2 b^2 - b^4 \]
Figure 9.6
Finally, we divide both sides by the term on the right to obtain the (hopefully)
familiar form of an ellipse
\[ \frac{x^2}{\left( \frac{b^2}{4} \right)} + \frac{y^2}{\left( \frac{b^2}{4} - a^2 \right)} = 1 \] (9.22)
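As a quick numerical sanity check of (9.22), the following sketch samples points on the ellipse and confirms that the reflected path length from one focus to the other equals b, as (9.21) requires. The values a = 1 and b = 3 and the helper name are assumptions for illustration only.

```python
import math

a, b = 1.0, 3.0  # assumed values; b > 2a so that the ellipse is real

semi_x = b / 2                          # sqrt(b^2/4)
semi_y = math.sqrt(b**2 / 4 - a**2)     # sqrt(b^2/4 - a^2)


def path_length(t):
    """Distance from (-a, 0) to a point on the ellipse and on to (+a, 0)."""
    x, y = semi_x * math.cos(t), semi_y * math.sin(t)
    return math.hypot(x + a, y) + math.hypot(x - a, y)


lengths = [path_length(2 * math.pi * k / 12) for k in range(12)]
print(max(abs(length - b) for length in lengths))  # should be at rounding level
```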
5 This configuration is used to direct flash lamp energy into a laser amplifier rod. One ‘point’ in
Fig. 9.6 represents the end of an amplifier rod while the other represents the end of a thin flash-lamp
tube.
9.3 Paraxial Rays and ABCD Matrices 235
\[ \tan\theta \cong \theta \] (9.24)
Here, the angle θ (in radians) represents the angle that a particular ray makes
with respect to the optical axis. There is an important mathematical reason for
this approximation. The sine is a nonlinear function, but at small angles it is
approximately linear and can be represented by its argument. It is this linearity
that is crucial to the process of forming images. The linearity also greatly simplifies
the formulation since it reduces the problem to linear algebra. Conveniently, we
will be able to keep track of imaging effects with a 2×2 matrix formalism.
Figure 9.7 The behavior of a ray as light traverses a distance d.
Consider a ray propagating in the y–z plane where the optical axis is in the
z-direction. Let us specify a ray at position $z_1$ by two coordinates: the displacement
from the axis $y_1$ and the orientation angle $\theta_1$ (see Fig. 9.7). If the index is uniform
everywhere, the ray travels along a straight path. It is straightforward to predict the
coordinates of the same ray downstream, say at $z_2$. First, since the ray continues
in the same direction, we have
θ2 = θ1 (9.25)
By referring to Fig. 9.7 we can write $y_2$ in terms of $y_1$ and $\theta_1$:
\[ y_2 = y_1 + d \tan\theta_1 \] (9.26)
In the paraxial approximation (9.24), this becomes
\[ y_2 = y_1 + \theta_1 d \] (9.27)
Example 9.4
Let the distance d be subdivided into two distances, a and b, such that d = a + b.
Show that an application of the ABCD matrix for distance a followed by an
application of the ABCD matrix for b renders the same result as an application of the
ABCD matrix for distance d.
where the subscript “mid” refers to the ray in the middle position after traversing
the distance a. If we combine the equations, we get
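\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \] (9.30)
which is in agreement with (9.28) since the ABCD matrix for the entire displacement is
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & a + b \\ 0 & 1 \end{bmatrix} \] (9.31)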
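The composition in (9.30) and (9.31) can be reproduced with a few lines of Python. This is a minimal sketch using plain nested lists; the helper names `matmul`, `apply`, and `translate` and the numerical values are assumptions introduced here for illustration.

```python
def matmul(M, N):
    """Product M N of two 2x2 matrices stored as nested lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(M, ray):
    """Apply an ABCD matrix to a ray given as (y, theta)."""
    y, theta = ray
    return (M[0][0] * y + M[0][1] * theta, M[1][0] * y + M[1][1] * theta)


def translate(d):
    """ABCD matrix for propagation through a distance d in a uniform index."""
    return [[1.0, d], [0.0, 1.0]]


a, b = 0.3, 0.7  # assumed distances (arbitrary units)

# The matrix for the first effect (distance a) sits on the right.
two_steps = matmul(translate(b), translate(a))
one_step = translate(a + b)

ray_in = (0.01, 0.02)  # assumed displacement and angle (radians)
print(apply(two_steps, ray_in), apply(one_step, ray_in))
```

Both routes give the same ray, which is the statement of Example 9.4.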
6 P. W. Milonni and J. H. Eberly, Lasers, Sect. 14.2 (New York: Wiley, 1988).
9.4 Reflection and Refraction at Curved Surfaces 237
y2 = y1 (9.32)
where θi is the angle of incidence with respect to the normal to the spherical
mirror surface. By the law of reflection, the incident and reflected ray both occur
at an angle θi referenced to the surface normal. The surface normal points towards
the center of curvature of the mirror surface, which we assume is on the z-axis a
distance R away. By convention, the radius of curvature R is a positive number
if the mirror surface is concave and a negative number if the mirror surface is
convex.
\[ \phi = \theta_1 + \theta_i \] (9.35)
and when this is combined with (9.34), we get
\[ \theta_i = \frac{y_1}{R} - \theta_1 \] (9.36)
With this we are able to put (9.33) into a useful linear form:
\[ \theta_2 = -\frac{2}{R}\, y_1 + \theta_1 \] (9.37)
Figure 9.8 A ray depicted in the act of reflection from a spherical surface.
Equations (9.32) and (9.37) describe a linear transformation that can be con-
cisely formulated as
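\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \] (9.38) (ABCD matrix for a curved mirror)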
The ABCD matrix in this transformation describes the act of reflection from a
concave mirror with radius of curvature R. The radius R is negative when the
mirror is convex.
The final basic element that we shall consider is a spherical interface between
two materials with indices n i and n t (see Fig. 9.9). This has an effect similar to
that of the curved mirror, which changes the direction of a ray without altering
its distance y 1 from the optical axis. Please note that here the radius of curvature
is considered to be positive for a convex surface (opposite convention from that
of the mirror). In this way, if the lower index is on the left, a positive radius R for
either the interface or the mirror tends to deflect rays towards the axis. Again, we
are interested only in the act of transmission without any travel before or after
the interface. As before, (9.32) applies (i.e. y 2 = y 1 ).
At the interface, the rays obey Snell’s law, which in the paraxial approximation
is written
\[ n_i \theta_i = n_t \theta_t \] (9.39)
The angles θi and θt are referenced from the surface normal, as seen in Fig. 9.9.
and
θt = θ2 + φ (9.41)
where φ is the angle that the surface normal makes with the z-axis. As before (see
(9.34)), within the paraxial approximation we may write
\[ \phi \cong y_1 / R \]
When this is used in (9.40) and (9.41), which are substituted into (9.39), Snell’s law
becomes
\[ \theta_2 = \left( \frac{n_i}{n_t} - 1 \right) \frac{y_1}{R} + \frac{n_i}{n_t}\, \theta_1 \] (9.42)
surfaces is ignored. (See P9.6 for the more general case of a thick lens.)
Mirror: $1/f = 2/R$
Thick lens:
\[ \begin{bmatrix} 1 + \frac{d}{R_1} \left( \frac{1}{n} - 1 \right) & \frac{d}{n} \\ (1 - n) \left( \frac{1}{R_1} - \frac{1}{R_2} \right) + \frac{d}{R_1 R_2} \left( 2 - n - \frac{1}{n} \right) & 1 - \frac{d}{R_2} \left( \frac{1}{n} - 1 \right) \end{bmatrix} \]
Solution: A thin lens is depicted in Fig. 9.10. $R_1$ is the radius of curvature for the
first surface (which is positive if convex as drawn), and $R_2$ is the radius of curvature
for the second surface (which is negative as drawn). For either surface, the radius
The matrix for the first interface is written on the right, where it operates first on
an incoming ray vector. In this case, n i = 1 and n t = n. The matrix for the second
surface is written on the left so that it operates afterwards. For the second surface,
n i = n and n t = 1.
Notice the close similarity between the ABCD matrix for a thin lens (9.44) and
the ABCD matrix for a curved mirror (9.38). The ABCD matrix for either the thin
lens or the mirror can be written as
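\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \] (9.45)
where in the case of the thin lens the focal length is given by the lens maker’s
formula
\[ \frac{1}{f} = (n - 1) \left( \frac{1}{R_1} - \frac{1}{R_2} \right) \] (focal length of thin lens) (9.46)
Figure 9.10 Thin lens.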
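The lens maker's formula can be checked by multiplying two interface matrices directly. In the sketch below, the interface matrix is written from the linear relation (9.42) together with y2 = y1; the helper names and the values n = 1.5, R1 = 0.20, R2 = −0.20 are assumptions for illustration.

```python
def matmul(M, N):
    """Product M N of two 2x2 matrices stored as nested lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def interface(n_i, n_t, R):
    """Paraxial refraction at a spherical interface, read off from (9.42)."""
    return [[1.0, 0.0], [(n_i / n_t - 1.0) / R, n_i / n_t]]


# Assumed illustrative lens: index 1.5, first surface convex, second as in Fig. 9.10.
n, R1, R2 = 1.5, 0.20, -0.20

# First surface on the right (air to glass), second on the left (glass to air).
lens = matmul(interface(n, 1.0, R2), interface(1.0, n, R1))

inv_f = (n - 1.0) * (1.0 / R1 - 1.0 / R2)  # lens maker's formula (9.46)
print(lens, -inv_f)  # the lower-left element should equal -1/f
```

With zero thickness between the surfaces, the product has the form (9.45) with C = −1/f.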
Example 9.6
Derive the ABCD matrix for a window with thickness d and index n.
Solution: We can again take advantage of the ABCD matrix for a curved interface
(9.43), only in this problem we will let R 1 = ∞ and R 2 = ∞ to provide flat surfaces.
We take the index outside of the window to be unity and the index inside the
window to be n. We use the ABCD matrix (9.43) twice, once for each interface,
sandwiching matrix (9.31), which endows the window with thickness:
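\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & n \end{bmatrix} \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1/n \end{bmatrix} = \begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix} \] (window) (9.48)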
As far as rays are concerned, a window is effectively shorter to traverse than free
space.7 Fig. 9.11 illustrates why this is the case. The displacement of the exiting ray
is not as great as it would have been without the window. The window impedes
the rate at which the ray can move away from or toward the optical axis.
Figure 9.11 Window.
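The window result (9.48) can be reproduced numerically by letting the radius of each interface go to infinity. This is a sketch only; the helper names and the values n = 1.5 and d = 0.01 are assumed for illustration.

```python
import math


def matmul(M, N):
    """Product M N of two 2x2 matrices stored as nested lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def translate(d):
    """Propagation through a distance d."""
    return [[1.0, d], [0.0, 1.0]]


def interface(n_i, n_t, R):
    """Paraxial refraction at a spherical interface, read off from (9.42)."""
    return [[1.0, 0.0], [(n_i / n_t - 1.0) / R, n_i / n_t]]


n, d = 1.5, 0.01  # assumed index and thickness

# Flat surfaces: R -> infinity, so the lower-left element of each interface is zero.
window = matmul(interface(n, 1.0, math.inf),
                matmul(translate(d), interface(1.0, n, math.inf)))

theta_1 = 0.05  # assumed ray angle in radians
shift_window = window[0][1] * theta_1   # displacement gained inside the window
shift_free = translate(d)[0][1] * theta_1  # displacement over the same length of free space
print(window, shift_window, shift_free)
```

The upper-right element comes out as d/n, so the ray's displacement grows more slowly inside the window than over the same distance of free space.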
Example 9.7
Find the ray $\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}$ that results when $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$ propagates through a distance a, reflects from
a mirror of radius R, and then propagates through a distance b. See Fig. 9.12.
Solution: The final ray in terms of the initial one is computed as follows:
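\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix} \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \]
\[ = \begin{bmatrix} 1 - 2b/R & a + b - 2ab/R \\ -2/R & 1 - 2a/R \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \] (9.49)
\[ = \begin{bmatrix} (1 - 2b/R)\, y_1 + (a + b - 2ab/R)\, \theta_1 \\ (-2/R)\, y_1 + (1 - 2a/R)\, \theta_1 \end{bmatrix} \]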
As always, the ordering of the matrices is important. The first effect that the ray
experiences is represented by the matrix on the right, which is in the position that
first operates on $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$.
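The matrix product in Example 9.7 is easy to verify numerically. In the sketch below, the helper names and the values a = 0.4, b = 0.6, R = 1.0 are assumptions chosen for illustration; the closed form is the matrix appearing in (9.49).

```python
def matmul(M, N):
    """Product M N of two 2x2 matrices stored as nested lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def translate(d):
    """Propagation through a distance d."""
    return [[1.0, d], [0.0, 1.0]]


def mirror(R):
    """Reflection from a curved mirror of radius R, as in (9.38)."""
    return [[1.0, 0.0], [-2.0 / R, 1.0]]


a, b, R = 0.4, 0.6, 1.0  # assumed distances and mirror radius

# Rightmost matrix acts first: travel a, reflect, then travel b.
system = matmul(translate(b), matmul(mirror(R), translate(a)))

closed_form = [[1 - 2 * b / R, a + b - 2 * a * b / R],
               [-2 / R, 1 - 2 * a / R]]
print(system)
print(closed_form)
```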
We have derived our basic ABCD matrices for rays traveling in the y–z plane,
as suggested in Figs. 9.7–9.12. This may have given the impression that it is
necessary to work within a plane that contains the optical axis (i.e. the z-axis
in our case). However, within the paraxial approximation, the ABCD matrices
are valid for rays that become displaced simultaneously in both the x and y
dimensions while propagating along z.
As we demonstrate below, the behavior of rays functions independently in
the x and y dimensions. If desired, one can write a ray vector for each dimension,
namely $\begin{bmatrix} x \\ \theta_x \end{bmatrix}$ and $\begin{bmatrix} y \\ \theta_y \end{bmatrix}$. Moreover, the identical matrices, for example any
in table 9.1, are used for either dimension. Figs. 9.7–9.12 therefore represent
projections of rays onto the y–z plane. To complete the story, one can imagine
corresponding figures representing the projection of the rays onto the x–z plane.
Imagine a ray contained within a plane that is parallel to the y–z plane but for
which x > 0. One might be concerned that when the ray meets, for example, a
spherically concave mirror, the radius of curvature in the perspective of the y–z
dimension might be different for x > 0 than for x = 0 (at the center of the mirror).
This concern is actually quite legitimate and is the source of what is known as
spherical aberration. Nevertheless, in the paraxial approximation the intersection
with the curved mirror of all planes that are parallel to the optical axis gives the
same curve.
To see why this is so, consider the curvature of the mirror in Fig. 9.8. As we
move away from the mirror center (in the x or y-dimension or some combination
thereof), the mirror surface deviates to the left by the amount
\[ \delta = R - R \cos\phi \] (9.50)
In the paraxial approximation, we have $\cos\phi \cong 1 - \phi^2/2$. And since in this
approximation we may also write $\phi \cong \sqrt{x^2 + y^2}/R$, (9.50) becomes
\[ \delta \cong \frac{x^2}{2R} + \frac{y^2}{2R} \] (9.51)
In the paraxial approximation, we see that the curve of the mirror is parabolic, and
therefore separable between the x and y dimensions. That is, the curvature in
the x-dimension (i.e. ∂δ/∂x = x/R) is independent of y, and the curvature in the
y-dimension (i.e. ∂δ/∂y = y/R) is independent of x. A similar argument can be
made for a spherical interface between two media.

Galileo di Vincenzo Bonaiuti de' Galilei (1564–1642, Italian) was born in Pisa, Italy,
the son of a musician. Galileo enrolled in the University of Pisa with the intent to
study medicine but soon became diverted into mathematics. He served three years as
chair of mathematics in Pisa beginning in 1589 and then moved to the University of
Padua where he taught geometry, mechanics, and astronomy for two decades. While
Galileo did not invent the telescope, he considerably improved the design. With it he
discovered four moons of Jupiter and was the first to observe sunspots and mountains
and valleys on the Moon. Galileo also was the first to document the phases of Venus,
similar to the phases of the moon. He used these observations to argue in favor of the
Copernican model of the solar system, but this conflicted with the prevailing views of
the Catholic Church at the time, and he was placed under house arrest and forbidden
to publish any of his works. While under house arrest, he wrote much on kinematics
and other principles of physics and is considered to be the father of modern physics.
Galileo attempted to measure the speed of light by observing an assistant uncover a
lantern on a distant hill in response to a light signal. He concluded that light is really
fast if not instantaneous. (Wikipedia)

9.6 Image Formation
Consider Example 9.7 where a ray travels a distance a, reflects from a curved
mirror, and then travels a distance b. From (9.49), the ABCD matrix for the overall
process is
\[ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 - b/f & a + b - ab/f \\ -1/f & 1 - a/f \end{bmatrix} \] (9.52)
where by (9.47) we have replaced 2/R with 1/ f . Because of the similarity between
the behavior of a curved mirror and a thin lens, the above expression can also
represent a ray traveling a distance a, traversing a thin lens with focal length f ,
and then traveling a distance b. The only difference is that, in the case of the thin
lens, f is given by the lens maker’s formula (9.46).
As is well known, it is possible to form an image with either a curved mirror
or a lens. Suppose that the initial ray is one of many rays that leaves a particular
point on an object positioned a = d o before the mirror (or lens). In order for an
image to occur at d i = b, it is essential that all rays leaving the particular point on
the object converge to a corresponding point on the image. That is, we want rays
leaving the point y 1 on the object (which may take on a range of angles θ1 ) all to
converge to a single point y 2 at the image. In the following equation we need y 2
to be independent of θ1 :
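\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} A y_1 + B \theta_1 \\ C y_1 + D \theta_1 \end{bmatrix} \] (9.53)
Figure 9.13 Image formation by a thin lens.
This requires B = 0. With $a = d_o$ and $b = d_i$ in (9.52), the condition becomes
\[ d_o + d_i - \frac{d_o d_i}{f} = 0 \quad \Rightarrow \quad \frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \] (9.55)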
which is the familiar imaging formula (9.1). When the object is infinitely far away
(i.e. d o → ∞), the image appears at d i → f . This gives a physical interpretation
to the focal length f , as we have been calling it. Please note that d o and d i can
each be either positive (real as depicted in Fig. 9.13) or negative (virtual meaning
a screen cannot be inserted to display the image).
The magnification of the image is found by comparing the size of y 2 to y 1 .
From (9.52)–(9.55), the magnification is found to be
\[ M \equiv \frac{y_2}{y_1} = A = 1 - \frac{d_i}{f} = -\frac{d_i}{d_o} \] (9.56)
The negative sign indicates that for positive distances d o and d i the image is
inverted.
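A short numerical illustration of the imaging condition follows. The function names and the values f = 0.10 and d_o = 0.30 are assumptions for the sake of example. The sketch builds the matrix (9.52) with a = d_o and b = d_i from (9.55), and checks that B vanishes and that A matches the magnification (9.56).

```python
def imaging_matrix(a, b, f):
    """Overall ABCD matrix (9.52): distance a, then lens or mirror f, then distance b."""
    return [[1 - b / f, a + b - a * b / f],
            [-1 / f, 1 - a / f]]


def image_distance(d_o, f):
    """Solve the imaging formula 1/f = 1/d_o + 1/d_i for d_i."""
    return 1.0 / (1.0 / f - 1.0 / d_o)


f, d_o = 0.10, 0.30  # assumed focal length and object distance
d_i = image_distance(d_o, f)

system = imaging_matrix(d_o, d_i, f)
magnification = system[0][0]  # A, per (9.56)

print(d_i, system[0][1], magnification, -d_i / d_o)
```

For these numbers the image forms at d_i = 0.15 with magnification −0.5, so the image is inverted and half the size of the object.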
In the above discussion, we have examined image formation by a thin lens
or a curved mirror. Of course, images can also be formed by thick lenses or by
more complex composite optical systems (e.g. a system of lenses and spaces).
The ABCD matrices for the elements in a composite system are simply multiplied
together (the first element that rays encounter appearing on the right) to obtain an
overall ABCD matrix. The principles for image formation with an arbitrary ABCD
matrix are the same as those for a thin lens or curved mirror. As before, consider
propagation a distance d o from an object to the optical element followed by
propagation a distance d i to an image. The ABCD matrix for the overall operation
is
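\[ \begin{bmatrix} 1 & d_i \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} 1 & d_o \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} A + d_i C & d_o A + B + d_o d_i C + d_i D \\ C & d_o C + D \end{bmatrix} = \begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix} \] (9.57)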
Example 9.8
Beginning students are often taught to draw ray diagrams such as the one in Fig.
9.14, which shows a real image formed by a thin lens. Several key rays aid in a
graphic prediction of the location and size of the image. Use ABCD-matrix analysis
to describe the effect of the lens on the three rays drawn.
[Figure 9.14: ray diagram with labels object, A, C, image]
Solution: Ray A is parallel to the axis with height y 1 before traversing the lens. Just
after the lens, ray A is described by
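\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ 0 \end{bmatrix} = \begin{bmatrix} y_1 \\ -y_1/f \end{bmatrix} \]
which crosses the axis at the focus d = f, since
\[ \begin{bmatrix} 1 & f \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ -y_1/f \end{bmatrix} = \begin{bmatrix} 0 \\ -y_1/f \end{bmatrix}. \]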
Meanwhile, ray B traverses the lens just where it crosses the axis. The lens
does nothing to this ray:
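\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \begin{bmatrix} 0 \\ -y_1/d_o \end{bmatrix} = \begin{bmatrix} 0 \\ -y_1/d_o \end{bmatrix} \]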
Ray B is un-deflected.
Finally, ray C, which goes through the point d = − f before the lens, becomes
parallel to the axis following the lens:
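\[ \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \begin{bmatrix} -M y_1 \\ -M y_1/f \end{bmatrix} = \begin{bmatrix} -M y_1 \\ 0 \end{bmatrix} \]
Note that starting from the left focus, we have just before the lens
\[ \begin{bmatrix} 1 & f \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ -M y_1/f \end{bmatrix} = \begin{bmatrix} -M y_1 \\ -M y_1/f \end{bmatrix} \]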
Our task is to find the values of p 1 and p 2 that make (9.60) true. We can straight-
away make the definition
f eff ≡ −1/C (9.61)
We can also solve for p 1 and p 2 by setting the diagonal elements of the matrix to 1.
Explicitly, we get
\[ p_1 C + D = 1 \quad \Rightarrow \quad p_1 = \frac{1 - D}{C} \] (9.62)
and
\[ A + p_2 C = 1 \quad \Rightarrow \quad p_2 = \frac{1 - A}{C} \] (9.63)
It remains to be shown that the upper right element in (9.60) (i.e. p 1 A + B +
p 1 p 2C + p 2 D) automatically goes to zero for our choices of p 1 and p 2 . This may
seem unlikely at first, but watch what happens!
When (9.62) and (9.63) are substituted into the upper right matrix element of (9.60)
we get
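\[ p_1 A + B + p_1 p_2 C + p_2 D = \frac{1 - D}{C} A + B + \frac{1 - D}{C} \frac{1 - A}{C} C + \frac{1 - A}{C} D \]
\[ = \frac{1}{C} \left[ 1 - AD + BC \right] = \frac{1}{C} \left( 1 - \begin{vmatrix} A & B \\ C & D \end{vmatrix} \right) \] (9.64)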
This vanishes (as desired) if the determinant of the original ABCD matrix equals
one. Fortunately, this is always the case as long as we begin and end in the same
index of refraction:
\[ \begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1 \] (9.65)
Notice that the determinants of all of the matrices in table 9.1 are one. Moreover,
ABCD matrices constructed of these will also have determinants equal to one.10
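The algebra in (9.61) through (9.65) can be tried out on a concrete system. The sketch below is illustrative only: the helper names and the thick-lens parameters (n = 1.5, R1 = 0.20, R2 = −0.20, d = 0.02) are assumptions. It builds a thick lens from two interfaces and a translation, computes f_eff, p1, and p2, and checks that propagation by p1 before and p2 after the element reduces it to thin-lens form.

```python
def matmul(M, N):
    """Product M N of two 2x2 matrices stored as nested lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def translate(d):
    """Propagation through a distance d."""
    return [[1.0, d], [0.0, 1.0]]


def interface(n_i, n_t, R):
    """Paraxial refraction at a spherical interface, read off from (9.42)."""
    return [[1.0, 0.0], [(n_i / n_t - 1.0) / R, n_i / n_t]]


def principal_planes(M):
    """Return (f_eff, p1, p2) from (9.61), (9.62), and (9.63)."""
    (A, B), (C, D) = M
    return -1.0 / C, (1.0 - D) / C, (1.0 - A) / C


# Assumed thick lens: first surface on the right, second on the left.
n, R1, R2, d = 1.5, 0.20, -0.20, 0.02
thick = matmul(interface(n, 1.0, R2), matmul(translate(d), interface(1.0, n, R1)))

determinant = thick[0][0] * thick[1][1] - thick[0][1] * thick[1][0]
f_eff, p1, p2 = principal_planes(thick)

# Same structure as (9.57) with d_o -> p1 and d_i -> p2.
reduced = matmul(translate(p2), matmul(thick, translate(p1)))
print(determinant, f_eff, p1, p2)
print(reduced)  # expected form: [[1, 0], [-1/f_eff, 1]]
```

Because the lens begins and ends in the same index, the determinant is one and the upper-right element of the reduced matrix vanishes, as (9.64) predicts.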
However, when this matrix is used in succession to form a lens, the resulting matrix has determinant
equal to one.
11 P. W. Milonni and J. H. Eberly, Lasers, Sect. 14.3 (New York: Wiley, 1988).
Solution: The round-trip ABCD matrix for the cavity shown in Fig. 9.16c is
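$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2/R_2 & 1 \end{bmatrix} \begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2/R_1 & 1 \end{bmatrix} \tag{9.66}$$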
where we have begun the round trip just after a reflection from the first mirror.
The round-trip ABCD matrix for the cavity shown in Fig. 9.16d is
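$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 2L_1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \begin{bmatrix} 1 & 2L_2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{9.67}$$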
where we have begun the round trip just after a transmission through the lens
moving to the right. It is somewhat arbitrary where a round trip begins. The
multiplication of the above matrices will need to be carried out to do problems
P9.15 and P9.16.
9.A Aberrations and Ray Tracing 247
At this point you might be concerned that taking an ABCD matrix to the N th
power can be a lot of work. (It is already significant work just to compute the
ABCD matrix for a single round trip.) In addition, we are interested in letting N
be very large, perhaps even infinity. You can relax because we have a neat trick to
accomplish this daunting task.
By Sylvester’s theorem in appendix 0.3, we have
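$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^N = \frac{1}{\sin\theta} \begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} \tag{9.69}$$
where
$$\cos\theta = \frac{1}{2}(A + D). \tag{9.70}$$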
This is valid as long as the determinant of the ABCD matrix is one. As noted
earlier (see (9.65)), we are in luck! The determinant is one any time a ray begins
and stops in the same refractive index, which by definition is guaranteed for any
round trip. We therefore can employ Sylvester’s theorem for any N that we might
choose, including very large integers.
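As a sanity check, (9.69) can be compared against direct matrix multiplication. This is a minimal sketch, assuming NumPy; the function name `sylvester_power` and the sample cavity numbers are hypothetical. The angle θ is computed as a complex number so that the same code also covers the case where (A + D)/2 lies outside the range −1 to 1. The formula breaks down when sin θ = 0, which the sketch does not handle.

```python
import numpy as np

def sylvester_power(abcd, n):
    """N-th power of a unit-determinant 2x2 matrix via (9.69) and (9.70)."""
    (a, b), (c, d) = abcd
    theta = np.arccos((a + d) / 2.0 + 0j)     # complex arccos allows |cos(theta)| > 1
    s_n = np.sin(n * theta) / np.sin(theta)
    s_nm1 = np.sin((n - 1) * theta) / np.sin(theta)
    result = np.array([[a * s_n - s_nm1, b * s_n],
                       [c * s_n, d * s_n - s_nm1]])
    return result.real                        # imaginary parts are rounding noise

def free_space(d):
    return np.array([[1.0, d], [0.0, 1.0]])

def mirror(radius):
    return np.array([[1.0, 0.0], [-2.0 / radius, 1.0]])

# Hypothetical two-mirror round trips in the form of (9.66), with R1 = R2 = 1.0.
stable_trip = free_space(0.5) @ mirror(1.0) @ free_space(0.5) @ mirror(1.0)
unstable_trip = free_space(2.5) @ mirror(1.0) @ free_space(2.5) @ mirror(1.0)

for trip in (stable_trip, unstable_trip):
    direct = np.linalg.matrix_power(trip, 7)
    print(np.allclose(sylvester_power(trip, 7), direct))
```

Both comparisons should agree. In the second case θ is imaginary, and the matrix elements grow rapidly with N.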
We would like the elements of (9.69) to remain finite as N becomes very large.
If this is the case, then we know that a ray remains trapped within the cavity
and stays reasonably close to the optical axis. Since N only appears within the
argument of a sine function, which is always bounded between −1 and 1 for
real arguments, it might seem that the elements of (9.69) always remain finite
as N approaches infinity. However, it turns out that θ can become imaginary
depending on the outcome of (9.70), in which case the sine becomes a hyperbolic
sine, which can ‘blow up’ as N becomes large. In the end, the condition for cavity
stability is that a real θ must exist for (9.70), or in other words we need
$$-1 < \frac{1}{2}(A + D) < 1 \qquad \text{(condition for a stable cavity)} \tag{9.71}$$
It is left as an exercise to apply this condition to (9.66) and (9.67) to find the
necessary relationships between the various element curvatures and spacing in
order to achieve cavity stability.
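The stability condition is easy to test numerically once a round-trip matrix is in hand. The following sketch, assuming NumPy, builds the round-trip matrix in the form of (9.66) and applies (9.71). The mirror radii and separations are hypothetical illustration values, and the function names are not from the text.

```python
import numpy as np

def free_space(d):
    """ABCD matrix for propagation through a distance d."""
    return np.array([[1.0, d], [0.0, 1.0]])

def mirror(radius):
    """ABCD matrix for reflection from a curved mirror of the given radius."""
    return np.array([[1.0, 0.0], [-2.0 / radius, 1.0]])

def round_trip(length, r1, r2):
    """Round-trip matrix for a two-mirror cavity, in the form of (9.66)."""
    return free_space(length) @ mirror(r2) @ free_space(length) @ mirror(r1)

def is_stable(abcd):
    """Condition (9.71): -1 < (A + D)/2 < 1."""
    return abs(abcd[0, 0] + abcd[1, 1]) / 2.0 < 1.0

r1, r2 = 0.5, 2.0                       # hypothetical mirror radii (m)
for length in (0.25, 1.0, 2.25, 3.0):   # hypothetical mirror separations (m)
    print(length, is_stable(round_trip(length, r1, r2)))
```

For these sample numbers the cavity should come out stable at the first and third separations and unstable at the other two, which illustrates that a stable range can be interrupted by an unstable one.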
(the pair is usually cemented together to form a “doublet” lens). The lens with the
shorter focal length is made of the glass whose index has the lesser dependence
on wavelength. By properly choosing the prescription of the two lenses, you
can exactly compensate for chromatic aberration at two wavelengths and do a
good job for a wide range of others. Achromatic doublets can also be designed to
minimize spherical aberration (see below), so they are often a good choice when
you need a high quality lens.
Monochromatic aberrations arise from the shape of the lens rather than the
variation of n with wavelength. Before computers facilitated the
widespread use of ray tracing, these aberrations had to be analyzed primarily
with analytic techniques. The analytic results derived previously in this chapter
were based on first order approximations (e.g. sin θ ≈ θ). This analysis predicts
that a lens can image a point source to an exact image point, which implies
spherically converging wavefronts at the image point as shown in Fig. 9.17(a). You
can increase the accuracy of the theory for non-paraxial rays by retaining second-
order correction terms in the analysis. With these second-order terms included,
the wave fronts converging towards an image point are mostly spherical, but have
second-order aberration terms added in (shown conceptually in Fig. 9.17(b)).
There are five aberration terms in this second-order analysis, and these represent
a convenient basis for discussing aberration.
The first aberration term is known as spherical aberration. This type of aberration
results from the fact that rays traveling through a spherical lens at large
radii experience a different focal length than those traveling near the axis. For a
converging lens, this causes wide-radius rays to focus before the near-axis rays
as shown in Fig. 9.20. This problem can be helped by orienting lenses so that
the face with the least curvature is pointed towards the side where the light rays
have the largest angle. This procedure splits the bending of rays more evenly
between the front and back surface of the lens. As mentioned above, you can also
cement two lenses made from different types of glass together so that spherical
aberrations from one lens are corrected by the other.
Figure 9.19 Chromatic aberration causes lenses to have different focal lengths for
different wavelengths. It can be corrected using an achromatic doublet lens.
(Labels in the figure: high dispersion glass, low dispersion glass.)
The aberration term referred to as astigmatism occurs when an off-axis object
point is imaged to an off-axis image point. In this case a spherical lens has a
different focal length in the horizontal and vertical dimensions. For a focusing
lens this causes the two dimensions to focus at different distances, producing a
vertical line at one image plane and a horizontal line at another. A lens can also be
inherently astigmatic even when viewed on axis if it is football shaped rather than
spherical. In this case, the astigmatic aberration can be corrected by inserting a
cylindrical lens at the correct orientation (this is a common correction needed in
eyeglasses).
A third aberration term is referred to as coma. This is observed when off-axis
points are imaged and produces a comet-shaped tail with its head at the point
predicted by paraxial theory. (The term ‘coma’ refers to the atmosphere of a
comet, which is how the aberration got its name.) This aberration is distinct from
astigmatism, which is also observed for off-axis points, since coma is observed
even when all of the rays are in one plane (see Fig. 9.21). You have probably seen
coma if you’ve ever played with a magnifying glass in the sun: just tilt the lens
slightly and you see a comet-like image rather than a point.
Figure 9.20 Spherical aberration in a plano-convex lens.
Figure 9.21 Illustration of coma. Rays traveling through the center of the lens are
imaged to point a as predicted by paraxial theory. Rays that travel through the lens at
radius ρ_b in the plane of the figure are imaged to point b. Rays that travel through the
lens at radius ρ_b, but outside the plane of the figure, are imaged to other points on the
circle (in the image plane) containing point b. Rays that travel through the lens at
other radii (e.g. ρ_c) also form circles in the image plane with radius proportional
to ρ² with the center offset from point a a distance proportional to ρ². When
light from each of these circles combines on the screen it produces an imaged point
with a “comet tail.”
The curvature of the field aberration term arises from the fact that spherical
lenses image spherical surfaces to another spherical surface, rather than imaging
a plane to a plane. This is not so bad for your eyeball, which has a curved screen,
but for things like cameras and movie projectors we would like to image to a flat
screen. When a flat screen is used and the curvature of the field aberration is
present, the image will be focused well near the center, but become progressively
out of focus as you move to the edge of the screen (i.e. the flat screen is farther
from the curved image surface as you move from the center).
The final aberration term is referred to as distortion. This aberration occurs
when the magnification of a lens depends on the distance from the center of
the screen. If magnification decreases as the distance from the center increases,
then ‘barrel’ distortion is observed. When magnification increases with distance,
‘pincushion’ distortion is observed (see Fig. 9.22).
Figure 9.22 Distortion occurs when magnification is not constant across an extended
image. (Panels: undistorted, barrel distortion, pincushion distortion.)
All lenses will exhibit some combination of the aberrations listed above (i.e.
chromatic aberration plus the five second-order aberration terms). In addition to
the five named monochromatic aberrations, there are many other higher order
aberrations that also have to be considered. Aberrations can be corrected to a high
degree with multiple-element systems (designed using ray-tracing techniques)
composed of lenses and irises to eliminate off-axis light. For example, a camera
lens with a focal length of 50 mm, one of the simplest lenses in photography, is
typically composed of about six individual elements. However, optical systems
never completely eliminate all aberration, so designing a system always involves
some degree of compromise in choosing which aberrations to minimize and
which ones you can live with.
Exercises
P9.1 Consider the index described in Example 9.1. The solution given in
the example corresponds to rays that asymptotically approach y = 0. A
more general solution is given by
$$\nabla R = n_0\left(\hat{\mathbf{x}}\sqrt{1+\alpha} \pm \hat{\mathbf{y}}\sqrt{y^2/h^2 - \alpha}\right) \qquad \left(1+\alpha > 0 \ \text{and}\ y^2/h^2 - \alpha > 0\right)$$
This corresponds to rays that either hit the ground or return toward the
sky without reaching the ground, depending on the sign of α.
(a) Verify that ∇R satisfies the eikonal equation and determine the
function R(x, y).
HINT: $\int d\xi\,\sqrt{\xi^2 - \alpha} = \frac{\xi}{2}\sqrt{\xi^2 - \alpha} - \frac{\alpha}{2}\ln\left(\xi + \sqrt{\xi^2 - \alpha}\right)$ $\left(\xi^2 - \alpha > 0\right)$.
(b) Verify that the light path is given by $y = h\sqrt{\alpha}\,\cosh\left(\frac{x - x_0}{h\sqrt{1+\alpha}}\right)$ when
α > 0 and is given by $y = h\sqrt{|\alpha|}\,\sinh\left|\frac{x - x_0}{h\sqrt{1+\alpha}}\right|$ when α < 0. Consider only
the region y > 0 (i.e. above ground). Notice that these solutions can
make rays that travel either to the right or to the left.
HINT: $\cosh^2\xi - \sinh^2\xi = 1$, $\frac{d}{d\xi}\cosh\xi = \sinh\xi$, $\frac{d}{d\xi}\sinh\xi = \cosh\xi$.
(c) Make a sketch of these two solution classes in the case of α = ±4.
P9.2 Prove that under the approximation of very short wavelength, the
Poynting vector is directed along ∇R (r) or ŝ.
Solution: (partial)
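First, from Faraday’s law (1.36) we have
$$\begin{aligned} \mathbf{B}(\mathbf{r},t) &= \frac{i}{\omega}\nabla\times\left[\mathbf{E}_0(\mathbf{r})\,e^{i(k_{\mathrm{vac}}R(\mathbf{r})-\omega t)}\right] \\ &= \frac{i}{\omega}\left(e^{i(k_{\mathrm{vac}}R(\mathbf{r})-\omega t)}\left[\nabla\times\mathbf{E}_0(\mathbf{r})\right] + i k_{\mathrm{vac}}\,e^{i(k_{\mathrm{vac}}R(\mathbf{r})-\omega t)}\left[\nabla R(\mathbf{r})\times\mathbf{E}_0(\mathbf{r})\right]\right) \\ &= \frac{i\lambda_{\mathrm{vac}}}{2\pi c}\,e^{i[k_{\mathrm{vac}}R(\mathbf{r})-\omega t]}\left[\nabla\times\mathbf{E}_0(\mathbf{r})\right] - \frac{1}{c}\,e^{i[k_{\mathrm{vac}}R(\mathbf{r})-\omega t]}\left[\nabla R(\mathbf{r})\times\mathbf{E}_0(\mathbf{r})\right] \end{aligned}$$
The first term vanishes in the limit of very short wavelength, and we have:
$$\mathbf{B}(\mathbf{r},t) \to -\frac{1}{c}\left[\nabla R(\mathbf{r})\right]\times\mathbf{E}_0(\mathbf{r})\,e^{i[k_{\mathrm{vac}}R(\mathbf{r})-\omega t]}. \tag{9.72}$$
Next, from Gauss’s law (1.34) and the constitutive relation (2.16) we have
$$\nabla\cdot\left[\left(1+\chi(\mathbf{r})\right)\mathbf{E}_0(\mathbf{r})\,e^{i(k_{\mathrm{vac}}R(\mathbf{r})-\omega t)}\right] = 0$$
$$e^{i(k_{\mathrm{vac}}R(\mathbf{r})-\omega t)}\,\nabla\cdot\left[\left(1+\chi(\mathbf{r})\right)\mathbf{E}_0(\mathbf{r})\right] + i k_{\mathrm{vac}}\,e^{i(k_{\mathrm{vac}}R(\mathbf{r})-\omega t)}\left(1+\chi(\mathbf{r})\right)\left[\nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r})\right] = 0$$
Canceling the common exponential term, using k_vac = 2π/λ_vac, and some algebra then gives
$$-\frac{i\lambda_{\mathrm{vac}}}{2\pi}\,\frac{\nabla\cdot\left[\left(1+\chi(\mathbf{r})\right)\mathbf{E}_0(\mathbf{r})\right]}{1+\chi(\mathbf{r})} + \nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r}) = 0$$
$$\mathbf{S} = \frac{1}{\mu_0}\,\mathrm{Re}\left\{\mathbf{E}(\mathbf{r},t)\right\}\times\mathrm{Re}\left\{\mathbf{B}(\mathbf{r},t)\right\} = \frac{1}{4\mu_0}\left[\mathbf{E}(\mathbf{r},t)+\mathbf{E}^*(\mathbf{r},t)\right]\times\left[\mathbf{B}(\mathbf{r},t)+\mathbf{B}^*(\mathbf{r},t)\right]$$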
You will need to employ expressions (9.72) and (9.73), as well as the BAC-CAB rule (see P0.3).
P9.3 Use Fermat’s Principle to derive the law of reflection (3.6) for a reflective
surface.
HINT: Do not consider light that goes directly from A to B; require a
single bounce.
Figure 9.23 (points A and B)
P9.4 Show that Fermat’s Principle fails to give the correct path for an extraordinary
ray entering a uniaxial crystal whose optic axis is perpendicular
to the surface.
HINT: With the index given by (5.29), show that Fermat’s principle
leads to an answer that neither agrees with the direction of the k-vector
(5.32) nor with the direction of the Poynting vector (5.40).
P9.5 Derive the ABCD matrix that takes a ray on a round trip through a
simple laser cavity consisting of a flat mirror and a concave mirror of
radius R separated by a distance L. HINT: Start at the flat mirror. Use
the matrix in (9.28) to travel a distance L. Use the matrix in (9.38) to
represent reflection from the curved mirror. Then use the matrix in
(9.28) to return to the flat mirror. The matrix for reflection from the flat
mirror is the identity matrix (i.e. R flat → ∞).
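P9.6 Derive the ABCD matrix for a thick lens made of material n_2 surrounded
by a liquid of index n_1. Let the lens have curvatures R_1 and R_2
and thickness d.
Answer:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 + \dfrac{d}{R_1}\left(\dfrac{n_1}{n_2} - 1\right) & d\,\dfrac{n_1}{n_2} \\[2ex] -\left(\dfrac{n_2}{n_1} - 1\right)\left(\dfrac{1}{R_1} - \dfrac{1}{R_2}\right) + \dfrac{d}{R_1 R_2}\left(2 - \dfrac{n_1}{n_2} - \dfrac{n_2}{n_1}\right) & 1 - \dfrac{d}{R_2}\left(\dfrac{n_1}{n_2} - 1\right) \end{bmatrix}$$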
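The quoted answer can be checked against a direct product of three matrices. This is a minimal sketch, assuming NumPy and assuming that the curved-interface matrix from the chapter's table has rows (1, 0) and ((n_i/n_t − 1)/R, n_i/n_t); that form, the function names, and the sample numbers are assumptions made for illustration.

```python
import numpy as np

def curved_interface(n_i, n_t, radius):
    """Assumed ABCD matrix for refraction at a curved surface (index n_i to n_t)."""
    return np.array([[1.0, 0.0], [(n_i / n_t - 1.0) / radius, n_i / n_t]])

def free_space(d):
    """ABCD matrix for propagation through a distance d."""
    return np.array([[1.0, d], [0.0, 1.0]])

def thick_lens_closed_form(n1, n2, r1, r2, d):
    """Thick-lens matrix as quoted in the answer to P9.6."""
    a = 1.0 + (d / r1) * (n1 / n2 - 1.0)
    b = d * n1 / n2
    c = (-(n2 / n1 - 1.0) * (1.0 / r1 - 1.0 / r2)
         + d / (r1 * r2) * (2.0 - n1 / n2 - n2 / n1))
    dd = 1.0 - (d / r2) * (n1 / n2 - 1.0)
    return np.array([[a, b], [c, dd]])

# Hypothetical lens: glass (n2) in water (n1), biconvex, 2 cm thick.
n1, n2, r1, r2, d = 1.33, 1.50, 0.20, -0.30, 0.02

product = (curved_interface(n2, n1, r2) @ free_space(d)
           @ curved_interface(n1, n2, r1))
closed = thick_lens_closed_form(n1, n2, r1, r2, d)

print(np.allclose(product, closed))
print("determinant:", np.linalg.det(closed))
```

The same helper functions could be chained to handle a multi-element system one lens at a time.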
P9.7 (a) Show that the ABCD matrix for a thick lens (see P9.6) reduces to that
of a thin lens (9.45) when the thickness goes to zero. Take the index
outside of the lens to be n_1 = 1.
(b) Find the ABCD matrix for a thick window (thickness d). Take the
index outside of the window to be n_1 = 1. HINT: A window is a thick
lens with infinite radii of curvature.
Figure 9.24 Formation of a virtual image by a thin lens.
P9.8 An object is placed in front of a concave mirror. Find the location of
the image d_i and magnification M when d_o = R, d_o = R/2, d_o = R/4,
and d_o = −R/2 (virtual object). Make a diagram for each situation,
depicting rays traveling from a single off-axis point on the object to
a corresponding point on the image. You may want to emphasize
especially the ray that initially travels parallel to the axis and the ray
that initially travels in a direction intersecting the axis at the focal point
R/2.
P9.9 Perform an analysis similar to example 9.8 for the virtual image formed
by the positive lens in Fig. 9.24.
P9.10 Perform an analysis similar to example 9.8 for the virtual image formed
by the negative lens in Fig. 9.25.
Figure 9.25 Formation of a virtual image by a thin lens with negative focal length.
L9.13 Deduce the positions of the principal planes and the effective focal
length of a compound lens system. Reference the positions of the
principal planes to the outside ends of the metal hardware that encloses
the lens assembly. (video)
HINT: Obtain three sets of distances to the object and image planes
and place the data into (9.58) to create three distinct equations for the
unknowns A, B, C, and D. Find A, B, and C in terms of D and place the
results into (9.65) to obtain the values for A, B, C, and D. The effective
focal length and principal planes can then be found through (9.61)–(9.63).
P9.14 Use a computer program to calculate the ABCD matrix for the com-
pound system shown in Fig. 9.29, known as the “Tessar lens.” The
details of this lens are as follows (all distances are in the same units,
and only the magnitude of curvatures are given—you decide the sign):
Convex-convex lens 1 (thickness 0.357, R 1 = 1.628, R 2 = 27.57, n =
1.6116) is separated by 0.189 from concave-concave lens 2 (thickness
0.081, R 1 = 3.457, R 2 = 1.582, n = 1.6053), which is separated by 0.325
from plano-concave lens 3 (thickness 0.217, R 1 = ∞, R 2 = 1.920, n =
1.5123), which is directly followed by convex-convex lens 4 (thickness
0.396, R 1 = 1.920, R 2 = 2.400, n = 1.6116).
HINT: You can reduce the number of matrices you need to multiply by
using the “thick lens” matrix.
Figure 9.29 (lens elements labeled 1 through 4)
P9.15 (a) Show that the cavity depicted in Fig. 9.16c is stable if
$$0 < \left(1 - \frac{L}{R_1}\right)\left(1 - \frac{L}{R_2}\right) < 1$$
(b) The two concave mirrors have radii R 1 = 60 cm and R 2 = 100 cm.
Over what range of mirror separation L is it possible to form a stable
laser cavity?
HINT: There are two different stable ranges with an unstable range
between them.
P9.16 Find the stable ranges for L 1 = L 2 = L for the laser cavity depicted in
Fig. 9.16d with focal length f = 50 cm.
L9.17 Experimentally determine the stability range of a HeNe laser with adjustable
end mirrors. Check that this agrees reasonably well with theory.
Can you think of reasons for any discrepancy? (video)
Chapter 10
Diffraction
In the 1600’s, Christiaan Huygens developed a wave description for light. Unfortunately,
his ideas were largely overlooked at the time because Sir Isaac Newton
promoted a competing theory. Newton proposed that light should be thought
of as many tiny bullets, or corpuscles, as he called them. Newton’s ideas pre-
vailed for more than a century, perhaps because he was right on so many other
things, until 1807 when Thomas Young performed his famous two-slit experiment,
conclusively demonstrating the wave nature of light. Even then, Young’s conclu-
sions were accepted only gradually by others, a notable exception being a young
Frenchman named Augustin Fresnel. The two formed a close friendship through
correspondence, and it was Fresnel that followed up on Young’s conclusions and
dedicated his life to a study of light.
Fresnel’s skill as a mathematician allowed him to transform physical intuition
into powerful and concise ideas. Perhaps Fresnel’s greatest accomplishment was
the adaptation of Huygens’ principle of wavelet superposition into a mathematical
formula. Ironically, he used Newton’s calculus to achieve this. Huygens’ principle
asserts that a wave front can be thought of as many wavelets, which propagate and
interfere to form new wave fronts. This is illustrated in Fig. 10.1. The phenomenon
of diffraction is then understood as the spilling of wavelets around obstructions
in the path of light.
After formulating Huygens’ principle as a diffraction integral, Fresnel made
an approximation to his own formula, called the Fresnel approximation, for the
sake of making the integration easier to perform. As far as approximations go,
the Fresnel approximation is surprisingly accurate in describing the light field
in the region down stream from an aperture. The diffraction pattern can evolve
in complicated ways as the distance from an aperture increases. At distances far
down stream from an aperture, the diffraction pattern acquires a final form that
no longer evolves, other than to grow in proportion to distance. This far-field
limit is often of interest, and it turns out that the Fresnel diffraction formula can
be simplified further in this case. The far-field limit of the Fresnel diffraction
formula is called the Fraunhofer approximation.
From the modern perspective, Fresnel’s diffraction formula needs justification
starting from Maxwell’s equations. The diffraction formula is based on scalar
diffraction theory, which ignores polarization effects. In some situations, ignoring
polarization is benign, but in other situations ignoring polarization effects
produces significant errors. These issues as well as the approximations leading to
scalar diffraction theory are discussed in section 10.2.

Christiaan Huygens (1629–1695, Dutch) was born in The Hague, Netherlands.
His father was friends with the mathematician René Descartes, which
probably influenced his upbringing. Huygens studied law and mathematics at
the University of Leiden, which preceded a very productive career as a scientist
and mathematician. During mid career, Huygens held a position in the French
Academy of Sciences in Paris for 15 years, but spent the majority of his life
in The Hague. Huygens was the first to advocate the wave theory of light.
He was able to explain birefringence in terms of his wave theory together with
a refractive index that varied with direction. Huygens constructed a telescope
with which he discovered Saturn's moon Titan. He also made the first detailed
observations of the Orion nebula. Huygens made significant advancements in
clock-making technology and wrote a book on probability theory. Huygens
was one of the earliest science-fiction writers and speculated that life exists on
other planets in his book Cosmotheoros. (Wikipedia)
$$E(x, y, z) = -\frac{i}{\lambda}\iint_{\text{aperture}} E(x', y', 0)\,\frac{e^{ikR}}{R}\,dx'\,dy' \tag{10.1}$$
where
$$R = \sqrt{(x - x')^2 + (y - y')^2 + z^2} \tag{10.2}$$
is the radius of each wavelet as it individually intersects the point (x, y, z). The
factor −i /λ in front of the integral in (10.1) ensures the right phase and field
strength (not to mention units). Justification for this factor is given in section
10.3 and in appendix 10.A. To summarize, (10.1) tells us how to compute the field
down stream given knowledge of the field in an aperture. The field at each point
(x′, y′) in the aperture, which may vary with strength and phase, is treated as the
source for a spherical wave. The integral in (10.1) sums the contributions for all
of these wavelets.
1 For simplicity, we use the term ‘spherical wave’ in this book to refer to waves of the type
imagined by Huygens (i.e. of the form e^{ikR}/R). There is a different family of waves based on
spherical harmonics that are also sometimes referred to as spherical waves. These waves have
angular as well as radial dependence, and they are solutions to Maxwell’s equations. See J. D.
Jackson, Classical Electrodynamics, 3rd ed., pp. 429–432 (New York: John Wiley, 1999).
Example 10.1
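Find the on-axis (i.e. x, y = 0) intensity following a circular aperture of diameter ℓ
illuminated by a uniform plane wave.
$$\begin{aligned} E(0, 0, z) &= -\frac{iE_0}{\lambda}\int_0^{2\pi} d\phi' \int_0^{\ell/2} \frac{e^{ik\sqrt{\rho'^2 + z^2}}}{\sqrt{\rho'^2 + z^2}}\,\rho'\,d\rho' \\ &= -\frac{iE_0}{\lambda}\,2\pi\left.\frac{e^{ik\sqrt{\rho'^2 + z^2}}}{ik}\right|_0^{\ell/2} = -E_0\left(e^{ik\sqrt{(\ell/2)^2 + z^2}} - e^{ikz}\right) \end{aligned}$$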
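The closed-form result of Example 10.1 can be compared with a brute-force evaluation of the radial integral. This is a minimal sketch, assuming NumPy; the wavelength, aperture diameter, and distance are hypothetical values, and a hand-written trapezoid rule stands in for whatever quadrature one prefers.

```python
import numpy as np

wavelength = 500e-9   # hypothetical wavelength (m)
diameter = 1.0e-3     # hypothetical aperture diameter, the symbol ℓ in the text (m)
z = 0.10              # hypothetical on-axis distance from the aperture (m)
e0 = 1.0              # amplitude of the illuminating plane wave
k = 2.0 * np.pi / wavelength

# Radial part of (10.1) on axis: integrate exp(ikR)/R * rho' over 0..ℓ/2.
rho = np.linspace(0.0, diameter / 2.0, 20001)
big_r = np.sqrt(rho**2 + z**2)
integrand = np.exp(1j * k * big_r) / big_r * rho
radial = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(rho))

# The azimuthal integral contributes a factor of 2*pi.
numeric = -1j * e0 / wavelength * 2.0 * np.pi * radial
closed = -e0 * (np.exp(1j * k * np.sqrt((diameter / 2.0)**2 + z**2))
                - np.exp(1j * k * z))

print("numeric:", numeric)
print("closed: ", closed)
print("relative intensity:", abs(closed)**2 / abs(e0)**2)
```

The two field values should agree closely. The relative intensity swings between 0 and 4 as z changes, since the result is the difference of two unit-magnitude phasors.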
block. This removes the unwanted part of the previous integration and yields the
overall result. It is important to add and subtract the integrals (i.e. fields), not
their squares (i.e. intensity).
As trivial as Babinet’s principle may seem to you, it may not be obvious at
first that Babinet’s principle also applies to an infinitely wide plane wave that
is interrupted by finite obstructions. In this case, one computes the diffraction
of the blocked portions of the field as though these portions were openings in a
mask. This result is then subtracted from the plane wave (no integration needed
for the plane), as depicted in Fig. 10.5.
When Fresnel first presented his diffraction formula to the French Academy
of Sciences, a certain judge of scientific papers named Siméon Poisson noticed
that Fresnel’s formula predicted that there should be light in the center of the
geometric shadow behind a circular obstruction. This seemed so absurd to
Poisson that he initially disbelieved the theory, until the spot was shortly thereafter
experimentally confirmed, much to Poisson’s chagrin. Needless to say, Fresnel’s
paper was then awarded first prize, and this spot appearing behind circular blocks
has since been known as Poisson’s spot.
Figure 10.5 A block in a plane wave giving rise to diffraction in the geometric shadow.
(Labels in the figure: Mask, Block.)
Example 10.2
Find the on-axis (i.e. x, y = 0) intensity behind a circular block of diameter ℓ placed
in a uniform plane wave.
Solution: From Example 10.1, the on-axis field behind a circular aperture is
$E_0\left(e^{ikz} - e^{ik\sqrt{(\ell/2)^2 + z^2}}\right)$. Babinet’s principle says to subtract this result from a plane
wave to obtain the field behind the circular block. The situation is depicted in
Fig. 10.5 (side view). The on-axis field is then
$$E(0, 0, z) = E_0 e^{ikz} - E_0\left(e^{ikz} - e^{ik\sqrt{(\ell/2)^2 + z^2}}\right) = E_0\,e^{ik\sqrt{(\ell/2)^2 + z^2}}$$
This result says that, in the exact center of the shadow behind a circular obstruction,
the intensity is the same as the illuminating plane wave for all distances z. A spot of
light in the center forms right away; no wonder Poisson was astonished!
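The contrast between the aperture and the block is easy to see numerically. The sketch below, assuming NumPy, evaluates the two closed-form on-axis fields at a few distances. The parameter values and function names are hypothetical.

```python
import numpy as np

def aperture_field(z, diameter, wavelength, e0=1.0):
    """On-axis field behind a circular aperture (closed form from Example 10.1)."""
    k = 2.0 * np.pi / wavelength
    return e0 * (np.exp(1j * k * z)
                 - np.exp(1j * k * np.sqrt((diameter / 2.0)**2 + z**2)))

def block_field(z, diameter, wavelength, e0=1.0):
    """Babinet's principle: plane wave minus the aperture field."""
    k = 2.0 * np.pi / wavelength
    return e0 * np.exp(1j * k * z) - aperture_field(z, diameter, wavelength, e0)

wavelength = 500e-9                   # hypothetical wavelength (m)
diameter = 1.0e-3                     # hypothetical block or aperture diameter (m)
z = np.linspace(0.05, 1.0, 6)         # hypothetical on-axis distances (m)

aperture_intensity = np.abs(aperture_field(z, diameter, wavelength))**2
block_intensity = np.abs(block_field(z, diameter, wavelength))**2

print("aperture:", aperture_intensity)
print("block:   ", block_intensity)
```

Note that fields are added and subtracted before squaring. The on-axis intensity behind the block should stay at the plane-wave value for every z, while the intensity behind the aperture varies with distance.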
where k ≡ nω/c is the magnitude of the usual wave vector (see also (9.2)). Equation
(10.4) is called the Helmholtz equation. Again, it is merely the wave equation
written for the case of a single frequency, where the trivial time dependence has
been removed. To obtain the full wave solution, just append the factor e^{−iωt} to
the solution of the Helmholtz equation E(r).
At this point we take an egregious step: We ignore the vectorial nature of E(r)
and write (10.4) using only the magnitude E(r). When using scalar diffraction
theory, we must keep in mind that it is based on this serious step. Under the
scalar approximation, the vector Helmholtz equation (10.4) becomes the scalar
Helmholtz equation:
$$\nabla^2 E(\mathbf{r}) + k^2 E(\mathbf{r}) = 0 \tag{10.5}$$
This equation of course is consistent with (10.4) in the case of a plane wave.
However, we are interested in spherical waves of the form E(r) = E_0 r_0 e^{ikr}/r. It
turns out that such spherical waves are exact solutions to the scalar Helmholtz
equation (10.5). The proof is left as an exercise (see P10.3). Nevertheless, spherical
waves of this form only approximately satisfy the vector Helmholtz equation (10.4).
We can get away with this sleight of hand if the radius r is large compared to a
wavelength (i.e., kr ≫ 1) and if we restrict r to a narrow range perpendicular to
the polarization.

Francois Jean Dominique Arago (1786-1853, French) was born in Catalan
France, where his father was the Treasurer of the Mint. As a teenager,
Arago was sent to a municipal college in Perpignan where he developed a
deep interest in mathematics. In 1803, he entered the Ecole Polytechnique in
Paris, where he purportedly was disappointed that he was not presented
with new knowledge at a higher rate. He associated with famous French
mathematicians Siméon Poisson and Pierre-Simon Laplace. He later worked
with Jean-Baptiste Biot to measure the meridian arc to determine the
exact length of the meter. This work took him to the Balearic Islands, Spain,
where he was imprisoned as a spy, being suspected because of lighting fires
atop a mountain as part of his surveying efforts. After a heroic prison escape

Significance of the Scalar Wave Approximation

The solution of the scalar Helmholtz equation is not completely unassociated with
the solution to the vector Helmholtz equation. In fact, if E_scalar(r) obeys the scalar
Helmholtz equation (10.5), then
$$\mathbf{E}(\mathbf{r}) = \mathbf{r}\times\nabla E_{\mathrm{scalar}}(\mathbf{r}) \tag{10.6}$$
obeys the vector Helmholtz equation (10.4).
Consider a spherical wave, which is a solution to the scalar Helmholtz equation:
$$\mathbf{E}(\mathbf{r}) = -\hat{\boldsymbol{\phi}}\,r_0 E_0\left(1 - \frac{i}{kr}\right)\sin\theta\,\frac{e^{ikr}}{r} \tag{10.9}$$
This field looks approximately like the scalar spherical wave solution (10.7) in the
limit of large r if the angle is chosen to lie near θ ≅ π/2 (spherical coordinates).
Since our use of the scalar Helmholtz equation is in connection with this spherical
wave under these conditions, the results are close to those obtained from the
vector Helmholtz equation.
Fresnel developed his diffraction formula (10.1) a half century before Maxwell
assembled the equations of electromagnetic theory. In 1887, Gustav Kirchhoff
demonstrated that Fresnel’s diffraction formula satisfies the scalar Helmholtz
equation. In doing this he clearly showed the approximations implicit in the
theory, and made a slight revision to the formula:
$$E(x, y, z) = -\frac{i}{\lambda}\iint_{\text{aperture}} E(x', y', 0)\,\frac{e^{ikR}}{R}\left[\frac{1 + \cos(\mathbf{R}, \hat{\mathbf{z}})}{2}\right]dx'\,dy' \tag{10.10}$$
The factor in square brackets, Kirchhoff’s revision, is known as the obliquity factor.
Here, cos(R, ẑ) indicates the cosine of the angle between R and ẑ. Notice that this
factor is approximately equal to one when the point (x, y, z) is chosen to be in
the forward direction; we usually study diffraction under this circumstance. On
the other hand, the obliquity factor equals zero for fields traveling in the reverse
direction (i.e. in the −ẑ direction). This fixes a problem with Fresnel’s version of
the formula (10.1) based on Huygens’ wavelets, which suggested that light could
as easily diffract in the reverse direction as in the forward direction.
In honor of Kirchhoff’s work, (10.10) is referred to as the Fresnel-Kirchhoff
diffraction formula. The details of Kirchhoff’s more rigorous derivation, including
how the factor −i /λ naturally arises, are given in Appendix 10.A. Since the Fresnel-
Kirchhoff formula can be understood as a superposition of spherical waves, it is
not surprising that it satisfies the scalar Helmholtz equation (10.5).
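The claim that each spherical wavelet satisfies (10.5) can be made explicit with a short calculation. The following is a sketch valid away from the origin (r ≠ 0), using the standard form of the Laplacian for a function that depends only on r:

```latex
\nabla^2 f(r) = \frac{1}{r}\frac{d^2}{dr^2}\bigl[r f(r)\bigr]
\quad\text{so, with } f(r) = \frac{e^{ikr}}{r}:\quad
\nabla^2 \frac{e^{ikr}}{r}
  = \frac{1}{r}\frac{d^2}{dr^2}\,e^{ikr}
  = -k^2\,\frac{e^{ikr}}{r}
\quad\Longrightarrow\quad
\nabla^2 \frac{e^{ikr}}{r} + k^2\,\frac{e^{ikr}}{r} = 0 .
```

Because the scalar Helmholtz equation is linear, any superposition of such wavelets centered at different aperture points also satisfies it.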
constant across the aperture and if the obliquity factor (1 + cos (r, ẑ))/2 is approxi-
mated as one (i.e. forward direction).
Fresnel introduced an approximation³ to his diffraction formula that makes
the integration somewhat easier to perform. The approximation is analogous
to the paraxial approximation made for rays in chapter 9. Thus, the Fresnel
approximation requires the avoidance of large angles with respect to the z-axis.
Besides letting the obliquity factor be one, Fresnel approximated R by the
distance z in the denominator of (10.10). Then the denominator can be brought
out in front of the integral since it no longer depends on x′ and y′. This is valid to
the extent that we restrict ourselves to small angles:
$$R \cong z \qquad \text{(denominator only; Fresnel approximation)} \tag{10.11}$$
3 J. W. Goodman, Introduction to Fourier Optics, Sect. 4-1 (New York: McGraw-Hill, 1968).
10.3 Fresnel Approximation 263
Example 10.3
Compute the Fresnel diffraction field following a rectangular aperture (dimensions
∆x by ∆y) illuminated by a uniform plane wave.
$$E(x, y, z) = -iE_0\,\frac{e^{ikz}}{\lambda z}\,e^{i\frac{k}{2z}(x^2 + y^2)} \int_{-\Delta x/2}^{\Delta x/2} dx'\,e^{i\frac{k}{2z}x'^2}\,e^{-i\frac{kx}{z}x'} \int_{-\Delta y/2}^{\Delta y/2} dy'\,e^{i\frac{k}{2z}y'^2}\,e^{-i\frac{ky}{z}y'}$$
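$$\frac{\partial^2 \tilde{E}}{\partial x^2} + \frac{\partial^2 \tilde{E}}{\partial y^2} + 2ik\frac{\partial \tilde{E}}{\partial z} + \frac{\partial^2 \tilde{E}}{\partial z^2} = 0 \tag{10.14}$$
At this point we make the paraxial wave approximation, which is $\left|2k\,\frac{\partial \tilde{E}}{\partial z}\right| \gg \left|\frac{\partial^2 \tilde{E}}{\partial z^2}\right|$.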
That is, we assume that the amplitude of the field varies slowly in the z-direction
such that the wave looks much like a plane wave. We permit the amplitude to
change as the wave propagates in the z-direction as long as it does so on a scale
much longer than a wavelength. This leads to the paraxial wave equation:
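$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + 2ik\frac{\partial}{\partial z}\right)\tilde{E}(x, y, z) \cong 0 \qquad \text{(paraxial wave equation)} \tag{10.15}$$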
It turns out that the Fresnel approximation (10.13) is an exact solution to the
paraxial wave equation. As demonstrated in problem P10.5, (10.15) is satisfied by
$$\tilde{E}(x, y, z) \cong -\frac{i}{\lambda z}\iint_{-\infty}^{\infty} \tilde{E}(x', y', 0)\,e^{i\frac{k}{2z}\left[(x - x')^2 + (y - y')^2\right]}\,dx'\,dy' \tag{10.16}$$
By removing the factor (10.17) from (10.13), we obtain the Fraunhofer diffraction
formula:
$$E(x, y, z) \cong -\frac{i\,e^{ikz}\,e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z}\iint_{\text{aperture}} E(x', y', 0)\,e^{-i\frac{k}{z}(xx' + yy')}\,dx'\,dy' \qquad \text{(Fraunhofer approximation)} \tag{10.19}$$
Obviously, the removal of $e^{i\frac{k}{2z}(x'^2 + y'^2)}$ from the integrand improves our chances of
being able to perform the integration. Notice that the integral can now be interpreted
as a two-dimensional Fourier transform on the aperture field E(x′, y′, 0).
Example 10.4
Compute the Fraunhofer diffraction pattern following a rectangular aperture (di-
mensions ∆x by ∆y) illuminated by a uniform plane wave.
It is left as an exercise (see P10.8) to perform the integration and compute the
intensity. The result turns out to be
$$I(x, y, z) = I_0\,\frac{\Delta x^2\,\Delta y^2}{\lambda^2 z^2}\,\mathrm{sinc}^2\!\left(\frac{\pi\Delta x}{\lambda z}x\right)\mathrm{sinc}^2\!\left(\frac{\pi\Delta y}{\lambda z}y\right) \tag{10.20}$$
where sinc ξ ≡ sin ξ/ξ. Note that $\lim_{\xi\to 0}\mathrm{sinc}\,\xi = 1$.
Figure 10.8 Fraunhofer diffraction pattern (field amplitude) generated by a uniformly
illuminated rectangular aperture with a height twice the width.
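The sinc-squared pattern can be checked by carrying out the aperture integral of (10.19) numerically. This is a minimal sketch, assuming NumPy, a uniform aperture field of unit amplitude (so that I_0 = 1 when intensity is taken as |E|²), and hypothetical values for the wavelength, distance, aperture size, and observation point. Note that NumPy's `sinc` is defined as sin(πt)/(πt), so the argument is rescaled to match the definition used in the text.

```python
import numpy as np

def slit_integral(x, width, wavelength, z, samples=4001):
    """Trapezoid-rule value of the integral of exp(-i k x x'/z) over the slit."""
    k = 2.0 * np.pi / wavelength
    xp = np.linspace(-width / 2.0, width / 2.0, samples)
    f = np.exp(-1j * k * x * xp / z)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(xp))

def book_sinc(xi):
    """sinc as defined in the text: sin(xi)/xi."""
    return np.sinc(xi / np.pi)

wavelength = 633e-9        # hypothetical wavelength (m)
z = 2.0                    # hypothetical distance to the screen (m)
dx, dy = 1.0e-4, 2.0e-4    # hypothetical aperture width and height (m)
x, y = 5.0e-3, 3.0e-3      # hypothetical observation point on the screen (m)

# |E|^2 from (10.19); the phase factors in front have unit magnitude.
numeric = (abs(slit_integral(x, dx, wavelength, z))
           * abs(slit_integral(y, dy, wavelength, z)) / (wavelength * z))**2

# Intensity from (10.20) with I_0 = 1.
formula = ((dx * dy / (wavelength * z))**2
           * book_sinc(np.pi * dx * x / (wavelength * z))**2
           * book_sinc(np.pi * dy * y / (wavelength * z))**2)

print(numeric, formula)
```

The separation into a product of two one-dimensional integrals is what makes the rectangular aperture tractable by hand.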
$$E(x',y',z=0) = E(\rho',z=0) \tag{10.21}$$
We are able to perform the integration over $\phi'$ with the help of the formula (0.57):
$$\int_0^{2\pi} e^{-i\frac{k\rho\rho'}{z}\cos(\phi-\phi')}\,d\phi' = 2\pi J_0\!\left(\frac{k\rho\rho'}{z}\right) \tag{10.26}$$
$J_0$ is called the zero-order Bessel function. Equation (10.25) then reduces to
$$E(\rho,z) = -\frac{2\pi i\,e^{ikz}\,e^{i\frac{k\rho^2}{2z}}}{\lambda z}\int_{\text{aperture}} \rho'\,d\rho'\,E(\rho',0)\,e^{i\frac{k\rho'^2}{2z}}\,J_0\!\left(\frac{k\rho\rho'}{z}\right) \tag{10.27}$$
(Fresnel approximation with cylindrical symmetry)
The integral in (10.27) is called a Hankel transform on $E(\rho',0)\,e^{i\frac{k\rho'^2}{2z}}$.
In the case of the Fraunhofer approximation, the diffraction integral becomes a Hankel transform on just the field $E(\rho',z=0)$ since $\exp\left(i\frac{k\rho'^2}{2z}\right)$ goes to one.
Under cylindrical symmetry, the Fraunhofer approximation is
$$E(\rho,z) = -\frac{2\pi i\,e^{ikz}\,e^{i\frac{k\rho^2}{2z}}}{\lambda z}\int_{\text{aperture}} \rho'\,d\rho'\,E(\rho',0)\,J_0\!\left(\frac{k\rho\rho'}{z}\right) \tag{10.28}$$
(Fraunhofer approximation with cylindrical symmetry)
Just as fast Fourier transform algorithms aid in the numerical evaluation of diffraction integrals in Cartesian coordinates, fast Hankel transforms also exist and can be used with cylindrically symmetric diffraction integrals.
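The azimuthal integral (10.26) that underlies these cylindrical formulas can be checked numerically. The sketch below is an illustration only: it evaluates $J_0$ from its power series (adequate for modest arguments) and compares $2\pi J_0(a)$ with a simple rectangle-rule estimate of the integral, where $a$ stands for $k\rho\rho'/z$. The helper names are hypothetical.

```python
import cmath
import math

def bessel_j0(x, terms=40):
    """J0(x) from the series sum_m (-1)^m (x/2)^(2m) / (m!)^2 (fine for modest |x|)."""
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((m + 1) ** 2)
    return total

def azimuthal_integral(a, phi=0.3, n=2000):
    """Rectangle-rule estimate of the integral of exp(-i a cos(phi - phi')) over 0..2*pi."""
    h = 2.0 * math.pi / n
    total = sum(cmath.exp(-1j * a * math.cos(phi - (j + 0.5) * h)) for j in range(n))
    return total * h

a = 5.0  # assumed sample value of k*rho*rho'/z
print(azimuthal_integral(a))
print(2.0 * math.pi * bessel_j0(a))
```

The result does not depend on the choice of $\phi$, which is why the angle drops out of (10.27) and (10.28).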
Figure 10.9 Field amplitude following a circular aperture computed in the Fresnel approximation.
Example 10.5
Compute the Fresnel and Fraunhofer diffraction patterns following a circular aperture (diameter $\ell$) illuminated by a uniform plane wave.
$$E(\rho,z) = -i\,\frac{2\pi\,e^{ikz}\,e^{i\frac{k\rho^2}{2z}}}{\lambda z}\,E_0\int_0^{\ell/2}\rho'\,d\rho'\,J_0\!\left(\frac{k\rho\rho'}{z}\right)$$
$$\oint_S\left[U\frac{\partial V}{\partial n} - V\frac{\partial U}{\partial n}\right]da = \int_V\left[U\nabla^2 V - V\nabla^2 U\right]dv \tag{10.30}$$
The notation ∂/∂n implies a derivative in the direction normal to the surface. We
choose the following functions:
$$V \equiv e^{ikr}/r, \qquad U \equiv E(\mathbf{r}) \tag{10.31}$$
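where $E(\mathbf{r})$ is assumed to satisfy the scalar Helmholtz equation, (10.5). When these functions are used in Green’s theorem (10.30), we obtain
$$\oint_S\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da = \int_V\left[E\nabla^2\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\nabla^2 E\right]dv \tag{10.32}$$
$$E\nabla^2\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\nabla^2 E = -k^2 E\,\frac{e^{ikr}}{r} + \frac{e^{ikr}}{r}\,k^2 E = 0 \tag{10.33}$$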
6 See J. W. Goodman, Introduction to Fourier Optics, Sect. 3-3 (New York: McGraw-Hill, 1968).
7 We exclude the point r = 0; see P0.4 and P0.5.
where we have taken advantage of the fact that $E(\mathbf{r})$ and $e^{ikr}/r$ both satisfy (10.5). This is exactly the reason for our judicious choices of the functions $V$ and $U$ since with them we were able to make half of (10.30) disappear. We are left with
$$\oint_S\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da = 0 \tag{10.34}$$
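As a quick check on the claim that $e^{ikr}/r$ satisfies (10.5), one can use the radial part of the Laplacian in spherical coordinates, $\nabla^2\psi = \frac{1}{r}\frac{\partial^2}{\partial r^2}(r\psi)$ for a function with no angular dependence. This short derivation assumes that (10.5) has the usual Helmholtz form $\nabla^2 E + k^2 E = 0$ and excludes the point $r = 0$:

```latex
\nabla^2\!\left(\frac{e^{ikr}}{r}\right)
  = \frac{1}{r}\frac{\partial^2}{\partial r^2}\left(r\,\frac{e^{ikr}}{r}\right)
  = \frac{1}{r}\,(ik)^2 e^{ikr}
  = -k^2\,\frac{e^{ikr}}{r}, \qquad r \neq 0 .
```

This is the substitution used for the first term of (10.33).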
Now consider a volume between a small sphere of radius $\epsilon$ at the origin and an outer surface of whatever shape. The total surface that encloses the volume is comprised of two parts (i.e. $S = S_1 + S_2$ as depicted in Fig. 10.11).
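When we apply (10.34) to the surface in Fig. 10.11, we have
$$\oint_{S_2}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da = -\oint_{S_1}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da \tag{10.35}$$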
Our motivation for choosing this geometry with multiple surfaces is that eventually we want to find the field at the origin (inside the little sphere) from knowledge of the field on the outside surface. To this end, we assume that $\epsilon$ is so small that $E(\mathbf{r})$ is approximately the same everywhere on the surface $S_1$. Then the integral over $S_1$ becomes
$$\oint_{S_1}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da = \lim_{r=\epsilon\to 0}\int_0^{2\pi}d\phi\int_0^{\pi}\left[E\left(\frac{\partial}{\partial r}\frac{e^{ikr}}{r}\right)\frac{\partial r}{\partial n} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial r}\frac{\partial r}{\partial n}\right]r^2\sin\theta\,d\theta \tag{10.36}$$
Figure 10.11 A two-part surface enclosing volume $V$.
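where we have used spherical coordinates. Notice that we have employed the chain rule to execute the normal derivative $\partial/\partial n$. Since $\mathbf{r}$ always points opposite to the direction of the surface normal $\hat{\mathbf{n}}$, the normal derivative $\partial r/\partial n$ is always equal to $-1$.⁸ We can now perform the integration in (10.36) as well as take the limit as $\epsilon \to 0$ to obtain
$$\begin{aligned}
\lim_{\epsilon\to 0}\oint_{S_1}\left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right]da
&= -4\pi\lim_{\epsilon\to 0}\left[r^2\left(-\frac{e^{ikr}}{r^2} + ik\frac{e^{ikr}}{r}\right)E - r^2\,\frac{e^{ikr}}{r}\frac{\partial E}{\partial r}\right]_{r=\epsilon}\\
&= -4\pi\lim_{\epsilon\to 0}\left[\left(-e^{ik\epsilon} + ik\epsilon\,e^{ik\epsilon}\right)E - e^{ik\epsilon}\,\epsilon\,\frac{\partial E}{\partial r}\right]_{r=\epsilon}\\
&= 4\pi E(0)
\end{aligned}\tag{10.37}$$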
With the aid of (10.37), Green’s theorem applied to our specific geometry reduces to
$$E(0) = \frac{1}{4\pi}\oint_{S_2}\left[\frac{e^{ikr}}{r}\frac{\partial E}{\partial n} - E\frac{\partial}{\partial n}\frac{e^{ikr}}{r}\right]da \tag{10.38}$$
8 From the definition of the normal derivative we have ∂r /∂n ≡ ∇r · n̂ = −n̂ · n̂ = −1.
We have essentially arrived at the result that we are seeking. The field coming through the aperture is integrated to find the field at the origin, which is located beyond the aperture. Let us manipulate the formula a little further. The second term in the integral of (10.39) can be rewritten as follows:
$$\frac{\partial}{\partial n}\left(\frac{e^{ikr}}{r}\right) = \frac{\partial}{\partial r}\left(\frac{e^{ikr}}{r}\right)\frac{\partial r}{\partial n} = \left(\frac{ik}{r} - \frac{1}{r^2}\right)e^{ikr}\cos(\mathbf{r},\hat{\mathbf{n}}) \;\underset{r\gg\lambda}{\longrightarrow}\; \frac{ik\,e^{ikr}}{r}\cos(\mathbf{r},\hat{\mathbf{n}}) \tag{10.40}$$
Figure 10.12 Surface $S_2$ depicted as a mask and a large hemisphere.
where $\partial r/\partial n = \cos(\mathbf{r},\hat{\mathbf{n}})$ indicates the cosine of the angle between $\mathbf{r}$ and $\hat{\mathbf{n}}$. We have also assumed that the distance $r$ is much larger than a wavelength in order to drop a term. Next, we assume that the field illuminating the aperture can be written as $E \cong \tilde{E}(x,y)\,e^{ikz}$. This represents a plane-wave field traveling through
$$\frac{\partial E}{\partial n} = \frac{\partial E}{\partial z}\frac{\partial z}{\partial n} = ik\tilde{E}(x,y)\,e^{ikz}\,(-1) = -ikE \tag{10.41}$$
Substituting (10.40) and (10.41) into (10.39) yields
$$E(0) = -\frac{i}{\lambda}\iint_{\text{aperture}} E\,\frac{e^{ikr}}{r}\left[\frac{1+\cos(\mathbf{r},\hat{\mathbf{n}})}{2}\right]da \tag{10.42}$$
Finally, we wish to rearrange our coordinate system to that depicted in Fig. 10.2.
In our derivation, it was less cumbersome to place the origin at a point after the
9 Later Sommerfeld noticed that these two assumptions actually contradict each other, and he
revised Kirchhoff’s work to be more accurate. In practice this revision makes only a tiny difference
as light spills onto the back of the aperture, over a length scale of a wavelength. We will ignore
this effect and go with Kirchhoff’s (slightly flawed) assumption. For further discussion see J. W.
Goodman, Introduction to Fourier Optics, Sect. 3-4 (New York: McGraw-Hill, 1968).
where
$$R = \sqrt{(x-x')^2 + (y-y')^2 + d^2} \tag{10.44}$$
Equation (10.10) is the same as (10.42) after applying a coordinate transformation.
It is called the Fresnel-Kirchhoff diffraction formula and it agrees with (10.1)
except for the obliquity factor [1 + cos (r, ẑ)]/2.
The unit vector n̂ always points normal to the surface of volume V over which
the integral is taken. Let the vector function f be U ∇V , where U and V are both
analytical functions of the position coordinate r. Then (10.45) becomes
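$$\oint_S (U\nabla V)\cdot\hat{\mathbf{n}}\;da = \int_V \nabla\cdot(U\nabla V)\,dv \tag{10.46}$$
$$\nabla V\cdot\hat{\mathbf{n}} \equiv \frac{\partial V}{\partial n} \tag{10.47}$$
The argument of the integral on the right-hand side of (10.46) can be expanded with the product rule:
$$\nabla\cdot(U\nabla V) = \nabla U\cdot\nabla V + U\nabla^2 V \tag{10.48}$$
With these substitutions, (10.46) becomes
$$\oint_S U\frac{\partial V}{\partial n}\,da = \int_V\left[\nabla U\cdot\nabla V + U\nabla^2 V\right]dv \tag{10.49}$$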
Actually, so far we haven’t done much. Equation (10.49) is nothing more than the
divergence theorem applied to the vector function U ∇V . Similarly, we can apply
the divergence theorem to an alternative vector function given by the reverse
$$\oint_S V\frac{\partial U}{\partial n}\,da = \int_V\left[\nabla V\cdot\nabla U + V\nabla^2 U\right]dv \tag{10.50}$$
We subtract (10.50) from (10.49), and this leads to (10.30) known as Green’s theo-
rem.
Exercises
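$$f(r) = \frac{A}{r}\cos(kr - \omega t)$$
is a solution to the wave equation in spherical coordinates with only radial dependence,
$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f}{\partial r}\right) = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}$$
$$\mathbf{E}(r,\phi) = \frac{A\sin\phi}{r}\left[\cos(kr - \omega t) - \frac{1}{kr}\sin(kr - \omega t)\right]\hat{\boldsymbol{\phi}}$$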
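$$\nabla^2\psi = \frac{1}{r}\frac{\partial^2}{\partial r^2}\left(r\psi\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\psi}{\partial\phi^2}$$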
P10.5 Check that (10.16) is the solution to the paraxial wave equation (10.15).
P10.6 (a) Repeat Example 10.1 to find the on-axis intensity after a circular aperture in both the Fresnel and Fraunhofer approximations. (HINT: Use (10.27) and (10.28) to obtain the fields at $\rho = 0$.) Also make suitable approximations directly to (10.3) to obtain the same answers.
(b) Check how well the Fresnel and Fraunhofer approximations work by graphing the three curves (i.e. (10.3) and the curves obtained in part (a)) on a single plot as a function of $z$. Take $\ell = 10$ µm and $\lambda = 500$ nm. To see the result better, use a log scale on the z-axis.
Figure 10.14 “The Fraunhofer Approximation” by Sterling Cornaby
Figure 10.15 [Plot: three curves labeled Fresnel, Fraunhofer, and Fresnel-Kirchhoff versus z (mm) on a logarithmic axis from 10^-3 to 10^0.]
L10.7 (a) Why does the on-axis intensity behind a circular opening fluctuate
(see Example 10.1) whereas the on-axis intensity behind a circular
obstruction remains constant (see Example 10.2)?
(b) Create a collimated laser beam several centimeters wide. Observe
the on-axis intensity on a movable screen (e.g. a hand-held card) be-
hind a small circular aperture and behind a small circular obstruction
placed in the beam. (video)
(c) In the case of the circular aperture, measure the distance to several
on-axis minima and check that it agrees with prediction. (See problem
P10.6.)
Figure 10.16 [Diagram of the setup; the laser is labeled.]
P10.8 Calculate the Fraunhofer diffraction field and intensity patterns for a
rectangular aperture (dimensions ∆x by ∆y) illuminated by a plane
wave E 0 . In other words, derive (10.20).
P10.9 A single narrow slit has a mask placed over it so the aperture function is not a square pulse but rather a cosine: $E(x',y',0) = E_0\cos(x'/L)$ for $-L/2 < x' < L/2$ and $E(x',y',0) = 0$ otherwise. Calculate the far-field (Fraunhofer) diffraction pattern. Make a plot of intensity as a function of $xkL/2z$; qualitatively compare the pattern to that of a regular single slit.
P10.10 Calculate the Fraunhofer diffraction intensity pattern (10.29) for a circular aperture (diameter $\ell$) illuminated by a plane wave $E_0$.
Chapter 11
Diffraction Applications
with a 1 cm radius (not necessarily circular) is used with visible light, the light must travel more than a kilometer in order to reach the Fraunhofer limit. It may therefore seem unlikely to reach the Fraunhofer limit in a typical optical system, especially if the aperture or beam size is relatively large. Nevertheless, spectrometers, which typically utilize diffraction gratings many centimeters wide, depend on achieving the Fraunhofer limit within the confines of a manageable instrument box. This is accomplished using imaging techniques. The Fraunhofer limit is also important to the performance of other optical instruments that use lenses (e.g. a telescope).
Figure 11.1 Diffraction in the far field.
Consider a lens with focal length f placed in the path of light following an
aperture (see Fig. 11.2). Let the lens be placed an arbitrary distance L after the
aperture. The lens produces an image of the Fraunhofer pattern at a new location
d i following the lens according to the imaging formula (see (9.55))
$$\frac{1}{f} = \frac{1}{-(z-L)} + \frac{1}{d_i}. \tag{11.2}$$
Figure 11.2 Imaging of the Fraunhofer diffraction pattern to the focus of a lens.
Keep in mind that the lens interrupts the light before the Fraunhofer pattern has a chance to form. This means that the Fraunhofer diffraction pattern may
$$d_i \cong f. \tag{11.3}$$
Thus, a lens makes it very convenient to observe the Fraunhofer diffraction pat-
tern even from relatively large apertures. It is not necessary to let the light propa-
gate for kilometers. We need only observe the pattern at the focus of the lens as
shown in Fig. 11.2. Notice that the spacing L between the aperture and the lens is
unimportant to this conclusion.
Even though we know that the Fraunhofer diffraction pattern occurs at the
focus of a lens, the question remains as to the size of the image. To find the answer,
let us examine the magnification (9.56), which is given by
$$M = -\frac{d_i}{-(z-L)} \tag{11.4}$$
Taking the limit of very large $z$ and employing (11.3), the magnification becomes
$$M \to \frac{f}{z} \tag{11.5}$$
This is a remarkable result. When the lens is inserted, the size of the diffraction pattern decreases by the ratio of the lens focal length $f$ to the original distance $z$ to a far-away screen. Since in the Fraunhofer regime the diffraction pattern is proportional to distance (i.e. size $\propto z$), the image at the focus of the lens scales in proportion to the focal length (i.e. size $\propto f$). This means that the angular width of the pattern is preserved! With the lens in place, we can rewrite (11.1) straightaway as
$$I(x,y,L+f) \cong \frac{1}{2}c\epsilon_0\left|\frac{1}{\lambda f}\iint_{\text{aperture}} E(x',y',0)\,e^{-i\frac{k}{f}(xx'+yy')}\,dx'\,dy'\right|^2 \tag{11.6}$$
which describes the intensity distribution pattern at the focus of the lens.
Although (11.6) correctly describes the intensity, we cannot easily write the
electric field since the imaging techniques that we have used do not render the
phase information. To obtain an expression for the field, it will be necessary to
employ the Fresnel diffraction formula. In addition, we need to know how a lens
adjusts the phase fronts of the light passing through it.
Consider a monochromatic light field that goes through a thin lens with focal
length f . In traversing the lens, the wavefront undergoes a phase shift that varies
across the lens. We will reference the phase shift to that experienced by the light
that goes through the center of the lens. In Fig. 11.3, $R_1$ is a positive radius
$$\Delta\phi = -k(n-1)(\ell_1 + \ell_2). \tag{11.7}$$
The negative sign indicates a phase advance (i.e. same sign as $-\omega t$). Since the off-axis light travels through less material, the phase of the wave front gets ahead of the light traveling through the center of the lens. In (11.7), $k$ represents the wave number in vacuum (i.e. $2\pi/\lambda_{\text{vac}}$), since $\ell_1$ and $\ell_2$ correspond to distances outside of the lens material.
Figure 11.3 A thin lens, which modifies the phase of a field passing through.
We can find expressions for $\ell_1$ and $\ell_2$ from the equations describing the spherical surfaces of the lens:
$$(R_1 - \ell_1)^2 + x^2 + y^2 = R_1^2, \qquad (R_2 + \ell_2)^2 + x^2 + y^2 = R_2^2 \tag{11.8}$$
In the Fresnel approximation, which takes place in the paraxial limit, it is appropriate to neglect the terms $\ell_1^2$ and $\ell_2^2$ in comparison with the other terms present. Within this approximation, equations (11.8) become
$$\ell_1 \cong \frac{x^2+y^2}{2R_1} \quad\text{and}\quad \ell_2 \cong -\frac{x^2+y^2}{2R_2} \tag{11.9}$$
$$\Delta\phi = -k(n-1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)\frac{x^2+y^2}{2} = -\frac{k}{2f}\left(x^2+y^2\right) \tag{11.10}$$
where the focal length of a thin lens f has been introduced according to the lens-
maker’s formula (9.46).
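To get a feel for how good the paraxial step from (11.8) to (11.9) is, the following Python sketch compares the exact surface offset with the paraxial estimate and confirms that the two forms of the phase shift in (11.10) agree. The helper names and the sample lens (index 1.5, radii of ±10 cm, 500 nm light) are assumptions made only for illustration; the focal length is taken from the relation $1/f = (n-1)(1/R_1 - 1/R_2)$ implied by (11.10).

```python
import math

def exact_offset(radius, rho):
    """Small root of (R - l)^2 + rho^2 = R^2, the first surface equation in (11.8)."""
    return radius - math.sqrt(radius ** 2 - rho ** 2)

def paraxial_offset(radius, rho):
    """Paraxial estimate l ~ rho^2 / (2R), as in (11.9)."""
    return rho ** 2 / (2.0 * radius)

def focal_length(n, r1, r2):
    """Thin-lens focal length from 1/f = (n - 1)(1/R1 - 1/R2)."""
    return 1.0 / ((n - 1.0) * (1.0 / r1 - 1.0 / r2))

def lens_phase(rho, wavelength, n, r1, r2):
    """Phase shift (11.7) evaluated with the paraxial offsets (11.9)."""
    k = 2.0 * math.pi / wavelength
    l1 = rho ** 2 / (2.0 * r1)
    l2 = -rho ** 2 / (2.0 * r2)
    return -k * (n - 1.0) * (l1 + l2)

# Assumed sample lens: biconvex, n = 1.5, R1 = +10 cm, R2 = -10 cm, 500 nm light.
n, r1, r2, wavelength = 1.5, 0.10, -0.10, 500e-9
rho = 5e-3  # distance from the axis, sqrt(x^2 + y^2)

f = focal_length(n, r1, r2)
k = 2.0 * math.pi / wavelength
print(exact_offset(r1, rho), paraxial_offset(r1, rho))
print(lens_phase(rho, wavelength, n, r1, r2), -k * rho ** 2 / (2.0 * f))
```

For this sample lens the paraxial offset differs from the exact one by well under one percent at 5 mm from the axis; the error grows as the ray height approaches the radius of curvature.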
In summary, the light traversing a lens experiences a relative phase shift given by
$$E(x,y,z_{\text{after lens}}) = E(x,y,z_{\text{before lens}})\,e^{-i\frac{k}{2f}(x^2+y^2)} \tag{11.11}$$
Starting from the known field $E(x',y',0)$ at the aperture, we compute the field
$$E(x,y,L+f) = -i\,\frac{e^{ikf}\,e^{i\frac{k}{2f}(x^2+y^2)}}{\lambda f}\iint\left[E(x'',y'',L)\,e^{-i\frac{k}{2f}(x''^2+y''^2)}\right]e^{i\frac{k}{2f}(x''^2+y''^2)}\,e^{-i\frac{k}{f}(xx''+yy'')}\,dx''\,dy'' \tag{11.13}$$
As you can probably appreciate, the injection of (11.12) into (11.13) makes a
rather long formula involving four dimensions of integration. Nevertheless, two
of the integrals can be performed in advance of choosing the aperture (i.e. those
over $x''$ and $y''$). This is accomplished with the help of the integral formula (0.55)
(even though in this instance the real part of a is zero). After this cumbersome
work, (11.13) reduces to
Notice that at least the integration portion of this formula looks exactly like
the Fraunhofer diffraction formula! This happened even though in the preceding
discussion we did not at any time specifically make the Fraunhofer approximation.
The result (11.14) implies the intensity distribution (11.6) as anticipated. However,
the phase of the field is also revealed in (11.14).
In general, the field carries a wave front curvature as it passes through the focal plane of the lens. In the special case $L = f$, the diffraction formula takes a particularly simple form:
$$E(x,y,L+f)\Big|_{L=f} = -i\,\frac{e^{2ikf}}{\lambda f}\iint E(x',y',0)\,e^{-i\frac{k}{f}(xx'+yy')}\,dx'\,dy' \tag{11.15}$$
When the lens is placed at this special distance following the aperture, the Fraun-
hofer diffraction pattern viewed at the focus of the lens carries a flat wave front.
where $f$, the focal length of the lens, takes the place of $z$ in the diffraction formula. The parameter $\ell$ is the diameter of the lens. This intensity pattern contains the first order Bessel function $J_1$, which behaves somewhat like a sine wave as seen in Fig. 11.7. The main differences are that the zero crossings are not exactly periodic and the function slowly diminishes with larger arguments. The first zero crossing (after $x = 0$) occurs at $1.22\pi$.
The intensity pattern described by (11.16) contains the factor $2J_1(\xi)/\xi$, where $\xi$ represents the combination $k\ell\rho/2f$. As noticed in Fig. 11.7, $J_1(\xi)$ goes to zero at $\xi = 0$. Thus, we have a zero-divided-by-zero situation when evaluating $2J_1(\xi)/\xi$ at the origin. This is similar to the sinc function (i.e. $\sin(\xi)/\xi$), which approaches one at the origin. In fact, $2J_1(\xi)/\xi$ is sometimes called the jinc function because it also approaches one at the origin. The square of the jinc is shown in Fig. 11.7b. This curve is proportional to the intensity described in (11.16). This pattern is sometimes called an Airy pattern after Sir George Biddell Airy (English, 1801–1892)
who first described the pattern. As can be seen in Fig. 11.7b, the intensity quickly drops at larger radii.
Figure 11.7 (a) First-order Bessel function. (b) Square of the Jinc function.
1 In the thin-lens approximation, the ray from either star that traverses the center of the lens (i.e. $y = 0$) maintains its angle:
$$\begin{bmatrix}0\\ \theta_2\end{bmatrix} = \begin{bmatrix}1 & 0\\ -1/f & 1\end{bmatrix}\begin{bmatrix}0\\ \theta_1\end{bmatrix} = \begin{bmatrix}0\\ \theta_1\end{bmatrix}$$
We now return to the question of whether the images of two nearby stars
as depicted in Fig. 11.6 can be distinguished. Since the peak in Fig. 11.7b is the
dominant feature in the diffraction pattern, we will say that the two stars are
resolved if the angle between them is enough to keep their respective diffraction
peaks from seriously overlapping. We will adopt the criterion suggested by Lord
Rayleigh that the peaks are distinguishable if the peak of one pattern is no closer
than the first zero to the other peak. This situation is shown in Fig. 11.8.
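The location of the first zero used in the Rayleigh criterion can be reproduced numerically. The sketch below is illustrative only: it builds $J_1$ from its power series (adequate for small arguments), defines the jinc function $2J_1(\xi)/\xi$ with its limiting value at the origin, and finds the first zero of $J_1$ by bisection. The function names and the bracketing interval are assumptions.

```python
import math

def bessel_j1(x, terms=40):
    """J1(x) from the series sum_m (-1)^m (x/2)^(2m+1) / (m! (m+1)!)."""
    total, term = 0.0, x / 2.0
    for m in range(terms):
        total += term
        term *= -(x / 2.0) ** 2 / ((m + 1) * (m + 2))
    return total

def jinc(xi):
    """2*J1(xi)/xi, with the limiting value 1 at xi = 0."""
    return 1.0 if xi == 0.0 else 2.0 * bessel_j1(xi) / xi

def first_zero_of_j1(lo=3.0, hi=4.5, tol=1e-12):
    """Bisection on an interval where J1 changes sign (assumed bracket)."""
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

root = first_zero_of_j1()
print(root, root / math.pi)  # the ratio is close to 1.22
```

The ratio printed in the last line is the origin of the factor 1.22 that appears in the resolution formulas.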
The angle that corresponds to this separation of diffraction patterns is found by setting the argument of (11.16) equal to $1.22\pi$, the location of the first zero:
$$\frac{k\ell\rho}{2f} = 1.22\pi \tag{11.17}$$
$$\theta_{\min} \cong \frac{\rho}{f} = \frac{1.22\lambda}{\ell} \tag{11.18}$$
Here we have associated the ratio $\rho/f$ (i.e. the radius of the diffraction pattern compared to the distance from the lens) with an angle $\theta_{\min}$. The Rayleigh criterion requires that the diffraction patterns be separated by at least this angle before we say that they are resolved.
$\theta_{\min}$ depends on the diameter of the lens $\ell$ as well as on the wavelength of the light. Since the angle between the images and the angle between the objects is the same, $\theta_{\min}$ tells the minimum angle between objects that can be resolved with a given instrument. This analysis assumes that the light from the two objects is incoherent, meaning the intensities in the image plane add; interferences between the two fields fluctuate rapidly in time and average away.
Figure 11.8 The Rayleigh criterion for a circular aperture.
Example 11.1
What minimum telescope diameter is required to distinguish a Jupiter-like planet (orbital radius $8\times 10^{8}$ km) from its star if they are 10 light-years away?
This seems like a piece of cake; a telescope with a diameter bigger than 7 cm will do the trick. However, the vastly unequal brightness of the star and the planet is the real technical challenge. The faint diffraction rings in the star’s diffraction pattern completely swamp the faint signal from the planet.
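The 7 cm figure quoted in Example 11.1 follows from (11.18). The short Python sketch below reproduces it; the wavelength of 500 nm is an assumption (the example does not state one), as are the helper name and the value used for a light-year.

```python
LIGHT_YEAR_M = 9.4607e15  # meters per light-year (approximate)

def min_diameter(wavelength, angle):
    """Aperture diameter for which the Rayleigh angle 1.22*lambda/diameter equals `angle`."""
    return 1.22 * wavelength / angle

orbit_radius = 8e8 * 1e3           # 8 x 10^8 km expressed in meters
distance = 10.0 * LIGHT_YEAR_M     # 10 light-years in meters
angle = orbit_radius / distance    # small-angle separation in radians

wavelength = 500e-9                # assumed visible wavelength
diameter = min_diameter(wavelength, angle)
print(angle, diameter)             # diameter comes out a little over 0.07 m
```

A longer assumed wavelength raises the required diameter in direct proportion.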
$E_{\text{aperture}}(x'-x'_n,\,y'-y'_n,\,0)$, where the offset in the arguments shifts the location of the aperture. The field comprising all of the identical apertures is
$$E(x',y',0) = \sum_{n=1}^{N} E_{\text{aperture}}(x'-x'_n,\,y'-y'_n,\,0) \tag{11.19}$$
Figure 11.9 Array of identical apertures.
We next compute the Fraunhofer diffraction pattern for the above field. Upon inserting (11.19) into the Fraunhofer diffraction formula (10.19) we obtain
$$E(x,y,z) = -i\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\sum_{n=1}^{N}\int_{-\infty}^{\infty}dx'\int_{-\infty}^{\infty}dy'\,E_{\text{aperture}}(x'-x'_n,\,y'-y'_n,\,0)\,e^{-i\frac{k}{z}(xx'+yy')} \tag{11.20}$$
where we have taken the summation out in front of the integral. We have also
integrated over the entire (infinitely wide) mask since E aperture is nonzero only
inside each aperture.
Even without yet choosing the shape of the identical apertures, we can make some progress on (11.20) with the change of variables $x'' \equiv x' - x'_n$ and $y'' \equiv y' - y'_n$:
$$E(x,y,z) = -i\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\sum_{n=1}^{N}\int_{-\infty}^{\infty}dx''\int_{-\infty}^{\infty}dy''\,E_{\text{aperture}}(x'',y'',0)\,e^{-i\frac{k}{z}\left[x(x''+x'_n)+y(y''+y'_n)\right]} \tag{11.21}$$
Next we pull the factor $\exp\{-i\frac{k}{z}(xx'_n + yy'_n)\}$ out in front of the integral to arrive at our final result:
$$E(x,y,z) = \left[\sum_{n=1}^{N} e^{-i\frac{k}{z}(xx'_n+yy'_n)}\right]\times\left[-i\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\int_{-\infty}^{\infty}dx'\int_{-\infty}^{\infty}dy'\,E_{\text{aperture}}(x',y',0)\,e^{-i\frac{k}{z}(xx'+yy')}\right] \tag{11.22}$$
For the sake of elegance, we have traded back $x'$ for $x''$ and $y'$ for $y''$ as the variables of integration. Equation (11.22) is known as the array theorem.² Note that the second factor in brackets is exactly the Fraunhofer diffraction pattern from a single aperture centered on $x' = 0$ and $y' = 0$. When more than one
identical aperture is present, we only need to evaluate the Fraunhofer diffraction
formula for a single aperture. Then, the single-aperture result is multiplied by the
summation in front, which entirely contains the information about the placement
of the (many) identical apertures.
Example 11.2
Calculate the Fraunhofer diffraction pattern for two identical circular apertures with diameter $\ell$ whose centers are separated by a spacing $h$.
$$E_{\text{aperture}}(x'-x'_n,\,y'-y'_n,\,0) = \int_{-\infty}^{\infty}dx''\int_{-\infty}^{\infty}dy''\,\delta(x''-x'_n)\,\delta(y''-y'_n)\,E_{\text{aperture}}(x'-x'',\,y'-y'',\,0)$$
The integral in (11.20) therefore may be viewed as a 2-D Fourier transform of a convolution, where $kx/z$ and $ky/z$ play the role of spatial frequencies. The convolution theorem (see P0.26) indicates that this is the same as the product of Fourier transforms. The 2-D Fourier transform for the delta function (times $2\pi$) is
$$\int_{-\infty}^{\infty}dx''\int_{-\infty}^{\infty}dy''\,\delta(x''-x'_n)\,\delta(y''-y'_n)\,e^{-i\frac{k}{z}(xx''+yy'')} = e^{-i\frac{k}{z}(xx'_n+yy'_n)}$$
The array theorem (11.22) exhibits this factor. It multiplies the single-slit Fraunhofer diffraction integral, which is the Fourier transform of the other function.
$$= \frac{e^{-i\frac{khx}{2z}N} - e^{i\frac{khx}{2z}N}}{e^{-i\frac{khx}{2z}} - e^{i\frac{khx}{2z}}} = \frac{\sin\left(N\frac{khx}{2z}\right)}{\sin\left(\frac{khx}{2z}\right)} \tag{11.26}$$
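By combining (11.23) and (11.26) we obtain the full Fraunhofer diffraction pattern for a diffraction grating. The expression for the field is
$$E(x,y,z) = \frac{\sin\left(N\frac{khx}{2z}\right)}{\sin\left(\frac{khx}{2z}\right)}\left[-iE_0\,\frac{\Delta x\,\Delta y\,e^{ikz}}{\lambda z}\,e^{i\frac{k}{2z}(x^2+y^2)}\,\mathrm{sinc}\!\left(\frac{\pi\Delta x}{\lambda z}x\right)\mathrm{sinc}\!\left(\frac{\pi\Delta y}{\lambda z}y\right)\right] \tag{11.27}$$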
Now let us suppose that the slits are really tall (parallel to the y-dimension) such that $\Delta y \gg \lambda$. If the slits are infinitely tall, the final sinc function in Eq. (11.27) can be approximated as one.³ The intensity pattern in the horizontal direction can then be written in terms of the peak intensity of the diffraction pattern on the screen:
$$I(x) = I_{\text{peak}}\,\mathrm{sinc}^2\!\left(\frac{\pi\Delta x}{\lambda z}x\right)\frac{\sin^2\!\left(N\frac{\pi hx}{\lambda z}\right)}{N^2\sin^2\!\left(\frac{\pi hx}{\lambda z}\right)} \tag{11.28}$$
3 This is mostly the right idea, but is still a bit of a fake. In fact, the field often does not have a uniform phase along the entire slit in the y-dimension, so our use of the function $\mathrm{sinc}\left[(\pi\Delta y/\lambda z)\,y\right]$ was inappropriate to begin with. The energy in a real spectrometer is usually spread out in a diffuse pattern in the y-dimension. However, its form in y is of little relevance; the spectral information is carried in the x-dimension only.
Note that $\lim_{\alpha\to 0}\frac{\sin N\alpha}{\sin\alpha} = N$ so we have placed $N^2$ in the denominator when introducing our definition of $I_{\text{peak}}$, which represents the intensity on the screen at $x = 0$. In principle, the intensity $I_{\text{peak}}$ is a function of $y$ and depends on the exact details of how the slits are illuminated as a function of $y$, but this is usually not of interest as long as we stay with a given value of $y$ as we scan along $x$.
It is left as an exercise to study the functional form of (11.28), especially how the number of slits $N$ influences the behavior. The case of $N = 2$ describes the diffraction pattern for a Young’s double slit experiment. We now have a description of the Young’s two-slit pattern in the case that the slits have finite openings of width $\Delta x$ rather than infinitely narrow ones.
A final note: You may wonder why we are interested in Fraunhofer diffraction from a grating. The reason is that we are actually interested in separating different wavelengths by observing their distinct diffraction patterns separated in space. In order to achieve good spatial separation between light of different wavelengths, it is necessary to allow the light to propagate a far distance. Optimal separation (the maximum possible) occurs therefore in the Fraunhofer regime.
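As a starting point for exploring (11.28), here is a minimal Python sketch of the grating intensity. The function names and the sample numbers are assumptions for illustration; the only subtlety is the zero-over-zero limit at the principal maxima, where the ratio $\sin^2(N\beta)/(N^2\sin^2\beta)$ tends to one.

```python
import math

def sinc(xi):
    """sin(xi)/xi with the limiting value 1 at xi = 0."""
    return 1.0 if xi == 0.0 else math.sin(xi) / xi

def grating_intensity(x, z, wavelength, h, slit_width, n_slits, i_peak=1.0):
    """Intensity (11.28) for N slits of width slit_width and spacing h."""
    beta = math.pi * h * x / (wavelength * z)
    s = math.sin(beta)
    if abs(s) < 1e-12:
        interference = 1.0  # limit of sin^2(N beta) / (N^2 sin^2 beta) at a principal maximum
    else:
        interference = (math.sin(n_slits * beta) / (n_slits * s)) ** 2
    envelope = sinc(math.pi * slit_width * x / (wavelength * z)) ** 2
    return i_peak * envelope * interference

# Assumed sample values: 500 nm light, 10 um slit spacing, slit width h/2, screen at 1 m.
wavelength, h, z = 500e-9, 10e-6, 1.0
slit_width = h / 2.0
x_first_order = wavelength * z / h  # location of the m = 1 principal maximum

for n_slits in (2, 5, 10, 100):
    print(n_slits, grating_intensity(x_first_order, z, wavelength, h, slit_width, n_slits))
```

The height of the first-order peak is the same for every $N$, because it is set by the single-slit envelope; only the width of the peak changes with the number of slits.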
11.5 Spectrometers
The formula (11.28) can be exploited to make wavelength measurements. This forms the basis of a diffraction grating spectrometer. A spectrometer has relatively poor resolving power compared to a Fabry-Perot interferometer. Nevertheless, a spectrometer is not hampered by the serious limitation imposed by free spectral range. A spectrometer is able to measure a wide range of wavelengths simultaneously. The Fabry-Perot interferometer and the grating spectrometer in this sense are complementary, the one being able to make very precise measurements within a narrow wavelength range and the other being able to characterize wide ranges of wavelengths simultaneously.
To appreciate how a spectrometer works, consider Fraunhofer diffraction from a grating, as described by (11.28). The structure of the diffraction pattern has various peaks. For example, Fig. 11.12a shows the diffraction peaks from a Young’s double slit (i.e. $N = 2$). The diffraction pattern is comprised of the typical Young’s double-slit pattern multiplied by the diffraction pattern of a single slit. (Note that $\sin^2\left(2\frac{\pi hx}{\lambda z}\right)/4\sin^2\left(\frac{\pi hx}{\lambda z}\right) = \cos^2\left(\frac{\pi hx}{\lambda z}\right)$.)
Figure 11.12 Diffraction through various numbers of slits, each with $\Delta x = h/2$ (slit widths half the separation). The dotted line shows the single slit diffraction pattern. (a) Diffraction from a double slit. (b) Diffraction from 5 slits. (c) Diffraction from 10 slits. (d) Diffraction from 100 slits.
As the number of slits $N$ is increased, the peaks seen in the Young’s double-slit pattern tend to sharpen with additional smaller peaks appearing in between. Figure 11.12b shows the case for $N = 5$. The more significant peaks occur when $\sin(\pi hx/\lambda z)$ in the denominator of (11.28) goes to zero. Keep in mind that the
As mentioned, the main diffraction peaks occur when the denominator of (11.28) goes to zero, i.e.
$$\frac{\pi hx}{\lambda z} = m\pi \tag{11.29}$$
The numerator of (11.28) goes to zero at these same locations (i.e. $N\pi hx/\lambda z = Nm\pi$), so the peaks remain finite. If two nearby wavelengths $\lambda_1$ and $\lambda_2$ are sent through the grating simultaneously, their $m$th peaks are located at
$$x_1 = \frac{mz\lambda_1}{h} \quad\text{and}\quad x_2 = \frac{mz\lambda_2}{h} \tag{11.30}$$
These are spatially separated by
$$\Delta x_\lambda \equiv x_2 - x_1 = \frac{mz}{h}\Delta\lambda \tag{11.31}$$
where $\Delta\lambda \equiv \lambda_2 - \lambda_1$.
Meanwhile, we can find the spatial width of, say, the first peak by considering the change in $x_1$ that causes the sine in the numerator of (11.28) to reach the nearby zero (see inset in Fig. 11.12d). This condition implies
$$N\,\frac{\pi h\left(x_1 + \Delta x_{\text{peak}}\right)}{\lambda_1 z} = Nm\pi + \pi \tag{11.32}$$
We will say that two peaks, associated with $\lambda_1$ and $\lambda_2$, are barely distinguishable when $\Delta x_\lambda = \Delta x_{\text{peak}}$. We also substitute from (11.30) to rewrite (11.32) as
$$N\,\frac{\pi h\left(mz\lambda_1/h + mz\Delta\lambda/h\right)}{\lambda_1 z} = Nm\pi + \pi \quad\Rightarrow\quad \Delta\lambda = \frac{\lambda}{Nm} \tag{11.33}$$
Here we have dropped the subscript on the wavelength in the spirit of $\lambda_1 \approx \lambda_2 \approx \lambda$.
$$RP \equiv \frac{\lambda}{\Delta\lambda} = mN \tag{11.34}$$
The resolving power is proportional to the number of slits illuminated on the
diffraction grating. The resolving power also improves for higher diffraction
orders m.
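Example 11.3
What is the resolving power with $m = 1$ of a 2-cm-wide grating with 500 slits per millimeter, and how wide is the 1st-order diffraction peak for 500-nm light after 1-m focusing?
$$RP = mN = 2\ \text{cm}\times\frac{500}{0.1\ \text{cm}} = 10^4$$
and the minimum distinguishable wavelength separation is $\Delta\lambda = \lambda/RP = 500\ \text{nm}/10^4 = 0.05\ \text{nm}$. With the slit spacing $h = 1\ \text{mm}/500 = 2\times 10^{-6}$ m and the focal length $f$ taking the place of $z$ in (11.31), the width of the peak is
$$\Delta x = \frac{mf}{h}\Delta\lambda = \frac{1\ \text{m}}{2\times 10^{-6}\ \text{m}}\,0.05\ \text{nm} = 25\ \mu\text{m}$$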
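The arithmetic of Example 11.3 can be scripted so that other gratings are easy to try. This is only a sketch based on (11.31), (11.33), and (11.34); the function and variable names are assumptions.

```python
def resolving_power(order, n_slits):
    """RP = m N, from (11.34)."""
    return order * n_slits

def peak_width(order, focal_length, slit_spacing, delta_lambda):
    """Spatial width (m f / h) * delta_lambda, from (11.31) with f in place of z."""
    return order * focal_length * delta_lambda / slit_spacing

# Values from Example 11.3 in SI units.
order = 1
grating_width = 2e-2          # 2 cm
slits_per_meter = 500e3       # 500 slits per millimeter
wavelength = 500e-9           # 500 nm
focal_length = 1.0            # 1 m focusing

n_slits = round(grating_width * slits_per_meter)
slit_spacing = 1.0 / slits_per_meter
rp = resolving_power(order, n_slits)
delta_lambda = wavelength / rp
dx = peak_width(order, focal_length, slit_spacing, delta_lambda)

print(rp, delta_lambda, dx)   # about 1e4, 5e-11 m, and 2.5e-5 m
```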
$$E(x,y,z) = -i\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\int_{-\infty}^{\infty}dx'\int_{-\infty}^{\infty}dy'\left[E_0\,e^{-(x'^2+y'^2)/w_0^2}\right]e^{i\frac{k}{2z}(x'^2+y'^2)}\,e^{-i\frac{k}{z}(xx'+yy')} \tag{11.37}$$
The Gaussian profile itself limits the dimension of the emission region, so there is no problem in integrating to infinity. Equation (11.37) can be rewritten as
$$E(x,y,z) = -i\,\frac{E_0\,e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z}\int_{-\infty}^{\infty}dx'\,e^{-\left(\frac{1}{w_0^2}-i\frac{k}{2z}\right)x'^2 - i\frac{kx}{z}x'}\int_{-\infty}^{\infty}dy'\,e^{-\left(\frac{1}{w_0^2}-i\frac{k}{2z}\right)y'^2 - i\frac{ky}{z}y'} \tag{11.38}$$
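The integrals over $x'$ and $y'$ have the identical form and can be done individually with the help of the integral formula (0.55). The algebra is cumbersome, but the integral in the $x'$ dimension becomes
$$\begin{aligned}
\int_{-\infty}^{\infty}dx'\,e^{-\left(\frac{1}{w_0^2}-i\frac{k}{2z}\right)x'^2 - i\frac{kx}{z}x'}
&= \left[\frac{\pi}{\frac{1}{w_0^2}-i\frac{k}{2z}}\right]^{1/2}\exp\left[\frac{\left(-i\frac{kx}{z}\right)^2}{4\left(\frac{1}{w_0^2}-i\frac{k}{2z}\right)}\right]\\
&= \left[\frac{\pi}{-i\frac{k}{2z}\left(1+i\frac{2z}{kw_0^2}\right)}\right]^{1/2}\exp\left[\frac{-kx^2}{2z\left(\frac{2z}{kw_0^2}-i\right)}\right]\\
&= \left[\frac{i\lambda z}{\sqrt{1+\left(\frac{2z}{kw_0^2}\right)^2}\;e^{i\tan^{-1}\frac{2z}{kw_0^2}}}\right]^{1/2}\exp\left[\frac{-kx^2\left(\frac{2z}{kw_0^2}+i\right)}{2z\left[1+\left(\frac{2z}{kw_0^2}\right)^2\right]}\right]
\end{aligned}\tag{11.39}$$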
A similar expression results from the integration on $y'$.
When (11.39) and the equivalent expression for the y-dimension are used in (11.38), the result is
$$E(x,y,z) = E_0\,\frac{e^{ikz}\,e^{i\frac{k}{2z}(x^2+y^2)}}{\sqrt{1+\left(\frac{2z}{kw_0^2}\right)^2}}\;\exp\left[-\frac{\left(x^2+y^2\right)\left(\frac{1}{w_0^2}+i\frac{k}{2z}\right)}{1+\left(\frac{2z}{kw_0^2}\right)^2}\right]e^{-i\tan^{-1}\frac{2z}{kw_0^2}} \tag{11.40}$$
This rather complicated-looking expression for the field distribution is in fact very
useful and can be directly interpreted, as discussed in the next section.
A Gaussian field profile is one of few diffraction problems that can be handled conveniently in either Cartesian (as above) or cylindrical coordinates. In cylindrical
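$$= E_0\,\frac{e^{ikz}\,e^{i\frac{k\rho^2}{2z}}}{\sqrt{1+\left(\frac{2z}{kw_0^2}\right)^2}}\;\exp\left[-\frac{\rho^2\left(\frac{1}{w_0^2}+i\frac{k}{2z}\right)}{1+\left(\frac{2z}{kw_0^2}\right)^2}\right]e^{-i\tan^{-1}\frac{2z}{kw_0^2}}$$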
$$\rho^2 \equiv x^2 + y^2, \tag{11.42}$$
$$w(z) \equiv w_0\sqrt{1 + z^2/z_0^2}, \tag{11.43}$$
$$R(z) \equiv z + z_0^2/z, \tag{11.44}$$
$$z_0 \equiv \frac{kw_0^2}{2} \tag{11.45}$$
This formula describes the lowest-order Gaussian mode, the most common laser
beam profile. (Please be aware that some lasers are multimode and exhibit more
complicated structures.)
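The definitions (11.43) through (11.45) translate directly into code. The following Python sketch is an illustration with assumed names and sample values (a 633 nm beam with a 0.5 mm waist); it also evaluates the on-axis intensity ratio $1/(1+z^2/z_0^2)$ that follows from $w_0^2/w^2(z)$.

```python
import math

def rayleigh_range(w0, wavelength):
    """z0 = k w0^2 / 2 = pi w0^2 / lambda, from (11.45)."""
    return math.pi * w0 ** 2 / wavelength

def beam_radius(z, w0, wavelength):
    """w(z) = w0 sqrt(1 + z^2/z0^2), from (11.43)."""
    z0 = rayleigh_range(w0, wavelength)
    return w0 * math.sqrt(1.0 + (z / z0) ** 2)

def curvature_radius(z, w0, wavelength):
    """R(z) = z + z0^2/z, from (11.44); infinite (flat wave front) at z = 0."""
    if z == 0.0:
        return math.inf
    z0 = rayleigh_range(w0, wavelength)
    return z + z0 ** 2 / z

def on_axis_intensity_ratio(z, w0, wavelength):
    """I(0, z) / I(0, 0) = w0^2 / w(z)^2."""
    return (w0 / beam_radius(z, w0, wavelength)) ** 2

# Assumed sample beam: 633 nm wavelength, 0.5 mm waist radius.
wavelength, w0 = 633e-9, 0.5e-3
z0 = rayleigh_range(w0, wavelength)
print(z0)
print(beam_radius(z0, w0, wavelength) / w0)          # sqrt(2) at z = z0
print(curvature_radius(z0, w0, wavelength) / z0)     # 2 at z = z0
print(on_axis_intensity_ratio(z0, w0, wavelength))   # 0.5 at z = z0
```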
It turns out that (11.41) works equally well for negative values of $z$. The expression can therefore be used to describe the field of a simple laser beam everywhere (before and after it goes through a focus). In fact, the expression works also near $z = 0$!⁴ At $z = 0$ the diffracted field (11.41) returns the exact expression for the original field profile (11.35) (see P11.11). In short, (11.41) may be used with impunity as long as the divergence angle of the beam is not too wide.
4 There is good reason for this since the Fresnel diffraction integral is an exact solution to the paraxial wave equation (10.15). The beam (11.41) therefore satisfies the paraxial wave equation for all $z$.
To begin our interpretation of (11.41), consider the intensity profile $I \propto E^*E$ as depicted in Fig. 11.15:
$$I(\rho,z) = I_0\,\frac{w_0^2}{w^2(z)}\,e^{-\frac{2\rho^2}{w^2(z)}} = \frac{I_0}{1+z^2/z_0^2}\,e^{-\frac{2\rho^2}{w^2(z)}} \tag{11.46}$$
By inspection, we see that w (z) gives the radius of the beam anywhere along
z. At z = 0, the beam waist, w (z = 0) reduces to w 0 , as expected. The parameter
z 0 , known as the Rayleigh range, specifies the distance along the axis from z = 0
to the point where the intensity decreases by a factor of 2. Note that w 0 and z 0
are not independent of each other but are connected through the wavelength
Figure 11.15 A Gaussian laser field according to (11.45). There is a tradeoff: a small beam waist means a short depth
profile in the vicinity of its beam of focus. That is, a small w 0 means a small Rayleigh range z 0 .
waist. We next consider the phase terms that appear in the field expression (11.41).
The phase term $ikz + ik\rho^2/2R(z)$ describes the phase of curved wave fronts,
where $R(z)$ is the radius of curvature of the wave front at z. At z = 0, the radius of
curvature is infinite (see (11.44)), meaning that the wave front is flat at the laser
beam waist. In contrast, at very large values of z we have $R(z) \cong z$ (see (11.44)).
In this case, we may write these phase terms as $kz + k\rho^2/2R(z) \cong k\sqrt{z^2 + \rho^2}$. This
describes a spherical wave front emanating from the origin out to point $(\rho, z)$.
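The beam parameters discussed above can be explored numerically. The following is a minimal sketch, assuming that (11.45) has the form $z_0 = \pi w_0^2/\lambda$ (which is consistent with (11.47)) and that $R(z) = z + z_0^2/z$; the function names and the sample values (a HeNe wavelength and a 0.5 mm waist) are illustrative choices, not taken from the text.

```python
import math

def rayleigh_range(w0, wavelength):
    """Assumed form of (11.45): z0 = pi * w0**2 / wavelength."""
    return math.pi * w0**2 / wavelength

def beam_radius(z, w0, z0):
    """w(z) = w0 * sqrt(1 + z**2 / z0**2)."""
    return w0 * math.sqrt(1.0 + (z / z0) ** 2)

def curvature_radius(z, z0):
    """R(z) = z + z0**2 / z; the wave front is flat (R infinite) at z = 0."""
    if z == 0:
        return math.inf
    return z + z0**2 / z

def intensity(rho, z, w0, z0, I0=1.0):
    """Intensity profile (11.46)."""
    w = beam_radius(z, w0, z0)
    return I0 * (w0 / w) ** 2 * math.exp(-2.0 * rho**2 / w**2)

wavelength = 633e-9   # illustrative HeNe wavelength, meters
w0 = 0.5e-3           # illustrative beam waist, meters
z0 = rayleigh_range(w0, wavelength)

print(f"Rayleigh range: {z0:.3f} m")
print(f"On-axis intensity at z0 (units of I0): {intensity(0.0, z0, w0, z0):.3f}")
print(f"R(1000 z0) / (1000 z0): {curvature_radius(1000 * z0, z0) / (1000 * z0):.6f}")
```

The on-axis intensity drops to half its peak value at $z = z_0$, and far from the waist $R(z)$ approaches z, as described in the text.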
Example 11.4
Write the beam waist $w_0$ in terms of the f-number, defined to be the ratio of z to
the beam diameter $2w(z)$ far from the beam waist.
Solution: Far away from the beam waist (i.e. z >> z 0 ) the laser beam expands
along a cone. That is, its diameter increases in proportion to distance.
$$w(z) = w_0\sqrt{1 + z^2/z_0^2} \rightarrow w_0 z/z_0$$
The cone angle is parameterized by the f-number, the ratio of the cone height to
its base:
$$f_\# \equiv \lim_{z\to\pm\infty} \frac{z}{2w(z)} = \frac{z}{2w_0 z/z_0} = \frac{z_0}{2w_0}$$
Figure 11.16
11.A ABCD Law for Gaussian Beams 291
$$w_0 = \frac{2\lambda f_\#}{\pi} \tag{11.47}$$
Equation (11.47) gives a convenient way to predict the size of a laser focus. One
calculates the f-number by dividing the distance to the focus by the diameter of
the beam at the lens. However, in practice you may be very surprised at how badly
a beam focuses compared to the theoretical prediction (due to aberrations, etc.).
It is always good practice to directly measure your focus if its size is important to
an experiment.
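As a quick numerical illustration of (11.47), the sketch below estimates the focal spot size from the f-number. The numbers (a 5 mm diameter beam focused over 10 cm at 633 nm) and the function names are hypothetical, chosen only for illustration; as noted above, a real focus may be considerably worse than this ideal estimate.

```python
import math

def f_number(distance_to_focus, beam_diameter):
    """f-number: ratio of the cone height to its base diameter."""
    return distance_to_focus / beam_diameter

def waist_from_f_number(fnum, wavelength):
    """Beam waist from (11.47): w0 = 2 * wavelength * f# / pi."""
    return 2.0 * wavelength * fnum / math.pi

wavelength = 633e-9              # meters (illustrative)
fnum = f_number(0.10, 0.005)     # 10 cm to focus, 5 mm beam diameter
w0 = waist_from_f_number(fnum, wavelength)

print(f"f-number: {fnum:.1f}")
print(f"predicted waist: {w0 * 1e6:.2f} micrometers")
```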
$$z' + i z_0' = \frac{A(z + i z_0) + B}{C(z + i z_0) + D} \tag{11.48}$$
Figure 11.17 Gaussian laser beam traversing an optical system described by an ABCD
matrix. The dark lines represent the incoming and exiting beams. The gray line repre-
sents where the exiting beam appears to have been.
where A, B, C, and D are the matrix elements of the optical system. The imaginary
number $i \equiv \sqrt{-1}$ imbues the law with complex arithmetic. It makes two equations
from one, since the real and imaginary parts of (11.48) must separately be equal.
We now prove the ABCD law. We begin by showing that the law holds for
two specific ABCD matrices. First, consider the matrix for propagation through a
distance d:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \tag{11.49}$$
We know that simple propagation has minimal effect on a beam. The Rayleigh
range is unchanged, so we expect that the ABCD law should give $z_0' = z_0$. The
propagation through a distance d modifies the beam position by $z' = z + d$. We
now check that the ABCD law agrees with these results by inserting (11.49) into
(11.48):
$$z' + i z_0' = \frac{1\,(z + i z_0) + d}{0\,(z + i z_0) + 1} = z + d + i z_0 \quad \text{(propagation through distance } d\text{)} \tag{11.50}$$
A beam that traverses a thin lens undergoes the phase shift $-k\rho^2/2f$, according
to (11.11). This modifies the original phase of the wave front $k\rho^2/2R(z)$, seen in
(11.41). The phase of the exiting beam is therefore
$$\frac{k\rho^2}{2R(z')} = \frac{k\rho^2}{2R(z)} - \frac{k\rho^2}{2f} \tag{11.52}$$
$$\frac{1}{R(z')} = \frac{1}{R(z)} - \frac{1}{f} \quad\Rightarrow\quad \frac{1}{z' + z_0'^2/z'} = \frac{1}{z + z_0^2/z} - \frac{1}{f} \tag{11.53}$$
In addition to this relationship, the local radius of the beam given by (11.43)
cannot change while traversing the ‘thin’ lens. Therefore,
$$w(z') = w(z) \quad\Rightarrow\quad z_0'\left(1 + \frac{z'^2}{z_0'^2}\right) = z_0\left(1 + \frac{z^2}{z_0^2}\right) \tag{11.54}$$
On the other hand, the ABCD law for the thin lens gives
$$z' + i z_0' = \frac{1\,(z + i z_0) + 0}{-(1/f)(z + i z_0) + 1} \quad \text{(traversing a thin lens with focal length } f\text{)} \tag{11.55}$$
It is left as an exercise (see P11.14) to show that (11.55) is consistent with (11.53)
and (11.54).
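The two special cases above lend themselves to a numerical spot check with complex arithmetic. The following is a minimal sketch, not a proof: the helper names and the sample values of z, $z_0$, d, and f are arbitrary illustrative choices, and the check assumes $R(z) = z + z_0^2/z$.

```python
def apply_abcd(q, A, B, C, D):
    """ABCD law (11.48) acting on the complex number q = z + i*z0."""
    return (A * q + B) / (C * q + D)

def inverse_curvature(z, z0):
    """1/R(z), using R(z) = z + z0**2 / z = (z**2 + z0**2) / z."""
    return z / (z**2 + z0**2)

def scaled_radius_squared(z, z0):
    """z0 * (1 + z**2 / z0**2), which is proportional to w(z)**2."""
    return z0 * (1.0 + z**2 / z0**2)

z, z0 = 0.30, 0.50            # arbitrary beam position and Rayleigh range (m)
q = complex(z, z0)

# Propagation through a distance d, matrix (11.49)
d = 0.20
q_prop = apply_abcd(q, 1.0, d, 0.0, 1.0)

# Thin lens with focal length f, as in (11.55)
f = 0.25
q_lens = apply_abcd(q, 1.0, 0.0, -1.0 / f, 1.0)
z_new, z0_new = q_lens.real, q_lens.imag

# Both sides of (11.53) and (11.54)
lhs_53 = inverse_curvature(z_new, z0_new)
rhs_53 = inverse_curvature(z, z0) - 1.0 / f
lhs_54 = scaled_radius_squared(z_new, z0_new)
rhs_54 = scaled_radius_squared(z, z0)

print(q_prop)
print(lhs_53, rhs_53)
print(lhs_54, rhs_54)
```

Propagation shifts only the real part of q, while the lens result reproduces both (11.53) and (11.54) to within rounding error.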
So far we have shown that the ABCD law works for two specific examples,
namely propagation through a distance d and transmission through a thin lens
with focal length f. From these elements we can derive more complicated systems.
The ABCD matrix for a thick lens cannot be constructed from just
these two elements, but we can construct it if we
sandwich a thick window (as opposed to empty space) between two thin lenses.
The proof that the matrix for a thick window obeys the ABCD law is left as an
exercise (see P11.17). With these relatively few elements, essentially any optical
system can be constructed, provided that the beam propagation begins and ends
in the same index of refraction.
To complete our proof of the general ABCD law, we need only show that when
it is applied to the compound element
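$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix} \begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix} = \begin{bmatrix} A_2 A_1 + B_2 C_1 & A_2 B_1 + B_2 D_1 \\ C_2 A_1 + D_2 C_1 & C_2 B_1 + D_2 D_1 \end{bmatrix} \tag{11.56}$$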
it gives the same answer as when the law is applied sequentially, first on
$\begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix}$ and then on $\begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix}$.
Explicitly, we have
$$\begin{aligned}
z'' + i z_0'' &= \frac{A_2 (z' + i z_0') + B_2}{C_2 (z' + i z_0') + D_2} \\
&= \frac{A_2 \left[\dfrac{A_1 (z + i z_0) + B_1}{C_1 (z + i z_0) + D_1}\right] + B_2}{C_2 \left[\dfrac{A_1 (z + i z_0) + B_1}{C_1 (z + i z_0) + D_1}\right] + D_2} \\
&= \frac{A_2 [A_1 (z + i z_0) + B_1] + B_2 [C_1 (z + i z_0) + D_1]}{C_2 [A_1 (z + i z_0) + B_1] + D_2 [C_1 (z + i z_0) + D_1]} \\
&= \frac{(A_2 A_1 + B_2 C_1)(z + i z_0) + (A_2 B_1 + B_2 D_1)}{(C_2 A_1 + D_2 C_1)(z + i z_0) + (C_2 B_1 + D_2 D_1)} \\
&= \frac{A (z + i z_0) + B}{C (z + i z_0) + D}
\end{aligned} \tag{11.57}$$
Thus, we can construct any ABCD matrix that we wish from matrices that are
known to obey the ABCD law. The resulting matrix also obeys the ABCD law.
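The composition property in (11.57) can also be checked numerically. The sketch below, with illustrative helper names and arbitrary sample values, applies the ABCD law first sequentially (propagation, then a thin lens) and then once with the compound matrix from (11.56).

```python
def apply_abcd(q, m):
    """ABCD law (11.48) for a matrix m = ((A, B), (C, D))."""
    (A, B), (C, D) = m
    return (A * q + B) / (C * q + D)

def compose(m2, m1):
    """Matrix product m2 * m1 as in (11.56); element 1 acts on the beam first."""
    (A1, B1), (C1, D1) = m1
    (A2, B2), (C2, D2) = m2
    return ((A2 * A1 + B2 * C1, A2 * B1 + B2 * D1),
            (C2 * A1 + D2 * C1, C2 * B1 + D2 * D1))

propagate = ((1.0, 0.40), (0.0, 1.0))      # propagation through d = 0.40 m
lens = ((1.0, 0.0), (-1.0 / 0.25, 1.0))    # thin lens with f = 0.25 m

q = complex(0.30, 0.50)                    # arbitrary z + i*z0

sequential = apply_abcd(apply_abcd(q, propagate), lens)
compound = apply_abcd(q, compose(lens, propagate))

print(sequential)
print(compound)
```

Both routes give the same complex number, up to floating-point rounding.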
Exercises
P11.1 Fill in the steps leading to (11.14) from (11.13). Show that the intensity
distribution (11.6) is consistent with (11.14).
L11.2 Set up a collimated ‘plane wave’ in the laboratory using a HeNe laser
(λ = 633 nm) and appropriate lenses.
(a) Choose a rectangular aperture (∆x by ∆y) and place it in the plane
wave. Observe the Fraunhofer diffraction on a very far away screen (i.e.,
where $z \gg \frac{k}{2}(\text{aperture radius})^2$ is satisfied). Check that the location of
Figure 11.18 [Setup labels: laser, aperture, removable mirror, far-away mirror, screen, filters, CCD camera]
P11.3 On the night of April 18, 1775, a signal was sent from the Old North
Church steeple to Paul Revere, who was 1.8 miles away: “One if by
land, two if by sea.” If in the dark, Paul’s pupils had 4 mm diameters,
what is the minimum possible separation between the two lanterns
that would allow him to correctly interpret the signal? Assume that the
predominant wavelength of the lanterns was 580 nm.
HINT: In the eye, the index of refraction is about 1.33 so the wavelength
is shorter. This leads to a smaller diffraction pattern on the retina.
However, in accordance with Snell’s law, two rays separated by an angle
θ outside of the eye are separated by an angle θ/1.33 inside the
eye. The two rays then hit on the retina closer together. As far as
resolution is concerned, the two effects exactly compensate.
L11.4 Simulate two stars with laser beams (λ = 633 nm). Align them nearly
parallel with a small lateral displacement. Send the beams down a long
corridor until diffraction causes both beams to grow into one another
so that it is no longer apparent that they are from two distinct sources.
Use a lens to image the two sources onto a CCD camera. The camera
should be placed close to the focal plane of the lens. Use a variable iris
near the lens to create different pupil openings.
Figure 11.19 [Setup labels: two lasers, pupil, filter, CCD camera]
P11.5 Find the diffraction pattern created by an array of nine circles, each
with radius a, which are centered at the following (x 0 , y 0 ) coordinates:
(−b, b), (0, b), (b, b), (−b, 0), (0, 0), (b, 0), (−b, −b), (0, −b), (b, −b) (a is
less than b). Make a plot of the result for the situation where (in some
choice of units) a = 1, b = 5a, and k/d = 1. View the plot at different
“zoom levels” to see the finer detail.
P11.8 For the case of N = 1000 in P11.7, you wish to position a narrow slit at
the focus of the lens so that it transmits only the first-order diffraction
peak (i.e. at $khx/(2f) = \pm\pi$). (a) How wide should the slit be if it is to
be half the separation between the first intensity zeros to either side of
the peak?
(b) What small change in wavelength (away from λ = 500 nm) will
cause the intensity peak to shift by the width of the slit found in part
(a)?
L11.9 (a) Use a HeNe laser to determine the period h of a reflective grating.
(b) Give an estimate of the blaze angle φ on the grating. HINT: Assume
that the blaze angle is optimized for first-order diffraction of the HeNe
laser (for one side) at normal incidence. The blaze angle enables a
mirror-like reflection of the diffracted light on each groove. (video)
Figure 11.21
(c) You have two mirrors of focal length 75 cm and the reflective grating
in the lab. You also have two very narrow adjustable slits and the ability
to ‘tune’ the angle of the grating. Sketch how to use these items to make
a monochromator (scans through one wavelength at a time). If the
beam that hits the grating is 5 cm wide, what do you expect the ultimate
resolving power of the monochromator to be in the wavelength range
of 500 nm? Do not worry about aberration such as astigmatism from
using the mirrors off axis.
Figure 11.22 [Monochromator layout labels: light in, slit, grating, slit, light out]
L11.10 Study the Jarrell Ash monochromator. Use a tungsten lamp as a source
and observe how the instrument works by taking the entire top off.
Do not breathe or touch when you do this. In the dark, trace the light
inside of the instrument with a white plastic card and observe what
happens when you change the wavelength setting. Place the top back
on when you are done. (video)
(a) Predict the best theoretical resolving power that this instrument can
achieve, assuming 1200 lines per millimeter.
(b) What should the width ∆x of the entrance and exit slits be to obtain
this resolving power? Assume λ = 500 nm.
HINT: Set ∆x to be the distance between the peak and the first zero of
the diffraction pattern at the exit slit for monochromatic light.
P11.12 Use the Fraunhofer integral formula (either (10.19) or (10.28)) to deter-
mine the far-field pattern of a Gaussian laser focus (11.35).
HINT: The answer should agree with P11.11 part (b).
L11.13 Consider the following setup where a diverging laser beam is collimated
using an uncoated lens. A double reflection from both surfaces of the
lens (known as a ghost) comes out in the forward direction, focusing
after a short distance. Use a CCD camera to study this focused beam.
The collimated beam serves as a reference to reveal the phase of the
focused beam through interference. Because the weak ghost beam
concentrates near its focus, the two beams can have similar intensities
for optimal interference effects. (video)5
Figure 11.23 [Label: ghost beam]
Figure 11.24 [Setup labels: laser, filter, pinhole, uncoated lens, lens, CCD camera; 150 cm]
$$I_t(\rho, z) = I_2 + I_1(\rho, z) + 2\sqrt{I_2 I_1(\rho, z)}\,\cos\left[\frac{k\rho^2}{2R(z)} - \tan^{-1}\frac{z}{z_0} - \phi\right]$$
both $R(z)$ and the Gouy shift $\tan^{-1}(z/z_0)$, which are not present in the
intensity distribution of a single beam (see (11.46)).
(a) Determine the f-number for the ghost beam (see Example 11.4).
Use this measurement to predict a value for $w_0$. HINT: You know that
at the lens, the focusing beam is the same size as the collimated beam.
(b) Measure the actual spot size $w_0$ at the focus. How does it compare
to the prediction?
HINT: Before measuring the spot size, make a subtle adjustment to
the tilt of the lens. This incidentally causes the phase between the two
beams to vary by small amounts, which you can set to φ = ±π/2. Then
at the focus the cosine term vanishes and the two beams don’t interfere
(i.e. the intensities simply add). This is accomplished if the center of
the interference pattern is as dark as possible either far before or far
after the focus.
(c) Observe the effect of the Gouy shift. Since $\tan^{-1}(z/z_0)$ varies over a
range of π, you should see that the ring pattern before versus after the
focus inverts (i.e. the bright rings exchange with the dark ones).
(d) Predict the Rayleigh range $z_0$ and check that the radius of curvature
$R(z) \equiv z + z_0^2/z$ agrees with measurement.
HINT: You should see interference rings similar to those in Fig. 11.25.
The only phase term that varies with ρ is $k\rho^2/2R(z)$. If you count N
fringes out to a radius ρ, then $k\rho^2/2R(z)$ has varied by 2πN.
Figure 11.25 [Panels: z = -3z0, -2z0, -z0, 0, +z0, +2z0, +3z0, +4z0]
5 J. Peatross and M. V. Pack, “Viewing the Mathematical Structure of Gaussian Laser Beams in a
P11.14 Find the solutions to (11.55) (i.e. find $z'$ and $z_0'$ in terms of z and $z_0$).
Show that the results are in agreement with (11.53) and (11.54).
P11.15 Assuming a collimated beam (i.e. z = 0 and beam waist $w_0$), find the
location $L = -z'$ and size $w_0'$ of the resulting focus when the beam goes
through a thin lens with focal length f.
L11.16 Place a lens in a HeNe laser beam soon after the exit mirror of the cavity.
Characterize the focus of the resulting laser beam, and compare the
results with the expressions derived in P11.15.
P11.17 Prove the ABCD law for a beam propagating through a thick window of
material with matrix
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix}$$
Chapter 12
12.1 Interferograms
Consider the Michelson interferometer seen in Fig. 12.1. Suppose that the
beamsplitter divides the fields evenly, so that the overall output intensity is given by
(8.1):
$$I_{\text{tot}} = 2I_0\left[1 + \cos(\omega\tau)\right] \tag{12.1}$$
Figure 12.1 Michelson interferometer.
As a reminder, τ is the roundtrip delay time of one path relative to the other. This
equation is based on the idealized case, where the amplitude and phase of the two
1 See M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 7.5.5 (Cambridge: Cambridge
302 Chapter 12 Interferograms and Holography
beams are uniform and perfectly aligned to each other following the beamsplitter.
The entire beam ‘blinks’ on and off as the delay path τ is varied.
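The ‘blinking’ described by (12.1) is easy to tabulate. The sketch below is a minimal illustration; the function name and the choice of a HeNe wavelength are assumptions made for the example, with the delay expressed through the roundtrip path difference d = cτ.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s

def michelson_intensity(tau, omega, I0=1.0):
    """Idealized Michelson output (12.1): I_tot = 2*I0*(1 + cos(omega*tau))."""
    return 2.0 * I0 * (1.0 + math.cos(omega * tau))

wavelength = 633e-9                        # illustrative wavelength, meters
omega = 2.0 * math.pi * C_LIGHT / wavelength

# Roundtrip path differences of 0, half a wavelength, and one wavelength
for fraction in (0.0, 0.5, 1.0):
    tau = fraction * wavelength / C_LIGHT
    print(fraction, michelson_intensity(tau, omega))
```

A half-wavelength path difference takes the output from fully bright ($4I_0$) to dark, and a full wavelength restores it.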
What happens if one of the retro-reflecting mirrors is misaligned by a small
angle θ? The fringe patterns seen in Fig. 12.2 (a)-(c) are the result. By the law
of reflection, the beam returning from the misaligned mirror deviates from the
‘ideal’ path by an angle 2θ. This puts a relative phase variation of
$$I_{\text{tot}} = 2I_0\left[1 + \cos(\phi + \omega\tau)\right] \tag{12.3}$$
The phase term φ depends on the local position within the beam through x and
y. Regions of uniform phase, called fringes (in this case individual stripes), have
the same intensity. As the delay τ is varied, the fringes seem to ‘move’ across the
detector. In this case, the fringes appear at one edge of the beam and disappear
at the other.
Another interesting situation arises when the beams in a Michelson interfer-
ometer are diverging. A fringe pattern of concentric circles will be seen at the
detector when the two beam paths are unequal (see Fig. 12.2 (d)). The radius of
curvature for the beam traveling the longer path is increased by the added amount
of delay d = τc. Thus, if beam 1 has radius of curvature R 1 when returning to the
beam splitter, then beam 2 will have radius R 2 = R 1 +d upon return (assuming flat
mirrors). The relative phase (see phase term in (11.41)) between the two beams is
12.3 Generating Holograms
In the late 1940’s, Dennis Gabor developed the concept of holography, but it wasn’t
until after the invention of the laser that this field really blossomed. Consider
a coherent monochromatic beam of light that is split in half by a beamsplitter,
similar to that in a Michelson interferometer. Let one beam, called the reference
beam, proceed directly to a recording film, and let the other beam scatter from
an arbitrary object back towards the same film. The two beams interfere at the
recording film. It is best to split the beam initially into unequal intensities such
that the light scattered from the object has an intensity similar to the reference
beam at the film.
Figure 12.4 Twyman-Green setup for testing lenses. [Labels: optic to be tested, imaging objective, camera]
The purpose of the film is to record the interference pattern. It is important
that the coherence length of the light be much longer than the difference in
path length starting from the beam splitter and ending at the film. In addition,
during exposure to the film, it is important that the whole setup be stable against
vibrations on the scale of a wavelength, since such vibrations cause the fringes to wash
out. For simplicity, we neglect the vector nature of the electric field, assuming
that the scattering from the object for the most part preserves polarization and
that the angle between the two beams incident on the film is modest (so that the
electric fields of the two beams are close to parallel). To the extent that the light
scattered from the object contains the polarization component orthogonal to that
of the reference beam, it provides a uniform (unwanted) background exposure to
the film on top of which the fringe pattern is recorded.
Figure 12.5 Exposure of holographic film. [Labels: beamsplitter, object, film]
In general terms, we may write the electric field arriving at the film as5
$$E_{\text{film}}(\mathbf{r})\, e^{-i\omega t} = E_{\text{object}}(\mathbf{r})\, e^{-i\omega t} + E_{\text{ref}}(\mathbf{r})\, e^{-i\omega t} \tag{12.5}$$
Here, the coordinate r indicates locations on the film surface, which may have
arbitrary shape, but often is a plane. The field E object (r), which is scattered from
the object, is in general very complicated. The field E ref (r) may be equally compli-
cated, but typically it is convenient if it has a simple form such as a plane wave,
since this beam must be re-created later in order to view the hologram.
The intensity of the field (12.5) is given by
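$$\begin{aligned}
I_{\text{film}}(\mathbf{r}) &= \frac{1}{2} c\epsilon_0 \left|E_{\text{object}}(\mathbf{r}) + E_{\text{ref}}(\mathbf{r})\right|^2 \\
&= \frac{1}{2} c\epsilon_0 \left[\left|E_{\text{object}}(\mathbf{r})\right|^2 + \left|E_{\text{ref}}(\mathbf{r})\right|^2 + E_{\text{ref}}^*(\mathbf{r}) E_{\text{object}}(\mathbf{r}) + E_{\text{ref}}(\mathbf{r}) E_{\text{object}}^*(\mathbf{r})\right]
\end{aligned} \tag{12.6}$$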
For typical photographic film, the exposure of the film is proportional to the
intensity of the light hitting it. This is known as the linear response regime. That
is, after the film is developed, the transmittance T of the light through the film is
proportional to the intensity of the light that exposed it ($I_{\text{film}}$). However, for low
exposure levels, or for film specifically designed for holography, the transmission
of the light through the film can be proportional to the square of the intensity
of the light that exposes the film. Thus, after the film is exposed to the fringe
pattern and developed, the film acquires a spatially varying transmission function
according to
$$T(\mathbf{r}) \propto I_{\text{film}}^2(\mathbf{r}) \tag{12.7}$$
If at a later point in time light of intensity $I_{\text{incident}}$ is directed onto the film, it will
transmit according to $I_{\text{transmitted}} = T(\mathbf{r}) I_{\text{incident}}$. In this case, the field, as it emerges
from the other side of the film, will be
$$E_{\text{transmitted}}(\mathbf{r}) = t(\mathbf{r}) E_{\text{incident}}(\mathbf{r}) \propto I_{\text{film}}(\mathbf{r}) E_{\text{incident}}(\mathbf{r}) \tag{12.8}$$
where $t(\mathbf{r}) = \sqrt{T(\mathbf{r})}$.

Dennis Gabor (1900–1979, Hungarian) was born in Budapest. As a teenager,
he fought for Hungary in World War I. Following the war, he studied at the
Technical University of Budapest and later at the Technical University of
Berlin. In 1927, Gabor completed his doctoral dissertation on cathode ray
tubes and began a long career working on electron-beam devices such as
oscilloscopes, televisions, and electron microscopes. It was in the context of
‘electron optics’ that he invented the concept of holography, which relied on
the wave nature of electron beams. Gabor did this work while working for a
British company, after fleeing Germany when Hitler came to power. Holography
did not become practical until after the invention of the laser, which provided
a bright coherent light source. (Gabor had attempted to make holograms earlier
using a spectral line from a mercury lamp.) In 1964 the first hologram was
produced. Soon after, holograms became commercially available and were
popularized. Gabor accepted a post as professor of applied physics at the
Imperial College of London from 1958 until he retired in 1967. He was awarded the
Nobel prize in physics in 1971 for the invention of holography. (Wikipedia)

5 See P. W. Milonni and J. H. Eberly, Lasers, Sect. 16.4-16.5 (New York: Wiley, 1988); G. R. Fowles,
Introduction to Modern Optics, 2nd ed., Sect. 5.7 (Toronto: Dover, 1975).

12.4 Holographic Wavefront Reconstruction
To see a holographic image, we re-illuminate film (previously exposed and
developed) with the original reference beam. That is, we send in
$$E_{\text{incident}}(\mathbf{r}) = E_{\text{ref}}(\mathbf{r}) \tag{12.9}$$
and view the light that is transmitted. According to (12.6) and (12.8), the
transmitted field is proportional to
Example 12.1
Analyze the three field terms in (12.10) for a hologram made from a point object,
as depicted in Fig. 12.7.
Figure 12.7 Exposure to holographic film by a point source and a reference plane
wave. The holographic fringe pattern for a point object and a plane wave
reference beam exposing a flat film is shown on the right.
Solution: Presumably, the point object is illuminated sufficiently brightly so as to
make the scattered light have an intensity similar to the reference beam at the film.
Let the reference plane wave strike the film at normal incidence. Then the reference
field will have constant amplitude and phase across it; call it $E_{\text{ref}}$. The field from
the point object can be treated as a spherical wave:
$$E_{\text{object}}(\rho) = \frac{E_{\text{ref}} L}{\sqrt{L^2 + \rho^2}}\, e^{ik\sqrt{L^2 + \rho^2}} \quad \text{(point source example)} \tag{12.11}$$
Here ρ represents the radial distance from the center of the film to some other
point on the film. We have taken the amplitude of the object field to match $E_{\text{ref}}$ in
the center of the film.
After the film is exposed, developed, and re-illuminated by the reference beam, the
field emerging from the right-hand-side of the film, according to (12.10), becomes
$$E_{\text{transmitted}}(\rho) \propto \left[\frac{E_{\text{ref}}^2 L^2}{L^2 + \rho^2} + E_{\text{ref}}^2\right] E_{\text{ref}} + E_{\text{ref}}^2\, \frac{E_{\text{ref}} L}{\sqrt{L^2 + \rho^2}}\, e^{ik\sqrt{L^2 + \rho^2}} + E_{\text{ref}}^2\, \frac{E_{\text{ref}} L}{\sqrt{L^2 + \rho^2}}\, e^{-ik\sqrt{L^2 + \rho^2}} \tag{12.12}$$
We see the three distinct waves that emerge from the holographic film. The first
term in (12.12) represents the plane wave reference beam passing straight through
the film with some variation in amplitude (depicted in Fig. 12.8 (a)). The second
term in (12.12) has the identical form as the field from the original object (aside
from an overall amplitude factor). It describes an outward-expanding spherical
wave, which gives rise to a virtual image at the location of the original point object,
as depicted in Fig. 12.8 (b). The final term in (12.12) corresponds to a converging
spherical wave, which focuses to a point at a distance L from the observer’s side of
the screen (depicted in Fig. 12.8 (c)).
[Figure 12.8 panel labels: (a) reference beam, film, undeflected beam; (b) reference beam, film, virtual image, field associated with virtual image; (c) reference beam, film, real image, field associated with real image]
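The decomposition into three waves can be checked numerically for the point-object example. The sketch below is illustrative only: it assumes a real, constant $E_{\text{ref}}$, drops the overall factor $c\epsilon_0/2$, and assumes that the three terms arise from multiplying the bracket in (12.6) by $E_{\text{ref}}$, as (12.8) and (12.9) indicate. The function names and sample values are hypothetical.

```python
import cmath
import math

def object_field(rho, L, k, E_ref=1.0):
    """Spherical wave from the point object, as in (12.11)."""
    r = math.hypot(L, rho)
    return E_ref * L / r * cmath.exp(1j * k * r)

def transmitted_terms(rho, L, k, E_ref=1.0):
    """Three terms of the transmitted field (overall constants dropped)."""
    E_obj = object_field(rho, L, k, E_ref)
    undeflected = (abs(E_obj) ** 2 + abs(E_ref) ** 2) * E_ref
    virtual_image = abs(E_ref) ** 2 * E_obj            # diverging spherical wave
    real_image = E_ref**2 * E_obj.conjugate()          # converging spherical wave
    return undeflected, virtual_image, real_image

wavelength = 633e-9               # illustrative values
k = 2.0 * math.pi / wavelength
L = 0.30                          # point object 30 cm from the film
rho = 2.0e-3                      # 2 mm from the film center

terms = transmitted_terms(rho, L, k)
E_obj = object_field(rho, L, k)
film_intensity_times_ref = abs(E_obj + 1.0) ** 2 * 1.0   # |E_obj + E_ref|^2 * E_ref

print(sum(terms))
print(film_intensity_times_ref)
```

The sum of the three terms matches the film intensity multiplied by the reference field, and the second and third terms are complex conjugates of one another when $E_{\text{ref}}$ is real, which is why one diverges from a virtual image while the other converges to a real image.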
Exercises
P12.4 Consider a diffraction grating as a simple hologram. Let the light from
the “object” be a plane wave (object placed at infinity) directed onto
a flat film at angle θ. Let the reference beam strike the film at normal
incidence, and take the wavelength to be λ.
(a) What is the period of the fringes?
(b) Show that when re-illuminated by the reference beam, the three
terms in (12.10) give rise to zero-order and 1st-order diffraction (occur-
ring on each side of zero-order).
P12.5 Consider the holographic pattern produced by the point object
described in section 12.4.
(a) Show that the phase of the real image in (12.12) may be approximated
as $\Delta\phi = -k\rho^2/2L$, aside from a spatially independent overall
phase. Compare with (11.10) and comment.
(b) This hologram is similar to a Fresnel zone plate, used to focus
extreme ultraviolet light or x-rays, for which it is difficult to make a lens.
Graph the field transmission for the hologram as a function of ρ and
superimpose a similar graph for a “best-fit” mask that has regions of
either 100% or 0% transmission. Use λ = 633 nm and $L = (5 \times 10^5 - \tfrac{1}{4})\lambda$
(this places the point source about 32 cm before the screen). See
Fig. 12.9.
Figure 12.9 [Plot of hologram transmittance, vertical axis from 0 to 1]
R55 T or F: The imaging relation 1/ f = 1/d o + 1/d i relies on the paraxial ray
approximation.
310 Review, Chapters 9–12
R60 T or F: The array theorem is useful for deriving the Fresnel diffraction
from a grating.
R64 T or F: The central peak of the Fraunhofer diffraction from two nar-
row slits separated by spacing h has the same width as the central
diffraction peak from a single slit with width ∆x = h.
Problems
R69 (a) Consider a ray of light emitted from an object, which travels a
distance d o before traversing a lens of focal length f and then traveling
a distance d i .
Write a vector equation relating $\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}$ to $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$. Be sure to simplify
the equation so that only one ABCD matrix is involved.
HINT: $\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}$
Figure 12.10 [Labels: object, image]
(b) Explain the requirement on the ABCD matrix in part (a) that ensures
that an image appears for the distances chosen. From this requirement,
extract a familiar constraint on d o and d i . Also, make a reasonable
definition for magnification M in terms of y 1 and y 2 , then substitute to
find M in terms of d o and d i .
(c) A telescope is formed with two thin lenses separated by the sum of
their focal lengths f 1 and f 2 . Rays from a given far-away point all strike
the first lens with essentially the same angle θ1 . Angular magnification
M θ quantifies the telescope’s purpose of enlarging the apparent angle
between points in the field of view.
Give a sensible definition for angular magnification in terms of θ1 and
θ2 . Use ABCD-matrix formulation to derive the angular magnification
of the telescope in terms of $f_1$ and $f_2$.
Figure 12.11
R70 (a) Show that a system represented by a matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ (beginning
and ending in the same index of refraction) can be made to look like
the matrix for a thin lens if the beginning and ending positions along
the z-axis are referenced from two principal planes, located distances
$p_1$ and $p_2$ before and after the system.
HINT: $\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1$.
(b) Where are the principal planes located and what is the effective
focal length for two identical thin lenses with focal lengths f that are
separated by a distance d = f (see Fig. 12.12)?
Figure 12.12
R71 Derive the on-axis intensity (i.e. x, y = 0) of a Gaussian laser beam if
you know that at z = 0 the electric field of the beam is
$$E(\rho', z = 0) = E_0\, e^{-\rho'^2/w_0^2}$$
Fresnel:
$$E(x, y, d) \cong -\frac{i\, e^{ikd}\, e^{i\frac{k}{2d}(x^2 + y^2)}}{\lambda d} \iint E(x', y', 0)\, e^{i\frac{k}{2d}(x'^2 + y'^2)}\, e^{-i\frac{k}{d}(xx' + yy')}\, dx'\, dy'$$
$$\int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\, dx = \sqrt{\frac{\pi}{A}}\, e^{\frac{B^2}{4A} + C}.$$
R72 (a) You decide to construct a simple laser cavity with a flat mirror and
another mirror with concave curvature of R = 100 cm. What is the
longest possible stable cavity that you can make?
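HINT: Sylvester’s theorem is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^N = \frac{1}{\sin\theta} \begin{bmatrix} A \sin N\theta - \sin(N-1)\theta & B \sin N\theta \\ C \sin N\theta & D \sin N\theta - \sin(N-1)\theta \end{bmatrix}$$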
R73 (a) Compute the Fraunhofer diffraction intensity pattern for a uniformly
illuminated circular aperture with diameter ℓ.
HINT:
$$E(x, y, d) \cong -\frac{i\, e^{ikd}\, e^{i\frac{k}{2d}(x^2 + y^2)}}{\lambda d} \iint E(x', y', 0)\, e^{-i\frac{k}{d}(xx' + yy')}\, dx'\, dy'$$
$$J_0(\alpha) = \frac{1}{2\pi} \int_0^{2\pi} e^{\pm i\alpha\cos(\theta - \theta')}\, d\theta'$$
$$\int_0^a J_0(bx)\, x\, dx = \frac{a}{b} J_1(ab)$$
$$J_1(1.22\pi) = 0$$
$$\lim_{x\to 0} \frac{2J_1(x)}{x} = 1$$
(b) The first lens of a telescope has a diameter of 30 cm, which is the
only place where light is clipped. You wish to use the telescope to
examine two stars in a binary system. The stars are approximately 25
light-years away. How far apart need the stars be (in the perpendicular
sense) for you to distinguish them in the visible range of λ = 500 nm?
Compare with the radius of Earth’s orbit, $1.5 \times 10^8$ km.
R74 (a) Derive the Fraunhofer diffraction pattern for the field from a uni-
formly illuminated single slit of width ∆x. (Don’t worry about the
y-dimension.)
(b) Find the Fraunhofer intensity pattern for a grating of N slits of width
∆x positioned on the mask at $x'_n = h\left(n - \frac{N+1}{2}\right)$ so that the spacing
$$\sum_{n=1}^{N} r^n = r\,\frac{r^N - 1}{r - 1}$$
(c) Consider Fraunhofer diffraction from the grating in part (b). The
grating is 5.0 cm wide and is uniformly illuminated. For best resolution
in a monochromator with a 50 cm focal length, what should the width
of the exit slit be? Assume a wavelength of λ = 500 nm.
the field at the center of the focus found in part (a), and the width
is $w_0 = 2\lambda f_\#/\pi$ and $f_\# \equiv f/\ell$. The figure below shows how well the
Gaussian approximation fits the actual curve. We have assumed that
the first aperture is a distance f before the lens so that at the focus after
the lens the wave front is flat at the pinhole. To avoid integration, you
may want to use the result of P11.12 or P11.11(b) to get the Fraunhofer
limit of the Gaussian profile. (See figure below.)
Blackbody Radiation
Hot objects glow. In 1860, Kirchhoff proposed that the radiation emitted by hot
objects as a function of frequency is approximately the same for all materials.1
The notion that all materials behave similarly led to the concept of an ideal
blackbody radiator. Most materials have a certain shininess that causes light to
reflect or scatter in addition to being absorbed and reemitted. However, light
that falls upon an ideal blackbody is absorbed perfectly before the possibility of
reemission, hence the name blackbody.
The distribution of frequencies emitted by a blackbody radiator is related
to its temperature. We often consider a blackbody radiator that is in thermal
equilibrium with the surrounding light that is absorbed and reemitted. If it is
not in thermal equilibrium, for example, if more light is emitted than absorbed,
then the object inevitably cools as light escapes to the environment, moving the
system toward thermal equilibrium.
The Sun is a good example of a blackbody radiator. The light emitted from the
Sun is associated with its surface temperature. Any light that arrives to the Sun
from outer space is virtually 100% absorbed, however little light that might be, so
the name blackbody aptly describes it. Mostly, light escapes to the much colder
surrounding space (i.e. it is not in thermal equilibrium), and the temperature of
the Sun’s surface is maintained by the fusion process within. As another example,
a glowing tungsten filament in an ordinary light bulb may be reasonably described
as a blackbody radiator. However, surface reflections make it less than ideal both
for absorption and emission.
Experimentally, a near perfect blackbody radiator can be constructed from
a hollow object. An example is shown in Fig. 13.1. As the interior of the object
is heated, the light present inside the internal cavity is in equilibrium with the
glowing walls. A small hole can be drilled through the wall to observe the radiation
inside without significantly disturbing the system. The observation hole can be
thought of as a perfect blackbody since any light entering the hole from the
outside is eventually absorbed (before being potentially reemitted), if not on the

Gustav Kirchhoff (1824–1887, German) was born in Konigsberg, the son of a
lawyer. Kirchhoff attended the University of Konigsberg. While still a student,
he developed what are now called Kirchhoff's law for electrical circuits. During
his career, Kirchhoff was a professor in Breslau, Heidelberg, and finally Berlin.
Kirchhoff was one of the first to study the spectra emitted by various objects
when heated. Not coincidentally, his colleague in Heidelberg was Robert
Bunsen, inventor of the Bunsen burner. Kirchhoff coined the term ‘blackbody’
radiation. He demonstrated that an excited gas gives off a discrete spectrum,
and that an unexcited gas surrounding a blackbody emitter produces dark lines
in the blackbody spectrum. Together Kirchhoff and Bunsen discovered caesium
and rubidium. Later in his career, Kirchhoff showed how to derive Fresnel's
diffraction formula starting from the wave equation. (Wikipedia)

1 An important exception is atomic vapors, which have relatively few discrete spectral lines.
However, Kirchhoff’s assumption holds quite well for most solids, which are sufficiently complex.
316 Chapter 13 Blackbody Radiation
$$I = e\sigma T^4, \tag{13.1}$$
plification.
13.2 Failure of the Equipartition Principle 317
Within the enclosed cavity, light travels at speed c isotropically in all directions. A
factor of 1/2 arises because only half of the energy travels towards the hole from
within the cavity as opposed to away. The remaining factor of 1/2 occurs because
the light emerging from the hole is directionally distributed over a hemisphere as
opposed to flowing in the direction of the surface normal n̂. The average over the
hemisphere is carried out as follows:
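$$ \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \hat{r}\cdot\hat{n}\,\sin\theta\,d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\,d\theta} = \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \cos\theta\,\sin\theta\,d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\,d\theta} = \frac{1}{2} \tag{13.3} $$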
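As a quick check of the hemisphere average in (13.3), the two angular integrals can be evaluated by hand. This is only a worked sketch of the step stated above; the final remark about the combined factor of 1/4 is our reading of how the two factors of 1/2 described in the text multiply together.

```latex
\int_0^{\pi/2} \cos\theta \,\sin\theta \, d\theta
  = \left[ \tfrac{1}{2}\sin^2\theta \right]_0^{\pi/2} = \tfrac{1}{2},
\qquad
\int_0^{\pi/2} \sin\theta \, d\theta
  = \left[ -\cos\theta \right]_0^{\pi/2} = 1 .
% The azimuthal integrals contribute the same factor 2\pi to the
% numerator and the denominator, so they cancel:
\frac{2\pi \cdot \tfrac{1}{2}}{2\pi \cdot 1} = \frac{1}{2} .
% Combined with the factor 1/2 for light traveling toward the hole,
% the two geometric factors multiply to 1/4.
```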
Although (13.1) describes the total intensity of the light that leaves a blackbody
surface, it does not describe what frequencies make up the radiation field. This
frequency distribution was not fully described for another two decades, when
Max Planck developed his famous formula. Planck was first to arrive at the correct
formula for the spectrum of blackbody radiation, building on the work of others,
most notably Wien, who came very close. At first, Planck tweaked Wien’s formula
to match newly available experimental data. When he attempted to explain
it, he was forced to introduce the concept of light quanta. Even Planck was
uncomfortable with and perhaps disbelieved the assumption that his formula
implied, but he deserves credit for recognizing and articulating it.
potential energy). The problem then reduces to that of finding the number of
unique modes for the radiation at each frequency.5 The idea is that requiring each
mode of electromagnetic energy to hold energy $k_B T$ should reveal the spectral
shape of blackbody radiation.
where each component of the wave number in any of the three dimensions is an
integer times
$$ k_0 = 2\pi/L \tag{13.5} $$
Considering a box of size L does not artificially restrict our analysis, since we may
later take the limit L → ∞ so that our box represents the entire universe. Moreover,
L will naturally disappear from our calculation when we later consider the density
of modes.
Figure 13.3 The volume of a thin spherical shell in n, m, ℓ space.
We can think of a given wave number k as specifying the equation of a sphere in a
coordinate system with axes labeled n, m, and ℓ:
$$ n^2 + m^2 + \ell^2 = \left(\frac{k}{k_0}\right)^2 \tag{13.6} $$
The fact that the integers n, m, and ℓ range over both positive and negative values
automatically takes into account that the field may travel in the forwards or the
backwards direction.
We need to know how many more ways there are to choose n, m, and ℓ when the
wave number $k/k_0$ increases to $(k + dk)/k_0$. The answer is the difference in the
volume of the two spheres shown in Fig. 13.3:
$$ \#\,\text{modes in } (k, k+dk) = 4\pi \frac{k^2}{k_0^2}\left(\frac{dk}{k_0}\right) \tag{13.7} $$
This is the number of terms in (13.4) associated with a wave number between k
and k + d k.
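The shell-volume argument behind (13.7) can be checked by brute force. The following is a minimal sketch, not part of the original text: it counts integer triples (n, m, ℓ) whose radius lies in a shell and compares the count with the thin-shell volume. The function names and the sample values of k/k_0 and dk/k_0 are illustrative assumptions, and the volume is evaluated at the middle of the shell because the sample shell is not infinitesimally thin.

```python
import math

def count_modes_in_shell(radius, thickness):
    """Count integer triples (n, m, l) with radius <= |(n, m, l)| < radius + thickness."""
    inner_sq = radius ** 2
    outer_sq = (radius + thickness) ** 2
    limit = math.ceil(radius + thickness)
    count = 0
    for n in range(-limit, limit + 1):
        for m in range(-limit, limit + 1):
            nm_sq = n * n + m * m
            if nm_sq >= outer_sq:
                continue  # no l can place this (n, m) inside the outer sphere
            for l in range(-limit, limit + 1):
                if inner_sq <= nm_sq + l * l < outer_sq:
                    count += 1
    return count

def shell_volume_estimate(radius, thickness):
    """Thin-shell volume 4*pi*r^2*dr, with r taken at the middle of the shell."""
    mid = radius + 0.5 * thickness
    return 4.0 * math.pi * mid ** 2 * thickness

if __name__ == "__main__":
    r, dr = 40.0, 2.0  # stand-ins for k/k_0 and dk/k_0 (arbitrary illustrative values)
    print(count_modes_in_shell(r, dr), shell_volume_estimate(r, dr))
```

The agreement improves as the radius grows, which is the regime of interest since the box size L is eventually taken to infinity.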
5 See O. Svelto, Principles of Lasers, 4th ed., translated by D. C. Hanna, Sect. 2.2.1 (New York:
where the extra factor of 2 accounts for two independent polarizations, not speci-
fied in (13.4). As anticipated, the dependence on L has disappeared from (13.8)
after substituting from (13.5).
We can immediately see that (13.8) disagrees drastically with the Stefan-
Boltzmann law (13.2), since (13.8) is proportional to temperature rather than
to its fourth power. In addition, the integral in (13.8) is seen to diverge, meaning
that regardless of the temperature, the light carries infinite energy density! This
has since been named the ultraviolet catastrophe since the divergence occurs
on the short wavelength end of the spectrum. This is a clear failure of classical
physics to explain blackbody radiation. Nevertheless, Rayleigh emphasized the
fact that his formula works well for the longer wavelengths.
It is instructive to make the change of variables k = ω/c in the integral to write
$$ u_{\text{field}} = k_B T \int_0^\infty \frac{\omega^2}{\pi^2 c^3}\, d\omega \tag{13.9} $$
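To make the divergence of (13.9) concrete, one can cut the integral off at a finite frequency and watch the result grow. The sketch below uses the closed form of the truncated integral; the cutoff frequencies, the temperature, and the function name are illustrative assumptions rather than anything specified in the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K
C = 299792458.0     # speed of light in m/s

def rayleigh_jeans_energy_below(omega_cut, temperature):
    """k_B T times the integral of omega^2 / (pi^2 c^3) from 0 to omega_cut."""
    return K_B * temperature * omega_cut ** 3 / (3.0 * math.pi ** 2 * C ** 3)

if __name__ == "__main__":
    T = 300.0  # arbitrary illustrative temperature in kelvin
    for omega_cut in (1e14, 1e15, 1e16, 1e17):
        u = rayleigh_jeans_energy_below(omega_cut, T)
        print(f"cutoff {omega_cut:.0e} rad/s -> {u:.3e} J/m^3")
```

Each tenfold increase in the cutoff multiplies the energy density by a thousand, so no finite limit exists, while the dependence on temperature stays linear.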
The important factor $\omega^2/\pi^2 c^3$ can now be understood to be the number of modes
per frequency. Then (13.9) is rewritten as
$\int^{\infty}$
describes (incorrectly) the spectral energy density of the radiation field associated
James Jeans (1877–1946, English) was born in Ormskirk, England. He attended
Cambridge University and later taught there for most of his career. He also
taught at Princeton University for a number of years. One of his major con-
and cosmology. (Wikipedia)
became available over a fairly wide wavelength range. In keeping with Kirchhoff’s
notion of an ideal blackbody radiator, the results were observed to be indepen-
dent of the material for most solids. The intensity per frequency depended only
on temperature and when integrated over all frequencies agreed with the Stefan-
Boltzmann law (13.1).
In 1896, Wilhelm Wien considered the known physical and mathematical
constraints on the spectrum of blackbody radiation and proposed a spectral
function that seemed to work:8
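$$ \rho_{\text{Wien}}(\omega) = \frac{\hbar\omega^3 e^{-\hbar\omega/k_B T}}{\pi^2 c^3} \tag{13.12} $$
$$ \rho_{\text{Planck}}(\omega) = \frac{\hbar\omega^3}{\pi^2 c^3 \left[ e^{\hbar\omega/k_B T} - 1 \right]} \tag{13.13} $$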
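The relationship between the three spectral functions is easy to probe numerically. The sketch below evaluates (13.12), (13.13), and the Rayleigh-Jeans integrand of (13.9) at a high and a low value of $x = \hbar\omega/k_B T$. The constants are standard SI values; the helper names and the sample temperature are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
HBAR = 1.054571817e-34  # reduced Planck constant in J s
C = 299792458.0         # speed of light in m/s

def rho_rayleigh_jeans(omega, T):
    """Integrand of (13.9): k_B T times the number of modes per frequency."""
    return K_B * T * omega ** 2 / (math.pi ** 2 * C ** 3)

def rho_wien(omega, T):
    """Wien's spectral function (13.12)."""
    return HBAR * omega ** 3 * math.exp(-HBAR * omega / (K_B * T)) / (math.pi ** 2 * C ** 3)

def rho_planck(omega, T):
    """Planck's spectral function (13.13); expm1 keeps small arguments accurate."""
    return HBAR * omega ** 3 / (math.pi ** 2 * C ** 3 * math.expm1(HBAR * omega / (K_B * T)))

def omega_for(x, T):
    """Frequency at which hbar*omega/(k_B T) equals x."""
    return x * K_B * T / HBAR

if __name__ == "__main__":
    T = 1000.0  # arbitrary illustrative temperature in kelvin
    for x in (1e-3, 1.0, 20.0):
        w = omega_for(x, T)
        print(x, rho_rayleigh_jeans(w, T), rho_wien(w, T), rho_planck(w, T))
```

At large x the Planck and Wien values coincide, and at small x the Planck and Rayleigh-Jeans values coincide, which is consistent with Wien coming "very close" and with Rayleigh's remark about long wavelengths.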
Max Planck (1858–1947, German) was born in Kiel, the sixth child in his
family. His father was a law professor. When Max was about nine years old,
his family moved to Munich where he attended gymnasium. A mathematician,
Herman Muller, took an interest in his schooling and tutored him in mechanics
and astronomy. Planck was a gifted musician, but he decided to pursue a
career in physics. At age 16 he enrolled in the University of Munich. By age 22,
he had finished his doctoral dissertation and habilitation thesis. He was initially
ignored by the academic community and worked for a time as an unpaid lecturer.
He became an associate professor of theoretical physics at the University of
Kiel and then a few years later took over Kirchhoff's post at the University
of Berlin. After nearly twenty years of idyllic and happy family life, a series
of tragedies hit the Planck household. Planck's first wife, and mother of four,
died. Then his eldest son was killed in action during World War I. Soon
after, his twin daughters each died giving birth to their first child. Later
Planck's remaining son from his first marriage was executed for participating
in a failed attempt to assassinate Hitler. Planck won the Nobel prize in 1918 for
his introduction of energy quanta, but he had serious reservations about the
course that quantum mechanics theory took. (Wikipedia)
The Boltzmann factor can be normalized by dividing by the sum of all such factors
to obtain the probability of having energy $n\hbar\omega$ in a particular mode:
$$ P_n = \frac{e^{-n\hbar\omega/k_B T}}{\sum_{m=0}^{\infty} e^{-m\hbar\omega/k_B T}} = e^{-n\hbar\omega/k_B T}\left[ 1 - e^{-\hbar\omega/k_B T} \right] \tag{13.14} $$
We used (0.66) to accomplish the above sum, which is a geometric series.
The expected energy in a particular mode of the field is the sum of each possible
energy level (i.e. $n\hbar\omega$) times the probability of it occurring:
$$ \begin{aligned}
\sum_{n=0}^{\infty} n\hbar\omega P_n &= \hbar\omega \left[ 1 - e^{-\hbar\omega/k_B T} \right] \sum_{n=0}^{\infty} n\, e^{-n\hbar\omega/k_B T} \\
&= -\hbar\omega \left[ 1 - e^{-\hbar\omega/k_B T} \right] \frac{\partial}{\partial(\hbar\omega/k_B T)} \sum_{n=0}^{\infty} e^{-n\hbar\omega/k_B T} \\
&= -\hbar\omega \left[ 1 - e^{-\hbar\omega/k_B T} \right] \frac{\partial}{\partial(\hbar\omega/k_B T)} \frac{1}{1 - e^{-\hbar\omega/k_B T}} \\
&= \frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1}
\end{aligned} \tag{13.15} $$
We used (0.66) again as well as a clever derivative trick.
Equation (13.15) provides the expected energy in any of the modes of the radiation
field, as dictated by Planck’s assumption. To obtain the Planck distribution
(13.13), we replace $k_B T$ in the Rayleigh-Jeans formula (13.10) with the correct
expected energy (13.15).10
It is interesting that we are now able to derive the constant in the Stefan-
Boltzmann law (13.2) in terms of Planck’s constant ħ (see P13.3). The Stefan-
Boltzmann law is obtained by integrating the spectral density function (13.13)
10 See O. Svelto, Principles of Lasers, 4th ed., translated by D. C. Hanna, Sect. 2.2.2 (New York:
over all frequencies to obtain the total field energy density, which is in thermal
equilibrium with the blackbody radiator:
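$$ u_{\text{field}} = \int_0^\infty \rho_{\text{Planck}}(\omega)\, d\omega = \frac{4}{c}\,\frac{\pi^2 k_B^4}{60 c^2 \hbar^3}\, T^4 \equiv \frac{4}{c}\,\sigma T^4 \tag{13.16} $$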
Since Planck’s constant was not introduced until a couple decades after the Stefan-
Boltzmann law was developed, one might more appropriately say that the Stefan-
Boltzmann constant pins down Planck’s constant.
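The constant identified in (13.16) can be evaluated directly. The sketch below plugs standard SI values into $\sigma = \pi^2 k_B^4 / 60 c^2 \hbar^3$ and, separately, integrates (13.13) numerically using the substitution $x = \hbar\omega/k_B T$. The midpoint rule, the upper limit of 50, and the function names are our own illustrative choices.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K
HBAR = 1.054571817e-34  # reduced Planck constant in J s
C = 299792458.0         # speed of light in m/s

def stefan_boltzmann_constant():
    """sigma as identified in (13.16)."""
    return math.pi ** 2 * K_B ** 4 / (60.0 * C ** 2 * HBAR ** 3)

def planck_integral(x_max=50.0, steps=20000):
    """Midpoint-rule estimate of the integral of x^3 / (e^x - 1) from 0 to x_max."""
    h = x_max / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** 3 / math.expm1(x)
    return total * h

def energy_density(T):
    """u_field from (13.13) after substituting x = hbar*omega/(k_B T)."""
    prefactor = (K_B * T) ** 4 / (math.pi ** 2 * C ** 3 * HBAR ** 3)
    return prefactor * planck_integral()

if __name__ == "__main__":
    print(stefan_boltzmann_constant())
    print(energy_density(300.0), 4.0 * stefan_boltzmann_constant() * 300.0 ** 4 / C)
```

The dimensionless integral comes out to $\pi^4/15$, which is where the factor of 60 in the denominator of $\sigma$ originates once the 4/c is pulled out.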
Example 13.1
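Determine $\rho_{\text{Planck}}(\lambda)$ such that
$$ u_{\text{field}} = \int_0^\infty \rho_{\text{Planck}}(\omega)\, d\omega = \int_0^\infty \rho_{\text{Planck}}(\lambda)\, d\lambda $$
where $\rho_{\text{Planck}}(\omega)$ and $\rho_{\text{Planck}}(\lambda)$ represent distinct functions distinguished by their
arguments.
$$ u_{\text{field}} = \int_\infty^0 \frac{\hbar (2\pi c/\lambda)^3}{\pi^2 c^3 \left[ e^{\hbar(2\pi c/\lambda)/k_B T} - 1 \right]} \left( -\frac{2\pi c\, d\lambda}{\lambda^2} \right) = \int_0^\infty \frac{16\pi^2 \hbar c}{\lambda^5 \left[ e^{2\pi\hbar c/\lambda k_B T} - 1 \right]}\, d\lambda $$
By inspection, we get
$$ \rho_{\text{Planck}}(\lambda) = \frac{8\pi h c}{\lambda^5 \left[ e^{hc/\lambda k_B T} - 1 \right]} \tag{13.17} $$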
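The defining requirement of Example 13.1, that both spectral functions integrate to the same energy density, can be tested numerically. This sketch integrates (13.13) over frequency and (13.17) over wavelength on a logarithmic grid. The temperature is the solar value quoted in the exercises; the integration limits, grid size, and function names are assumptions chosen so that the neglected tails are negligible and the exponentials do not overflow.

```python
import math

H = 6.62607015e-34          # Planck constant in J s
HBAR = H / (2.0 * math.pi)  # reduced Planck constant in J s
K_B = 1.380649e-23          # Boltzmann constant in J/K
C = 299792458.0             # speed of light in m/s

def rho_omega(omega, T):
    """Planck spectral density per unit angular frequency, (13.13)."""
    return HBAR * omega ** 3 / (math.pi ** 2 * C ** 3 * math.expm1(HBAR * omega / (K_B * T)))

def rho_lambda(lam, T):
    """Planck spectral density per unit wavelength, (13.17)."""
    return 8.0 * math.pi * H * C / (lam ** 5 * math.expm1(H * C / (lam * K_B * T)))

def integrate_log(f, a, b, steps=20000):
    """Midpoint rule in ln(x): the integral of f(x) dx equals the integral of f(x) x d(ln x)."""
    log_a, log_b = math.log(a), math.log(b)
    h = (log_b - log_a) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(log_a + (i + 0.5) * h)
        total += f(x) * x
    return total * h

def compare_energy_densities(T, lam_min=1.0e-8, lam_max=1.0e-2):
    """Return (u from the wavelength integral, u from the frequency integral)."""
    u_lam = integrate_log(lambda lam: rho_lambda(lam, T), lam_min, lam_max)
    u_om = integrate_log(lambda w: rho_omega(w, T),
                         2.0 * math.pi * C / lam_max, 2.0 * math.pi * C / lam_min)
    return u_lam, u_om

if __name__ == "__main__":
    print(compare_energy_densities(5750.0))
```

The two functions differ at corresponding points by the Jacobian factor $2\pi c/\lambda^2$, which is why they must be treated as distinct functions even though they describe the same radiation.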
emission between a pair of states, it follows from (13.23) that one also knows the
rate of spontaneous emission. This is remarkable because to derive $A_{21}$ directly,
one needs to use the full theory of quantum electrodynamics (the complete
photon description). However, to obtain $B_{21}$, it is actually only necessary to use a
semiclassical theory, where the light is treated classically and the energy levels in
the material are treated quantum-mechanically using the Schrödinger equation.
In writing the rate equations, (13.18), Einstein predicted the possibility of
creating lasers fifty years in advance of their development. These rate equations
are still valid even if the light is not in thermal equilibrium with the material.
The equations suggest that if the population in the upper state 2 can be made
artificially large, then amplification will result via the stimulated transition. The
rate equations also show that a population inversion (more population in the
upper state than in the lower one) cannot be achieved by ‘pumping’ the material
with the same frequency of light that one hopes to amplify. This is because the
stimulated absorption rate is balanced by the stimulated emission rate. The
material-dependent parameters $A_{21}$ and $B_{12} = B_{21}$ are called the Einstein A and B
coefficients.
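The claim that same-frequency pumping cannot invert a two-level system follows from the steady state of the rate equations. The sketch below assumes the standard two-level form $dN_2/dt = B_{12}\rho N_1 - B_{21}\rho N_2 - A_{21}N_2$ with $B_{12} = B_{21}$, which is our assumption about the form of (13.18); the function name and the numerical values are purely illustrative.

```python
def steady_state_ratio(a21, b, rho):
    """Steady-state N2/N1 for a two-level system with B12 = B21 = b.

    Setting dN2/dt = b*rho*N1 - b*rho*N2 - a21*N2 = 0 gives
    N2/N1 = b*rho / (a21 + b*rho).
    """
    return b * rho / (a21 + b * rho)

if __name__ == "__main__":
    a21, b = 1.0, 1.0  # illustrative coefficients in arbitrary units
    for rho in (0.1, 1.0, 10.0, 1.0e3, 1.0e6):
        print(rho, steady_state_ratio(a21, b, rho))
```

However intense the pump, the ratio approaches 1 from below, so the upper-state population never exceeds the lower one in this model.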
$$ P = u_{\text{field}}/3 \tag{13.24} $$
on the walls of the container. This can be derived from the fact that radiation of
energy $\Delta E$ imparts a momentum
$$ \Delta p = \frac{\Delta E}{c} \cos\theta \tag{13.25} $$
Derivation of (13.24)
Consider a thin layer of space adjacent to a container wall with area A. If the layer
has thickness ∆z, then the volume in the layer is A∆z. Half of the radiation inside
the layer flows toward the wall, where it is absorbed. The total energy in the layer
that will be absorbed is then $\Delta E = (A\Delta z)\,u_{\text{field}}/2$, which arrives during the interval
∆t = ∆z/(c cos θ), assuming for the moment that all light is directed with angle θ;
we must average the angle of light propagation over a hemisphere.
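The pressure on the wall due to absorption (i.e. force or $dp/dt$ per area) is then
$$ P_{\text{abs}} = \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \frac{\Delta p}{\Delta t}\frac{1}{A}\,\sin\theta\, d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\, d\theta} = \frac{u_{\text{field}}}{2} \int_0^{\pi/2} \cos^2\theta\,\sin\theta\, d\theta = \frac{u_{\text{field}}}{6} \tag{13.26} $$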
In equilibrium, an equal amount of radiation is also emitted from the wall. This
gives an additional pressure $P_{\text{emit}} = P_{\text{abs}}$, which confirms that the total pressure is
given by (13.24).
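The angular average in (13.26) can be confirmed with a short numerical integration. This sketch averages $(u_{\text{field}}/2)\cos^2\theta$ over the hemisphere with the $\sin\theta$ weight; the azimuthal integral is omitted because it cancels between numerator and denominator. The function names and step count are illustrative assumptions.

```python
import math

def hemisphere_average(f, steps=20000):
    """Average of f(theta) over 0 <= theta <= pi/2 with weight sin(theta)."""
    h = (math.pi / 2.0) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        weight = math.sin(theta)
        numerator += f(theta) * weight
        denominator += weight
    return numerator / denominator

def absorption_pressure(u_field):
    """Hemisphere average of (Delta p / Delta t) / A = (u_field / 2) cos^2(theta)."""
    return hemisphere_average(lambda theta: 0.5 * u_field * math.cos(theta) ** 2)

def total_pressure(u_field):
    """Absorption plus an equal emission contribution."""
    return 2.0 * absorption_pressure(u_field)

if __name__ == "__main__":
    u = 3.0  # arbitrary energy density; the pressure should come out as u/3
    print(absorption_pressure(u), total_pressure(u))
```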
Figure 13.5 Field inside a blackbody radiator.
We derive the Stefan-Boltzmann law using the concept of entropy, which is
defined in differential form by the quantity
$$ dS \equiv \frac{dQ}{T} \tag{13.27} $$
where $dQ$ is the injection of heat (or energy) into the radiation field in the box
and T is the temperature at which that injection takes place. We would like to
write $dQ$ in terms of $u_{\text{field}}$, V, and T. Then we may invoke the fact that S is a state
variable, which implies
$$ \frac{\partial^2 S}{\partial T\,\partial V} = \frac{\partial^2 S}{\partial V\,\partial T} \tag{13.28} $$
This is a mathematical statement of the fact that S is fully defined if the internal
energy, temperature, and volume of a system are specified. That is, S does not
depend on past temperature and volume history; it is dictated by the present
state of the system.
To obtain $dQ$ in the form that we need, we can use the 1st law of thermodynamics.
It states that a change in internal energy $dU = d(u_{\text{field}}V)$ can take place
by the injection of heat $dQ$ or by doing work $dW = P\,dV$ as the volume increases:
$$ \begin{aligned}
dQ &= dU + P\,dV = d(u_{\text{field}}V) + P\,dV \\
&= V\,du_{\text{field}} + u_{\text{field}}\,dV + \frac{1}{3}u_{\text{field}}\,dV \\
&= V\frac{du_{\text{field}}}{dT}\,dT + \frac{4}{3}u_{\text{field}}\,dV
\end{aligned} \tag{13.29} $$
We have used energy density times volume to obtain the total energy U in the radi-
ation field in the box. We have also used (13.24) to obtain the work accomplished
by pressure as the volume changes.
$$ dS = \frac{V}{T}\frac{du_{\text{field}}}{dT}\,dT + \frac{4u_{\text{field}}}{3T}\,dV \tag{13.30} $$
When we differentiate (13.30) with respect to temperature or volume we get
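$$ \begin{aligned}
\frac{\partial S}{\partial V} &= \frac{4u_{\text{field}}}{3T} \\
\frac{\partial S}{\partial T} &= \frac{V}{T}\frac{du_{\text{field}}}{dT}
\end{aligned} \tag{13.31} $$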
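The consistency requirement (13.28) places a strong restriction on how $u_{\text{field}}$ may depend on temperature, and this can be explored numerically. The sketch below takes the two first derivatives in (13.31), assumes a trial power law $u_{\text{field}} = aT^p$ (the power-law form, the sample point, and the function name are our assumptions), and measures the relative mismatch between the two mixed partial derivatives using central differences.

```python
def mixed_partial_mismatch(p, T=2.0, V=1.0, a=1.0, h=1e-4):
    """Relative difference between the mixed partials of S for the trial u = a*T**p."""
    def u(t):
        return a * t ** p

    def du_dT(t):
        return a * p * t ** (p - 1)

    def dS_dV(t, v):
        # first line of (13.31); it happens not to depend on the volume v
        return 4.0 * u(t) / (3.0 * t)

    def dS_dT(t, v):
        # second line of (13.31)
        return v * du_dT(t) / t

    d2S_dTdV = (dS_dV(T + h, V) - dS_dV(T - h, V)) / (2.0 * h)
    d2S_dVdT = (dS_dT(T, V + h) - dS_dT(T, V - h)) / (2.0 * h)
    return (d2S_dTdV - d2S_dVdT) / d2S_dVdT

if __name__ == "__main__":
    for p in (3, 4, 5):
        print(p, mixed_partial_mismatch(p))
```

Among these trial exponents only p = 4 makes the mismatch vanish (to within finite-difference error), in line with the fourth-power temperature dependence of the Stefan-Boltzmann law.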
We are now able to evaluate the partial derivatives in (13.28), which give
which depends on the number of configurations $n_{\text{obj}}$ for a given state (defined,
for example, by fixed energy and volume). Now imagine that the object is placed
in contact with a very large thermal reservoir. The ‘object’ could be the electro-
magnetic radiation inside a hollow blackbody apparatus, and the reservoir could
be the walls of the apparatus, capable of holding far more energy than the light
field can hold. The condition for thermal equilibrium between the object and the
reservoir is
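$$ \frac{\partial S_{\text{obj}}}{\partial U_{\text{obj}}} = \frac{\partial S_{\text{res}}}{\partial U_{\text{res}}} \equiv \frac{1}{T} \tag{13.35} $$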
where temperature has been introduced as a definition, which is consistent with
(13.27).
The total number of configurations for the combined system is $N = n_{\text{obj}}\, n_{\text{res}}$,
where $n_{\text{obj}}$ and $n_{\text{res}}$ are the number of configurations available within the object
and the reservoir separately. A thermodynamic principle is that all possible
13.B Boltzmann Factor 327
$$ P \propto \frac{N}{n_{\text{obj}}} = n_{\text{res}} = e^{S_{\text{res}}/k_B} \tag{13.36} $$
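$$ S_{\text{res}}(U_{\text{res}}) \cong S_{\text{res}}\left( U_{\text{res}}^{\text{eq}} \right) + \left. \frac{\partial S_{\text{res}}}{\partial U_{\text{res}}} \right|_{U_{\text{res}}^{\text{eq}}} \left( U_{\text{res}} - U_{\text{res}}^{\text{eq}} \right) + \ldots \tag{13.37} $$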
Higher order terms are not needed since we assume the reservoir to be very large
so that it is disturbed only slightly by variations in the object. Since the overall
energy of the system is fixed, we may write
$$ U_{\text{res}} - U_{\text{res}}^{\text{eq}} = \Delta U_{\text{res}} = -\Delta U_{\text{obj}} \tag{13.38} $$
where ∆Uobj is a small change in energy in the object. When (13.35), (13.37), and
(13.38) are introduced into (13.36), the probability for the specific configuration
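becomes $P \propto e^{\frac{1}{k_B} S_{\text{res}}\left( U_{\text{res}}^{\text{eq}} \right) - \frac{\Delta U_{\text{obj}}}{k_B T}}$, or simply
$$ P \propto e^{-\Delta U_{\text{obj}}/k_B T} \tag{13.39} $$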
since the first term in the exponent is constant. $\Delta U_{\text{obj}}$ represents an amount of
energy added to the object to establish a configuration. In the case of blackbody
radiation, a mode takes on energy $\Delta U_{\text{obj}} = n\hbar\omega$, where n is the number of energy
quanta in the mode. The probability that a mode carries energy $n\hbar\omega$ is therefore
proportional to $e^{-n\hbar\omega/k_B T}$.
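The step from this Boltzmann factor to the expected energy of a mode can be checked by direct summation. The sketch below normalizes the weights $e^{-nx}$ with $x = \hbar\omega/k_B T$, then compares the resulting occupation probability and mean number of quanta with the closed forms in (13.14) and (13.15). Truncating the sum at `n_max` is an assumption of the sketch and requires `n_max * x` to be much larger than 1; the function names are illustrative.

```python
import math

def boltzmann_probabilities(x, n_max=2000):
    """Normalized probabilities P_n proportional to exp(-n*x) for n = 0..n_max."""
    weights = [math.exp(-n * x) for n in range(n_max + 1)]
    partition_sum = sum(weights)
    return [w / partition_sum for w in weights]

def mean_quanta(x, n_max=2000):
    """Expected number of quanta; multiply by hbar*omega for the expected energy."""
    probabilities = boltzmann_probabilities(x, n_max)
    return sum(n * p for n, p in enumerate(probabilities))

if __name__ == "__main__":
    for x in (0.5, 2.0):
        # second column: direct sum; third column: 1/(e^x - 1) from (13.15)
        print(x, mean_quanta(x), 1.0 / math.expm1(x))
```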
Exercises
P13.1 The Sun has a radius of $R_S = 6.96 \times 10^8$ m. What is the total power that
it radiates, given a surface temperature of 5750 K?
P13.3 Derive (or try to derive) the Stefan-Boltzmann law by integrating the
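(a) Rayleigh-Jeans energy density
$$ u_{\text{field}} = \int_0^\infty \rho_{\text{Rayleigh-Jeans}}(\omega)\, d\omega $$
Please comment.
(b) Wien energy density
$$ u_{\text{field}} = \int_0^\infty \rho_{\text{Wien}}(\omega)\, d\omega $$
Please evaluate σ.
HINT: $\int_0^\infty x^3 e^{-ax}\, dx = 6/a^4$.
(c) Planck energy density
$$ u_{\text{field}} = \int_0^\infty \rho_{\text{Planck}}(\omega)\, d\omega $$
$$ \lambda_{\max} = \frac{0.00290\ \text{m}\cdot\text{K}}{T} $$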
which gives the strongest wavelength present in the blackbody spectral
distribution.
HINT: See Example 13.1. You may like to know that the solution to the
transcendental equation $(5 - x)e^x = 5$ is x = 4.965.
(b) What is the strongest wavelength emitted by the Sun, which has a
surface temperature of 5750 K (see P13.1)?
(c) Also find $\nu_{\max}$ and show that it is not the same as $c/\lambda_{\max}$. Why would
we be interested mainly in $\lambda_{\max}$?
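The root quoted in the hint can be confirmed with a few lines of bisection. This sketch only verifies the stated number and does not carry out the exercise; the bracket [4, 5], the tolerance, and the function names are our own choices. Note that x = 0 also satisfies the equation trivially, so the bracket deliberately excludes it.

```python
import math

def wien_condition(x):
    """Left side minus right side of (5 - x) e^x = 5."""
    return (5.0 - x) * math.exp(x) - 5.0

def bisect_root(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f changes sign exactly once on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(bisect_root(wien_condition, 4.0, 5.0))
```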
Index
uniaxial, 123
unit vector, 1
unpolarized light, 145, 159