Optics Lecture Notes
Optics Lecture Notes
Optics Lecture Notes
1 Introduction 4
1.1 Structure of the topics in optics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Video material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1
3.6.1 The cause of (in)coherence: fluctuations, correlations, and time-averages . . . . . . . . . . . . 73
3.6.2 Coherence width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.6.3 Spatial coherence and temporal coherence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.6.4 Coherence and imaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.7 Exercise: dark field image of a grating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2
9.2.4 A better algorithm: the Hybrid Input-Output algorithm . . . . . . . . . . . . . . . . . . . . . 145
9.2.5 Multiple solutions to the phase retrieval problem . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.3 Ptychography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
9.3.1 Object reconstruction using a known probe . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
9.3.2 Simultaneous object and probe reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
3
Chapter 1
Introduction
Moreover, I think that the structure as proposed in these lecture notes teaches one to approach optical prob-
lems with the most practical mindset. For example, when designing optical instruments such as cameras, it is often
sufficient to think of light in terms of rays, while ignoring its vectorial electromagnetic nature. After all, histori-
cally, people have shown themselves quite capable of designing optical instruments without having any knowledge
of the electromagnetic nature of light. So if one were instead to teach optics starting from Maxwell’s equations
of electromagnetism, and deriving ray optics as an approximate solution, it could give the wrong impression that
understanding Maxwell’s equations is a prerequisite to being able to trace rays and perform useful work in the field
of optics.
To give the reader clear guidance on what the relevant points are, each chapter starts with a checklist of ques-
tion one should be able to answer after reading and comprehending the chapter.
4
Throughout these lecture notes, I will refer to other references which I think are helpful in understanding the
subject matter. I think that videos have the capacity to convey much more information much more efficiently
through the combined use of audio and rapidly changing visuals. In older times when videos were not as readily
accessible to us as they are today, one could do little better than print static figures, cluttered with annotations, and
explained in lengthy prose, which forces the reader to go back and forth between the figure and its accompanying
text in an attempt to fully understand the figure.
However, with videos, one can build up the figure one component at a time, while guiding the viewer through
the process via audio commentary, so that the view can remain focused on the figure. It can be played back at
whatever speed a certain individual finds appropriate, and it can be replayed as many times as necessary. Now that
everyone has easy access to the tools required to create and view such videos, I believe that educators should more
widely make an effort to use them optimally. Choosing to restrict oneself to the old medium of static figures and
long texts seems to me unacceptable in this day and age. While of course written texts have their own advantages
and should not become obsolete (hence why I wrote the lecture notes), one should start to make a serious effort to
supplement them with more modern media of high quality.
To anyone who wishes to start making low-cost educational videos with high-quality content themselves (which
I would encourage), I will quickly outline the procedure by which I choose to make them myself, in case it would
be of any help:
• Make slides in MS PowerPoint (or some freeware alternative that has an equation editor), and save each slide
as a .png-file.
• Write a script and record it using Audacity (free audio recording software). The ‘Noise-Reduction’ tool may
be very helpful, depending on the microphone you use. Export the recording as .wav or .mp3.
• Import the slides and recorded audio to OpenShot Video Editor (free video editing software). Match the
slides to the audio simply by dragging and dropping the slides to the right locations. Export the video as
an .mp4-file. Video tutorials for using OpenShot are found here: Jonathan Thomas - Getting Started —
OpenShot Video Editor Tutorial: youtu.be/1k-ISfd-YBE.
• When I make longer videos, I edit them in smaller chunks (∼10 minutes), save the videos separately, and
combine them afterwards. If you edit the entire video in one single OpenShotVideoEditor file, then saving
the file can take a long time, which can become a pain if you save regularly while editing.
Personally, I think this video lecture format has the following advantages over live-recording tablet notes or recording
a live lecture:
• Text, equations, and figures are more legible than if they were hand-written/drawn.
• Carefully writing a script and reading it allows one to formulate a message more concisely than if one were
to improvise a story.
• Visuals can be changed more rapidly while still matching the audio well, and without requiring the speaker
to multitask.
5
Part I
6
Chapter 2
Geometric/ray optics
Videos:
• ‘University level introductory optics course’: youtu.be/20jRuOFQTLM, time stamps: 6:24 - 23:16
• ‘01. Geometric Optics’: youtu.be/xSLoXZ3W49c
Light as rays:
• What is the evidence indicating that light can be modeled as rays?
7
Aberrations:
• What does it mean for a system to be ‘aberrated’ ?
• What can cause aberrations?
So let’s see how a pinhole camera works. Suppose we have an object, and we have a screen some distance away
from the object. If each object point emits rays in all directions, then each object point generates a large blur of
light on the screen. Because all object points are blurred on the screen, no clear image is formed. Now consider
what happens if between the object and the screen, we insert a wall with a small pinhole (see Fig. 2.1). Because
the rays that are scattered from each object point travel in straight lines, the pinhole makes sure that the light
that comes from a single object point can reach the screen only in a single point (or at least a very small region).
As a result, a clear image is formed, and this image is inverted because the rays coming from the top of the object
will reach the bottom of the screen. So we see how the pinhole camera supports the idea that light consists of rays
traveling in straight lines, and that each object point scatters rays of light in all directions. Moreover, it illustrates
what it means to create an image of an object:
Each object point scatters rays of light in all directions, and rays of light travel in straight lines.
Creating an image of an object means making sure that all the light coming from a single
object point, ends up in a single image point.
We saw how this imaging condition is achieved in the case of a pinhole camera: the pinhole selects only a narrow
bundle from all the rays that are scattered by a certain object point, and this narrow bundle can only cover a
small region of the screen. The downside of this method of creating images is obvious: you block a lot of light, so
the image will be very dim. Hence, to view the image, the image needs to be formed in a dark room, or ‘camera
obscura’. One might wonder whether there is a way to form images that captures more light from the object. This
would require that the divergent set of rays that comes from a single object point is made to converge in a single
image point. To achieve that, we need to be able to change the direction light rays.
8
Figure 2.1: Left: how a pinhole camera works. Each object point emits rays in all directions. The pinhole selects
a single ray, such that on the screen, each point receives light from only a single object point. Right: in general,
an imaging system makes sure that each image point receives light from only a single object point. If we want to
capture more than one ray per object point, it means we have to make diverging rays converge again.
Videos:
• ‘Making Your Own Room With a View — National Geographic’: https://youtu.be/gvzpu0Q9RTU
Here, θi and θt denote the angles of the incident ray and transmitted ray respectively. Again, these angles
are defined with respect to the surface normal (see Fig. 2.2). ni and nt are called the refractive indices of the
media in which the incident and transmitted rays travel. For example, air has a refractive index of nair = 1,
water has a refractive index of nwater = 1.3, and glass has a refractive index of nglass = 1.7. The refraction of
light explains why a straw looks bent if it’s partially submerged in water, and why a submerged object has an
apparent depth that is different than its actual depth. If we are familiar with the wave nature of light (which
will be discussed in Chapter 3), we may remark that
• The refractive index of a medium is related to the speed of light in that medium. E.g. cglass = c/nglass ,
where cglass denotes the speed of light in glass, and c denotes the speed of light in vacuum.
• The refractive index of a medium affects the wavelength of light in that medium. E.g. λglass = λ/nglass ,
where λglass denotes the wavelength of light in glass, and λ denotes the wavelength of light in vacuum.
• The refractive index of a medium depends on the wavelength of light in that medium, e.g. the refractive
index of glass nglass (λ) is a function of the vacuum wavelength. This property is called dispersion, because
the wavelength-dependence of the refractive index gives glass the ability to separate (‘disperse’) different
wavelengths (i.e. colors) of light. Think for example of how a prism can separate white light into a
spectrum of colors.
However, at this point it is sufficient to know that the refractive index n is ‘some number’ that you can look
up for different media in a table.
9
Figure 2.2: Two ways to change the direction of light rays: reflection and refraction
So reflection and refraction are two ways to change the direction of rays of light, and we know the laws that describe
how the direction is changed. This has enabled us to design optical elements that can focus a divergent set of rays
into a single point, and therefore create an image of an object:
To create an image, we need to focus a divergent set of rays coming from an object point into a
single image point.
To focus light using refraction we use lenses (pieces of glass with curved surfaces).
By using the law of reflection and Snell’s law of refraction, we can determine what curva-
ture should be present to achieve imaging.
(If one is familiar with the wave nature of light, one can introduce a third way to focus
light, namely by diffraction. This is done in a Fresnel zone plate.)
10
Figure 2.3: To create an image, we must change the direction of light rays such that a diverging set of rays from a
single object point are made to converge in a single image point (see Fig. 2.1). We saw that there are two ways to
change the direction of light rays: reflection and refraction (see Fig. 2.2). Therefore, two optical elements that can
be used to create images are curved mirrors (reflection) and lenses (refraction).
However, let’s for now not be concerned with the complexities of lens design, but rather let’s assume that we
have some perfect lens that can focus a diverging bundle of rays into a single image point, and let’s see what sort of
imaging systems we can create using one or multiple of such ideal lenses. One might ask: if we already have an ideal
lens with which we can create images, why should we bother with sequences of multiple lenses? To answer that
question, consider the fact that the human eye itself can be considered a lens that creates an image on the retina.
Very often we use a lens to aid our vision, for example when we wear glasses, or use a magnifying glass. Such an
optical instrument combined with the human eye already forms a compound optical system. One should note that
when we use a lens as a magnifying glass, we don’t use it to create an image: it doesn’t focus the diverging rays of
light from an object point into an image point. Rather, a magnifying glass bends the rays of light coming from a
certain object in such a way, that it appears as if they came from an object point located elsewhere (see Fig. 2.4). In
particular, in the case of a magnifying glass, the rays coming from an object are bent in such a way that it appears as
if they came from a larger object. This apparent larger object is said to be the virtual image that is created by the
11
magnifying glass. The lens in our eye then converts the virtual image into a real image that is projected on the retina.
Real image:
• All rays coming from an object point are focused in one image point.
• Can be projected onto a screen.
Virtual image:
• All rays coming from an object point are bent such that it appears as if they came from another
point.
• Requires another lens to be converted into a real image (e.g. the lens of the eye).
• Can be seen by looking through an optical system (think of e.g. magnifying, or a pair of
binoculars).
Figure 2.4: The difference between a real image and a virtual image. In a real image, rays from a single object
point are made to converge in a single image point. In a virtual image, rays from a single object point are bent in
such a way that it appears as if they came from another point (the virtual image point). The eye converts a virtual
image into a real image on the retina.
To determine what kind of image a single lens creates, we can trace several rays according to a few simple
rules. First we must decide whether the lens is convex/positive (it makes a bundle of rays more convergent) or
concave/negative (it makes a bundle of rays more di vergent). Then, we must know how strong the lens, i.e. we must
know the focal distance f . The smaller the focal distance, the stronger the lens. Once we know whether the lens
is convex or concave, and what the positions of the focal points are, we can trace rays according to the following rules:
12
Ray tracing rules for an ideal convex/positive thin lens:
• A ray that is horizontal before the lens will be bent such that it passes through the back focal
point of the lens.
• A ray that passes through the center of the lens will not be bent, i.e. it will go straight through.
• A ray that passes through the front focal plane before the lens will be bent such that after the
lens it will be horizontal.
• A ray that would pass through the back focal point if the lens were absent, will be bent such
that it goes horizontal after the lens.
To see how the lens forms an image of an object, we can take an object point and trace these rays according
to these rules. If the rays intersect after the lens, then the point of intersection is the (real) image point. If the
transmitted rays don’t intersect after the lens, but they intersect before the lens if we extend them, then that point
of intersection is the (virtual) image point.
These rules are simple enough to apply to a single lens, or perhaps two lenses. However, for arbitrary imaging
systems that consist of more lenses, manual ray-tracing will become cumbersome. To analyze optical systems more
systematically, one typically uses ray transfer matrices. In this formalism, we put the height y of a ray and its angle
θ in a vector (see Fig. 2.5):
y
. (2.3)
θ
As the ray travels through an optical system, the height and the angle may change. For example, if the ray prop-
agates freely at a non-zero angle θ (i.e. not horizontally), the height will change. If the ray propagates through
through a lens at non-zero height (i.e. not through the center of the lens), then the lens will bend the light, which
means the angle will change. We can describe these changes in height and angle using matrices, provided that we
make the paraxial approximation:
Paraxial approximation, ‘small-angle approximation’: the angle θ that any ray makes with the hori-
zontal axis (also called ‘optical axis’) is so small that sin θ ≈ tan θ ≈ θ.
Now we can see what the transfer matrices look like for free-space propagation and propagation through an ideal
thin lens:
• Free-space propagation over a distance d: If a ray starts at height y1 and propagates for a distance d
at an angle θ1 , then the new height will be y1 + d tan θ1 , and the angle will be unchanged. By applying the
paraxial approximation we can write the new height and angle as
y2 = y1 + dθ1 ,
(2.4)
θ2 = θ1 ,
• Ideal thin lens with focal length f : If a ray enters the lens horizontally (i.e. θ1 = 0) at height y1 , then
after the lens, the angle θ2 will be such that the ray goes through the back focal point. If the lens is thin,
13
then the height of the ray will not have changed immediately after transmission by the lens. This means that
− tan θ2 = y1 /f . In the paraxial approximation we have
θ2 = −y1 /f if θ1 = 0. (2.6)
Moreover, we know that if the ray travels through the center of the lens (i.e. y1 = 0), then the ray will not
be bent, i.e.
θ2 = θ1 if y1 = 0. (2.7)
Putting these results together gives
θ2 = θ1 − y1 /f. (2.8)
Combined with the fact that y2 = y1 for a thin lens, we can write the matrix equation
y2 1 0 y1
= . (2.9)
θ2 −1/f 1 θ1
1 0
So the ray transfer matrix for transmission through an ideal thin lens with focal length f is . Note
−1/f 1
that for a convex lens we take f positive, while we can describe a concave lens by taking f negative.
With these two transfer matrices we can analyze a simple single-lens imaging system. Suppose we have an object
that is a distance so before an ideal thin lens with focal length f , and there’s a screen a distance si behind the lens.
Then the system transfer matrix is found by multiplying the transfer matrices for free-space propagation and thin
lens transmission (note that the first multiplication is with the matrix on the right):
1 − sfi so + si − sofsi
1 si 1 0 1 so
= . (2.10)
0 1 −1/f 1 0 1 − f1 1 − sfo
Now we can ask: under which condition is an image of the object formed at the screen? Recall what it means to
create an image: all the rays coming an object point must converge at a single image point. In other words, given
an object point at height y1 , then the final height y2 shouldn’t depend on the initial ray angle θ1 . Therefore:
14
Figure 2.5: Ray transfer matrices for free-space propagation and lens transmission, and how they can be used to
derive the imaging condition for a single thin lens.
If we image a small object onto a screen, then defining the magnification is rather straightforward: we mea-
sure the height of the image and we divide it by the height of the object. This is called the linear magnification
(see Fig. 2.6). But what do we do if we don’t project an image onto a screen, but instead we look at a virtual
image through e.g. a magnifying glass? How do we then define the magnification? Naively one might say that
we could simply divide the height of the virtual image by the height of the object. But that can easily yield a
very misleading measure of magnification. For example, if we put an object at the focal point of the magnifying
glass, then the virtual image is infinitely large, which would mean the magnification is also infinite. However, when
we look through the magnifying glass, the image doesn’t look infinitely large, because it is also infinitely far away,
which makes the image appear smaller due to perspective. So how can we define a sensible measure of magnification
for cases where look at a virtual image through an optical instrument?
15
Figure 2.6: Linear magnification is simply the image height divided by the object height.
The main issue is that we need to define the apparent size of an object and image, rather than its actual size.
The apparent size and actual size are different due to the effect of perspective: a large object that is far away
may appear smaller than a small object that is close by (see Fig. 2.7). To have a measure of apparent object size
(or image size) that properly takes the effect of perspective into account, one may intuitively think that we can
divide the object height by the object distance. It turns that this is indeed the correct expression in the paraxial
approximation: to quantify the apparent size of an object, we use the concept of angular size. The angular size is
the angle that an object subtends on your field of view. If the object has height y and is a distance s away, then
the angle θ that it subtends is y y
θ = tan−1 ≈ . (2.13)
s s
In the final expression, the paraxial approximation is applied, we indeed find that the apparent size of an object,
which takes perspective into account, is given by the actual object size divided by its distance to the viewer.
Therefore, to define the magnification of an optical instruments through which we view a virtual image, we use the
angular magnification, which is given by the angular size of the virtual image as seen through the optical instrument,
divided by the angular size of the object as seen by the unaided eye.
16
Figure 2.7: Angular size is a way to quantify the apparent size of an object. I.e. angular size takes perspective into
account: a large object may appear smaller than a small object, if the large object is far away, and the small object
is nearby.
• Linear magnification: image height divided by object height. If it’s negative, it means the
image is inverted.
• Angular size: the ‘apparent size’ of an object, i.e. the angle that an object subtends on your
field of view. This is determined by both the object height and the viewer’s distance to the
object: the angular size becomes smaller if the object is farther away, i.e. the object appears
smaller due to perspective. In the paraxial approximation, the angular size is given by the
object size divided by the object distance.
• Angular magnification: the angular size of the virtual image as seen through the optical instru-
ment, divided by the angular size of the object as seen by the unaided eye.
Now let’s see how we can calculate the angular magnification for a magnifying glass, microscope, and for a telescope.
The reason why we choose these optical instruments, is because they are fundamentally different in an important
way: a magnifying glass or microscope serves to magnify objects that are nearby but small, whereas a telescope
serves to magnify objects that are large but far away.
First we examine the magnifying glass, which is basically a simple positive ideal thin lens (see Fig. 2.8). To
calculate its angular magnification, we must first determine the angular size of the object as seen with the unaided
eye. We know that the angular size depends on the viewer’s distance to the object, so what distance should we
choose? Since we want to see object as large as possible, it makes sense to put the object as close to our eyes as pos-
sible while still being able to focus on it. This point is called the near point, and it has a distance of snear = 25cm to
17
your eye. Note that this value is purely determined by biological factors, it has no fundamental physical relevance.
Therefore, if the object has a height y, then its angular size is
y
θunaided = . (2.14)
snear
Now let’s put the object in the focal point of the magnifying glass. The virtual image will be infinitely large, and it
will lie infinitely large away. However, we can still calculate the angle it subtends: if the magnifying glass has focal
length f , then rays that form the tip of the virtual image propagate (in the paraxial approximation) at an angle
y
θmag = . (2.15)
f
Therefore, the angular magnification of a magnifying glass is given by
θmag snear
m= = . (2.16)
θunaided f
Now let’s examine a simple two-lens compound microscope (see Fig. 2.9). In a simple compound microscope,
the objective lens creates a magnified real image of the object, and this image is then viewed through the eyepiece
lens as if it were a magnifying glass. If the objective lens creates a real image with magnification Mob , and the
eyepiece lens magnifies this image with an angular magnification of me , then the total angular magnification of the
compound microscope is
mmicroscope = Mob · me . (2.17)
18
Figure 2.9: Angular magnification of a compound microscope.
Now let’s examine the telescope (see Fig. 2.10). Note that this case is fundamentally different from the previous
two cases, because we don’t want to view small nearby object, but a huge far away object. What this should tell you,
is that when we want to find the angular magnification of the telescope, it should never involve the near point snear .
In the case of a magnifying glass or microscope, it makes sense to use the near point, because that’s where you’d view
a small object with the unaided eye. However, it makes no sense to view a far-away star or planet at your near-point!
When you look through a telescope, we can make the approximation that the object we’re imaging lies at in-
finity. The cone of rays that is scattered by each object point will have approximately become a parallel bundle
of rays once it reaches the telescope. Different object points generate parallel bundles of rays at different angles.
In a simple two-lens telescope, the first lens (which is called the objective lens) converts these bundles of rays to
a real image. The bundle of rays with the largest angle (which comes from the outermost object point) defines
the angular size of the object when seen by the unaided eye. From simple geometry, it follows that in the paraxial
approximation the real image formed by the objective lens with focal length fob has a height of
yob = −fob θunaided . (2.18)
The negative sign indicates that the image is inverted. This real image is then viewed through the eyepiece lens
with focal length fe as if you’re viewing an object through a magnifying glass. The angular size of this image seen
through the eyepiece lens is, according to Eq. (2.15), equal to
yob
θmag = . (2.19)
fe
Combining this result with Eq. (2.18) gives the angular magnification of the telescope
θmag fob
=− . (2.20)
θunaided fe
It is instructive to note the value of the concept of angular magnification here. If we were to use the telescope to
create a real image that we can project on a screen or capture with a camera, the linear magnification would be
19
practically 0, because the celestial object that we’re imaging is many times larger than the image we create on the
camera or the screen. But how can the magnification be nearly 0, if the image we see on the screen appears clearly
larger to us than the celestial object as viewed with the naked eye? The reason is that linear magnification doesn’t
take into account the effect of perspective, whereas angular magnification does. The real image we may create with
a telescope is many times smaller than the celestial object we’re imaging, but because this small real image is much
closer by, it still appears larger to us than the celestial object seen from very far away.
20
• Reduction of aberrations: an aperture can block aberrated rays which would otherwise blur the
image.
• Through-focus resolution: an aperture stop can determine how quickly the image resolution
changes as you move the object or detector out of best focus, i.e. depth of focus.
• Through-focus magnification: an aperture stop can determine how the image magnification
changes as you move the object or detector out of best focus, i.e. telecentricity.
There are many different ways in which an imaging system can be aberrated. One specific type of aberration
is called spherical aberration. In this case, when an object point scatters a cone of rays, then the rays at the edge
of the cone will focus in a different place than the rays at the center of the cone. As a result, the image will be
blurred. To prevent this from happening, we can introduce an aperture stop so that only the rays near the center
of the cone reach the image plane. Now all the rays from a single object that reach the image plane do converge at
a single image point.
However, when we reduce the size of the aperture stop, there will also be some consequences that cannot be
explained with the ray model of light, but they can be understood with the wave model of light. As we will see in
Section 3.4, the imaging resolution will decrease if we make the aperture stop smaller, which is due to diffraction
effects. Therefore, there will be an optimal size for the aperture stop where it is small enough to reduce the effects
of aberrations, but large enough to reduce the effects of diffraction.
Figure 2.11: The effect of the aperture stop on reducing aberrations. Top: there are aberrated rays which don’t
converge to the image point, thereby degrading image quality. Bottom: the aperture stop blocks the aberrated
rays, thereby improving image quality.
Reducing the size of an aperture stop reduces the effect of aberrations by blocking the rays that
wouldn’t converge at the image point.
21
2.5.2 Depth of focus
So far, we have only considered what happens in the image plane, where, for an ideal imaging system, all the rays
that come from a single object converge. However, objects at different distances have different image planes, so
if we have only one detection plane, then inevitably some objects will be out of best focus. Therefore, it is also
important to examine what happens slightly away from the image plane (i.e. what happens in the case of slight
defocus).
The divergent cone of rays that is scattered by a single object point will become a convergent cone of rays in
the image plane. As we move away from the image plane, the cone of rays expands, and the image becomes blurred.
The rate at which the image becomes blurred depends on how quickly this cone of rays diverges. This in turn
depends on the size of the aperture stop: if the aperture stop is smaller, then cone of rays becomes narrower, and
therefore the depth of focus of the imaging system increases. However, as mentioned in the previous section, one
should keep in mind that reducing the size of the aperture stop also increases the negative effects of diffraction.
Figure 2.12: The effect of the aperture stop on the depth of focus. Top: in the image plane, the rays that form the
image diverge rapidly, which leads to a large blur when moving the detector out of focus. Bottom: the aperture
stop reduces the cone of rays that form the image, so that the blur becomes less large when moving the detector
out of focus.
Reducing the size of an aperture stop increases the depth of focus by making the cone of rays
that comes from an object point narrower, thereby making it diverge less quickly when moving the
detection plane out of focus.
2.5.3 Telecentricity
When discussing depth of focus, we examine how resolution changes as function of defocus. Telecentricity refers to
how magnification changes as function of defocus. Depending on the application, one may have different require-
ments for how magnification should depend on object distance. For example, if we want to capture perspective
in a natural manner, objects that are farther away should appear smaller on your image. However, if you want
an optical instrument that can measure the physical size of an object, independent of its distance to your imaging
system, then you’d want the magnification to be independent of defocus. So let’s see how the position of an aperture
affects how magnification changes as function of defocus.
Since we’re now only focusing on magnification, and not resolution, we only need to examine the main direc-
tion of the cone of rays, not the rate at which it diverges. The main direction of the cone of rays is given by its
central ray. This ray is called the chief ray. Rays at the edges of the cone of rays are called marginal rays. Since the
aperture stop determines by definition the cone of rays that reaches the image plane, the chief ray goes through the
center of the aperture stop, and the marginal rays touch the edges of the aperture stop. Therefore, the position of
the aperture selects which ray will be the chief ray, and therefore it determines the direction of the chief ray in the
object plane and image plane. This in turn determines how magnification changes as function of object distance
and detection plane.
If we want the imaging system to be telecentric, i.e. if the magnification should be constant as function of object
position and detection plane position, then the chief rays should be horizontal. One can make a further distinction
22
between object-side telecentrity (chief rays at the object plane are horizontal), and image-side telecentricity (chief
rays at the image plane are horizontal).
Figure 2.13: The effect of the aperture stop on telecentricity. Top: the aperture stop is placed such that the chief
rays are horizontal in the image plane, which means that the magnification remains constant when moving the
detector out of focus. Bottom: the aperture stop is placed such that the chief rays are not horizontal in the image
plane, which means that the magnification changes when moving the detector out of focus.
To determine which part of the cone of rays from each object point will reach the image plane (i.e. which part
will be transmitted by the aperture stop), it could be cumbersome to trace the rays through the optical system
until they reach the aperture stop. Therefore, one typically treats the optical system as a black box. We define the
entrance pupil as the image of the aperture stop that is generated by all the optical elements before the stop. If a
ray is blocked by the entrance pupil, it means it will be blocked by the aperture stop as well (that follows directly
from the definition of an image: a ray that goes through a certain object point goes through the corresponding
image point regardless of the ray angle). Therefore, to see which cone of rays will reach the image, one only needs
to trace the rays from an object point directly to the entrance pupil, rather than through all optical elements before
the aperture stop. Similarly, one defines the exit pupil as the image of the aperture that is generated by all the
optical elements after the aperture stop. To see which cone on rays forms a certain image point, one only needs
to trace the rays from the image point to the exit pupil. The optical system can thus be described as a black box,
where one only needs to trace the rays from the object to the entrance pupil, and from the exit pupil to the image.
All the aberrations of the optical systems are condensed into a single operation that relates the optical field in the
exit pupil to the field in the entrance pupil.
The location of an aperture stop determines which ray from the cone of rays that comes from an
object point will be the chief ray. The direction of the chief ray determines how the magnification
changes as a function of defocus.
2.6 Aberrations
In an ideal imaging system, all the rays that come from a single object point, converge in a single image point.
When the rays do not exactly converge in a single image point, the image gets blurred, and we say that the imaging
23
system is aberrated. The word ‘aberrated’ comes from the Latin ‘ab’ (=‘away from’) and ‘errare’ (‘to wander’ or
‘to make an error’), which when put together means ‘to stray’. The word ‘aberration’ is therefore in no way related
to the optical scientist Ernst Abbe, who made important contributions to understanding the diffraction limit of
microscopes. By knowing the origin of the word, it should be easy to remember that ‘aberration’ is spelled with
one ‘b’ and two ‘r’s.
Why might an imaging system be aberrated? One can think of the following causes:
• Fabrication error: the surfaces of the manufactured lenses aren’t exactly as designed, which causes the rays
to deviate from their designed trajectories.
• Misalignment: the lens elements may be positioned at different locations or orientations than what was
designed.
• Error in the paraxial approximation: the imaging system was designed using the paraxial approximation,
e.g. by using ray transfer matrices as described in Sec. 2.3. The actual ray trajectories can differ from the
designed trajectories if the paraxial approximation is not sufficiently valid.
• Dispersion: the imaging system was designed to work for a certain wavelength (i.e. color). Because the
refractive indices of the lens elements depend on the wavelength of the light, the ray trajectories will change
when changing the wavelength. This is called ‘chromatic aberration’. The other aberrations, which are
unrelated to the wavelength of the used light, are called monochromatic aberrations.
Since one can arbitrarily deform or misalign lens elements, there are hardly any restrictions on the aberrations that
an imaging system can have. However, to meaningfully communicate about the aberrations of imaging systems, we
must define and name certain aberrations. For the sake of practicality, these specific aberrations must be common,
and they must be able to describe more arbitrary aberrations, i.e.
One such set of aberrations are the Seidel aberrations, which are named after their inventor Ludwig von Sei-
del. He found these aberrations by considering the third-order correction to the paraxial approximation. I.e. he
compared the rays obeying the paraxial approximation sin θ ≈ θ, to rays obeying the more accurate approximation
sin θ ≈ θ − θ3 /3!, which includes the next term in the Taylor expansion for sin θ. This analysis led to the definition
of the five third-order Seidel aberrations. It is important to memorize the names and meanings of these aber-
rations if you don’t want to be overwhelmed whenever optical designers discuss errors in their imaging systems.
For an object point with coordinates x = 0 and y = h, the Seidel aberrations can be described by the follow-
ing formula1
This equation tells us the following. Consider the ray that travels from the object point (0, h) through the exit
pupil point (s sin ϕ, s cos ϕ) (i.e. ϕ is the angle between the object point coordinate vector and the exit pupil point
coordinate vector), see Fig. 2.14. In an ideal imaging system, this ray would end up at some image point (x′ , y ′ ).
For a fixed object point (0, h), the image point (x′ , y ′ ) would be the same for all exit pupil points (s sin ϕ, s cos ϕ)
(all rays coming from a single object point end up in a single image point). However, if aberrations are present,
and their strengths are given by the coefficients B1 . . . B5 , then the rays will deviate from the ideal image point by
an amount (∆x′ , ∆y ′ ) which may depend on s, ϕ, and h.
1 See section 3.2 of W.J. Smith, Modern optical engineering: the design of optical systems, 3rd Edition, McGraw-Hill Education,
2008.
24
Figure 2.14: Diagram indicating the meaning of h, s, and ϕ.
For a given object point h, we can plot in a spot diagram where all its rays (i.e. all s, ϕ) end up in the
image plane (i.e. x′ , y ′ plane). Such a diagram tells you how a certain object point is imaged: the more spread out
the spot is, the more blurred the image is, and the poorer the image resolution. Note that these aberrations are
rotationally symmetric: they depend only on the angle between the object point and pupil point. Therefore, it has
been assumed that the optical system is rotationally symmetric, which in practice doesn’t always have to be the case.
25
order, so there are in principle infinitely many aberrations in which an arbitrary rotationally symmetric aberration
can be decomposed as in Eq. (2.21). Moreover, there is in general not necessarily a guarantee that the imaging
system is rotationally symmetric, so aberrations can be described even more generally. In Sec. 3.5.6, aberrations
are discussed further in the context of wave optics.
Figure 2.15: Spot diagrams for the five third-order Seidel aberrations. Points with the same color have the same s
but different ϕ. To get an impression of what the spot diagrams look like for object points that do not lie on the
y-axis, realize that the Seidel aberrations assume a rotationally symmetric optical system. The Matlab code to plot
the figures is code 2.1.
26
B1 : spherical aberration
B2 : coma
B3 : astigmatism. Note that the characteristic feature of astigmatism (different focal planes for rays along the x-
and y-directions of the pupil) is not apparent in a side view. Also see Fig. 2.17.
B4 : field curvature
B5 : distortion
Figure 2.16: Ray diagrams for the Seidel aberrations as defined in Eq. (2.22). The Matlab code to plot the figures
is code 2.2.
27
Figure 2.17: In the case of astigmatism, the fan of rays along the x-axis has a different focal length than the rays
along the y-axis.
The five Seidel aberrations are found by considering the third-order correction to the paraxial
approximation. The aberrations are: spherical, coma, astigmatism, field curvature, and distortion.
They can be illustrated with spot diagrams.
Listing 2.1: MATLAB code to plot the spot diagrams for the Seidel aberrations as defined in Eq. (2.22)
1 % Pupil coordinates
2 s=linspace(0,1,10);
3 phi=linspace(0,2*pi,50);
4 [S,Phi]=meshgrid(s,phi);
5
6 % Object points
7 hvec=[1 0 −1];
8
9 figure(1)
10 set(gcf,'position',[10 50 200 720])
11 for k=1:length(hvec)
12 B1=0; B2=1; B3=0; B4=0; B5=0; % Seidel coeffiecients
13 h=hvec(k);
14
15 %Ray displacements
16 Dy=B1*S.ˆ3.*cos(Phi)+B2*S.ˆ2*h.*(2+cos(2*Phi))+(3*B3+B4)*S*hˆ2.*cos(Phi)+B5*hˆ3;
17 Dx=B1*S.ˆ3.*sin(Phi)+B2*S.ˆ2*h.*sin(2*Phi)+(B3+B4)*S*hˆ2.*sin(Phi);
18
19 subplot(length(hvec),1,k)
20 plot(Dx,Dy,'.−')
21 axis image
22 axis([−3 3 −3 3])
23 xlabel('\Delta x''')
24 ylabel('\Delta y''')
25 title(['h=' num2str(h)])
26 end
Listing 2.2: MATLAB code to plot the ray diagrams for the Seidel aberrations as defined in Eq. (2.22)
1 close all
2 clear all
3 clc
4
5 % Object points
6 hvec=[−1 −0.5 0 0.5 1];
7
8 %Pupil points
9 svec=linspace(−1,1,3);
10
28
11 x=linspace(−1.5,1.5,100);
12
13 figure(1)
14 set(gcf,'position',[10 50 700 300])
15
16 hold on
17 plot(x,zeros(size(x)),'k')
18 plot([0 0],[−1 1],'k')
19 axis([−1.5 1.5 −1.5 1.5])
20 axis off
21
22 for k=1:length(hvec)
23 for m=1:length(svec)
24 h=hvec(k);
25 s=svec(m);
26
27 y=size(x);
28
29 reg0=x<−1; %Before object point
30 reg1=(x≥−1).*(x<0)==1; %Between object point and lens
31 reg2=(x≥0); %After lens
32
33 %Seidel coefficients
34 B1=0; B2=0; B3=0; B4=0; B5=0.3;
35
36 %Ray displacement
37 Dy=B1*s.ˆ3+B2*s.ˆ2*h+(3*B3+B4)*s*hˆ2+B5*hˆ3;
38
39 % Formulas for rays before and after the lens
40 y(reg0)=NaN;
41 y(reg1)=(s−h)*x(reg1)+s;
42 y(reg2)=(−h+Dy−s)*x(reg2)+s;
43
44 a=1−k/length(hvec);
45 plot(x,y,'color',[1−a a a],'Linewidth',1)
46
47 end
48 end
49 hold off
29
2.7 Exercise: apparent depth
Goals:
You’re looking from a height h straight down at a point object that lies at a depth d below the water surface.
The water has a refractive index n, while air has a refractive index 1. Rays are emitted at all angles θ from the
object, and they enter the pupil of your eye which has radius R. Because of refraction, the object appears to be at
a different depth d′ .
1. Demonstrate that the apparent depth is given by
p
′ 1 − n2 sin2 θ
d =d . (2.23)
n cos θ
30
Chapter 3
Videos:
• ‘University level introductory optics course’: youtu.be/20jRuOFQTLM, time stamps: 23:23 - 1:26:03
• ‘02. Angular Spectrum Method’: youtu.be/31072jVfIUE
Light as waves:
• What is the evidence indicating that light can be modeled as waves?
• Write down the 1D/3D wave equation.
31
Diffraction gratings:
• What are two purposes to study diffraction gratings?
• What is the expression for the diffraction pattern of a diffraction grating?
Interferometry:
• What is interferometry useful for?
• Sketch the setup for a Michelson interferometer.
• What interference patterns can one observe in a Michelson interferometer and why?
• What is the condition for thin film constructive/destructive interference?
• What is the intensity transmitted by a Fabry-Pérot interferometer?
• What is the resolvance of a Fabry-Pérot interferometer?
• What happens to the part of the light field that contains features smaller than the wavelength of the light?
• How can super resolution be achieved?
Fourier optics:
• What is the uncertainty principle in the context of Fourier transform, and how can it be used to understand
how the NA determines the resolution limit?
• What is a convolution?
• How can Fourier transforms be used to calculate convolutions?
• How are low-pass filtering and Point Spread Functions (PSFs) related?
• How can Fourier transforms be used to interpret the diffraction pattern of a diffraction grating?
• How are aberrations modeled in Fourier optics?
32
Coherence:
• Why can laser light interfere more easily than sunlight in e.g. a double slit experiment?
• How is it possible that sunlight cannot form fringes in a certain double slit experiment, while the same sunlight
can show thin film interference effects?
So how did we come to the conclusion that light is in fact a wave? One convincing piece of evidence is the
interference pattern that is observed in Young’s double slit experiment. In this experiment, we shine light through
two slits onto a screen. If we shine light through only one slit, there will be a blur of light on the screen. If we shine
light through the other slit, there will be a similar blur. If we shine light through both slits, one might intuitively
expect that the intensities of the two blurs add together to form a blur that is twice as bright. But this is not what
is observed. Rather, we observe fringes of darkness where there would be light if we shone light through a single
slit. So how can light added to more light result in less light on some locations of the screen?
Physically speaking, the answer is that light is a wave. Mathematically speaking, the answer is that light doesn’t
just have brightness, but also a ‘sign’: i.e. a positive or negative sign, or more generally speaking, a ‘phase’ (in
the sense of complex numbers). This means that light can cancel itself out: if the peaks of one wave coincide with
the troughs of another wave, then the net result will be that there is no wave, just like +1 and -1 are ‘something’
when considered separately, but they yield ‘nothing’ when you add them together. The idea of complex phase can
be interpreted as a generalization of the +/- sign. Whereas with the +/- signs you either have a complete doubling
(|1 + 1| = 2, or | − 1 + (−1)| = 2) or a complete cancellation (|1 + (−1)| = 0), with complex phases you can have
anything in between: p
|eiϕ1 + eiϕ2 | = 2 + 2 cos(ϕ1 − ϕ2 ). (3.1)
As we can see from this equation, we achieve doubling when the phases ϕ1 , ϕ2 are the same (i.e. the ‘signs’ are the
same), and we achieve complete cancellation when the phases have a difference of π, which is the maximal difference
when considering phase is defined modulo 2π (i.e. the ‘signs’ are opposite).
In the following, we will look at how light can be described as a wave, how complex numbers and phase play
a role in this, how we can mathematically describe interference effects, and what the implications are for the reso-
lution of imaging systems.
Videos:
• ‘Imaginary numbers with real applications: complex exponentials and Euler’s formula’: youtu.be/-
4G8JLIOHMI
33
3.1.1 1D wave equation
The way that a wave ψ evolves in space and time is determined by the wave equation. To develop an understanding
of this wave equation, let’s first consider the simple one-dimensional case. In this case, the wave equation is given
by
∂ 2 ψ(x, t) 1 ∂ 2 ψ(x, t)
2
− 2 = 0, (3.2)
∂x c ∂t2
where x denotes position, t denotes time, and c denotes the propagation speed of light. One can verify by substitution
that the following expression (also known as the d’Alembert solution) satisfies the wave equation for arbitrary twice-
differentiable functions f and g
ψ(x, t) = f (x − ct) + g(x + ct). (3.3)
Physically speaking, this equation says that a solution ψ to the wave equation is the sum of some function f trav-
eling in the +x direction with speed c, and some function g travelling in the −x direction with speed c.
A special solution which is relevant in optics is the solution that consists of a single wavelength λ. Since in optics
the wavelength of light determines the color, such a solution is called monochromatic (mono=one, chroma=color).
A function with a single wavelength λ is given by cos (2πx/λ), because this function has period λ. To have it satisfy
the general solution to the wave equation as given by Eq. (3.3), we must substitute x with x − ct. This yields the
expression
2π
f (x − ct) = cos (x − ct) = cos (kx − ωt) , (3.4)
λ
where
ω = kc. (3.5)
2π 2π
Here, k = λ is called the wave number, and ω = 2πf = T is called the angular frequency. T is the time it takes
to complete oscillation, and f = 1/T is the frequency, which indicates the number of oscillations per unit time. So
we find that if the wave has a single wavelength λ (i.e. it’s monochromatic), then it follows that it also has a single
well-defined oscillation frequency. Therefore, monochromatic fields are also called time-harmonic fields.
The advantage of writing this expression in exponential form, is that we can now factorize this expression using
rules for exponentials:
ei(kx−ωt+ϕ) = eikx eiϕ e−iωt . (3.8)
For a time-harmonic field in three dimensions, which may vary as a function of space, we can write the field as
34
Here, we have factorized the field in a spatial part U (x, y, z), which may have non-trivial dependence on x, y, z,
and a temporal part e−iωt , which is trivial because we assume the field is monochromatic. Therefore, whenever
we calculate with time-harmonic (i.e. monochromatic fields), we typically just use the complex-valued function
U (x, y, z), and omit the time-dependence e−iωt . Since U is a complex-valued function, it has both an amplitude
and a phase which vary as a function of position. The amplitude indicates how large the oscillation is in a certain
point. The phase difference between two points indicates the delay in the oscillations between two points. E.g. if
the oscillations at the two points go up at the same time and go down at the same time, then the two points have
zero phase difference. If the oscillation goes up in one point while it goes down in another point, then there’s a
non-zero phase difference.
Note that what we observe experimentally, is not the complex-valued field U , but the time-averaged intensity
of the field. An optical field oscillates at around 1015 Hz, which is way too fast for any detector to measure. Since
we cannot measure the distinct oscillations in each point, we cannot directly measure the phase of the complex-
valued field (however, we can infer phase from interference effects that are visible in the intensity), and we can
only measure time-averaged quantities. The instantaneous intensity is given by ψ(x, t)2 , and it oscillates at optical
frequencies. The measured brightness is the time-average of the instantaneous intensity. To calculate time averages
of oscillating field, note that
⟨cos2 (ωt)⟩ = 1/2, (3.10)
where the angular brackets denote the time average. To easily see why this is true, note that sin2 (ωt) + cos2 (ωt) = 1
for all t, and that the time averages of sin2 (ωt) and cos2 (ωt) are equal because the functions are identical except
for a fixed shift.
To see how convenient complex notation is, consider the time-averaged intensity of the sum of two time-harmonic
fields
⟨(A cos(ωt) + B cos(ωt + ϕ))2 ⟩ . (3.11)
Expanding the square and using Eq. (3.10) gives
1 2 1 2
A + B + 2AB ⟨cos(ωt) cos(ωt + ϕ)⟩ . (3.12)
2 2
To further evaluate this expression, we use the trigonometric identity
1 1
cos(a) cos(b) = cos(a + b) + cos(a − b). (3.13)
2 2
Plugging this in (3.12) gives the final expression for the time-averaged intensity of the sum of two time-harmonic
fields
1 2
I = ⟨(A cos(ωt) + B cos(ωt + ϕ))2 ⟩ = A + B 2 + 2AB cos(ϕ) .
(3.14)
2
To find this expression, we had to carry around the time-dependence of the fields, we had to use a trigonometric
identity, and we had to explicitly take time-averages. We can arrive at the same result much more easily by assuming
that the time-averaged intensity I can be found by taking the squared modulus of the complex-valued field U
1
I(x) = |U (x)|2 . (3.15)
2
With this expression, we can leave out time-dependence, and we don’t have to explicitly take time-averages. To see
that this expression gives the correct result when taking the sum of two fields, one can straightforwardly calculate
1 2 1 1 2
A + Beiϕ = A2 + B + ABeiϕ + ABe−iϕ
2 2 2 (3.16)
1 1 2
= A2 + B + AB cos(ϕ),
2 2
where we used Euler’s identity, Eq. (3.6) to obtain the last line. Note that this result is the same as Eq. (3.14), as
it should be. Typically we don’t care about overall constant factors, so for convenience we don’t bother with the
factor 1/2 in Eq. (3.15), and instead we assume
35
It is useful to describe time-harmonic fields using complex notation, because then we don’t have write
down the trivial time-dependence, and we don’t have to explicitly calculate time-averages, which
typically requires the use of trigonometric identities. In complex notation, a time-harmonic field is
described by a complex-valued function that only depends on position. Its amplitude describes how
strongly the field oscillates in each point, while the phase difference between two points describe the
delay between the oscillations in the two points. To calculate the time-averaged intensity (which is
the quantity that we can measure experimentally), we take the squared modulus of the complex-
valued field.
3.1.3 3D waves
For the sake of simplicity, we considered the 1D wave equation. Now for completeness, we will consider the 3D
wave equation
1 ∂ 2 ψ(x, y, z, t)
∇2 ψ(x, y, z, t) − 2 = 0. (3.18)
c ∂t2
Here, ∇2 is called the Laplacian, and it means
∂2 ∂2 ∂2
∇2 = + + . (3.19)
∂x2 ∂y 2 ∂z 2
It is common in optics to consider time-harmonic fields. So we consider solutions of the form
Plugging this into the 3D wave equation, calculating the time derivative, and dividing out e−iωt results in the
Helmholtz equation
(∇2 + k 2 )U (x, y, z) = 0. (3.21)
Here we used ω = kc, see Eq. (3.5). Recall that k = 2π λ , where λ denotes the wavelength of the time-
harmonic/monochromatic field. Note that through the assumption of time-harmonic fields and the use of com-
plex notation, the Helmholtz equation has no dependence on time anymore. In the following, we will discuss two
solutions to the Helmholtz that are of particular importance in optics: plane waves and spherical waves.
There are several examples that illustrate why plane waves in particular are worthwhile considering:
• We can use it to derive Snell’s law (see Section 3.1.6). This provides a clear connection with ray optics.
• A spherical wave emitted by a point source will approximately become a plane wave when it propagates far
enough (see Section 3.1.8).
• It is common to decompose a field in plane waves (which is called a Fourier transform), and to add many
plane waves together (which is the inverse Fourier transform) (see Section 3.5).
36
Videos:
• ‘02. Angular Spectrum Method’: youtu.be/31072jVfIUE
eik|r|
U (r) = . (3.23)
4π|r|
More specifically, it’s a Green’s function of the Helmholtz equation, i.e. it’s a solution to
Intuitively, the Green’s function is an impulse response. It tells you what field is generated by a point source. In
other words, the Green’s function of the Helmholtz equation says that a point source emits a spherical wave. The
reason why this is important, is because it provides a clear connection with ray optics: to design an imaging system,
we assumed that each object point scatters rays in all directions. Now, in the context of wave optics, the picture
of having ‘rays propagating in all direction’ is substituted with ‘a spherical wave propagating in all directions’.
Whereas in imaging using ray optics it was important to have ‘all the rays coming from one object point converge
to a single image point’, in wave optics the goal is to ‘convert the diverging spherical wave coming from one point
source to a converging spherical wave that focuses the field in an image point’.
Figure 3.1: The wave equation, complex notation, Helmholtz equation, plane waves, and spherical waves.
37
3.1.6 Snell’s law
We saw that in the context of ray optics, Snell’s law determines how light is refracted when it transitions from one
medium to another. But why is this law valid? Can it be derived from more fundamental principles? We will see
that Snell’s law can be derived when we assume that a refractive index n reduces the wavelength λ by λ → λ/n,
and that the wave fronts must be continuous along the interface between the two media (see Fig. 3.2).
Figure 3.2: By assuming that the wavelength changes in a medium with refractive index n, and requiring that
the wavefronts are continuous along the interface, it follows that the direction of propagation changes when light
propagates from one medium to another (Snell’s law).
Let’s assume the interface lies in the xy-plane, so that the surface normal points in the z-direction. We assume
that the direction of propagation (i.e. the direction of the wave vector k) lies in the xz-plane, and we define the
angle of incidence θi as the angle that k makes with the z-axis. Using Eq. (3.22), we can thus write the incident
plane wave as
i 2π (x sin θi −z cos θi )
e λi . (3.25)
Similarly, if we define the angle of refraction θr as the angle that the wave vector kr of the transmitted field makes
with the z-axis, then we can write the transmitted plane wave as
2π
ei λr (x sin θr −z cos θr ) . (3.26)
Note that the wavelengths λi , λr are given by
λ0
λi,r = , (3.27)
ni,r
where λ0 is the vacuum wavelength, and ni , nr are the refractive indices of the incident and refractive media
respectively. The interface lies in the plane z = 0, so if we require that the fields are continuous along the interface,
we require that Eqs. (3.25) and (3.26) are equal for z = 0, and for all x. From this follows
1 1
sin θi = sin θr . (3.28)
λi λr
By expressing the wavelengths λi,r in terms of the vacuum wavelength λ0 as described by Eq. (3.27), we find
ni sin θi = nr sin θr , (3.29)
which is Snell’s law.
Snell’s law is derived by assuming that the wave number kn in a medium with refractive index n is
given by kn = nk0 (where k0 = 2π/λ0 is the wave number in vacuum), and requiring that the tangential
component (i.e. parallel to the interface) of the wave vector is continuous across the interface.
38
Note that such a situation can easily occur when ni > nr (e.g. in a transition from glass to air), and θi is sufficiently
large. First let’s introduce some useful terminology. The critical angle θc is the incident angle such that
ni
sin θc = 1. (3.31)
nr
Physically speaking, the critical angle is the incident angle for which the transmitted ray will propagate parallel to
the interface, i.e. θr = 90◦ . Therefore, we can see from Eq. (3.30) that if the incident angle exceeds the critical
angle, the refracted angle is ill-defined. In the context of ray optics, one can observe that in such a case the ray
is not transmitted, but reflected. This phenomenon is called Total Internal Reflection (TIR). However, when we
analyze the same situation using wave optics rather than ray optics, we find that something more is going on.
The length of the wave vector k should be k = 2π/λ. Therefore, since we consider the case where ky = 0, we
can write
p 2π
kx2 + kz2 = k = . (3.32)
λ
Assuming that the interface between the two media lies in the xy-plane, we require that at the interface kx is
continuous, i.e.
kx,i = kx,r . (3.33)
What will happen if kx,i > 2π
λr (i.e. the wavefronts at the interface are spaced apart by a distance that is shorter than
2
the wavelength in the refractive medium λr )? In that case, kz,r should according to Eq. (3.32) become negative,
but that can only be true if kz,r is imaginary. If we plug an imaginary kz in the expression for a plane wave (i.e.
Eq. (3.22)), then we obtain a field that propagates along the interface (in the x-direction), but decays exponentially
perpendicular to the interface (in the z-direction)
The existence of such a field is demonstrated by a phenomenon called Frustrated Total Internal Reflection (FTIR)
(the quantum mechanical analogue is known as quantum tunneling). Suppose you have two blocks of glass which
you put next to each other with just a small gap of air in between. If light inside the first block hits the glass-air
interface at an angle larger than the critical angle, then the evanescent field in the air gap will be converted to a
propagating wave in the second block of glass. So light has ‘tunneled’ across the air gap, and the total internal
reflection has been ‘frustrated’. The amount of light that is transmitted to the second block of light decays expo-
nentially with the air gap size, because the evanescent wave decays exponentially.
When light goes from an optically dense (i.e. high refractive index) medium to an optically less dense
medium, it is Totally Internally reflected if the incident angle exceeds the critical angle. With the
wave model of light, it can be demonstrated that there will be an exponentially decaying evanescent
field at the other side of the medium. The existence of this field can be demonstrated via Frustrated
Total Internal Reflection.
Consider two narrow slits, which are spaced a distance d apart. When illuminating these slits, each slit emits
a ‘spherical’ wave (technically it’s cylindrical, but if we look at a top view of the experiment, we might as well
consider them spherical, see Fig. 3.3). If we consider a small region on a screen that is very large distance L away,
then each spherical wave will approximately have become a plane wave. The angle of the plane wave depends on
39
the position of the slit. Using complex notation, we can write the fields from the two slits as
2π
U1 (x, z) = ei(kx,1 x+kz,1 z) = ei λ (x sin θ1 +z cos θ1 ) ,
2π
(3.35)
U2 (x, z) = ei(kx,2 x+kz,2 z) = ei λ (x sin θ2 +z cos θ2 ) .
From the geometry of the experimental setup it follows that for L ≫ d we can apply the paraxial approximation
d/2
sin θ1 ≈ − cos θ1 = 1,
L (3.36)
d/2
sin θ2 ≈ cos θ2 = 1.
L
Note that when we apply this approximation, we consider only points x that are near the symmetry axis (which goes
right between the two slits). If we consider observation points x that are further off-axis, the paraxial approximation
becomes invalid.
We thus find the two complex-valued fields due to the two slits at the screen (where we choose z = 0)
d/2 d/2
U1 (x) = ei λ (−x )
2π 2π
L U2 (x) = ei λ x L . (3.37)
To find the total field, we simply add them together, and to find the intensity, we simply take the squared modulus
(see Eq. (3.17))
I(x) = |U1 (x) + U2 (x)|2
2πd
= 2 + 2 cos x
λL (3.38)
πd
= 4 cos2 x .
λL
This formula describes a fringe pattern, as is experimentally observed in the double slit experiment.
To find the formula for the interference pattern, we used the complex fields U1 and U2 . Alternatively, we can
find the maxima and minima of the interference pattern by considering path length differences (see Fig. 3.4). We
expect to see interference maxima when the fields oscillate in phase (i.e. the peaks and the valleys of the oscillations
of the field coincide), and we expect to see interference minima when fields oscillate π radians out of phase (i.e.
the peaks of the oscillations of one field coincide with the valleys of the oscillations of the other field, so that they
cancel out when added up). Two fields are in phase when their path length difference δ is an integer multiple of
the wavelength λ. The fields are in counterphase when their path lengths differ by half a wavelength
δ = m · λ in phase, constructive interference
(3.39)
1
δ = m+ · λ counterphase, destructive interference.
2
To find the path length difference for a certain point x on the screen, we assume that the two rays from the two slits
that reach that point x are approximately parallel (of course parallel rays can technically never meet at the same
point, but if L ≫ d, then the angle between the two rays is so small that they may considered parallel). Given a
point x, the rays will make an angle θ ≈ x/L with the horizontal axis (i.e. z-axis). Here we again used the paraxial
approximation. From trigonometry it follows that the path length difference δ for a certain observation point x is
given by
x
δ ≈ d sin θ ≈ d . (3.40)
L
Note that first approximation δ ≈ d sin θ requires that the screen is very far away (i.e. L is large), so that two
rays from the two slits that intersect at the screen can be considered parallel. This approximation does not put a
x
strong restriction on θ. However, in the second approximation sin θ ≈ L , we assume that the observation point x
is sufficiently close to the symmetry axis so that the paraxial approximation is valid.
Plugging Eq. (3.40) in the condition for constructive interference (Eq. (3.39)), we find
x λL
d =m·λ → x=m . (3.41)
L d
40
Plugging this result in Eq. (3.38) confirms that indeed at these points interference maxima will be observed.
We can find the interference maxima and minima of the double-slit experiment in two ways:
• by adding together the complex-valued fields of plane waves at different angles and taking the
squared modulus to find the intensity,
• or by calculating where the path lengths of the rays from the two slits to an observation point on
the screen differ by an integer multiple of the wavelength, or by and additional half wavelength.
Intensity maxima are observed when constructive interference occurs, while minima are observed
when destructive interference occurs.
Figure 3.3: Derivation of the interference pattern in a double slit experiment using plane waves and complex
notation.
Figure 3.4: Derivation of the interference maxima and minima in a double slit experiment using path length
differences.
41
the interference maxima of the double slit experiment. Such a ‘many slit’ screen is called a diffraction grating. Its
interference pattern has the following properties:
• The more slits there are, the more sharply peaked are the interference maxima. This is because the condition
for constructive interference becomes more stringent as there are more slits.
• The greater the distance between the slits, the more closely spaced are the interference maxima. This can
already be seen from the double slit experiment: in Eq. (3.38) we see that a larger d leads to a more rapidly
oscillating cosine. This is because for larger d, the two plane waves at the screen interfere under a larger angle.
• The narrower the width of the slit, the bigger the envelope of the interference pattern. To obtain the inter-
ference pattern for the double slit experiment, i.e. Eq. (3.38), we assumed the slits to be ideal point sources,
so as a result the cosine is stretched out indefinitely. In other words, the envelope of the interference pattern
is infinitely large, because the slit is infinitesimally narrow. In practice, the slits will have finite size, which
means the interference pattern will have an envelope of limited size.
These properties of interference of a diffraction grating will be analyzed mathematically in the following sections.
The reason why diffraction gratings are of interest to us, is because they help us understand the resolution limit
due to diffraction (see Section 3.4), and because they can be used to separate different wavelengths of light (see
Section 3.2.2), which can already be seen from the fact that the double slit interference pattern (see Eq. (3.38))
depends on the wavelength λ.
Figure 3.5: Comparison of a double-slit to an N -slit diffraction grating. The interference maxima lie at the same
position, but for a diffraction grating they are more sharply peaked.
Videos:
• ‘Optics: Fraunhofer diffraction - multiple slits — MIT Video Demonstrations in Lasers and Optics’:
youtu.be/KlKduOOHukU
42
3.2.1 Interference pattern of a diffraction grating
We saw in Eq. (3.37), that to find the interference pattern of a double slit, we defined for each slit a plane wave
whose angle depends on slit position. If we now consider the case of many slits that are spaced a distance d apart,
then we can similarly define for each slit the plane wave
Md
eik L x , (3.42)
where m is an integer that counts the slits, k is the wave number 2π/λ, L is the distance from the grating to
the screen, and x is an observation point on the screen that is close to the symmetry axis, so that the paraxial
approximation is valid. If we have N + 1 slits that are centered around x = 0, then M runs from −N/2 to N/2
(let’s assume for convenience that N is even). To find the diffracted field at the screen, we sum all the plane waves
N/2 (N/2+1)d (N/2)d
X md eik L x
− e−ik L x
eik L x = d formula for geometric series
M =−N/2
eik L x − 1
N +1 d N +1 d
d 2 2
eik 2L x eik L x
− e−ik L x
= d d d take out common factor in numerator and denominator
eik 2L x eik 2L x − e−ik 2L x (3.43)
N kdx
sin 2L
= use Euler’s formula
sin kdx
2L
sin N γ
= introduce new quantity γ,
sin γ
where we defined
1 1 x
kd sin θ ≈ kd .γ= (3.44)
2 2 L
The interference maxima occur when the denominator equals 0, i.e. when γ = m · π (where m is some integer).
This leads to the condition
1 x 2π Lλ
kd = m · π → x = mL =m . (3.45)
2 L kd d
This is the same condition that we found for the double slit experiment, see Eq. (3.41): the smaller the distance
d between the slits, the larger the spacing between the interference maxima. So we see that the locations of the
interference maxima are the same for the double slit and the N -slit diffraction grating. However, the widths of the
interfere maxima are not the same: the more slits we have, the narrower the peaks. To verify this mathematically,
consider a peak at position
γ = m · π location of interference maxmimum. (3.46)
To find the width of this maximum, we find the nearest 0. The function becomes 0 when the numerator is 0, so when
n
N γ = nπ → γ = N π (where n is some integer). To find the 0 that is closest to the peak, we choose n = mN + 1,
so that
1
γ = m+ π 0-point that is nearest to interference maximum. (3.47)
N
To find the half-width of the peak, we take the difference between Eqs. (3.47) and (3.46)
1 2L λL
∆γ = π → ∆x = ∆γ = . (3.48)
N kd Nd
Therefore, we find that indeed the interference maxima become narrower as we increase the number of slits N .
So far, we have assumed that the slits are so thin that they can be assumed to be point sources, which at a
far away screen generate plane waves at different angles. Now, we’re going to see what happens when the slits have
a finite width b. A slit with a finite width b at a position x = 0 can be interpreted as a series of point sources that
range from −b/2 to +b/2. The resulting field U (x) at the screen is then simply the sum of the fields of all these
43
point sources, i.e.
Z +b/2
s
U (x) = eik L x ds
−b/2
L ik s x b/2
= e L
ikx −b/2 (3.49)
2L kb
= sin x
kx 2L
sin β
=b ,
β
where we defined
1 1 x
kb sin θ ≈ kb .
β= (3.50)
2 2 L
Let’s see if we can interpret intuitively the expression U (x) for the diffracted field of a single slit with finite width
b. First, note that U (x) contains an overall factor of b. This makes sense: if the slit is wider, we expect there to be
more light at the screen. Furthermore, we see from the definition of β that if b becomes smaller, the diffracted field
U (x) becomes broader. This also makes sense, because in the limit of b → 0, the slit becomes a point source, and
we assumed the far-field and paraxial approximation where a point source generates a plane wave of infinite extent
on the screen. Conversely, we see that a broader slit width b results in a narrower diffracted field U (x). This might
seem counterintuitive: if we have a very broad slit, wouldn’t the screen just be flooded with light? The answer to
that is: yes, that would be the case. However, we started our calculations with certain assumptions on the dimen-
sions of the system, namely that the size of the grating (which relates to both the slit distance d and the slit width
b) is much smaller than the distance to the screen L (more specifically, we make the Fraunhofer approximation,
which sets a condition on the so-called Fresnel number ). Once the initial assumptions are violated, then of course
the outcomes of our calculation have no physical relevance anymore. But as long as b is still sufficiently small, then
indeed a larger b leads to a narrower diffracted field U (x).
We have now examined the diffracted fields for multiple infinitesimally thin slits (Eq. (3.43)), and for a single
slit with finite width b (Eq. (3.49)). What happens if we have multiple slits with finite widths? To find the answer
to that question, we can simply take the summation over N point sources as in Eq. (3.43), and integrate them over
a shift s from −b/2 to +b/2 as in Eq. (3.49). This results in the expression
Z +b/2 N/2
X md+s
U (x) = eik L x
ds
−b/2 m=−N/2
! N/2
Z +b/2
s
ik L x
X
ik md
L x
(3.51)
= e ds e
−b/2 m=−N/2
sin β sin N γ
=b · .
β sin γ
We see that the resulting diffracted field is simply the product of the single-finite-width-slit field and the multiple-
infinitesimally-thin-slits field. The factor sin Nγ
sin γ , which is due to the number of slits N and slit distance d, determines
the positions interference maxima and their widths. The factor b sinβ β , which is due to the finite width b of each slit,
results an envelope function that determines how rapidly the intensity of the overall diffraction pattern decays.
44
Figure 3.6: Mathematical derivation of the intensity pattern of a diffraction grating. Each point in the grating
results in a plane wave with a certain angle at the detection screen. Summing all the plane waves with different
angles together, and taking the intensity of the resulting field yields the intensity pattern at the detection screen.
45
Figure 3.7: Sketch indicating how different grating parameters relate to different features of the diffraction pattern:
slit distance d relates to peak distance; number of slits N relates to peak width; slit width b relates to envelope
width.
The relevant features of the interference pattern of a diffraction grating are determined by three
grating parameters:
• The slit distance d. This determines the distance between interference maxima. Smaller d
means larger distance between maxima.
• The number of slits N . This determines the width of the interference maxima. Larger N means
narrower width of maxima.
• The slit width b. This determines the envelope of the overall diffraction pattern. Larger b means
narrower envelope.
The formula for the diffracted field is b sinβ β · sin N γ
sin γ (take the squared modulus to find the intensity).
• To find the locations of the interference maxima (determined by d), set sin γ = 0.
• To find the width of the interference maxima (determined by N ), find where sin N γ = 0, as near
π
as possible to the interference maximum. This results in a width of ∆γ = N .
• To find where the envelope vanishes (determined by b), set sin β = 0 (except for β = 0).
These points are important if you want to infer the grating parameters d, b, N from a measured
diffraction pattern.
46
3.2.2 Grating spectrometry
We saw in Eq. (3.45) that the positions of the interference maxima x = m Lλ d depend on the wavelength λ. Recall
that m is an integer denoting the diffraction order This means that a grating can separate different colors of light.
This allows us to study the spectrum of light, i.e. we can use gratings for spectroscopy. Now we can ask, how well
can we separate different wavelengths of light? This depends on several factors:
• Narrower diffraction peaks help us separate wavelengths better. So larger N allows for better spectroscopy.
• The larger the diffraction order m, the more the peak position x varies as function of wavelength λ (see Eq.
(3.45)). So larger m allows for better spectroscopy.
So we expect the ‘quality of spectroscopy’ to be proportional to m · N . How can we define this quantity more
precisely? We can say that two wavelengths with a difference ∆λ can be just resolved when the peak of one
wavelength coincides with the first zero of the peak of the other wavelength, i.e. the two peaks are separated by
one half-width of the peak. In Eq. (3.48), we saw that the half-width is given by
λL
W = . (3.52)
Nd
So let’s consider two diffraction peaks separated by a distance ∆x = W . What difference in wavelength ∆λ does
this correspond to? We require that the peaks corresponding to two different wavelengths are separated by
L∆λ
∆x = m = W, (3.53)
d
and solving for minimum resolvable wavelength difference ∆λ gives
Wd λ
∆λ = = . (3.54)
mL mN
The quality of spectroscopy is better if ∆λ is smaller. So we can define a measure of quality called the resolvance
or chromatic resolving power as
λ
= m · N. (3.55)
∆λ
This quantity is indeed a measure of quality of spectroscopy because it is larger when ∆λ is smaller, and it is
proportional to m · N , as we expect on grounds of intuition.
47
Figure 3.8: The resolving power of a diffraction grating. To resolve two wavelengths whose difference is ∆λ, the
distance between their peak positions xm must be at least as large as their peak width W .
There is still one more phenomenon that affects the quality of spectroscopy. Each diffraction order m generates
the complete spectrum of the input light. If this spectrum is too broad, the spectra of different diffraction orders
will overlap with each other, which will be a problem if we want to perform spectroscopy accurately. To quantify
when this overlap of spectra will happen, we assume the spectrum of the input light ranges from λmin to λmax .
Overlap starts to occur when λmax from the mth spectrum coincides with λmin from the m + 1th spectrum. So
overlap starts to occur when
xm,λmax = xm+1,λmin . (3.56)
Plugging in the expression for the peak positions (Eq. (3.45)) gives
λmax L λmin L
m· = (m + 1) · . (3.57)
d d
Solving for the minimum allowable spectral range gives
λmin
λmax − λmin = . (3.58)
m
This quantity is called the Free Spectral Range (FSR).
48
Figure 3.9: To find the Free Spectral Range (FSR) of a diffraction grating, we examine when the largest wavelength
λmax of the mth diffraction order overlaps with the shortest wavelength λmin of the m + 1st order.
A diffraction grating separates different wavelengths of light, so it can be used for spectroscopy
(measuring the spectrum of multi-wavelength/multi-color input light). The quality of spectroscopy
can be quantified using two measures:
• The resolvance / chromatic resolving power : this quantity indicates the smallest difference in
wavelength ∆λ that can still be resolved. For a diffraction grating, the resolvance is λ/∆λ = m·N .
A larger number of slits N helps because it makes the diffraction peaks narrower, and a larger
diffraction order m helps because it makes the diffraction peak position depend more strongly
on wavelength.
• The free spectral range: this quantity indicates the maximum allowable wavelength range the
input may contain before the spectra for different diffraction orders start to overlap. For a
diffraction grating, the free spectral range is λmax − λmin = λmin
m . The free spectral range is
shorter for higher diffraction orders m, because for higher orders the spectrum gets stretched
out more, so it’s easier for adjacent spectra to overlap.
3.3 Interferometry
So far, we’ve seen that interference can be used to demonstrate the wave nature of light, and that it also has
practical applications such as spectroscopy. What we will see in the following, is that interference can also be used
to measure very small changes in distance. This is possible, is because an interference pattern changes visibly if
path lengths change by less than a wavelength. For optical wavelengths (λ ≈ 500nm), it means we can detect
changes in path length of tens of nanometers.
49
3.3.1 Michelson interferometer
In a Michelson interferometer, a beam is split into two different paths with a beam splitter, and then both beams
are reflected by a mirror so that they are recombined and interfere with each other (see Fig. 3.10). By applying a
change in one of the two paths, the interference pattern will change. As a first example, let’s see what happens if
we change the length of one of the two paths by ∆z (note that if we move a mirror by ∆L, then the path length
changes by ∆z = 2∆L due to reflection). A plane wave propagating in the z-direction is described by eikz . If we
add this plane wave to a plane wave that has undergone an extra path length of ∆z, the total intensity is given by
2
I(∆z) = eikz + eik(z+∆z)
We see that if ∆z = 0, the intensity is maximal, and if ∆z = λ/2, the intensity is 0. So a path length change of
half a wavelength is clearly visible in the interference pattern.
Now suppose we give one of the plane waves a tilt of an angle α with respect to the z-axis, so that the tilted
plane wave is described by eik(x sin α+z cos α) . Then the interference pattern at z = 0 is given by
2
I(x, ∆z) = eikx sin α + eik∆z
(3.60)
= 2 + 2 cos (x sin α − ∆z) .
We see that now the intensity varies as function of x: we see a fringe pattern. The larger the angle α, the closer
the fringes are to each other. By changing the path length difference ∆z, the fringe pattern is shifted.
50
Figure 3.10: If we use a collimated laser beam in a Michelson interferometer we can observe different interference
patterns depending on alignment. If the two beams are parallel at the detector, then we have a spot with a constant
intensity that depends on the path length difference. If there is an angle between the two beams at the detector,
we see a fringe patterns whose period depends on the tilt angle.
As a final example, we consider the interference pattern that is generated by an extended source. What this
means, is that we consider source that consist of multiple point sources, each of which radiates rays in all directions
θ. The collection of rays (from all point sources) that propagate in the same direction θ can be described by the
plane wave eik(x sin θ+z cos θ) . If we split this field in two, and delay one field by z → z + ∆z, we find that the
interference of these two fields gives
2
I(θ, ∆z) = eik(x sin θ+z cos θ) + eik(x sin θ+(z+∆z) cos θ)
51
Figure 3.11: If we use an extended source in a Michelson interferometer, then fields are emitted at different angles
θ, and for each angle θ, the fields have a different path length difference. Therefore, if the light if focused with a
lens (which converts an angle θ to a position r, we observe ring-shaped interference fringes.)
In a Michelson interferometer, a beam is split in two and then made to interfere. Depending on the
type of source and the alignment of the setup, different interference patterns may form:
• Two parallel beams: the intensity is constant over the screen, but changes as you change the
path length difference between the two beams.
• Two beams at an angle with each other: straight interference fringes are visible. The larger
the angle between the two beams, the smaller the spacing between the fringes. If you change
the path length difference, the fringes will shift.
• Extended source: when focusing the light with a lens, ring-shaped fringes are visible (Haidinger
fringes). If you increase the path length difference, more fringes will form.
Videos:
• ‘Optics: Two-beam interference - collimated beams — MIT Video Demonstrations in Lasers and Optics’:
youtu.be/J4Ecq7hIzYU
So let’s see what path length difference a thin film with refractive index n and thickness d introduces on a plane
wave with incident angle θ. Let’s assume the walls of the thin film lie in the planes z = 0 and z = −d, and the
52
angle θ is defined with respect to the z axis (see Fig. 3.12). We can write the incident plane wave as
The field that is directly reflected by the top layer at z = 0 is thus given by
The field that will be reflected by the bottom layer first is refracted by the top layer, so that its angle of propagation
will change from θ to θn . Then, it has to propagate to z = −d through a medium with refractive index n. So the
field that is reflected by the bottom layer is
When the field that is reflected by the bottom layer propagates by another distance d back to the top layer, it will
accumulate another phase of d cos θn due to propagation in the z-direction
eink(x sin θn +2d cos θn ) field reflected by bottom layer, evaluated at the top layer. (3.66)
By comparing Eqs. (3.63) and (3.66), we find that the phase difference between the two fields is equal to
Recall that for constructive interference, the phase difference must be an integer multiple of 2π, and for destructive
interference there must be an additional phase difference of π
δ = m · 2π constructive interference,
(3.68)
1
δ = m+ · 2π destructive interference.
2
Note that the condition for constructive interference depends on both the wavelength λ (because k = 2π λ ), and
the refracted angle θn . Therefore, different wavelengths interfere constructively at different reflection angles, which
explains why soap bubbles can separate the different colors that are present in sunlight, and it explains why we see
the colors change when look at the bubble from different angles.
Figure 3.12: Deriving the phase shift that occurs during reflection by a thin film, which leads to thin film interference.
53
In thin film interference, fields that are reflected from different surface interfere with each other.
Due to the distance between the two layers, there is a phase shift δ = 2nkd cos θn between the two
fields. This phase shift depends on the incident angle and the wavelength. Therefore, different
wavelengths interfere constructively at different angles.
So let’s calculate what happens when we have multiple reflections inside the thin film. To simplify the calcu-
lation, let’s assume that the refractive index inside the ‘film’ is n = 1, so that we basically have a narrow cavity
with two semi-transparent mirrors on the side. Let’s say the mirrors have amplitude reflection and transmission
coefficients of r and t respectively. If the field travels straight through the cavity without any internal reflections,
then the field’s amplitude is multiplied by t2 , because it is transmitted twice (once by the first mirror, once by the
second). If the field undergoes two reflections before it exits the cavity (there have to be two reflections, otherwise
the field exits from the wrong side), then one might think that the amplitude of the incident field is multiplied by
r2 t2 , because the field is transmitted by the first mirror, reflected back by the second mirror, then reflected by the
first mirror, and finally transmitted by the second mirror. However, in the previous section on thin film interference,
we saw that due to propagation through this ‘film’ of thickness d, the field undergoes an extra phase shift of δ!
So to find the transmitted field, you’d have to multiply the incident field by r2 t2 eiδ . More generally, if the field is
bounced M times back and forth between the two mirrors, the amplitude transmission coefficient of the cavity is
given by t2 (r2 eiδ )M . To find the total transmitted field, we have to add all the transmitted fields together (see Fig.
3.13). So given an incident field E0 , the transmitted field E is given by
∞
X
E= E0 t2 (r2 eiδ )M
M =0
E0 t2 (3.69)
=
1 − r2 eiδ
E0 T
= .
1 − Reiδ
In the final line, we defined T = t2 , and R = r2 . To find the transmitted intensity, we need to take the squared
modulus of the field. So if the input intensity is I0 , then the transmitted intensity I is given by
I0 T 2
I=
|1 − Reiδ |2
I0 T 2
= 2
(3.70)
1 + R − 2R cos δ
I0 T 2 δ
= substitute cos δ = 1 − 2 sin2 .
(1 − R)2 + 4R sin2 δ
2
2
We see that the transmitted intensity depends on the reflectivity of the mirrors R and the phase shift δ, which in
turn depends on the incident angle θ and the wavelength λ
54
shift that is accumulated during one reflection becomes more relevant. Now let’s consider the ideal case that R = 1.
In this case, the transmitted intensity is clearly maximal when sin 2δ = 0, so when
This equation basically says that to get constructive interference, all the transmitted fields E0 t2 (r2 eiδ )M must be in
phase with each other, no matter how many internal reflections M the field has undergone. Plugging in Eq. (3.72)
the expression for δ from Eq. (3.71), we find
2d cos θ = m · λ. (3.73)
We see that for a given wavelength λ, multiple angles θm satisfy the condition for constructive interference. Recall
the ring-shaped Haidinger fringes that are visible in a Michelson interferometer (Section 3.3.1) when you use an
extended source: an extended source emits plane waves at all angles θ, and a lens is used to focus each angle θ
into a ring. Similarly, if we illuminate the Fabry-Pérot cavity with an extended source (i.e. multiple angles θ),
and focus the transmitted light with a lens, we see ring-shaped interference fringes. Eq. (3.73) thus reveals that
if the input light consists of multiple wavelengths λ, the wavelengths will form concentric rings of different radii.
Moreover, we see that a single wavelength will form multiple rings (each m corresponding to a different ring). So in
the same way that a diffraction grating produces multiple spectra (one spectrum per diffraction order, see Section
3.2.2), the Fabry-Pérot interferometer similarly produces multiple ring-shaped spectra: one per diffraction order m
as described by Eq. (3.73).
So just like the diffraction grating, the Fabry-Pérot interferometer can be used for spectroscopy. In Section
3.2.2, we defined two quantities to measure the quality of spectroscopy:
• Resolvance / chromatic resolving power: the ability to resolve to adjacent wavelengths
55
• Free spectral range: the maxmimum allowed wavelength range for the input light before the spectra
(corresponding to different diffraction orders m) start overlapping.
We will now calculate these quantities for the Fabry-Pérot interferometer, so that we can compare its performance
to that of the diffraction grating.
Remember that to calculate the resolvance, we must calculate the half-width of the interference peak corresponding
to a single wavelength, and then we must see which wavelength shift ∆λ corresponding to the peak half-width (the
half-width is the distance between the peak maximum and the point where the peak has practically dropped off to
0). We saw that the intensity I(δ) as described by Eq. (3.70) peaks at δ for which sin 2δ = 0. By using T = 1 − R
(energy conservation), we can rewrite Eq. (3.70) as
I0
I=
1+ 4R
(1−R)2 sin2 δ
2
(3.74)
I0
= ,
1 + F sin2 2δ
To find the quarter peak width in terms of δ, we find the δ for which I attains half its maximum value I0 (the
full width of the peak would be determined by the points where the peak attains values of practically 0). We thus
require that
δ1/2
F sin2 = 1. (3.76)
2
If we approximate sin δ1/2 ≈ δ1/2 , then we find that the half-width is
4
W = 2δ1/2 ≈ √ . (3.77)
F
Now we must find which wavelength change ∆λ corresponds to this half-width W . From Eq. (3.71) we find
dδ 4πd
= − 2 cos θ, (3.78)
dλ λ
and therefore
λ2
∆λ ≈ − ∆δ. (3.79)
4πd cos θ
Setting ∆δ equal to W = √4F (see Eq. (3.77)) gives (if we ignore the sign of ∆λ, because we only care about the
magnitude of the wavelength difference)
λ2
∆λ ≈ √ . (3.80)
πd cos θ F
The resolvance is given by
√
λ πd cos θ F
=
∆λ √λ
πm F (3.81)
= see Eq. (3.73)
2
= mF,
Now that we have calculated the resolvance for the Fabry-Pérot cavity, we can now proceed to calculate the
56
free spectral range. To find when two different wavelengths λ from two adjacent diffraction orders overlap, we use
Eq. (3.73), which gives the diffraction angle θ for a certain wavelength λ and a certain diffraction order m. Two
wavelengths overlap when they have the same angle θ. So let’s choose θ fixed, and write down the expression for
the longest wavelength λmax of the mth diffraction order, and the shortest wavelength λmin of the m + 1th order
2d cos θ
λmax = ,
m (3.83)
2d cos θ
λmin = .
m+1
The free spectral range is given by
2d cos θ 2d cos θ
λmax − λmin = −
m m+1
2d cos θ
= (3.84)
m(m + 1)
λmin
= .
m
Alternatively, we can rewrite the expression by defining
so that
2d cos θ
∆λ =
m(m + 1)
λmax λmin (3.86)
=
2d cos θ
λmax (λmax − ∆λ)
=
2d cos θ
Solving for ∆λ gives
λ2max λ2max
∆λ = ≈ . (3.87)
2d cos θ + λmax 2d cos θ
To express the free spectral range in term of frequency f instead of wavelength λ, we use
c df c c
f= → = − 2 → |∆f | = 2 ∆λ. (3.88)
λ dλ λ λ
Plugging this in Eq. (3.86) gives
c
|∆f | ≈ . (3.89)
2d cos θ
57
Figure 3.14: Summary of the resolvance and free spectral range of a Fabry-Pérot cavity.
58
In a Fabry-Pérot cavity, ‘multiple thin-film interferences’ take place, because the field bounces
back and forth between two semi-transparent mirrors. Due to the multiple reflections, more
fields interfere, which makes the condition for constructive interference more selective (just like
interference maxima of a diffraction grating become narrower when fields from more slits interfere).
This makes the Fabry-Pérot cavity, like diffraction gratings, suitable for spectroscopy.
The Fabry-Pérot cavity ensures that different wavelengths interfere constructively at different
angles θ. Just like in the case of Haidinger fringes in the Michelson interferometer (see Section
3.3.1), we can use an extended source to illuminate the Fabry-Pérot cavity with multiple angles, and
we can put a lens after the cavity to convert the multitude of angles into ring-shaped interference
fringes.
Given a cavity with width d, we can find the angle θm for which a certain wavelength λ in-
terferes constructively using the following relation:
where m is an integer that denotes the diffraction order. This equation basically says that to get
constructive interference, all the transmitted fields E0 t2 (r2 eiδ )M must be in phase with each other,
no matter how many internal reflections M the field has undergone. The more δ deviates from
this condition, the more the transmitted fields will destructively interfere with each other. If the
reflectivity R of the mirrors is high, there are more transmitted fields (i.e. more M for which the
transmitted field has non-negligible amplitude) that can destructively interfere with each other. So
a high reflectivity R means that the condition on δ for contructive interference is more stringent,
which means the interference peaks will be narrower. Having narrower interference peaks means
that different wavelengths can be separated better, so the quality of spectroscopy improves with
better reflecting mirrors.
When studying the diffraction grating in Section 3.2.2, we defined two quantities to specify
the quality of spectroscopy: the resolvance and the free-spectral range. For the Fabry-Pérot
interferometer, we found that the resolvance is
λ
= mF, (3.90)
∆λ
where m denotes the diffraction order and F denotes the finesse of the cavity, which becomes larger
as the reflectivity R get closer to 1. The free spectral range is given by
λmin λ2max
λmax − λmin = = , (3.91)
m 2d cos θ + λmax
which indicates that the free spectral range improves as the cavity length d decreases.
Suppose we have a diffraction grating with some periodicity p (this quantity is similar to the slit distance d,
59
except that we don’t necessarily have to talk about slits, but any periodic feature). This periodicity p is also
sometimes called the pitch of the grating. The interference maxima form at angles θm for which p sin θm = m · λ
(see Eqs. (3.41) and (3.45)). We see that a smaller pitch p, or a longer wavelength λ, leads to larger diffraction
angles θm . Now suppose we want to create an image of the grating using a single lens. If the lens is large enough
to capture the first diffraction orders (i.e. m = ±1), then these diffraction order will re-interfere in the image plane
as two plane waves. This gives a sinusoidal intensity pattern, which basically represents an image of the periodic
grating. If the angles of the first diffraction orders are so large that they miss the lens, then the periodic character
of the grating is not visible in the image.
Figure 3.15: A diffraction grating can be used to illustrate how the imaging resolution is limited by the wavelength
of the light and the size of the lens aperture.
So let’s calculate quantitatively under what condition the periodic grating can be imaged. The diffraction angle
θ of the first order is given by
λ
sin θ = . (3.92)
p
Suppose the lens has a certain diameter and a certain distance to the object, such that the largest ray angle it can
capture is α. We define the Numerical Aperture (NA) as
NA = sin α. (3.93)
So to ensure that the lens can capture the first diffraction orders, we require
60
In other words, the smallest feature that an imaging system can image is λ/NA. This is called the diffraction limit.
Note that this result follows from the wave nature of light (quite clearly, since it involves the quantity λ), and
interference effects. This resolution limit cannot be derived from ray optics: in ray optics it is in principle possible
that all rays from a single object point intersect perfectly in a single image point, so according to this model, the
resolution can be arbitrarily high.
Note that sin α and sin θ cannot be larger than 1. From Eq. (3.92), it follows that θ is undefined if p < λ.
So what happens physically once we choose p < λ? Note that θ increases as we decrease p. When we decrease
p to p = λ, then the diffraction angle becomes θ = 90◦ , so the first diffraction orders will propagate along the
grating surface. This situation should remind you of Total Internal Reflection, which we discussed in Section 3.1.7:
we saw that the refracted angle could steadily increase to θ = 90◦ as the incident angle approached the critical
angle. We saw that beyond this angle, an evanescent field forms. This field decays exponentially with distance to
the interface. A similar thing happens in the case of the grating: when p is reduced to p < λ, there cannot be
any propagating diffraction orders, but there is an evanescent field that decays exponentially. What follows is an
important conclusion: the evanescent field contains information for sub-wavelength resolution.
Therefore, if we want to design imaging systems that can beat the diffraction limit, we must somehow make
sure that the information of the evanescent field reaches the detector. One way to achieve this is to probe the field
very close to the sample, where the evanescent field hasn’t yet decayed to negligibly small values. This is done in a
near-field scanning optical microscope (NSOM).
Another way to capture information from the evanescent field, is to immerse the sample in a medium with a
high refractive index n. In that medium, the wavelength is reduced by λ → λ/n, which means the smallest feature
that generates propagating diffraction order becomes p → p/n. If these propagating diffraction orders can be made
to reach the camera, we have improved the resolution by a factor n. For this reason, the Numerical Aperture is not
defined as sin α, but as n sin α. With this definition, it becomes possible in principle to achieve NA > 1.
The wave nature of light limits the optimal resolution of an imaging system to λ/NA, where λ is the
wavelength of the light that is used, and NA denotes the Numerical Aperture of the imaging system.
This limit is called the diffraction limit. The sub-wavelength information lies in the evanescent field,
which does not propagate, but decays exponentially. So to achieve sub-wavelength image resolution,
one needs to capture the information in evanescent field. This can be done by measuring very close
to the sample, or to immerse the same in a medium where the wavelength is reduced, so that the
evanescent field is converted to a propagating field.
The summation of plane waves and the decomposition in plane waves is mathematically conveniently described by
the Fourier transform and its inverse. From the examples that we just mentioned, it should become clear that the
Fourier transform plays an important role in optics: far-field diffraction is described by a Fourier transform, and to
calculate the field in the focal plane of a lens we use the Fourier transform. In fact, there exists an entire field of
optics called ‘Fourier optics’. However, the use of the Fourier transform is definitely not limited to merely optics!
It is used everywhere where waves and periodicity are involved: this includes quantum mechanics, crystallography,
and signal processing. So let’s see how a Fourier transform works mathematically, and how it is applied in optics.
61
3.5.1 Mathematics of the Fourier transform
A Fourier transform decomposes a function into its plane wave components. To get a feeling for how this works, you
could recall how you decompose a vector into its orthogonal components (e.g. x, y, z components). To decompose a
vector into its orthogonal components, we take the inner product of the vector with the orthonormal basis vectors
(orthonormal means: the vectors have length 1, and their inner products are 0, so they are perpendicular to each
other). To give a simple example, consider a vector
f1
f⃗ = f2 . (3.96)
f3
Then to decompose f⃗ into its orthogonal components, we take the inner products
fx = f⃗ · ⃗ex = f1 ,
fy = f⃗ · ⃗ey = f2 , (3.98)
fz = f⃗ · ⃗ez = f3 .
Here, taking the inner product means multiplying the vector entries with each other, and then summing them
X
⃗g · f⃗ = g[n]∗ · f [n]. (3.99)
n
Note that we take the complex conjugate of g[n] in case it is complex-valued (i.e. we take the Hermitian conjugate
of the vector). This is because if we take the dot product of ⃗g with itself, it should give the squared length of the
vector, and to achieve this, we need to calculate n g[n]∗ · g[n]. If we want to reconstruct the original vector f⃗ from
P
its orthogonal components, we simply sum the components back together
So this is how we decompose a vector into its orthogonal components. Now we want to decompose a continuous
function into plane wave components: how are we going to do that? First, we need to define the dot product
between two continuous functions. Analogously to Eq. (3.99), we can define it as
Z ∞
⟨g(x)|f (x)⟩ = g(x)∗ f (x) dx. (3.101)
−∞
In our case, the orthogonal components are plane waves of different frequencies, so
Here we denote frequency with the Greek letter ν (‘nu’). It is also common to denote the spatial frequencies with
f , but this can sometimes cause confusion in Fourier Optics, when we work with lenses with focal length f . These
basis functions gν (x) are orthogonal to each other, because if we take two different basis functions, the inner product
is 0, (
Z ∞ Z ∞
′ 0 if ν ̸= ν ′ ,
gν (x)∗ gν ′ (x) dx = e2πi(ν −ν) dx = (3.103)
−∞ −∞ ∞ if ν = ν ′ .
The infinity (∞) that we get when taking the inner product of a basis function with itself may look concerning.
However, this is captured with a mathematical ‘function’ (technically it’s a generalized function or a distribution)
called the Dirac delta, which, intuitively speaking, is infinite in a single point, and 0 elsewhere:
62
where δ(x) is such that Z ∞
f (x)δ(x) dx = f (0). (3.105)
−∞
We can now define the Fourier transform: the Fourier transform decomposes a function f (x) into plane wave
components gν (x) = e2πiνx by taking the inner product
Z +∞
ˆ
f (ν) = ⟨gν (x)|f (x)⟩ = f (x)e−2πixν dx. (3.106)
−∞
To find the original function f (x) from its Fourier transform fˆ(ν), we need to add all the components back together
Z ∞
f (x) = fˆ(ν)e2πiνx dν. (3.107)
−∞
The Fourier transform and inverse Fourier transform as in Eqs. (3.106) and (3.107), are defined for a 1-dimensional
function f (x). In optics, we typically consider fields in a 2-dimensional plane, so we have a 2D function f (x, y). In
that case, we will have 2D basis functions
The two-dimensional spatial Fourier transform decomposes a function f (x, y) into plane waves
e2πi(νx x+νy y) by taking the inner product
ZZ ∞
fˆ(νx , νy ) = f (x, y)e−2πi(νx x+νy y) dx dy.
−∞
The inverse Fourier transform is performed by adding all plane waves back together with the appro-
priate weights fˆ(νx , νy ) ZZ ∞
f (x, y) = fˆ(νx , νy )e2πi(νx x+νy y) dνx dνy .
−∞
In a way, these questions have already been touched upon in Section 3.4, where we discussed how capturing
diffraction orders from a grating is important for high-resolution image formation. The −1st , 0th , 1st diffraction
orders basically correspond to different plane waves, which together form the sinusoidal grating
1 1
1 + cos(νx x + νy y) = |{z}
1 + |e2πi(ν{z
x x+νy y)
} + 2 −e
2πi(νx x+νy y)
. (3.110)
| {z } 2 | {z }
th st
Grating 0 order 1 order −1st order
So when we said that a certain diffraction order may not be captured by the imaging lens and will therefore not
show up in the image, we were actually saying that a certain plane wave component e2πi(νx x+νy y) will be missing
from the image. In other words, we set a certain weight fˆ(νx , νy ) equal to 0: the imaging system filters out certain
parts of the Fourier transform of the object. More specifically, because the lens fails to capture the high diffraction
angles, the lens filters out the high spatial frequencies (νx , νy ): the lens acts as a low-pass filter.
63
So what does it intuitively mean to filter out certain spatial frequencies from an image? What do we expect
the image to look like after Fourier filtering? Simply put, high spatial frequencies correspond to small features, and
low spatial frequencies correspond to the broad features. This is because the high spatial frequencies correspond to
rapidly oscillating fields (or gratings with a small pitch), while low spatial frequencies correspond to slowly varying
fields. Therefore, when applying a low-pass filter (i.e. removing high spatial frequencies), any sharp edges will
become blurred, and the resolution decreases. Applying a high-pass filter (i.e. removing low spatial frequencies)
results in an image where only/mainly the edges are visible.
Figure 3.16: A Fourier transform of an image indicates which spatial frequencies are present in the image. By
removing high spatial frequencies, the image becomes blurred. By removing low spatial frequencies, only the edges
become visible.
An imaging system filters the Fourier transform of a sample. More specifically, it is a low-pass filter,
which limits the imaging resolution according to the diffraction limit.
see e.g. Eq. (3.37). Here we assume that x is sufficiently small so that the paraxial approximation is satisfied. So
if we have a collection of point sources at locations (x′ , y ′ ) with complex amplitudes U0 (x′ , y ′ ), then they will all
generate plane waves in the far-field with weights U0 (x′ , y ′ ). The total far field will then be given by the sum of
these plane waves
ZZ
′ x ′ y
U (x, y) = U0 (x′ , y ′ )e−2πi(x Lλ +y Lλ ) dx′ dy ′
x y (3.112)
= Û0 , ,
Lλ Lλ
where in the last line we used the definition of the 2D Fourier transform as in Eq. (3.109). Therefore, we see that
the far-field is given by the Fourier transform of the near field. A more detailed derivation of this result is given in
64
Section 6.1.2.
In the sections on the Michelson interferometer and the Fabry-Pérot interferometer (Sections 3.3.1 and 3.3.3),
we saw that we could explain Haidinger fringes by observing that a lens with focal length f focuses a plane wave
with angle θ at the point x′ = θf (note that in the paraxial approximation we use θ and sin θ interchangeably). We
can formulate this in the reverse manner: a point field at the position x′ in the back focal plane would have been
′
produced by an incident plane wave with angle θ = xf . So if the field in the back focal plane consists of multiple
field points at positions (x′ , y ′ ) with complex amplitudes U0 (x′ , y ′ ), they would have been generated by the sum of
incident plane waves
′ ′
ZZ
′ ′ i 2π x xf +y yf
U (x, y) = U0 (x′ , y ′ )eiϕ0 (x ,y ) e λ
dx′ dy ′ . (3.113)
′ ′
Note that I included a factor eiϕ0 (x ,y ) because it’s not obvious that the plane waves that generate different points
in the back focal should have accumulated the same phase. In fact, at this point it would be impossible to tell,
because I haven’t specified in which plane I evaluate U (x, y): I only stated that it’s an incident field on the other
side of the lens, but there are many planes on the other side of the lens.
If we put point sources at positions (x, y) with complex amplitudes U (x, y) in the front focal plane, then (analogously
to Eq. (3.113)) we find that the field in the back focal plane is given by
ZZ
′x ′y
U (x, y)eiϕ(x,y) e−i λ (x f +y f ) dx dy.
2π
U0 (x′ , y ′ ) = (3.114)
Note that compared to Eq. (3.113), there’s a minus sign in the exponential because the order of the planes have
switched while the propagation direction, i.e. direction of increasing phase, of the field is the same. We know that
an on-axis (i.e. (x, y) = (0, 0)) point source in the front focal plane should yield a plane wave parallel to the optical
axis in the back focal plane, and a plane wave parallel to the optical axis in the front focal plane should yield an
on-axis focal spot in the back focal plane, i.e.
Plugging the first case into Eq. (3.114), and plugging the second case into Eq. (3.113) indicate that ϕ0 (x′ y ′ )
and ϕ(x, y) should be constant. If we plug this result in Eqs. (3.113) and (3.114), we find that the fields in the
front focal plane and back focal plane of an ideal thin lens are related by Fourier transform (we will give a more
detailed derivation in Section 6.2.2). A convincing demonstration of this fact is the Abbe-Porter experiment. In this
experiment, a mesh sample is put in the front focal plane of an imaging lens, and the Fourier transformed field that
forms in the back focal plane can be filtered using slits. In the image plane, one can observe the Fourier filtered
image. E.g. if you filter away the vertical diffraction orders, the horizontal lines of the mesh vanish and vice versa.
65
Figure 3.17: Far field propagation is described by a Fourier transform, and an ideal thin lens performs a Fourier
transform when going from the front focal plane to the back focal plane.
• A near-field and its far-field are related to each other by Fourier transform.
• The fields in the front focal plane and back focal plane of an ideal thin lens are related to each
other by Fourier transform.
What happens in practice, is that an infinitely large plane wave gets cut off by the finite aperture of the lens.
This leads to a broadening in the focal spot. The more the wave is cut off by the aperture (i.e. the smaller the
aperture), the broader the focal spot. This is consistent with what we saw previously: the smallest feature size that
can be imaged scales with λ/NA, so indeed, if we decrease the Numerical Aperture, the resolution gets worse.
More specifically, the shape of the focal spot due to a circular lens aperture is the Airy disk, which is given by the
Fourier transform of a circular aperture. We saw that a diffracted far-field is also given by a Fourier transform,
so the Airy disk can also be observed by far-field diffraction from a small circular aperture. It is a fundamental
property of the Fourier transform, that if a function is very compact, then its Fourier transform must be very spread
out. In the context of quantum mechanics, where the position-representation and the momentum-representation
of a wave function are related by Fourier transform, this is known as the uncertainty principle. In the context of
66
optics, it means that a smaller Numerical Aperture leads to a broader Airy disk.
In the context of ray optics, we said that to create an image, the rays from a single object point must converge
in a single point. Then we saw using arguments with diffraction gratings that the wave nature of light prohibits
ideal imaging. With the introduction of the Airy disk, we can now state that in a diffraction limited image, each
object point will become an Airy disk in the image. More generally (i.e. in imaging systems that are not necessarily
diffraction limited due to aberrations), each object point will become spread out spot called a Point Spread Function
(PSF) in the image. The broader the PSF is, the more blurred the image becomes.
Figure 3.18: The Fourier transform of a circular aperture is an Airy disk. The size of the Airy disk is inversely
proportional to the size of the aperture.
• The smaller the aperture, the broader the Airy disk. This is an example of the uncertainty
principle, which is a fundamental property of the Fourier transform.
• In an imaging system, each object point becomes in the image plane a spread out field which
is called the Point Spread Function (PSF). In a diffraction-limited imaging system, this PSF is
the Airy disk. The broader the PSF, the worse the imaging resolution.
Videos:
• ‘Optics: Fraunhofer diffraction - circular apertures — MIT Video Demonstrations in Lasers and Optics’:
youtu.be/rmg1XyOSAk0
67
Suppose you have a system with an impulse response g(x). That is, if we send a short pulse (an ‘impulse’)
through the system, then the output will be g(x). If we send in multiple impulses at different x = x′ , and with
different weights h(x′ ), then the output will be the sum of shifted, weighted, impulse responses h(x′ )g(x − x′ ).
Mathematically put, an input h(x) that is sent through a system with impulse response g(x) yields an output which
is the convolution of g with h Z
g(x) ⊗ h(x) = h(x′ )g(x − x′ )dx′ . (3.116)
An ‘impulse’ at time/position x0 is described by the Dirac delta function h(x) = δ(x − x0 ). We can verify that
indeed this impulse yields the shifted impulse response
Z
δ(x′ − x0 )g(x − x′ )dx′ = g(x − x0 ). (3.117)
Note that the convolution is commutative: g(x) ⊗ h(x) = h(x) ⊗ g(x). This can be seen by applying to Eq. (3.116)
a change of integration variables x′ → x − x′ .
Figure 3.19: A convolution is used when each impulse generates an impulse response.
Now that we know what a convolution is, we can see how the Fourier transform helps us compute it more
conveniently. To see how this works, let’s take the Fourier transform of the convolution:
Z Z
F{g(x) ⊗ h(x)}(fx ) = h(x )g(x − x ) dx e−2πixfx dx
′ ′ ′
Z Z
′ −2πixfx
= g(x − x )e dx h(x′ ) dx′ change integration order of x and x′
Z Z
−2πi(x+x′ )fx
= g(x)e dx h(x′ ) dx′ change of integration variables x → x + x′
Z Z
′ ′
−2πixfx
= g(x)e dx h(x′ )e−2πix fx dx′ factorize e−2πi(x+x )fx
Z Z
′
= g(x)e−2πixfx dx h(x′ )e−2πix fx dx′ write double integral as product of integrals
Now let’s see how this result is relevant in the context of optics. We saw in the previous section about the
Airy disk and the PSF (Section 3.5.4) that when we image an object, each object point becomes a PSF in the
image. This is a convolution: the object field h(x) is the input signal, and the PSF g(x) is the impulse response.
So we see that
Image = Object ⊗ PSF. (3.119)
We can take the Fourier transform on both sides to convert the convolution to product
F{Image} = F{Object} · F {PSF}. (3.120)
68
Recall that in a diffraction-limited imaging system, the PSF is an Airy disk, which is the Fourier transform of a
circular aperture, so we can write
F{Image} = F{Object} · Aperture. (3.121)
Here we used the fact that the Fourier transform is the same as its inverse except for a mirroring operation, which
is irrelevant if the aperture is symmetric:
Note that Eq. (3.121) basically describes the low-pass filter that we described in Section 3.5.2: we low-pass filter
the object by taking its Fourier transform and cutting off the higher spatial frequencies with an aperture. So we
see that low-pass filtering is equivalent to convolution with the Airy disk.
Another example of how Fourier transforms and convolutions are relevant in optics, is in understanding the in-
terference pattern of diffraction gratings. The N -slit diffraction grating with slit width b and slit distance d can be
described using a convolution and a product (see Fig. 3.20)
We saw in Section 3.5.3 that the far-field interference pattern is found by taking the Fourier transform. Taking the
Fourier transform changes convolutions to products and vice versa, so
Diffracted field = F{Grating} = F{Grating widthN } ⊗ (F{Slit widthb } · F {Thin slit arrayd }). (3.124)
Recall from the uncertainty principle (see Section 3.5.4), that narrow functions yield broad Fourier transforms and
vice versa. So:
• A small slit spacing d yields a widely spaced Fourier transform F{Thin slit arrayd }.
• A large grating with many slits N yields a narrow Fourier transform F{Grating widthN }).
69
Figure 3.20: A diffraction grating can be described in terms of a multiplication and a convolution. Therefore, its
far field (i.e. Fourier transform) can be described by a convolution and a multiplication. Due to the uncertainty
relation that is inherent to the Fourier transform, the smallest feature in the grating (slit width) determines the
largest feature in the diffraction pattern (envelope), and vice versa. Moreover, whenever dimensions in the grating
(e.g. slit width) become smaller, the corresponding dimensions in the diffraction pattern (e.g. envelope) become
broader.
• A convolution calculates the sum of impulse responses that are shifted and weighted according
to some system input.
• A convolution becomes a product after Fourier transform and vice versa. This is called the
convolution theorem.
• Convolution with an Airy disk PSF is equivalent to low-pass filtering.
• The far-field diffraction pattern of a diffraction grating can be understood qualitatively using
the convolution theorem and the uncertainty relation property of the Fourier transform.
In Fig. 3.1, we interpreted imaging systems in terms of wave optics: each object point emits a diverging spherical
wave, and to form an image, it has to be converted to a converging spherical wave. This converging spherical wave
is truncated by the aperture of the imaging system, so that each object point is mapped to an Airy disk, rather
than a single image point. How do we describe aberrations in wave optics?
Recall that the phase gradient of a field indicates the propagation direction of the field. For example, a plane
70
wave eik·r propagates in the direction of k. In terms of ray optics, we can say that the rays are perpendicular
to surfaces of constant phase (i.e. wavefronts). Therefore, if the ray directions are changed due to aberrations,
it means that the wavefront is distorted, see Fig. 3.21. So in wave optics, aberrations are described as a
wavefront error. In the ideal case, the wavefront in the exit pupil is spherical. Aberrations are described as a
phase multiplication eiW (kx ,ky ) in the exit pupil, which indicates by how much the wavefront deviates from a sphere,
and whose phase gradient indicates how the ray directions are changed. The Point Spread Function (PSF) is then
given by the Fourier transform of the aberration function
The PSF gives the same kind of information as the spot diagram does in ray optics, except the PSF also takes into
account diffraction effects, see Fig. 3.22. For example, for an ideal imaging system the spot diagram consists of a
single image point, whereas the PSF is an Airy disk.
Figure 3.21: Changes in the ray directions can be described as distortions in the wavefront, because we know that
rays are perpendicular to the wavefront. Therefore, aberrations can be described as a wavefront error.
We saw that in ray optics, general aberrations can be decomposed in specific aberrations such as spherical,
coma, astigmatism, field curvature, and distortion. In wave optics, the aberration function W (kx , ky ) is often
decomposed in functions called Zernike polynomials (often abbreviated to simply ‘Zernikes’), which are named after
their inventor Frits Zernike. I.e. one can write
X
W (kx , ky ) = cn Zn (kx , ky ). (3.126)
n
Just like in Seidel’s formalism there is spherical aberration, astigmatism, and coma, so too are there Zernike
polynomials Zn (kx , ky ) describing spherical aberrations, astigmatism, and coma. However, the following difference
should be kept in mind: whereas Seidel’s aberrations describe how the aberrations depend on the object point
h, Zernike polynomials only describe the wavefront error at the pupil, without assuming any dependence on the
object point. Therefore, Seidel’s ‘coma’ (which varies with h) is not quite identical with Zernike’s ‘coma’ function
(whose dependence on h would have to be specified separately). If the dependence on h is not explicitly stated, it
is understood that the aberration function W (kx , ky ) is the same for all object points. In that case, the PSF is the
same for all object points, which means the image can be computed via a simple convolution (see Eq. (3.119)).
71
No aberrations (Airy disk PSF)
Spherical aberration
Coma
Astigmatism
Figure 3.22: Several wavefront errors and their PSFs. Compare to the spot diagrams of Fig. 2.15. Information on
how to numerically simulate PSFs can be found in Chapter 7.
72
Aberrations are interpreted in wave optics as a wavefront error, because the wavefronts determine
the ray directions. A wavefront is a surface of constant phase, so the wavefront error is described
by a phase function in the exit pupil, which can be decomposed in Zernike polynomials. The Point
Spread Function (PSF) is given by the Fourier transform of the phase function, and it is analogous
to the spot diagram in ray optics. If the PSF is the same for all object points, the aberration image
can be described with a simple convolution.
3.6 Coherence
We started this chapter on wave optics by saying that the interference pattern in Young’s double-slit experiment is
convincing evidence for the wave nature of light. Indeed, if we shine laser light through a double slit, we can clearly
observe interference fringes. However, if we shine ordinary sunlight or light from a normal flashlight through the
same double slit, it is quite likely that we don’t see any interference fringes (or at least definitely not as clearly).
Why should this be the case? Does sunlight somehow not have a wave nature, and can only laser light be described
as a wave?
Let’s say we accept this conclusion. Let’s say we define two types of light: coherent light, which can interfere,
and incoherent light, which cannot interfere. Then we can say that laser light is coherent, and sunlight or light
from a flashlight is incoherent, because they can’t form interference fringes in a double slit experiment. But then
one might ask: why does a soap bubble look so colorful when viewed in ordinary sunlight? In Section 3.3.2 we said
that this is explained with thin-film interference. But didn’t we just say that ordinary sunlight is incoherent, and
therefore cannot display any interference effects?
• Why do different types of light (e.g. laser light or sunlight) show different patterns in a double-slit experiment?
• Why does one type of light (e.g. sunlight) show interference effects in one situation (thin film interference in
a soap bubble), but not in another (double-slit experiment)?
To answer these questions, we have to study the topic of coherence, which is a topic that is far from trivial. We will
see that light is not just either coherent or incoherent, but rather, light has a degree of coherence, which is a con-
tinuous scale that varies from fully incoherent to fully coherent. Sunlight has a low degree of coherence, while laser
light has a high degree of coherence. To display thin-film interference effects, a low degree of coherence is sufficient.
To display interference effects in a double slit experiment, the degree of coherence that is required depends on the
distance between the slits: the smaller the distance between the slits, the lower the required degree of coherence.
So in principle, sunlight can in fact display interference effects in a double slit experiment, as long as you put the
slits very closely together.
Videos:
• Demonstration of double-slit interference with sunlight: Veritasium - ‘The Original Double Slit Experiment’,
youtu.be/Iuv6hY6zsd0
• Spatial coherence and the double slit experiment: Thomas Young’s Double Slit Experiment’, youtu.be/CN-
wjj phVA
73
Figure 3.23: It may be impossible to define a phase difference between two fields if they have uncorrelated phase
fluctuations.
As a consequence, one cannot always define a fixed phase difference between two fields: the fields may be in
phase at one time, but out of phase at another time. Recall that these fluctuations occur on a very short time scale,
since optical fields oscillate at 1015 Hz. Therefore, we can only observe the time-averaged effect of these fluctuations.
If the phase fluctuates, it means that the field is not time-harmonic: its time-dependence is not e−iωt anymore.
This means that the light must consist of a finite range of wavelengths. If there are fewer random fluctuations, then
the wavelength range (also called the bandwidth) of the light is smaller. For a very narrow yet finite bandwidth,
the light ‘sort of’ consists of a single wavelength, but the finite bandwidth can still result in observable coherence
effects. Therefore, such light is called quasi-monochromatic.
So how do random phase fluctuations determine whether we see interference effects or not? Recall from Eq.
(3.16) how two complex-valued fields U1 , U2 with phase difference ∆ϕ interfere:
Now suppose ∆ϕ varies so rapidly in time, that we can only observe the time-average. If the fluctuations of ∆ϕ are
completely random, then the time average of cos ∆ϕ vanishes! In that case we observe
Note that there is no interference effect in this expression: |U1 |2 and |U2 |2 cannot cancel each other out to form
interference fringes, and there no mention of phase. We say that in Eq. (3.127) the fields add coherently, and in
Eq. (3.128) the fields add incoherently.
The time average of cos ∆ϕ does not need to vanish completely. It all depends on how much ∆ϕ fluctuates in
time. The more randomly it fluctuates, the smaller the contribution of the interference term cos ∆ϕ to the intensity.
In general, we can adapt Eq. (3.127) to find
Here, γ can take any value between 0 and 1, and it denotes the degree of coherence. The case γ = 0 corresponds
to full incoherence, and γ = 1 corresponds to full coherence. In a double slit experiment, the value of γ affects the
fringe visibility, where for γ = 0 no fringes are visible, while for γ = 1 and |U1 | = |U2 | the fringes are perfectly
visible, because the intensity can reach I = 0 due to destructive interference.
74
Figure 3.24: (In)coherence is caused by (un)correlated phase fluctuations. The fluctuations occur so rapidly, that
only the time-averaged effect can be observed. A phase difference between the fields in the two slits cause the
diffraction pattern to shift. Therefore, if the phase difference fluctuates rapidly, the interference pattern moves
around rapidly, and since we only observe time-averaged intensities, the interference minima get blurred out. The
degree to which the minima are blurred depends on the degree to which the phase difference fluctuates, which
depends on how much the phase fluctuations in the two slits are correlated.
75
slits). The coherence width of sunlight is small, but finite (of the order of 100µm1 ). So to see interference fringes in
a double slit experiment, the distance between the slits should be less than 100µm. However, the thickness of the
wall of a soap bubble is easily less than 1µm, so sunlight has no problem displaying thin-film interference effects.
76
We continue the discussion on coherence in Section 6.1.3.
Figure 3.25: The difference between spatial coherence and temporal coherence.
Videos:
• Optics: Fringe contrast - path difference — MIT Video Demonstrations in Lasers and Optics,
youtu.be/IZGnYe7BUms
• Optics: Coherence length and source spectrum — MIT Video Demonstrations in Lasers and Optics,
youtu.be/LixwAXsN8vg
77
Qualitatively speaking, an incoherent image in general looks cleaner than a coherent image, because in a coherent
image lots of fringes and speckles can be visible due to interference effects.
From Eq. (3.125) it follows that coherent image formation, in the presence of some aberration W (kx , ky ) which is
the same for all object points, can be described by
n o
Icoherent = F F{O(x, y)}(kx , ky ) · Aperture(kx , ky ) · eiW (kx ,ky ) (x, y). (3.132)
where OTF is the Optical Transfer Function, which can be calculated by inverse Fourier transforming the intensity
of the PSF, rather than its complex-valued field.
Figure 3.26: The difference between coherent imaging and incoherent imaging.
78
3.7 Exercise: dark field image of a grating
Goals:
• To understand how the Fourier transform of a sample relates to its diffraction orders.
• To understand how the capturing of diffraction orders affects image formation.
A plane wave with wavelength λ illuminates a grating with period (or ‘pitch’) p at an incident angle θ. A lens with
a certain NA is used to create an image of the grating.
1. Assume the grating is located at z = 0. Write down the expression for the incident field that illuminates the
grating.
2. Write down the general expression for the Fourier series for a periodic function that has period p. We assume
that this function describes the reflection function of the grating.
3. Using the answers to the previous two questions, write down the expressions for the directions of the reflected
diffraction orders. Compare this result to the expression you’d find using the optical path length difference
of rays.
4. What condition should λ, p, θ satisfy in order to create an image of the grating? What would the grating lines
look like in the image?
79
Chapter 4
Videos:
• ‘University level introductory optics course’: youtu.be/20jRuOFQTLM, time stamps: 1:26:03 - 1:47:01
• ‘05. Polarization’: youtu.be/RowMxWt4mVE
Polarization:
• Why does a Jones vector have only two components, while the electric field vector has three?
• Which Jones vectors correspond to horizontal/vertical/circular polarization, and why?
• Which Jones matrices describe horizontal/vertical polarizers, and why?
• What is Malus’ law and why does it hold?
• What is birefringence?
• How can by birefringence be used to change polarization states?
• Which Jones matrices describe half/quarter wave plates?
80
Fresnel equations:
• How do you demonstrate that the tangential components of the E and B fields must be continuous at an
interface?
• How do you find the B field given a time-harmonic plane wave E field?
4.1 Polarization
In Chapter 2, we described light as rays. This model explains how a pinhole camera works, and we used it to
analyze optical instruments such as microscopes and telescopes. In Chapter 3 we described light as waves. This
model explains why we see interference fringes in a double slit experiment. We saw how we could use interference
to separate different colors of light, and we saw how the wave nature of light fundamentally limits the resolution of
imaging systems.
Now, we introduce yet another more complete model of light. This time, we model light as an electromagnetic
wave, as proposed by James Clerk Maxwell in 1861. The most important difference with the wave model that we
discussed in Chapter 3, is that now we consider a vectorial wave equation as opposed to a scalar wave equation.
This means that the field doesn’t just have an amplitude ψ(x, y, z, t), but it has a direction that is described by the
components Ex (x, y, z, t), Ey (x, y, z, t), and Ez (x, y, z, t). This extra degree of freedom gives light the property of
polarization. We will see how to describe polarization mathematically, what types of polarization exist, and how we
can manipulate the polarization state of light. Moreover, we will see how polarization is used to derive the reflection
and transmission coefficients when light strikes an interface between two media. The equations that describe how
these coefficients depend on polarization and the angle of incidence are called the Fresnel equations.
81
and similar equations hold for Ey , Ez , Bx , By , Bz . We saw in Section 3.1.4 that a plane wave ei(k·r−ωt) is a solution
to the scalar wave equation. Therefore, we can write for each component of the electric a separate plane wave
solution
Ex (r) = Ax eik·r e−iωt ,
Ey (r) = Ay eik·r e−iωt , (4.3)
ik·r −iωt
Ez (r) = Az e e .
Since Ex , Ey , Ez have to satisfy their own separate scalar wave equations, one might think that the complex-valued
amplitudes Ax , Ay , Az can be chosen independently. However, this is not entirely true, because E also needs to
satisfy the Maxwell’s equation ∇ · E = 0. If we plug our plane wave solution into this equation, we get
Here, Ax and Ay are completely independent complex-valued amplitudes, which indicate how the field is oscillating
over time. To see how different choices of complex numbers Ax , Ay result in different polarization, recall what it
means to use complex notation (see Section 3.1.2), to convert a complex number to a physical field oscillation, we
must multiply with e−iωt and then take the real part
Ex (t) Ax −iωt
= Re e . (4.7)
Ey (t) Ay
82
Figure 4.1: Examples of Jones vectors corresponding to different states of polarization.
• Each component of the electric field satisfies the scalar wave equation.
• For a plane wave, the electric field oscillates perpendicularly to the propagation direction: E ⊥ k.
A horizontal polarizer transmits the horizontal field component, and it blocks the vertical component. A ver-
tical polarizer does the opposite. Therefore, these optical components can be described by the following Jones
83
matrices.
1 0 Ax Ax
Horizontal polarizer: =
0 0 Ay 0
| {z }
Jones matrix
(4.9)
0 0 Ax 0
Vertical polarizer: =
0 1 Ay Ay
| {z }
Jones matrix
In general, a linearly polarized field with field amplitude E0 can be written as a Jones vector of the form
Ax cos θ
= E0 . (4.10)
Ay sin θ
Here, θ denotes the direction of polarization. E.g. θ = 0 corresponds to horizontally polarized light, and θ = π/2
corresponds to vertically polarized light. In general, the intensity of an electric field is given by the squared length
of the field vector
I = |E|2 = |Ex |2 + |Ey |2 + |Ez |2 . (4.11)
Therefore, the intensity of the field in Eq. (4.10) is
I0 = |E0 |2 , (4.12)
due to the fact that cos2 θ + sin2 θ = 1. Now let’s see what happens when we send linearly polarized light with
polarization angle θ through a horizontal polarizer. We know that only the horizontal field component gets trans-
mitted
Ax cos θ E0 cos θ
= E0 −→ . (4.13)
Ay sin θ Horizontal polarizer 0
So the intensity after a polarizer is given by
Now that we know what polarizers do and how they can be described mathematically using Jones matrices, let’s
look at wave plates. A wave plate consists of a birefringent material, which means that different field components
84
Ex , Ey see different refractive indices nx , ny . Therefore, when light propagates through such a medium with
thickness d, the two field components accumulate different phase shifts eiknd
ikn d
Ax eiknx d
Ax e x 0 Ax
→ = . (4.15)
Ay 0 eikny d Ay Ay eikny d
This factor eiknx d is typically ignored because only the phase difference between Ax and Ay matters for the polariza-
tion state. In other words, a global phase such as eiknx d which doesn’t affect and phase difference is usually irrelevant.
To convert linearly polarized light to circularly polarized light we require k, ∆n, and d to be such that we ob-
tain the following Jones matrix
1 0 1 1
Quarter wave plate: = (4.17)
0 i 1 i
| {z }
Jones matrix
It turns linearly polarized light to circularly polarized light and vice versa. Note that i = eiπ/2 , where π/2 is a
quarter of 2π. Because a 2π phase shift corresponds to one full wave oscillation, it’s called a ‘quarter wave’ plate.
Similarly, we can define a half wave plate, which introduces a phase shift of π, i.e. eiπ = −1.
1 0 Ax Ax
Half wave plate: = (4.18)
0 −1 Ay −Ay
| {z }
Jones matrix
So we see that this wave plate mirrors the polarization vector around the x-axis. By rotating the wave plate, we
can make it mirror the polarization vector around any axis, so a half wave plate can be used to arbitrarily rotate
direction of linearly polarized light without loss of intensity.
85
• A polarizer transmits one direction
of polarization, and blocks theother. The Jones matrix for
1 0 0 0
a horizontal polarizer is , for a vertical polarizer it is . The transmitted intensity
0 0 0 1
is given by Malus’ law I = I0 cos2 θ, where θ is the angle between the input polarization vector
and the polarizer’s transmission axis.
• A wave plate introduces a phase shift ∆ϕ between the Ex and Ey components of the field,
because they experience different refractive indices nx , ny . This property is called birefringence.
• A phase
1
shift of ∆ϕ = π/2 yields a quarter wave plate (π/2 = 4 (2π)). The Jones matrix is given
1 0
by (because eiπ/2 = i). Such a wave plate is used to convert linearly polarized light to
0 i
circularly polarized light and vice versa.
• A
phase
1
shift of ∆ϕ = π yields a half wave plate (π = 2 (2π)). The Jones matrix is given by
1 0
(because eiπ = −1). Such a wave plate is used to rotate the polarization state of linearly
0 −1
polarized light.
Videos:
• ‘Optics: Quarter-wave plate — MIT Video Demonstrations in Lasers and Optics’:
https://youtu.be/EBVNbRN805o
• ‘Optics: Half-wave plate — MIT Video Demonstrations in Lasers and Optics’:
https://youtu.be/ sUVXHfUVsY
J = Ax x̂ + Ay ŷ. (4.19)
Now we want to describe describe the same vector J using rotated basis vectors x̂θ , ŷθ . The rotated basis
vectors relate to the unrotated basis vectors as follows (see Fig. 4.4)
86
Plugging this into Eq. (4.19) allows to express J in terms of the rotated basis vectors
We thus see that we find the coefficients Ax,θ , Ay,θ in the rotated coordinate system from the coefficients
Ax , Ay in the unrotated coordinate system as follows
Ax,θ cos θ sin θ Ax
=
Ay,θ − sin θ cos θ Ay
(4.23)
Ax
= R−θ .
Ay
2. Apply the Jones matrix of the rotated polarizer/wave plate: assuming that we’ve chosen our rotated
basis vectors x̂θ , ŷθ such they align with the transmission/extinction axes of the polarizer, or the ordinary/ex-
traordinary axes of the birefringent material of the wave plate, then we can simply apply the Jones matrix in
its diagonal form. E.g. in the case of a polarizer
Ax,θ 1 0 Ax,θ Ax,θ
→ = , (4.24)
Ay,θ 0 0 Ay,θ 0
3. Return to the original, unrotated basis: after finding the transmitted Jones vector in the rotated basis,
we return to the original, unrotated basis by inverting the relation in Eq. (4.23)
Ax cos θ − sin θ Ax,θ
=
Ay sin θ cos θ Ay,θ
(4.26)
Ax,θ
= Rθ .
Ay,θ
If we call the Jones matrix of the unrotated polarizer/wave plate M , then putting all three steps together gives
Ax Ax
→ Rθ M R−θ . (4.27)
Ay Ay
Therefore, given a Jones matrix M of an unrotated polarizer/wave plate, the Jones matrix of the rotated polariz-
er/wave plate is given by Rθ M R−θ .
87
Figure 4.4: The same Jones vector J can be represented in different bases. E.g. the representation of J in the
xy basis is different from its representation in the x′ y ′ basis. In this case, the two representations are related by a
rotation matrix R−θ .
Mathematically, we can describe the correlation between the fluctuations of the two field components using the
time average
⟨Ex (t)∗ Ey (t)⟩ . (4.28)
Note that by taking the complex conjugate of Ex , we are looking at the phase difference between Ex (t) and Ey (t)
This makes sense, because it’s the phase difference between the Ex and Ey components that defines the polarization
state. So if this phase difference ∆ϕ fluctuates randomly, then the polarization state becomes undefined. And indeed,
if ∆ϕ fluctuates randomly in time, then the time average of Eq. (4.28) will become 0, so indeed this quantity appears
to be a measure of the degree of polarization. However, this time average will also vanish if one component is simply
0, but if Ex (t) = 0 and Ey (t) ̸= 0, then the light has a very definite state of polarization: it is simply linearly
polarized. So it’s not sufficient to only keep track of the correlation between the field components as in Eq. (4.28),
88
but we must also keep track of the average intensity of each component separately. The average intensity of one
component is given by
⟨|Ex (t)|2 ⟩ = ⟨Ex (t)∗ Ex (t)⟩ . (4.30)
The two quantities of Eqs. (4.28) and (4.30) are conveniently summarized in a single matrix that’s called the
coherency matrix G
Let’s have a look at a few examples for coherency matrices for different states of polarization:
1
• Linearly (horizontally) polarized light: the Jones vector is J = , so the coherency matrix is
0
1 0
Ghorizontal = (4.32)
0 0
1
• Linearly (45 /diagonally) polarized light: the Jones vector is J =
◦ √1 , so the coherency matrix is
2 1
1 1 1
Gdiagonal = (4.33)
2 1 1
1
• Right circularly polarized light: the Jones vector is J = √1 , so the coherency matrix is
2 −i
1 1 i
Gcircular = (4.34)
2 −i 1
• Unpolarized light: since there is no fixed relation between the Ex and Ey components of the electric field,
we cannot define a Jones vector. To find the coherency matrix, note that the average intensities for Ex (t) and
Ey (t) must be equal, because unpolarized light should show no preference for one polarization or another.
Moreover, the Ex (t) and Ey (t) must be completely uncorrelated. This yields the coherency matrix
1 1 0
Gunpolarized = . (4.35)
2 0 1
These four examples of coherency matrices are of particular importance, because they are used to decompose general
coherency matrices. From Eq. (4.31) it can be seen that in general, the diagonal elements are positive, real-valued,
and add up to the total average intensity. The off-diagonal elements can be complex-valued and are each other’s
complex conjugates. So if we assume that the total average intensity is normalized to 1, we can write an arbitrary
coherency matrix as
1 1 + S1 S2 + iS3
G=
2 S2 − iS3 1 − S1
(4.36)
1 1 0 1 0 0 1 0 i
= + S1 + S2 + S3 .
2 0 1 0 −1 1 0 −i 0
These four matrices are in the context of quantum mechanics known as the Pauli matrices, and the coherency matrix
is in quantum mechanics known as the density matrix. The coefficients S1 , S2 , S3 are known as Stokes parameters,
and they can put in a vector
1
S1
. (4.37)
S2
S3
89
We see that the following choices of parameters lead to the following coherency matrices
1 1 1 1
1 0 0 0
→ Ghorizontal , → Gdiagonal , → Gcircular , → Gunpolarized . (4.38)
0 1 0 0
0 0 1 0
So we see how the Stokes parameters, with which one describe arbitrary coherency matrices, are related to certain
states of polarization. We will now see how this fact will help us determine the Stokes parameters experimentally,
so that we can quantitatively measure the polarization state for partially polarized light.
Recall that ultimately, in experiments we can only measure intensities. Given a certain coherency matrix G,
the total intensity is given by the sum of its diagonal elements, which is called the trace of G, and is denoted as
Tr{G}
I = ⟨|Ex (t)|2 ⟩ + ⟨|Ey (t)|2 ⟩ = Tr{G}. (4.39)
Moreover, recall that in this case of fully polarized light, we could describe a manipulation of the polarization state
by a Jones matrix M that acts on a Jones vector J. Since in the fully polarized case (when the Jones vector is
well-defined), the coherency matrix is given by the outer product of the Jones vector with itself (i.e. G = JJ† , where
† denotes the Hermitian conjugate, see Eq. (4.31)), we can deduce that a Jones matrix M transforms a coherency
matrix G in the following way:
Therefore, if we send partially polarized light with a coherency matrix G through some optical elements (e.g.
polarizers and wave plates) which are described by a Jones matrix J, then the transmitted intensity is given by
I = Tr{M GM † }. (4.41)
Inspired by Eq. (4.38), we want to find the Jones matrices M that correspond to
• Selecting horizontally polarized light (i.e. a horizontal polarization filter).
• Selecting diagonally polarized light.
• Selecting circularly polarized light.
To find the corresponding Jones matrices M , we note the following: the matrix M should be such that it has
eigenvalues 0 (completely blocks the light) and 1 (completely transmits the light). The eigenvector corresponding to
the eigenvalue of 1 should be the Jones vector of the polarization state that we want to select, and the eigenvector
corresponding to the eigenvalue of 0 should be perpendicular to it. Therefore, to define a filter that selects a
polarization state with normalized Jones vector J, we can simply define the Jones matrix as M = JJ † , which
happens to be the same as the expression for the coherency matrix of that state of polarization. Therefore
1 0
Mhorizontal = Ghorizontal = , (4.42)
0 0
and the same goes for the states of diagonal and circular polarization (see Eqs. (4.32), (4.33) and (4.34)).
Now consider the general expression for the coherency matrix in terms of S1 , S2 , S3 as in Eq. (4.36), and cal-
culate the transmitted intensity (using Eq. (4.39)) for the polarization filters with Jones matrices Mhorizontal ,
Mdiagonal , Mcircular (as in Eq. (4.42)). We find
† 1
Ihorizontal = Tr{Mhorizontal Ghorizontal Mhorizontal }= (1 + S1 ),
2
1
Idiagonal = (1 + S2 ), (4.43)
2
1
Icircular = (1 + S3 ).
2
Therefore, by measuring the intensity that is transmitted by polarization filters for horizontally, diagonally, and
circularly polarized light, one can fully determine the state of polarization of a partially polarized beam.
90
The three Stokes parameters S1 , S2 , S3 can be interpreted geometrically as the coordinates of a point in the Poincaré
sphere (in the context of quantum mechanics this is known as the Bloch sphere). If the light is fully polarized (i.e.
it can be described by a Jones vector), then this point lies on the surface of the sphere with radius 1. If the
light is fully unpolarized, then it lies at the origin (i.e. at the center of the sphere). Therefore, the length of the
S1 p
vector S2 , which is S12 + S22 + S32 , is a measure for the degree of polarization, where a length of 1 indicates
S3
fully polarized light, and a length of 0 indicates completely unpolarized light. To see that this is true, consider an
arbitrary normalized Jones vector
|Ax |
J= , (4.44)
|Ay |eiϕ
where |Ax |2 + |Ay |2 = 1. Recall that only the phase difference matters for the state of polarization, so we don’t
need to define a phase for the x component. The corresponding coherency matrix is given by
|Ax |
|Ax | |Ay |e−iϕ
G= iϕ
|Ay |e
|Ax |2 |Ax ||Ay |e−iϕ
= (4.45)
|Ax ||Ay |eiϕ |Ay |2
|Ax |2
|Ax ||Ay |(cos ϕ − i sin ϕ)
= .
|Ax ||Ay |(cos ϕ + i sin ϕ) |Ay |2
S1 = |Ax |2 − |Ay |2 ,
S2 = 2|Ax ||Ay | cos ϕ, (4.46)
S3 = −2|Ax ||Ay | sin ϕ.
To see that in the case of unpolarized light the length is 0, simply observe that according to Eq. (4.36), fully
unpolarized light corresponds to S1 = S2 = S3 = 0.
91
Figure 4.5: The Stokes parameters can be interpreted as a point on the Poincaré sphere.
So far, we have always assumed that the total intensity is normalized to 1. Therefore, in the Stokes vectors
in Eq. (4.37), the first entry is always 1, which seems somewhat redundant. In the more general case, we don’t
normalize the intensity to 1, but rather the total intensity becomes the zeroth Stokes parameter S0 . Therefore, the
Stokes vector is defined as
S0
S1
, (4.48)
S2
S3
where the values of the parameters can be found experimentally via (compare to Eq. (4.43))
S0 = Itot
S1 = 2Ihorizontal − Itot ,
(4.49)
S2 = 2Idiagonal − Itot ,
S3 = 2Icircular − Itot .
To determine the point on the Poincaré sphere that a certain Stokes vector corresponds to, one first needs to nor-
malize the Stokes vector to S0 = 1. In the same way that a 2 × 2 Jones matrix describes a manipulation of a 2 × 1
Jones vector, a 4 × 4 Mueller matrix is used to describe a manipulation of a 4 × 1 Stokes vector.
92
• Partial polarization has a similar physical cause as coherence: the phase difference between
the two field components Ex , Ey may fluctuate randomly and rapidly, and we can only measure
time-averages. Therefore, as the fluctuations become less correlated, the polarization state
becomes less well-defined.
• The coherency matrix describes the average intensities of the two field components, and their
correlation.
• A general coherency matrix can be decomposed using the Stokes parameters S0 , S1 , S2 , S3 . The
values of these parameters can be found experimentally by measuring the total intensity, and
measuring the intensity transmitted by polarizers that transmit horizontally polarized light,
diagonally polarized light, and circularly polarized light.
• The Stokes parameters S1 , S2 , S3 can be used to define a point in the Poincaré sphere. A point
on the surface of the sphere describes fully polarized light. The lower the degree of polarization,
the closer to the origin the point lies.
• Manipulations to a Stokes vector can be described using Mueller matrices.
With this expression, we can see what happens to the intensity if we add different electric field vectors E1 , E2
together. If we assume that both fields propagate in the z-direction, then Ez = 0, so we only bother writing down
Ex , Ey . We see that if the two fields have parallel polarizations, then the field components add coherently, so we
can observe interference effects 2
Ex1 Ex2 2
0 + 0 = |Ex1 + Ex2 | . (4.51)
(Recall the difference between coherent addition and incoherent addition from Eq. (3.128): coherent addition
means |U1 + U2 |2 , incoherent addition means |U1 |2 + |U2 |2 .) If the fields have orthogonal polarizations, then the
field components add incoherently, so we cannot observe interference effects
2
Ex1 0 2 2
0 + Ey2 = |Ex1 | + |Ey2 | . (4.52)
If two fields are unpolarized, but mutually coherent, then they can interfere with each other as if they had parallel
polarizations
* 2 +
Ex1 (t) Ex2 (t)
Ey1 (t) + Ey2 (t)
= ⟨|Ex1 (t) + Ex2 (t)|2 + |Ey1 (t) + Ey2 (t)|2 ⟩
(4.53)
= 2 ⟨|Ex1 (t) + Ex2 (t)|2 ⟩ (because the light is unpolarized)
In the final line, Ex1 and Ex2 are coherent with each other, so interference effects will be visible.
Videos:
• ‘Optics: Fringe contrast - polarization difference — MIT Video Demonstrations in Lasers and Optics’:
https://youtu.be/ uKBaTKZa6c
93
the wave fronts at the interface. Now, we will see how much light is reflected and transmitted at an interface, by
requiring continuity of the tangential components of the electric field E and magnetic field B at the interface. This
requirement of continuity follows from Maxwell’s equations. The resulting equations that describe the transmission
and reflection coefficients are called the Fresnel equations.
We use Stokes’ theorem to write a surface integral over ∇ × E as a contour integral along the boundary of the
surface
Z Z
(∇ × E) · n̂ dA = E · ds
surface A countour s
Z (4.55)
∂B
=− · n̂ dA.
surface A ∂t
Here, n̂ denotes the unit vector that is perpendicular to the surface A, and in the last line we plugged in Faraday’s
law from Eq. (4.54). The surface A that we choose to integrate over is a narrow stripR that crosses the interface
(see Fig. 4.6). We can make the strip arbitrarily narrow so that the surface integral − A ∂B ∂t · n̂ dA vanishes, and
the contour s consists only of the lines across the top and the bottom of the surface (the lines crossing the interface
will become arbitrarily small). Therefore, Eq. (4.55) reduces to
Z b
Ex,1 (x) − Ex,2 (x) dx = 0. (4.56)
a
Here, Ex,1 and Ex,2 are the tangential components of the electric field on either side of the interface. Since the
boundaries a and b are arbitrary, we obtain
Therefore, the tangential components of the electric field should be continuous across the interface, and the same
holds for the tangential components of the magnetic field if there are no surface currents along the interface, and
the two media are non-magnetic (i.e. they have identical µ).
94
Figure 4.6: From Maxwell’s equations, it can be derived that the tangential components of the electric and magnetic
field should be continuous at the interface (without surface currents) between two non-magnetic media.
95
interface. So if one side of the interface are the incident field Ei and reflected field Er , and on the other side is the
transmitted field Et , then we require for their tangential components
To find the tangential component of the electric field E, we must first know the state of polarization: if we have an
incident plane wave with wave vector ki , the electric field could point in any direction that is perpendicular to ki .
Therefore, let’s define two convenient orthogonal states of polarization, a linear combination of which can be used
to describe any other state of polarization:
• S-polarization / Transverse Electric (TE) polarization: the electric field E is tangential to the interface
(hence ‘transverse electric’ polarization). E is perpendicular (in German: ‘senkrecht’, hence ‘S’-polarization)
to the plane of incidence.
• P-polarization / Transverse Electric (TM) polarization: the magnetic field B is tangential to the
interface (hence ‘transverse magnetic’ polarization). E is parallel (hence ‘P’-polarization) to the plane of
incidence.
We will find the transmission and reflection coefficients for these two states of polarization by calculating E and B
using Eq. (4.61), calculating their tangential components, and enforcing continuity across the interface according to
Eqs. (4.62) and (4.63). The incident medium has refractive index n1 , and the transmitting medium has refractive
index n2 . If we define the angle of incidence θi and angle of transmission θt , then for TE polarization we get
Ei + Er = Et Continuity of E∥
(4.64)
n1 Ei cos θi − n1 Er cos θi = n2 Et cos θt Continuity of B∥ ,
which is solved by
Er n1 cos θi − n2 cos θt
=
Ei n1 cos θi + n2 cos θt
(4.65)
Et 2n1 cos θi
= .
Ei n1 cos θi + n2 cos θt
For TM polarization we get
which is solved by
Er n2 cos θi − n1 cos θt
=
Ei n2 cos θi + n1 cos θt
(4.67)
Et 2n1 cos θi
= .
Ei n2 cos θi + n1 cos θt
96
Figure 4.7: Derivation of the Fresnel coefficients for S (TE) polarization and P (TM) polarization.
• The tangential components of the electric and magnetic fields must be continuous at the in-
terface between two media with refractive indices n1 , n2 (assuming the media are non-magnetic
and there are no surface currents). This follows from the Maxwell’s equation that involve ∇ × E
and ∇ × B.
• We define two polarization states: S-polarization/TE-polarization and P-polarization/TM-
polarization. In TE-polarization, E is tangential to the interface, whereas in TM-polarization,
B is tangential to the interface.
• Starting from the principle that the tangential components of both E and B must continuous
across the interface, and using the relation between E and B that follows from Faraday’s law,
we can derive the reflection and transmission coefficients. The expressions for these coefficients
are given by the Fresnel equations, and they depend on the refractive indices n1 , n2 , the angle
of incidence, and the state of polarization.
Videos:
• ‘Optics: Reflection at the air-glass boundary — MIT Video Demonstrations in Lasers and Optics’:
https://youtu.be/hJfqUAKMEdw
• ‘Optics: Reflection at the glass-air boundary — MIT Video Demonstrations in Lasers and Optics’:
https://youtu.be/mNFRaM-2cvg
97
4.2.4 Brewster angle
From the Fresnel equations, it follows that there is an angle of incidence θi for which the TM reflection coefficient is
0, and the TE reflection coefficient is non-zero. This angle of incidence is called the Brewster angle θB . This means
that if an unpolarized plane wave strikes the interface at the Brewster angle θB , the reflected light will be fully TE
polarized, i.e. the electric field will oscillate parallel to the interface. This is why polarized sunglasses can reduce
the glare from reflective surfaces such as water by having vertical polarizers. By setting Er /Ei in Eq. (4.67) equal
to 0 and solving for θi gives the expression for the Brewster angle
−1 n2
θB = tan . (4.68)
n1
Figure 4.8: The Brewster angle is the angle for which the reflection coefficient for P (TM) polarized is zero.
Mathematically, radiation pressure can be derived from the Lorentz force law. This law expresses how electric
98
fields and magnetic fields exert a force on charges and currents:
f = ρE + J × B. (4.69)
Here, f denotes the force per unit volume, ρ is the charge per unit volume, and J is the current per volume. Using
Maxwell’s equations, one can eliminate ρ and J from the equation, so that the force f is solely a function of the
fields E and B
1 ∂(E × B)
f = ϵ0 (∇ · E)E + (∇ × B) × B − ϵ0 − ϵ0 E × (∇ × E). (4.70)
µ0 ∂t
Now the question is: which of these terms denote the force that is carried by radiation, as opposed to static fields?
Radiation consists of continually oscillating fields, so if no fields are oscillating, the force term that corresponds to
radiation force should vanish. Therefore, the force density due to radiation must be given the term that contains
the time derivative (because that’s the term that vanishes if nothing is oscillating)
∂(E × B)
f = ϵ0 . (4.71)
∂t
Through dimensional analysis, we can infer that
1
S = c2 ϵ0 (E × B) = E×B (4.72)
µ0
denotes the intensity (energy per unit area per unit time) that is carried by the radiation. This vector is called the
Poynting vector. The pressure (force per unit area) is then given by S/c.
Figure 4.9: Derivation of the Poynting vector from the Lorentz force law and Maxwell’s equations.
Note that the expression we found for the Poynting vector denotes the instantaneous intensity, but we only
measure time averages (see Section 3.1.2). By using the relation between E and B (see Eq. (4.61)) and the fact
99
that the time average of cos2 (ωt) is 1/2, we find that the time-averaged intensity is given by
1
I = ⟨|S(t)|⟩ = cϵ0 E02 , (4.73)
2
and the time-averaged pressure is I/c. So in case a surface fully absorbs the radiation, the pressure is P = I/c, but
in case a surface is fully reflective, then by conservation of momentum the exerted pressure is P = 2I/c.
100
4.4 Exercise: wave plate thickness
Goals:
• Given an incident wavelength λ and refractive indices nx , ny for x-polarized and y-polarized light, for which
thicknesses d does the wave plate act as quarter wave plate?
• Given the thickness d found in the previous question, for which wavelength λ does the wave plate act as a
half wave plate (assuming nx , ny are constant with wavelength)?
101
Chapter 5
Solutions to exercises
1. To find the location of the virtual image, we must find the location where the rays that reach the eye’s pupil
intersect if there were extended. This point of intersection can be found using Snell’s law and simple geometry
(see above figure).
2. If the image were perfect, then all the rays emitted by the object (i.e. for any angle θ) would intersect in
the same location. However, the formula of the previous question indicates that rays with different angles θ
intersect at different depths d′ . Therefore, the image is not perfect, which means it is aberrated.
3. Rays with different angles θ intersect at different depths, which is characteristic of spherical aberration.
102
2π
1. Evaluating the expression for a plane wave eik·r at z = 0 and with ky = 0 and kx = −k sin θ gives e−i λ x sin θ .
P∞ in 2π
p x.
2. n=−∞ An e
3. The reflected field is found by multiplying the incident field with the reflection of the sample. This gives
λ x(n p −sin θ ) . Therefore, the direction θ
i 2π
P∞ λ
th
n=−∞ An e n of the n diffraction order is given by sin θn = sin θ −
λ
np.
To obtain this result using path length differences, observe that for two points on the grating separated by
a distance p, the path length difference of the incident rays is p sin θ. Rays that are reflected at an angle θn
have an additional path length difference − sin θn . In order for the reflected rays to interfere constructively,
we require that the net path length difference is an integer multiple of λ, i.e. p sin θ − p sin θn = nλ. This can
be rewritten as the result we found by expressing the reflection function as a Fourier series.
4. The lens can only capture rays whose angles satisfy | sin θn | < NA. If no diffraction angles θn satisfy this
condition, then no image of the grating is formed. Note that | sin θ| > NA, because the incident light does
not go through the lens. Therefore, only diffraction orders with n ≥ 1 can possibly go through the lens. The
condition for n = 1 reads | sin θ − λ/p| < NA. If the condition is met only for n = 1, then an image of the
grating is formed, but no grating lines are visible. This is because at least two orders need to interfere in
the image to produce fringes. If the condition is met for both n = 1 and n = 2, then the grating lines (i.e.
interference fringes) become visible in the image.
103
• The phase delay ∆ϕ between the x and y components must be 1
4 · 2π + m · 2π for a quarter wave plate. Using
∆ϕ = 2π λ 1
λ (ny − nx )d, we find d = ny −nx 4 + m .
• Plugging in the expression for d in the expression for ∆ϕ (but now with a different λ′ ) we find ∆ϕ =
1
+m
2π λλ′ 14 + m . This should be equal to 12 · 2π + m′ · 2π, so λ′ = λ 14+m′ .
2
104
Part II
105
Chapter 6
References:
• J.W. Goodman ‘Introduction to Fourier Optics’
Videos:
• ‘02. Angular Spectrum Method’: youtu.be/31072jVfIUE
• ‘03. Diffraction Integrals’: youtu.be/Y5DagMbPEGk
• ‘04. Coherence’: youtu.be/TRZSp2UcROU
Propagation of fields
• What are the two ways to conceptualize the propagation of light fields?
In Chapter 2 we studied light propagation and image formation using the ray model of light. In Chapter 3 we
106
saw that light can be more accurately described as waves, instead of rays. Therefore, we should be able to also
study light propagation and image formation using wave optics, and obtain more accurate results. We already
hinted at some connections between ray optics and wave optics:
• In Section 2.1 we saw that in ray optics each object point scatters rays in all directions, whereas in Section
3.1.5 we saw that in wave optics each object point emits a spherical wave that diverges in all directions.
• In Section 2.1 we saw to create an image, all diverging rays that come from a single object point must converge
to a single image point. In Section 3.5.4 we saw that if we make the diverging spherical wave from a single
point source converge to an image point, we can do no better than focus the light in a finite-sized spot called
the Point Spread Function (PSF). This leads to the diffraction limit, which states that the imaging resolution
is limited by the wavelength of light that is used to create the image (see Section 3.4).
In Section 3.5, we already touched upon the importance of the Fourier transform in wave optics. Here, we continue
this topic: we will see how we can describe light propagation and image formation in the wave model, and the
Fourier transform will play an extremely important and often recurring role in this.
First, we will see how we can model the propagation of monochromatic fields in free space. There are two ways to
conceptualize this propagation, and this will lead to different propagation formulas and different insights:
• Addition of propagated plane waves (Angular Spectrum Method) (Section 6.1.1): the field in a
certain plane is decomposed in plane waves. Each plane wave is propagated separately and the propagated
plane waves are added together in another plane to calculate the field there.
• Addition of impulse responses from point sources (Huygens’ principle, Rayleigh-Sommerfeld
formula) (Section 6.1.2): each point in the field in a certain plane acts as a point source that emits a
spherical wave. To compute the field at a point in another plane, we must add the contributions from all the
point sources together. By approximating the spherical wave fronts as parabolic wave fronts or planar wave
fronts, one can find the Fresnel propagation integral and the Fraunhofer propagation integral.
Then, we will use these propagation formulas to analyze how a thin lens forms an image. We will see in Section 6.2.1
that within the wave model, we can re-derive the thin lens equation f1 = s1o + s1i that we previously derived using
ray optics (see Eq. (2.12)). Moreover, we demonstrate in Section 6.2.2 that a lens performs a Fourier transform, as
was already claimed in Section 3.5.3.
107
• Propagate each plane wave separately to the final plane: we know from Eq. (3.22) that a plane
wave with wavelength λ (and wave number k = 2π λ ) must be of the form e
ik·r
= ei(kx x+ky y+kz z) , where
kx2 + ky2 + kz2 = k 2 . Therefore, if we know that in the plane z = 0 a plane wave assumes the value ei(kx x+ky y) ,
q
then in any other plane z it must assume the value ei(kx x+ky y+kz z) , where kz (kx , ky ) = k 2 − kx2 − ky2 . Or,
if we express it in terms of ν instead q of k, we find that the plane wave in a plane z assumes the value
kz 1
e2πi(νx x+νy y+νz z) where νz = 2π = λ2 − νx2 − νy2 .
• Add together all plane waves in the final plane: now that we know what each plane wave in the initial
plane z = 0 will look like in any other plane z, we can calculate the field in the plane z by adding all the
propagated plane waves together. Adding plane waves together with weights Û0 (νx , νy ) is done with the
inverse Fourier transform.
ZZ
Uz (x, y) = Û0 (νx , νy )e2πi(νx x+νy y+νz z) dx dy
q
2πiz λ12 −νx2 −νy2
= F −1 Û0 (νx , νy )e (x, y) (6.2)
kx ky
= F −1 Û0 , eikz z (x, y)
2π 2π
We can make a few remarks about the final result of Eq. (6.2). Firstly, note that we take the Fourier transform
of the initial field, multiply it by some transfer function eikz z , and then take the inverse Fourier transform. Recall
from Section 3.5.5 that this is a way to perform a convolution! So we can calculate the propagated field by con-
volving the initial field with some impulse response. This is the approach we will explore further in the next section.
q
Another remark concerns the expression for kz . We know that kz = k 2 − kx2 − ky2 . In principle, there is no
restriction on the values for kx , ky , so what happens to kz if kx2 + ky2 > k 2 ? In that case, kz becomes imaginary,
and eikz z will become e−|kz |z , which decays exponentially with the propagation distance z. This is an evanescent
field, as we already discussed in Section 3.1.7. Recall that a higher value of kx2 + ky2 corresponds to higher spatial
frequencies, which correspond to smaller features in an image (see Section 3.5.2). Therefore, saying that spatial
2
frequencies with kx2 + ky2 > k 2 = 2π λ can’t propagate (because they decay exponentially), means that any infor-
mation about features that are smaller than the wavelength λ will be lost as soon as the field has propagated by
several wavelengths: therefore the resolution of an image will be fundamentally limited by the wavelength of the
light that is used, unless we somehow capture information from the evanescent field.
108
Figure 6.1: Using the angular spectrum, we find that high spatial frequencies cannot propagate, but instead they
are evanescent fields.
This is the same conclusion that we drew in Section 3.4 where we derived the diffraction limit. In that Section,
we came to this conclusion by considering the diffraction from gratings: gratings with a smaller pitch yield a greater
diffraction angle, and if the pitch becomes smaller than the wavelength, the diffraction angle will be greater than
90◦ , which means the diffraction orders can’t propagate. In Eq. (3.110) we saw that a sinusoidal grating basically
consists of two plane waves with spatial frequencies ±νx , so that’s why our analysis using plane wave decomposition
yields the same result as the analysis using diffraction gratings.
To propagate an initial field U0 (x, y) by a distance z, we decompose the field into plane waves using a
Fourier transform, we propagate each plane wave by multiplying it with the phase factor eikz z , and
we add the plane waves together with an inverse Fourier transform. The spatial frequencies (kx , ky )
for which kx2 + ky2 > k 2 cannot propagate, because in that case kz becomes imaginary, so that the phase
factor eikz z becomes a decaying exponential e−|kz |z : the field that carries information about features
that are smaller than the wavelength λ is an evanescent field.
109
So let’s assume as our starting point the Rayleigh-Sommerfeld integral
z eikr ′ ′
ZZ
1
Uz (x, y) = U0 (x′ , y ′ ) dx dy . (6.3)
iλ r r
Here, r denotes the distance between the point (x′ , y ′ , 0) in the initial plane, and the point (x, y, z) in the final
plane, i.e. p
r = (x − x′ )2 + (y − y ′ )2 + z 2 . (6.4)
ikr
Eq. (6.3) says that each point (x′ , y ′ , 0) in the initial plane emits a spherical wave e r whose amplitude is equal
to the field U0 (x′ , y ′ ) at that point. However, this spherical wave is multiplied by a factor zr , which means that the
spherical wave ‘propagates more strongly forward than it does sideways’, i.e. this factor zr is largest when z = r
(‘forward direction’), and it vanishes when z = 0 (‘sideways direction’).
zeikr
Figure 6.2: The field that propagates through a small pinhole is the impulse response r2 .
To find the total field Uz (x, y) at the point (x, y) in the observation plane z, we must add together all the impulse
responses from all the points (x′ , y ′ ), i.e. we must integrate over x′ , y ′ .
110
Figure 6.3: The Rayleigh-Sommerfeld states that to propagate a field from an initial plane z = 0 to another plane
z, all the impulse response that originate from the initial plane must be summed together.
To find the commonly used Fresnel and Fraunhofer diffraction integrals, we find approximate expressions for r
under the assumption that z is sufficiently large. We approximate r in the following way:
• The r in the denominator are approximated as r = z.
• The r in the exponent is approximated using a Taylor expansion:
p
r = (x − x′ )2 + (y − y ′ )2 + z 2
r
(x − x′ )2 + (y − y ′ )2
=z 1+
z2
(x − x′ )2 + (y − y ′ )2
≈z 1+ (6.5)
2z 2
(x − x′ )2 + (y − y ′ )2
=z+ Fresnel approximation
2z
x2 + y 2 xx′ + yy ′
≈z+ − Fraunhofer approximation
2z z
In the Fresnel approximation we approximate the spherical wave fronts as parabolic wave fronts, and in the
Fraunhofer approximation we approximate them as plane waves for a small range of observation points (x, y),
such that the paraxial approximation is valid (see e.g. Sections 3.1.8 and 3.2.1).
One might ask: why should we approximate the r in the exponent differently than the r in the denominator? The
reason is that a small error in the exponent affects the value of the integrand much more than a small error in the
1
denominator. For example, for r ≫ λ/2, observe that r+λ/2 ≈ 1r , but eik(r+λ/2) = −eikr regardless of how large r
is compared to λ/2. Therefore, the r in the exponent must be approximated much more carefully than the r in the
denominator.
If we apply the approximations of Eq. (6.5) to the Rayleigh-Sommerfeld integral of Eq. (6.3), we find the fol-
111
lowing expressions for the Fresnel and Fraunhofer integrals:
eikz ik x2 +y2
ZZ
x′2 +y ′2 xx′ +yy ′
Uz (x, y) ≈ e 2z U0 (x′ , y ′ )eik 2z e−ik z dx′ dy ′
iλz
(6.6)
eikz ik x2 +y2
x′2 +y ′2 x y
= e 2z F U0 (x′ , y ′ )eik 2z , Fresnel propagation,
iλz λz λz
and
eikz ik x2 +y2
ZZ
xx′ +yy ′
Uz (x, y) ≈ e 2z U0 (x′ , y ′ )e−ik z dx′ dy ′
iλz
(6.7)
eikz ik x2 +y2 x y
= e 2z F {U0 (x′ , y ′ )} , Fraunhofer propagation.
iλz λz λz
Here we used k = 2π/λ to rewrite the integral as a Fourier transform. To remember these formulas more easily,
let’s introduce shorthand notation for the quadratic phase factor :
x2 +y 2
Qz (x, y) = eik 2z . (6.8)
ikz
If we ignore the constant factor eiλz (which is of little interest if we’re interested in a monochromatic field in only
a single plane z, because the factor doesn’t depend on x and y), then we can remember the formulas for Fresnel
propagation and Fraunhofer propagation as follows:
• Fresnel propagation: multiply the field with a quadratic phase factor, take the Fourier transform, multiply
with another quadratic phase factor
ZZ
2 +y 2 x′2 +y ′2 x′ x+y ′ y
ik x 2z
Uz (x, y) = e U0 (x′ , y ′ )eik 2z e−ik z dx′ dy ′
x y (6.9)
′ ′ ′ ′
= Qz (x, y)F {U0 (x , y )Qz (x , y )} , .
λz λz
• Fraunhofer propagation: take the Fourier transform of the field, multiply with a quadratic phase factor
ZZ
x2 +y 2 x′ x+y ′ y
Uz (x, y) = eik 2z U0 (x′ , y ′ )e−ik z dx′ dy ′
x y (6.10)
= Qz (x, y)F {U0 (x′ , y ′ )} , .
λz λz
If we only care about the intensity of the propagated field, then the quadratic phase factor Qz (x, y) becomes
irrelevant, and we may say that the far field is given by the Fourier transform of the initial field (as was
claimed in Section 3.5.3). Note that as the field propagates (z increases), the field simply expands without
changing its shape, because the Fourier transform is evaluated in (x/λz, y/λz).
So we now know how to derive the Fresnel and Fraunhofer approximations for field propagation. But when are
these approximations valid? Recall that in Eq. (6.5), we approximated the r in the complex exponential eikr . Note
that this complex exponential has a period of 2π. So if the error that we make when approximating r is εr , then
we require that
kεr ≪ 2π. (6.11)
So how large is the error εr that we make when approximating r? In the case of the Fresnel approximation,
√ that
error can be found by taking the next term in the Taylor expansion of r. The Taylor expansion of 1 + δ is
√ δ δ2
1+δ =1+ − + O(δ 3 ). (6.12)
2 8
′ 2 ′ 2
In Eq. (6.5) we have δ = (x−x ) z+(y−y
2
)
and we expand to second order. Therefore, if we define ρ2 = (x − x′ )2 +
′ 2
(y − y ) the error term is
ρ4
|εr | = zδ 2 /8 = 3 . (6.13)
8z
112
When we plug this result in the requirement that we defined in Eq. (6.11), and using k = 2π/λ, we find the following
requirement for the propagation distance z in order for the Fresnel approximation to be valid
z 1 ρ 4/3
≫ . (6.14)
λ 2 λ
Now let’s find the condition for the Fraunhofer approximation to be valid. In Eq. (6.5), we see that to make the
′2
+y ′2
Fraunhofer approximation, the term x 2z must be negligible. Let’s assume that the field in the initial plane is
restricted by an aperture with radius R, so that the maximum value of x′2 + y ′2 is R2 . Then we require according
to Eq. (6.11)
R2 R2 R2
k ≪ 2π ⇔ z ≫ ⇔ ≪ 2. (6.15)
2z 2λ λz
2
The dimensionless number F = R λz is called the Fresnel number, and it is commonly used to indicate whether
the Fraunhofer approximation is valid. It is usually understood that the condition for Fraunhofer propagation is
F ≪ 1. This condition is valid when the propagation distance and/or wavelength is very large compared to the
diffracting aperture. Moreover, keep in mind that the Fraunhofer diffraction formula of Eq. (6.10) assumes that
the observation points x and y are sufficiently close to the optical axis (z-axis) so that the paraxial approximation
is valid (see Eq. (3.40)).
According to the Rayleigh-Sommerfeld integral, each field point emits a spherical wave eikr /r that
is multiplied by an inclination factor z/r, which makes the wave propagate more in the forward
direction, and less in the sideways direction. To find the Fresnel and Fraunhofer approximations,
we approximate the r in the denominator as z, and we Taylor expand the r in the exponent. To
propagate a field using the Fresnel approximation, we multiply it with a quadratic phase factor, and
take the Fourier transform (assuming you don’t care about the phase of the propagated field). To
propagate a field using the Fraunhofer approximation, you simply take the Fourier transform of the
field (again ignoring the phase of the propagated field). The Fraunhofer approximation is valid when
the propagation distance is so large compared to the diffracting aperture, that the Fresnel number
is much smaller than 1.
First, we must be able to describe a partially coherent field mathematically. We saw in Section 3.6 that the
degree of coherence depends on the correlation between random field fluctuations at different points in space and
time. In the context of partial polarization, we saw in Equation (4.28) that this correlation can be computed with
the time average
⟨U (x1 , t + τ )U (x2 , t)∗ ⟩ . (6.16)
This time average denotes the correlation between the field fluctuations at two points x1 , x2 , and with a time
delay τ . We assume the fluctuations are wide-sense stationary, which means that correlation only depends on the
time difference τ , and not on the time t itself. Then, to fully describe a partially coherent field we would need a
5-dimensional function, which defines the correlation between each pair of points (x1 , y1 ), (x2 , y2 ), for each time
interval τ . This function is called the mutual coherence function Γ
Γ(x1 , y1 , x2 , y2 , τ ) = ⟨U (x1 , y1 , t + τ )U (x2 , y2 , t)∗ ⟩ . (6.17)
In case the field is quasi-monochromatic (i.e. it has high temporal coherence, but not necessarily high spatial
coherence), then the degree of correlation does not depend on the time difference τ , so we define the 4-dimensional
mutual intensity function
J(x1 , y1 , x2 , y2 ) = ⟨U (x1 , y1 , t)U (x2 , y2 , t)∗ ⟩ . (6.18)
113
To calculate how an arbitrary partially coherent field propagates from one plane to another, we decompose the
mutual coherence function Γ into coherent modes. First, we get rid of temporal incoherence by decomposing the
field into quasi-monochromatic fields of different wavelengths λ. For each λ we have a 4D mutual intensity Jλ that
describes the spatial coherence properties.
To decompose a 4D mutual intensity function J into coherent modes, we use a mathematical result that’s called
Mercer’s theorem 1 . This theorem guarantees that we can write any physically valid mutual coherence function J
as X
J(x1 , y1 , x2 , y2 ) = ϕn (x1 , y1 )ϕn (x2 , y2 )∗ . (6.19)
n
Here, each n corresponds to a coherent mode ϕn . Note that if we have only a single n, we obtain the mutual
intensity function J(x1 , y1 , x2 , y2 ) = ϕ(x1 , y1 )ϕ(x2 , y2 )∗ , which according to Eq. (6.18) is the mutual intensity of a
fully coherent field. So it is plausible that we can physically interpret ϕn (x, y) as a coherent field that propagates
according to Angular Spectrum / Rayleigh-Sommerfeld / Fresnel / Fraunhofer formulas. To find the time-averaged
intensity of a field from its mutual intensity, we must evaluate it in (x1 , y1 ) = (x2 , y2 )
I(x1 , y1 ) = J(x1 , y1 , x1 , y1 )
= ⟨|U (x1 , y1 , t)|2 ⟩
X (6.20)
= |ϕn (x1 , y1 )|2 .
n
In the final expression we see that to find the intensity, we must sum the coherent modes ϕn together incoherently.
So to summarize, to propagate a quasi-monochromatic spatially partially coherent field, we propagate each coherent
mode ϕn separately, and add them together incoherently to find the intensity.
We can use this result to derive the Van Cittert-Zernike theorem. This theorem says the following: if we have
a quasi-monochromatic spatially incoherent extended source, then the coherence function of its far field (i.e. in the
Fraunhofer regime) is given by the Fourier transformation of the source’s intensity distribution. This should make
some sense intuitively: if the incoherent source is a point source, then the field should be fully spatially coherent,
and indeed the Fourier transform of a point (delta function) is a constant function. The uncertainty relation that is
inherent to the Fourier transform states that the broader the incoherent source is, the less coherent its far field is.
Furthermore, we see that the farther away we put the observation plane from the source plane, the more coherent
the field becomes (because the Fraunhofer integral is a Fourier transform evaluated in (x/λz, y/λz) where z is the
distance between the source and observation planes). To derive this result mathematically, consider the decomposed
mutual intensity function of Eq. (6.19), and propagate each mode ϕn using the Fraunhofer propagation formula of
Eq. (6.10)
X x21 +y12 Z ∗
x′1 x1 +y1
′y x2 2 x′2 x2 +y2
′y
Z
1 2 +y2 2
Jz (x1 , y1 , x2 , y2 ) = eik 2z ϕn (x′1 , y1′ )e−ik z dx′1 dy1′ eik 2z ϕn (x′2 , y2′ )e−ik z dx′2 dy2′
n
x2 2 2 2 x′1 x1 +y1
′ y −x′ x −y ′ y
ZZ X
1 +y1 −x2 −y2 1 2 2 2 2
=e ik 2z [ϕn (x′1 , y1′ )ϕn (x′2 , y2′ )∗ ]e−ik z dx′1 dy1′ dx′2 dy2′
n
x2 2 2 2 x′1 x1 +y1
′ y −x′ x −y ′ y
ZZ
1 +y1 −x2 −y2 1 2 2 2 2
=e ik 2z J(x′1 , y1′ , x′2 , y2′ )e−ik z dx′1 dy1′ dx′2 dy2′ .
(6.21)
If we assume that the source is fully spatially incoherent, and has an intensity distribution I(x, y), then its mutual
intensity function is given by
114
where the delta functions indicate that the field in each point is only correlated with itself. Plugging this into Eq.
(6.21) gives
x2 2 2 2 x′1 (x1 −x2 )+y1
′ (y −y )
ZZ
1 +y1 −x2 −y2 1 2
Jz (x1 , y1 , x2 , y2 ) = eik 2z I(x′1 , y1′ )e−ik z dx′1 dy1′
(6.23)
x2 2 2 2
1 +y1 −x2 −y2 x1 − x2 y1 − y2
= eik 2z F{I(x, y)} , .
λz λz
Therefore, if we have two points (x1 , y1 ), (x2 , y2 ) in the observation plane, then the correlation of the fields depends
only on the distance between the two points, and it is given by the Fourier transform of the intensity distribution
of the source.
To describe a partially coherent field in a plane, we define the 5D mutual coherence function Γ
which describes that correlation between all pairs of points (x1 , y1 ), (x2 , y2 ) for all time differences τ .
If the field is quasi-monochromatic, we don’t care about the time difference, so we define the 4D
mutual intensity function J. To propagate a partially coherent field, we decompose it into coherent
modes that can each be propagated coherently, and then added incoherently to find the intensity. To
decompose a partially coherent field into coherent modes, we first separate the different wavelengths,
so that for each wavelength we have a quasi-monochromatic field with some mutual intensity function
Jλ . Then, each Jλ can be decomposed into coherent modes ϕn due to Mercer’s theorem. If we apply in
this way the Fraunhofer propagation to a quasi-monochromatic spatially incoherent extended source,
we can derive the Van Cittert-Zernike theorem, which states that the mutual intensity function in
the far field is given by the Fourier transform of the intensity distribution of the source.
6.2 Imaging
In Chapter 2 we analyzed image formation using the ray model of light. We saw that for an ideal thin lens, the con-
dition for imaging is given by the thin lens equation f1 = s1o + s1i (see Eq. (2.12)). In Section 3.5.4 we saw that with
the wave model we can describe image formation more accurately: each object point becomes a spread out Point
Spread Function (PSF) in the image. In the case of an ideal aberration-free imaging system (i.e. diffraction-limited
imaging), this PSF is an Airy disk. Because the Airy disk has a finite size, the imaging resolution is fundamentally
limited because of the wave nature of light.
In this section, we’re going to study in more detail how imaging systems can be analyzed using the wave model.
In particular, we’re going to see how the thin lens equation can be derived using Fresnel propagations, and we’ll
derive that an ideal thin lens performs a Fourier transform.
115
fronts (i.e. surfaces of constant phase) propagate outward. A field e−ikr propagates inwards, because as t increases,
r must decrease to keep the phase constant. Since we want to evaluate the field just after the lens, we plug z = 0
into Eq. (6.25), and we use Eq. (6.5) to apply the Fresnel approximation, which gives
Since we typically don’t care about a constant global factor, we can ignore the factor e−ikf /f to find the transmission
function for an ideal thin lens
x2 +y 2
T (x, y) = e−ik 2f . (6.28)
Note that in this expression, we ignore the fact that the lens has a finite aperture. In a more realistic model,
the transmission function would contain a step function Ap(x, y) that defines the aperture, and it can contain
aberrations which are given by some phase function eiW (x,y) . The aperture function and the aberration function
can be put together in a single function P (x, y)
x2 +y 2
T (x, y) = Ap(x, y) · eiW (x,y) · e−ik 2f
x2 +y 2
(6.29)
= P (x, y) · e−ik 2f
The reason why aberrations are described by a multiplication with a phase function, is because in the ray model,
aberrations describe how the directions of rays deviate from their ideal direction, and in the wave model, the prop-
agation direction is determined by the phase gradient (think of how the propagation direction of a plane wave is
determined by the phase function eik·r ). Therefore, to describe a change in the directions of rays (i.e. aberrations),
we must change local phase gradients, which means we must multiply by some smooth phase function eiW (x,y) .
We can now analyze the single thin lens imaging system using the following three steps: Fresnel propagate an
object field U (x, y) to the lens plane, multiply it with the lens transmission function T (x, y), and Fresnel propagate
the transmitted field to the image plane. To avoid clutter, we will use the shorthand notation for quadratic phase
factors as defined in Eq. (6.8), and use the formula for Fresnel propagation as given in Eq. (6.9). Note that the
thin lens transmission function of Eq. (6.29) can be written as T = P · Q−f
• Fresnel propagate the field from object plane to lens plane: U → Qso F{U Qso }
• Multiply the field with the lens transmission function: Qso F{U Qso } → P Q−f Qso F{U Qso }
• Fresnel propagate the field from the lens plane to the detector plane: P Q−f Qso F{U Qso } →
Qsi F{P Q−f Qso F{U Qso }Qsi }.
Also see Fig. 6.4. Now we can ask: what condition must hold for so , si , f such that we’ve formed an image of U ?
If we can ignore aberrations and the effect of a finite aperture, we can set P = 1. Furthermore, if we record an
image, we care only about the amplitude (or intensity) of the field, not the phase. Now note what happens when
we choose so , si , f such that the product Q−f Qso Qsi becomes 1: the field in the detector plane becomes
Taking a double Fourier transform of a function yields the same function except mirrored. And since Qsi and
Qso are quadratic phase factors which are not measured in the intensity, we find that choosing so , si , f such that
Q−f Qso Qsi = 1 is the imaging condition, because this yields a final field which is the same as the object field
(except mirrored). When we look at the definition of Q in Eq. (6.8), we see that this condition is equivalent to
requiring that f1 = s1o + s1i : so we’ve recovered the thin lens equation using wave optics!
If we now take P into account again, we obtain the final expression for the field in the image plane
116
Figure 6.4: To find the thin lens equation using the wave model, we Fresnel propagate the object field U from the
object plane to the lens plane, we multiply with the lens transmission function, and we Fresnel propagate from the
lens plane to the image plane.
If we measure only intensity, the phase factor Qsi becomes irrelevant. We are then left with U Qso , which we
Fourier transform, multiply with the aberration/aperture function P , and Fourier transform again. Recall from
Section 3.5.5 that basically we convolve U Qso with the Fourier transform of P
From what we know about image formation with the PSF (see Eq. (3.119)), we can identify F{P } as the PSF.
Indeed, if there are no aberrations, then P only describes the lens aperture, and the Fourier transform of a circular
aperture is the Airy disk, which we know to be the PSF of an aberration-free imaging system, so this all checks
out. But what do we do with the quadratic phase factor Qso with which we multiply the object field U ? It doesn’t
seem to appear in Eq. (3.119). Ignoring Qso is usually justified for the following reason: in a somewhat decent
imaging system, the PSF is fairly localized. Therefore, when we convolve U Qso with the PSF, only points that are
very close to each other start to interfere with each other in the image plane due to the finite width of the PSF.
Since the phase function Qso varies very smoothly, two nearby are multiplied with almost the same phase, which
means that the way they interfere almost doesn’t change: it’s as if there were no phase multiplication Qso to begin
with (assuming we only care about image amplitude, not phase). Therefore, it is commonly understood that the
image field Uim is given by a convolution of the object field U with the PSF, which is the Fourier transform of the
aberration/aperture function P
Uim = U ⊗ F {P }
(6.33)
= U ⊗ PSF.
117
derivation of this result. We do this by propagating the field in the front focal plane U−f (x, y) to the lens plane
using Angular Spectrum propagation (with the Fresnel approximation) to find the field U0 (x, y) that enters the
lens. The field that is transmitted by the lens is then given by U0 (x, y)Q−f , and this field is Fresnel propagated by
the focal distance f to find Uf (x, y).
• Angular spectrum propagate from front focal plane to lens plane: If the field in the front focal plane
z = −f has a Fourier transform Û (νx , νy ), then according to the Angular Spectrum method (see Eq. (6.2)),
the field at the lens plane z = 0 has a Fourier transform
Û0 (νx , νy ) = Û−f (νx , νy )eif kz
q
1
2πif λ2
−νx2 −νy2
= Û−f (νx , νy )e
2πif
2 +ν 2
νx y
(6.34)
λ 1−λ2 2
≈ Û−f (νx , νy )e
2πif 2 2
=e λ Û−f (νx , νy )e−πiλf (νx +νy ) ,
where in the third line we applied the Fresnel approximation.
x2 +y 2
• Multiply the field with the lens transmission function: U0 (x, y) → U0 (x, y)Q−f = U0 (x, y)e−ik 2f .
• Fresnel propagate the transmitted field from the lens plane to the back focal plane:
Uf (x, y) = Qf F{U0 (x, y)Q−f Qf }
2 +y 2
ik x 2f x y
=e Û0 , see Eq. (6.9)
λf λf
(6.35)
x2 +y 2 2πif x y πi 2 2
= eik 2f e λ Û−f , e− λf (x +y ) plug in Eq. (6.34)
λf λf
2πif x y
=e λ Û−f , use k = 2π/λ.
λf λf
2πif
Since e λ is a global phase factor that does not depend on (x, y), we can ignore it, and we find that
x y
Uf (x, y) = Û−f λf , λf , i.e. the field in the back focal plane z = f is given by the Fourier transform of the
field in the front focal plane z = −f .
Note that in this derivation, we assumed the lens transmission function to be Q−f , i.e. we ignore any effects due
to the finite lens aperture and lens aberrations which would be described with an additional function P (x, y) (see
Eq. (6.29)).
Figure 6.5: To demonstrate that an ideal thin lens performs a Fourier transform, we propagate the field in the front
focal plane to the lens plane using the Angular Spectrum method (Fourier space multiplication), we multiply the
field with the lens transmission function (real space multiplication), and we Fresnel propagate the transmitted field
to the back focal plane.
118
The reason why this result is important, is because it allows for optical Fourier filtering. Suppose you put a
sample in the front focal plane of a first lens, and you place a second lens whose front focal plane coincides with
the back focal of the first lens. In the back focal plane of the second lens you put the detector (see Fig. 6.6). In
this setup, the first lens performs a Fourier transform of the field that is transmitted by the sample, and the second
lens applies another Fourier transform. If the two lenses have the same focal length f , then the sample and the
detector are a distance 4f apart: hence this system is also called a 4f-system. In such a system, one can physically
filter the Fourier transform in the back focal plane of the first lens: specific spatial frequencies can be blocked or
transmitted, or even phase shifted (which is important for e.g. Zernike phase contrast microscopy). A convincing
demonstration of Fourier filtering is the Abbe-Porter experiment, where the Fourier transform of a wire mesh can
be filtered such that only the horizontal or vertical wires are imaged.
Figure 6.6: In a 4f -system, the Fourier transform of the object field can be physically accessed to generate a
Fourier-filtered image. In reality, the final image would be mirrored, as opposed to what is shown here.
119
Chapter 7
References:
• D. G. Voelz ‘Computational Fourier Optics: A MATLAB Tutorial’
Videos:
• ‘Numerically simulating the propagation of coherent optical fields (Fourier optics)’: youtu.be/z-2tdkJ0Yzc
In Section 6.1 we saw two ways to propagate a coherent field: by plane wave decomposition (Angular Spectrum
method), or by summing impulse responses (Fresnel / Fraunhofer diffraction integrals). In either case, the propa-
gation formulas use Fourier transforms. Therefore, if we want to simulate the propagation of fields numerically, we
must first know how to numerically perform a Fourier transform. This will be discussed in Section (7.1).
In Eq. (6.2) we multiply the Fourier transform Û (νx , νy ) with a function eikz z that depends on the Fourier coordi-
x y
nates (νx , νy ), and in Eqs. (6.9) and (6.10) we evaluate the Fourier transform in the coordinates νx = λz , νy = λz .
Therefore, once we’ve calculated the numerical Fourier transform of an array, it’s important that we associate the
appropriate physical axes to it. This will be discussed in Section 7.2. We will then be able to numerically implement
Angular Spectrum propagation and Fresnel/Fraunhofer propagation, as is done in Sections 7.3 and 7.4. We will
conclude in Section 7.5 with some comments on why/when one should prefer one propagation method over the
other.
• fft2(.) performs a 2-dimensional Fast Fourier Transform (FFT), assuming the origin lies in the corner of
the array, and assuming periodic boundary conditions.
• fftshift(.) rearranges the array so that its center is moved to the corner, where the origin lies.
• Therefore, to perform the 2D Fourier transform with the origin in the center of the array, one has to apply
ifftshift(fft2(fftshift(.))) to the array.
• Since we’re going to very often perform 2D Fourier transforms with the origin in the center of the array, it may
be convenient to define a new function F(.)=ifftshift(fft2(fftshift(.))). However, there may be some
applications where the first fftshift(.) is irrelevant: this shift only affects the phase of the Fourier transform,
so in case we’re only interested in the intensity of the Fourier transformed field, this operation may be omitted
for the sake of computational efficiency. This may be especially useful in iterative algorithms that use lots of
Fourier transforms, but discard the phase of the Fourier transformed array. Such algorithms include phase
retrieval algorithms and Iterative Fourier Transform Algorithms (‘IFTAs’) for designing Diffractive Optical
Elements (‘DOEs’).
120
To verify that this procedure works, one can write a short MATLAB code that computes the Fourier transform of
a circular aperture, which we know should be an Airy disk. The output of this code is shown in Fig. 7.1.
To answer this question, we need to know what a Fast Fourier Transform computes numerically, and compare
it to the physical Fourier transform. For simplicity we will consider the 1-dimensional case. The physical Fourier
transform is Z
F{U (x)}(ν) = U (x)e−2πixν dx. (7.1)
121
If U (x) is defined only for N discrete values of x (as is the case when we define an array in MATLAB), then we
can approximate the integral as a sum
N
X −1
F{U (x)}(ν) ≈ U (xn )e−2πixn ν ∆x. (7.2)
n=0
Note that there are not restrictions yet on ν: it’s still a continuous variable, even though x is discretized.
Here, both n and k are discrete variables that run from 0 to N − 1. So both the input vector U [n] and the output
vector FFT{U [n]}[k] consist of N entries. Comparing Eqs. (7.2) and (7.3) gives
nk
xn νk = . (7.4)
N
If we define the array U [n] such that the nth pixel corresponds to the physical coordinate xn = n∆x, then plugging
this into Eq. (7.4) gives
k
νk = . (7.5)
N ∆x
Given that k runs from 0 to N − 1 (just like n), we find that the spatial frequency runs from νmin = 0 to
1 1 1
νmax = ∆x − N ∆x in steps of ∆ν = N ∆x . Therefore, if we define L = N ∆x as the range of the x-axis, we can write
the input and output coordinate vectors in MATLAB notation as
1 1 1
x = 0 : ∆x : L − ∆x and ν = 0 : : − . (7.6)
L ∆x L
Note that here, the first entry of the array corresponds to the origin of the coordinate system. However, we saw
in the previous section that we can use the function fftshift(.) to shift the origin to the center of the array.
Therefore, when using fftshift(.), the coordinate vectors are
L L 1 1 1 1
x=− : ∆x : − ∆x and ν = − : : − . (7.7)
2 2 2∆x L 2∆x L
Note the following relation between the coordinate vectors x in position space and ν in frequency space:
• The resolution ∆x in position space determines the field of view νmax in frequency space: a higher resolution
leads to a larger field of view.
• The field of view xmax in position space determines the resolution ∆ν in frequency space: a larger field of
view leads to a higher resolution.
This can be understood intuitively: the field of view in x determines the slowest oscillation (i.e. smallest ν) that
can be defined, while the resolution in x determines the fastest oscillation (i.e. largest ν that can be defined). This
can be verified by playing with the sampling N and field of view L in code 7.1.
122
4 rad=30e−6; % Radius of aperture (in meters)
5 lambda=500e−9; % Wavelength of light (in meters)
6 z=10e−6; %Propagation distance (in meters)
7
8 %% Coordinate vectors/grids
9 dx=L/N; % Length of one pixel
10 x=−L/2:dx:L/2−dx; % Coordinate vector (in meters)
11 fx=−1/(2*dx):1/L:1/(2*dx)−1/L; % Spatial frequency vector (in 1/meters)
12
13 [X,Y]=meshgrid(x,x); % Coordinate grids (real space)
14 [FX,FY]=meshgrid(fx,fx); % Coordinate grids (Fourier space)
15
16 %% Define aperture
17 R=sqrt(X.ˆ2+Y.ˆ2); % Radial coordinate
18 Aperture=double(R<rad); % Aperture
19
20 %% Calculate propagated field
21
22 F Aperture=ifftshift(fft2(fftshift(Aperture))); %Fourier transform (i.e. angular spectrum) of ...
aperture
23 k=2*pi/lambda; % Wavenumber
24
25 H=exp(1i*z*sqrt(kˆ2−(2*pi*FX).ˆ2−(2*pi*FY).ˆ2)); % Transfer function eˆ{i k z z}
26
27 propfield=iF(F(Aperture).*H); % Calculate the propagated field
28
29 % Plot the propagated field
30 imagesc(x*1e6,x*1e6,abs(propfield))
31 colormap('gray')
32 xlabel('x(\mum)'), ylabel('y(\mum)')
33 title(['z = ' num2str(z*1e6) '\mum'])
34 axis image
35 colorbar
36 caxis([0 1.5])
Figure 7.2: Outputs of code 7.2 for different propagation distances z. The wavelength is λ = 500nm.
123
9 dx=L/N; % Length of one pixel
10 x=−L/2:dx:L/2−dx; % Coordinate vector (in meters)
11 fx=−1/(2*dx):1/L:1/(2*dx)−1/L; % Spatial frequency vector (in 1/meters)
12
13 [X,Y]=meshgrid(x,x); % Coordinate grids (real space)
14 [FX,FY]=meshgrid(fx,fx); % Coordinate grids (Fourier space)
15
16 %% Define aperture
17 R=sqrt(X.ˆ2+Y.ˆ2); % Radial coordinate
18 Aperture=double(R<rad); % Aperture
19
20 %% Calculate propagated field
21
22 k=2*pi/lambda; % Wavenumber
23 Q=exp(1i*k*(X.ˆ2+Y.ˆ2)/(2*z)); % Quadratic phase factor
24
25 propfield=F(Aperture.*Q)/N; % Calculate the propagated field
26 propx=fx*lambda*z; % Coordinate vector in the final plane
27
28 % Plot the propagated field
29 imagesc(propx*1e6,propx*1e6,abs(propfield))
30 colormap('gray')
31 xlabel('x(\mum)')
32 ylabel('y(\mum)')
33 title(['z = ' num2str(z*1e6) '\mum'])
34 colorbar
35 axis image
Figure 7.3: Outputs of code 7.3 for different propagation distances z. The wavelength is λ = 500nm. Note that
the axes of the plots change with the propagation distance z, because the Fourier transform is evaluated in the
x y
coordinates λz , λz .
For the Angular Spectrum method, the phase will oscillate more rapidly (I use the word ‘oscillate’ instead of ‘vary’,
because the phase is defined modulo 2π) when z increases. For the Fresnel propagation method, the phase will
oscillate more rapidly when z decreases. Sampling problems occur when the phase varies so rapidly, that two
adjacent pixels cannot capture the oscillation. For example, if the phase changes by 2π from one pixel to the next,
then due to discrete sampling the phase appears constant, while in reality it changed by 2π. When such sampling
issues occur, the simulation result becomes unreliable. Therefore, the Angular Spectrum method is numerically
124
more reliable for short propagation distances, while for longer propagation distances the Fresnel propagation method
is more reliable. Another important difference between the two methods is that for the Fresnel propagation method,
the axes in the final plane depend on z, whereas for the Angular Spectrum method they do not.
125
Part III
126
Chapter 8
3D imaging
Videos:
• ‘How does a hologram work? (in 1 minute)’: youtu.be/mYIW6 cOwW0
• ‘3D imaging and lensless imaging: light field camera/display, holography, and phase retrieval’:
youtu.be/on9L1EAnWC4 Timestamps 0:00 - 20:04
I think that one of the most fascinating applications of optics is the creation of three-dimensional images. Even the
average layperson with no detailed knowledge of optical theory can marvel at the wonders of a hologram: how is
it possible that when you look through a flat, two-dimensional film, you can see a three-dimensional scene? When
you look through the hologram from different angles, it is as if you view the 3D scene from different directions.
Moreover, if the scene contains a magnifying glass, it appears fully operational, magnifying whatever object is
behind it, no matter from which direction you look through the magnifying glass!
So how do we achieve this sensation of seeing a three-dimensional scene that isn’t actually there? One can identify
two different kinds of relevant factors: biological and optical. The biological factors have to do with how the brain
interprets the signals that our eyes send them. If we understand what input the brain requires to see a three-
dimensional scene, then can we send those signals through our eyes, so that we experience the illusion of a 3D scene.
This is for example how 3D glasses work: we know that our brain uses parallax (i.e. each eye sees the scene from a
slightly different angle) to infer depth, so by showing each eye a different 2D image that was recorded at a slightly
different angle, we can induce the sensation of depth. Other techniques to create the illusion of depth may include
clever plays with lighting, shadows, and perspective.
However, holograms and certain types of 3D displays that are being developed don’t rely on deceiving the brain
with clever tricks. You don’t need to understand biology or neurology to understand how these devices operate:
you need to understand optics. You can see a three-dimensional scene through a hologram, because a hologram
re-creates the exact same light field that the 3D scene would emit. Therefore, if that light field reaches our eyes or
any other sort of detector, it would be impossible to distinguish it whether it came from the actual 3D scene or not.
This is why a hologram of a magnifying glass appears fully operational, and this is why we can record a video of a
hologram, viewing it from different angles, and still get the sensation that we’re recording a three-dimensional scene.
This brings us to the question: what does it mean to exactly re-create a light field, and how do we achieve
this? When considered the ray model of light in Chapter 2, we saw that we can describe light as rays with a certain
intensity traveling in certain directions. So to recreate the light field, we must reproduce the intensity and direction
of the light rays. In Chapter 3 we saw that a fully coherent light field can be described by a complex-valued function
with a certain amplitude and phase. So to recreate a fully coherent light field, we must reproduce its amplitude (i.e.
square root of intensity) and phase. Note that the phase of a field is closely related to the direction of propagation:
recall Eq. (3.22), which states that a plane wave propagating in the direction of k is described by the phase function
eik·r . That is, the local phase gradient of the field directly determines the local propagation direction of the field.
So to summarize
• To re-create a three-dimensional scene, we must reconstruct the full light field that it emits in some plane.
127
• The full light field consists of the light intensity/amplitude and the direction.
• Conventional detectors/displays only measure/reproduce the intensity of light, not the direction (or the
phase, in case of fully coherent light).
• For natural (incoherent) light, the direction of light is indicated by the direction of the rays.
• For coherent light, the direction of light is indicated by the phase of the complex-valued field.
In the following sections, we will explore how the recording and reproduction of the full light field is achieved for
natural (incoherent) light and coherent light.
An obvious drawback of this implementation is that a pinhole array blocks a lot of light. Moreover, the reso-
lution of the measured light field is limited by the distance between the pinholes, which would be rather large
compared to the distance between the pixels of the detector right behind the pinhole array. Therefore, instead of a
pinhole array, it is more practical to use an array of very small lenses, i.e. a microlens array (see Fig. 8.1). Putting
a detector in the back focal plane of the microlens array yields for each microlens information about the directions
of the light rays. When we know the positions and the directions of the light rays, we can computationally refocus
the image by tracing the rays to other planes. Therefore, with the light field camera, we can capture 3D information
of the recorded scene, and computationally refocus the recorded image.
128
Figure 8.1: By putting a detector in the back focal plane of a microlens array, we can measure per microlens in
which directions the detected rays were traveling.
So we now understand how a microlens array can record 3D information, by measuring the directions of the
rays. We can also use the microlens array in reverse: by putting an array of point sources in the front focal plane
of the microlens array, we can locally control the direction of light (by choosing which point source we switch on,
see Fig. 8.2) to generate a 3D scene. This technique can be used to create a 3D display.
129
Figure 8.2: With a microlens array, we can detect the directions of rays (light field camera), and we can steer the
directions of rays (light field display).
Videos:
• Using a microlens array to record 3D scenes: ‘Light-field Camera - Computerphile’: youtu.be/rEMP3XEgnws
• Using a microlens array to generate 3D scenes: ‘FOVI3D light-field displays with microlens arrays at Display
Week 2018’: youtu.be/GK4544D4PUo
Let’s capture this intensity pattern I(x, y) on a film, which we will call the hologram. Now, we can remove the 3D
scene, and illuminate the hologram with the same off-axis plane wave eikx x that we used to record it. Assuming
130
that the hologram can be considered a thin sample, the transmitted field is given by
Note that one of the terms of the transmitted field is U (x, y), which is exactly the complex-valued field that was
scattered by the 3D scene. Therefore, when we see this field, it is exactly as if we’re directly looking at the 3D scene.
There are also two other terms in the transmitted field, but these terms have factors of eikx x and e2ikx x , which
means these fields propagate in different directions and therefore miss our eyes if we view the hologram within the
appropriate range of angles.
We can also interpret intuitively how this hologram works, see Fig. 8.4. We stated at the beginning of this
chapter that to reproduce the full light field, we must reproduce the local direction of the light. The intensity
pattern given in Eq. (8.1) describes an interference pattern whose local period depends on the local phase gradient
of U (x, y). When we illuminate this interference pattern as in Eq. (8.2), it acts as a diffraction grating whose
period locally changes. We saw in Section 3.2.1 that the diffraction angles of a grating depends on its period, so
the hologram locally controls the direction of the diffracted light to reproduce the original light field.
131
Figure 8.4: Intuitive explanation of how an off-axis hologram works.
One limitation of this hologram is that it must be viewed with coherent light, whereas it would be much more
convenient if we could view it in natural light. One way to overcome this is by using a reflection hologram/volume
hologram/thick hologram/Bragg hologram. Such holograms are thicker, so they are better considered as a volume,
rather than a thin sheet in the xy-plane. The interference pattern that is recorded in this volume can filter out a
certain color of light (this effect is known as structural coloration, which e.g. gives a peacock’s feather its colors) so
that it becomes temporally coherent. This prevents color blur due to grating diffraction (which we saw in Section
3.2.2 could be used for spectrometry), which means the hologram can be viewed in natural light.
Another type of hologram which can be viewed in natural light is the rainbow hologram, see Fig. 8.5. In this
method, we first create the hologram of a light field in a horizontal slit (practically speaking, we only need to re-
produce the light field in a horizontal slit because our eyes are spaced horizontally apart). Then, we create another
hologram with a vertically tilted reference wave, such that when it is illuminated, it reproduces the field in the
horizontal slit. If we illuminate this hologram with white light, the color blur creates horizontal slits of light at
different (vertical ) heights. Since these slits don’t overlap, we don’t see color blur when we ‘look through’ one of
these slits.
132
Figure 8.5: Principle of rainbow holography.
Videos:
133
Chapter 9
References:
• PhD thesis: A.P. Konijnenberg, ‘Computational methods for phase retrieval: Non-iterative methods, Pty-
chography, and Diffractive Shearing Interferometry’ (2019)
Videos:
• ‘3D imaging and lensless imaging: light field camera/display, holography, and phase retrieval’:
youtu.be/on9L1EAnWC4 Timestamps 27:07 - 57:42
We saw in Chapter 8 that recording the direction of light rays in a certain plane allows one to capture three-
dimensional information of a scene. This allows for computational post-processing of an image: an image that is
recorded with a light field camera can be computationally refocused after it has been taken. When we consider
fully coherent imaging, the direction of light is characterized by the phase of the complex-valued light field. With
a hologram, we can capture and reproduce the phase of a field, thereby recreating a three-dimensional image of the
recorded scene. If, however, we don’t record the hologram with an analog film, but with a digital camera, then we
can perform computational post-processing, just like in the case of a light field camera. More specifically, once we
know the full field information (i.e. both the amplitude and phase of the complex-valued field function), then we
can apply all the propagation formulas that we discussed in Chapter 6. E.g. we can use the Angular Spectrum
propagation formula from Eq. (6.2) to computationally refocus a recorded image field, or we can use the imaging
formula from Eq. (6.33) to computationally correct for aberrations in the imaging system. Moreover, if we know
the amplitude and phase of the diffracted far field of a small sample (see Eq. (6.10)), then we can computationally
backpropagate the field to reconstruct the complex-valued transmission function of the sample (i.e. an ‘image’ of
the sample that also includes phase information): therefore, retrieving the phase of a diffracted far field enables
lensless imaging. This can be especially useful when we want to create images using wavelengths for which it’s
extremely difficult to create high-quality focusing optics. An example of such a wavelength regime is the Extreme
Ultraviolet (EUV) / Soft X-ray (SXR) regime.
However, instead of recording this intensity on a film, we record it with a digital camera, so that we can computa-
tionally post-process the measurement. Our aim is to computationally extract the complex-valued function U (x, y)
from this expression. To do this, we take the Fourier transform of the measured intensity I(x, y). We use the fact
134
that a multiplicative phase factor eikx x yields a shift in the Fourier transform:
Z
F{U (x)e ikx x
}(νx ) = U (x)eikx x e−2πixνx dx
Z
kx
= U (x)e−2πix(νx − 2π ) dx (9.2)
kx
= Û νx − .
2π
Moreover, the Fourier transform of a conjugated function yields a conjugated and mirrored Fourier transform (called
the twin image)
Z
F{U (x) }(νx ) = U (x)∗ e−2πixνx dx
∗
Z ∗
= U (x)e −2πix(−νx )
dx (9.3)
= Û (−νx )∗ .
Therefore, Fourier transforming the measured intensity I(x, y) of Eq. (9.1) yields
∗
kx kx
F{I(x, y)}(νx , νy ) = F 1 + |U (x, y)|2 (νx , νy ) + Û νx −
, νy + Û −νx − , νy . (9.4)
| {z } 2π 2π
Central band
| {z } | {z }
Side band (image) Side band (twin image)
If the shift kx (i.e. the tilt of the reference wave) is large enough compared to the width of sample’s spectrum
kx
Û (νx , νy ), then you can straightforwardly computationally isolate the side band Û νx − 2π , νy , shift it to the
origin to obtain Û (νx , νy ), and inverse Fourier transform it to reconstruct the complex-valued field U (x, y). This
complex-valued field can be used for e.g. computational refocusing and aberration correction. Or, the phase infor-
mation of U (x, y) can already be useful by itself to e.g. infer sample thickness.
In Code 9.2 below, we demonstrate such a holographic reconstruction. In Fig. 9.1, the function U (x, y) is plotted.
In Fig. 9.2, the measured intensity I(x, y) is shown, and its Fourier transform shows the presence of the central
band and two side bands. Isolating one side band results in the reconstruction shown in Fig. 9.3. Note that when
the side band is isolated, part of the spectrum of U (x, y) is clipped. Therefore, the reconstruction is not perfect.
However, in a realistic imaging system, the image field is already low-pass filtered due to the diffraction limit (see
e.g. Section 3.4 and Eq. (6.33)). Therefore, the side bands would be fully separated from the central band (if the
reference wave angle is sufficiently large), and no additional clipping of the spectrum will occur when isolating the
side band.
Listing 9.1: Definition of Fourier transform functions which have the origin in the center of the array
1 function [ Fx ] = F( x )
2 Fx=ifftshift(fft2(fftshift(x)));
3 end
135
6
7 % Define transmitted field (to be reconstructed from its far field intensity)
8 U=exp(1i*2*(0.7*X−Y>0.15))+exp(1i*((X.ˆ2+Y.ˆ2)<0.04));
9 U=U+exp(1i*(sqrt((X−0.05).ˆ2+(Y−0.05).ˆ2)<0.02))+exp(1i*(sqrt((X+0.05).ˆ2+(Y−0.05).ˆ2)<0.02));
10 U=(abs(X)<0.3).*(abs(Y)<0.3).*U;
11
12 %View object
13 subplot(1,2,1)
14 imagesc(abs(U))
15 title('Object amplitude')
16 axis off image
17 colorbar
18
19 subplot(1,2,2)
20 imagesc(angle(U))
21 title('Object phase')
22 axis off image
23 caxis([−pi pi])
24 colorbar
25
26 colormap('gray')
27
28 %% 2. Calculate intensity measurement
29
30 ReferenceWave=exp(1i*500*X);
31 I=abs(U+ReferenceWave).ˆ2;
32
33 subplot(1,2,1)
34 imagesc(I)
35 title('Measured intensity')
36 axis image
37 subplot(1,2,2)
38 imagesc(abs(F(I)).ˆ0.5) % raise to power 0.5 for better visibility
39 title('FFT of intensity (increased visibility)')
40 axis image
41
42 colormap('gray')
43
44 %% 3. Reconstruct object
45
46 % Fourier transform I: F(I)
47 % Isolate side band: .*(X<−0.5)
48 % Inverse Fourier transform: iF(.)
49 % Remove linear phase: .*ReferenceWave
50 U recon=iF(F(I).*(X<−0.5)).*ReferenceWave;
51
52 subplot(1,2,1)
53 imagesc(abs(U recon))
54 title('Reconstructed amplitude')
55 axis off image
56 colorbar
57
58 subplot(1,2,2)
59 imagesc(angle(U recon))
60 title('Reconstructed phase')
61 axis off image
62 colorbar
63
64 colormap('gray')
136
Figure 9.1: The complex-valued field U (x, y) that is to be reconstructed using off-axis holography.
Figure 9.2: The measured intensity (see Eq. (9.1)) and its Fourier transform, where the central band and the side
bands are visible.
137
Figure 9.3: Holographic reconstruction of U (x, y) as defined in Fig. 9.1, using Code 9.2.
Consider the following setup (see Fig. 9.4): we illuminate a small sample such that the transmitted field is U (x, y).
The measured far field intensity is according to Eq. (6.10) the squared modulus of the Fourier transform of U (x, y)
So we measure the (squared) amplitude of Û , but how do we find its phase? To find the phase, we need extra
information. Typically, this extra information comes in the form of a support constraint, which one can loosely
think of as the outline of the shape of the sample. Mathematically, the support constraint says that U (x, y) is 0
outside the support region R, and it can be non-zero inside the support. Thus, we can define the support function
s(x, y) as (
1 if (x, y) ∈ R,
s(x, y) = (9.6)
0 if (x, y) ∈
/ R.
So our phase retrieval problem can be formulated as follows: find U (x, y) that satisfies
• The intensity/modulus constraint: |Û (νx , νy )| = I(νx , νy ).
p
138
• First we give an intuitive guess for an iterative algorithm (called the Error Reduction (ER) algorithm) to
solve the problem. This algorithm is deeply flawed, but it serves as a good starting point for further analysis.
• We analyze our iterative algorithm from two different viewpoints: geometrically as a problem of finding the
intersection of two sets, and analytically as a cost function minimization problem.
• Finally, we present an algorithm (called the Hybrid Input-Output (HIO) algorithm) that is much less intuitive,
but significantly more effective than the ER algorithm.
Figure 9.4: Measurement setup for single shot phase retrieval: we measure the far field intensity of a sample with
a known support.
Û0 (νx , νy )
q
Û0 (νx , νy ) → Û0upd (νx , νy ) = I(νx , νy ). (9.8)
|Û0 (νx , νy )|
139
This gives an updated far-field Û0upd (νx , νy ). To find the corresponding updated estimate for U (x, y), we take the
inverse Fourier transform n o
U0upd (x, y) = F −1 Û0upd (νx , νy ) (x, y). (9.9)
So now we have an object estimate that satisfies the intensity constraint, but does it satisfy the support constraint?
In general, it does not (if it does, we would have solved the phase retrieval problem and we can quit). So what do
we do now? Just like with the intensity constraint, we can make the field satisfy the support constraint:
Now we have arrived at the same situation with which we started: we have some object guess that satisfies the
support constraint, but not necessarily the intensity constraint. Therefore, we can repeat the procedure to update
the guess. So we can formulate the following iteration for an iterative algorithm
1. Start with an object estimate Un (x, y) that satisfies the support constraint.
2. Fourier transform it to find the estimated far field Ûn (νx , νy ).
3. Make the estimated far field satisfy the intensity constraint
Ûn (νx , νy )
q
Ûnupd (νx , νy ) = I(νx , νy ). (9.11)
|Ûn (νx , νy )|
We can test this algorithm with Code 9.3. In Fig. 9.5 we see the complex-valued object that we’re trying to
reconstruct, and the support constraint that we use for the reconstruction. The reconstruction is shown in 9.6.
We see that the reconstruction somewhat moves towards the solution, but it doesn’t ultimately reach the quality
that we’d like it to achieve. So in the following sections, we’ll analyze this intuitive algorithm, and see if we can
understand its behavior.
Listing 9.3: MATLAB code to demonstrate the intuitive Error Reduction (ER) phase retrieval algorithm. Use
Codes 9.1 to define the Fourier transform functions F(.) and iF(.).
1 %% 1.Define object and support constraint
2
3 N=512; % Number of pixels (one side of the array)
4 x=linspace(−1,1,N);
5 [X,Y]=meshgrid(x,x);
6
7 % Support function
8 s=(abs(X)<0.3).*(abs(Y)<0.3);
9 s=s.*(X+Y<0.2); % Make support non−symmetric to eliminate twin−image solution; comment this line ...
and see what happens to the recostruction
10
11 % Define transmitted field (to be reconstructed from its far field intensity)
12 U=exp(1i*2*(0.7*X−Y>0.15))+exp(1i*((X.ˆ2+Y.ˆ2)<0.04));
13 U=U+exp(1i*(sqrt((X−0.05).ˆ2+(Y−0.05).ˆ2)<0.02))+exp(1i*(sqrt((X+0.05).ˆ2+(Y−0.05).ˆ2)<0.02));
14 U=s.*U;
15
16 %View object and support
17 subplot(1,3,1)
18 imagesc(abs(U))
19 title('Object amplitude')
20 axis off image
21 colorbar
22
140
23 subplot(1,3,2)
24 imagesc(angle(U))
25 title('Object phase')
26 axis off image
27 colorbar
28
29 subplot(1,3,3)
30 imagesc(s)
31 title('Support constraint')
32 axis off image
33 colorbar
34
35 colormap('gray')
36
37 %% 2. Define the intensity constraint
38 I=abs(F(U)).ˆ2;
39
40 %% 3. Run the phase retrieval algorithm
41
42 U est=ones(size(U)); % Define the initial guess
43
44 NumIt=200; % Define the number of iterations
45
46 for k=1:NumIt
47
48 FU est=F(U est);
49 FU est=sqrt(I).*exp(1i*angle(FU est));
50
51 U est upd=iF(FU est);
52
53 U est=s.* U est upd; %ER
54
55 subplot(1,2,1)
56 imagesc(abs(U est))
57 title('Reconstrcuted amplitude')
58 axis off image
59 colormap('gray')
60 colorbar
61
62 subplot(1,2,2)
63 imagesc(angle(U est))
64 title(['Reconstrcuted phase, iteration ' num2str(k)])
65 axis off image
66 colormap('gray')
67 colorbar
68
69 pause(0.1)
70 disp(k)
71
72 end
Figure 9.5: The complex-valued object U (x, y) (left and middle plots) that is to be reconstructed from its Fourier
amplitude and the support constraint s(x, y) (right plot).
141
Figure 9.6: Reconstruction of U (x, y) (see Fig. 9.5) using the intuitive ER algorithm as in Code 9.3. Note that the
reconstruction quality is lacking.
and we can define M as the set of all functions that satisfy the intensity/modulus constraint
q
M = f (x, y) : |fˆ(νx , νy )| = I(νx , νy ) . (9.15)
The solution U (x, y) to our phase retrieval problem, is then given by the intersection of these two sets
U (x, y) ∈ S ∩ M. (9.16)
We can interpret this problem graphically: if we visually represent each constraint set as a line, then we try to find
the point where these two lines intersect (see Fig. 9.7). If we start with an object estimate that satisfies the support
constraint, and enforce the intensity constraint as in Eqs. (9.8) and (9.9), then we can represent this graphically by
starting with a point on the S-line, and projecting it onto the M -line. When we then enforce the support constraint
as in Eq. (9.10), we project the point from the M -line back to the S-line. We can visualize how this sequence of
operations leads us closer and closer to the intersection of the two lines, which is the solution to our phase retrieval
problem. Because we alternatingly project the estimate onto the two constraint sets, this algorithm is also called
an alternating projections (AP) algorithm. If we denote the projection operators as PS and PM , the algorithm can
be summarized as
Un+1 = PS PM Un . (9.17)
But we saw in simulations that this algorithm typically doesn’t converge to the correct solution, but stagnates.
How do we explain this? The crucial point is the shape of the S- and M -lines. More specifically, it matters whether
they are convex or non-convex. A set is convex, when for any two points A and B in the set, any convex combination
λA + (1 − λ)B where λ ∈ [0, 1] (i.e. the straight line between A and B) lies in the set as well. Therefore, when we
draw S and M as straight lines, we imply that these sets are convex. But are they?
Let’s first consider the support constraint set. Let f (x, y) and g(x, y) be two functions that satisfy the support
constraint. Then any convex combination λf (x, y) + (1 − λ)g(x, y), where λ ∈ [0, 1], satisfies the support constraint
as well. Therefore, the set S is convex, so it is justified to graphically represent it as a straight line.
142
Now let’s consider the intensity constraint set. As an example, consider the intensity constraint |Û (νx , νy )| = 1.
Consider two functions fˆ(νx , νy ) = 1 and ĝ(νx , νy ) = −1, which both satisfy the intensity constraint. Note that the
convex combination 21 (fˆ(νx , νy ) + ĝ(νx , νy )) = 0 does not satisfy the intensity constraint. Therefore, the intensity
constraint set is in general not convex, and therefore we cannot graphically represent it as a straight line. Rather,
we should represent it as curved line, so that if you pick two points on the curve, the straight line connecting the
points does not overlap with the curve.
So if we draw the set S as a straight line, and the set M as curved line, and we project back and forth be-
tween the two sets, we see that the object estimate can stagnate at a point which is not the intersection of the two
sets. Therefore, the algorithm fails to find the correct solution.
Figure 9.7: Interpreting the Error Reduction algorithm in terms of projections onto sets.
Here, we use a discrete summation as opposed to a continuous Fourier integral, because the measurement is pixelized,
and when we run an algorithm on a computer, we use a discretized array p to define the fields. We want the amplitude
|Û (νx , νy )| of this estimated field to match the measured amplitude I(νx , νy ). Therefore, we can define the
143
following cost functional which we want to minimize
X q 2
L[Un (x, y)] = |Ûn (νx , νy )| − I(νx , νy ) . (9.19)
νx ,νy
The most straightforward algorithm to find the U (x, y) that minimizes this cost functional, is the gradient descent
∂L
algorithm. In this algorithm, we use the fact that the Wirtinger derivative ∂U (x,y) ∗ indicates the direction of
1
steepest ascent (i.e. it indicates in which direction you’d have to move U (x, y) to get the highest increase in L).
∂L
Therefore, by calculating ∂U (x,y)∗ and moving in the opposite direction with some step size µ, you iteratively find
new estimates Un (x, y) for which L[Un (x, y)] becomes progressively smaller
∂L
Un+1 (x, y) = Un (x, y) − µ . (9.20)
∂Un (x, y)∗
∂L
So let’s calculate ∂U (x,y)∗ :
∂L X q
∂|Ûn (νx , νy )|
∗
=2 |Ûn (νx , νy )| − I(νx , νy ) apply chain rule to Eq. (9.19)
∂U (x, y) νx ,νy
∂Un (x, y)∗
X Ûn (νx , νy ) ∂ Ûn (νx , νy )∗
q p
=2 |Ûn (νx , νy )| − I(νx , νy ) ∗
use |Û | = Û Û ∗ and apply chain rule
νx ,νy 2|Ûn (νx , νy )| ∂Un (x, y)
p !
X I(νx , νy ) ∂ Ûn (νx , νy )∗
= Ûn (νx , νy ) − Ûn (νx , νy ) cancel terms
νx ,νy |Ûn (νx , νy )| ∂Un (x, y)∗
p !
X I(νx , νy )
= Ûn (νx , νy ) − Ûn (νx , νy ) s(x, y)e2πi(xνx +yνy ) ∆x ∆y use Eq. (9.18)
νx ,νy | Û (ν
n x y , ν )|
( p )
−1 I(νx , νy )
= s(x, y)F Ûn (νx , νy ) − Ûn (νx , νy ) (x, y) use definition of inverse Fourier transform
|Ûn (νx , νy )|
( p )
−1 I(νx , νy )
= Un (x, y) − s(x, y)F Ûn (νx , νy ) (x, y) use U = F −1 {Û }, and Un (x, y)s(x, y) = Un (x, y)
|Ûn (νx , νy )|
(9.21)
Note that this is the same update that we defined in Section 9.2.1: we Fourier transform the object estimate Un , we
substitute its amplitude with the measured amplitude, inverse Fourier transform the result, and enforce the support
constraint.
So the intuitive algorithm of Section 9.2.1, which we saw in Section 9.2.2 is an Alternating Projections algorithm, we
now see is also a Gradient Descent algorithm. We saw that by interpreting the algorithm as alternatingly projecting
between two sets, we could explain the stagnation of the algorithm by noting that the intensity constraint set is
non-convex. How can we explain the stagnation of the algorithm in the context of cost function minimization? The
important observation to make is that the gradient descent algorithm finds locally the direction of steepest descent.
So if the cost functional has a local minimum, the algorithm can get trapped there, and fail to converge to the global
minimum (see Fig. 9.8), which is where the actual solution to the phase retrieval problem lies.
1 R. Hunger, An introduction to complex differentials and complex differentiability (Munich University of Technology, Inst. for Circuit
144
Figure 9.8: Interpreting the Error Reduction algorithm in terms of cost function minimization.
Un+1 (x, y) = Unupd (x, y) · s(x, y) + (Un (x, y) − βUnupd (x, y)) · (1 − s(x, y)), (9.24)
where β is some feedback parameter, which we typically choose to be around 0.9. I.e. instead of setting the estimate
outside the support constraint equal to 0, we set it equal to some feedback function Un (x, y) − βUnupd (x, y). This
step is unphysical, because we know that the field outside the support constraint should be 0, but mathematically it
helps the algorithm to avoid stagnation and converge to the correct solution. This can be verified with simulations,
by changing in Code 9.3 the line where the support constraint is enforced with Code 9.4. The reconstruction result
is shown in Fig. 9.9.
Listing 9.4: The line to be updated in Code 9.3 to run the HIO algorithm
1 % U est=s.* U est upd; %ER
2 beta=0.9; % Define the feedback parameter
3 U est=s.* U est upd+(1−s).*(U est−beta* U est upd); %HIO
2 Fienup, James R. ‘Reconstruction of an object from the modulus of its Fourier transform.’ Optics letters 3.1 (1978): 27-29.
145
Figure 9.9: Object reconstruction obtained with the HIO algorithm of Code 9.4. Note the significant improvement
compared to the ER reconstruction in Fig. 9.6.
The way I presented this algorithm may feel unsatisfying: no explanation was given on how one might come
up with it, nor have I explained why the algorithm works. The reason why I nonetheless consider this choice
of presentation appropriate, is because the inventor of this algorithm, J.R. Fienup, recalled the following about
inventing the algorithm around 19803 :
I also tried other variations, and tried mixing and matching different operations from different approaches
to handling the values where the output image either satisfies or violates the constraints. This was not
the beautiful mathematics of an Einstein that predicted what would happen long before an experiment
was performed; this was the trial and error approach that Edison used to invent a practical light bulb:
keep trying different things (guided by physics, mathematics, and intuition) until you find something that
works; and then refine that.
Therefore, it seems fair to say that there is no elegant way to ‘derive’ the algorithm. However, two decades after
the algorithm’s invention, Bauschke et al. came with the following interpretation of the algorithm4 : whereas the
intuitive ER algorithm uses projections onto the constraint sets, the more successful HIO algorithm uses reflections
around the constraints if we choose β = 1. To see that this is the case, first note how a reflection operator R can
be expressed in terms of a projection operator P. To reflect a point around a line, we can project the point onto
the line (apply P), take the difference between the projection and the original point (P − I, where I is the identity
operator), and then add that difference to the projected point
R = P + (P − I)
(9.25)
= 2P − I.
Now let’s write the HIO update of Eq. (9.24) in terms of the intensity/modulus projection operator PM and the
support projection operator PS for β = 1
Un+1 = PS PM Un + (I − PS )(Un − PM Un )
= PS PM Un + Un − PM Un − PS Un + PS PM Un
(2PS − I)(2PM − I) + I (9.26)
= Un
2
RS RM + I
= Un .
2
Therefore, for β = 1, the HIO iteration tells you to reflect the estimate around the constraint sets, and take the
average with the current estimate.
3 Fienup,James R. ”Phase retrieval algorithms: a personal tour.” Applied optics 52.1 (2013): 45-56.
4 Bauschke, H. H., Combettes, P. L., Luke, D. R. (2002). Phase retrieval, error reduction algorithm, and Fienup variants: a view
from convex optimization. JOSA A, 19(7), 1334-1345.
146
The insight that the successful HIO algorithm can be interpreted in terms of projections and reflection, led to
the development of other phase retrieval algorithms that also utilize projections and reflections in some way. Ex-
amples include the Relaxed Averaged Alternating Reflections (RAAR) algorithm, and the Difference Map (DM)
algorithm. However, when comparing all these different phase retrieval algorithms, S. Marchesini concluded5
Many algorithms surprisingly failed, and only the ones shown in Fig. 9 succeeded. HIO appears to be
the most effective algorithm [...]
• Global phase shift: if the field U (x, y) is multiplied by a global phase eiϕ , then the measured intensity will
not change. Therefore, U (x, y)eiϕ will always be a valid alternative solution.
• Object shift: if the field U (x, y) is shifted, then its Fourier transform will be multiplied with a linear phase
function (see e.g. Eq. (9.2)), so the measured intensity will not change. Therefore, if shifting U (x, y) does
not violate the support constraint, it will be an alternative solution.
• Twin image: if the far field Û (νx , νy ) is conjugated, the measured intensity will not change. We saw in
Eq. (9.3) that conjugating Û (νx , νy ) will yield in object space the twin image U (−x, −y)∗ . Therefore, if
U (−x, −y)∗ does not violate the support constraint (i.e. the enforced support s(x, y) is centro-symmetric),
then the twin image U (−x, −y)∗ will be an alternative solution.
Moreover, when evaluating the quality of the reconstructed field, keep in mind that the phase of a field is undefined
when its amplitude is 0. Therefore, it is to be expected that in regions where the field has smaller amplitude, the
phase image will appear more noisy.
9.3 Ptychography
The history of ptychography started as early as 1970 when the term was coined by Hegerl and Hoppe in the context
of crystallography6 . At the time, it was a method to reconstruct periodic samples (crystals) from their diffraction
patterns. Later, Rodenburg and Bates extended this method by introducing the Wigner Distribution Deconvolution
Method (WDDM), which can non-iteratively reconstruct non-periodic samples7 . However, this method required the
continuous scanning of an illumination spot (the ‘probe’) across the sample, and measuring for each probe position
the diffraction pattern. Therefore, the method required a long acquisition time, it required storing lots of measured
data, and computationally inverting the data to find the object reconstruction was therefore very computationally
expensive, even though it’s non-iterative. The insight that led to the modern conception of ptychography as an
iterative phase retrieval method, came when Rodenburg realized he could combine the benefits of iterative single-
shot phase retrieval method with the non-iterative WDDM8
Following a rather stimulating conference organized by Spence et al. (2001) on [the] solution of the
nonperiodic phase problem, I was encouraged to revisit the ptychographical data set, but with the goal
to try to incorporate into it all the advantages of the iterative technique. [. . . ] Given that it was
possible to achieve an iteratively reconstructed image from a single diffraction pattern, and given that it
is known that it is possible to solve the phase problem directly from two or more diffraction patterns via
ptychography, it seemed logical to suppose that a hybrid method existed that could exploit an optimized
combination of these methods.
5 Marchesini, S. (2007). Invited article: A unified evaluation of iterative projection algorithms for phase retrieval. Review of scientific
87-184.
147
The single-shot iterative phase retrieval algorithms such as HIO required only a single intensity measurement and
a known support constraint, but they were not very robust when noise was present. The WDDM was a more
robust and reliable phase retrieval method because it’s non-iterative, but it required the continuous scanning of an
illumination spot/probe. The iterative ptychographic algorithm which was proposed in 20049 was more robust than
single-shot phase retrieval, and it had the advantage compared to WDDM that the probe didn’t have to be scanned
continuously. The probe positions only had to be chosen such that there is significant (60-90%) overlap between
adjacent probes. Iterative ptychographic phase retrieval gained significantly in popularity when it was found out
how to simultaneously reconstruct the object as well as the illumination probe10 (prior to that the illumination
probe had to be known in advance).
The ptychographic phase retrieval algorithm has been analyzed in many different ways (e.g. in terms of cost
function minimization, projection and reflection operators, sequential vs. global updates), and it has been extended
in many different ways (so that it can deal with e.g. probe position errors, partial coherence, thick samples). An
entire PhD thesis would still not be sufficient to cover all these aspects completely11 . Therefore, in the following
sections we will only cover the basics to get you started: how to reconstruct the object when the probe is known,
and how to simultaneously reconstruct both the object and the probe.
We measure the intensity of the diffracted far-field, which according to Eq. (6.10) is given by the Fourier transform
of the transmitted near-field ψ
IX (k) = |ψ̂X (k)|2 . (9.28)
Here, k denotes the 2D Fourier coordinate vector. Now the question is: given a known probe P (x), and the set of
intensity measurements IX (k) for different probe positions X, how can we reconstruct O(x)?
9 Rodenburg, J. M., Faulkner, H. M. (2004). A phase retrieval algorithm for shifting illumination. Applied physics letters, 85(20),
4795-4797.
10 Thibault, P., Dierolf, M., Bunk, O., Menzel, A., Pfeiffer, F. (2009). Probe retrieval in ptychographic coherent diffractive imaging.
Interferometry’ (2019)
148
Figure 9.10: Ptychography setup: an illumination probe P (x) is shifted to different probe positions X across an
object O(x). We measure the far field intensities IX (k) of the transmitted field ψX (x)
We can use the same intuitive line of reasoning as presented in Section 9.2.1. We have some estimated object
Oest (x). With the known probe, we can calculate for the probe position X the estimated far field ψ̂X,est (k)
We
p can make it satisfy the intensity constraint by substituting the estimated amplitude with the measured amplitude
IX (k). Inverse Fourier transforming the result gives the updated estimated transmitted field
( )
upd −1 ψ̂X,est (k) p
ψX,est (x) = F IX (k) (x). (9.30)
|ψ̂X,est (k)|
We now need to use this updated estimated transmitted field to update the estimated object Oest (x). Intuitively,
because the transmitted field is given by the product of the object and probe, one might be tempted to ‘divide out’
the probe from the estimated transmitted field
1
upd
Oest (x) → Oest (x) + ψX,est (x) − ψX,est (x) not good. (9.31)
P (x − X)
1
However, there’s a problem with this. Because of the factor P (x−X) , the object estimate is updated more in positions
x where the probe is weaker. This is the opposite of what we would want: if a region of the object is only weakly
illuminated (i.e. P (x − X) is small), then the intensity measurement IX (k) which is used to update Oest (x) contains
little information about that region of O(x), so the update for that region is less reliable. Therefore, we’d want the
object estimate to be updated more in regions where the probe P (x − X) is stronger. Therefore, we instead apply
the following update
P (x − X)∗ upd
Oest (x) → Oest (x) + ψX,est (x) − ψX,est (x) good. (9.32)
max |P (x − X)|
x
Here, we take the complex conjugate of P (x − X) to divide the probe’s phase out of the estimated field’s estimate,
and we multiply with P (x − X)∗ to increase the update step size in regions where the probe is stronger. We divide
by the maximum amplitude of the probe to normalize the update step size, so that the algorithm doesn’t diverge.
This algorithm can be tested with the following code
149
Listing 9.5: MATLAB code to test the ptychographic reconstruction algorithm for object reconstruction, given a
known probe. Use Codes 9.1 to define the Fourier transform functions F(.) and iF(.)
1 N=256; % Number of pixels (one side of the array)
2
3 x=linspace(−1,1,N);
4 [X,Y]=meshgrid(x,x);
5 R=sqrt(X.ˆ2+Y.ˆ2);
6
7 % Define object
8 O=(abs(X)<0.3).*(abs(Y)<0.3).*(exp(1i*2*(0.7*X−Y>0.15))+exp(1i*((X.ˆ2+Y.ˆ2)<0.04)));
9 O=O+exp(1i*(sqrt((X−0.05).ˆ2+(Y−0.05).ˆ2)<0.02))+exp(1i*(sqrt((X+0.05).ˆ2+(Y−0.05).ˆ2)<0.02));
10 % Define probe phase
11 Diffuser=mod(0.1*(sin(150*R)+cos((Y*10+X*30).ˆ2−0.8*(X*75−0.2))+cos((Y*10−X*33).ˆ2−0.4*(X*50−0.2))),1);
12 P=exp(2*pi*1i*Diffuser).*(R≤0.1);
13
14 %% Calculate intensity patterns
15
16 % Define probe positions
17 xshiftvec=[−15 −8 0 7 16];
18 yshiftvec=[−16 −9 0 8 17];
19 [ypos,xpos]=meshgrid(yshiftvec,xshiftvec);
20
21 % Matrix to be filled with diffraction patterns
22 Imat=zeros(N,N,length(xshiftvec),length(yshiftvec));
23
24 for m=1:length(xshiftvec)
25 for n = 1:length(yshiftvec)
26 P shift=circshift(P,[ypos(m,n) xpos(m,n)]);
27 I=abs(F(P shift.*O)).ˆ2;
28 Imat(:,:,m,n)=I;
29 end
30 end
31
32 %% Run reconstruction algorithm
33
34 O est=ones(size(O)); % Initial object guess
35
36 for k=1:10
37
38 for m=1:length(xshiftvec)
39 for n=1:length(yshiftvec)
40
41 I=Imat(:,:,m,n);
42 P shift=circshift(P,[ypos(m,n) xpos(m,n)]);
43 psi est=P shift.* O est;
44 FO est=F(psi est);
45 psi est upd=iF(sqrt(I).*exp(1i*angle(FO est)));
46 O est=O est+conj(P shift).*(psi est upd−psi est)./max(abs(P shift(:)));
47
48 subplot(1,2,1)
49 imagesc(angle(O est))
50 axis off image
51 subplot(1,2,2)
52 imagesc((abs(O est)))
53 axis off image
54 colormap('gray')
55 title(['k=' num2str(k)])
56 pause(0.1)
57
58 end
59 end
60
61 disp(k)
62
63 end
150
Figure 9.11: The object and probe that are used to test the ptychographic reconstruction algorithms.
Figure 9.12: Ptychographically reconstructed object using Code 9.5, where the probe is assumed to be known.
Compare to the actual object in Fig. 9.11
151
9.3.2 Simultaneous object and probe reconstruction
We saw in the previous section how we can reconstruct the object O(x) if the probe P (x) is known. But what if
there is some uncertainty about the exact form of P (x)? Or what if we don’t know P (x) at all? In that case we’d
like to reconstruct O(x) and P (x) simultaneously. How do we achieve that?
One observation that can be made, is that mathematically there’s no fundamental difference between O(x) and
P (x): it doesn’t matter whether we keep O(x) stationary and move P (x) around or vice versa, because shifting
the entire transmitted field only yields a linear phase factor in the Fourier transform, which drops out when we
take the squared modulus (i.e. intensity). Because of this symmetry between O(x) and P (x), we can apply to both
O(x) and P (x) a similar update as in Eq. (9.32) (though one should keep in mind that if we shift P (x) by X, it
would be equivalent to shifting O(x) by −X). This means that we use an estimated probe to update the object
estimate, and we use the estimated object to update the probe estimate. This was the basic idea behind ePIE12
(extended Ptychographic Iterative Engine). This approach can work when we start with reasonably good probe and
object estimates. However, if the initial guesses are too far off, the algorithm will likely fail to converge. When we
saw in Section 9.2.1 that the intuitive, straightforward ER algorithm fails for single-shot phase retrieval, we instead
considered the less straightforward but more effective HIO algorithm as presented in Section 9.2.4. Similarly, we
will now consider a less straightforward ptychographic algorithm that is better able to simultaneously reconstruct
both the object and the probe, even when the initial guesses are poor.
Since the HIO algorithm was best interpreted in terms of projections onto (and reflections around) constraint
sets, it makes sense to start with defining the constraints sets for ptychography, and their projection operators. For
ptychography, we can define the following two constraints:
• Intensity constraint:
p the collection of transmitted fields ψX (x) must be such that their Fourier transforms
satisfy |ψ̂X (x)| = IX (x).
• Factorization constraint: the collection of transmitted fields ψX (x) must be such that they can be factorized
into ψX (x) = O(x)P (x − X).
So we can define the two projection operators13
• PM : If we have some estimated object Oest (x) and probe Pest (x) from which we calculate the set of estimated
transmitted fields ψX,est (x), then we apply thep intensity/modulus constraint by substituting the Fourier
amplitude |ψ̂X (k)| with the measured amplitude IX (k) as in Eq. (9.30). This results in the set of updated
upd
estimated transmitted fields ψX (x).
upd
• PF : If we have some updated estimated transmitted fields ψX
p
(x) obtained via Oest (x), Pest (x), and IX (k),
then we can update the estimated object by factorizing the transmitted fields as
P upd
ψ (x)Pest (x − Xk )∗
Oest (x) = kPXk
upd
2
, (9.33)
k |Pest (x − Xk )|
We may want to add a small constant in the denominator to prevent division by 0. We use these updated
probe and object estimates to calculate a new set of estimated transmitted fields.
We can rewrite the HIO update operator for β = 1 as (see Eq. (9.26), with RS → RM , and RM → RF )
I + PM (2PF − I) − PF (9.35)
The algorithm can be tested with the codes below (the main code is Code 9.6, and the other codes are functions it
calls). The reconstruction results are shown in Fig. 9.13.
12 Maiden, A. M., Rodenburg, J. M. (2009). An improved ptychographical phase retrieval algorithm for diffractive imaging. Ultrami-
152
1 function [ psi mat ] = Calc PsiMat(O,P,ypos,xpos)
2 % Uses the object, probe, and probe positions to calculate the set of
3 % transmitted fields psi, and put them in a matrix
4
5 [N,M]=size(ypos);
6 [x,y]=size(O);
7
8 psi mat=zeros(x,y,N,M);
9
10
11 for m=1:M
12 for n = 1:N
13 P shift=circshift(P,[ypos(m,n) xpos(m,n)]);
14 psi=P shift.*O;
15
16 psi mat(:,:,m,n)=psi;
17 end
18 end
19
20 end
153
22
23 % Factorize out probe P est upd from exit fields psi
24 Numerator=0;
25 Denominator=0;
26 for m=1:M
27 for n = 1:N
28
29 psi=psi mat(:,:,m,n);
30 O shift=circshift(O est upd,−[ypos(m,n) xpos(m,n)]);
31 psi shift=circshift(psi,−[ypos(m,n) xpos(m,n)]);
32 Numerator=Numerator+conj(O shift).*psi shift;
33 Denominator=Denominator+abs(O shift).ˆ2;
34
35 end
36 end
37 P est upd=Numerator./(Denominator+1e−6*max(Denominator(:)));
38
39 end
Listing 9.6: MATLAB code to test the ptychographic reconstruction algorithm for simultaneous object and probe
reconstruction. Use Codes 9.1 to define the Fourier transform functions F(.) and iF(.)
1 %% Define object and probe
2
3 N=256;
4
5 x=linspace(−1,1,N);
6 [X,Y]=meshgrid(x,x);
7 R=sqrt(X.ˆ2+Y.ˆ2);
8
9 % Define object
10 O=(abs(X)<0.3).*(abs(Y)<0.3).*(exp(1i*2*(0.7*X−Y>0.15))+exp(1i*((X.ˆ2+Y.ˆ2)<0.04)));
11 O=O+exp(1i*(sqrt((X−0.05).ˆ2+(Y−0.05).ˆ2)<0.02))+exp(1i*(sqrt((X+0.05).ˆ2+(Y−0.05).ˆ2)<0.02));
12 % Define probe phase
13 Diffuser=mod(0.1*(sin(150*R)+cos((Y*10+X*30).ˆ2−0.8*(X*75−0.2))+cos((Y*10−X*33).ˆ2−0.4*(X*50−0.2))),1);
14 P=exp(2*pi*1i*Diffuser).*(R≤0.1);
15
16 %% Calculate intensity patterns
17
18 % Define probe positions
19 xshiftvec=[−15 −8 0 11 16];
20 yshiftvec=[−16 −10 0 8 17];
21 [ypos,xpos]=meshgrid(yshiftvec,xshiftvec);
22
23 % Matrix to be filled with diffraction patterns
24 Imat=zeros(N,N,length(xshiftvec),length(yshiftvec));
25
26 for m=1:length(xshiftvec)
27 for n = 1:length(yshiftvec)
28 P shift=circshift(P,[ypos(m,n) xpos(m,n)]);
29 I=abs(F(P shift.*O)).ˆ2;
30 Imat(:,:,m,n)=I;
31 end
32 end
33
34
35 %% Reconstruction algorithm
36
37 O est=ones(size(O));
38 P est=ones(size(O));
39 psi mat=Calc PsiMat(O est,P est,ypos,xpos);
40
41 for k=1:500
42
43 % 1. Calculate Proj fac(psi)
44 psi mat O=Calc PsiMat(O est,P est,ypos,xpos);
45
46 % 2. Calculate psi+Proj int(2*Proj fac(psi)−psi)−Proj fac(psi)
154
47 psi mat=psi mat+...
48 Proj Intensity(2* psi mat O−psi mat,Imat)−...
49 psi mat O;
50
51 % 3. Calculate updated probe and object estimate
52 [O est,P est]=FactorizeOAndP(psi mat,P est,ypos,xpos);
53
54 subplot(2,2,1)
55 imagesc(abs(O est))
56 axis off image
57 title(['Object amplitude, Iteration ' num2str(k)])
58 subplot(2,2,2)
59 imagesc(angle(O est))
60 axis off image
61 title('Object phase')
62 subplot(2,2,3)
63 imagesc(abs(P est))
64 axis off image
65 title('Probe amplitude')
66 subplot(2,2,4)
67 imagesc(angle(P est))
68 axis off image
69 title('Probe phase')
70 colormap('gray')
71
72 pause(0.1)
73
74 disp(['Iteration' num2str(k)])
75
76 end
155
Figure 9.13: Ptychographically reconstructed object and probe using Code 9.6. Note that the algorithm converged
to an alternative solution where the probe is identified as the object and vice versa. As a consequence, we found
the twin images of the object and probe (compare to the actual object and probe in Fig. 9.11).
156