Principles of Digital Communication: A Top-Down Approach
Bixio Rimoldi
www.cambridge.org
Information on this title: www.cambridge.org/9781107116450
© Cambridge University Press 2016
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2016
Printed in the United Kingdom by TJ International Ltd, Padstow, Cornwall
A catalog record for this publication is available from the British Library
Library of Congress Cataloging in Publication data
Rimoldi, Bixio.
Principles of digital communication : a top-down approach / Bixio Rimoldi,
School of Computer and Communication Sciences, Ecole Polytechnique
Fédérale de Lausanne (EPFL), Switzerland.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-11645-0 (Hardback : alk. paper)
1. Digital communications. 2. Computer networks. I. Title.
TK5103.7.R56 2015
621.382–dc23 2015015425
ISBN 978-1-107-11645-0 Hardback
Additional resources for this publication at www.cambridge.org/rimoldi
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication,
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
This book is dedicated to my parents,
for their boundless support and trust,
and to the late Professor James L. Massey,
whose knowledge, wisdom, and generosity
have deeply touched generations of students.
Contents
Preface
Acknowledgments
List of symbols
List of abbreviations
Bibliography
Index
Preface
Footnote 1: We have six periods of 45 minutes per week, part of which we have devoted to exercises, for a total of 14 weeks.
In terms of style, I have paid due attention to proofs. The value of a rigorous
proof goes beyond the scientific need of proving that a statement is indeed true.
From a proof we can gain much insight. Once we see the proof of a theorem, we
should be able to tell why the conditions (if any) imposed in the statement are
necessary and what can happen if they are violated. Proofs are also important
because the statements we find in theorems and the like are often not in the exact
form needed for a particular application. Therefore, we might have to adapt the
statement and the proof as needed.
An instructor should not miss the opportunity to share useful tricks. One of
my favorites is the trick I learned from Professor Donald Snyder (Washington
University) on how to label the Fourier transform of a rectangle. (Most students
remember that the Fourier transform of a rectangle is a sinc but tend to forget
how to determine its height and width. See Appendix 5.10.)
The remainder of this preface is about the text organization. We follow a top-
down approach, but a more precise name for the approach is top-down-reversed
with successive refinements. It is top-down in the sense of Figure 1.7 of Chapter
1, which gives a system-level view of the focus of this book. (It is also top-down
in the sense of the OSI model depicted in Figure 1.1.) It is reversed in the sense
that the receiver is treated before the transmitter. The logic behind this reversed
order is that we can make sensible choices about the transmitter only once we are
able to appreciate their impact on the receiver performance (error probability,
implementation costs, algorithmic complexity). Once we have proved that the
receiver and the transmitter decompose into blocks of well-defined tasks (Chapters
2 and 3), we refine our design, changing the focus from “what to do” to “how to
do it effectively” (Chapters 5 and 6). In Chapter 7, we refine the design of the
second layer to take into account the specificity of passband communication. As a
result, the second layer splits into the second and the third layer of Figure 1.7.
In Chapter 2 we acquaint ourselves with the receiver-design problem for channels
that have a discrete output alphabet. In doing so, we hide all but the most
essential aspect of a channel, specifically that the input and the output are related
stochastically. Starting this way takes us very quickly to the heart of digital
communication, namely the decision rule implemented by a decoder that mini-
mizes the error probability. The decision problem is an excellent place to begin
as the problem is new to students, it has a clean-cut formulation in terms of
minimizing an objective function (the error probability), the derivations rely only
on basic probability theory, the solution is elegant and intuitive (the maximum
a posteriori probability decision rule), and the topic is at the heart of digital
communication. After a general start, the receiver design is specialized for the
discrete-time AWGN (additive white Gaussian noise) channel that plays a key
role in subsequent chapters. In Chapter 2, we also learn how to determine (or
upper bound) the probability of error and we develop the notion of a sufficient
statistic, needed in the following chapter. The appendices provide a review of
relevant background material on matrices, on how to obtain the probability density
function of a variable defined in terms of another, on Gaussian random vectors,
and on inner product spaces. The chapter contains a large collection of exercises.
In Chapter 3 we make an important transition concerning the channel used to
communicate, specifically from the rather abstract discrete-time channel to the
more realistic continuous-time AWGN channel. The objective remains the same,
i.e. develop the receiver structure that minimizes the error probability. The theory
of inner product spaces, as well as the notion of sufficient statistic developed in
the previous chapter, give us the tools needed to make the transition elegantly and
swiftly. We discover that the decomposition of the transmitter and the receiver, as
done in the top two layers of Figure 1.7, is general and natural for the continuous-
time AWGN channel. This constitutes the end of the first pass over the top two
layers of Figure 1.7.
Up until Chapter 4, we assume that the transmitter has been given to us.
In Chapter 4, we prepare the ground for the signal-design. We introduce the
design parameters that we care about, namely transmission rate, delay, bandwidth,
average transmitted energy, and error probability, and we discuss how they relate
to one another. We introduce the notion of isometry in order to change the
signal constellation without affecting the error probability. It can be applied to
the encoder to minimize the average energy without affecting the other system
parameters such as transmission rate, delay, bandwidth, error probability; alterna-
tively, it can be applied to the waveform former to vary the signal’s time/frequency
features. The chapter ends with three case studies for developing intuition. In each
case, we fix a signaling family, parameterized by the number of bits conveyed by
a signal, and we determine the probability of error as the number of bits grows to
infinity. For one family, the dimensionality of the signal space stays fixed, and the
conclusion is that the error probability grows to 1 as the number of bits increases.
For another family, we let the signal space dimensionality grow exponentially and,
in so doing, we can make the error probability become exponentially small. Both
of these cases are instructive but have drawbacks that make them unworkable
solutions as the number of bits becomes large. The reasonable choice seems to
be the “middle-ground” solution that consists in letting the dimensionality grow
linearly with the number of bits. We demonstrate this approach by means of what
is commonly called pulse amplitude modulation (PAM). We prefer, however, to
call it symbol-by-symbol on a pulse train because PAM does not convey the idea
that the pulse is used more than once and people tend to associate PAM with a
certain family of symbol alphabets. We find symbol-by-symbol on a pulse train
to be more descriptive and more general. It encompasses, for instance, phase-shift
keying (PSK) and quadrature amplitude modulation (QAM).
Chapter 5 discusses how to choose the orthonormal basis that characterizes the
waveform former (Figure 1.7). We discover the Nyquist criterion as a means to
construct an orthonormal basis that consists of the T -spaced time translates of a
single pulse, where T is the symbol interval. Hence we refine the n-tuple former,
which can then be implemented with a single matched filter. In this chapter we also learn how
to do symbol synchronization (to know when to sample the matched filter output)
and introduce the eye diagram (to appreciate the importance of a correct symbol
synchronization). Because of its connection to the Nyquist criterion, we also derive
the expression for the power spectral density of the communication signal.
In Chapter 6, we design the encoder and refine the decoder. The goal is to expose
the reader to a widely used way of encoding and decoding. Because there are
several coding techniques – numerous enough to justify a graduate-level course –
we approach the subject by means of a case study based on convolutional coding.
The minimum error probability decoder incorporates the Viterbi algorithm. The
content of this chapter was selected to introduce the reader to coding and to
elegant and powerful tools, such as the previously mentioned Viterbi
algorithm and the tools to assess the resulting bit-error probability, notably detour
flow graphs and generating functions.
The material in Chapter 6 could be covered after Chapter 2, but there are some
drawbacks in doing so. First, it unduly delays the transition from the discrete-time
channel model of Chapter 2 to the more realistic continuous-time channel model
of Chapter 3. Second, it makes more sense to organize the teaching into a first
pass where we discover what to do (Chapters 2 and 3), and a refinement where
we focus on how to do it effectively (Chapters 5, 6, and 7). Finally, at the end of
Chapter 2, it is harder to motivate the students to invest time and energy into
coding for the discrete-time AWGN channel, because there is no evidence yet that
the channel plays a key role in practical systems. Such evidence is provided in
Chapter 3. Chapters 5 and 6 could be done in the reverse order, but the chosen
order is preferable for continuity reasons with respect to Chapter 4.
The final chapter, Chapter 7, is where the third layer emerges as a refinement
of the second layer to facilitate passband communication.
The following diagram (not reproduced here) summarizes the main thread throughout the text: we derive the receiver that minimizes the error probability.
Each chapter contains one or more appendices, with either background or com-
plementary material.
I should mention that I have made an important concession to mathematical
rigor. This text is written for people with the mathematical background of an
engineer. To be mathematically rigorous, the integrals that come up in dealing
with Fourier analysis should be interpreted in the Lebesgue sense.2 In most under-
graduate curricula, engineers are not taught Lebesgue integration theory. Hence
some compromise has to be made, and here is one that I find very satisfactory. In
Appendix 5.9, I introduce the difference between the Riemann and the Lebesgue
integrals in an informal way. I also introduce the space of L2 functions and the
notion of L2 equivalence. The ideas are natural and can be understood with-
out technical details. This gives us the language needed to rigorously state the
sampling theorem and Nyquist criterion, and the insight to understand why the
technical conditions that appear in those statements are necessary. The appendix
also reminds us that two signals that have the same Fourier transform are L2
equivalent but not necessarily point-wise equal. Because we introduce the Lebesgue
integral in an informal way, we are not in a position to prove, say, that we
can swap an integral and an infinite sum. In some way, having a good reason
for skipping such details is a blessing, because dealing with all technicalities can
quickly become a major distraction. These technicalities are important at some
level and unimportant at another level. They are important for ensuring that the
theory is consistent and a serious graduate-level student should be exposed to
them. However, I am not aware of a single case where they make a difference in
dealing with finite-support functions that are continuous and have finite energy,
especially with the kinds of signals we encounter in engineering. Details pertaining
to integration theory that are skipped in this text can be found in Gallager’s book
[2], which contains an excellent summary of integration theory for communication
engineers. Lapidoth [3] contains many details that are not found elsewhere. It is
an invaluable text for scholars in the field of digital communication.
The last part of this preface is addressed to instructors. Instructors might
consider taking a bottom-up approach with respect to Figure 1.7. Specifically, one
could start with the passband AWGN channel model and, as the first step in the
development, reduce it to the baseband model by means of the up/down converter.
In this case the natural second step is to reduce the baseband channel to the
discrete-time channel and only then address the communication problem across the
discrete-time channel. I find such an approach to be pedagogically less appealing
as it puts the communication problem last rather than first. As formulated by
Claude Shannon, the father of modern digital communication, “The fundamental
problem of communication is that of reproducing at one point either exactly or
approximately a message selected at another point”. This is indeed the problem
that we address in Chapter 2. Furthermore, randomness is the most important
aspect of a channel. Without randomness, there is no communication problem.
The channels considered in Chapter 2 are good examples to start with, because
they model randomness without additional distractions. However, the choice of
Footnote 2: The same can be said for the integrals involving the noise, but our approach avoids such integrals. See Section 3.2.
Footnote 3: . . . or when we watch a video. But a book can be more useful as a reference, because it is easier to find what you are looking for in a book than on a video, and a book can be annotated (personalized) more easily.
Footnote 4: http://nb.mit.edu.
which I post the reading assignments (essentially all sections). When students
have a question, they go to the site, highlight the relevant part, and type a
question in a pop-up window. The questions are summarized on a list that can be
sorted according to various criteria. Students can “vote” on a question to increase
its importance. Most questions are answered by students, and as an incentive
to interact on Nota Bene, I give a small bonus for posting pertinent questions
and/or for providing reasonable answers.5 The teaching assistants (TAs) and I
monitor the site and intervene as needed. Before I go to class, I take a
look at the questions, ordered by importance; then in class I “fill the gaps” as I
see fit.
Most of the class time is spent doing exercises. I encourage the students to help
each other by working in groups. The TAs and I are there to help. This way,
I see who can do what and where the difficulties lie. Assessing the progress this
way is more reliable than by grading exercises done at home. (We do not grade
the exercises, but we do hand out solutions.) During an exercise session, I often
go to the board to clarify, to help, or to complement, as necessary.
In terms of my own satisfaction, I find it more interesting to interact with the
students in this way, rather than to give ex-cathedra lectures that change little from
year to year. The vast majority of the students also prefer the flipped classroom:
They say so and I can tell that it is the case from their involvement. The exercises
are meant to be completed during the class time,6 so that at home the students
can focus on the reading. By the end of the semester7 we have covered almost all
sections of the book. (Appendices are left to the student’s discretion.) Before a new
reading assignment, I motivate the students to read by telling them why the topic
is important and how it fits into the big picture. If there is something unusual,
e.g. a particularly technical passage, I tell them what to expect and/or I give a
few hints. Another advantage of the flipped classroom is never falling behind the
schedule. At the beginning of the semester, I know which sections will be assigned
which week, and prepare the exercises accordingly. After the midterm, I assign a
MATLAB project to be completed in groups of two and to be presented during the
last day of class. The students like this very much.8
Footnote 5: A pertinent intervention is worth half a percent of the total number of points that can be acquired over the semester and, for each student, I count at most one intervention per week. This limits the maximum amount of bonus points to 7% of the total.
Footnote 6: Six periods of 45 minutes at EPFL.
Footnote 7: Fourteen weeks at EPFL.
Footnote 8: The idea of a project was introduced with great success by my colleague, Rüdiger Urbanke, who taught the course during my sabbatical.
Acknowledgments
This book is the result of a slow process, which began around the year 2000, of
seemingly endless revisions of my notes written for Principles of Digital Commu-
nication – a sixth-semester course that I have taught frequently at EPFL. I would
like to acknowledge that the notes written by Professors Robert Gallager and
Amos Lapidoth, for their MIT course Introduction to Digital Communication, as
well as the notes by Professor James Massey, for his ETHZ course Applied Digital
Information Theory, were of great help to me in writing the first set of notes that
evolved into this text. Equally helpful were the notes written by EPFL Professor
Emre Telatar, on matrices and on complex random variables; they became the
core of some appendices on background and on complementary material.
A big thanks goes to the PhD students who helped me develop new exercises
and write solutions. This includes Mani Bastani Parizi, Sibi Bhaskaran, László
Czap, Prasenjit Dey, Vasudevan Dinkar, Jérémie Ezri, Vojislav Gajic, Michael
Gastpar, Saeid Haghighatshoar, Hamed Hassani, Mahdi Jafari Siavoshani, Javad
Ebrahimi, Satish Korada, Shrinivas Kudekar, Stéphane Musy, Christine Neuberg,
Ayfer Özgür, Etienne Perron, Rajesh Manakkal, and Philippe Roud. Some exer-
cises were created from scratch and some were inspired by other textbooks. Most
of them evolved over the years and, at this point, it would be impossible to give
proper credit to all those involved. The first round of teaching Principles of Digital
Communication required creating a number of exercises from scratch. I was very
fortunate to have Michael Gastpar (PhD student at the time and now an EPFL
colleague) as my first teaching assistant. He did a fabulous job in creating many
exercises and solutions.
I would like to thank my EPFL students for their valuable feedback. Pre-final
drafts of this text were used at Stanford University and at UCLA, by Professors
Ayfer Özgür and Suhas Diggavi, respectively. Professor Rüdiger Urbanke used
them at EPFL during two of my sabbatical leaves. I am grateful to them for their
feedback and for sharing with me their students’ comments.
I am grateful to the following collaborators who have read part of the
manuscript and whose feedback has been very valuable: Emmanuel Abbe, Albert
Abilov, Nicolae Chiurtu, Michael Gastpar, Matthias Grossglauser, Paolo Ienne,
Alberto Jimenez-Pacheco, Olivier Lévêque, Nicolas Macris, Stefano Rosati, Anja
Skrivervik, and Adrian Tarniceriu.
I am particularly indebted to the following people for having read the whole
manuscript and for giving me a long list of suggestions, while noting the typos and
mistakes: Emre Telatar, Urs Niesen, Saeid Haghighatshoar, and Sepand Kashani-
Akhavan.
Warm thanks go to Françoise Behn, who learned LaTeX to type the first version
of the notes, to Holly Cogliati-Bauereis for her infinite patience in correcting my
English, to Emre Telatar for helping with LaTeX-related problems, and to Karol
Kruzelecki and Damir Laurenzi for helping with computer issues.
Finally, I would like to acknowledge many interesting discussions with various
colleagues, in particular those with Emmanuel Abbe, Michael Gastpar, Amos
Lapidoth, Upamanyu Madhow, Emre Telatar, and Rüdiger Urbanke. I would also
like to thank Rüdiger Urbanke for continuously encouraging me to publish my
notes. Without his insistence and his jokes about my perpetual revisions, I might
still be working on them.
List of symbols
A, B, . . . Sets.
N Set of natural numbers: {1, 2, 3, . . . }.
Z Set of integers: {. . . , −2, −1, 0, 1, 2, . . . }.
R Set of real numbers.
C Set of complex numbers.
H := {0, . . . , m − 1} Message set.
C := {c0 , . . . , cm−1 } Codebook (set of codewords).
W := {w0 (t), . . . , wm−1 (t)} Set of waveform signals.
V Vector space or inner product space.
u:A→B Function u with domain A and range B.
H∈H Random message (hypothesis) taking value in H.
N (t) Noise.
NE (t) Baseband-equivalent noise.
R(t) Received (random) signal.
Y = (Y1 , . . . , Yn ) Random n-tuple observed by the decoder.
j √−1.
{} Set of objects.
A^T Transpose of the matrix A. It may be applied to an n-tuple a.
A† Hermitian transpose of the matrix A. It may be applied to an n-tuple a.
E [X] Expected value of X.
⟨a, b⟩ Inner product between a and b (in that order).
‖a‖ Norm of the vector a.
|a| Absolute value of a.
a := b a is defined as b.
1{S} Indicator function. Its value is 1 if the statement S
is true and 0 otherwise.
1A (x) Same as 1{x ∈ A}.
E Average energy.
KN (t + τ, t), KN (τ ) Autocovariance of N (t).
□ Used to denote the end of theorems, definitions, examples, proofs, etc.
ℜ{·} Real part of the enclosed quantity.
ℑ{·} Imaginary part of the enclosed quantity.
∠ Phase of the complex-valued number that follows.
List of abbreviations
AM amplitude modulation.
bps bits per second.
BSS binary symmetric source.
DSB-SC double-sideband modulation with suppressed carrier.
iid independent and identically distributed.
l. i. m. limit in L2 norm.
LNA low-noise amplifier.
MAP maximum a posteriori.
Mbps megabits per second.
ML maximum likelihood.
MMSE minimum mean square error.
PAM pulse amplitude modulation.
pdf probability density function.
pmf probability mass function.
PPM pulse position modulation.
PSK phase-shift keying.
QAM quadrature amplitude modulation.
SSB single-sideband modulation.
WSS wide-sense stationary.
1 Introduction and objectives
Figure 1.1 (not reproduced): the OSI layering model. On the sending side, data passes down through the presentation, session, transport, network, data link, and physical layers; each layer adds its own header (PH, SH, TH, NH, DH), and the data link layer also adds a trailer (DT). The units produced by the transport, network, and data link layers are called segments, packets, and frames. The receiving side reverses the process, and the two physical layers are connected by the physical medium.
Typically, there is no direct physical link between the two application layers.
Instead, the communication between application layers goes through a shared
network, which creates a number of challenges. To begin with, there is no guarantee
of privacy for anything that goes through a shared network. Furthermore, networks
carry data from many users and can get congested. Hence, if possible, the data
should be compressed to reduce the traffic. Finally, there is no guarantee that the
sending and the receiving computers represent letters the same way. Hence, the
application header and the data need to be communicated by using a universal
language. The presentation layer handles the encryption, the compression, and
the translation to/from a universal language. The presentation layer also needs a
protocol to talk to the peer presentation layer at the destination. The protocol is
implemented by means of the presentation header (PH).
For the presentation layers to talk to each other, we need to make sure that
the two hosting computers are connected. Establishing, maintaining, and ending
communication between physical devices is the job of the session layer. The session
layer also manages access rights. Like the other layers, the session layer uses a
protocol to interact with the peer session layer. The protocol is implemented by
means of the session header (SH).
The layers we have discussed so far would suffice if all the machines of interest
were connected by a direct and reliable link. In reality, links are not always reliable.
Making sure that from an end-to-end point of view the link appears reliable
is one of the tasks of the transport layer. By means of parity check bits, the
transport layer verifies that the communication is error-free and if not, it requests
retransmission. The transport layer has a number of other functions, not all of
which are necessarily required in any given network. The transport layer can break
long sequences into shorter ones or it can multiplex several sessions between the
same two machines into a single one. It also provides flow control by queueing up
data if the network is congested or if the receiving end cannot absorb it sufficiently
fast. The transport layer uses the transport header (TH) to communicate with the
peer layer. The transport header followed by the data handed down by the session
layer is called a segment.
Now assume that there are intermediate nodes between the peer processes of
the transport layer. In this case, the network layer provides the routing service.
Unlike the above layers, which operate on an end-to-end basis, the network layer
and the layers below have a process also at intermediate nodes. The protocol
of the network layer is implemented in the network header (NH). The network
header contains, among other things, the source and the destination address.
The network header followed by the segment (of the transport layer) is called a
packet.
The next layer is the data link (DL) layer. Unlike the other layers, the DL puts
a header at the beginning and a trailer at the end of each packet handed down
by the network layer. The result is called a frame. Some of the overhead bits are
parity-check bits meant to determine if errors have occurred in the link between
nodes. If the DL detects errors, it might ask to retransmit or drop the frame
altogether. If it drops the frame, it is up to the transport layer, which operates on
an end-to-end basis, to request retransmission.
The physical layer – the subject of this text – is the bottom layer of the OSI
stack. The physical layer creates a more-or-less reliable “bit pipe” out of the
physical channel between two nodes. It does so by means of a transmitter/receiver
pair, called modem,1 on each side of the physical channel. We will learn that the
physical-layer designer can trade reliability for complexity and delay.
In summary, the OSI model has the following characteristics. Although the
actual data transmission is vertical, each layer is programmed as if the transmission
were horizontal. For a process, whatever is not part of its own header is considered
as being actual data. In particular, a process makes no distinction between the
headers of the higher layers and the actual data segment. For instance, the pre-
sentation layer translates, compresses, and encrypts whatever it receives from the
application layer, attaches the PH, and sends the result to its peer presentation
layer. The peer in turn reads and removes the PH and decrypts, decompresses,
and translates the data which is then passed to the application layer. What the
application layer receives is identical to what the peer application layer has sent,
up to a possible language translation. The DL inserts a trailer in addition to
a header. All layers, except the transport and the DL layer, assume that the
communication to the peer layer is error-free. If it can, the DL layer provides
reliability between successive nodes. Even if the reliability between successive
nodes is guaranteed, nodes might drop packets due to queueing overflow. The
transport layer, which operates at the end-to-end level, detects missing segments
and requests retransmission.
It should be clear that a layering approach drastically simplifies the tasks of
designing and deploying communication infrastructures. For instance, a program-
mer can test the application layer protocol with both applications running on the
same computer – thus bypassing all networking problems. Likewise, a physical-
layer specialist can test a modem on point-to-point links, also disregarding net-
working issues. Each of the tasks of compressing, providing reliability, privacy,
authenticity, routing, flow control, and physical-layer communication requires spe-
cific knowledge. Thanks to the layering approach, each task can be accomplished by
people specialized in their respective domain. Similarly, equipment from different
manufacturers works together, as long as it respects the protocols.
The OSI architecture is a generic model that does not prescribe a specific
protocol. The Internet uses the TCP/IP protocol stack, which is essentially com-
patible with the OSI architecture but uses five instead of seven layers [4]. The
reduction is mainly obtained by combining the OSI application, presentation, and
session layers into a single layer called the application layer. The transport layer
Footnote 1: Modem is the result of contracting the terms modulator and demodulator. In analog
modulation, such as frequency modulation (FM) and amplitude modulation (AM), the
source signal modulates a parameter of a high-frequency oscillation, called the carrier
signal. In AM it modulates the carrier’s amplitude and in FM it modulates the carrier’s
frequency. The modulated signal can be transmitted over the air and in the absence of
noise (which is never the case) the demodulator at the receiver reconstructs an exact
copy of the source signal. In practice, due to noise, the reconstruction only approximates
the source signal. Although modulation and demodulation are misnomers in digital
communication, the term modem has remained in use.
1.2 The topic of this text and some historical perspective
Figure 1.2 (not reproduced): on the left, a pulse p(t) of unit height and duration T0; on the right, a pulse train w(t) built from shifted and scaled replicas of p(t).
chosen to be close to ηBT , where η is some positive number that depends on the
definition of duration and bandwidth. A good value is η = 2.
As an example, consider Figure 1.2. On the left of the figure is a pulse p(t)
that we use as a building block for the communication signal.2 On the right is an
example of a pulse train of the form w(t) = ∑_{i=0}^{3} s_i p(t − iT0), obtained from shifted
and scaled replicas of p(t). We communicate by scaling the pulse replica p(t − iT0 )
by the information-carrying symbol si . If we could substitute p(t) with a narrower
pulse, we could fit more such pulses in a given time interval and therefore we
could send more information-carrying symbols. But a narrower pulse uses more
bandwidth. Hence there is a limit to the pulse width. For a given pulse width,
there is a limit to the number of pulses that we can pack in a given time interval if
we want the receiver to be able to retrieve the symbol sequence from the received
pulse train. Nyquist’s result implies that we can fit essentially 2BT non-interfering
pulses in a time interval of T seconds if the bandwidth is not to exceed B Hz.
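To make the construction concrete, here is a minimal sketch (not from the book) that builds such a pulse train numerically; the rectangular pulse shape, the symbol values, and T0 are assumptions chosen purely for illustration.

```python
# Sketch of a pulse train w(t) = sum_i s_i p(t - i*T0), built from shifted and
# scaled replicas of a pulse p(t).  All numerical values are illustrative.
import numpy as np

T0 = 1.0                              # pulse spacing (symbol interval), assumed
symbols = [0.5, -1.0, 1.5, 1.0]       # information-carrying symbols s_0, ..., s_3

def p(t):
    # One possible choice of pulse: unit height, duration T0 (rectangular).
    return np.where((t >= 0) & (t < T0), 1.0, 0.0)

t = np.linspace(0.0, 5 * T0, 501)
w = sum(s * p(t - i * T0) for i, s in enumerate(symbols))  # the pulse train w(t)
print(w[::100])                       # a few samples of w(t)
```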
In trying to determine the maximum number of bits that can be conveyed with
one signal, Hartley introduced two constraints that make good engineering sense.
First, in a practical realization, the symbols cannot take arbitrarily large values in
R (the set of real numbers). Second, the receiver cannot estimate a symbol with
infinite precision. This suggests that, to avoid errors, symbols should take values
in a discrete subset of some interval [−A, A]. If ±Δ is the receiver’s precision in
determining the amplitude of a pulse, then symbols should take a value in some
alphabet {a0, a1, . . . , a_{m−1}} ⊂ [−A, A] such that |ai − aj| ≥ 2Δ when i ≠ j. This
implies that the alphabet size can be at most m = 1 + A/Δ (see Figure 1.3).
There are m^n distinct n-length sequences that can be formed with symbols taken
from an alphabet of size m. Now suppose that we want to communicate a sequence
Footnote 2: A pulse is not necessarily rectangular. In fact, we do not communicate via rectangular pulses because they use too much bandwidth.
Figure 1.3 (not reproduced): a symbol alphabet {a0, a1, . . . , a5} of equally spaced points on the real line between −A and A, with spacing 2Δ between neighboring points.
of k bits. There are 2^k distinct such sequences and each such sequence should be
mapped into a distinct symbol sequence (see Figure 1.4). This implies

2^k ≤ m^n.    (1.1)
example 1.1 There are 2^4 = 16 distinct binary sequences of length k = 4 and
there are 4^2 = 16 distinct symbol sequences of length n = 2 with symbols taking
value in an alphabet of size m = 4. Hence we can associate a distinct length-2
symbol sequence to each length-4 bit sequence. The following is an example with
symbols taken from the alphabet {a0 , a1 , a2 , a3 }.
bit sequence    symbol sequence
0000            a0 a0
0001            a0 a1
0010            a0 a2
0011            a0 a3
0100            a1 a0
 . . .           . . .
1111            a3 a3
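For illustration, here is a minimal sketch (not from the book) that generates the mapping of Example 1.1 exhaustively; the symbol names and the base-4 digit encoding are assumptions made for the example.

```python
# Map every 4-bit sequence to a distinct pair of symbols from a 4-letter alphabet,
# as in Example 1.1 (2^4 = 16 bit sequences, 4^2 = 16 symbol pairs).
from itertools import product

alphabet = ["a0", "a1", "a2", "a3"]                 # m = 4 symbols
for bits in product("01", repeat=4):                # all 2^4 bit sequences
    value = int("".join(bits), 2)                   # read the 4 bits as an integer
    first, second = divmod(value, len(alphabet))    # its two base-4 digits
    print("".join(bits), alphabet[first], alphabet[second])
```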
Inserting m = 1 + A/Δ and n = 2BT in (1.1) and solving for k/T yields

k/T ≤ 2B log2(1 + A/Δ)    (1.2)
as the highest possible rate in bits per second that can be achieved reliably with
bandwidth B, symbol amplitudes within ±A, and receiver accuracy ±Δ.
Unfortunately, (1.2) does not provide a fundamental limit to the bit rate, because
there is no fundamental limit to how small Δ can be made.
The missing ingredient in Hartley’s calculation was the noise. In 1926 Johnson,
also at Bell Labs, realized that every conductor is affected by thermal noise. The
idea that the received signal should be modeled as the sum of the transmitted signal
plus noise became prevalent through the work of Wiener (1942). Clearly the noise
prevents the receiver from retrieving the symbols’ values with infinite precision,
which is the effect that Hartley wanted to capture with the introduction of Δ, but
unfortunately there is no way to choose Δ as a function of the noise. In fact, in
the presence of thermal noise, error-free communication becomes impossible. (But
we can make the error probability as small as desired.)
Prior to the publication of Shannon’s revolutionary 1948 paper, the common
belief was that the error probability induced by the noise could be reduced
only by increasing the signal’s power (e.g. by increasing A in the example of
Figure 1.3) or by reducing the bit rate (e.g. by transmitting the same bit multiple
times). Shannon proved that the noise can set a limit to the number of bits per
second that can be transmitted reliably, but as long as we communicate below that
limit, the error probability can be made as small as desired without modifying the
signal’s power and bandwidth. The limit to the bit rate is called channel capacity.
For the setup of interest to us, it is the right-hand side of

k/T ≤ B log2(1 + P/(N0 B)),    (1.3)
where P is the transmitted signal’s power and N0 /2 is the power spectral density
of the noise (assumed to be white and Gaussian). If the bit rate of a system is
above channel capacity then, no matter how clever the design, the error probability
is above a certain value. The theory that leads to (1.3) is far more subtle and far
more beautiful than the arguments leading to (1.2); yet, the two expressions are
strikingly similar.
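As a rough numerical comparison, the following sketch (not from the book) evaluates Hartley's rate (1.2) and Shannon's capacity (1.3) side by side; all parameter values are made up for illustration only.

```python
# Evaluate Hartley's rate (1.2) and Shannon's capacity (1.3) for illustrative values.
import math

B = 3000.0       # bandwidth in Hz (assumed)
A = 1.0          # maximum symbol amplitude (assumed)
Delta = 0.01     # receiver precision in Hartley's argument (assumed)
P = 1.0          # transmitted signal power in watts (assumed)
N0 = 1e-5        # noise power spectral density parameter (assumed)

hartley_rate = 2 * B * math.log2(1 + A / Delta)      # bits per second, eq. (1.2)
shannon_capacity = B * math.log2(1 + P / (N0 * B))   # bits per second, eq. (1.3)

print(f"Hartley rate:     {hartley_rate:.0f} bps")
print(f"Shannon capacity: {shannon_capacity:.0f} bps")
```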
What we mentioned here is only a special case of a general formula derived
by Shannon to compute the capacity of a broad class of channels. As he did for
channels, Shannon also posed and answered fundamental questions about sources.
For the purpose of this text, there are two lessons that we should retain about
sources. (1) The essence of a source is its randomness. If a listener knew exactly
what a speaker is about to say, there would be no need to listen. Hence a source
should be modeled by a random variable (or a sequence thereof). In line with
the topic of this text, we assume that the source is digital, meaning that the
random variable takes values in a discrete set. (See Appendix 1.8 for a brief
summary of various kinds of sources.) (2) For every such source, there exists a
source encoder that converts the source output into the shortest (in average)
binary string and a source decoder that reconstructs the source output from the
encoder output. The encoder output, for which no further compression is possible,
has the same statistic as a sequence of unbiased coin flips, i.e. it is a sequence of
independent and uniformly distributed bits. Clearly, we can minimize the storage
and/or communicate more efficiently if we compress the source into the shortest
binary string. In this text, we are not concerned with source coding but, for the
above-mentioned reasons, we model the source as a generator of independent and
uniformly distributed bits.
Like many of the inventors mentioned above, Shannon worked at Bell Labs.
His work appeared one year after the invention of the solid-state transistor, by
Bardeen, Brattain, and Shockley, also at Bell Labs. Figure 1.5 summarizes the
various milestones.
Figure 1.5. Technical milestones leading up to information theory.
Footnote 3: Individual noise sources do not necessarily have Gaussian statistics. However, due to the central limit theorem, their aggregate contribution is often quite well approximated by a Gaussian random process.
1.3 Problem formulation and preview
exercise to check that this physical channel is linear and time-invariant. Thus it
can be modeled by a linear filter as shown in Figure 1.6.4 Additional filtering may
occur due to the limitations of some of the components at the sender and/or at
the receiver. For instance, this is the case for a linear amplifier and/or an antenna
for which the amplitude response over the frequency range of interest is not flat
and/or the phase response is not linear. The filter in Figure 1.6 accounts for all
linear time-invariant transformations that act upon the communication signals as
they travel from the sender to the receiver. The channel model of Figure 1.6 is
meaningful for both wireline and wireless communication channels. It is referred
to as the bandlimited Gaussian channel.
Mathematically, a transmitter implements a one-to-one mapping between the
message set and a set of signals. Without loss of essential generality, we may let
the message set be H = {0, 1, . . . , m − 1} for some integer m ≥ 2. For the channel
model of Figure 1.6, the signal set W = {w0 (t), w1 (t), . . . , wm−1 (t)} consists of
continuous and finite-energy signals. We think of the signals as stimuli used by the
transmitter to excite the channel input. They are chosen in such a way that the
receiver can tell, with high probability, which channel input produced an observed
channel output.
Even if we model the source as producing an index from H = {0, 1, . . . , m − 1}
rather than a sequence of bits, we can still measure the communication rate in
terms of bits per second (bps). In fact the elements of the message set can be labeled
with distinct binary sequences of length log2 m. Every time that we communicate
a message, we equivalently communicate log2 m bits. If we can send a signal from
the set W every T seconds, then the message rate is 1/T [messages per second]
and the bit rate is (log2 m)/T [bits per second].
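As a small worked example (with numbers invented purely for illustration): suppose m = 16 and T = 1 ms. Then log2 m = 4 bits per message, the message rate is 1/T = 1000 messages per second, and the bit rate is (log2 m)/T = 4000 bps.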
Digital communication is a field that has seen many exciting developments and
is still in vigorous expansion. Our goal is to introduce the reader to the field,
with emphasis on fundamental ideas and techniques. We hope that the reader will
develop an appreciation for the trade-offs that are possible at the transmitter, will
understand how to design (at the building-block level) a receiver that minimizes
the error probability, and will be able to analyze the performance of a point-to-
point communication system.
We will discover that a natural way to design, analyze, and implement a trans-
mitter/receiver pair for the channel of Figure 1.6 is to think in terms of the modules
shown in Figure 1.7. As in the OSI layering model, peer modules are designed
as if they were connected by their own channel. The bottom layer reduces the
passband channel to the more basic baseband-equivalent channel. The middle layer
further reduces the channel to a discrete-time channel that can be handled by the
encoder/decoder pair.
We conclude this section with a very brief overview of the chapters.
Chapter 2 addresses the receiver-design problem for discrete-time observations,
in particular in relationship to the channel seen by the top layer of Figure 1.7, which
is the discrete-time additive white Gaussian noise (AWGN) channel. Throughout
Footnote 4: If the scattering and reflecting objects move with respect to the transmitting/receiving antenna, then the filter is time-varying. We do not consider this case.
Figure 1.7 (block diagram, not reproduced): the transmitter consists of an encoder, a waveform former, and an up-converter; the receiver consists of a down-converter, an n-tuple former, and a decoder. Peer blocks exchange, from top to bottom, messages, n-tuples, baseband-equivalent signals, and passband signals. The channel adds noise N(t), and the receiver observes R(t).
The signals used by the transmitter are chosen to facilitate the receiver’s decision.
One of the performance criteria is the error probability, and we can design systems
that have such a small error probability that for all practical purposes it is zero.
The situation is quite different in analog communication. As there is a continuum
of signals that the transmitter could possibly send, there is no chance for the
receiver to reconstruct an exact replica of the transmitted signal from the noisy
received signal. It no longer makes sense to talk about error probability. If we say
that an error occurs every time that there is a difference between the transmitted
signal and the reconstruction provided by the receiver, then the error probability
is always 1.
The difference, which may still seem insignificant at this point, is made signifi-
cant by the notion of channel capacity. For every channel, there is a rate, called
channel capacity, with the following meaning. Digital communication across the
channel can be made as reliable as desired at any rate below channel capacity. At
rates above channel capacity, it is impossible to reduce the error probability below
a certain value. Now we can see where the difference between analog and digital
communication becomes fundamental. For instance, if we want to communicate
at 1 gigabit per second (Gbps) from Zurich to Los Angeles by using a certain
type of cable, we can cut the cable into pieces of length L, chosen in such a
way that the channel capacity of each piece is greater than 1 Gbps. We can then
design a transmitter and a receiver that allow us to communicate virtually error-
free at 1 Gbps over distance L. By concatenating many such links, we can cover
any desired distance at the same rate. By making the error probability over each
link sufficiently small, we can meet the desired end-to-end probability of error.
The situation is very different in analog communication, where every piece of
cable contributes to a degradation of the reconstruction.
Need another example? Compare faxing a text to sending an e-mail over the
same telephone line. The fax uses analog technology. It treats the document as
a continuum of gray levels (in two dimensions). It does not differentiate between
text or images. The receiver prints a degraded version of the original. And if we
repeat the operation multiple times by re-faxing the latest reproduction it will not
take long until the result is dismal. E-mail on the other hand is a form of digital
communication. It is almost certain that the receiver reconstructs an identical
replica of the transmitted text.
Because we can turn a continuous-time source into a discrete one, as described
in Appendix 1.8, we always have the option of doing digital rather than analog
communication. In the conversion from continuous to discrete, there is a deteriora-
tion that we control and can make as small as desired. The result can, in principle,
be communicated over unlimited distance and over arbitrarily poor channels with
no further degradation.
1.5 Notation
In Chapter 2 and Chapter 3 we use a discrete-time and a continuous-time channel
model, respectively. Accordingly, the signals we use to communicate are n-tuples in
Chapter 2 and functions of time in Chapter 3. The transition from one set of signals
to the other is made smoothly via the elegant theory of inner product spaces. This
requires seeing both n-tuples and functions as vectors of an appropriate inner
product space, which is the reason we have opted to use the same fonts for both
kinds of signals. (Many authors use bold-faced fonts for n-tuples.)
Some functions of time are referred to as waveforms. These are functions that
typically represent voltages or currents within electrical circuits. An example of a
waveform is the signal we use to communicate across a continuous-time channel.
Pulses are waveforms that serve as building blocks for more complex waveforms.
An example of a pulse is the impulse response of a linear time-invariant filter
(LTI). From a mathematical point of view it is by no means essential to make
a distinction between a function, a waveform, and a pulse. We use these terms
because they are part of the language used by engineers and because it helps us
associate a physical meaning with the specific function being discussed.
In this text, a generic function such as g : I → B, where I ⊆ R is the domain
and B is the range, is typically a function of time or a function of frequency.
Engineering texts underline the distinction by writing g(t) and g(f ), respectively.
This is an abuse of notation, which can be very helpful. We will make use of this
abuse of notation as we see fit. By writing g(t) instead of g : I → B, we are
effectively seeing t as representing I, rather than representing a particular value
of I. To refer to a particular moment in time, we use a subscript, such as in t0.
So, g(t0 ) refers to the value that the function g takes at t = t0 . Similarly, g(f )
refers to a function of frequency and g(f0 ) is the value that g takes at f = f0 .
Footnote 5: A copy of the book was generously offered by our dean, Martin Vetterli, to each professor as a 2011 Christmas gift.
1.6 A few anecdotes
that two out of three messages arrived within a day during the warm months and
that only one in three arrived in winter. This was the situation when F. B. Morse
proposed to the French government a telegraph that used electrical wires. Morse’s
proposal was rejected because “No one could interfere with telegraph signals in
the sky, but wire could be cut by saboteurs” [5, Chapter 5].
In 1833 the lawyer and philologist John Pickering, referring to the American
version of the French telegraph on Central Wharf (a Chappe-like tower commu-
nicating shipping news with three other stations in a 12-mile line across Boston
Harbor) asserted that “It must be evident to the most common observer, that no
means of conveying intelligence can ever be devised, that shall exceed or even equal
the rapidity of the Telegraph, for, with the exception of the scarcely perceptible
relay at each station, its rapidity may be compared with that of light itself”. In
today’s technology we can communicate over optical fiber at more than 10^12 bits
per second, which may be 12 orders of magnitude faster than the telegraph referred
to by Pickering. Yet Pickering’s flawed reasoning may have seemed correct to most
of his contemporaries.
The electrical telegraph eventually came and was immediately a great success,
yet some feared that it would put newspapers out of business. In 1852 it was
declared that “All ideas of connecting Europe with America, by lines extending
directly across the Atlantic, is utterly impracticable and absurd”. Six years later
Queen Victoria and President Buchanan were communicating via such a line.
Then came the telephone. The first experimental applications of the “electrical
speaking telephone” were made in the US in the 1870s. It quickly became a great
success in the USA, but not in England. In 1876 the chief engineer of the General
Post Office, William Preece, reported to the British Parliament: “I fancy the
descriptions we get of its use in America are a little exaggerated, though there
are conditions in America which necessitate the use of such instruments more
than here. Here we have a superabundance of messengers, errand boys and things
of that kind . . . I have one in my office, but more for show. If I want to send a
message – I use a sounder or employ a boy to take it”.
Compared to the telegraph, the telephone looked like a toy because any child
could use it. In comparison, the telegraph required literacy. Business people first
thought that the telephone was not serious. Where the telegraph dealt in facts
and numbers, the telephone appealed to emotions. Seeing information technology
as a threat to privacy is not new. Already at the time one commentator said, “No
matter to what extent a man may close his doors and windows, and hermetically
seal his key-holes and furnace-registers, with towels and blankets, whatever he may
say, either to himself or a companion, will be overheard”.
In summary, the printing press has been criticized for promoting barbarism; the
electrical telegraph for being vulnerable to vandalism, a threat to newspapers,
and not superior to the French telegraph; the telephone for being childish, of
no business value, and a threat to privacy. We could of course extend the list
with comments about typewriters, cell phones, computers, the Internet, or about
applications such as e-mail, SMS, Wikipedia, Street View by Google, etc. It would
be good to keep some of these examples in mind when attempts to promote new
ideas are met with resistance.
So when we consider a file as being the source signal, the source can be modeled as
a discrete-time random process taking values in the finite alphabet {0, 1, . . . , 255}.
Alternatively, we can consider the file as a sequence of bits, in which case the
stochastic process takes values in {0, 1}.
For another example, consider the sequence of pixel values produced by a digital
camera. The color of a pixel is obtained by mixing various intensities of red, green,
and blue. Each of the three intensities is represented by a certain number of bits.
One way to exchange images is to exchange one pixel at a time, according to some
predetermined way of serializing the pixel’s intensities. Also in this case we can
model the source as a discrete-time process.
A discrete-time sequence taking values in a finite alphabet can always be con-
verted into a binary sequence. The resulting average length depends on the source
statistic and on how we do the conversion. In principle we could find the minimum
average length by analyzing all possible ways of making the conversion. Surpris-
ingly, we can bypass this tedious process and find the result by means of a simple
formula that determines the so-called entropy (of the source). This was a major
result in Shannon’s 1948 paper.
example 1.3 A discrete memoryless source is a discrete source with the addi-
tional property that the output symbols are independent and identically distributed.
For a discrete memoryless source that produces symbols taking values in an m-
letter alphabet, the entropy is

− ∑_{i=1}^{m} p_i log2 p_i ,

where p_i is the probability of the ith letter; the entropy equals the minimum average number of bits per symbol needed to represent the source output.
Any book on information theory will prove the stated relationship between the
entropy of a memoryless source and the minimum average number of bits needed
to represent a source symbol. A standard reference is [19].
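As a quick illustration of the formula in Example 1.3, here is a minimal sketch (not from the book) that evaluates the entropy for a few assumed letter probabilities.

```python
# Entropy of a discrete memoryless source, in bits per symbol:
# H = -sum_i p_i log2(p_i); terms with p_i = 0 contribute nothing.
import math

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.5]))     # 1.0 bit/symbol (a binary symmetric source)
print(entropy([0.9, 0.1]))     # about 0.47 bits/symbol (compressible)
print(entropy([0.25] * 4))     # 2.0 bits/symbol (uniform over 4 letters)
```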
If the output of the encoder that produces the shortest binary sequence can no
longer be compressed, it means that it has entropy 1. One can show that to have
entropy 1, a binary source must produce independent and uniformly distributed
symbols. Such a source is called a binary symmetric source (BSS). We conclude
that the binary output of a source encoder can either be further compressed
or it has the same statistic as the output of a BSS. This is the main reason a
communication-link designer typically assumes that the source is a BSS.
1.9 Exercises
Note: The exercises in this first chapter are meant to test if the reader has the
expected knowledge in probability theory.
exercise 1.1 (Probabilities of basic events) Assume that X1 and X2 are inde-
pendent random variables that are uniformly distributed in the interval [0, 1]. Com-
pute the probability of the following events. Hint: For each event, identify the
corresponding region inside the unit square.
(a) 0 ≤ X1 − X2 ≤ 1/3.
(b) X1^3 ≤ X2 ≤ X1^2.
(c) X2 − X1 = 1/2.
(a) A box contains m white and n black balls. Suppose k balls are drawn. Find
the probability of drawing at least one white ball.
(b) We have two coins; the first is fair and the second is two-headed. We pick one
of the coins at random, toss it twice and obtain heads both times. Find the
probability that the coin is fair.
exercise 1.4 (Playing darts) Assume that you are throwing darts at a target.
We assume that the target is one-dimensional, i.e. that the darts all end up on a
line. The “bull’s eye” is in the center of the line, and we give it the coordinate 0.
The position of a dart on the target can then be measured with respect to 0. We
assume that the position X1 of a dart that lands on the target is a random variable
that has a Gaussian distribution with variance σ12 and mean 0. Assume now that
there is a second target, which is further away. If you throw a dart to that target,
the position X2 has a Gaussian distribution with variance σ22 (where σ22 > σ12 ) and
mean 0. You play the following game: You toss a “coin” which gives you Z = 1
with probability p and Z = 0 with probability 1 − p for some fixed p ∈ [0, 1]. If
Z = 1, you throw a dart onto the first target. If Z = 0, you aim for the second
target instead. Let X be the relative position of the dart with respect to the center
of the target that you have chosen.
• A source: The source (not represented in the figure) produces the message to
be transmitted. In a typical application, the message consists of a sequence
of bits but this detail is not fundamental for the theory developed in this
chapter. It is fundamental that the source chooses one “message” from a set of
possible messages. We are free to choose the “label” we assign to the various
messages and our choice is based on mathematical convenience. For now the
mathematical model of a source is as follows. If there are m possible choices,
we model the source as a random variable H that takes values in the message
set H = {0, 1, . . . , (m − 1)}. More often than not, all messages are assumed to
have the same probability but for generality we allow message i to occur with
probability PH (i). The message set H and the probability distribution PH are
assumed to be known to the system designer.
[Block diagram, not reproduced: a message i ∈ H enters the Transmitter, which maps it to a codeword ci ∈ C ⊂ X^n; the Channel produces the output Y ∈ Y^n; the Receiver outputs the decision Ĥ ∈ H.]
• A channel: The system designer needs to be able to cope with a broad class of
channel models. A quite general way to describe a channel is by specifying its
input alphabet X (the set of signals that are physically compatible with the
channel input), the channel output alphabet Y, and a statistical description
of the output given the input. Unless otherwise specified, in this chapter the
output alphabet Y is a subset of R. A convenient way to think about the channel
is to imagine that for each letter x ∈ X that we apply to the channel input, the
channel outputs the realization of a random variable Y ∈ Y whose statistic
depends on x. If Y is a discrete random variable, we describe the probability
distribution (also called probability mass function, abbreviated to pmf) of Y
given x, denoted by PY |X (·|x). If Y is a continuous random variable, we describe
the probability density function (pdf) of Y given x, denoted by fY |X (·|x). In
a typical application, we need to know the statistic of a sequence Y1 , . . . , Yn
of channel outputs, Yk ∈ Y, given a sequence X1 , . . . , Xn of channel inputs,
Xk ∈ X , but our typical channel is memoryless, meaning that
$$P_{Y_1,\dots,Y_n|X_1,\dots,X_n}(y_1,\dots,y_n\mid x_1,\dots,x_n) = \prod_{i=1}^{n} P_{Y_i|X_i}(y_i\mid x_i).$$
example 2.3 (Channel) The channel model that we will use frequently in this
chapter is the one that maps a signal c ∈ Rn into Y = c+Z, where Z is a Gaussian
random vector of independent and identically distributed components. As we will
see later, this is the discrete-time equivalent of the continuous-time additive white
Gaussian noise (AWGN) channel.
The chapter is organized as follows. We first learn the basic ideas behind hypoth-
esis testing, the field that deals with the problem of guessing the outcome of a
random variable based on the observation of another random variable (or random
vector). Then we study the Q function as it is a very valuable tool in dealing with
communication problems that involve Gaussian noise. At that point, we will be
ready to consider the problem of communicating across the discrete-time additive
white Gaussian noise channel. We will first consider the case that involves two
messages and scalar signals, then the case of two messages and n-tuple signals,
and finally the case of an arbitrary number m of messages and n-tuple signals.
Then we study techniques that we use, for instance, to tell if we can reduce the
dimensionality of the channel output signals without undermining the receiver
performance. The last part of the chapter deals with techniques to bound the
error probability when an exact expression is unknown or too difficult to evaluate.
A point about terminology and symbolism needs to be clarified. We are using
ci (and not si ) to denote the signal used for message i because the signals of this
chapter will become codewords in subsequent chapters.
¹ We assume that Y is a continuous random variable (or continuous random vector). If it is discrete, then we use $P_{Y|H}(\cdot|i)$ instead of $f_{Y|H}(\cdot|i)$.
² Pr{·} is a short-hand for the probability of the enclosed event.
when H = 0, $Y \sim P_{Y|H}(y|0) = \dfrac{\lambda_0^y}{y!}\,e^{-\lambda_0}$,
when H = 1, $Y \sim P_{Y|H}(y|1) = \dfrac{\lambda_1^y}{y!}\,e^{-\lambda_1}$,
where 0 ≤ λ0 < λ1 . We read the above as follows: “When H = 0, the observable
Y is Poisson distributed with intensity λ0 . When H = 1, Y is Poisson distributed
with intensity λ1 ”. Once again, the problem of deciding the value of H from the
observable Y is a standard hypothesis testing problem.
$$P_{H|Y}(i|y) = \frac{P_H(i)\,f_{Y|H}(y|i)}{f_Y(y)},$$
where $f_Y(y) = \sum_i P_H(i)\,f_{Y|H}(y|i)$. In the above expression, $P_{H|Y}(i|y)$ is the posterior (also called the a posteriori probability of H given Y). By observing Y = y, the probability that H = i goes from the prior $P_H(i)$ to the posterior $P_{H|Y}(i|y)$.
If the decision is Ĥ = i, the probability that it is the correct decision is the
probability that H = i, i.e. PH|Y (i|y). As our goal is to maximize the probability
of being correct, the optimum decision rule is
$$\hat{H}_{\mathrm{MAP}}(y) = \arg\max_i P_{H|Y}(i|y),$$
where $\arg\max_i g(i)$ stands for “one of the arguments i for which the function g(i)
achieves its maximum”. The above is called the maximum a posteriori (MAP) deci-
sion rule. In case of ties, i.e. if PH|Y (j|y) equals PH|Y (k|y) equals maxi PH|Y (i|y),
then it does not matter if we decide for Ĥ = k or for Ĥ = j. In either case, the
probability that we have decided correctly is the same.
Because the MAP rule maximizes the probability of being correct for each
observation y, it also maximizes the unconditional probability Pc of being cor-
rect. The former is PH|Y (Ĥ(y)|y). If we plug in the random variable Y instead
of y, then we obtain a random variable. (A real-valued function of a random
variable is a random variable.) The expected value of this random variable is the
(unconditional) probability of being correct, i.e.
$$P_c = E\big[P_{H|Y}(\hat{H}(Y)|Y)\big] = \int_y P_{H|Y}(\hat{H}(y)|y)\,f_Y(y)\,dy. \tag{2.2}$$
When the prior $P_H$ is uniform, maximizing $P_{H|Y}(i|y)$ is equivalent to maximizing $f_{Y|H}(y|i)$. The resulting rule, $\hat{H}(y) = \arg\max_i f_{Y|H}(y|i)$, is called the maximum likelihood (ML) decision rule. The name stems from the fact
that fY |H (y|i), as a function of i, is called the likelihood function.
Notice that the ML decision rule is defined even if we do not know PH . Hence
it is the solution of choice when the prior is not known. (The MAP and the ML
decision rules are equivalent only when the prior is uniform.)
The special case in which we have to make a binary decision, i.e. H = {0, 1}, is
both instructive and of practical relevance. We begin with it and generalize in the
next section.
As there are only two alternatives to be tested, the MAP test may now be
written as
$$\frac{f_{Y|H}(y|1)\,P_H(1)}{f_Y(y)} \;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\; \frac{f_{Y|H}(y|0)\,P_H(0)}{f_Y(y)}.$$
The above notation means that the MAP test decides for Ĥ = 1 when the left is
bigger than or equal to the right, and decides for Ĥ = 0 otherwise. Observe that
the denominator is irrelevant because fY (y) is a positive constant – hence it will
not affect the decision. Thus an equivalent decision rule is
$$f_{Y|H}(y|1)\,P_H(1) \;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\; f_{Y|H}(y|0)\,P_H(0).$$
The above test is depicted in Figure 2.2 assuming y ∈ R. This is a very important
figure that helps us visualize what goes on and, as we will see, will be helpful to
compute the probability of error.
The above test is insightful as it shows that we are comparing posteriors after
rescaling them by canceling the positive number fY (y) from the denominator.
However, there are alternative forms of the test that, depending on the details, can
be computationally more convenient.
[Figure 2.2: the weighted densities $f_{Y|H}(y|0)P_H(0)$ and $f_{Y|H}(y|1)P_H(1)$ plotted versus y; the decision is $\hat{H}=0$ where the former dominates and $\hat{H}=1$ where the latter dominates.]
An equivalent test is obtained by dividing both sides by the non-negative quantity $f_{Y|H}(y|0)P_H(1)$. This results in the
following binary MAP test:
$$\Lambda(y) = \frac{f_{Y|H}(y|1)}{f_{Y|H}(y|0)} \;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\; \frac{P_H(0)}{P_H(1)} = \eta. \tag{2.4}$$
The left side of the above test is called the likelihood ratio, denoted by Λ(y),
whereas the right side is the threshold η. Notice that if PH (0) increases, so does
the threshold. In turn, as we would expect, the region {y : Ĥ(y) = 0} becomes
larger.
When PH (0) = PH (1) = 1/2 the threshold η is unity and the MAP test becomes
a binary ML test:
$$f_{Y|H}(y|1) \;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\; f_{Y|H}(y|0).$$
or, equivalently,
Pe (0) = P r{Λ(Y ) ≥ η|H = 0}. (2.6)
Whether it is easier to work with the right side of (2.5) or that of (2.6) depends
on whether it is easier to work with the conditional density of Y or of Λ(Y ). We
will see examples of both cases.
Similar expressions hold for the probability of error conditioned on H = 1,
denoted by Pe (1). Using the law of total probability, we obtain the (unconditional)
error probability
Pe = Pe (1)PH (1) + Pe (0)PH (0).
In deriving the probability of error we have tacitly used an important technique
that we use all the time in probability: conditioning as an intermediate step.
Conditioning as an intermediate step may be seen as a divide-and-conquer strategy.
The idea is to solve a problem that seems hard by breaking it up into subproblems
that (i) we know how to solve and (ii) once we have the solution to the sub-
problems we also have the solution to the original problem. Here is how it works
in probability. We want to compute the expected value of a random variable Z.
Assume that it is not immediately clear how to compute the expected value of
Z, but we know that Z is related to another random variable W that tells us
something useful about Z: useful in the sense that for every value w we are able
to compute the expected value of Z given W = w. Then, via the law of total
expectation, we compute $E[Z] = \sum_w E[Z\mid W=w]\,P_W(w)$. The same principle
applies for probabilities. (This is not a coincidence: The probability of an event is
the expected value of the indicator function of that event.) For probabilities, the
expression is $\Pr(Z\in A) = \sum_w \Pr(Z\in A\mid W=w)\,P_W(w)$. It is called the law of
total probability.
Let us revisit what we have done in light of the above comments and what else we
could have done. The computation of the probability of error involves two random
variables, H and Y , as well as an event {H = Ĥ}. To compute the probability
of error (2.5) we have first conditioned on all possible values of H. Alternatively,
we could have conditioned on all possible values of Y . This is indeed a viable
alternative. In fact we have already done so (without saying it) in (2.2). Between
the two, we use the one that seems more promising for the problem at hand. We
will see examples of both.
Now we go back to the m-ary hypothesis testing problem. This means that H =
{0, 1, . . . , m − 1}.
Recall that the MAP decision rule, which minimizes the probability of making
an error, is
$$\hat{H}_{\mathrm{MAP}}(y) = \arg\max_{i\in\mathcal{H}} P_H(i)\,f_{Y|H}(y|i),$$
where $f_{Y|H}(\cdot|i)$ is the probability density function of the observable Y when the
hypothesis is i and PH (i) is the probability of the ith hypothesis. This rule is well
defined up to ties. If there is more than one i that achieves the maximum on the
right side of one (and thus all) of the above expressions, then we may decide for
any such i without affecting the probability of error. If we want the decision rule
to be unambiguous, we can for instance agree that in case of ties we choose the
largest i that achieves the maximum.
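To make the rule concrete, here is a minimal sketch, assuming Python with NumPy and a discrete observable, of a MAP decoder that works from the prior and the likelihoods stored as arrays; the numerical values are illustrative and not taken from the text.

```python
# Minimal MAP-decoder sketch for a discrete observable (Python/NumPy assumed).
# prior[i] = P_H(i); likelihood[i, y] = P_{Y|H}(y|i). Values below are illustrative.
import numpy as np

def map_decision(y, prior, likelihood):
    scores = prior * likelihood[:, y]          # proportional to the posterior P_{H|Y}(i|y)
    # Resolve ties by picking the largest i that achieves the maximum, as agreed above.
    best = np.flatnonzero(scores == scores.max())
    return int(best[-1])

prior = np.array([0.5, 0.5])
likelihood = np.array([[0.8, 0.2],             # P_{Y|H}(.|0) for y = 0, 1
                       [0.3, 0.7]])            # P_{Y|H}(.|1) for y = 0, 1
print(map_decision(0, prior, likelihood), map_decision(1, prior, likelihood))
```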
When all hypotheses have the same probability, then the MAP rule specializes
to the ML rule, i.e.
$$\hat{H}_{\mathrm{ML}}(y) = \arg\max_{i\in\mathcal{H}} f_{Y|H}(y|i).$$
[Figure 2.3: the observation space partitioned into the decoding regions R_0, R_1, ..., R_{m−1}, with a generic region R_i highlighted.]
We will always assume that fY |H is either given as part of the problem formulation
or that it can be figured out from the setup. In communication, we typically know
the transmitter and the channel. In this chapter, the transmitter is the map from H
to C ⊂ X n and the channel is described by the pdf fY |X (y|x) known for all x ∈ X n
and all y ∈ Y n . From these two, we immediately obtain fY |H (y|i) = fY |X (y|ci ),
where ci is the signal assigned to i.
Note that the decision (or decoding) function Ĥ assigns an i ∈ H to each y ∈ Rn .
As already mentioned, it can be described by the decision (or decoding) regions
Ri , i ∈ H, where Ri consists of those y for which Ĥ(y) = i. It is convenient to
think of Rn as being partitioned by decoding regions as depicted in Figure 2.3.
We use the decoding regions to express the error probability Pe or, equivalently,
the probability Pc = 1 − Pe of deciding correctly. Conditioned on H = i we have
$$P_e(i) = 1 - P_c(i) = 1 - \int_{R_i} f_{Y|H}(y|i)\,dy.$$
[Figure: geometric construction used in the proof, with a ray at angle θ to the x-axis reaching distance r = ξ/sin(θ).]
To prove (f), we use (e) and the fact that $e^{-\frac{\xi^2}{2\sin^2\theta}} \le e^{-\frac{\xi^2}{2}}$ for $\theta \in [0, \frac{\pi}{2}]$. Hence
$$Q(\xi) \le \frac{1}{\pi}\int_0^{\pi/2} e^{-\frac{\xi^2}{2}}\,d\theta = \frac{1}{2}\,e^{-\frac{\xi^2}{2}}.$$
A plot of the Q function and its bounds is given in Figure 2.4.
[Figure 2.4: Q(α) for α ∈ [0, 5] on a logarithmic scale, together with the upper bounds $\frac{1}{\sqrt{2\pi}\,\alpha}e^{-\frac{\alpha^2}{2}}$ and $\frac{1}{2}e^{-\frac{\alpha^2}{2}}$ and the lower bound $\frac{\alpha}{1+\alpha^2}\frac{1}{\sqrt{2\pi}}e^{-\frac{\alpha^2}{2}}$.]
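As a numerical companion to Figure 2.4, the following sketch (Python assumed; the sampled values of α are arbitrary) tabulates Q(α) together with the bounds plotted in the figure.

```python
# Sketch: numerical values of Q(alpha) and the bounds shown in Figure 2.4.
# (Python assumed; the sampled values of alpha are arbitrary.)
from math import erfc, sqrt, exp, pi

def Q(a):
    # Q(a) = Pr{N(0,1) > a}, via the complementary error function.
    return 0.5 * erfc(a / sqrt(2.0))

for a in [1.0, 2.0, 3.0, 4.0, 5.0]:
    upper1 = exp(-a * a / 2) / (sqrt(2 * pi) * a)               # (1/(sqrt(2*pi)*a)) e^{-a^2/2}
    upper2 = 0.5 * exp(-a * a / 2)                              # (1/2) e^{-a^2/2}
    lower = (a / (1 + a * a)) * exp(-a * a / 2) / sqrt(2 * pi)  # lower bound of Figure 2.4
    print(f"a={a:.0f}: {lower:.3e} <= Q={Q(a):.3e} <= {upper1:.3e}, {upper2:.3e}")
```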
[Figure: the discrete-time AWGN channel: the transmitter maps i ∈ H into c_i ∈ R^n, the channel adds Z ∼ N(0, σ²I_n), and the receiver observes Y = c_i + Z and outputs Ĥ.]
The transmitter maps the message i into a signal $c_i \in \mathbb{R}^n$. The channel adds a random (noise) vector Z which is zero-mean and has
independent and identically distributed Gaussian components of variance σ 2 . In
short, Z ∼ N (0, σ 2 In ). The observable is Y = ci + Z.
We begin with the simplest possible situation, specifically when there are only
two equiprobable messages and the signals are scalar (n = 1). Then we generalize
to arbitrary values for n and finally we consider arbitrary values also for the
cardinality m of the message set.
Let the message H ∈ {0, 1} be equiprobable and assume that the transmitter maps
H = 0 into c0 ∈ R and H = 1 into c1 ∈ R. The output statistic for the various
hypotheses is as follows:
H=0: Y ∼ N (c0 , σ 2 )
H=1: Y ∼ N (c1 , σ 2 ).
An equivalent way to express the output statistic for each hypothesis is
$$f_{Y|H}(y|0) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y-c_0)^2}{2\sigma^2}\right), \qquad f_{Y|H}(y|1) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y-c_1)^2}{2\sigma^2}\right).$$
We compute the likelihood ratio
$$\Lambda(y) = \frac{f_{Y|H}(y|1)}{f_{Y|H}(y|0)} = \exp\left(\frac{(y-c_0)^2 - (y-c_1)^2}{2\sigma^2}\right).$$
Taking the logarithm on both sides of the MAP test $\Lambda(y) \gtrless \eta$, the test becomes
$$\frac{c_1-c_0}{\sigma^2}\,y + \frac{c_0^2-c_1^2}{2\sigma^2} \;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\; \ln\eta.$$
The progress consists of the fact that the receiver no longer computes an expo-
nential function of the observation. It has to compute ln(η), but this is done once
and for all.
Without loss of essential generality, assume $c_1 > c_0$. Then we can divide both sides by $\frac{c_1-c_0}{\sigma^2}$ (which is positive) without changing the outcome of the above comparison. We can further simplify by moving the constants to the right. The result is the simple test
$$\hat{H}_{\mathrm{MAP}}(y) = \begin{cases} 1, & y \ge \theta\\ 0, & \text{otherwise,}\end{cases}$$
[Figure 2.6: the densities $f_{Y|H}(y|0)$ and $f_{Y|H}(y|1)$, centered at $c_0$ and $c_1$, with the threshold θ between them. When $P_H(0) = P_H(1)$, the decision threshold θ is the midpoint between $c_0$ and $c_1$. The shaded area represents the probability of error conditioned on H = 0.]
where
$$\theta = \frac{\sigma^2}{c_1-c_0}\ln\eta + \frac{c_0+c_1}{2}.$$
Notice that if $P_H(0) = P_H(1)$, then $\ln\eta = 0$ and the threshold θ becomes the midpoint $\frac{c_0+c_1}{2}$ (Figure 2.6).
We now determine the error probability.
$$P_e(0) = \Pr\{Y > \theta \mid H = 0\} = \int_\theta^\infty f_{Y|H}(y|0)\,dy.$$
This is the probability that a Gaussian random variable with mean $c_0$ and variance $\sigma^2$ exceeds the threshold θ. From our review of the Q function we know immediately that $P_e(0) = Q\!\left(\frac{\theta-c_0}{\sigma}\right)$. Similarly, $P_e(1) = Q\!\left(\frac{c_1-\theta}{\sigma}\right)$. Finally,
$$P_e = P_H(0)\,Q\!\left(\frac{\theta-c_0}{\sigma}\right) + P_H(1)\,Q\!\left(\frac{c_1-\theta}{\sigma}\right).$$
The most common case is when $P_H(0) = P_H(1) = 1/2$. Then $\frac{\theta-c_0}{\sigma} = \frac{c_1-\theta}{\sigma} = \frac{c_1-c_0}{2\sigma} = \frac{d}{2\sigma}$, where d is the distance between $c_0$ and $c_1$. In this case, $P_e(0) = P_e(1) = P_e$, where
$$P_e = Q\!\left(\frac{d}{2\sigma}\right).$$
This result can be obtained straightforwardly without side calculations. As shown in Figure 2.6, the threshold is the midpoint between $c_0$ and $c_1$ and $P_e = P_e(0) = Q\!\left(\frac{d}{2\sigma}\right)$. This result should be known by heart.
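The following is a small simulation sketch, assuming Python with NumPy; the values of $c_0$, $c_1$, σ, and the sample size are illustrative. It applies the midpoint threshold test and compares the empirical error rate with $Q(d/(2\sigma))$.

```python
# Sketch: binary antipodal signaling over the scalar AWGN channel with the ML
# threshold at the midpoint; compare the empirical error rate with Q(d/(2*sigma)).
# (Python/NumPy assumed; c0, c1, sigma, and the sample size are illustrative.)
import numpy as np
from math import erfc, sqrt

c0, c1, sigma, num = -1.0, 1.0, 0.8, 200_000
rng = np.random.default_rng(0)

H = rng.integers(0, 2, num)                 # equiprobable messages
c = np.where(H == 0, c0, c1)                # transmitted signal
Y = c + sigma * rng.standard_normal(num)    # AWGN observation

theta = (c0 + c1) / 2                       # midpoint threshold (equiprobable prior)
H_hat = (Y >= theta).astype(int)

d = abs(c1 - c0)
Q = lambda x: 0.5 * erfc(x / sqrt(2.0))
print("empirical Pe  :", np.mean(H_hat != H))
print("Q(d/(2*sigma)):", Q(d / (2 * sigma)))
```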
As in the previous subsection, we assume that H takes values in {0, 1}. What is
new is that the signals are now n-tuples for n ≥ 1. So when H = 0, the transmitter
sends some c0 ∈ Rn and when H = 1, it sends c1 ∈ Rn . The noise added by the
channel is Z ∼ N (0, σ 2 In ) and independent of H.
From here on, we assume that the reader is familiar with the definitions and basic
results related to Gaussian random vectors. (See Appendix 2.10 for a review.) We
also assume familiarity with the notions of inner product, norm, and affine plane.
(See Appendix 2.12 for a review.) The inner product between the vectors u and v will be denoted by $\langle u, v\rangle$, whereas $\|u\| = \sqrt{\langle u, u\rangle}$ denotes the norm of u. We will make extensive use of these notations.
Even though for now the vector space is over the reals, in Chapter 7 we will
encounter complex vector spaces. Whether the vector space is over R or over C,
the notation is almost identical. For instance, if a and b are (column) n-tuples in $\mathbb{C}^n$, then $\langle a, b\rangle = b^\dagger a$, where † denotes conjugate transpose. The equality holds even if a and b are in $\mathbb{R}^n$, but in this case the conjugation is inconsequential and we could write $\langle a, b\rangle = b^T a$, where T denotes transpose. By default, we will use the more general notation for complex vector spaces. An equality that we will use frequently, therefore should be memorized, is
$$\|a \pm b\|^2 = \|a\|^2 + \|b\|^2 \pm 2\Re\{\langle a, b\rangle\}, \tag{2.8}$$
where ℜ{·} denotes the real part of the enclosed complex number. Of course we can drop the ℜ{·} for elements of a real vector space.
As done earlier, to derive a MAP decision rule, we start by writing down the
output statistic for each hypothesis
H=0: Y = c0 + Z ∼ N (c0 , σ 2 In )
H=1: Y = c1 + Z ∼ N (c1 , σ 2 In ),
or, equivalently,
$$H = 0:\quad Y \sim f_{Y|H}(y|0) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|y-c_0\|^2}{2\sigma^2}\right)$$
$$H = 1:\quad Y \sim f_{Y|H}(y|1) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|y-c_1\|^2}{2\sigma^2}\right).$$
$$\Lambda(y) = \frac{f_{Y|H}(y|1)}{f_{Y|H}(y|0)} = \exp\left(\frac{\|y-c_0\|^2 - \|y-c_1\|^2}{2\sigma^2}\right).$$
$$\ln\Lambda(y) = \frac{\|y-c_0\|^2 - \|y-c_1\|^2}{2\sigma^2} \tag{2.9}$$
$$= \left\langle y, \frac{c_1-c_0}{\sigma^2}\right\rangle + \frac{\|c_0\|^2 - \|c_1\|^2}{2\sigma^2}. \tag{2.10}$$
From (2.10), the MAP rule can be written as
$$\left\langle y, \frac{c_1-c_0}{\sigma^2}\right\rangle + \frac{\|c_0\|^2 - \|c_1\|^2}{2\sigma^2} \;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\; \ln\eta. \tag{2.11}$$
Notice the similarity with the corresponding expression of the scalar case. As for
the scalar case, we move the constants to the right and normalize to obtain
$$\langle y, \psi\rangle \;\overset{\hat{H}=1}{\underset{\hat{H}=0}{\gtrless}}\; \theta, \tag{2.12}$$
where
$$\psi = \frac{c_1-c_0}{d}$$
is the unit-length vector that points in the direction $c_1 - c_0$, $d = \|c_1 - c_0\|$ is the distance between the signals, and
$$\theta = \frac{\sigma^2}{d}\ln\eta + \frac{\|c_1\|^2 - \|c_0\|^2}{2d}$$
is the decision threshold. Hence the decision regions R0 and R1 are delimited by
the affine plane
$$\{y \in \mathbb{R}^n : \langle y, \psi\rangle = \theta\}.$$
For definiteness, we are assigning the points of the delimiting affine plane to R1 ,
but this is an arbitrary decision that has no effect on the error probability because
the probability that Y is on any given affine plane is zero.
We obtain additional geometrical insight by considering those y for which (2.9)
is constant. The situation is depicted in Figure 2.7, where the signed distance p is
positive if the delimiting affine plane lies in the direction pointed by ψ with respect
to c0 and q is positive if the affine plane lies in the direction pointed by −ψ with
respect to c1 . (In the figure, both p and q are positive.) By Pythagoras’ theorem
applied to the two right triangles with common edge, for all y on the affine plane,
$\|y - c_0\|^2 - \|y - c_1\|^2$ equals $p^2 - q^2$.
[Figure 2.7: the affine plane delimiting R_0 and R_1. The plane is orthogonal to ψ; p is the signed distance from c_0 to the plane (along ψ), q the signed distance from c_1 (along −ψ), and y − c_0, y − c_1 form two right triangles with a common edge.]
For y on the delimiting affine plane, we have both
$$\|y-c_0\|^2 - \|y-c_1\|^2 = p^2 - q^2 \quad\text{and}\quad \|y-c_0\|^2 - \|y-c_1\|^2 = 2\sigma^2\ln\eta,$$
the second because (2.9) equals $\ln\eta$ on the plane. Using $p + q = d$ and solving for p and q yields
$$p = \frac{d}{2} + \frac{\sigma^2\ln\eta}{d}, \qquad q = \frac{d}{2} - \frac{\sigma^2\ln\eta}{d}.$$
When PH (0) = PH (1) = 12 , the delimiting affine plane is the set of y ∈ Rn for
which (2.9) equals 0. These are the points y that are at the same distance from
c0 and from c1 . Hence, R0 contains all the points y ∈ Rn that are closer to c0
than to c1 .
A few additional observations are in order.
• The vector ψ is not affected by the prior but the threshold θ is. Hence the prior
affects the position but not the orientation of the delimiting affine plane. As
one would expect, the plane moves away from c0 when PH (0) increases. This is
consistent with our intuition that the decoding region for a hypothesis becomes
larger as the probability of that hypothesis increases.
• The above-mentioned effect of the prior is amplified when σ 2 increases. This is
also consistent with our intuition that the decoder relies less on the observation
and more on the prior when the observation becomes noisier.
• Notice the similarity of (2.9) and (2.10) with (2.7). This suggests a tight relationship between the scalar and the vector case. We can gain additional insight by placing the origin of a new coordinate system at $\frac{c_0+c_1}{2}$ and by letting the first coordinate be in the direction of $\psi = \frac{c_1-c_0}{d}$, where again $d = \|c_1-c_0\|$. In this new coordinate system, H = 0 is mapped into the vector $\tilde{c}_0 = (-\frac{d}{2}, 0, \ldots, 0)^T$ and H = 1 is mapped into $\tilde{c}_1 = (\frac{d}{2}, 0, \ldots, 0)^T$. If $\tilde{y} = (\tilde{y}_1, \ldots, \tilde{y}_n)^T$ is the channel output in this new coordinate system, $\langle\tilde{y}, \psi\rangle = \tilde{y}_1$. This shows that for a binary decision, the vector case is essentially the scalar case embedded in an n-dimensional space.
As for the scalar case, we compute the probability of error by conditioning
on H = 0 and H = 1 and then remove the conditioning by averaging: Pe =
Pe (0)PH (0) + Pe (1)PH (1).
When H = 0, $Y = c_0 + Z$ and the MAP decoder makes the wrong decision when $\langle Z, \psi\rangle \ge p$, i.e. when the projection of Z onto the directional unit vector ψ has (signed) length that is equal to or greater than p. That this is the condition for an error should be clear from Figure 2.7, but it can also be derived by inserting $Y = c_0 + Z$ into $\langle Y, \psi\rangle \ge \theta$ and using (2.8). Since $\langle Z, \psi\rangle$ is a zero-mean Gaussian random variable of variance $\sigma^2$ (see Appendix 2.10), we obtain
$$P_e(0) = Q\!\left(\frac{p}{\sigma}\right) = Q\!\left(\frac{d}{2\sigma} + \frac{\sigma\ln\eta}{d}\right).$$
[Figure: three codewords c_0, c_1, c_2 in the plane with the corresponding decision regions R_0, R_1, R_2.]
y ∈ R falls outside the decoding region R0 . This is the case if the noise Z ∈ R is
larger than d/2, where d = ci − ci−1 , i = 1, . . . , 5. Thus
$$P_e(0) = \Pr\left\{Z > \frac{d}{2}\right\} = Q\!\left(\frac{d}{2\sigma}\right).$$
By symmetry, $P_e(5) = P_e(0)$. For i ∈ {1, 2, 3, 4}, the probability of error when H = i is the probability that the event $\{Z \ge \frac{d}{2}\} \cup \{Z < -\frac{d}{2}\}$ occurs. This event is the union of disjoint events. Its probability is the sum of the probabilities of the individual events. Hence
$$P_e(i) = \Pr\left\{\left\{Z \ge \frac{d}{2}\right\} \cup \left\{Z < -\frac{d}{2}\right\}\right\} = 2\Pr\left\{Z \ge \frac{d}{2}\right\} = 2Q\!\left(\frac{d}{2\sigma}\right), \qquad i \in \{1, 2, 3, 4\}.$$
Finally, $P_e = \frac{2}{6}Q\!\left(\frac{d}{2\sigma}\right) + \frac{4}{6}\,2Q\!\left(\frac{d}{2\sigma}\right) = \frac{5}{3}Q\!\left(\frac{d}{2\sigma}\right)$. We see immediately how to generalize. For a PAM constellation of m points (m a positive integer), the error probability is
$$P_e = \left(2 - \frac{2}{m}\right)Q\!\left(\frac{d}{2\sigma}\right).$$
[Figure: the 6-PAM constellation c_0, ..., c_5 on the real line, with adjacent points spaced d apart and decision regions R_0, ..., R_5.]
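A simulation sketch of m-PAM with minimum-distance decoding (Python/NumPy assumed; m, d, σ, and the sample size are illustrative choices), compared against the formula $(2 - \frac{2}{m})Q(d/(2\sigma))$ derived above.

```python
# Sketch: error probability of m-PAM with minimum-distance decoding on the
# discrete-time AWGN channel, compared with (2 - 2/m) Q(d/(2*sigma)).
# (Python/NumPy assumed; m, d, sigma, and the sample size are illustrative.)
import numpy as np
from math import erfc, sqrt

m, d, sigma, num = 6, 2.0, 0.7, 300_000
rng = np.random.default_rng(1)
points = d * np.arange(m)                    # c_0, ..., c_{m-1}, spaced d apart

H = rng.integers(0, m, num)
Y = points[H] + sigma * rng.standard_normal(num)

# Minimum-distance decoding: pick the nearest constellation point.
H_hat = np.argmin(np.abs(Y[:, None] - points[None, :]), axis=1)

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))
print("empirical Pe:", np.mean(H_hat != H))
print("formula   Pe:", (2 - 2 / m) * Q(d / (2 * sigma)))
```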
example 2.6 (m-QAM) Figure 2.10 shows the signal set {c0 , c1 , c2 , c3 } ⊂ R2 for
4-ary quadrature amplitude modulation (QAM). We consider signals as points in
R2 . (We could choose to consider signals as points in C, but we have to postpone
this view until we know how to deal with complex valued noise.) The noise is
Z ∼ N (0, σ 2 I2 ) and the observable, when H = i, is Y = ci + Z. We assume that
the receiver implements an ML decision rule, which for the AWGN channel means
minimum-distance decoding. The decoding region for c0 is the first quadrant, for
c1 the second quadrant, etc. When H = 0, the decoder makes the correct decision
if $\{Z_1 > -\frac{d}{2}\} \cap \{Z_2 \ge -\frac{d}{2}\}$, where d is the minimum distance among signal points. This is the intersection of independent events. Hence the probability of the intersection is the product of the probability of each event, i.e.
$$P_c(0) = \Pr\left\{\left\{Z_1 \ge -\frac{d}{2}\right\} \cap \left\{Z_2 \ge -\frac{d}{2}\right\}\right\} = Q^2\!\left(-\frac{d}{2\sigma}\right) = \left(1 - Q\!\left(\frac{d}{2\sigma}\right)\right)^2.$$
[Figure 2.10: the 4-QAM constellation c_0, c_1, c_2, c_3 in the (y_1, y_2) plane with nearest-neighbor spacing d; the right panel shows the noise coordinates (z_1, z_2) relative to the transmitted point.]
$$P_e(0) = \Pr\left\{\left\{Z_1 \le -\frac{d}{2}\right\} \cup \left\{Z_2 \le -\frac{d}{2}\right\}\right\}$$
$$= \Pr\left\{Z_1 \le -\frac{d}{2}\right\} + \Pr\left\{Z_2 \le -\frac{d}{2}\right\} - \Pr\left\{\left\{Z_1 \le -\frac{d}{2}\right\} \cap \left\{Z_2 \le -\frac{d}{2}\right\}\right\}$$
$$= 2Q\!\left(\frac{d}{2\sigma}\right) - Q^2\!\left(\frac{d}{2\sigma}\right).$$
Notice that, in determining Pc (0) (Example 2.6), we compute the probability of
the intersection of independent events (which is the product of the probability of
the individual events) whereas in determining Pe (0) without passing through Pc (0)
(this example), we compute the probability of the union of events that are not
disjoint (which is not the sum of the probability of the individual events).
[Figure: the hypothesis H is observed through two noisy branches affected by Z_1 and Z_2, producing Y_1 and Y_2, both of which are fed to the receiver that outputs Ĥ.]
example 2.12 Regardless of the distribution on H, the binary test (2.4) depends
on Y only through the likelihood ratio Λ(Y ). Hence H → Λ(Y ) → Y must hold,
which makes the likelihood ratio a sufficient statistic. Notice that Λ(y) is a scalar
even when y is an n-tuple.
The following result is a useful tool in verifying that a function T (y) is a sufficient
statistic. It is proved in Exercise 2.22.
We will often use the notion of indicator function. Recall that if A is an arbitrary
set, the indicator function 1{x ∈ A} is defined as
$$\mathbb{1}\{x \in A\} = \begin{cases} 1, & x \in A\\ 0, & \text{otherwise.}\end{cases}$$
Here is a simple and extremely useful bound. Recall that for general events A, B
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \le P(A) + P(B).$$
More generally, using induction, we obtain the union bound
$$P\left(\bigcup_{i=1}^{m} A_i\right) \le \sum_{i=1}^{m} P(A_i), \tag{UB}$$
Our goal is to bound the error probability
$$P_e(i) = \Pr\{Y \in R_i^c \mid H = i\} = \int_{R_i^c} f_{Y|H}(y|i)\,dy,$$
where $R_i^c$ denotes the complement of $R_i$. If we are able to evaluate the above
integral for every i, then we are able to determine the probability of error exactly.
The bound that we derive is useful if we are unable to evaluate the above integral.
For $i \ne j$ define
$$B_{i,j} = \left\{ y : P_H(j)\,f_{Y|H}(y|j) \ge P_H(i)\,f_{Y|H}(y|i) \right\}.$$
Bi,j is the set of y for which the a posteriori probability of H given Y = y is at
least as high for j as it is for i. Roughly speaking,3 it contains the ys for which a
MAP decision rule would choose j over i.
³ A y for which the a posteriori probability is the same for i and for j is contained in both $B_{i,j}$ and $B_{j,i}$.
$$R_i^c \subseteq \bigcup_{j: j\ne i} B_{i,j}. \tag{2.15}$$
To see that the above inclusion holds, consider an arbitrary $y \in R_i^c$. By definition, there is at least one $k \in \mathcal{H}$ such that $P_H(k)\,f_{Y|H}(y|k) \ge P_H(i)\,f_{Y|H}(y|i)$. Hence $y \in B_{i,k}$.
The reader may wonder why we do not have equality in (2.15). To see that
equality may or may not apply, consider a y that belongs to Bi,l for some l. It
could be so because PH (l)fY |H (y|l) = PH (i)fY |H (y|i) (notice the equality sign).
To simplify the argument, let us assume that for the chosen y there is only one
such l. The MAP decoding rule does not prescribe whether y should be in the
decoding region of i or l. If it is in that of i, then equality in (2.15) does not hold.
If none of the y for which PH (l)fY |H (y|l) = PH (i)fY |H (y|i) for some l has been
assigned to Ri then we have equality in (2.15). In one sentence, we have equality
if all the ties have been resolved against i.
We are now in the position to upper bound Pe (i). Using (2.15) and the union
bound we obtain
$$P_e(i) = \Pr\{Y \in R_i^c \mid H = i\} \le \Pr\Big\{Y \in \bigcup_{j:j\ne i} B_{i,j} \;\Big|\; H = i\Big\}$$
$$\le \sum_{j:j\ne i} \Pr\{Y \in B_{i,j} \mid H = i\} \tag{2.16}$$
$$= \sum_{j:j\ne i} \int_{B_{i,j}} f_{Y|H}(y|i)\,dy.$$
The gain is that it is typically easier to integrate over $B_{i,j}$ than over $R_i^c$. For instance, when the channel is AWGN and the decision rule is ML, $B_{i,j}$ is the set of points in $\mathbb{R}^n$ that are at least as close to $c_j$ as they are to $c_i$. Figure 2.12 depicts this situation. In this case,
$$\int_{B_{i,j}} f_{Y|H}(y|i)\,dy = Q\!\left(\frac{\|c_j - c_i\|}{2\sigma}\right),$$
and the union bound yields the simple expression
$$P_e(i) \le \sum_{j:j\ne i} Q\!\left(\frac{\|c_j - c_i\|}{2\sigma}\right).$$
[Figure 2.12: the shape of B_{i,j} for AWGN channels and ML decision: the half-space of points at least as close to c_j as to c_i.]
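A sketch that evaluates the union bound $P_e(i) \le \sum_{j\ne i} Q(\|c_j - c_i\|/(2\sigma))$ for an arbitrary constellation; Python with NumPy is assumed, and the 4-QAM constellation and σ below are illustrative choices.

```python
# Sketch: union bound Pe(i) <= sum_j Q(||c_j - c_i|| / (2*sigma)) for ML decoding
# on the AWGN channel. (Python/NumPy assumed; constellation and sigma illustrative.)
import numpy as np
from math import erfc, sqrt

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))

sigma = 0.6
C = np.array([[1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]])  # codewords c_i

for i, ci in enumerate(C):
    bound = sum(Q(np.linalg.norm(cj - ci) / (2 * sigma))
                for j, cj in enumerate(C) if j != i)
    print(f"union bound on Pe({i}) = {bound:.4e}")
```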
for a general fY |H . Notice that the above integral is the probability of error under
H = i when there are only two hypotheses, the other hypothesis is H = j, and
the priors are proportional to PH (i) and PH (j).
example 2.15 (m-PSK) Figure 2.13 shows a signal set for 8-ary PSK (phase-
shift keying). m-PSK is defined for all integers m ≥ 2. Formally, the signal
transmitted when H = i , i ∈ H = {0, 1, . . . , m − 1}, is
$$c_i = \sqrt{E_s}\begin{pmatrix}\cos\frac{2\pi i}{m}\\ \sin\frac{2\pi i}{m}\end{pmatrix}.$$
For now $\sqrt{E_s}$ is just the radius of the PSK constellation. As we will see, $E_s = \|c_i\|^2$ is (proportional to) the energy required to generate $c_i$.
[Figure 2.13: the 8-PSK constellation c_0, ..., c_7 on a circle, with the corresponding decision regions R_0, ..., R_7.]
The above expression is rather complicated. Let us see what we obtain through the
union bound.
[Figure 2.14: the codeword c_4 with its neighbors c_3 and c_5, the decision region R_4, the half-planes B_{4,3} and B_{4,5}, and their intersection B_{4,3} ∩ B_{4,5}.]
The above expression can be used to upper and lower bound $P_e(i)$. In fact, if we lower bound the last term by setting it to zero, we obtain the upper bound that we have just derived. To the contrary, if we upper bound the last term, we obtain a lower bound to $P_e(i)$. To do so, observe that $R_i^c$ is the union of (m − 1) disjoint cones, one of which is $B_{i,i-1} \cap B_{i,i+1}$ (see again Figure 2.14). The sum of the integrals of $f_{Y|H}(\cdot|i)$ over those cones is $P_e(i)$. If all those integrals gave the same result (which is not the case) each would equal $\frac{P_e(i)}{m-1}$. From the figure, the integral of $f_{Y|H}(\cdot|i)$ over $B_{i,i-1} \cap B_{i,i+1}$ is clearly smaller than that over the other cones. Hence its value must be less than $\frac{P_e(i)}{m-1}$. Mathematically,
$$\Pr\{Y \in (B_{i,i-1} \cap B_{i,i+1}) \mid H = i\} \le \frac{P_e(i)}{m-1}.$$
Inserting in the previous expression, solving for $P_e(i)$ and using the fact that $P_e(i) = P_e$ yields the desired lower bound
$$P_e \ge 2Q\!\left(\sqrt{\frac{E_s}{\sigma^2}}\,\sin\frac{\pi}{m}\right)\frac{m-1}{m}.$$
The ratio between the upper and the lower bound is the constant $\frac{m}{m-1}$. For m large, the bounds become very tight.
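The two m-PSK bounds are easy to evaluate numerically; the sketch below (Python assumed; m and the values of $E_s/\sigma^2$ are illustrative) prints the sandwich obtained above.

```python
# Sketch: upper and lower bounds on the m-PSK error probability derived above.
# (Python assumed; m and Es/sigma^2 are illustrative choices.)
from math import erfc, sqrt, sin, pi

Q = lambda x: 0.5 * erfc(x / sqrt(2.0))

m = 8
for Es_over_sigma2 in [4.0, 10.0, 20.0]:
    upper = 2 * Q(sqrt(Es_over_sigma2) * sin(pi / m))
    lower = upper * (m - 1) / m
    print(f"Es/sigma^2 = {Es_over_sigma2:5.1f}:  {lower:.3e} <= Pe <= {upper:.3e}")
```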
The way we upper-bounded P r{Y ∈ Bi,i−1 ∩ Bi,i+1 |H = i} is not the only way to
proceed. Alternatively, we could use the fact that Bi,i−1 ∩ Bi,i+1 is included in Bi,k
where k is the index of the codeword opposed to $c_i$. (In Figure 2.14, $B_{4,3} \cap B_{4,5} \subset B_{4,0}$.) Hence $\Pr\{Y \in B_{i,i-1} \cap B_{i,i+1} \mid H = i\} \le \Pr\{Y \in B_{i,k} \mid H = i\} = Q\!\left(\sqrt{E_s}/\sigma\right)$.
This goes to zero as Es /σ 2 → ∞. It implies that the lower bound obtained this way
becomes tight as Es /σ 2 becomes large.
It is not surprising that the upper bound to Pe (i) becomes tighter as m or
Es /σ 2 (or both) become large. In fact it should be clear that under those conditions
P r{Y ∈ Bi,i−1 ∩ Bi,i+1 |H = i} becomes smaller.
PAM, QAM, and PSK are widely used in modern communications systems. See
Section 2.7 for examples of standards using these constellations.
and we have used this bound for the AWGN channel. With the bound, instead of
having to compute
$$\Pr\{Y \in R_i^c \mid H = i\} = \int_{R_i^c} f_{Y|H}(y|i)\,dy,$$
which requires integrating over a possibly complicated region $R_i^c$, we have only to compute
$$\Pr\{Y \in B_{i,j} \mid H = i\} = \int_{B_{i,j}} f_{Y|H}(y|i)\,dy.$$
The latter integral is simply $Q(d_{i,j}/\sigma)$, where $d_{i,j}$ is the distance between $c_i$ and the affine plane bounding $B_{i,j}$. For an ML decision rule, $d_{i,j} = \frac{\|c_i - c_j\|}{2}$.
What if the channel is not AWGN? Is there a relatively simple expression for
P r{Y ∈ Bi,j |H = i} that applies for general channels? Such an expression does
exist. It is the Bhattacharyya bound that we now derive.4 We will need it only for
those i for which PH (i) > 0. Hence, for the derivation that follows, we assume that
it is the case.
The definition of $B_{i,j}$ may be rewritten in either of the following two forms
$$\left\{ y : \frac{P_H(j)\,f_{Y|H}(y|j)}{P_H(i)\,f_{Y|H}(y|i)} \ge 1 \right\} = \left\{ y : \sqrt{\frac{P_H(j)\,f_{Y|H}(y|j)}{P_H(i)\,f_{Y|H}(y|i)}} \ge 1 \right\},$$
except that the above fraction is not defined when fY |H (y|i) vanishes. This excep-
tion apart, we see that
#
PH (j)fY |H (y|j)
1{y ∈ Bi,j } ≤
PH (i)fY |H (y|i)
is true when y is inside Bi,j ; it is also true when outside because the left side
vanishes and the right is never negative. We do not have to worry about the
exception because we will use
$$f_{Y|H}(y|i)\,\mathbb{1}\{y \in B_{i,j}\} \le f_{Y|H}(y|i)\sqrt{\frac{P_H(j)\,f_{Y|H}(y|j)}{P_H(i)\,f_{Y|H}(y|i)}} = \sqrt{\frac{P_H(j)}{P_H(i)}}\sqrt{f_{Y|H}(y|i)\,f_{Y|H}(y|j)},$$
which is obviously true when fY |H (y|i) vanishes.
We are now ready to derive the Bhattacharyya bound:
$$\Pr\{Y \in B_{i,j} \mid H = i\} = \int_{y\in B_{i,j}} f_{Y|H}(y|i)\,dy = \int_{y\in\mathbb{R}^n} f_{Y|H}(y|i)\,\mathbb{1}\{y \in B_{i,j}\}\,dy \le \sqrt{\frac{P_H(j)}{P_H(i)}}\int_{y\in\mathbb{R}^n}\sqrt{f_{Y|H}(y|i)\,f_{Y|H}(y|j)}\;dy. \tag{2.17}$$
What makes the last integral appealing is that we integrate over the entire Rn . The
above bound takes a particularly simple form when there are only two hypotheses
of equal probability. In this case,
$$P_e(0) = P_e(1) = P_e \le \int_{y\in\mathbb{R}^n}\sqrt{f_{Y|H}(y|0)\,f_{Y|H}(y|1)}\;dy. \tag{2.18}$$
As shown in Exercise 2.32, for discrete memoryless channels the bound further
simplifies.
⁴ There are two versions of the Bhattacharyya bound. Here we derive the one that has the simpler derivation. The other version, which is tighter by a factor 2, is derived in Exercises 2.29 and 2.30.
As the name indicates, the union Bhattacharyya bound combines (2.16) and
(2.17), namely
$$P_e(i) \le \sum_{j:j\ne i}\Pr\{Y \in B_{i,j}\mid H = i\} \le \sum_{j:j\ne i}\sqrt{\frac{P_H(j)}{P_H(i)}}\int_{y\in\mathbb{R}^n}\sqrt{f_{Y|H}(y|i)\,f_{Y|H}(y|j)}\;dy.$$
[Figure: the binary erasure channel: each input X ∈ {0, 1} is delivered unchanged with probability 1 − p and erased (Y = Δ) with probability p.]
where in (a) we used the fact that the first factor under the square root vanishes if y
contains 0s and the second vanishes if y contains 1s. Hence the only non-vanishing
term in the sum is the one for which $y_i = \Delta$ for all i. The same bound applies for
H = 1. Hence $P_e \le \frac{1}{2}p^n + \frac{1}{2}p^n = p^n$.
If we use the tighter version of the union Bhattacharyya bound, which as men-
tioned earlier is tighter by a factor of 2, then we obtain
$$P_e \le \frac{1}{2}p^n.$$
For the binary erasure channel and the two codewords $c_0$ and $c_1$ we can actually compute the exact probability of error. An error can occur only if $Y = (\Delta, \Delta, \ldots, \Delta)^T$, and in this case it occurs with probability $\frac{1}{2}$. Hence,
$$P_e = \frac{1}{2}\Pr\{Y = (\Delta, \Delta, \ldots, \Delta)^T\} = \frac{1}{2}p^n.$$
The Bhattacharyya bound is tight for the scenario considered in this example!
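A quick Monte Carlo sketch (Python/NumPy assumed; p, n, and the sample size are illustrative) confirming that the exact error probability $\frac{1}{2}p^n$ is met and that the union Bhattacharyya bound $p^n$ is off by exactly a factor of 2 in this scenario.

```python
# Sketch: Monte Carlo check of Pe = 0.5 * p**n for two length-n codewords that
# differ in every position, sent over the binary erasure channel; compared with
# the union Bhattacharyya bound p**n. (Python/NumPy assumed; p, n illustrative.)
import numpy as np

p, n, num = 0.4, 5, 500_000
rng = np.random.default_rng(2)

H = rng.integers(0, 2, num)                  # equiprobable hypotheses
erased = rng.random((num, n)) < p            # erasure pattern of each transmission
all_erased = erased.all(axis=1)
# Any surviving position identifies the codeword; otherwise the receiver guesses.
H_hat = np.where(all_erased, rng.integers(0, 2, num), H)

print("empirical Pe       :", np.mean(H_hat != H))
print("exact (1/2) p**n   :", 0.5 * p**n)
print("Bhattacharyya p**n :", p**n)
```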
2.7 Summary
The maximum a posteriori probability (MAP) rule is a decision rule that does
exactly what the name implies – it maximizes the a posteriori probability – and in
so doing it maximizes the probability that the decision is correct. With hindsight,
the key idea is quite simple and it applies even when there is no observable. Let
us review it.
Assume that a coin is flipped and we have to guess the outcome. We model the
coin by the random variable H ∈ {0, 1}. All we know is PH (0) and PH (1). Suppose
that PH (0) ≤ PH (1). Clearly we have the highest chance of being correct if we
guess Ĥ = 1 every time we perform the experiment of flipping the coin. We will
be correct if indeed H = 1, and this has probability PH (1). More generally, for an
arbitrary number m of hypotheses, we choose (one of) the i that maximizes PH (·)
and the probability of being correct is PH (i).
It is more interesting when there is some “side information”. The side informa-
tion is obtained when we observe the outcome of a related random variable Y .
Once we have made the observation Y = y, our knowledge about the distribution
of H gets updated from the prior distribution PH (·) to the posterior distribution
PH|Y (·|y). What we have said in the previous paragraphs applies with the posterior
instead of the prior.
In a typical example PH (·) is constant whereas for the observed y, PH|Y (·|y) is
strongly biased in favor of one hypothesis. If it is strongly biased, the observable
has been very informative, which is what we hope of course.
Often PH|Y is not given to us, but we can find it from PH and fY |H via Bayes’
rule. Although PH|Y is the most fundamental quantity associated to a MAP test
and therefore it would make sense to write the test in terms of PH|Y , the test is
typically written in terms of PH and fY |H because these are the quantities that
are specified as part of the model.
Ideally a receiver performs a MAP decision. We have emphasized the case in
which all hypotheses have the same probability as this is a common assumption
in digital communication. Then the MAP and the ML rule are identical.
The following is an example of how the posterior becomes more and more
selective as the number of observations increases. The example also shows that
the posterior becomes less selective if the observations are more “noisy”.
example 2.17 Assume H ∈ {0, 1} and PH (0) = PH (1) = 1/2. The outcome of
H is communicated across a binary symmetric channel (BSC) of crossover proba-
bility p < 1/2 via a transmitter that sends n 0s when H = 0 and n 1s when H = 1.
The BSC has input alphabet $\mathcal{X} = \{0, 1\}$, output alphabet $\mathcal{Y} = \mathcal{X}$, and transition probability $p_{Y|X}(y|x) = \prod_{i=1}^{n} p_{Y|X}(y_i|x_i)$, where $p_{Y|X}(y_i|x_i)$ equals $1-p$ if $y_i = x_i$ and p otherwise. (We obtain a BSC, for instance, if we place an appropriately
chosen 1-bit quantizer at the output of the AWGN channel used with a binary
[Figure 2.16: $P_{H|Y}(0|y)$ as a function of the number k of 1s in y, for four cases: (a) p = 0.25, n = 1; (b) p = 0.25, n = 50; (c) p = 0.47, n = 1; (d) p = 0.47, n = 50.]
Figure 2.16 depicts the behavior of PH|Y (0|y) as a function of the number k of 1s
in y. For the top two figures, p = 0.25. We see that when n = 50 (top right figure),
the posterior is very biased in favor of one or the other hypothesis, unless the number k
of observed 1s is nearly n/2 = 25. Comparing to n = 1 (top left figure), we see that
many observations allow the receiver to make a more confident decision. This is
true also for p = 0.47 (bottom row), but we see that with the crossover probability
p close to 1/2, there is a smoother transition between the region in favor of one
hypothesis and the region in favor of the other. If we make only one observation
(bottom left figure), then there is only a slight difference between the posterior for
H = 0 and that for H = 1. This is the worst of the four cases (fewer observations
through noisier channel). The best situation is of course the one of figure (b) (more
observations through a more reliable channel).
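The curves of Figure 2.16 are easy to reproduce; the sketch below (plain Python, with p and n matching the four cases of the figure) evaluates the posterior $P_{H|Y}(0|y)$ as a function of the number k of 1s in y, for the transmitter of Example 2.17.

```python
# Sketch: the posterior P_{H|Y}(0|y) of Example 2.17 as a function of the number
# k of 1s in y, for a BSC with crossover probability p and uniform prior.
# (Python assumed; the values of p and n reproduce the cases of Figure 2.16.)
def posterior_H0(k, n, p):
    like0 = p**k * (1 - p)**(n - k)        # likelihood given H=0 (codeword of n zeros): k flips
    like1 = p**(n - k) * (1 - p)**k        # likelihood given H=1 (codeword of n ones): n-k flips
    return like0 / (like0 + like1)         # uniform prior cancels

for p, n in [(0.25, 1), (0.25, 50), (0.47, 1), (0.47, 50)]:
    samples = [posterior_H0(k, n, p) for k in range(0, n + 1, max(1, n // 5))]
    print(f"p={p}, n={n}:", [round(v, 3) for v in samples])
```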
The following theorem lists a number of handy facts about unitary matrices.
Most of them are straightforward. Proofs can be found in [12, page 67].
proving that λ must be positive, because it is the ratio of two positive numbers.
If A is positive semidefinite, then the numerator of λ = u† Au/u† u can vanish.
A = U DV † ,
(a) The columns of V are the eigenvectors of A† A. The last n − k columns span
the null space of A.
(b) The columns of U are eigenvectors of AA† . The first k columns span the
range of A.
(c) If $m \ge n$ then
$$D = \begin{pmatrix} \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}) \\ 0_{(m-n)\times n} \end{pmatrix},$$
Note 1: Recall that the set of non-zero eigenvalues of AB equals the set of non-zero
eigenvalues of BA, see e.g. [12, Theorem 1.3.29]. Hence the non-zero eigenvalues
in (c) and (d) are the same.
Observe that
$$u_i^\dagger u_j = \frac{1}{\sqrt{\lambda_i\lambda_j}}\, v_i^\dagger A^\dagger A v_j = \frac{\lambda_j}{\sqrt{\lambda_i\lambda_j}}\, v_i^\dagger v_j = \delta_{ij}, \qquad 1 \le i, j \le k.$$
i.e. $A = UDV^\dagger$. For $i = 1, 2, \ldots, m$,
$$AA^\dagger u_i = UDV^\dagger VD^\dagger U^\dagger u_i = UDD^\dagger U^\dagger u_i = u_i\lambda_i,$$
where in the last equality we use the fact that U † ui contains 1 at position i and 0
elsewhere, and DD† = diag(λ1 , λ2 , . . . , λk , 0, . . . , 0). This shows that λi is also an
eigenvalue of AA† . We have also shown that {vi : i = k + 1, . . . , n} spans the null
space of A and from (2.22) we see that {ui : i = 1, . . . , k} spans the range of A.
$$f_Y(y) = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}. \tag{2.23}$$
example 2.27 If $g(x) = ax + b$ then $f_Y(y) = \frac{f_X\left(\frac{y-b}{a}\right)}{|a|}$.
[Figure: an invertible transformation y = g(x) and the corresponding densities f_X(x) and f_Y(y).]
$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{|\det J|}\, f_{R,\Theta}(r, \theta),$$
$$f_{R,\Theta}(r, \theta) = \frac{r}{2\pi}\exp\left(-\frac{r^2}{2}\right).$$
Since fR,Θ (r, θ) depends only on r, we infer that R and Θ are independent
random variables and that Θ is uniformly distributed in [0, 2π). Hence
$$f_\Theta(\theta) = \begin{cases}\frac{1}{2\pi}, & \theta \in [0, 2\pi)\\ 0, & \text{otherwise}\end{cases}
\qquad\text{and}\qquad
f_R(r) = \begin{cases} r\, e^{-\frac{r^2}{2}}, & r \ge 0\\ 0, & \text{otherwise.}\end{cases}$$
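A small numerical sketch (Python/NumPy assumed) of the converse direction: sample R from the Rayleigh density above and Θ uniformly, and check that $X_1 = R\cos\Theta$ and $X_2 = R\sin\Theta$ behave like independent N(0, 1) random variables.

```python
# Sketch: sample R with density r*exp(-r^2/2) (via its inverse CDF) and Theta
# uniform on [0, 2*pi), then check that X1 = R*cos(Theta), X2 = R*sin(Theta)
# look like two independent N(0,1) random variables. (Python/NumPy assumed.)
import numpy as np

rng = np.random.default_rng(3)
num = 200_000

U = rng.random(num)
R = np.sqrt(-2.0 * np.log(1.0 - U))        # inverse of F_R(r) = 1 - exp(-r^2/2)
Theta = 2 * np.pi * rng.random(num)

X1, X2 = R * np.cos(Theta), R * np.sin(Theta)
print("means:", X1.mean().round(3), X2.mean().round(3))
print("vars :", X1.var().round(3), X2.var().round(3))
print("corr :", np.corrcoef(X1, X2)[0, 1].round(3))
```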
$$f_W(w) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(w-m)^2}{2\sigma^2}\right).$$
Because a Gaussian random variable is completely specified by its mean m and
variance σ 2 , we use the short-hand notation N (m, σ 2 ) to denote its pdf. Hence
W ∼ N (m, σ 2 ).
An n-dimensional random vector X is a mapping X : Ω → Rn . It can be
seen as a collection X = (X1 , X2 , . . . , Xn )T of n random variables. The pdf
of X is the joint pdf of X1 , X2 , . . . , Xn . The expected value of X, denoted by
E[X], is the n-tuple (E[X1 ], E[X2 ], . . . , E[Xn ])T . The covariance matrix of X is
KX = E[(X − E[X])(X − E[X])T ]. Notice that XX T is an n × n random matrix,
i.e. a matrix of random variables, and the expected value of such a matrix is, by
definition, the matrix whose components are the expected values of those random
variables. A covariance matrix is always Hermitian. This follows immediately from
the definitions.
The pdf of a vector W = (W1 , W2 , . . . , Wn )T that consists of independent and
identically distributed (iid) ∼ N (0, 1) components is
$$f_W(w) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{w_i^2}{2}\right) \tag{2.25}$$
$$= \frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{w^T w}{2}\right). \tag{2.26}$$
Z = AW, (2.27)
[Figure 2.18: an example in the (z_1, z_2) plane.]
Next we show that if a random vector has density as in (2.28), then it can be
written as in (2.27). Let Z ∈ Rm be such a random vector and let KZ be its
⁵ It is possible to play tricks and define a function that can be considered as being the density of a Gaussian random vector of singular covariance matrix. But what we gain in doing so is not worth the trouble.
KZ = U ΛU † , (2.29)
the diagonal matrix obtained by raising the diagonal elements of Λ to the power α. Then $Z = U\Lambda^{\frac{1}{2}}W = AW$ with $A = U\Lambda^{\frac{1}{2}}$ nonsingular. It remains to be
shown that fW (w) is as on the right-hand side of (2.26). It must be, because the
transformation from Rn to Rn that sends W to Z = AW is one-to-one. Hence the
density of fW (w) that leads to fZ (z) is unique. It must be (2.26), because (2.28)
was obtained from (2.26) assuming Z = AW .
Many authors use (2.28) to define a Gaussian random vector. We favor (2.27)
because it is more general (it does not depend on the covariance being nonsingular),
and because from this definition it is straightforward to prove a number of key
results associated to Gaussian random vectors. Some of these are dealt with in
the examples that follow.
example 2.34 (Gaussian random variables are not necessarily jointly Gaussian)
Let Y1 ∼ N (0, 1), let X ∈ {±1} be uniformly distributed, and let Y2 = Y1 X. Notice
that Y2 has the same pdf as Y1 . This follows from the fact that the pdf of Y1 is an
even function. Hence Y1 and Y2 are both Gaussian. However, they are not jointly
Gaussian. We come to this conclusion by observing that Y = Y1 + Y2 = Y1 (1 + X)
is 0 with probability 1/2. Hence Y cannot be Gaussian.
To prove the first equality relating a and b we consider the distance between the
vertex γ (common to a and b) and its projection onto the extension of c. As shown
in the figure, this distance may be computed in two ways obtaining a sin β and
b sin(π − α), respectively. The latter may be written as b sin(α). Hence a sin β =
b sin(α), which is the first equality. The second equality is proved similarly.
Most readers are familiar with the notion of vector space from a linear algebra
course. Unfortunately, some linear algebra courses for engineers associate vectors
to n-tuples rather than taking the axiomatic point of view – which is what we need.
A vector space (or linear space) consists of the following (see e.g. [10, 11] for more).
(1) A field F of scalars.6
(2) A set V of objects called vectors.7
(3) An operation called vector addition, which associates with each pair of vectors
α and β in V a vector α + β in V, in such a way that
(i) it is commutative: α + β = β + α;
(ii) it is associative: α + (β + γ) = (α + β) + γ for every α, β, γ in V;
(iii) there is a unique vector, called the zero vector and denoted by 0, such
that α + 0 = α for all α in V;
(iv) for each α in V, there is a β in V such that α + β = 0.
(4) An operation called scalar multiplication, which associates with each vector
α in V and each scalar a in F a vector aα in V, in such a way that
(i) 1α = α for every α in V;
(ii) (a1 a2 )α = a1 (a2 α) for every a1 , a2 in F;
⁶ In this book the field is almost exclusively R (the field of real numbers) or C (the field of complex numbers). In Chapter 6, where we talk about coding, we also work with the field F₂ of binary numbers.
⁷ We are concerned with two families of vectors: n-tuples and functions.
Given a vector space and nothing more, one can introduce the notion of a basis for
the vector space, but one does not have the tool needed to define an orthonormal
basis. Indeed the axioms of a vector space say nothing about geometric ideas such
as “length” or “angle”. To remedy this, one endows the vector space with the
notion of inner product.
definition 2.36 Let V be a vector space over C. An inner product on V is a
function that assigns to each ordered pair of vectors α, β in V a scalar $\langle\alpha, \beta\rangle$ in C in such a way that, for all α, β, γ in V and all scalars c in C,
(a) $\langle\alpha + \beta, \gamma\rangle = \langle\alpha, \gamma\rangle + \langle\beta, \gamma\rangle$ and $\langle c\alpha, \beta\rangle = c\langle\alpha, \beta\rangle$;
(b) $\langle\beta, \alpha\rangle = \langle\alpha, \beta\rangle^*$; (Hermitian symmetry)
(c) $\langle\alpha, \alpha\rangle \ge 0$ with equality if and only if α = 0.
It is implicit in (c) that $\langle\alpha, \alpha\rangle$ is real for all α ∈ V. From (a) and (b), we obtain the additional properties
(d) $\langle\alpha, \beta + \gamma\rangle = \langle\alpha, \beta\rangle + \langle\alpha, \gamma\rangle$ and $\langle\alpha, c\beta\rangle = c^*\langle\alpha, \beta\rangle$.
Notice that the above definition is also valid for a vector space over the field of real numbers, but in this case the complex conjugates appearing in (b) and (d) are superfluous. However, over the field of complex numbers they are necessary for any α ≠ 0, otherwise we could write
$$0 < \langle j\alpha, j\alpha\rangle = -1\langle\alpha, \alpha\rangle < 0,$$
where the first inequality follows from condition (c) and the fact that jα is a valid vector ($j = \sqrt{-1}$), and the equality follows from (a) and (d) without the complex conjugate. We see that the complex conjugate is necessary or else we can create the contradictory statement 0 < 0.
On Cn there is an inner product that is sometimes called the standard inner
product. It is defined on a = (a1 , . . . , an ) and b = (b1 , . . . , bn ) by
$$\langle a, b\rangle = \sum_j a_j b_j^*.$$
On $\mathbb{R}^n$, the standard inner product is often called the dot or scalar product.
example 2.37 The vector space Rn equipped with the dot product is an inner
product space and so is the vector space Cn equipped with the standard inner
product.
By means of the inner product, we introduce the notion of length, called norm,
of a vector α, via
$$\|\alpha\| = \sqrt{\langle\alpha, \alpha\rangle}.$$
Proof Statements (a) and (b) follow immediately from the definitions. We
postpone the proof of the Cauchy–Schwarz inequality to Example 2.43 as at
that time we will be able to make a more elegant proof based on the concept of a
projection. To prove the triangle inequality we use (2.31) and the Cauchy–Schwarz
inequality applied to $\Re\{\langle\alpha, \beta\rangle\} \le |\langle\alpha, \beta\rangle|$ to prove that $\|\alpha + \beta\|^2 \le (\|\alpha\| + \|\beta\|)^2$. Notice that $\Re\{\langle\alpha, \beta\rangle\} \le |\langle\alpha, \beta\rangle|$ holds with equality if and only if $\langle\alpha, \beta\rangle$ is a non-negative real. The Cauchy–Schwarz inequality holds with equality if and only if α and β are collinear. Both conditions for equality are satisfied if and only if one of α, β is a non-negative multiple of the other. The parallelogram equality follows immediately from (2.31) used twice, once with each sign.
[Figure: the vectors α, β, α + β, and α − β illustrating the triangle inequality and the parallelogram equality.]
At this point we could use the inner product and the norm to define the angle
between two vectors but we do not have any use for this. Instead, we will make
frequent use of the notion of orthogonality. Two vectors α and β are defined to
be orthogonal if α, β = 0.
example 2.39 This example and the two that follow are relevant for what we
do from Chapter 3 on. Let W = {w0 (t), . . . , wm−1 (t)} be a finite collection of
functions from R to C such that $\int_{-\infty}^{\infty}|w(t)|^2\,dt < \infty$ for all elements of W. Let
V be the complex vector space spanned by the elements of W, where the addition
of two functions and the multiplication of a function by a scalar are defined in
the obvious way. The reader should verify that the axioms of a vector space are
fulfilled. A vector space of functions will be called a signal space. The standard
inner product for functions from R to C is defined as
$$\langle\alpha, \beta\rangle = \int \alpha(t)\beta^*(t)\,dt,$$
but it is not a given that V with the standard inner product forms an inner product
space. It is straightforward to verify that the axioms (a), (b), and (d) of Definition 2.36 are fulfilled for all elements of V but axiom (c) is not necessarily fulfilled (see Example 2.40). If V is such that for all α ∈ V, $\langle\alpha, \alpha\rangle = 0$ implies that α is
the zero vector, then V endowed with the standard inner product forms an inner
product space. All we have said in this example applies also for the real vector
spaces spanned by functions from R to R.
example 2.40 Let V be the set of functions from R to R spanned by the function
that is zero everywhere, except at 0 where it takes value 1. It can easily be checked
that this is a vector space. It contains all the functions that are zero everywhere,
except at 0 where they can take on any value in R. Its zero vector is the function
that is 0 everywhere, including at 0. For all α in V, the standard inner product
$\langle\alpha, \alpha\rangle$ equals 0. Hence V with the standard inner product is not an inner product
space.
The problem highlighted with Example 2.40 is that for a general function α : I → C, $\int|\alpha(t)|^2\,dt = 0$ does not necessarily imply α(t) = 0 for all t ∈ I. It
is important to be aware of this fact. However, this potential problem will never
arise in practice because all electrical signals are continuous. Sometimes we work
out examples using signals that have discontinuities (e.g. rectangles) but even
then the problem will not arise unless we use rather bizarre signals.
example 2.41 Let p(t) be a complex-valued square-integrable function (i.e.
$\int|p(t)|^2\,dt < \infty$) and let $\int|p(t)|^2\,dt > 0$. For instance, p(t) could be the rectangular
pulse 1{t ∈ [0, T ]} for some T > 0. The set V = {cp(t) : c ∈ C} with the standard
inner product forms an inner product space. (In V, only the zero-pulse has zero
norm.)
[Figure: the decomposition of α into its projection $\alpha_{|\beta}$ along β and the orthogonal component $\alpha_{\perp\beta}$.]
Solving for c we obtain $c = \frac{\langle\alpha, \beta\rangle}{\|\beta\|^2}$. Hence
$$\alpha_{|\beta} = \frac{\langle\alpha, \beta\rangle}{\|\beta\|^2}\,\beta = \langle\alpha, \varphi\rangle\varphi \qquad\text{and}\qquad \alpha_{\perp\beta} = \alpha - \alpha_{|\beta},$$
where $\varphi = \frac{\beta}{\|\beta\|}$ is β scaled to unit norm. Notice that the projection of α on β does not depend on the norm of β. In fact, the norm of $\alpha_{|\beta}$ is $|\langle\alpha, \varphi\rangle|$.
Any non-zero vector β ∈ V defines a hyperplane by the relationship
$$\{\alpha \in V : \langle\alpha, \beta\rangle = 0\}.$$
[Figure: the hyperplane defined by β, passing through 0.]
$$\{\alpha \in V : \langle\alpha, \beta\rangle = c\}.$$
[Figure: the affine plane defined by ϕ.]
The vector β and scalar c that define a hyperplane are not unique, unless we agree that we use only normalized vectors to define hyperplanes. By letting $\varphi = \frac{\beta}{\|\beta\|}$, the above definition of affine plane may equivalently be written as $\{\alpha \in V : \langle\alpha, \varphi\rangle = \frac{c}{\|\beta\|}\}$ or even as $\{\alpha \in V : \langle\alpha - \frac{c}{\|\beta\|}\varphi, \varphi\rangle = 0\}$. The first form shows that an affine plane is the set of vectors that have the same projection $\frac{c}{\|\beta\|}\varphi$ on ϕ. The second form shows that the affine plane is a hyperplane translated by the vector $\frac{c}{\|\beta\|}\varphi$. Some authors make no distinction between affine planes and hyperplanes; in this case both are called hyperplane.
In the example that follows, we use the notion of projection to prove the
Cauchy–Schwarz inequality stated in Theorem 2.38.
[Figure: the projection $\alpha_{|\beta}$ of α onto β, of length $\frac{|\langle\alpha, \beta\rangle|}{\|\beta\|}$, used in the proof of the Cauchy–Schwarz inequality.]
$$\alpha_i = \beta_i - \sum_{j=1}^{i-1}\langle\beta_i, \psi_j\rangle\psi_j, \qquad \psi_i = \frac{\alpha_i}{\|\alpha_i\|}.$$
We have assumed that β1 , . . . , βn is a linearly independent collection. Now assume
that this is not the case. If βj is linearly dependent of β1 , . . . , βj−1 , then at step
i = j the procedure will produce αi = ψi = 0. Such vectors are simply disregarded.
Figure 2.19 gives an example of the Gram–Schmidt procedure applied to a set
of signals.
[Figure 2.19: example of the Gram–Schmidt procedure applied to a set of signals, showing at each step i the signal β_i, the orthonormal functions obtained so far, and the corresponding coefficient n-tuples.]
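A sketch of the Gram–Schmidt procedure for n-tuples (Python/NumPy assumed; the input vectors are illustrative). It follows the two steps above and simply disregards linearly dependent inputs.

```python
# Sketch: the Gram-Schmidt procedure described above, applied to n-tuples.
# Linearly dependent inputs yield a (near-)zero alpha_i and are disregarded.
# (Python/NumPy assumed; the vectors beta_i below are illustrative.)
import numpy as np

def gram_schmidt(betas, tol=1e-10):
    psis = []
    for beta in betas:
        # <beta_i, psi_j> with the convention <a, b> = b^dagger a.
        alpha = beta - sum(np.vdot(psi, beta) * psi for psi in psis)
        norm = np.linalg.norm(alpha)
        if norm > tol:                      # skip linearly dependent vectors
            psis.append(alpha / norm)
    return psis

betas = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([2.0, 1.0, 1.0])]        # dependent on the first two
for i, psi in enumerate(gram_schmidt(betas)):
    print(f"psi_{i+1} =", np.round(psi, 3))
```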
2.13 Exercises
Exercises for Section 2.2
exercise 2.2 (The “Wetterfrosch”) Let us assume that a “weather frog” bases
his forecast of tomorrow’s weather entirely on today’s air pressure. Determining
a weather forecast is a hypothesis testing problem. For simplicity, let us assume
that the weather frog only needs to tell us if the forecast for tomorrow’s weather
is “sunshine” or “rain”. Hence we are dealing with binary hypothesis testing. Let
H = 0 mean “sunshine” and H = 1 mean “rain”. We will assume that both values
of H are equally likely, i.e. $P_H(0) = P_H(1) = \frac{1}{2}$. For the sake of this exercise,
suppose that on a day that precedes sunshine, the pressure may be modeled as a
random variable Y with the following probability density function:
$$f_{Y|H}(y|0) = \begin{cases} A - \frac{A}{2}y, & 0 \le y \le 1\\ 0, & \text{otherwise.}\end{cases}$$
Similarly, the pressure on a day that precedes a rainy day is distributed according to
$$f_{Y|H}(y|1) = \begin{cases} B + \frac{B}{3}y, & 0 \le y \le 1\\ 0, & \text{otherwise.}\end{cases}$$
The weather frog’s purpose in life is to guess the value of H after measuring Y .
(d) Now assume that the weather forecaster does not know about hypothesis testing
and arbitrarily chooses the decision rule Ĥγ (y) for some arbitrary γ ∈ R.
Determine, as a function of γ, the probability that the decision rule decides
Ĥ = 1 given that H = 0. This probability is denoted P r{Ĥ(Y ) = 1|H = 0}.
(e) For the same decision rule, determine the probability of error Pe (γ) as a
function of γ. Evaluate your expression at γ = θ.
(f ) Using calculus, find the γ that minimizes Pe (γ) and compare your result to θ.
(a) Find and draw the density fY |H (y|0) of the observable under hypothesis
H = 0, and the density fY |H (y|1) of the observable under hypothesis H = 1.
(b) Find the decision rule that minimizes the probability of error.
(c) Compute the probability of error of the optimal decision rule.
exercise 2.4 (Poisson parameter estimation) In this example there are two
hypotheses, H = 0 and H = 1, which occur with probabilities PH (0) = p0 and
PH (1) = 1 − p0 , respectively. The observable Y takes values in the set of non-
negative integers. Under hypothesis H = 0, Y is distributed according to a Poisson
law with parameter λ0 , i.e.
$$P_{Y|H}(y|0) = \frac{\lambda_0^y}{y!}\,e^{-\lambda_0}. \tag{2.35}$$
Under hypothesis H = 1,
$$P_{Y|H}(y|1) = \frac{\lambda_1^y}{y!}\,e^{-\lambda_1}. \tag{2.36}$$
This is a model for the reception of photons in optical communication.
(a) Derive the MAP decision rule by indicating likelihood and log likelihood ratios.
Hint: The direction of an inequality changes if both sides are multiplied by a
negative number.
(b) Derive an expression for the probability of error of the MAP decision rule.
(c) For p0 = 1/3, λ0 = 2 and λ1 = 10, compute the probability of error of the
MAP decision rule. You may want to use a computer program to do this.
(d) Repeat (c) with λ1 = 20 and comment.
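For parts (c) and (d), which invite the use of a computer program, a possible starting point is the following sketch (plain Python; the truncation point y_max is an assumption): it sums $\min\big(P_H(0)P_{Y|H}(y|0),\,P_H(1)P_{Y|H}(y|1)\big)$ over y, which is the error probability of the MAP rule.

```python
# Sketch: numerical MAP error probability for the Poisson hypotheses of this
# exercise, summing min(P_H(0) P(y|0), P_H(1) P(y|1)) over y.
# (Python assumed; the truncation point y_max is an illustrative choice.)
from math import exp, log, lgamma

def poisson_pmf(y, lam):
    # Computed in the log domain to avoid overflow for large y.
    return exp(y * log(lam) - lam - lgamma(y + 1))

def map_error(p0, lam0, lam1, y_max=200):
    p1 = 1 - p0
    # For each y the MAP decoder errs with the smaller weighted likelihood.
    return sum(min(p0 * poisson_pmf(y, lam0), p1 * poisson_pmf(y, lam1))
               for y in range(y_max + 1))

print("Pe (lambda1=10):", map_error(1/3, 2, 10))
print("Pe (lambda1=20):", map_error(1/3, 2, 20))
```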
exercise 2.5 (Lie detector) You are asked to develop a “lie detector” and
analyze its performance. Based on the observation of brain-cell activity, your
detector has to decide if a person is telling the truth or is lying. For the purpose
of this exercise, the brain cell produces a sequence of spikes. For your decision you
may use only a sequence of n consecutive inter-arrival times Y1 , Y2 , . . . , Yn . Hence
Y1 is the time elapsed between the first and second spike, Y2 the time between the
second and third, etc. We assume that, a priori, a person lies with some known
probability p. When the person is telling the truth, Y1 , . . . , Yn is an iid sequence of
exponentially distributed random variables with intensity α, (α > 0), i.e.
(a) Describe the decision rule of your lie detector for the special case n = 1. Your
detector should be designed so as to minimize the probability of error.
(b) What is the probability PL|T that your lie detector says that the person is
lying when the person is telling the truth?
(c) What is the probability PT |L that your test says that the person is telling the
truth when the person is lying.
(d) Repeat (a) and (b) for a general n. Hint: When Y1 , . . . , Yn is a collection
of iid random variables that are exponentially distributed with parameter
α > 0, then Y1 + · · · + Yn has the probability density function of the Erlang
distribution, i.e.
$$f_{Y_1+\cdots+Y_n}(y) = \frac{\alpha^n}{(n-1)!}\,y^{n-1}e^{-\alpha y}, \qquad y \ge 0.$$
exercise 2.6 (Fault detector) As an engineer, you are required to design the
test performed by a fault detector for a “black-box” that produces a sequence of iid
binary random variables $\ldots, X_1, X_2, X_3, \ldots$. Previous experience shows that this “black box” has an a priori failure probability of $\frac{1}{1025}$. When the “black box” works
properly, pXi (1) = p. When it fails, the output symbols are equally likely to be 0
or 1. Your detector has to decide based on the observation of the past 16 symbols,
i.e. at time k the decision will be based on Xk−16 , . . . , Xk−1 .
exercise 2.7 (Multiple choice exam) You are taking a multiple choice exam.
Question number 5 allows for two possible answers. According to your first
impression, answer 1 is correct with probability 1/4 and answer 2 is correct with
probability 3/4. You would like to maximize your chance of giving the correct
answer and you decide to have a look at what your neighbors on the left and right
have to say. The neighbor on the left has answered ĤL = 1. He is an excellent
student who has a record of being correct 90% of the time when asked a binary
question. The neighbor on the right has answered ĤR = 2. He is a weaker student
who is correct 70% of the time.
(a) You decide to use your first impression as a prior and to consider ĤL and
ĤR as observations. Formulate the decision problem as a hypothesis testing
problem.
(b) What is your answer Ĥ?
exercise 2.8 (MAP decoding rule: Alternative derivation) Consider the binary
hypothesis testing problem where H takes values in {0, 1} with probabilities PH (0)
and PH (1). The conditional probability density function of the observation Y ∈ R
given H = i, i ∈ {0, 1} is given by fY |H (·|i). Let Ri be the decoding region for
hypothesis i, i.e. the set of y for which the decision is Ĥ = i, i ∈ {0, 1}.
(a) Find the decision rule that minimizes the probability of error. Hint: Write
down a short sample sequence (y1 , . . . , yk ) and determine its probability under
each hypothesis. Then generalize.
(b) Give a simple sufficient statistic for this decision. (For the purpose of this
question, a sufficient statistic is a function of y with the property that a
decoder that observes y can not achieve a smaller error probability than a
MAP decoder that observes this function of y.)
(c) Suppose that the observed sequence alternates between 0 and 1 except for one
string of ones of length s, i.e. the observed sequence y looks something like
exercise 2.10 (SIMO channel with Laplacian noise, exercise from [1]) One
of the two signals c0 = −1, c1 = 1 is transmitted over the channel shown in
Figure 2.20a. The two noise random variables Z1 and Z2 are statistically
independent of the transmitted signal and of each other. Their density functions are
$$f_{Z_1}(\alpha) = f_{Z_2}(\alpha) = \frac{1}{2}e^{-|\alpha|}.$$
[Figure 2.20: (a) the two-branch channel: X ∈ {c_0, c_1} is sent, Z_1 is added to form Y_1 and Z_2 is added to form Y_2; (b) the (y_1, y_2) plane with the point (1, 1) and two observation points labeled a and b.]
exercise 2.12 (Properties of the Q function) Prove properties (a) through (d)
of the Q function defined in Section 2.3. Hint: For property (d), multiply and divide
inside the integral by the integration variable and integrate by parts. By upper- and
lower-bounding the resulting integral, you will obtain the lower and upper bound.
[Figure 2.21: three sketches in the (x_1, x_2) plane.]
exercise 2.13 (16-PAM vs. 16-QAM) The two signal constellations in Figure
2.22 are used to communicate across an additive white Gaussian noise channel.
Let the noise variance be σ 2 . Each point represents a codeword ci for some i.
Assume each codeword is used with the same probability.
[Figure 2.22: a 16-PAM constellation on a line with adjacent spacing a, and a 16-QAM constellation (4 × 4 grid) in the (x_1, x_2) plane with adjacent spacing b.]
(a) For each signal constellation, compute the average probability of error Pe as
a function of the parameters a and b, respectively.
(b) For each signal constellation, compute the average energy per symbol E as a
function of the parameters a and b, respectively:
$$E = \sum_{i=1}^{16} P_H(i)\,\|c_i\|^2. \tag{2.39}$$
In the next chapter it will become clear in what sense E relates to the energy
of the transmitted signal (see Example 3.2 and the discussion that follows).
(c) Plot $P_e$ versus $\frac{E}{\sigma^2}$ for both signal constellations and comment.
exercise 2.14 (QPSK decision regions) Let H ∈ {0, 1, 2, 3} and assume that
when H = i you transmit the codeword ci shown in Figure 2.23. Under H = i, the
receiver observes Y = ci + Z.
Figure 2.23. The QPSK codewords c0, c1, c2, c3 in the (y1, y2) plane.
(a) Draw the decoding regions assuming that Z ∼ N (0, σ 2 I2 ) and that PH (i) =
1/4, i ∈ {0, 1, 2, 3}.
(b) Draw the decoding regions (qualitatively) assuming Z ∼ N (0, σ 2 I2 ) and
PH (0) = PH (2) > PH (1) = PH (3). Justify your answer.
(c) Assume again that PH(i) = 1/4, i ∈ {0, 1, 2, 3}, and that Z ∼ N(0, K), where K = diag(σ², 4σ²). How do you decode now?
exercise 2.15 (Antenna array) The following problem relates to the design
of multi-antenna systems. Consider the binary equiprobable hypothesis testing
problem:
H = 0 :  Y1 = A + Z1,   Y2 = A + Z2
H = 1 :  Y1 = −A + Z1,   Y2 = −A + Z2,
where A is a positive constant and Z1 ∼ N(0, σ1²) and Z2 ∼ N(0, σ2²) are independent of each other and of the hypothesis.
(a) Show that the decision rule that minimizes the probability of error (based on
the observables Y1 and Y2) can be stated as
σ2² y1 + σ1² y2 ≷ 0,
where we decide Ĥ = 0 when the left-hand side is nonnegative and Ĥ = 1 otherwise.
(b) Draw the decision regions in the (Y1 , Y2 ) plane for the special case where
σ1 = 2σ2 .
(c) Evaluate the probability of error for the optimal detector as a function of
σ1², σ2², and A.
Figure 2.24. Six-point constellation in the (x1, x2) plane: two rows of three points, with vertical spacing a and horizontal spacing b.
Figure 2.25. Channel with input X ∈ {c0, c1}, multiplicative factor A, additive noise Z, and output Y.
(a) Find the decision rule that the receiver should implement to minimize the
probability of error. Sketch the decision regions.
(b) Calculate the probability of error Pe , based on the above decision rule.
c0 = (1, 0)T
c1 = (−1, 0)T
c2 = (−1, 1)T .
(a) Show that the MAP decoder Ĥ(T (y)) that decides based on T (y) is equivalent
to the MAP decoder Ĥ(y) that operates based on y.
(b) Compute the probabilities Pr{Y = 0 | T(Y) = 0, H = 0} and Pr{Y = 0 | T(Y) = 0, H = 1}. Is it true that H → T(Y) → Y ?
(a) Show that when the above conditions are satisfied, a MAP decision depends
on the observable Y only through T (Y ). In other words, Y itself is not
necessary. Hint: Work directly with the definition of a MAP decision rule.
fY|Y∈B(y) = fY(y) 1{y ∈ B} / (∫_B fY(y) dy).     (2.41)
(a) Let the hypothesis be H ∈ H (of yet unspecified distribution) and let the
observable V ∈ V be related to H via an arbitrary but fixed channel PV |H .
Show that if V is not independent of H then there are distinct elements
i, j ∈ H and distinct elements k, l ∈ V such that
PV|H(k|i) > PV|H(k|j)  and  PV|H(l|i) < PV|H(l|j).     (2.42)
Hint: For every h ∈ H, Σ_{v∈V} PV|H(v|h) = 1.
(b) Under the condition of part (a), show that there is a distribution PH for
which the observable V affects the decision of a MAP decoder.
(c) Generalize to show that if the observables are U and V , and PU,V |H is fixed
so that H → U → V does not hold, then there is a distribution on H for
which V is not operationally irrelevant. Hint: Argue as in parts (a) and (b)
for the case U = u , where u is as described above.
exercise 2.24 (Antipodal signaling) Consider the signal constellation shown
in Figure 2.26.
Figure 2.26. Antipodal signal constellation: the codewords c0 and c1 in the (x1, x2) plane, with coordinates ±a.
Assume that the codewords c0 and c1 are used to communicate over the discrete-
time AWGN channel. More precisely:
H=0: Y = c0 + Z,
H=1: Y = c1 + Z,
where Z ∼ N (0, σ 2 I2 ). Let Y = (Y1 , Y2 )T .
(a) Argue that Y1 is not a sufficient statistic.
(b) Give a different signal constellation with two codewords c̃0 and c̃1 such that,
when used in the above communication setting, Y1 is a sufficient statistic.
exercise 2.25 (Is it a sufficient statistic?) Consider the following binary
hypothesis testing problem
H=0: Y = c0 + Z
H=1: Y = c1 + Z,
exercise 2.26 (Union bound) Let Z ∼ N (c, σ 2 I2 ) be a random vector that takes
values in R2 , where c = (2, 1)T . Find a non-trivial upper bound to the probability
that Z is in the shaded region of Figure 2.27.
Figure 2.27. The shaded region in the (z1, z2) plane; the point c = (2, 1)T is marked.
exercise 2.27 (QAM with erasure) Consider a QAM receiver that outputs a
special symbol δ (called erasure) whenever the observation falls in the shaded area
shown in Figure 2.28 and does minimum-distance decoding otherwise. (This is
neither a MAP nor an ML receiver.) Assume that c0 ∈ R2 is transmitted and
that Y = c0 + N is received where N ∼ N (0, σ 2 I2 ). Let P0i , i = 0, 1, 2, 3 be the
probability that the receiver outputs Ĥ = i and let P0δ be the probability that it
outputs δ. Determine P00 , P01 , P02 , P03 , and P0δ .
Figure 2.28. The QAM codewords c0, c1, c2, c3 in the (y1, y2) plane; the parameters b and b − a determine the shaded (erasure) region.
Comment: If we choose b − a large enough, we can make sure that the probability
of error is very small (we say that an error occurred if Ĥ = i for some i ∈ {0, 1, 2, 3}
and Ĥ ≠ H). When Ĥ = δ, the receiver can ask for a retransmission of H. This
requires a feedback channel from the receiver to the transmitter. In most practical
applications, such a feedback channel is available.
exercise 2.28 (Repeat codes and Bhattacharyya bound) Consider two equally
likely hypotheses. Under hypothesis H = 0, the transmitter sends c0 = (1, . . . , 1)T
and under H = 1 it sends c1 = (−1, . . . , −1)T , both of length n. The channel
model is AWGN with variance σ 2 in each component. Recall that the probability
of error for an ML receiver that observes the channel output Y ∈ Rn is
Pe = Q( √n / σ ).
Suppose now that the decoder has access only to the sign of Yi, 1 ≤ i ≤ n, i.e. it
observes W = (W1, . . . , Wn)T with Wi = sign(Yi).
(a) Determine the MAP decision rule based on the observable W . Give a simple
sufficient statistic.
(b) Find the expression for the probability of error P̃e of the MAP decoder that
observes W . You may assume that n is odd.
(c) Your answer to (b) contains a sum that cannot be expressed in closed form.
Express the Bhattacharyya bound on P̃e .
(d) For n = 1, 3, 5, 7, find the numerical values of Pe , P̃e , and the Bhattacharyya
bound on P̃e .
H=0: Y ∼ fY |H (y|0)
H=1: Y ∼ fY |H (y|1).
(b) Prove that for a, b ≥ 0, min(a, b) ≤ √(ab) ≤ (a + b)/2. Use this to prove the tighter
version of the Bhattacharyya bound, i.e.
Pe ≤ (1/2) ∫_y √( fY|H(y|0) fY|H(y|1) ) dy.
(c) Compare the above bound to (2.19) when there are two equiprobable hypotheses. How do you explain the improvement by a factor of 1/2?
Let
Bi,j = { y : PH(j) fY|H(y|j) ≥ PH(i) fY|H(y|i) }   for j < i,
Bi,j = { y : PH(j) fY|H(y|j) > PH(i) fY|H(y|i) }   for j > i.
(a) Verify that Bi,j = B^c_{j,i}.
(b) Given H = i, the detector will make an error if and only if y ∈ ∪_{j: j≠i} Bi,j.
The probability of error is Pe = Σ_{i=0}^{M−1} Pe(i) PH(i). Show that:
Pe ≤ Σ_{i=0}^{M−1} Σ_{j>i} [ Pr{Y ∈ Bi,j | H = i} PH(i) + Pr{Y ∈ Bj,i | H = j} PH(j) ]
   = Σ_{i=0}^{M−1} Σ_{j>i} [ ∫_{Bi,j} fY|H(y|i) PH(i) dy + ∫_{B^c_{i,j}} fY|H(y|j) PH(j) dy ]
   = Σ_{i=0}^{M−1} Σ_{j>i} ∫_y min{ fY|H(y|i) PH(i), fY|H(y|j) PH(j) } dy.
So far, we have come across two DMCs, namely the BSC (binary symmetric
channel) and the BEC (binary erasure channel). The purpose of this problem is to
see that for DMCs, the Bhattacharyya bound takes a simple form, in particular
when the channel input alphabet X contains only two letters.
(a) Consider a transmitter that sends c0 ∈ X n and c1 ∈ X n with equal probability.
Justify the following chain of (in)equalities.
Pe ≤(a) Σ_y √( PY|X(y|c0) PY|X(y|c1) )
   =(b) Σ_y √( Π_{i=1}^{n} PY|X(yi|c0,i) PY|X(yi|c1,i) )
   =(c) Σ_{y1,...,yn} Π_{i=1}^{n} √( PY|X(yi|c0,i) PY|X(yi|c1,i) )
   =(d) [ Σ_{y1} √( PY|X(y1|c0,1) PY|X(y1|c1,1) ) ] ⋯ [ Σ_{yn} √( PY|X(yn|c0,n) PY|X(yn|c1,n) ) ]
   =(e) Π_{i=1}^{n} Σ_y √( PY|X(y|c0,i) PY|X(y|c1,i) )
   =(f) Π_{a∈X, b∈X, a≠b} [ Σ_y √( PY|X(y|a) PY|X(y|b) ) ]^{n(a,b)},
⁸ Here we are assuming that the output alphabet is discrete. Otherwise we use densities instead of probabilities.
Notice that z depends only on the channel, whereas its exponent depends only
on c0 and c1 .
(c) Evaluate the channel parameter z for the following.
(i) The binary input Gaussian channel described by the densities
fY|X(·|0) = N(−√E, σ²),   fY|X(·|1) = N(√E, σ²).
(ii) The binary symmetric channel (BSC) with X = Y = {±1} and transition
probabilities described by
PY|X(y|x) = 1 − δ if y = x, and δ otherwise.
(iii) The binary erasure channel (BEC) with X = {±1}, Y = {−1, E, 1}, and
transition probabilities given by
PY|X(y|x) = 1 − δ if y = x;  δ if y = E;  0 otherwise.
exercise 2.33 (Bhattacharyya bound and Laplacian noise) Assuming two
equiprobable hypotheses, evaluate the Bhattacharyya bound for the following
(Laplacian noise) setting:
H=0: Y = −a + Z
H=1: Y = a + Z,
where a ∈ R+ is a constant and Z is a random variable of probability density
function fZ(z) = (1/2) exp(−|z|), z ∈ R.
exercise 2.34 (Dice tossing) You have two dice, one fair and one biased. A
friend tells you that the biased die produces a 6 with probability 1/4, and produces
the other values with uniform probabilities. You do not know a priori which of the
two is the fair die. You choose one of the two dice uniformly at random, and
perform n consecutive tosses. Let Yi ∈ {1, . . . , 6} be the random variable modeling
the ith experiment and let Y = (Y1 , · · ·, Yn ).
(a) Based on the observable Y , find the decision rule to determine whether the
die you have chosen is biased. Your rule should maximize the probability that
the decision is correct.
exercise 2.35 (ML receiver and union bound for orthogonal signaling) Let
H ∈ {1, . . . , m} be uniformly distributed and consider the communication problem
described by:
H=i: Y = ci + Z, Z ∼ N (0, σ 2 Im ),
where c1 , . . . , cm , ci ∈ Rm , is a set of constant-energy orthogonal codewords.
Without loss of generality we assume
ci = √E ei,
where ei is the ith unit vector in Rm , i.e. the vector that contains 1 at position i
and 0 elsewhere, and E is some positive constant.
(a) Describe the maximum-likelihood decision rule.
(b) Find the distances ‖ci − cj‖, i ≠ j.
(c) Using the union bound and the Q function, upper bound the probability Pe (i)
that the decision is incorrect when H = i.
Miscellaneous exercises
(a) On the same graph, plot the two possible output probability density functions.
Indicate, qualitatively, the decision regions.
(b) Determine the optimal receiver in terms of σ0 and σ1 .
(c) Write an expression for the error probability Pe as a function of σ0 and σ1 .
(a) Assume that the receiver observes Y and wants to estimate both H1 and H2 .
Let Ĥ1 and Ĥ2 be the estimates. What is the generic form of the optimal
decision rule?
(b) For the specific set of signals given, what is the set of possible observations,
assuming that σ 2 = 0? Label these signals by the corresponding (joint)
hypotheses.
(c) Assuming now that σ 2 > 0, draw the optimal decision regions.
(d) What is the resulting probability of correct decision? That is, determine the
probability P r{Ĥ1 = H1 , Ĥ2 = H2 }.
(e) Finally, assume that we are interested in only the transmission of user two.
Describe the receiver that minimizes the error probability and determine
P r{Ĥ2 = H2 }.
(a) Write the optimal decision rule as a function of the parameter σ 2 and the
received signal Y .
(b) For the value σ 2 = e4 compute the decision regions.
(c) Give expressions as simple as possible for the error probabilities Pe (0) and
Pe (1).
Figure 2.29. The codewords c0 = (0, 1)T, c1 = (1, 0)T, c2 = (0, −1)T, c3 = (−1, 0)T in the (x1, x2) plane.
the power is proportional to wi²(t), and the energy is proportional to ∫ wi²(t) dt. In both cases the energy is proportional to
‖wi‖² = ∫ |wi(t)|² dt.
As in the above example, the squared norm of a signal wi (t) is generally associ-
ated with the signal’s energy. It is quite natural to assume that we communicate
via finite-energy signals. This is the first restriction on W. A linear combination
of a finite number of finite-energy signals is itself a finite-energy signal. Hence,
every vector of the vector space V spanned by W is a square-integrable function.
The second requirement is that if v ∈ V has a vanishing norm, then v(t) vanishes
for all t. Together, these requirements imply that V is an inner product space of
square-integrable functions. (See Example 2.39.)
All signals that represent real-world communication signals are finite-energy and
continuous. Hence the vector space they span is always an inner product space.
This is a good place to mention the various reasons we are interested in the
signal’s energy or, somewhat equivalently, in the signal’s power, which is the energy
per second. First, for safety and for spectrum reusability, there are regulations that
limit the power of a transmitted signal. Second, for mobile devices, the energy of
the transmitted signal comes from the battery: a battery charge lasts longer if we
decrease the signal’s power. Third, with no limitation to the signal’s power, we can
transmit across a continuous-time AWGN channel at any desired rate, regardless
of the available bandwidth and of the target error probability. Hence, it would be
unfair to compare signaling methods that do not use the same power.
For now, we assume that W is given to us. The problem of choosing a suitable
set W of signals will be studied in subsequent chapters.
The highlight of the chapter is the power of abstraction. The receiver design
for the discrete-time AWGN channel relied on geometrical ideas that can be
Figure 3.2. Decomposition of the transmitter into an encoder (message i ∈ H to codeword ci) followed by a waveform former (ci to wi(t)), and of the receiver into an n-tuple former (R(t) to Y) followed by a decoder (Y to Ĥ); the channel adds white Gaussian noise N(t).
formulated whenever we are in an inner product space. We will use the same
ideas for the continuous-time AWGN channel.
The main result is a decomposition of the sender and the receiver into the
building blocks shown in Figure 3.2. We will see that, without loss of generality,
we can (and should) think of the transmitter as consisting of an encoder that
maps the message i ∈ H into an n-tuple ci , as in the previous chapter, followed
by a waveform former that maps ci into a waveform wi (t). Similarly, we will see
that the receiver can consist of an n-tuple former that takes the channel output
and produces an n-tuple Y . The behavior from the waveform former input to the
n-tuple former output is that of the discrete-time AWGN channel considered in the
previous chapter. Hence we know already what the decoder of Figure 3.2 should
do with the n-tuple former output.
In this chapter (like in the previous one) the vectors (functions) are real-valued.
Hence, we could use the formalism that applies to real inner product spaces. Yet, in
preparation of Chapter 7, we use the formalism for complex inner product spaces.
This mainly concerns the standard inner product between functions, where we
write ⟨a, b⟩ = ∫ a(t) b*(t) dt instead of ⟨a, b⟩ = ∫ a(t) b(t) dt. A similar comment
applies to the definition of covariance, where for zero-mean random variables we
use cov(Zi, Zj) = E[Zi Zj*] instead of cov(Zi, Zj) = E[Zi Zj].
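For numerical work, these inner products are typically approximated from samples. The following Octave/MATLAB sketch (the pulses and the sampling step are arbitrary illustrative choices, not taken from the text) approximates ⟨a, b⟩ = ∫ a(t) b*(t) dt and ‖a‖² by Riemann sums.
% Minimal sketch: numerical inner product and squared norm of sampled
% waveforms. The pulses and the step dt are arbitrary illustrative choices.
dt = 1e-3;                               % sampling step
t  = 0:dt:1-dt;                          % time axis covering [0, 1)
a  = cos(2*pi*3*t);                      % example waveform a(t)
b  = sin(2*pi*3*t);                      % example waveform b(t)
innerProduct = sum(a .* conj(b)) * dt;   % approximates the integral of a(t) b*(t)
squaredNormA = sum(abs(a).^2) * dt;      % approximates the squared norm of a(t)
% Here innerProduct is (numerically) 0 and squaredNormA is 1/2.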
requires measure theory if done rigorously. The good news is that a mathematical
model of N (t) is not needed because N (t) is not observable through physical
experiments. (The reason will become clear shortly.) Our approach is to model
what we can actually measure. We assume a working knowledge of Gaussian
random vectors (reviewed in Appendix 2.10).
A receiver is an electrical instrument that connects to the channel output via
a cable. For instance, in wireless communication, we might consider the channel
output to be the output of the receiving antenna; in which case, the cable is the
one that connects the antenna to the receiver. A cable is a linear time-invariant
filter. Hence, we can assume that all the observations made by the receiver are
through some linear time-invariant filter.
So if N (t) represents the noise introduced by the channel, the receiver sees, at
best, a filtered version Z(t) of N (t). We model Z(t) as a stochastic process and,
as such, it is described by the statistic of Z(t1 ), Z(t2 ), . . . , Z(tk ) for any positive
integer k and any finite collection of sampling times t1 , t2 , . . . , tk .
If the filter impulse response is h(t), then linear system theory suggests that
Z(t) = ∫ N(α) h(t − α) dα
and
Z(ti) = ∫ N(α) h(ti − α) dα,     (3.1)
but the validity of these expressions needs to be justified, because N (t) is not
a deterministic signal. It is possible to define N (t) as a stochastic process and
prove that the (Lebesgue) integral in (3.1) is well defined; but we avoid this path
which, as already mentioned, requires measure theory. In this text, equation (3.1) is
shorthand for the statement “Z(ti ) is the random variable that models the output
at time ti of a linear time-invariant filter of impulse response h(t) fed with white
Gaussian noise N (t)”. Notice that h(ti − α) is a function of α that we can rename
as gi (α). Now we are in the position to define white Gaussian noise.
definition 3.4 N(t) is white Gaussian noise of power spectral density N0/2 if,
for any finite collection of real-valued L2 functions g1(α), . . . , gk(α), the random variables
Zi = ∫ N(α) gi(α) dα,   i = 1, 2, . . . , k,     (3.2)
are zero-mean, jointly Gaussian, with covariance
cov(Zi, Zj) = (N0/2) ∫ gi(α) gj*(α) dα.     (3.3)
If we are not evaluating the integral in (3.2), how do we know if N (t) is white
Gaussian noise? In this text, when applicable, we say that N (t) is white Gaussian
noise, in which case we can use (3.3) as we see fit. In the real world, often we know
enough about the channel to know whether or not its noise can be modeled as
white and Gaussian. This knowledge could come from a mathematical model of
the channel. Another possibility is that we perform measurements and verify that
they behave according to Definition 3.4.
Owing to its importance and frequent use, we formulate the following special
case as a lemma. It is the most important fact that should be remembered about
white Gaussian noise.
lemma 3.5 Let {g1 (t), . . . , gk (t)} be an orthonormal set of real-valued functions.
Then Z = (Z1 , . . . , Zk )T , with Zi defined as in (3.2), is a zero-mean Gaussian
random vector with iid components of variance σ² = N0/2.
Proof The proof is a straightforward application of the definitions.
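The lemma can also be checked by simulation; the sketch below is only an illustration under an explicit discretization (all numerical values are arbitrary). White noise is approximated on a grid of step dt by iid Gaussian samples of variance (N0/2)/dt and projected onto two orthonormal rectangular pulses; the empirical variances should be close to N0/2 and the empirical covariance close to 0.
% Minimal sketch: projections of discretized white Gaussian noise onto an
% orthonormal set, as in Lemma 3.5. All numerical values are arbitrary.
N0 = 2; dt = 1e-3; T = 1;
t  = 0:dt:T-dt;
g1 = (t <  T/2) * sqrt(2/T);             % two orthonormal rectangular pulses
g2 = (t >= T/2) * sqrt(2/T);
numTrials = 10000;
Z = zeros(numTrials, 2);
for k = 1:numTrials
  N = sqrt(N0/(2*dt)) * randn(size(t));  % white-noise approximation on the grid
  Z(k,1) = sum(N .* g1) * dt;            % Z1, the projection onto g1
  Z(k,2) = sum(N .* g2) * dt;            % Z2, the projection onto g2
end
var(Z)                                   % both entries close to N0/2 = 1
mean(Z(:,1) .* Z(:,2))                   % empirical covariance, close to 0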
example 3.6 Consider two bandpass filters that have non-overlapping frequency
responses but are otherwise identical, i.e. if we frequency-translate the frequency
response of one filter by the proper amount we obtain the frequency response of the
other filter. By Parseval’s relationship, the corresponding impulse responses are
orthogonal to one another. If we feed the two filters with white Gaussian noise and
sample their output (even at different times), we obtain two iid Gaussian random
variables. We could extend the experiment (in the obvious way) to n filters of non-
overlapping frequency responses, and would obtain n random variables that are
iid – hence of identical variance. This explains why the noise is called white: like
for white light, white Gaussian noise has its power equally distributed among all
frequencies.
Are there other types of noise? Yes, there are. For instance, there are natural
and man-made electromagnetic noises. The noise produced by electric motors and
that produced by power lines are examples of man-made noise. Man-made noise is
typically neither white nor Gaussian. The good news is that a careful design should
be able to ensure that the receiver picks up a negligible amount of man-made noise
(if any). Natural noise is unavoidable. Every conductor (resistor) produces thermal
(Johnson) noise. (See Appendix 3.10.) The assumption that thermal noise is white
and Gaussian is an excellent one. Other examples of natural noise are solar noise
and cosmic noise. A receiving antenna picks up these noises, the intensity of which
depends on the antenna’s gain and pointing direction. A current in a conductor
gives rise to shot noise. Shot noise originates from the discrete nature of the electric
charges. Wikipedia is a good reference to learn more about various noise sources.
where k is an arbitrary positive integer and g1 (t), . . . , gk (t) are arbitrary finite-
energy waveforms. The complex conjugate operator “∗ ” on gi∗ (α) is superfluous for
real-valued signals but, as we will see in Chapter 7, the baseband representation
of a passband impulse response is complex-valued.
Notice that we assume that we can perform an arbitrarily large but finite number
k of measurements. By disallowing infinite measurements we avoid distracting
mathematical subtleties without losing anything of engineering relevance.
It is important to point out that the kind of measurements we consider is quite
general. For instance, we can pass R(t) through an ideal lowpass filter of cutoff
frequency B for some huge B (say 10^10 Hz) and collect an arbitrarily large number
of samples taken every 1/(2B) seconds so as to fulfill the sampling theorem (Theorem
5.2). In fact, by choosing gi(t) = h(i/(2B) − t), where h(t) is the impulse response of
the lowpass filter, Vi becomes the filter output sampled at time t = i/(2B). As stated
by the sampling theorem, from these samples we can reconstruct the filter output.
If R(t) consists of a signal plus noise, and the signal is bandlimited to less than B
Hz, then from the samples we can reconstruct the signal plus the portion of the
noise that has frequency components in [−B, B].
Let V be the inner product space spanned by the elements of the signal set W
and let {ψ1 (t), . . . , ψn (t)} be an arbitrary orthonormal basis for V. We claim that
the n-tuple Y = (Y1 , . . . , Yn )T with ith component
Yi = ∫ R(α) ψi*(α) dα
It should be clear that we can recover V from Y and U . This is so because, from
the projections onto a basis, we can obtain the projection onto any waveform in
the span of the basis. Mathematically,
Vi = ∫_{−∞}^{∞} R(α) gi*(α) dα
   = ∫_{−∞}^{∞} R(α) [ Σ_{j=1}^{n} ξ_{i,j} ψj(α) + Σ_{j=1}^{ñ} ξ_{i,j+n} φj(α) ]* dα
   = Σ_{j=1}^{n} ξ*_{i,j} Yj + Σ_{j=1}^{ñ} ξ*_{i,j+n} Uj,
where ξi,1 , . . . , ξi,n+ñ is the unique set of coefficients in the orthonormal expansion
of gi (t) with respect to the basis {ψ1 (t), . . . , ψn (t), φ1 (t), φ2 (t), . . . , φñ (t)}.
Hence we can consider (Y, U ) as the observable and it suffices to show that Y is
a sufficient statistic. Note that when H = i,
Yj = ∫ R(α) ψj*(α) dα = ∫ [wi(α) + N(α)] ψj*(α) dα = ci,j + Z|V,j,
where ci,j is the jth component of the n-tuple of coefficients ci that represents
the waveform wi(t) with respect to the chosen orthonormal basis, and Z|V,j is a
zero-mean Gaussian random variable of variance N0/2. The notation Z|V,j is meant
to remind us that this random variable is obtained by “projecting” the noise onto
the jth element of the chosen orthonormal basis for V. Using n-tuple notation, we
obtain the following statistic
H = i :   Y = ci + Z|V.
Similarly, Uj = ∫ R(α) φj*(α) dα = ∫ [wi(α) + N(α)] φj*(α) dα = Z⊥V,j,
where we used the fact that wi(t) is in the subspace spanned by {ψ1(t), . . . , ψn(t)}
and therefore it is orthogonal to φj(t) for each j = 1, 2, . . . , ñ. The notation Z⊥V,j
reminds us that this random variable is obtained by “projecting” the noise onto
the jth element of an orthonormal basis that is orthogonal to V. Using n-tuple
notation, we obtain
H = i, U = Z⊥V ,
where Z⊥V ∼ N(0, (N0/2) Iñ). Furthermore, Z|V and Z⊥V are independent of each
other and of H. The conditional density of Y, U given H = i is then the product fZ|V(y − ci) fZ⊥V(u).
Figure 3.3. The vector of measurements (Y T , U T )T describes the projection
of the received signal R(t) onto U . The vector Y describes the projection of
R(t) onto V.
general (or else we would not bother sending cib ), it follows that the statistic of
Yb depends on i even if we know the realization of Ya .
Figure 3.4. The waveform former builds wi(t) = Σ_j ci,j ψj(t) from the encoder output (ci,1, . . . , ci,n) by means of multipliers and a summer; the channel adds white Gaussian noise N(t); the n-tuple former recovers Yj by multiplying R(t) by ψj*(t) and integrating (a bank of correlators); the decoder outputs the message estimate.
The decomposition of Figure 3.4 is consistent with the layering philosophy of the
OSI model (Section 1.1), in the sense that the encoder and decoder are designed as
if they were talking to each other directly via a discrete-time AWGN channel. In
reality, the channel seen by the encoder/decoder pair is the result of the “service”
provided by the waveform former and the n-tuple former.
The above decomposition is useful for the system conception, for the perfor-
mance analysis, as well as for the system implementation; but of course, we always
have the option of implementing the transmitter as a straight map from the
message set H to the waveform set W without passing through the codebook C.
Although such a straight map is a possibility and makes sense for relatively
unsophisticated systems, the decomposition into an encoder and a waveform former
is standard for modern designs. In fact, information theory, as well as coding
theory, devote much attention to the study of encoder/decoder pairs.
The following example is meant to make two important points that apply when
we communicate across the continuous-time AWGN channel and make an ML
decision. First, sets of continuous-time signals may “look” very different yet they
may share the same codebook, which is sufficient to guarantee that the error
probability be the same; second, for binary constellations, what matters for the
error probability is the distance between the two signals and nothing else.
example 3.7 (Orthogonal signals) The following four choices of W = {w0 (t),
w1(t)} look very different yet, upon an appropriate choice of orthonormal basis,
they share the same codebook C = {c0, c1} with c0 = (√E, 0)T and c1 = (0, √E)T.
To see this, it suffices to verify that ⟨wi, wj⟩ equals E if i = j and equals 0 otherwise.
Hence the two signals are orthogonal to each other and they have squared norm E.
Figure 3.5 shows the signals and the associated codewords.
Figure 3.5. (a) W in the signal space (axes ψ1, ψ2). (b) C in R² (axes x1, x2).
An advantage of sinc pulses is that they have a finite support in the frequency
domain. By taking their Fourier transform, we quickly see that they are orthogonal
to each other. See Appendix 5.10 for details.
Choice 4 (Spread spectrum):
w0(t) = √E ψ1(t),  with ψ1(t) = √(1/T) Σ_{j=1}^{n} s_{0,j} 1{ t − jT/n ∈ [0, T/n) },
w1(t) = √E ψ2(t),  with ψ2(t) = √(1/T) Σ_{j=1}^{n} s_{1,j} 1{ t − jT/n ∈ [0, T/n) },
where (s0,1 , . . . , s0,n ) ∈ {±1}n and (s1,1 , . . . , s1,n ) ∈ {±1}n are orthogonal. This
signaling method is called spread spectrum. It is not hard to show that it uses much
bandwidth but it has an inherent robustness with respect to interfering (non-white
and possibly non-Gaussian) signals.
Now assume that we use one of the above choices to communicate across a
continuous-time AWGN channel and that the receiver implements an ML decision
rule. Since the codebook C is the same in all cases, the decoder and the error
probability will be identical no matter which choice we make.
Computing the error probability is particularly easy when there are only two
codewords. From the previous chapter we know that Pe = Q( ‖c1 − c0‖/(2σ) ), where σ² = N0/2. The distance
‖c1 − c0‖ := √( Σ_{i=1}^{2} (c1,i − c0,i)² ) = √(E + E) = √(2E)
which requires neither an orthonormal basis nor the codebook. Yet another
alternative is to use Pythagoras’ theorem. As we know already that our sig-
nals have squared norm E and are orthogonal to each other, their distance is
√( ‖w0‖² + ‖w1‖² ) = √(2E). Inserting, we obtain
Pe = Q( √(E/N0) ).
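As a numerical illustration (with arbitrary values of E and N0, not tied to any particular system), the expression Pe = Q(√(E/N0)) can be checked with a short Monte Carlo sketch at the codebook level:
% Minimal sketch: Pe = Q(sqrt(E/N0)) for two orthogonal codewords, checked
% by simulation. E and N0 are arbitrary example values.
E = 4; N0 = 2; sigma = sqrt(N0/2);
Qfun = @(x) 0.5*erfc(x/sqrt(2));          % the Q function via erfc
PeTheory = Qfun(sqrt(E/N0));
c0 = [sqrt(E); 0]; c1 = [0; sqrt(E)];     % the two orthogonal codewords
numTrials = 100000; errors = 0;
for k = 1:numTrials
  y = c0 + sigma*randn(2,1);              % H = 0: send c0 over discrete-time AWGN
  if norm(y - c1) < norm(y - c0)          % ML decision: closest codeword
    errors = errors + 1;
  end
end
[PeTheory, errors/numTrials]              % the two values should agree closely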
example 3.9 (Single-shot PSK) Let T and fc be positive numbers and let m be
a positive integer. We speak of single-shot phase-shift keying when the signal set
consists of signals of the form
2E 2π
wi (t) = cos 2πfc t + i 1 t ∈ [0, T ] , i = 0, 1, . . . , m − 1. (3.5)
T m
For mathematical convenience, we assume that 2fc T is an integer, so that
wi 2 = E for all i. (When 2fc T is an integer, wi2 (t) has an integer number
of periods in a length-T interval. This ensures that all wi (t) have the same
norm, regardless of the initial phase. In practice, fc T is very large, which
implies that there are many periods in an interval of length T , in which case
the energy difference due to an incomplete period is negligible.) The signal space
representation can be obtained by using the trigonometric identity cos(α + β) =
cos(α) cos(β) − sin(α) sin(β) to rewrite (3.5) as
wi (t) = ci,1 ψ1 (t) + ci,2 ψ2 (t),
where
ci,1 = √E cos(2πi/m),   ψ1(t) = √(2/T) cos(2πfc t) 1{t ∈ [0, T]},
ci,2 = √E sin(2πi/m),   ψ2(t) = −√(2/T) sin(2πfc t) 1{t ∈ [0, T]}.
The reader should verify that ψ1 (t) and ψ2 (t) are normalized functions and,
because 2fc T is an integer, they are orthogonal to each other. This can easily be
verified using the trigonometric identity sin α cos β = (1/2)[sin(α + β) + sin(α − β)].
Hence the codeword associated to wi (t) is
ci = √E ( cos(2πi/m), sin(2πi/m) )T.
In Example 2.15, we have already studied this constellation for the discrete-time
AWGN channel.
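The codewords of Example 3.9 are straightforward to generate numerically. The sketch below (with arbitrarily chosen m and E) builds the m-PSK codebook and checks that every codeword has squared norm E.
% Minimal sketch: the m-PSK codebook of Example 3.9 (m and E arbitrary).
m = 8; E = 1;
i = 0:m-1;
C = sqrt(E) * [cos(2*pi*i/m); sin(2*pi*i/m)];  % column i+1 is the codeword c_i
sum(C.^2, 1)                                    % each column has squared norm E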
example 3.10 (Single-shot QAM) Let T and fc be positive numbers such that
2fc T is an integer, let m be an even positive integer, and define
ψ1(t) = √(2/T) cos(2πfc t) 1{t ∈ [0, T]}
ψ2(t) = √(2/T) sin(2πfc t) 1{t ∈ [0, T]}.
(We have already established in Example 3.9 that ψ1 (t) and ψ2 (t) are orthogonal
to each other and have unit norm.) If the components of ci = (ci,1, ci,2)T, i =
0, . . . , m² − 1, take values in some discrete subset of the form {±a, ±3a, ±5a, . . . ,
±(m − 1)a} for some positive a, then
wi (t) = ci,1 ψ1 (t) + ci,2 ψ2 (t),
The signaling methods discussed in this section are the building blocks of many
communication systems.
Y = (Y1 , Y2 , . . . , Yn )T , where
Yi = ⟨R, ψi⟩,   i = 1, . . . , n.
We now face a hypothesis testing problem with prior PH (i), i ∈ H, and observ-
able Y distributed according to
fY|H(y|i) = 1/(2πσ²)^{n/2} exp( −‖y − ci‖²/(2σ²) ),
Figure 3.6. Two ways to implement ∫ r(t) b*(t) dt, namely via a correlator (a) and via a matched filter (b) with the output sampled at time T.
The second is obtained from the first by using ‖y − ci‖² = ‖y‖² − 2ℜ{⟨y, ci⟩} + ‖ci‖².
Once we drop the ℜ{·} operator (the vectors are real-valued), remove the constant
‖y‖², and scale by −1/2, we obtain (ii).
Rules (ii) and (iii) are equivalent since ∫ r(t) wi*(t) dt = ∫ r(t) ( Σ_j ci,j ψj(t) )* dt = Σ_j yj c*_{i,j} = ⟨y, ci⟩.
The MAP rules (i)–(iii) require performing operations of the kind
∫ r(t) b*(t) dt,     (3.6)
where b(t) is some function (ψj (t) or wj (t)). There are two ways to implement
(3.6). The obvious way, shown in Figure 3.6a is by means of a so-called correlator .
A correlator is a device that multiplies and integrates two input signals. The other
way to implement (3.6) is via a so-called matched filter . This is a filter that takes
r(t) as the input and has h(t) = b∗ (T − t) as impulse response (Figure 3.6b), where
T is an arbitrary design parameter selected in such a way as to make h(t) a causal
impulse response. The matched filter output y(t) is then
y(t) = ∫ r(α) h(t − α) dα = ∫ r(α) b*(T + α − t) dα,
and at t = T it is
y(T) = ∫ r(α) b*(α) dα.
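For sampled waveforms this identity is easy to verify numerically. The sketch below uses an arbitrary pulse b(t) and an arbitrary "received" waveform r(t) (both illustrative choices); it computes ∫ r(t) b*(t) dt once with a correlator and once by sampling the matched filter output at t = T.
% Minimal sketch: correlator versus matched filter sampled at t = T.
% The pulses below are arbitrary illustrative choices on [0, T].
dt = 1e-3; T = 1;
t  = 0:dt:T-dt;
b  = sin(2*pi*t);                      % pulse the filter is matched to
r  = b + 0.3*randn(size(t));           % an arbitrary noisy received waveform
correlator = sum(r .* conj(b)) * dt;   % integral of r(t) b*(t)
h  = conj(fliplr(b));                  % matched filter: h(t) = b*(T - t)
y  = conv(r, h) * dt;                  % matched filter output on the grid
matchedAtT = y(length(t));             % output sample corresponding to t = T
[correlator, matchedAtT]               % the two numbers coincide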
Figure 3.7. A pulse b(t) supported on [0, 3T], together with the matched filter impulse responses h0(t) and h3T(t).
Figure 3.8. Matched filter response (right) to the input on the left.
plots on the right of the figure show the matched filter response y(t) to the input
on the left. Indeed, at t = T we have a or −a. At any other time we have b or −b,
for some b such that 0 ≤ b ≤ a. This, and the fact that the noise variance does not
depend on the sampling time, implies that t = T is the sampling time at which the
error probability is minimized.
Figure 3.9 shows the block diagrams for the implementation of the three MAP
rules (i)–(iii). In each case the front end has been implemented by using matched
filters, but correlators could also be used, as in Figure 3.4.
Whether we use matched filters or correlators depends on the technology and on
the waveforms. Implementing a correlator in analog technology is costly. But, if the
processing is done by a microprocessor that has enough computational power, then
a correlation can be done at no additional hardware cost. We would be inclined to
use matched filters if there were easy-to-implement filters of the desired impulse
response. In Exercise 3.10 of this chapter, we give an example where the matched
filters can be implemented with passive components.
Figure 3.9. Block diagrams of a MAP receiver for the waveform AWGN channel, with y = (y1, . . . , yn)T and qj = −‖wj‖²/2 + (N0/2) ln PH(j). The dashed boxes can alternatively be implemented via correlators.
But the out-of-band noise increases the chance that the electronic circuits – up to
and including the n-tuple former – saturate, i.e. that the amplitude of the noise
exceeds the range that can be tolerated by the circuits.
The typical next stage is the so-called automatic gain control (AGC) amplifier,
designed to bring the signal’s amplitude into the desired range. Hence the AGC
amplifier introduces a scaling factor that depends on the strength of the input
signal.
For the rest of this text, we ignore hardware imperfections. Therefore, we can
also ignore the presence of the low-noise amplifier, of the noise-reduction filter, and
of the automatic gain control amplifier. If the channel scales the signal by a factor
α, the receiver front end can compensate by scaling the received signal by α−1 , but
the noise is also scaled by the same factor. This explains why, in evaluating the error
probability associated to a signaling scheme, we often consider channel models that
only add noise. In such cases, the scaling factor α−1 is implicitly accounted for
in the noise parameter N0 /2. An example of how to determine N0 /2 is given in
Appendix 3.11, where we work out a case study based on satellite communication.
Propagation delay and clock misalignment Propagation delay refers to the time
it takes a signal to reach a receiver. If the signal set is W = {w0 (t), w1 (t), . . . ,
wm−1 (t)} and the propagation delay is τ , then for the receiver it is as if the signal
set were W̃ = {w0 (t − τ ), w1 (t − τ ), . . . , wm−1 (t − τ )}. The common assumption is
that the receiver does not know τ when the communication starts. For instance,
in wireless communication, a receiver has no way to know that the propagation
delay has changed because the transmitter has moved while it was turned off. It
is the responsibility of the receiver to adapt to the propagation delay. We come to
the same conclusion when we consider the fact that the clocks of different devices
are often not synchronized. If the clock of the receiver reads t − τ when that of
the transmitter reads t then, once again, for the receiver, the signal set is W̃ for
some unknown τ . Accounting for the unknown τ at the receiver goes under the
general name of clock synchronization. For reasons that will become clear, the
clock synchronization problem decomposes into the symbol synchronization and
into the phase synchronization problems, discussed in Sections 5.7 and 7.5. Until
then and unless otherwise specified, we assume that there is no propagation delay
and that all clocks are synchronized.
response h(t). Owing to the channel linearity, the output due to wi (t) at the input
is, once again, R(t) = ∫ wi(t − τ) h(τ) dτ plus noise.
The possibilities we have to cope with the channel filtering depend on whether
the channel impulse response is known to the receiver alone, to both the transmit-
ter and the receiver, or to neither. It is often realistic to assume that the receiver
can measure the channel impulse response. The receiver can then communicate it
to the transmitter via the reversed communication link (if it exists). Hence it is
hardly the case that only the transmitter knows the channel impulse response.
If the transmitter uses the signal set W = {w0 (t), w1 (t), . . . , wm−1 (t)} and the
receiver knows h(t), from the receiver’s point of view, the signal set is W̃ with the
ith signal being w̃i(t) = (wi ⋆ h)(t) and the channel just adds white Gaussian noise.
This is the familiar case. Realistically, the receiver knows at best an estimate h̃(t)
of h(t) and uses it as the actual channel impulse response.
The most challenging situation occurs when the receiver does not know and
cannot estimate h(t). This is a realistic assumption in bursty communication, when
a burst is too short for the receiver to estimate h(t) and the impulse response
changes from one burst to the next.
The most favorable situation occurs when both the receiver and the transmitter
know h(t) or an estimate thereof. Typically it is the receiver that estimates the
channel impulse response and communicates it to the transmitter. This requires
two-way communication, which is typically available. In this case, the transmitter
can adapt the signal constellation to the channel characteristic. Arguably, the
best strategy is the so-called water-filling (see e.g. [19]) that can be implemented
via orthogonal frequency division multiplexing (OFDM).
We have assumed that the channel impulse response characterizes the channel
filtering for the duration of the transmission. If the transmitter and/or the receiver
move, which is often the case in mobile communication, then the channel is still
linear but time-varying. Excellent graduate-level textbooks that discuss this kind
of channel are [2] and [17].
Colored Gaussian noise We can think of colored noise as filtered white noise. It
is safe to assume that, over the frequency range of interest, i.e. the frequency range
occupied by the information-carrying signals, there is no positive-length interval
over which there is no noise. (A frequency interval with no noise is physically
unjustifiable and, if we insist on such a channel model, we no longer have an
interesting communication problem because we can transmit infinitely many bits
error-free by signaling where there is no noise.) For this reason, we assume that
the frequency response of the noise-shaping filter cannot vanish over a positive-
length interval in the frequency range of interest. In this case, we can modify
the aforementioned noise-reduction filter in such a way that, in the frequency
range of interest, it has the inverse frequency response of the noise-shaping filter.
The noise at the output of the modified noise-reduction filter, called whitening
filter , is zero-mean, Gaussian, and white (in the frequency range of interest).
The minimum error probability with the whitening filter cannot be higher than
without, because the filter is invertible in the frequency range of interest. What
we gain with the noise-whitening filter is that we are back to the familiar situation
where the noise is white and the signal set is W̃ = {w̃0 (t), w̃1 (t), . . . , w̃m−1 (t)},
where w̃i(t) = (wi ⋆ h)(t) and h(t) is the impulse response of the whitening filter.
3.7 Summary
In this chapter we have addressed the problem of communicating a message across
a waveform AWGN channel. The importance of the continuous-time AWGN
channel model comes from the fact that every conductor is a linear time-invariant
system that smooths out and adds up the voltages created by the electron’s motion.
Owing to the central limit theorem, the result of adding up many contributions can
be modeled as white Gaussian noise. No conductor can escape this phenomenon,
unless it is cooled down to zero kelvin. Hence every channel adds Gaussian noise.
This does not imply that the continuous-time AWGN channel is the only channel
model of interest. Depending on the situation, there can be other impairments
such as fading, nonlinearities, and interference, that should be considered in the
channel model, but they are outside the scope of this text.
As in the previous chapter, we have focused primarily on the receiver that
minimizes the error probability assuming that the signal set is given to us. We
were able to move forwards swiftly by identifying a sufficient statistic that reduces
the receiver design problem to the one studied in Chapter 2. The receiver consists
of an n-tuple former and a decoder. We have seen that the sender can also be
decomposed into an encoder and a waveform former. This decomposition nat-
urally fits the layering philosophy discussed in the introductory chapter: The
waveform former at the sender and the n-tuple former at the receiver can be
seen as providing a “service” to the encoder–decoder pair. The service consists
in making the continuous-time AWGN channel look like a discrete-time AWGN
channel.
Having established the link between the continuous-time and the discrete-
time AWGN channel, we are in the position to evaluate the error probability
of a communication system for the AWGN channel by means of simulation. An
example is given in Appendix 3.8.
How do we proceed from here? First, we need to introduce the performance
parameters we care mostly about, discuss how they relate to one another, and
understand what options we have to control them. We start this discussion in the
next chapter where we also develop some intuition about the kind of signals we
want to use to transmit many bits.
Second, we need to start paying attention to cost and complexity because they
can quickly get out of hand. For a brute-force implementation, the n-tuple former
requires n correlators or matched filters and the decoder needs to compute and
compare y, cj + qj for m codewords. With k = 100 (a very modest number of
transmitted bits) and n = 2k (a realistic relationship), the brute-force approach
requires 200 matched filters or correlators and the decoder needs to evaluate
roughly 1030 inner products. These are staggering numbers. In Chapter 5 we will
learn how to choose the waveform former in such a way that the n-tuple former
can be implemented with a single matched filter. In Chapter 6 we will see that
there are encoders for which the decoder needs to explore a number of possibilities
that grows linearly rather than exponentially in k.
% encode
c = encodingFunction(message);
% decode
[distances, message_estimate] = min(abs(repmat(y',1,m) - repmat(encodingFunction,k,1)), [], 2);
noiseVariance = 1
k = 1000
errorRate = 0.2660
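A self-contained variation in the same spirit is sketched below: 4-PAM codewords over the discrete-time AWGN channel with a minimum-distance decoder. The constellation, the noise variance, and the number of symbols are illustrative choices; they are not meant to reproduce the error rate printed above.
% Minimal self-contained sketch of a simulation of this kind. All parameter
% values and the 4-PAM codebook below are arbitrary illustrative choices.
noiseVariance = 1;
k = 1000;                                   % number of transmitted symbols
codebook = [-3 -1 1 3];                     % 4-PAM codewords (one per hypothesis)
m = length(codebook);
message = randi(m, 1, k);                   % iid uniform messages in {1, ..., m}
c = codebook(message);                      % encode: map each message to a codeword
y = c + sqrt(noiseVariance)*randn(1, k);    % discrete-time AWGN channel
% decode: for each received value, pick the closest codeword
[distances, message_estimate] = min(abs(repmat(y',1,m) - repmat(codebook,k,1)), [], 2);
errorRate = mean(message_estimate' ~= message)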
example 3.13 Let g1 (t) and g2 (t) be two finite-energy pulses and for i = 1, 2,
define
Zi = ∫ N(α) gi(α) dα,     (3.7)
where N(t) is white Gaussian noise as we just defined. We compute the covariance cov(Zi, Zj) as follows:
cov(Zi, Zj) = E[Zi Zj*]
  = E[ ∫ N(α) gi(α) dα ∫ N*(β) gj*(β) dβ ]
  = ∫∫ E[N(α) N*(β)] gi(α) gj*(β) dα dβ
  = (N0/2) ∫∫ δ(α − β) gi(α) gj*(β) dα dβ
  = (N0/2) ∫ gi(β) gj*(β) dβ.
example 3.14 Let N (t) be white Gaussian noise at the input of a lin-
ear time-invariant circuit of impulse response h(t) and let Z(t) be the
filter's output. Compute the autocovariance of the output Z(t) = ∫ N(α) h(t − α) dα.
Solution: The definition of autocovariance is KZ (τ ) := E [Z(t + τ )Z ∗ (t)]. We
proceed two ways. The computation using the definition of N (t) given in this
appendix mimics the derivation in Example 3.13. The result is KZ(τ) = (N0/2) ∫ h(t + τ) h*(t) dt. If we use the definition of white Gaussian noise given in Section 3.2,
we do not need to calculate (but we do need to know (3.3), which is part of the
definition). In fact, the Zi and Zj defined in (3.2) and used in (3.3) become
Z(t + τ ) and Z(t) if we set gi (α) = h(t + τ − α) and gj (α) = h(t − α), respectively.
Hence we can read the result directly out of (3.3), namely
KZ(τ) = (N0/2) ∫ h(t + τ − α) h*(t − α) dα = (N0/2) ∫ h(β + τ) h*(β) dβ.
By defining the self-similarity function¹ of h(t) as
Rh(τ) = ∫ h(t + τ) h*(t) dt
¹ Also called the autocorrelation function. We reserve the term autocorrelation function for stochastic processes and use self-similarity function for deterministic pulses.
² Recall that a Dirac delta function is defined through what happens when we integrate it against a function, i.e. through the relationship ∫ δ(t) g(t) dt = g(0).
white Gaussian noise. (But then, why not bypass the mathematical description of
N (t) as we do in Section 3.2?)
As a final remark, note that defining an object indirectly through its behavior,
as we have done in Section 3.2, is not new to us. We do something similar when
we introduce the Dirac delta function by saying that it fulfills the relationship
∫ f(t) δ(t) dt = f(0). In both cases, we introduce the object of interest by saying how
it behaves when integrated against a generic function.
Satellites and the corresponding Earth stations use antennas that have direc-
tivity (typically a parabolic or a horn antenna for a satellite, and a parabolic
antenna for an Earth station). Their directivity is specified by their gain G in the
pointing direction. If the transmitting antenna has gain GT , the power density in
the pointing direction at distance d is PT GT / (4πd²) watts/m².
The received power is PR = (PT GT/(4πd²)) AR, where AR is the effective area of the receiving antenna, related to its gain by GR = 4π AR/λ² (the gain GR is dimension-free). Solving for AR and plugging into PR yields
PR = PT GT GR / (4πd/λ)².     (3.9)
The factor LS = (4πd/λ)2 is commonly called the free-space path loss, but this
is a misnomer. In fact the free-space attenuation is independent of the wavelength.
It is the relationship between the antenna’s effective area and its gain that brings
in the factor λ². Nevertheless, being able to write
PR = PT GT GR / LS     (3.10)
has the advantage of underlining the “gains” and the “losses”. Notice also that
LS is a factor on which the system designer has little control (for a geostationary
satellite the distance is fixed and the carrier frequency is often dictated by
regulations), whereas PT , GT , and GR are parameters that a designer might be
able to choose (within limits).
Now suppose that the receiving antenna is connected to the receiver via a
lossless coaxial cable. The antenna and the receiver input have an impedance and
the connecting cable has a characteristic impedance. For best power transfer, the
three impedances should be resistive and have the same value, typically 50 ohms
(see, e.g., Wikipedia, impedance matching). We assume that it is indeed the case
and let R ohms be its value. Then, the impedance seen by the antenna looking
into the cable is also R as if the receiver were connected directly to the antenna
(see, e.g., Wikipedia, transmission line, or [14]). Figure 3.11 shows the electrical
model for the receiving antenna and its load.3 It shows the voltage source W (t)
that represents the intended signal, the voltage source VN (t) that represents all
noise sources, the antenna impedance R and the antenna’s load R.
³ The circuit of Figure 3.11 is a suitable model for determining the voltage (and the current) at the receiver input (the load in the figure). There is a more complete model [26] that enables us to associate the power dissipated by the antenna's internal impedance with the power that the antenna radiates back to space.
Figure 3.11. Electrical model for the receiving antenna and the load
it sees looking into the first amplifier.
The advantage of having all the noise sources be represented by a single source
which is co-located with the signal source W (t) is that the signal-to-noise ratio at
that point is the same as the signal-to-noise-ratio at the input of the n-tuple former.
(Once all noise sources are accounted for at the input, the electronic circuits are
considered as noise free). So, the E/N0 of interest to us is the signal energy absorbed
by the load divided by the noise-power density absorbed by the same load.
The power harvested by the antenna is passed onto the load. This power is PR ,
hence the energy is PR τ , where τ is the duration of the signals (assumed to be
the same for all signals).
As mentioned in Appendix 3.10, it is customary to describe the noise-power
density by the temperature TN of a fictitious resistor that transfers the same noise-
power density to the same load. This density is kB TN . If we know (for instance
from measurements) the power density of each noise source, we can determine the
equivalent density at the receiver input, sum all the densities, and divide by kB to
obtain the noise temperature TN . Here we assume that this number is provided to
us by the manufacturer of the receiver (see Example 3.17 for a numerical value).
Putting things together, we obtain
E/N0 = PR τ / (kB TN) = PT τ GT GR / (LS kB TN).     (3.11)
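To get a feel for (3.11), the sketch below evaluates it for a set of purely hypothetical numbers; the distance, wavelength, gains, power, duration, and noise temperature are illustrative placeholders and are not the values of this case study.
% Minimal sketch: numerical evaluation of (3.11) with hypothetical values.
kB     = 1.38e-23;            % Boltzmann constant, J/K
d      = 3.6e7;               % distance, m (geostationary-like, illustrative)
lambda = 0.025;               % wavelength, m (roughly 12 GHz, illustrative)
LS     = (4*pi*d/lambda)^2;   % the factor called free-space path loss
PT     = 100;                 % transmitted power, W (illustrative)
GT     = 10^(30/10);          % transmit antenna gain, 30 dB (illustrative)
GR     = 10^(37/10);          % receive antenna gain, 37 dB (illustrative)
TN     = 150;                 % noise temperature, K (illustrative)
tau    = 3.6e-8;              % signal duration, s (illustrative)
EoverN0    = (PT*tau*GT*GR) / (LS*kB*TN);
EoverN0_dB = 10*log10(EoverN0)   % roughly 14 dB for these particular numbers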
To go one step further, we characterize the two voltage sources of Figure 3.11.
This is a calculation that the hardware designer might want to do to determine
the range of voltages and currents at the antenna output.
Recall that a voltage of v volts applied to a resistor of R ohms dissipates the
power P = v²/R watts. When H = i, W(t) = αwi(t) for some scaling factor α.
We determine α by computing the resulting average power dissipated by the load
and by equating it to PR. Thus PR = α²E/(4Rτ). Inserting the value of PR and solving for
α yields
α = √( 4R PT GT GR / (LS E/τ) ).
Figure 3.12. (a) Electrical circuit under H = i: source αwi(t) in series with the noise source VN(t) and the antenna resistance R, loaded by R. (b) Equivalent system view: signal αwi(t) plus noise VN(t) of power spectral density N0/2 = 2R kB TN. (c) Channel model: signal wi(t) plus noise N(t) of power spectral density N0/2 = kB TN LS E/(2 PT τ GT GR).
Figure 3.12a summarizes the equivalent electrical circuit under the hypothesis
H = i. As determined in Appendix 3.10, the mean square voltage of the noise
source VN (t) per Hz of (single-sided) bandwidth is N0 = 4RkB TN . Figure 3.12b
is the equivalent representation from the point of view of a system designer. The
usefulness of these models is that they give us actual voltages. As long as we are not
concerned with hardware limitations, for the purpose of the channel model, we are
allowed to scale the signal and the noise by the same factor. Specifically, if we divide
the signal by α and divide the noise-power density by α2 , we obtain the channel
model of Figure 3.12c. Observe that the impedance R has fallen out of the picture.
3.12 Exercises
Exercises for Section 3.1
Figure 3.13. The waveforms w0(t) and w1(t).
Figure 3.14. Three waveforms.
(a) By means of the Gram–Schmidt procedure, find an orthonormal basis for the
space spanned by the waveforms in Figure 3.14.
(b) In your chosen orthonormal basis, let w0 (t) and w1 (t) be represented by the
codewords c0 = (3, −1, 1)T and c1 = (−1, 2, 3)T , respectively. Plot w0 (t) and
w1 (t).
(c) Compute the (standard) inner products ⟨c0, c1⟩ and ⟨w0, w1⟩ and compare them.
(d) Compute the norms ‖c0‖ and ‖w0‖ and compare them.
exercise 3.4 (Orthonormal expansion) For the signal set of Figure 3.15, do
the following.
(a) Find the orthonormal basis ψ1 (t), . . . , ψn (t) that you would find by following
the Gram–Schmidt (GS) procedure. Note: No need to work out the intermedi-
ate steps of the GS procedure. The purpose of this exercise is to check, with
hardly any calculation, your understanding of what the GS procedure does.
(b) Find the codeword ci ∈ Rn that describes wi (t) with respect to your orthonor-
mal basis. (No calculation needed.)
Figure 3.15. The four waveforms of the signal set.
exercise 3.5 (Noise in regions) Let N(t) be white Gaussian noise of power spectral density N0/2. Let g1(t), g2(t), and g3(t) be waveforms as shown in Figure 3.16. For i = 1, 2, 3, let Zi = ∫ N(t) gi*(t) dt, Z = (Z1, Z2)T, and U = (Z1, Z3)T.
Figure 3.16. The waveforms g1(t), g2(t), and g3(t).
Figure 3.17. Three regions, (a), (b), and (c), in the (Z1, Z2) and (Z1, Z3) planes.
exercise 3.6 (Two-signals error probability) The two signals of Figure 3.18 are
used to communicate one bit across the continuous-time AWGN channel of power
spectral density N0 /2 = 6 W/Hz. Write an expression for the error probability of
an ML receiver.
exercise 3.7 (On–off signaling) Consider the binary hypothesis testing problem
specified by:
H=0: R(t) = w(t) + N (t)
H=1: R(t) = N (t),
Figure 3.18. The waveforms w0(t) and w1(t).
where N (t) is additive white Gaussian noise of power spectral density N0 /2 and
w(t) is the signal shown on the left of Figure 3.19.
(a) Describe the maximum likelihood receiver for the received signal R(t), t ∈ R.
(b) Determine the error probability for the receiver you described in (a).
(c) Sketch a block diagram of your receiver of part (a) using a filter with impulse
response h(t) (or a scaled version thereof ) shown in the right-hand part of
Figure 3.19.
Figure 3.19. The signal w(t) (left) and the impulse response h(t) (right).
Figure 3.20. The signal w(t).
Sketch a block diagram of a receiver that, based on R(t), decides on the value of
H with least probability of error. (See Example 4.6 for the probability of error.)
exercise 3.10 (Matched filter implementation) In this problem, we consider the
implementation of matched filter receivers. In particular, we consider frequency-
shift keying (FSK) with the following signals:
wj(t) = √(2/T) cos( 2π (nj/T) t )  for 0 ≤ t ≤ T, and 0 otherwise,     (3.12)
(a) Determine the impulse response hj (t) of a causal matched filter for the signal
wj (t). Plot hj (t) and specify the sampling time.
(b) Sketch the matched filter receiver. How many matched filters are needed?
(c) Sketch the output of the matched filter with impulse response hj (t) when the
input is wj (t).
(d) Consider the ideal resonance circuit shown in Figure 3.21.
Figure 3.21. Ideal parallel LC resonance circuit driven by the current i(t); u(t) is the voltage across it.
For this circuit, the voltage response to the input current i(t) = δ(t) is
h(t) = (1/C) cos( t/√(LC) )  for t ≥ 0, and 0 otherwise.
Show how this can be used to implement the matched filter for wj (t).
Determine how L and C should be chosen. Hint: Suppose that i(t) = wj (t).
In this case, what is u(t)?
SNR = |⟨w, φ⟩|² / E[ |⟨N, φ⟩|² ].
Notice that the SNR remains the same if we scale φ(t) by a constant factor. Notice also that
E[ |⟨N, φ⟩|² ] = N0/2.     (3.14)
(a) Use the Cauchy–Schwarz inequality to give an upper bound on the SNR. What
is the condition for equality in the Cauchy–Schwarz inequality? Find the φ(t)
that maximizes the SNR. What is the relationship between the maximizing
φ(t) and the signal w(t)?
(b) Let us verify that we would get the same result using a pedestrian approach.
Instead of waveforms we consider tuples. So let c = (c1 , c2 )T ∈ R2 and use cal-
culus (instead of the Cauchy–Schwarz inequality) to find the φ = (φ1 , φ2 )T ∈
R2 that maximizes c, φ subject to the constraint that φ has unit norm.
(c) Verify with a picture (convolution) that the output at time T of a filter with input w(t) and impulse response h(t) = w(T − t) is indeed ⟨w, w⟩ = ∫_{−∞}^{∞} w²(t) dt.
where β1 , β2 , τ1 , τ2 are constants known to the receiver and N1 (t) and N2 (t) are
white Gaussian noise of power spectral density N0 /2. We assume that N1 (t) and
N2 (t) are independent of each other (in the obvious sense) and independent of X.
We also assume that ∫ w(t − τ1) w(t − τ2) dt = γ, where −1 ≤ γ ≤ 1.
(a) Describe an ML receiver for X that observes both R1 (t) and R2 (t) and deter-
mine its probability of error in terms of the Q function, β1 , β2 , γ, and N0 /2.
(b) Repeat part (a) assuming that the receiver has access only to the sum-signal
R(t) = R1 (t) + R2 (t).
exercise 3.13 (Receiver) The signal set
w0(t) = sinc²(t)
w1(t) = √2 sinc²(t) cos(4πt)
Figure 3.22. The waveform w1(t).
(a) Describe an ML receiver that decides which pulse was transmitted. We ask
that the n-tuple former contains a single matched filter. Make sure that the
filter is causal and plot its impulse response.
(b) Express the probability of error in terms of T, A, Td , N0 .
exercise 3.15 (Delayed signals) One of the two signals shown in Figure
3.23 is selected at random and is transmitted over the additive white Gaussian
noise channel of noise spectral density N0/2. Draw a block diagram of a maximum
likelihood receiver that uses a single matched filter and express its error probability.
Figure 3.23. The waveforms w0(t) and w1(t).
exercise 3.16 (ML decoder for AWGN channel) The signal of Figure 3.24 is
fed to an ML receiver designed for a transmitter that uses the four signals of Figure
3.15 to communicate across the AWGN channel. Determine the receiver output Ĥ.
Figure 3.24. The received signal R(t).
exercise 3.17 (AWGN channel and sufficient statistic) Let W = {w0 (t), w1 (t)}
be the signal constellation used to communicate an equiprobable bit across an
additive Gaussian noise channel. In this exercise, we verify that the projection of
the channel output onto the inner product space V spanned by W is not necessarily
a sufficient statistic, unless the noise is white. Let ψ1 (t), ψ2 (t) be an orthonormal
basis for V. We choose the additive noise to be N (t) = Z1 ψ1 (t)+Z2 ψ2 (t)+Z3 ψ3 (t)
for some normalized ψ3 (t) that is orthogonal to ψ1 (t) and ψ2 (t) and choose
Z1 , Z2 , and Z3 to be zero-mean jointly Gaussian random variables of identical
variance σ 2 . Let ci = (ci,1 , ci,2 , 0)T be the codeword associated to wi (t) with
respect to the extended orthonormal basis ψ1 (t), ψ2 (t), ψ3 (t). There is a one-to-one
correspondence between the channel output R(t) and Y = (Y1 , Y2 , Y3 )T , where
Yi = ⟨R, ψi⟩. In terms of Y, the hypothesis testing problem is
H = i :  Y = ci + Z,  i = 0, 1,  where Z = (Z1, Z2, Z3)^T.
(a) As a warm-up exercise, let us first assume that Z1 , Z2 , and Z3 are inde-
pendent. Use the Fisher–Neyman factorization theorem (Exercise 2.22 of
Chapter 2) to show that Y1 , Y2 is a sufficient statistic.
(b) Now assume that Z1 and Z2 are independent but Z3 = Z2 . Prove that in this
case Y1 , Y2 is not a sufficient statistic.
(c) To check a specific case, consider c0 = (1, 0, 0)T and c1 = (0, 1, 0)T . Determine
the error probability of an ML receiver that observes (Y1 , Y2 )T and that of
another ML receiver that observes (Y1 , Y2 , Y3 )T .
exercise 3.18 (Mismatched receiver) Let a channel output be
R(t) = c X w(t) + N (t), (3.15)
where c > 0 is some deterministic constant, X is a uniformly distributed random
variable that takes values in {3, 1, −1, −3}, w(t) is the deterministic waveform
w(t) = 1 for 0 ≤ t < 1, and w(t) = 0 otherwise, (3.16)
and N(t) is white Gaussian noise of power spectral density N0/2.
(a) Describe the receiver that, based on the channel output R(t), decides on the
value of X with least probability of error.
(b) Find the error probability of the receiver you have described in part (a).
(c) Suppose now that you still use the receiver you have described in part (a),
but that the received signal is actually
R(t) = (3/4) c X w(t) + N(t), (3.17)
i.e. you were unaware that the channel was attenuating the signal. What is
the probability of error now?
(d) Suppose now that you still use the receiver you have found in part (a) and that
R(t) is according to equation (3.15), but that the noise is colored. In fact, N (t)
is a zero-mean stationary Gaussian noise process of auto-covariance function
KN(τ) = (1/(4α)) e^{−|τ|/α},
where 0 < α < ∞ is some deterministic real parameter. What is the
probability of error now?
4 Signal design trade-offs
4.1 Introduction
In Chapters 2 and 3 we have focused on the receiver, assuming that the signal set
was given to us. In this chapter we introduce the signal design.
The problem of choosing a convenient signal constellation is not as clean-cut as
the receiver-design problem. The reason is that the receiver-design problem has
a clear objective, to minimize the error probability, and one solution, namely the
MAP rule. In contrast, when we choose a signal constellation we make trade-offs
among conflicting objectives.
We have two main goals for this chapter: (i) to introduce the design parameters
we care mostly about; and (ii) to sharpen our intuition about the role played by the
dimensions of the signal space as we increase the number of bits to be transmitted.
The continuous-time AWGN channel model is assumed.
4.2 Isometric transformations applied to the codebook
example 4.1 Figure 4.1 shows an original codebook C = {c0 , c1 , c2 , c3 } and three
variations obtained by applying to C a reflection, a rotation, and a translation,
respectively. In each case the isometry a : Rn → Rn sends ci to c̃i = a(ci ).
Figure 4.1. (The original codebook c0, c1, c2, c3 and the codebooks obtained from it via a reflection, a rotation, and a translation.)
The average energy E = E[‖Y‖²] can be decreased by a translation if and only if the mean E[Y] is non-zero.
example 4.2 Let w0 (t) and w1 (t) be rectangular pulses with support [0, T ] and
[T, 2T ], respectively, as shown on the left of Figure 4.2a. Assuming that PH (0) =
PH (1) = 12 , we calculate the average m(t) = 12 w0 (t) + 12 w1 (t) and see that it is
non-zero (center waveform). Hence we can save energy by using the new signal set
defined by w̃i (t) = wi (t) − m(t), i = 0, 1 (right). In Figure 4.2b we see the signals
in the signal space, where ψi(t) = w_{i−1}(t)/‖w_{i−1}(t)‖, i = 1, 2. As we see from the figures,
w̃0 (t) and w̃1 (t) are antipodal signals. This is not a coincidence: After we remove
the mean, any two signals become the negative of each other. As an alternative to
representing the elements of W in the signal space, we could have represented the
elements of the codebook C in R2 , as we did in Figure 4.1. The two representations
are equivalent.
Figure 4.2. (a) The waveforms w0(t), w1(t), their mean m(t), and the translated waveforms w̃0(t), w̃1(t). (b) Signal space viewpoint.
the number m of messages goes to infinity. Recall that the waveform associated to
message i is
wi (t) = ci ψ(t),
where σ² = N0/2 is the variance of the noise in each coordinate. If we insert E = kEb and m = 2^k, we see that the lower bound goes to 1 as k goes to infinity. This happens because the circumference of the PSK constellation grows as √k whereas the number of points grows as 2^k. Hence, the minimum distance between points goes to zero (indeed exponentially fast).
As they are, the signal constellations used in the above two examples are not
suitable to transmit a large amount k of bits by letting the constellation size
m = 2k grow exponentially with k. The problem with the above two examples
is that, as m grows, we are trying to pack an exponentially increasing number of
points into a space that also grows in size but not fast enough. The space becomes
“crowded” as m grows, meaning that the minimum distance becomes smaller and
the probability of error increases.
We should not conclude that PAM and PSK are not useful to send many bits.
On the contrary, these signaling methods are widely used. In the next chapter we
will see how. (See also the comment after the next example.)
example 4.5 (Bit-by-bit on a pulse train) The idea is to use a different dimen-
sion for each bit. Let (bi,1 , bi,2 , . . . , bi,k ) be the binary sequence corresponding to
message i. For mathematical convenience, we assume these bits to take value in
{±1} rather than {0, 1}. We let the associated codeword ci = (ci,1 , ci,2 , . . . , ci,k )T
be defined by
ci,j = bi,j √Eb,
where Eb = E/k is the energy per bit. The transmitted signal is
wi(t) = Σ_{j=1}^{k} ci,j ψj(t),  t ∈ R, (4.2)
where ψj(t) = ψ(t − jTs) for some unit-norm pulse ψ(t) chosen so that {ψj(t)} forms an orthonormal set.
The above expression justifies the name bit-by-bit on a pulse train given to this
signaling method (see Figure 4.3). As we will see in Chapter 5, there are many
other possible choices for the pulse ψ(t).
Figure 4.3. Example of (4.2) for k = 4 and ci = √Eb (1, 1, −1, 1)^T: (a) the pulse ψ(t); (b) the signal wi(t).
It should be clear from the figure what the decoding regions of an ML decoder are, but let us proceed analytically and find an ML decoding rule that works for any k. The ML receiver decides that the constellation point used by the sender is the ci ∈ {±√Eb}^k that maximizes ⟨y, ci⟩ − ‖ci‖²/2. Since ‖ci‖² is the same for all i, the ML decision is the ci whose jth component has the same sign as yj, for each j.
(Figure: the codebooks of bit-by-bit on a pulse train for (a) k = 1, (b) k = 2, and (c) k = 3.)
We now compute the error probability. As usual, we first compute the error
probability conditioned on a specific ci. From the codebook symmetry, we expect that the error probability will not depend on i. If ci,j is positive, Yj = √Eb + Zj and a maximum likelihood decoder will make the correct decision if Zj > −√Eb. (The statement is an "if and only if" if we ignore the zero-probability event that Zj = −√Eb.) This happens with probability 1 − Q(√Eb/σ). Based on similar reasoning, it is
straightforward to verify that the probability of error is the same if ci,j is negative.
Now let Cj be the event that the decoder makes the correct decision about the
jth bit. The probability of Cj depends only on Zj . The independence of the noise
components implies the independence of C1 , C2 , . . . , Ck . Thus, the probability that
all k bits are decoded correctly when H = i is
Pc(i) = [1 − Q(√Eb/σ)]^k,
which is the same for all i and, therefore, it is also the average Pc. Notice that Pc → 0 as k → ∞. However, the probability that any specific bit is decoded incorrectly is Pb = Q(√Eb/σ), which does not depend on k.
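A quick numeric check of these two expressions, using Q(x) = erfc(x/√2)/2; the values of Eb and N0 below are arbitrary choices made only for illustration.

import math

def Q(x):                       # Q-function via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

Eb, N0 = 1.0, 0.25              # assumed values
sigma = math.sqrt(N0 / 2.0)
Pb = Q(math.sqrt(Eb) / sigma)
for k in (1, 10, 100, 1000):
    Pc = (1.0 - Pb) ** k
    print(k, Pb, Pc)            # Pb stays fixed while Pc -> 0 as k grows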
This is called block-orthogonal signaling. The name stems from the fact that in practice a block of k bits is collected and then mapped into one of m orthogonal waveforms (see Figure 4.5). Notice that ‖wi‖² = E for all i.
There are many ways to choose the 2^k waveforms ψi(t). One way is to choose ψi(t) = ψ(t − iT) for some normalized pulse ψ(t) such that ψ(t − iT) and ψ(t − jT) are orthogonal when i ≠ j. In this case the requirement for ψ(t) is the same as that in bit-by-bit on a pulse train, but now we need 2^k rather than k shifted versions,
and we send one pulse rather than a train of k weighted pulses. For obvious reasons
this signaling scheme is called pulse position modulation.
Another example is to choose
wi(t) = √(2E/T) cos(2πfi t) 1{t ∈ [0, T]}. (4.4)
For an appropriate choice of the frequencies fi,
⟨wi, wj⟩ = (2E/T) ∫_0^T { (1/2) cos[2π(fi + fj)t] + (1/2) cos[2π(fi − fj)t] } dt = E 1{i = j},
as desired.
Figure 4.5. (The codewords of block-orthogonal signaling: (a) m = n = 2; (b) m = n = 3.)
Ĥ_ML(y) = arg max_i ⟨y, ci⟩ − E/2
        = arg max_i ⟨y, ci⟩
        = arg max_i yi,
where yi is the ith component of y. To compute (or bound) the error probability,
we start as usual with a fixed ci . We choose i = 1. When H = 1,
Yj = √E + Zj if j = 1, and Yj = Zj if j ≠ 1.
Pe < exp(−k(Eb/(2N0) − ln 2)).
We see that Pe → 0 as k → ∞, provided that Eb/N0 > 2 ln 2. (It is possible to prove that the weaker condition Eb/N0 > ln 2 is sufficient. See Exercise 4.3.)
The result of the above example is quite surprising at first. The more bits we
send, the larger is the probability Pc that they will all be decoded correctly. Yet
what goes on is quite clear. In setting all but one component of each codeword
to zero, we can make the non-zero component as large as √(kEb). The decoder
looks for the largest component. Because the variance of the noise is the same
in all components and does not grow with k, when k is large it becomes almost
impossible for the noise to alter the position of the largest component.
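The following Python sketch simulates the arg max decoder for block-orthogonal signaling. The parameter values are assumptions chosen so that Eb/N0 exceeds 2 ln 2, and the estimated error probability is seen to shrink as k grows.

import numpy as np

rng = np.random.default_rng(0)
N0 = 1.0
EbN0 = 4.0 * np.log(2.0)            # above the 2 ln 2 threshold of the bound (assumption)
sigma = np.sqrt(N0 / 2.0)
for k in (2, 4, 8, 12):
    m = 2 ** k
    E = k * EbN0 * N0               # E = k Eb with Eb = EbN0 * N0
    trials, errors = 2000, 0
    for _ in range(trials):
        y = rng.normal(0.0, sigma, size=m)
        y[0] += np.sqrt(E)          # H = 1 was sent: Y_1 = sqrt(E) + Z_1, Y_j = Z_j otherwise
        errors += (np.argmax(y) != 0)
    print(k, errors / trials)       # estimated Pe decreases with k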
1
The Shannon Award is the most prestigious award bestowed by the Information
Theory Society. Slepian was the first, after Shannon himself, to receive the award. The
recipient presents the Shannon Lecture at the next IEEE International Symposium on
Information Theory.
4.5. Duration, bandwidth, and dimensionality 143
In essence, this result says that for an arbitrary time interval (a, b) of length T
and an arbitrary frequency interval (c, d) of width W , in the limit of large T and
W , the set of finite-energy signals that are time-limited to (a, b) and frequency-
limited to (c, d) is spanned by T W orthonormal functions. For later reference we
summarize this by the expression
n ≐ TW, (4.5)
where the “·” on top of the equal sign is meant to remind us that the relationship
holds in the limit of large values for W and T .
Unlike Slepian’s bandwidth definition, which applies also to complex-valued
signals, the bandwidth definitions of Appendix 4.9 have been conceived with real-
valued signals in mind. If s(t) is real-valued, the conjugacy constraint implies
that |sF (f )| is an even function.3 If, in addition, the signal is baseband, then
it is frequency-limited to some interval of the form (−B, B) and, according to a
well-established practice, we say that the signal’s bandwidth is B (not 2B). To
avoid confusions, we use the letter W for bandwidths that account for positive and
2
We do not require that this signal set be closed under addition and under multiplication
by scalars, i.e. we do not require that it forms a vector space.
3
See Section 7.2 for a review of the conjugacy constraint.
negative frequencies and use B for so-called single-sided bandwidths. (We may call
W a double-sided bandwidth.)
A result similar to (4.5) can be formulated for other meaningful definitions of
time and frequency limitedness. The details depend on the definitions but the
essence does not. What remains true for many meaningful definitions is that,
asymptotically, there is a linear relationship between W T and n.
Two illustrative examples of this relationship follow. To avoid annoying calcula-
tions, for each example, we take the liberty of using the most convenient definition
of duration and bandwidth.
example 4.9 Let ψ(t) = (1/√Ts) sinc(t/Ts) and ψF(f) = √Ts 1{f ∈ [−1/(2Ts), 1/(2Ts)]}
be a normalized pulse and its Fourier transform. Let ψl (t) = ψ(t−lTs ), l = 1, . . . , n.
The collection B = {ψ1 (t), . . . , ψn (t)} forms an orthonormal set. One way to see
that ψi(t) and ψj(t) are orthogonal to each other when i ≠ j is to go to the
Fourier domain and use Parseval’s relationship. (Another way is to evoke Theorem
5.6 of Chapter 5.) Let G be the space spanned by the orthonormal basis B. It
has dimension n by construction. All signals of G are strictly frequency-limited to
(−W/2, W/2) for W = 1/Ts and time-limited (for some η) to (0, T ) for T = nTs .
For this example W T = n.
example 4.10 If we substitute an orthonormal basis {ψ1 (t), . . . , ψn (t)} with the
related orthonormal basis {ϕ1(t), . . . , ϕn(t)} obtained via the relationship ϕi(t) = √b ψi(bt) for some b ≥ 1, i = 1, . . . , n, then all signals are time-compressed and
frequency-expanded by the same factor b. This example shows that we can trade W
for T without changing the dimensionality of the signal space, provided that W T
is kept constant.
Note that, in this section, n is the dimensionality of the signal space that may
or may not be related to a codeword length (also denoted by n).
The relationship between n and W T establishes a fundamental relationship
between the discrete-time and the continuous-time channel model. It says that
if we are allowed to use a frequency interval of width W Hz during T seconds,
then we can make approximately (asymptotically exactly) up to W T uses of the
equivalent discrete-time channel model. In other words, we get to use the discrete-
time channel at a rate of up to W channel uses per second.
The symmetry of (4.5) implies that time and frequency are on an equal footing
in terms of providing the degrees of freedom exploited by the discrete-time channel.
It is sometimes useful to think of T and W as the width and height of a rectangle
in the time–frequency plane, as shown in Figure 4.6. We associate such a rectangle
with the set of signals that have the corresponding time and frequency limitations.
Like a piece of land, such a rectangle represents a natural resource and what
matters for its exploitation is its area.
The fact that n can grow linearly with W T and not faster is bad news for block-
orthogonal signaling. This means that n cannot grow exponentially in k unless
W T does the same. In a typical system, W is fixed by regulatory constraints
and T grows linearly with k. (T is essentially the time it takes to send k bits.)
Hence W T cannot grow exponentially in k, which means that block-orthogonal
is not scalable. Of the four examples studied in Section 4.4, only bit-by-bit on
a pulse train seems to be a viable candidate for large values of k, provided that
we can make it more robust to additive white Gaussian noise. The purpose of
the next section is to gain valuable insight into what it takes to achieve this
goal.
4.6 Bit-by-bit versus block-orthogonal
For large values of Eb/N0, the error probability conditioned on ci is well approximated by the dominant terms of the union bound, Pe(i) ≈ Nd Q(dm/(2σ)), where Nd is the number of dominant terms, i.e. the number of nearest neighbors to ci, and dm is the minimum distance, i.e. the distance to a nearest neighbor.
For bit-by-bit on a pulse train, there are k closest neighbors, each neighbor
obtained by changing ci in exactly one component, and each of them is at distance 2√Eb from ci. As k increases, Nd increases and Q(dm/(2σ)) stays constant. The increase of Nd makes Pe(i) increase.
Now consider block-orthogonal signaling. All signals are at the same distance
from each other. Hence there are Nd = 2^k − 1 nearest neighbors to ci, all at distance dm = √(2E) = √(2kEb). Hence
Q(dm/(2σ)) ≤ (1/2) exp(−dm²/(8σ²)) = (1/2) exp(−kEb/(4σ²)),
Nd = 2^k − 1 = exp(k ln 2) − 1.
We see that the probability that the noise carries a signal closer to a specific neighbor decreases as exp(−kEb/(4σ²)), whereas the number of nearest neighbors increases as exp(k ln 2). For Eb/(4σ²) > ln 2 the product decreases, otherwise it increases.
In essence, to reduce the error probability we need to increase the minimum
distance. If the number of dimensions remains constant, as in the first two examples
of Section 4.4, the space occupied by the signals becomes crowded, the minimum
distance decreases, and the error probability increases. For block-orthogonal signaling, the signal's norm increases as √(kEb) and, by Pythagoras, the distance is a factor √2 larger than the norm – hence the distance grows as √(2kEb). In bit-by-bit on a
pulse train, the minimum distance remains constant. As we will see in Chapter 6,
sophisticated coding techniques in conjunction with a generalized form of bit-by-bit
on a pulse train can reduce the error probability by increasing the distance profile.
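To see the two effects numerically, one can evaluate the dominant term Nd Q(dm/(2σ)) for both schemes. The sketch below does so for an assumed Eb/σ² chosen so that Eb/(4σ²) > ln 2; the numbers are only an estimate based on the nearest-neighbor term, not the exact error probability.

import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

Eb, sigma2 = 3.0, 1.0                                      # assumed values; Eb/(4 sigma^2) > ln 2
sigma = math.sqrt(sigma2)
for k in (2, 8, 32, 128):
    bitbybit = k * Q(math.sqrt(Eb) / sigma)                # Nd = k, dm = 2 sqrt(Eb)
    blockorth = (2 ** k - 1) * Q(math.sqrt(2 * k * Eb) / (2 * sigma))  # Nd = 2^k - 1, dm = sqrt(2 k Eb)
    print(k, bitbybit, blockorth)                          # first grows with k, second shrinks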
4.7 Summary
In this chapter we have introduced new design parameters and performance meas-
ures. The ones we are mostly concerned with are as follows.
• The cardinality m of the message set H. Since in most cases the message consists
of bits, typically we choose m to be a power of 2. Whether m is a power of 2
or not, we say that a message carries k = log2 m bits of information (assuming
that all messages are equiprobable).
• The message error probability Pe and the bit error rate Pb . The former, also
called block error probability, is the error probability we have considered so
far. The latter can be computed, in principle, once we specify the mapping
between the set of k-bit sequences and the set of messages. Until then, the
only statement we can make about Pb is that Pe/k ≤ Pb ≤ Pe. The left bound applies with equality if a message error always translates into exactly 1 out of the k bits being incorrectly reproduced. The right bound is an equality if all k bits are incorrectly reproduced each time that there is a message error. Whether we care more
about Pe or Pb depends on the application. If we send a file that contains a
computer program, every single bit of the file has to be received correctly in
order for the transmission to be successful. In this case we clearly want Pe to be
small. However, there are sources that are more tolerant to occasional errors.
This is the case of a digitized voice signal. For voice it is sufficient to have Pb
small. To appreciate the difference between Pe and Pb , consider the hypothetical
situation in which one message corresponds to k = 10³ bits and 1 bit of every message is incorrectly reconstructed. Then the message error probability is 1 (every message is incorrectly reconstructed), whereas the bit-error probability is 10⁻³.
• The average signal's energy E and the average energy per bit Eb, where Eb = E/k.
We are typically willing to double the energy to send twice as many bits. In
this case we fix Eb and let E be a function of k.
• The transmission rate Rb = k/T = log2(m)/T [bits/second].
• The single-sided bandwidth B and the two-sided bandwidth W . There are
several meaningful criteria to determine the bandwidth.
• Scalability, in the sense that we ought to be able to communicate bit sequences
of any length (provided we let W T scale in a sustainable way).
• The implementation cost and computational complexity. To keep the discussion
as simple as possible, we assume that the cost is determined by the number of
matched filters in the n-tuple former and the complexity is that of the decoder.
Clearly we desire scalability, high transmission rate, little energy spent per
bit, small bandwidth, small error probability (message or bit, depending on the
application), low cost and low complexity. As already mentioned, some of these
goals conflict. For instance, starting from a given codebook we can trade energy
for error probability by scaling down all the codewords by some factor. In so
doing the average energy will decrease and so will the distance between codewords,
which implies that the error probability will increase. Alternatively, once we have
reduced the energy by scaling down the codewords we can add new codewords at
the periphery of the codeword constellation, choosing their location in such a way
that new codewords do not further increase the error probability. We keep doing
this until the average energy has returned to the original value. In so doing we
trade bit rate for error probability. By removing codewords at the periphery of the
codeword constellation we can trade bit rate for energy. All these manipulations
pertain to the encoder. By acting inside the waveform former, we can boost the bit rate at the expense of bandwidth. For instance, we can substitute ψi(t) with φi(t) = √b ψi(bt) for some b > 1. This scales the duration of all signals by 1/b with two consequences. First, the bit rate is multiplied by b. (It takes a fraction 1/b of the time to send the same number of bits.) Second, the signal's bandwidth expands by b. (The scaling property of the Fourier transform asserts that the Fourier transform of ψ(bt) is (1/|b|) ψF(f/b).) These examples are meant to show that there
is considerable margin for trading among bit rate, bandwidth, error probability,
and average energy.
We have seen that, rather surprisingly, it is possible to transmit an increasing
number k of bits at a fixed energy per bit Eb and to make the probability that even
a single bit is decoded incorrectly go to zero as k increases. However, the scheme
we used to prove this has the undesirable property of requiring an exponential
growth of the time–bandwidth product. Such a growth would make us quickly
run out of time and/or bandwidth even with moderate values of k. In real-world
applications, we are given a fixed bandwidth and we let the duration grow linearly
with k. It is not a coincidence that most signaling methods in use today can be
seen one way or another as refinements of bit-by-bit on a pulse train. This line of
signaling technique will be pursued in the next two chapters.
Information theory is a field that searches for the ultimate trade-offs, regardless
of the signaling method. A main result from information theory is the famous
formula
C = (W/2) log2(1 + 2P/(N0 W)) (4.6)
  = B log2(1 + P/(N0 B)).
It gives a precise value to the ultimate rate C bps at which we can transmit reliably
over a waveform AWGN channel of noise power spectral density N0 /2 watts/Hz
if we are allowed to use signals of power not exceeding P watts and absolute
(single-sided) bandwidth not exceeding B Hz.
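As a small numerical illustration of (4.6), the following evaluates the capacity for a few assumed values of the power P, with N0 and B also chosen arbitrarily.

import math

N0 = 1.0                           # noise power spectral density parameter (assumption)
B = 1.0e6                          # single-sided bandwidth in Hz (assumption)
for P in (1.0e5, 1.0e6, 1.0e7):    # signal power in watts (assumption)
    C = B * math.log2(1.0 + P / (N0 * B))
    print(P, C)                    # capacity in bits per second; the W = 2B form gives the same value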
This is a good time to clarify our non-standard use of the words coding, encoder,
codeword, and codebook. We have seen that no matter which waveform signals
we use to communicate, we can always break down the sender into a block that
provides an n-tuple and one that maps the n-tuple into the corresponding wave-
form. This view is completely general and serves us well, whether we analyze or
implement a system. Unfortunately there is no standard name for the first block.
Calling it an encoder is a good name, but the reader should be aware that the
current practice is to say that there is coding when the mapping from bits to
codewords is non-trivial, and to say that there is no coding when the map is trivial
as in bit-by-bit on a pulse train. Making such a distinction is not a satisfactory
solution in our view. An example of a non-trivial encoder will be studied in depth
in Chapter 6.
Calling the second block a waveform former is definitely non-standard, but we
find this name to be more appropriate than calling it a modulator, which is the
most common name used for it. The term modulator has been inherited from the
old days of analog communication techniques such as amplitude modulation (AM)
for which it was an appropriate name.
so that for Z ∼ N(0, σ²In) we can write fZ(z) = g(z). Then for any codebook C = {c0, . . . , cm−1}, decoding regions R0, . . . , Rm−1, and isometry a : Rn → Rn we have
∫_{y∈Ri} g(y − ci) dy
(a) = ∫_{y∈Ri} g(a(y) − a(ci)) dy
(b) = ∫_{y: a(y)∈a(Ri)} g(a(y) − a(ci)) dy
(c) = ∫_{α∈a(Ri)} g(α − a(ci)) dα
• 3-dB bandwidth This is the smallest positive number B such that |hF(f)|² ≤ |hF(0)|²/2 outside I = [−B, B]. In other words, outside I the value of |hF(f)|² is at least 3 dB smaller than at f = 0.
• η-bandwidth For any number η ∈ (0, 1), the η-bandwidth is the smallest positive number B such that
∫_{−B}^{B} |hF(f)|² df ≥ (1 − η) ∫_{−∞}^{∞} |hF(f)|² df.
• Equivalent noise bandwidth This is the number B that satisfies 2B|hF(0)|² = ∫_{−∞}^{∞} |hF(f)|² df. The name comes from the fact that if we feed with white noise a filter of impulse response h(t) and we feed with the same input an ideal lowpass filter of frequency response |hF(0)| 1{f ∈ [−B, B]}, then the output power is the same in both situations.
• Root-mean-square (RMS) bandwidth This is defined if ∫_{−∞}^{∞} f²|hF(f)|² df < ∞, in which case it is
B = [ ∫_{−∞}^{∞} f²|hF(f)|² df / ∫_{−∞}^{∞} |hF(f)|² df ]^{1/2}.
To understand this definition, notice that the function g(f) := |hF(f)|² / ∫_{−∞}^{∞} |hF(f)|² df is non-negative, even, and integrates to 1. Hence it is the density of some zero-mean random variable and B = (∫ f² g(f) df)^{1/2} is the standard deviation of that random variable.
4.10 Exercises
Exercises for Section 4.3
exercise 4.1 (Signal translation) Consider the signals w0 (t) and w1 (t) shown in
Figure 4.7, used to communicate 1 bit across the AWGN channel of power spectral
density N0 /2.
Figure 4.7. (The waveforms w0(t) and w1(t).)
(a) Determine an orthonormal basis {ψ0 (t), ψ1 (t)} for the space spanned by
{w0 (t), w1 (t)} and find the corresponding codewords c0 and c1 . Work out
two solutions, one obtained via Gram–Schmidt and one in which the second
element of the orthonormal basis is a delayed version of the first. Which of
the two solutions would you choose if you had to implement the system?
(b) Let X be a uniformly distributed binary random variable that takes values
in {0, 1}. We want to communicate the value of X over an additive white
Gaussian noise channel. When X = 0, we send w0 (t), and when X = 1,
we send w1 (t). Draw the block diagram of an ML receiver based on a single
matched filter.
(c) Determine the error probability Pe of your receiver as a function of T and N0 .
(d) Find a suitable waveform v(t), such that the new signals w̃0 (t) = w0 (t) −
v(t) and w̃1 (t) = w1 (t) − v(t) have minimal energy and plot the resulting
waveforms.
(e) What is the name of the kind of signaling scheme that uses w̃0 (t) and w̃1 (t)?
Argue that one obtains this kind of signaling scheme independently of the
initial choice of w0 (t) and w1 (t).
exercise 4.2 (Orthogonal signal sets) Consider a set W = {w0 (t), . . . , wm−1 (t)}
of mutually orthogonal signals with squared norm E each used with equal
probability.
(a) Find the minimum-energy signal set W̃ = {w̃0 (t), . . . , w̃m−1 (t)} obtained by
translating the original set.
(b) Let Ẽ be the average energy of a signal picked at random within W̃. Determine
Ẽ and the energy saving E − Ẽ.
(c) Determine the dimension of the inner product space spanned by W̃.
exercise 4.3 (Suboptimal receiver for orthogonal signaling) This exercise takes
a different approach to the evaluation of the performance of block-orthogonal sig-
naling (Example 4.6). Let the message H ∈ {1, . . . , m} be uniformly distributed
and consider the communication problem described by
H=i: Y = ci + Z, Z ∼ N (0, σ 2 Im ),
where Y = (Y1 , . . . , Ym )T ∈ Rm is the received vector and {c1 , . . . , cm } ⊂ Rm is
the codebook consisting of constant-energy codewords that are orthogonal to each
other. Without loss of essential generality, we can assume
ci = √E ei,
where ei is the ith unit vector in Rm , i.e. the vector that contains 1 at position i
and 0 elsewhere, and E is some positive constant.
(a) Describe the statistic of Yj for j = 1, . . . , m given that H = 1.
(b) Consider a suboptimal receiver that uses a threshold t = α√E where 0 < α < 1. The receiver declares Ĥ = i if i is the only integer such that Yi ≥ t.
If there is no such i or there is more than one index i for which Yi ≥ t, the
receiver declares that it cannot decide. This will be viewed as an error. Let
Ei = {Yi ≥ t}, Ei^c = {Yi < t}, and describe, in words, the meaning of the event
E1 ∩ E2^c ∩ E3^c ∩ · · · ∩ Em^c.
(c) Find an upper bound to the probability that the above event does not occur
when H = 1. Express your result using the Q function.
(d) Now let m = 2^k and let E = kEb for some fixed energy per bit Eb. Prove that the error probability goes to 0 as k → ∞, provided that Eb/σ² > (2 ln 2)/α². (Notice that because we can choose α² as close to 1 as we wish, if we insert σ² = N0/2, the condition becomes Eb/N0 > ln 2, which is a weaker condition than the one obtained in Example 4.6.) Hint: Use m − 1 < m = exp(ln m) and Q(x) < (1/2) exp(−x²/2).
(a) Describe an orthonormal basis for the inner product space W spanned by
wi (t), i = 0, . . . , 3 and plot the signal constellation in Rn , where n is the
dimensionality of W.
(b) Determine an assignment between pairs of bits and waveforms such that the
bit-error probability is minimized and derive an expression for Pb .
(c) Draw a block diagram of the receiver that achieves the above Pb using a single
causal filter.
(d) Determine the energy per bit Eb and the power of the transmitted signal.
Figure 4.8. (The four waveforms w0(t), w1(t), w2(t), w3(t).)
wi(t) = √(2E/T) cos(2π(fc + iΔf)t) 1{t ∈ [0, T]},  i = 0, . . . , m − 1,
where E, T, fc, and Δf are fixed parameters, with Δf ≪ fc.
(a) Determine the average energy E. (You can assume that fc T is an integer.)
(b) Assuming that fc T is an integer, find the smallest value of Δf that makes
wi(t) orthogonal to wj(t) when i ≠ j.
(c) In practice the signals wi (t), i = 0, 1, . . . , m−1 can be generated by changing
the frequency of a single oscillator. In passing from one frequency to another a
phase shift θ is introduced. Again, assuming that fc T is an integer, determine
the smallest value Δf that ensures orthogonality between cos(2π(fc + iΔf )t +
θi) and cos(2π(fc + jΔf)t + θj) whenever i ≠ j regardless of θi and θj.
(d) Sometimes we do not have complete control over fc either, in which case it
is not possible to set fc T to an integer. Argue that if we choose fc T ≫ 1 then
for all practical purposes the signals will be orthogonal to one another if the
condition found in part (c) is met.
(e) Give an approximate value for the bandwidth occupied by the signal constel-
lation. How does the W T product behave as a function of k = log2 (m)?
(a) For the set G spanned by the above orthonormal basis, determine the relation-
ship between n and W T .
(b) Compare with Example 4.9 and explain the difference.
exercise 4.9 (Time- and frequency-limited orthonormal sets) Complement
Example 4.9 and Exercise 4.8 with similar examples in which the shifts occur in
the frequency domain. The corresponding time-domain signals can be complex-
valued.
exercise 4.10 (Root-mean-square bandwidth) The root-mean-square band-
width (abbreviated rms bandwidth) of a lowpass signal g(t) of finite-energy is
defined by
Brms = [ ∫_{−∞}^{∞} f²|gF(f)|² df / ∫_{−∞}^{∞} |gF(f)|² df ]^{1/2},
where |gF (f )|2 is the energy spectral density of the signal. Correspondingly, the
root-mean-square (rms) duration of the signal is defined by
Trms = [ ∫_{−∞}^{∞} t²|g(t)|² dt / ∫_{−∞}^{∞} |g(t)|² dt ]^{1/2}.
(b) In the above inequality, insert g1(t) = t g(t) and g2(t) = dg(t)/dt and show that
[ ∫_{−∞}^{∞} t (d/dt)[g(t)g*(t)] dt ]² ≤ 4 ∫_{−∞}^{∞} t²|g(t)|² dt ∫_{−∞}^{∞} |dg(t)/dt|² dt.
(c) Integrate the left-hand side by parts and use the fact that |g(t)| → 0 faster than 1/√|t| as |t| → ∞ to obtain
[ ∫_{−∞}^{∞} |g(t)|² dt ]² ≤ 4 ∫_{−∞}^{∞} t²|g(t)|² dt ∫_{−∞}^{∞} |dg(t)/dt|² dt.
exercise 4.11 (Real basis for complex space) Let G be a complex inner product
space of finite-energy waveforms with the property that g(t) ∈ G implies g ∗ (t) ∈ G.
(a) Let GR be the subset of G that contains only real-valued waveforms. Argue
that GR is a real inner product space.
(b) Prove that if g(t) = a(t) + jb(t) is in G, then both a(t) and b(t) are in GR .
(c) Prove that if {ψ1 (t), . . . , ψn (t)} is an orthonormal basis for the real inner
product space GR then it is also an orthonormal basis for the complex inner
product space G.
Comment: In this exercise we have shown that we can always find a real-valued
orthonormal basis for an inner product space G such that g(t) ∈ G implies g ∗ (t)
∈ G. An equivalent condition is that if g(t) ∈ G then also the inverse Fourier transform of gF*(−f) is in G. The set G of complex-valued finite-energy waveforms that are strictly time-limited to (−T/2, T/2) and bandlimited to (−B, B) (for any
of the bandwidth definitions given in Appendix 4.9) fulfills the stated conjugacy
condition.
Miscellaneous exercises
fA (a) = (4.7)
0, otherwise.
We assume that, unlike the transmitter, the receiver knows the realization of A.
We also assume that the receiver implements a maximum likelihood decision, and
that the signal’s energy is Eb .
(a) Describe the receiver.
(b) Determine the error probability conditioned on the event A = a.
(c) Determine the unconditional error probability Pf . (The subscript stands for
fading.)
(d) Compare Pf to the error probability Pe achieved by an ML receiver that observes R(t) = m wi(t) + N(t), where m = E[A]. Comment on the different behavior of the two error probabilities. For each of them, find the Eb/N0 value necessary to obtain the probability of error 10⁻⁵. (You may use (1/2) exp(−x²/2) as an approximation of Q(x).)
exercise 4.15 (Non-white Gaussian noise) Consider the following transmit-
ter/receiver design problem for an additive non-white Gaussian noise channel.
(a) Let the hypothesis H be uniformly distributed in H = {0, . . . , m−1} and when
H = i, i ∈ H, let wi (t) be the channel input. The channel output is then
R(t) = wi (t) + N (t),
where N (t) is Gaussian noise of power spectral density G(f ), where we assume
that G(f) ≠ 0 for all f. Describe a receiver that, based on the channel output
R(t), decides on the value of H with least probability of error. Hint: Find a
way to transform this problem into one that you can solve.
(b) Consider the setting as in part (a) except that now you get to design the
signal set with the restrictions that m = 2 and that the average energy cannot
exceed E. We also assume that G2 (f ) is constant in the interval [a, b], a < b,
where it also achieves its global minimum. Find two signals that achieve the
smallest possible error probability under an ML decoding rule.
exercise 4.16 (Continuous-time AWGN capacity) To prove the formula for the
capacity C of the continuous-time AWGN channel of noise power density N0 /2
when signals are power-limited to P and frequency-limited to (−W/2, W/2), we first
derive the capacity Cd for the discrete-time AWGN channel of noise variance σ 2
and symbols constrained to average energy not exceeding Es . The two expressions
are:
Cd = (1/2) log2(1 + Es/σ²)  [bits per channel use],
C = (W/2) log2(1 + P/(W N0/2))  [bps].
To derive Cd we need tools from information theory. However, going from Cd to
C using the relationship n = W T is straightforward. To do so, let Gη be the set
of all signals that are frequency-limited to (− W W T T
2 , 2 ) and time-limited to (− 2 , 2 )
at level η. We choose η small enough that for all practical purposes all signals of
Gη are strictly frequency-limited to (− W W T T
2 , 2 ) and strictly time-limited to (− 2 , 2 ).
Each waveform in Gη is represented by an n-tuple and as T goes to infinity n
approaches W T . Complete the argument assuming n = W T and without worrying
about convergence issues.
(a) Using the capacity formula, determine the energy per symbol EsC (k) needed
to transmit k bits per channel use. (The superscript C stands for channel
capacity.) At any rate below capacity it is possible to make the error prob-
ability arbitrarily small by increasing the codeword length. This implies that
there is a way to achieve the desired error probability at energy per symbol
EsC (k).
(b) Using single-shot m-PAM, we can achieve an arbitrarily small error probability by making the parameter a sufficiently large. As the size m of the constellation increases, the edge effects become negligible, and the average error probability approaches 2Q(a/σ), which is the probability of error conditioned on an interior point being transmitted. Find the numerical value of the parameter a for which 2Q(a/σ) = 10⁻⁵. (You may use (1/2) exp(−x²/2) as an approximation of Q(x).)
(c) Having fixed the value of a, we can use equation (4.1) to determine the average
energy EsP (k) needed by PAM to send k bits at the desired error probability.
(The superscript P stands for PAM.) Find and compare the numerical values
of EsP (k) and EsC (k) for k = 1, 2, 4.
(d) Find lim_{k→∞} EsC(k + 1)/EsC(k) and lim_{k→∞} EsP(k + 1)/EsP(k).
(e) Comment on PAM’s efficiency in terms of energy per bit for small and large
values of k. Comment also on the relationship between this exercise and
Example 4.3.
5 Symbol-by-symbol on a pulse
train: Second layer revisited
5.1 Introduction
In this and the following chapter, we focus on the signal design problem. This
chapter is devoted to the waveform former and its receiver-side counterpart, the
n-tuple former. In Chapter 6 we focus on the encoder/decoder pair.1
In principle, the results derived in this chapter can be applied to both baseband
and passband communication. However, for reasons of flexibility, hardware costs,
and robustness, we design the waveform former for baseband communication and
assign to the up-converter, discussed in Chapter 7, the task of converting the
waveform-former output into a signal suitable for passband communication.
Symbol-by-symbol on a pulse train will emerge as a natural signaling technique.
To keep the notation to the minimum, we write
w(t) = Σ_{j=1}^{n} sj ψ(t − jT) (5.1)
instead of wi(t) = Σ_{j=1}^{n} ci,j ψ(t − jT). We drop the message index i from wi(t)
because we will be studying properties of the pulse ψ(t), as well as properties of
the stochastic process that models the transmitter output signal, neither of which
depends on a particular message choice. Following common practice, we refer to
sj as a symbol .
example 5.1 (PAM signaling) PAM signaling (PAM for short) is indeed symbol-
by-symbol on a pulse train, with the symbols taking value in a PAM alphabet as
described in Figure 2.9. It depends on the encoder whether or not all sequences
with symbols taking value in the given PAM alphabet are allowed. As we will see
in Chapter 6, we can decrease the error probability by allowing only a subset of the
sequences.
We have seen the acronym PAM in three contexts that are related but should
not be confused. Let us review them. (i) PAM alphabet as the constellation of
1
The two chapters are essentially independent and could be studied in the reverse order,
but the results of Section 5.3 (which is independent of the other sections) are needed
for a few exercises in Chapter 6. The chosen order is preferable for continuity with the
discussion in Chapter 4.
points of Figure 2.9. (ii) Single-shot PAM as in Example 3.8. We have seen that
this signaling method is not appropriate for transmitting many bits. Therefore we
will not discuss it further. (iii) PAM signaling as in Example 5.1. This is symbol-
by-symbol on a pulse train with symbols taking value in a PAM alphabet. Similar
comments apply to QAM and PSK, provided that we view their alphabets as
subsets of C rather than of R2 . The reason it is convenient to do so will become
clear in Chapter 7.
As already mentioned, most modern communication systems rely on PAM,
QAM, or PSK signaling. In this chapter we learn the main tool to design the
pulse ψ(t).
The chapter is organized as follows. In Section 5.2, we develop an instructive
special case where the channel is strictly bandlimited and we rediscover symbol-
by-symbol on a pulse train as a natural signaling technique for that situation.
This also forms the basis for software-defined radio. In Section 5.3 we derive the
expression for the power spectral density of the transmitted signal for an arbitrary
pulse when the symbol sequence constitutes a discrete-time wide-sense stationary
process. As a preview, we discover that when the symbols are uncorrelated, which
is frequently the case, the spectrum is proportional to |ψF (f )|2 . In Section 5.4, we
derive the necessary and sufficient condition on |ψF (f )|2 in order for {ψj (t)}j∈Z
to be an orthonormal set when ψj (t) = ψ(t − jT ). (The condition is that |ψF (f )|2
fulfills the so-called Nyquist criterion.)
(Figure: the waveform w(t) passes through a filter h(t); white Gaussian noise N(t) of power spectral density N0/2 is added, producing R(t).)
numbers. The idea is to let the encoder produce these numbers and let the wave-
form former do the “interpolation” that converts the samples into the desired w(t).
theorem 5.2 (Sampling theorem) Let w(t) be a continuous L2 function (possibly complex-valued) and let its Fourier transform wF(f) vanish for f ∉ [−B, B]. Then w(t) can be reconstructed from the sequence of T-spaced samples w(nT), n ∈ Z, provided that T ≤ 1/(2B). Specifically,
w(t) = Σ_{n=−∞}^{∞} w(nT) sinc(t/T − n), (5.2)
where sinc(t) = sin(πt)/(πt).
Figure 5.2. (The waveform former ψ(t), the channel filter h(t) with additive noise N(t), and the n-tuple former: the matched filter ψ*(−t) sampled at t = jT to produce Yj.)
where sj = w(jT)√T. Hence a signal w(t) that fulfills the conditions of the sampling theorem is one that lives in the inner product space spanned by {ψ(t − jT)}_{j∈Z}. When we sample such a signal, we obtain (up to a scaling factor) the coefficients of its orthonormal expansion with respect to the orthonormal basis {ψ(t − jT)}_{j∈Z}.
Now let us go back to our communication problem. We have just seen that any
physical (continuous and L2 ) signal w(t) that has no energy outside the frequency
range [−B, B] can be synthesized as w(t) = Σ_j sj ψ(t − jT). This signal has exactly the form of symbol-by-symbol on a pulse train. To implement this signaling method we let the jth encoder output be sj = w(jT)√T, and let the waveform former be defined by the pulse ψ(t) = (1/√T) sinc(t/T). The waveform former, the channel, and
the n-tuple former are shown in Figure 5.2.
It is interesting to observe that we use the sampling theorem somewhat back-
wards, in the following sense. In a typical application of the sampling theorem,
the first step consists of sampling the source signal, then the samples are stored or
transmitted, and finally the original signal is reconstructed from the samples. To
the contrary, in the diagram of Figure 5.2, the transmitter does the (re)construction
as the first step, the (re)constructed signal is transmitted, and finally the receiver
does the sampling.
Notice also that ψ ∗ (−t) = ψ(t) (the sinc function is even and real-valued) and
its Fourier transform is
ψF(f) = √T for |f| ≤ 1/(2T), and ψF(f) = 0 otherwise
(Appendix 5.10 explains an effortless method for relating the rectangle and the sinc
as Fourier pairs). Therefore the matched filter at the receiver is a lowpass filter.
It does exactly what seems to be the right thing to do – remove the out-of-band
noise.
example 5.3 (Software-defined radio) The sampling theorem is the theoretical
underpinning of software-defined radio. No matter what the communications stan-
dard is (GSM, CDMA, EDGE, LTE, Bluetooth, 802.11, etc.), the transmitted
signal can be described by a sequence of numbers. In a software-defined-radio
implementation of a transmitter, the encoder that produces the samples is a com-
puter program. Only the program is aware of the standard being implemented.
The hardware that converts the sequence of numbers into the transmitted signal
(the waveform former of Figure 5.2) can be the same off-the-shelf device for all
standards. Similarly, the receiver front end that converts the received signal into
a sequence of numbers (the n-tuple former of Figure 5.2) can be the same for
all standards. In a software-defined-radio receiver, the decoder is implemented in
software. In principle, any past, present, and future standard can be implemented
by changing the encoder/decoder program. The sampling theorem was brought to
the engineering community by Shannon [24] in 1948, but only recently have we had the technology and the tools needed to make software-defined radio a viable solution. In particular, computers are becoming fast enough, real-time operating systems such as RT Linux make it possible to schedule critical events with precision, and the prototyping is greatly facilitated by the availability of high-level programming languages for signal processing such as MATLAB.
5.3 Power spectral density

We study the power spectral density of the transmitted process
X(t) = Σ_{i=−∞}^{∞} Xi ξ(t − iT − Θ), (5.4)
where {Xj}_{j∈Z} is a zero-mean wide-sense stationary (WSS) sequence of symbols, ξ(t) is a finite-energy pulse, and Θ is a random delay, independent of the symbols and uniformly distributed in [0, T). We denote by KX[i] = E[X_{j+i} Xj*] the autocovariance of the symbol sequence, which depends only on i since, by assumption, {Xj}_{j∈Z} is WSS. We use also
the self-similarity function of the pulse ξ(t), defined as
Rξ(τ) := ∫_{−∞}^{∞} ξ(α + τ) ξ*(α) dα. (5.5)
(Think of the definition of an inner product if you tend to forget where to put the * in the above definition.)
The process X(t) is zero-mean. Indeed, using the independence between Xi and Θ and the fact that E[Xi] = 0, we obtain E[X(t)] = Σ_{i=−∞}^{∞} E[Xi] E[ξ(t − iT − Θ)] = 0.
The autocovariance of X(t) is
KX(t + τ, t) = E[(X(t + τ) − E[X(t + τ)]) (X(t) − E[X(t)])*]
            = E[X(t + τ) X*(t)]
            = E[ Σ_{i=−∞}^{∞} Xi ξ(t + τ − iT − Θ) Σ_{j=−∞}^{∞} Xj* ξ*(t − jT − Θ) ]
            = E[ Σ_i Σ_j Xi Xj* ξ(t + τ − iT − Θ) ξ*(t − jT − Θ) ]
        (a) = Σ_i Σ_j E[Xi Xj*] E[ξ(t + τ − iT − Θ) ξ*(t − jT − Θ)]
            = Σ_i Σ_j KX[i − j] E[ξ(t + τ − iT − Θ) ξ*(t − jT − Θ)]
        (b) = Σ_k KX[k] Σ_i (1/T) ∫_0^T ξ(t + τ − iT − θ) ξ*(t − iT + kT − θ) dθ
        (c) = Σ_k KX[k] (1/T) ∫_{−∞}^{∞} ξ(t + τ − θ) ξ*(t + kT − θ) dθ
            = (1/T) Σ_k KX[k] Rξ(τ − kT),
where in (a) we use the fact that Xi Xj∗ and Θ are independent random variables,
in (b) we make the change of variable k = i − j, and in (c) we use the fact that
for an arbitrary function u : R → R, an arbitrary number a ∈ R, and a positive
(interval length) b,
Σ_{i=−∞}^{∞} ∫_a^{a+b} u(x + ib) dx = ∫_{−∞}^{∞} u(x) dx. (5.6)
(If (5.6) is not clear to you, picture integrating from a to a + 2b by integrating first
from a to a + b, then from a + b to a + 2b, and summing the results. This is the
right-hand side. Now consider integrating both times from a to a + b, but before
you perform the second integration you shift the function to the left by b. This is
the left-hand side.)
We see that KX (t + τ, t) depends only on τ . Hence we simplify notation and
write KX (τ ) instead of KX (t + τ, t). We summarize:
KX(τ) = (1/T) Σ_k KX[k] Rξ(τ − kT). (5.7)
The process X(t) is WSS because neither its mean nor its autocovariance
depend on t.
For the last step in the derivation of the power spectral density, we use the
fact that the Fourier transform of Rξ (τ ) is |ξF (f )|2 . This follows from Parseval’s
relationship,
Rξ(τ) = ∫_{−∞}^{∞} ξ(α + τ) ξ*(α) dα
      = ∫_{−∞}^{∞} ξF(f) ξF*(f) exp(j2πτf) df
      = ∫_{−∞}^{∞} |ξF(f)|² exp(j2πτf) df.
Now we can take the Fourier transform of KX(τ) to obtain the power spectral density
SX(f) = (|ξF(f)|²/T) Σ_k KX[k] exp(−j2πkfT). (5.8)
The above expression is in a form that suits us. In many situations, the infinite sum has only a small number of non-zero terms. Note that the summation in (5.8) is the discrete-time Fourier transform of {KX[k]}_{k∈Z}, evaluated at fT. This is the power spectral density of the discrete-time process {Xi}_{i∈Z}. If we think of |ξF(f)|²/T as being the power spectral density of ξ(t), we can interpret SX(f) as being the product of two PSDs, that of ξ(t) and that of {Xi}_{i∈Z}.
In many cases of interest, KX[k] = E 1{k = 0}, where E = E[|Xi|²]. In this case we say that the zero-mean WSS process {Xi}_{i∈Z} is uncorrelated. Then (5.7) and (5.8) simplify to
KX(τ) = E Rξ(τ)/T, (5.9)
SX(f) = E |ξF(f)|²/T. (5.10)
example 5.4 Suppose that {Xi}_{i∈Z} is an independent and uniformly distributed sequence taking values in {±√E}, and ξ(t) = (1/√T) sinc(t/T). Then Rξ(τ) = sinc(τ/T) and, by (5.9) and (5.10), KX(τ) = (E/T) sinc(τ/T) and SX(f) = E 1{f ∈ [−1/(2T), 1/(2T)]}.
In the next example, we work out a case where KX[k] ≠ E 1{k = 0}. In this case, we say that the zero-mean WSS process {Xi}_{i∈Z} is correlated.
When we compare this example to Example 5.4, we see that this encoder shapes
the power spectral density from a rectangular shape to a squared sinusoid. Notice
that the spectral density vanishes at f = 0. This is desirable if the channel blocks
very low frequencies, which happens for instance for a cable that contains ampli-
fiers. To avoid amplifying offset voltages and leaky currents, amplifiers are AC
(alternating current) coupled. This means that amplifiers have a highpass filter at
the input, often just a capacitor, that blocks DC (direct current) signals. Notice
that the encoder is a linear time-invariant system (with respect to addition and
multiplication in R). Hence the cascade of the encoder and the pulse forms a
linear time-invariant system. It is immediate to verify that its impulse response is
ξ̃(t) = ξ(t) − ξ(t − 2T). Hence in this case we can write
X(t) = Σ_l Xl ξ(t − lT) = Σ_l Bl ξ̃(t − lT).
The encoder in Example 5.5 is linear with respect to (the field) R and this is
the reason its effect can be incorporated into the pulse but this is not the case in
general (see Exercise 5.14 and see Chapter 6).
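To visualize the shaping effect, the sketch below evaluates (5.8) for the sinc pulse of Example 5.4, once with uncorrelated symbols and once with the autocovariance KX[0] = 2E, KX[±2] = −E that would result if, for instance, the symbols were formed as Xl = Bl − B_{l−2} from uncorrelated Bl of energy E. This encoder is an assumption used only for illustration; it yields a squared-sinusoid shaping that vanishes at f = 0.

import numpy as np
import matplotlib.pyplot as plt

T, E = 1.0, 1.0
f = np.linspace(-1.0 / T, 1.0 / T, 1000)
xiF2 = T * (np.abs(f) <= 1.0 / (2 * T))                  # |xi_F(f)|^2 for xi(t) = (1/sqrt(T)) sinc(t/T)
S_uncorr = (xiF2 / T) * E                                 # (5.10): uncorrelated symbols
shaping = 2 * E - 2 * E * np.cos(4 * np.pi * f * T)       # sum_k K_X[k] e^{-j2 pi k f T} for K_X[0]=2E, K_X[±2]=-E
S_shaped = (xiF2 / T) * shaping                           # (5.8)
plt.plot(f, S_uncorr, label="uncorrelated symbols")
plt.plot(f, S_shaped, label="shaped (vanishes at f = 0)")
plt.xlabel("f"); plt.legend(); plt.show()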
5.4 Nyquist criterion for orthonormal bases

We seek a condition on the pulse ψ(t) under which the set {ψ(t − jT)}_{j∈Z} is orthonormal, i.e. such that ⟨ψ(t − nT), ψ(t)⟩ = 1{n = 0} for all integers n.
The form of the left-hand side suggests using Parseval's relationship. Doing so yields
1{n = 0} = ∫_{−∞}^{∞} ψ(t − nT) ψ*(t) dt = ∫_{−∞}^{∞} ψF(f) ψF*(f) e^{−j2πnTf} df
         = ∫_{−∞}^{∞} |ψF(f)|² e^{−j2πnTf} df
     (a) = Σ_{k∈Z} ∫_{−1/(2T)}^{1/(2T)} |ψF(f − k/T)|² e^{−j2πnT(f − k/T)} df
     (b) = Σ_{k∈Z} ∫_{−1/(2T)}^{1/(2T)} |ψF(f − k/T)|² e^{−j2πnTf} df
     (c) = ∫_{−1/(2T)}^{1/(2T)} g(f) e^{−j2πnTf} df,
where in (a) we use again (5.6) (but in the other direction), in (b) we use the fact that e^{−j2πnT(f − k/T)} = e^{−j2πnTf}, and in (c) we introduce the function
g(f) = Σ_{k∈Z} |ψF(f − k/T)|².
Notice that g(f ) is a periodic function of period 1/T and the right-hand side of (c)
is 1/T times the nth Fourier series coefficient An of the periodic function g(f ). (A
review of Fourier series is given in Appendix 5.11.) Because A0 = T and Ak = 0 for
k ≠ 0, the Fourier series of g(f) is the constant T. Up to a technicality discussed
below, this proves the following result.
l.i.m. Σ_{k=−∞}^{∞} |ψF(f − k/T)|² = T,  f ∈ R. (5.13)
set ψ̃F (0) = 0. The inverse Fourier transform of ψ̃F (f ) is still ψ(t). Hence ψ(t) is
orthogonal to its T -spaced time translates. Yet (5.13) is no longer fulfilled if we
omit the l. i. m. For our specific example, the left and the right differ at exactly
one point of each period. Equality still holds in the l. i. m. sense. In all practical
applications, ψF (f ) is a smooth function and we can ignore the l. i. m. in (5.13).
Notice that the left side of the equality in (5.13) is periodic with period 1/T .
Hence to verify that |ψF (f )|2 fulfills Nyquist’s criterion with parameter T , it is
sufficient to verify that (5.13) holds over an interval of length 1/T .
example 5.7 The following functions satisfy Nyquist’s criterion with param-
eter T .
(a) (Constant but not T ) ψ(t) is orthogonal to its T -spaced time translates even
when the left-hand side of (5.13) is L2 equivalent to a constant other than T ,
but in this case ‖ψ(t)‖ ≠ 1. This is a minor issue, we just have to scale the
pulse to make it unit-norm.
(b) (Minimum bandwidth) A function |ψF(f)|² cannot fulfill Nyquist's criterion with parameter T if its support is contained in an interval of the form [−B, B] with 0 < B < 1/(2T). Hence, the minimum bandwidth to fulfill Nyquist's criterion is 1/(2T).
(c) (Test for bandwidths between 1/(2T) and 1/T) If |ψF(f)|² vanishes outside [−1/T, 1/T], the Nyquist criterion is satisfied if and only if |ψF(1/(2T) − ε)|² + |ψF(−1/(2T) − ε)|² = T for ε ∈ [−1/(2T), 1/(2T)] (see Figure 5.3). If, in addition, ψ(t) is real-valued, which is typically the case, then |ψF(−f)|² = |ψF(f)|². In this case, it is sufficient that we check the positive frequencies, i.e. Nyquist's criterion is met if
|ψF(1/(2T) − ε)|² + |ψF(1/(2T) + ε)|² = T,  ε ∈ [0, 1/(2T)].
This means that |ψF(1/(2T))|² = T/2 and the amount by which the function |ψF(f)|² varies when we go from f = 1/(2T) to f = 1/(2T) − ε is compensated by the function's variation in going from f = 1/(2T) to f = 1/(2T) + ε. For examples of such a band-edge symmetry see Figure 5.3, Figure 5.6a, and the functions (b) and (c) in Example 5.7. The bandwidth BN = 1/(2T) is sometimes called the Nyquist bandwidth.
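As a numeric sanity check of (5.13), the sketch below verifies the criterion for the raised-cosine spectrum, a standard choice of |ψF(f)|² that fulfills it; the roll-off value is an arbitrary assumption.

import numpy as np

T, beta = 1.0, 0.5

def psiF2(f):                                    # raised-cosine spectrum, which satisfies the Nyquist criterion
    f = np.abs(f)
    flat = f <= (1 - beta) / (2 * T)
    roll = (f > (1 - beta) / (2 * T)) & (f <= (1 + beta) / (2 * T))
    out = np.where(flat, T, 0.0)
    out = np.where(roll, (T / 2) * (1 + np.cos(np.pi * T / beta * (f - (1 - beta) / (2 * T)))), out)
    return out

f = np.linspace(-0.5 / T, 0.5 / T, 1001)         # one period suffices, see the remark above
total = sum(psiF2(f - k / T) for k in range(-3, 4))
print(np.max(np.abs(total - T)))                 # essentially zero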
Figure 5.3. Band-edge symmetry for a pulse |ψF(f)|² that vanishes outside [−1/T, 1/T] and fulfills Nyquist's criterion.
(d) (Test for arbitrary finite bandwidths) When the support of |ψF(f)|² is wider than 1/T, it is harder to see whether or not Nyquist's criterion is met with parameter T. A convenient way to organize the test goes as follows. Let I be the set of integers i for which the frequency interval of width 1/T centered at fi = 1/(2T) + i/T intersects with the support of |ψF(f)|². For the example of Figure 5.4, I = {−3, −2, 1, 2}, and the frequencies fi, i ∈ I, are marked with a "×". For each i ∈ I, we consider the function |ψF(fi + ε)|², ε ∈ [−1/(2T), 1/(2T)], as shown in Figure 5.5. Nyquist's criterion is met if and only if the sum of these functions,
g(ε) = Σ_{i∈I} |ψF(fi + ε)|²,  ε ∈ [−1/(2T), 1/(2T)],
is constant and equal to T.
Figure 5.4. (The support of |ψF(f)|², with the frequencies fi, i ∈ I, marked by ×.)
Figure 5.5. (The functions |ψF(fi + ε)|², ε ∈ [−1/(2T), 1/(2T)], for i = −3, −2, 1, 2.)
2
A root-raised-cosine impulse response should not be confused with a root-raised-cosine
function.
5.5 Root-raised-cosine family
Figure 5.6. ((a) |ψF(f)|² and (b) ψ(t) for a pulse of the root-raised-cosine family.)
Figure 5.7. (The functions applied at the matched filter input, panels (a) and (b).)
Figure 5.8 shows the matched filter outputs obtained when the functions of
Figure 5.7 are applied at the filter’s input. Specifically, Figure 5.8a shows the
train of symbol-scaled self-similarity functions. From the figure we see that (5.15)
is satisfied. (When a pulse achieves its maximum, which has value 1, the other
pulses vanish.) We see it also from Figure 5.8b, in that the signal y(t) takes values
in the symbol alphabet {±1} at the sampling times t = 0, 10, 20, 30.
If ψ(t) is not orthogonal to ψ(t−iT ), which can happen for instance if a truncated
pulse is made too short, then Rψ (iT ) will be non-zero for several integers i. If we
define li = Rψ (iT ), then we can write
Figure 5.8. ((a) The train of symbol-scaled self-similarity functions; (b) the resulting matched filter output y(t).)
y(jT) = Σ_i si l_{j−i}.
The fact that the noiseless y(jT) depends on multiple symbols is referred to as inter-symbol interference (ISI).
There are two main causes of ISI. We have already mentioned one, specifically when Rψ(τ) is non-zero for more than one sample. ISI happens also if the matched-filter output is not sampled at the correct times. In this case, we obtain y(jT + Δ) = Σ_i si Rψ((j − i)T + Δ), which is again of the form Σ_i si l_{j−i} for li = Rψ(iT + Δ).
5.6 Eye diagrams

The eye diagram is a technique that allows us to visualize if there is ISI and to see how critical it is that the sampling time be precise. The eye diagram is obtained from the matched-filter output before sampling. Let y(t) = Σ_i si Rψ(t − iT) be the noiseless matched filter output, with symbols taking value in some discrete set
S. For the example that follows, S = {±1}. To obtain the eye diagram, we plot
the superposition of traces of the form y(t − iT ), t ∈ [−T, T ], for various integers
i. Figure 5.9 gives examples of eye diagrams for various roll-off factors and pulse
truncation lengths. Parts (a), (c), and (d) show no sign of ISI. Indeed, all traces
go through ±1 at t = 0, which implies that y(iT ) ∈ S. We see that truncating
the pulse to length 20T does not lead to ISI for either roll-off factor. However,
ISI is present when β = 0.25 and the pulse is truncated to 4T (part (b)). We
see its presence from the fact that the traces go through various values at t = 0.
This means that y(iT ) takes on values outside S. These examples are meant to
illustrate the point made in the last paragraph of the previous section.
Note also that the eye, the untraced space in the middle of the eye diagram, is
wider in (c) than it is in (a). The advantage of a wider eye is that the system is more
tolerant to small variations (jitter) in the sampling time. This is characteristic of
a larger β and it is a consequence of the fact that as β increases, the pulse decays
faster as a function of |t|. For the same reason, a pulse with larger beta can be
truncated to a shorter length, at the price of a larger bandwidth.
Figure 5.9. Eye diagrams of Σ_i si Rψ(t − iT) for si ∈ {±1} and pulse of the
root-raised-cosine family with T = 10. The abscissa is the time. The roll-off
factor is β = 0.25 for the top figures and β = 0.9 for the bottom ones. The
pulse is truncated to length 20T for the figures on the left, and to length 4T
for those on the right. The eye diagram of part (b) shows the presence of ISI.
The MATLAB program used to produce the above plots can be downloaded from
the book web page.
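The following is a minimal Python sketch (distinct from the MATLAB program mentioned above) of how such a plot can be generated. It assumes the raised-cosine shape for the self-similarity function Rψ of an untruncated root-raised-cosine pulse, forms the noiseless matched filter output y(t) = Σ_i s_i Rψ(t − iT) for random symbols s_i ∈ {±1}, and overlays traces of length 2T; because truncation effects are ignored, the eye is open, as in part (a).

import numpy as np
import matplotlib.pyplot as plt

T, beta = 10.0, 0.25          # symbol period and roll-off factor (as in part (a))
sps, n_sym = 50, 120          # samples per symbol period and number of symbols (assumed)

def rc(t):
    # Raised cosine: self-similarity function of an (untruncated) root-raised-cosine pulse.
    t = np.asarray(t, dtype=float)
    den = 1.0 - (2.0 * beta * t / T) ** 2
    ok = np.abs(den) > 1e-8
    out = np.empty_like(t)
    out[ok] = np.sinc(t[ok] / T) * np.cos(np.pi * beta * t[ok] / T) / den[ok]
    out[~ok] = (np.pi / 4) * np.sinc(1 / (2 * beta))   # limiting value at t = +-T/(2 beta)
    return out

rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=n_sym)
t = np.arange(n_sym * sps) * (T / sps)
y = np.zeros_like(t)
for i, si in enumerate(s):                 # noiseless matched filter output
    y += si * rc(t - i * T)

for i in range(5, n_sym - 5):              # overlay traces y(t - iT), t in [-T, T)
    idx = slice((i - 1) * sps, (i + 1) * sps)
    plt.plot(t[idx] - i * T, y[idx], color="C0", alpha=0.3)
plt.xlabel("t"); plt.ylabel("y"); plt.show()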
The popularity of the eye diagram lies in the fact that it can be obtained quite
easily by looking at the matched-filter output with an oscilloscope triggered by
the clock that produces the sampling time. The eye diagram is very informative
even if the channel has attenuated the signal and/or has added noise.
5.7 Symbol synchronization
Suppose that the transmission starts with a training signal s(t) known to the
receiver. For the moment, we assume that s(t) is real-valued. Extension to a
complex-valued s(t) is done in Section 7.5.
The channel output signal is
R(t) = αs(t − θ) + N(t),
where N(t) is white Gaussian noise of power spectral density N0/2, α is an unknown
scaling factor (channel attenuation and receiver front-end amplification), and θ is
the unknown parameter to be estimated. We can assume that the receiver knows
that θ is in some interval, say [0, θmax ], for some possibly large constant θmax .
To describe the ML estimate of θ, we need a statistical description of the received
signal as a function of θ and α. Towards this goal, suppose that we have an
orthonormal basis φ1 (t), φ2 (t), . . . , φn (t) that spans the set {s(t−θ̂) : θ̂ ∈ [0, θmax ]}.
To simplify the notation, we assume that the orthonormal basis is finite, but an
infinite basis is also a possibility. For instance, if s(t) is continuous, has finite
duration, and is essentially bandlimited, then the sampling theorem tells us that
we can use sinc functions for such a basis.
For i = 1, . . . , n, let Yi = ⟨R(t), φi(t)⟩ and let yi be the observed sample value of
Yi. The random vector Y = (Y1, . . . , Yn)^T consists of independent random variables
with Yi ∼ N(αmi(θ), σ²), where mi(θ) = ⟨s(t − θ), φi(t)⟩ and σ² = N0/2. Hence,
the density of Y parameterized by θ and α is

f(y; θ, α) = (2πσ²)^{−n/2} exp( − Σ_{i=1}^{n} (yi − αmi(θ))² / (2σ²) ).   (5.17)
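The maximization that yields the ML estimate is carried out in the text and leads to (5.18), not reproduced here; as discussed at the end of this section, it essentially amounts to correlating the received signal with shifted copies of the training signal and picking the shift with the largest correlation. A minimal discrete-time sketch of that idea, with an assumed rectangular-pulse training signal and illustrative parameter values:

import numpy as np

rng = np.random.default_rng(1)
fs = 100                        # samples per symbol period (assumed discretization)
T, L = 1.0, 32                  # symbol period and number of training symbols (assumed)
Es, alpha, theta = 1.0, 0.7, 0.3137   # alpha and theta are unknown to the receiver
sigma = 0.5                     # noise standard deviation per sample (assumed)

c = np.sqrt(Es) * (-1.0) ** np.arange(L)     # alternating +-sqrt(Es) training symbols
def train(delay, n):
    # s(t - delay) = sum_l c_l psi(t - lT - delay) with a unit-norm rectangular pulse.
    t = np.arange(n) / fs
    s = np.zeros(n)
    for l, cl in enumerate(c):
        s += cl * ((t >= l * T + delay) & (t < (l + 1) * T + delay)) / np.sqrt(T)
    return s

n = int((L + 1) * fs * T)
r = alpha * train(theta, n) + sigma * rng.standard_normal(n)   # received samples

cands = np.arange(0, T, 1 / fs)                     # candidate delays on a grid
corr = [np.dot(r, train(th, n)) for th in cands]    # correlation with shifted copies
theta_hat = cands[int(np.argmax(corr))]
print(f"true theta = {theta:.3f}, estimate = {theta_hat:.3f}")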
The shape of the training signal did not matter for the derivation of the ML
estimate, but it does matter for the delay locked loop approach. We assume that
the training signal takes the same form as the communication signal, namely
s(t) = Σ_{l=0}^{L−1} c_l ψ(t − lT),
where c0 , . . . , cL−1 are training symbols known to the receiver. The easiest way to
see how the delay locked loop works is to assume that ψ(t) is a rectangular pulse
ψ(t) = (1/√T) 1{0 ≤ t ≤ T}
and let the training symbol sequence c0, . . . , cL−1 be an alternating sequence of √Es
and −√Es. The corresponding received signal R(t) is α Σ_{l=0}^{L−1} c_l ψ(t − lT − θ) plus
white Gaussian noise, where α is the unknown scaling factor. If we neglect the noise
for the moment, the matched filter output (before sampling) is the convolution of
R(t) with ψ ∗ (−t), which can be written as
M(t) = α Σ_{l=0}^{L−1} c_l Rψ(t − lT − θ),
where
Rψ(τ) = (1 − |τ|/T) 1{−T ≤ τ ≤ T}
is the self-similarity function of ψ(t). Figure 5.10 plots a piece of M (t).
The desired sampling times of the form θ + lT , l integer, correspond to the
maxima and minima of M (t). Let tk be the kth sampling point. Until symbol
synchronization is achieved, M (tk ) is not necessarily near a maximum or a
minimum of M (t). For every sample point tk , we also collect an early sample
at tk^E = tk − Δ and a late sample at tk^L = tk + Δ, where Δ is some small
positive value (smaller than T/2). The dots in Figure 5.11 are examples of sample
values. Consider the cases when M(tk) is positive (parts (a) and (b)). We see that
M(tk^L) − M(tk^E) is negative when tk is late with respect to the target, and it is
positive when tk is early. The opposite is true when M(tk) is negative (parts (c)
and (d)). Hence, in general, [M(tk^L) − M(tk^E)] M(tk) can be used as a feedback
signal to the clock that determines the sampling times. A positive feedback signal
is a sign for the clock to speed up, and a negative value is a sign to slow down. This
can be implemented via a voltage-controlled oscillator (VCO), with the feedback
signal as the controlling voltage.
Now consider the effect of noise. The noise added to M (t) is zero-mean. Intu-
itively, if the VCO does not react too quickly to the feedback signal, or equivalently
if the feedback signal is lowpass filtered, then we expect the sampling point to settle
Figure 5.11. DLL sampling points. The three consecutive dots of each part
are examples of M(tk^E), M(tk), and M(tk^L), respectively.
at the correct position even when the feedback signal is noisy. A rigorous analysis
is possible but it is outside the scope of this text. For a more detailed introduction
on synchronization we recommend [14, Chapters 14–16], [15, Chapter 4], and the
references therein.
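A minimal sketch of this early-late mechanism (with the rectangular pulse and alternating training symbols assumed above, illustrative parameter values, and a crude noise model in which independent noise is added to each sample):

import numpy as np

rng = np.random.default_rng(2)
T, Es, alpha = 1.0, 1.0, 0.8      # assumed values; alpha is unknown to the DLL
theta = 0.37                      # true delay, unknown to the receiver
Delta, gain = 0.1 * T, 0.1        # early/late offset and loop gain (assumed)
sigma = 0.05                      # noise standard deviation per sample (crude model)
L = 400                           # number of training symbols

c = np.sqrt(Es) * (-1.0) ** np.arange(L)     # alternating training symbols

def tri(tau):                                 # R_psi for the rectangular pulse
    return np.maximum(1.0 - np.abs(tau) / T, 0.0)

def M(t):                                     # noiseless matched filter output
    l = np.arange(L)
    return alpha * np.sum(c * tri(t - l * T - theta))

tau_hat = 0.0
for k in range(2, L - 2):                     # one early/on-time/late triple per symbol
    tk = k * T + tau_hat
    mE = M(tk - Delta) + sigma * rng.standard_normal()
    mk = M(tk)         + sigma * rng.standard_normal()
    mL = M(tk + Delta) + sigma * rng.standard_normal()
    e = (mL - mE) * mk                        # early-late feedback signal
    tau_hat += gain * e                       # positive: sample later; negative: earlier

print(f"true delay {theta:.3f}, final estimate {tau_hat:.3f}")

With a small gain, the estimate drifts toward θ and then stays near it despite the noise, which is the lowpass-filtering effect mentioned above.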
Notice the similarity between the ML and the DLL solution. Ultimately they
both make use of the fact that a self-similarity function achieves its maximum at
the origin. It is also useful to think of a correlation such as
∫ r(t)s(t − θ̂)dt / ( ‖r(t)‖ ‖s(t)‖ )
as a measure of the degree of similarity between two functions, where the denom-
inator serves to make the result invariant to a scaling of either function. In this
case the two functions are r(t) = αs(t − θ) + n(t) and s(t − θ̂) and the maximum
is achieved when s(t − θ̂) lines up with s(t − θ). The solution to the ML approach
correlates with the entire training signal s(t), whereas the DLL correlates with
the pulse ψ(t); but it does so repeatedly and averages the results. The DLL is
designed to work with a VCO, and together they provide a complete and easy-to-
implement solution that tracks the sampling times.3 It is easy to see that the DLL
provides valuable feedback even after the transition from the training symbols
to the regular symbols, provided that the symbols change polarity sufficiently
often. To implement the ML approach, we still need a good way to find the
maximum of (5.18) and to offset the clock accordingly. This can easily be done
if the receiver is implemented in a digital signal processor (DSP) but could be
costly in terms of additional hardware if the receiver is implemented with analog
technology.
3
The DLL can be interpreted as a stochastic gradient descent method that seeks the ML
estimate of θ.
5.8 Summary
The signal design consists of choosing the finite-energy signals w0 (t), . . . , wm−1 (t)
that represent the messages. Rather than choosing the signal set directly, we choose
an orthonormal basis {ψ1 (t), . . . , ψn (t)} and a codebook {c0 , . . . , cm−1 } ⊂ Rn and,
for i = 0, . . . , m − 1, we define
w_i(t) = Σ_{j=1}^{n} c_{i,j} ψ_j(t).
In doing so, we separate the signal design problem into two subproblems: finding
an appropriate orthonormal basis and codebook. In this chapter, we have focused
on the choice of the orthonormal basis. The choice of the codebook will be the
topic of the next chapter.
Particularly interesting are those orthonormal bases that consist of an appropri-
ately chosen unit-norm function ψ(t) and its T -spaced translates. Then the generic
form of a signal is
w(t) = Σ_{j=1}^{n} s_j ψ(t − jT).
In this case, the n inner products performed by the n-tuple former can be obtained
by means of a single matched filter with the output sampled at n time instants.
We call sj the jth symbol. In theory, a symbol can take values in R or in C; in
practice, the symbol alphabet is some discrete subset S of R or of C. For instance,
PAM symbols are in R, and QAM or PSK symbols, viewed as complex-valued
numbers, are standard examples of symbols in C (see Chapter 7).
If the symbol sequence is a realization of an uncorrelated WSS process, then the
power spectral density of the transmitted signal is SX (f ) = E|ψF (f )|2 /T , where E
is the average energy per symbol. The pulse ψ(t) has a unit norm and is orthogonal
to its T -spaced translates if and only if |ψF (f )|2 fulfills Nyquist’s criterion with
parameter T (Theorem 5.6).
Typically ψ(t) is real-valued, in which case |ψF(f)|² is an even function. To save
bandwidth, we often choose |ψF(f)|² in such a way that it vanishes for f ∉ [−1/T, 1/T].
When these two conditions are satisfied, Nyquist’s criterion is fulfilled if and only
if |ψF (f )|2 has the so-called band-edge symmetry.
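As a quick numerical check of the criterion, the following sketch uses the standard raised-cosine shape for |ψF(f)|² (which has the band-edge symmetry; T and β are illustrative values) and verifies that Σ_l |ψF(f − l/T)|² = T on a grid of frequencies.

import numpy as np

T, beta = 1.0, 0.5          # assumed symbol period and roll-off factor

def G(f):
    # |psi_F(f)|^2 for the standard raised-cosine shape with roll-off beta.
    f = np.abs(np.asarray(f, dtype=float))
    f1, f2 = (1 - beta) / (2 * T), (1 + beta) / (2 * T)
    out = np.zeros_like(f)
    out[f <= f1] = T
    band = (f > f1) & (f <= f2)
    out[band] = (T / 2) * (1 + np.cos(np.pi * T / beta * (f[band] - f1)))
    return out

f = np.linspace(-0.5 / T, 0.5 / T, 1001)           # checking one period suffices
total = sum(G(f - l / T) for l in range(-3, 4))    # a few shifts suffice (finite support)
print("max deviation from T:", np.max(np.abs(total - T)))

The printed deviation is at the level of rounding errors, confirming the criterion for this pulse.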
It is instructive to compare the sampling theorem to the Nyquist criterion.
Both are meant for signals of the form Σ_i s_i ψ(t − iT), where ψ(t) is orthogonal
to its T -spaced time translates. Without loss of generality, we can assume that
ψ(t) has a unit norm. In this case, |ψF (f )|2 fulfills the Nyquist criterion with
parameter T . Typically we use the Nyquist criterion to select a pulse that leads
to symbol-by-symbol on a pulse train of a desired power spectral density. In the
sampling theorem, we choose ψ(t) = sinc(t/T)/√T because the orthonormal basis
B = {sinc[(t − iT)/T]/√T}_{i∈Z} spans the inner product space of continuous L2
functions that have a vanishing Fourier transform outside [−1/(2T), 1/(2T)].
in this space can be represented by the coefficients of its orthonormal expansion
with respect to B.
sj → ⊕ → Yj = sj + Zj,   Zj ∼ N(0, N0/2)
Figure 5.12. Equivalent discrete-time channel used at the rate of one channel
use every T seconds. The noise is iid.
The eye diagram is a field test that can easily be performed to verify that the
matched filter output at the sampling times is as expected. It is also a valuable
tool for designing the pulse ψ(t).
Because the matched filter output is sampled at a rate of one sample every T
seconds, we say that the discrete-time AWGN channel seen by the top layer is used
at a rate of one symbol every T seconds. This channel is depicted in Figure 5.12.
Questions that pertain to the encoder/decoder pair (such as the number of bits
per symbol, the average energy per symbol, and the error probability) can be
answered assuming that the top layer communicates via the discrete-time channel.
Questions that pertain to the signal’s time/frequency characteristics need to take
the waveform former into consideration. Essentially, the top two layers can be
designed independently.
example 5.8 The Dirichlet function f : [0, 1] → {0, 1} is 0 where its argument
is irrational and 1 otherwise. Its Lebesgue integral is well defined and equal to 0.
But to see why, we need to introduce the notion of measure. We will do so shortly.
The Riemann integral of the Dirichlet function is undefined. The problem is that
in each interval of the abscissa, no matter how small, we can find both rational and
irrational numbers. So to each approximating rectangle, we can assign height 0 or
1. If we assign height 0 to all approximating rectangles, the integral is approximated
by 0; if we assign height 1 to all rectangles, the integral is approximated by 1. And
we can choose either, no matter how narrow we make the vertical slices. Clearly, we can
create approximation sequences that do not converge. This could not happen with
a continuous function or a function that has a finite number of discontinuities, but
the Dirichlet function is discontinuous everywhere.
The Dirichlet is such a function. When we apply the Lebesgue construction to the
Dirichlet function, we effectively partition the domain of the function, namely the
unit interval [0, 1], into two subsets, say the set A that contains the rationals and
the set B that contains the irrationals. If we could assign a “total length” L(A)
and L(B) to these sets as we do for countable unions of disjoint intervals of the
real line, then we could say that the Lebesgue integral of the Dirichlet function
is 1 × L(A) + 0 × L(B) = L(A). The Lebesgue measure does precisely this. The
Lebesgue measure of A is 0 and that of B is 1. This is not surprising, given that
[0, 1] contains a countable number of rationals and an uncountable number of
irrationals. Hence, the Lebesgue integral of the Dirichlet function is 0.
Note that it is not possible to assign the equivalent of a “length” to every subset
of R. Every attempt to do so leads to contradictions, whereby the measure of the
union of certain disjoint sets is not the sum of the individual measures. The subsets
of R to which we can assign the Lebesgue measure are called Lebesgue measurable
sets. It is hard to come up with non-measurable sets; for an example see the end of
Appendix 4.9 of [2]. A function is said to be Lebesgue measurable if Lebesgue's
construction partitions the abscissa into Lebesgue measurable sets.
We would not mention the Lebesgue integral if it were just to integrate bizarre
functions. The real power of Lebesgue integration theory comes from a number
of theorems that give precise conditions under which a limit and an integral
can be exchanged. Such operations come up frequently in the study of Fourier
transforms and Fourier series. The reader should not be alarmed at this point.
We will interchange integrals, swap integrals and sums, swap limits and integrals,
all without a fuss: but we do this because we know that the Lebesgue integration
theory allows us to do so, in the cases of interest to us.
In introducing the Lebesgue integral, we have assumed that the function being
integrated is non-negative. The same idea applies to non-positive functions. (In
fact, we can integrate the negative of the function and then take the negative of
the result.) If the function takes on positive as well as negative values, we split
the function into its positive and negative parts, we integrate each part separately,
and we add the two results. This works as long as the two intermediate results
are not +∞ and −∞, respectively. If they are, the Lebesgue integral is undefined;
otherwise the Lebesgue integral is defined.
A complex-valued function g : R → C is integrated by separately integrating its
real and imaginary parts. If the Lebesgue integral over both parts is defined and
finite, the function is said to be Lebesgue integrable. The set of Lebesgue-integrable
functions is denoted by L1 . The notation L1 comes from the easy-to-verify fact
that for a Lebesgue measurable function g, the Lebesgue integral is finite if and
only if ∫_{−∞}^{∞} |g(x)|dx < ∞. This integral is the L1 norm of g(x).
Every bounded Riemann-integrable function of bounded support is Lebesgue
integrable and the values of the two integrals agree. This statement does not extend
to functions that are defined over the real line. To see why, consider integrating
sinc(t) over the real line. The Lebesgue integral of sinc(t) is not defined, because
the integral of the positive part of sinc(t) and that over the negative part are
+∞ and −∞, respectively. The Riemann integral of sinc(t) exists because, by
definition, Riemann integrates from −T to T and then lets T go to infinity.
5.9. Appendix: L2 , and Lebesgue integral: A primer 183
All functions that model physical processes are finite-energy and measurable,
hence L2 functions. All finite-energy functions that we encounter in this text are
measurable, hence L2 functions. There are examples of finite-energy functions that
are not measurable, but it is hard to imagine an engineering problem where such
functions would arise.
The set of L2 functions forms a complex vector space with the zero vector being
the all-zero function. If we modify an L2 function in a countable number of points,
the result is also an L2 function and the (Lebesgue) integral over the two functions
is the same. (More generally, the same is true if we modify the function over a set
of measure zero.) The difference between the two functions is an L2 function ξ(t)
such that the Lebesgue integral ∫|ξ(t)|² dt = 0. Two L2 functions that have this
property are said to be L2 equivalent.
Unfortunately, L2 with the (standard) inner product that maps a(t), b(t) ∈ L2 to
⟨a, b⟩ = ∫ a(t)b*(t)dt   (5.19)
does not form an inner product space because axiom (c) of Definition 2.36 is not
fulfilled. In fact, ⟨a, a⟩ = 0 implies that a(t) is L2 equivalent to the zero function,
but this is not enough: to satisfy the axiom, a(t) must be the all-zero function.
There are two obvious ways around this problem if we want to treat finite-
energy functions as vectors of an inner product space. One way is to consider
only subspaces V of L2 such that there is only one vector ξ(t) in V that has the
property ∫|ξ(t)|² dt = 0. This will always be the case when V is spanned by a set
W of waveforms that represent electrical signals.
example 5.9 The set V that consists of the continuous functions of L2 is a
vector space. Continuity is sufficient to ensure that there is only one function
ξ(t) ∈ V for which the integral ∫|ξ(t)|² dt = 0, namely the zero function. Hence V
equipped with the standard inner product is an inner product space.4
Another way is to form equivalence classes. Two signals that are L2 equivalent
cannot be distinguished by means of a physical experiment. Hence the idea of
partitioning L2 into equivalence classes, with the property that two functions
are in the same equivalence class if and only if they are L2 equivalent. With an
appropriately defined vector addition and multiplication of a vector by a scalar,
the set of equivalence classes forms a complex vector space. We can use (5.19)
to define an inner product over this vector space. The inner product between two
equivalence classes is the result of applying (5.19) with an element of the first class
and an element of the second class. The result does not depend on which element
of a class we choose to perform the calculation. This way L2 can be transformed
into an inner product space denoted by L2 . As a “sanity check”, suppose that we
want to compute the inner product of a vector with itself. Let a(t) be an arbitrary
element of the corresponding class. If ⟨a, a⟩ = 0 then a(t) is in the equivalence class
that contains all the functions that have 0 norm. This class is the zero vector.
4
However, one can construct a sequence of continuous functions that converges to a
discontinuous function. In technical terms, this inner product space is not complete.
An L2 function f(t) defined over a finite-length interval is also an L1 function, i.e. ∫|f(t)|dt is
finite, which excludes the ∞ − ∞ problem mentioned in Appendix 5.9. Then also
∫|f(t)e^{j2παt}|dt = ∫|f(t)|dt is finite. Hence (5.21) is finite.
If the L2 function is defined over R, then the ∞ − ∞ problem mentioned in
Appendix 5.9 can arise. This is the case with the sinc(t) function. In such cases,
we truncate the function to the interval [−T, T ], compute its Fourier transform,
and let T go to infinity. It is in this sense that the Fourier transform of sinc(t)
is defined. An important result of Fourier analysis says that we are allowed to do
so (see Plancherel’s theorem, [2, Section 4.5.2]). Fortunately we rarely have to do
this, because the Fourier transform of most functions of interest to us is tabulated.
Thanks to Plancherel’s theorem, we can make the sweeping statement that the
Fourier transform is defined for all L2 functions. The transformed function is in
L2, hence its inverse transform is also defined. Be aware, though, that when we compute the
transform and then the inverse transform of the result, we do not necessarily obtain
the original function. However, what we obtain is L2 equivalent to the original. As
already mentioned, no physical experiment will ever be able to detect the difference
between the original and the modified function.
example 5.10 The function g(t) that has value 1 at t = 0 and 0 everywhere
else is an L2 function. Its Fourier transform gF (f ) is 0 everywhere. The inverse
Fourier transform of gF (f ) is also 0 everywhere.
One way to remember whether (5.20) or (5.21) has the minus sign in the
exponent is to think of the Fourier transform as a tool that allows us to write
a function g(u) as a linear combination of complex exponentials. Hence we are
writing g(u) = ∫ gF(α)φ(α, u)dα with φ(α, u) = exp(j2πuα) viewed as a function
of u with parameter α. Technically this is not an orthonormal expansion, but
it looks like one, where gF (α) is the coefficient of the function φ(α, u). Like for
an orthonormal expansion, the coefficient is obtained from an expression that
takes the form gF(u) = ⟨g(α), φ(α, u)⟩ = ∫ g(α)φ*(α, u)dα. It is the complex
conjugate in the computation of the inner product that brings in the minus sign in
the exponent. We emphasize that we are working by analogy here. The complex
exponential has infinite energy – hence not a unit-norm function (at least not with
respect to the standard inner product).
A useful formula is Parseval’s relationship
∫ a(t)b*(t)dt = ∫ aF(f) b*F(f) df,   (Parseval)   (5.22)
transform. These properties follow directly from the definition of Fourier transform,
namely
g(0) = ∫_{−∞}^{∞} gF(α)dα   (5.23)
gF(0) = ∫_{−∞}^{∞} g(α)dα.   (5.24)
Everyone knows how to compute the area of a rectangle. But how do we compute
the area under a sinc? Here is where the second and not-so-well-known trick comes
in handy. The area under a sinc is the area of the triangle inscribed in its main
lobe. Hence the integral under the two curves of Figure 5.14 is identical and equal
to ab, and this is true for all positive values a, b.
Let us consider a specific example of how to use the above two tricks. It does
not matter if we start from a rectangle or from a sinc and whether we want to
find its Fourier transform or its inverse Fourier transform. Let a, b, c, and d be as
shown in Figure 5.15.
We want to relate a, b to c, d (or vice versa). Since b must equal the area under
the sinc and d the area under the rectangle, we have
b = cd
d = 2ab,
Figure 5.14. ∫_{−∞}^{∞} b sinc(x/a)dx equals the area under the triangle on the right.
Figure 5.15. (Left: the rectangle b·1{−a ≤ x ≤ a}; right: the sinc d·sinc(x/c).)
e^{−a|t|}  ⇐⇒  2a / (a² + (2πf)²)
e^{−at}, t ≥ 0  ⇐⇒  1 / (a + j2πf)
e^{−πt²}  ⇐⇒  e^{−πf²}
1{−a ≤ t ≤ a}  ⇐⇒  2a sinc(2af)
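These pairs can be spot-checked numerically; the following sketch approximates the Fourier integral by a Riemann sum for the rectangle and Gaussian entries (the value a = 0.7 and the frequencies are arbitrary illustrative choices).

import numpy as np

# Rectangle pair: 1{-a <= t <= a}  <=>  2a sinc(2af)
a = 0.7
t = np.linspace(-a, a, 400001); dt = t[1] - t[0]
for f in (0.0, 0.3, 1.1):
    numeric = np.sum(np.exp(-2j * np.pi * f * t)).real * dt
    print(f"rect  f={f:3.1f}: numeric {numeric:+.5f}, formula {2*a*np.sinc(2*a*f):+.5f}")

# Gaussian pair: e^{-pi t^2}  <=>  e^{-pi f^2}  (a wide window is enough, the tails are tiny)
t = np.linspace(-6, 6, 400001); dt = t[1] - t[0]
for f in (0.0, 0.5, 1.5):
    numeric = np.sum(np.exp(-np.pi * t**2) * np.exp(-2j * np.pi * f * t)).real * dt
    print(f"gauss f={f:3.1f}: numeric {numeric:+.5f}, formula {np.exp(-np.pi*f**2):+.5f}")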
See Exercise 5.16 for a list of useful Fourier transform relations and see Table
5.1 for the Fourier transform of a few L2 functions.
f(x) = Σ_{i∈Z} A_i e^{j2πxi/p}   (5.25)

f(x) 1{−p/2 ≤ x ≤ p/2} = Σ_{i∈Z} √p A_i (e^{j2πxi/p}/√p) 1{−p/2 ≤ x ≤ p/2},   (5.26)
where we have multiplied and divided by √p to make φ_i(x) = (e^{j2πxi/p}/√p) 1{−p/2 ≤ x ≤ p/2} a unit-norm function.
Notice that the right-hand side of (5.25) is periodic, and that of (5.26) has finite
support. In fact we can use the Fourier series for both kinds of functions, periodic
and finite-support. That Fourier series can be used for finite-support functions is
an obvious consequence of the fact that a finite-support function can be seen as
one period of a periodic function. In communication, we are more interested in
functions that have finite support. (Once we have seen one period of a periodic
function, we have seen them all: from that point on there is no information being
conveyed.)
In terms of mathematical rigor, there are two things that can go wrong in what
we have said. (1) For some i, the integral in (5.27) might not be defined or might
not be finite. We can show that neither is the case if f(x)1{−p/2 ≤ x ≤ p/2} is
an L2 function. (2) For a specific x, the truncated series Σ_{i=−l}^{l} √p A_i φ_i(x) might
not converge as l goes to infinity, or might converge to a value that differs from
f(x)1{−p/2 ≤ x ≤ p/2}. It is not hard to show that the norm of the function

f(x)1{−p/2 ≤ x ≤ p/2} − Σ_{i=−l}^{l} √p A_i φ_i(x)

goes to zero as l goes to infinity. Hence the two functions are L2 equivalent. We
write this as follows

f(x)1{−p/2 ≤ x ≤ p/2} = l.i.m. Σ_{i∈Z} √p A_i φ_i(x),
We summarize with the following rigorous statement. The details that we have
omitted in our “proof” can be found in [2, Section 4.4].
w(t) = Σ_k A_k 2b sinc(2bt + k) = Σ_k (A_k/T) sinc(t/T + k),

where T = 1/(2b).
We still need to determine A_k/T. It is straightforward to determine A_k from the
definition of the Fourier series, but it is even easier to plug t = nT in both sides
of the above expression to obtain w(nT) = A_{−n}/T. This completes the proof. To see
that we can easily obtain A_k from the definition (5.27), we write

A_k = (1/(2b)) ∫_{−b}^{b} wF(f) e^{−jπkf/b} df = T ∫_{−∞}^{∞} wF(f) e^{−j2πTkf} df = T w(−kT),

where the first equality is the definition of the Fourier coefficient A_k, the second
uses the fact that wF(f) = 0 for f ∉ [−b, b], and the third is the inverse Fourier
transform evaluated at t = −kT.
is defined for all (x1 , . . . , xk ) ∈ Rk . In words, the statistic is defined for every finite
collection Xt1 , Xt2 , . . . , Xtk of samples of {Xt : t ∈ R}.
The mean mX (t), the autocorrelation RX (s, t), and the autocovariance KX (s, t)
of a continuous-time stochastic process {Xt : t ∈ R} are, respectively,
mX (t) := E [Xt ] (5.28)
RX (s, t) := E [Xs Xt∗ ] (5.29)
KX (s, t) := E [(Xs − E [Xs ])(Xt − E [Xt ])∗ ] = RX (s, t) − mX (s)m∗X (t), (5.30)
where the “∗ ” denotes complex conjugation and can be omitted for real-valued
stochastic processes.5 For a zero-mean process, which is usually the case in our
applications, KX (s, t) = RX (s, t).
5
To remember that the “∗ ” in (5.29) goes on the second random variable, it helps to
observe the similarity between the definition of RX (s, t) and that of an inner product
such as ⟨a(t), b(t)⟩ = ∫ a(t)b*(t)dt.
This proves that KY (τ ) is the inverse Fourier transform of |hF (f )|2 SX (f ). Hence,
SY (f ) = |hF (f )|2 SX (f ).
cF(f) = √T cos( (πT/(2β)) (f + β/(2T)) ) 1{f ∈ [−β/(2T), β/(2T)]}.
After some manipulations of ψ(t) = a(t) + b(t) we obtain the desired expression

ψ(t) = 4β [ cos((1 + β)πt/T) + ((1 − β)π/(4β)) sinc((1 − β)t/T) ] / ( π√T [1 − (4βt/T)²] ).
The picket fence miracle refers to the fact that the Fourier transform of a picket
fence is again a (scaled) picket fence. Specifically,
F[ Σ_{n=−∞}^{∞} δ(t − nT) ] = (1/T) Σ_{n=−∞}^{∞} δ(f − n/T),
where F[·] stands for the Fourier transform of the enclosed expression.
The above relationship can be derived by expanding the periodic function Σ_n δ(t − nT) as a
Fourier series, namely

Σ_{n=−∞}^{∞} δ(t − nT) = (1/T) Σ_{n=−∞}^{∞} e^{j2πtn/T}.
(The careful reader should wonder in which sense the above equality holds. We
are indeed being informal here.)
Taking the Fourier transform on both sides yields
F[ Σ_{n=−∞}^{∞} δ(t − nT) ] = (1/T) Σ_{n=−∞}^{∞} δ(f − n/T).
Using this notation, the relationship that we just proved can be written as
F[E_T(t)] = (1/T) E_{1/T}(f).
The picket fence miracle is a practical tool in engineering and physics, but in
the stated form it is not appropriate to obtain results that are mathematically
rigorous. An example follows.
example 5.15 We give an informal proof of the sampling theorem by using
the picket fence miracle. Let s(t) be such that sF(f) = 0 for f ∉ [−B, B] and
let T ≤ 1/(2B). We want to show that s(t) can be reconstructed from the T-spaced
samples {s(nT)}_{n∈Z}. Define

s|(t) = Σ_{n=−∞}^{∞} s(nT) δ(t − nT).
(Note that s| is just a name for the expression on the right-hand side of the
equality.) Using the fact that s(t)δ(t − nT ) = s(nT )δ(t − nT ), we can also write
s|(t) = s(t)ET (t).
6
The choice of the letter E is suggested by the fact that it looks like a picket fence when
rotated 90 degrees.
Figure 5.16. Fourier transform of a function s(t) (top) and of
s|(t) = Σ_n s(nT)δ(t − nT) (bottom).
From the figure, it is obvious that we can reconstruct the original signal s(t) by
filtering s|(t) with a filter that scales (1/T)sF(f) by T and blocks (1/T)sF(f − n/T)
for n ≠ 0. Such a filter exists if, like in the figure, the support of sF(f) does not
intersect with the support of sF(f − n/T) for n ≠ 0. This is the case if T ≤ 1/(2B).
(We allow equality because the output of a filter is unchanged if the filter's input is
modified at a countable number of points.) If h(t) is the impulse response of such
a filter, the filter output y(t) when the input is s|(t) satisfies

yF(f) = [ (1/T) Σ_{n=−∞}^{∞} sF(f − n/T) ] hF(f) = sF(f).
After taking the inverse Fourier transform, we obtain the reconstruction (also
called interpolation) formula
y(t) = [ Σ_{n=−∞}^{∞} s(nT)δ(t − nT) ] ⋆ h(t) = Σ_{n=−∞}^{∞} s(nT) h(t − nT) = s(t).
A specific filter that has the desired properties is the lowpass filter of frequency
response
hF(f) = T for f ∈ [−1/(2T), 1/(2T)], and 0 otherwise.
Its impulse response is sinc(t/T). Inserting into the reconstruction formula yields

s(t) = Σ_{n=−∞}^{∞} s(nT) sinc(t/T − n),   (5.31)
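A small sketch of (5.31) in action: a signal bandlimited to [−B, B] (an assumed two-tone example) is sampled with period T = 1/(2B) and rebuilt from a finite number of samples; the residual error is due to truncating the infinite sum.

import numpy as np

B, T = 2.0, 0.25                       # bandwidth and sampling period, T = 1/(2B)
def s(t):                              # a signal bandlimited to [-B, B] (assumed example)
    return 0.7 * np.cos(2 * np.pi * 0.6 * t) + 0.3 * np.sin(2 * np.pi * 1.7 * t)

n = np.arange(-400, 401)               # finitely many samples (truncation of (5.31))
samples = s(n * T)

t = np.linspace(-5, 5, 1001)           # evaluate well inside the sampled window
s_hat = samples @ np.sinc(t[None, :] / T - n[:, None])
print("max reconstruction error:", np.max(np.abs(s_hat - s(t))))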
The picket fence miracle is useful for computing the spectrum of certain signals
related to sampling. Examples are found in Exercises 5.1 and 5.8.
5.16 Exercises
Exercises for Section 5.2
exercise 5.1 (Sampling and reconstruction) Here we use the picket fence mir-
acle to investigate practical ways to approximate sampling and/or reconstruction.
We assume that for some positive B, s(t) satisfies sF(f) = 0 for f ∉ [−B, B]. Let
T be such that 0 < T ≤ 1/(2B).
exercise 5.2 (Sampling and projections) We have seen that the reconstruction
formula of the sampling theorem can be rewritten in such a way that it becomes
an orthonormal expansion (expression (5.3)). If ψj (t) is the jth element of an
orthonormal set of functions used to expand w(t), then the jth coefficient cj equals
the inner product ⟨w, ψj⟩. Explain why we do not need to explicitly perform an
inner product to obtain the coefficients used in the reconstruction formula (5.3).
Figure 5.17.
exercise 5.5 (Nyquist’s criterion) For each function |ψF (f )|2 in Figure 5.18,
indicate whether the corresponding pulse ψ(t) has unit norm and/or is orthogonal
to its time-translates by multiples of T. The function in Figure 5.18d is sinc²(fT).
Figure 5.18.
exercise 5.6 (Nyquist pulse) A communication system uses signals of the form
Σ_{l∈Z} s_l p(t − lT),
where sl takes values in some symbol alphabet and p(t) is a finite-energy pulse.
The transmitted signal is first filtered by a channel of impulse response h(t) and
then corrupted by additive white Gaussian noise of power spectral density N0/2. The
receiver front end is a filter of impulse response q(t).
(a) Neglecting the noise, show that the front-end filter output has the form
y(t) = Σ_{l∈Z} s_l g(t − lT),
A function g(t) that fulfills this condition is called a Nyquist pulse of param-
eter T . Prove the following theorem:
theorem 5.16 (Nyquist criterion for Nyquist pulses) The L2 pulse g(t) is
a Nyquist pulse (of parameter T ) if and only if its Fourier transform gF (f )
fulfills Nyquist’s criterion (with parameter T ), i.e.
l.i.m. Σ_{l∈Z} gF(f − l/T) = T,   f ∈ R.
Figure 5.19. (Left: pF(f); right: qF(f).)
exercise 5.7 (Pulse orthogonal to its T -spaced time translates) Figure 5.20
shows part of the plot of a function |ψF (f )|2 , where ψF (f ) is the Fourier transform
of some pulse ψ(t).
Figure 5.20.
Complete the plot (for positive and negative frequencies) and label the ordinate,
knowing that the following conditions are satisfied:
• For every pair of integers k, l, ∫ ψ(t − kT)ψ(t − lT)dt = 1{k = l};
• ψ(t) is real-valued;
• ψF(f) = 0 for |f| > 1/T.
exercise 5.8 (Nyquist criterion via picket fence miracle) Give an informal proof
of Theorem 5.6 (Nyquist criterion for orthonormal pulses) using the picket fence
miracle (Appendix 5.15). Hint: A function p(t) is a Nyquist pulse of parameter T
if and only if p(t)ET (t) = δ(t).
Figure 5.21. (The function g_F^{(3)}(f).)
(a) Show that for every n ≥ 1, g_F^{(n)}(f) fulfills Nyquist's criterion with parameter
T. Hint: It is sufficient that you verify that Nyquist's criterion is fulfilled for
f ∈ [0, 1/T]. Towards that end, first check what happens to the central rectangle
when you perform the operation Σ_{l∈Z} g_F^{(n)}(f − l/T). Then see how the small
rectangles fill in the gaps.
(b) As n goes to infinity, g_F^{(n)}(f) converges to g_F^{(0)}(f). (It converges for every f
and it converges also in L2, i.e. the squared norm of the difference g_F^{(n)}(f) −
g_F^{(0)}(f) goes to zero.) The peculiarity is that the limiting function g_F^{(0)}(f) fulfills
Nyquist's criterion with parameter T^{(0)} ≠ T. What is T^{(0)}?
(c) Suppose that we use symbol-by-symbol on a pulse train to communicate across
the AWGN channel. To do so, we choose a pulse ψ(t) such that |ψF(f)|² = g_F^{(n)}(f)
for some n, and we choose n sufficiently large that T/(2n) is much smaller
than the noise power spectral density N0/2. In this case, we can argue that our
bandwidth B is only 1/(3T). This means a 30% bandwidth reduction with respect
to the minimum absolute bandwidth 1/(2T). This reduction is non-negligible if we
pay for the bandwidth we use. How do you explain that such a pulse is not
used in practice? Hint: What do you expect ψ(t) to look like?
(d) Construct a function gF (f ) that looks like Figure 5.21 in the interval shown
by the figure except for the heights of the rectangles. Your function should have
infinitely many smaller rectangles on each side of the central rectangle and
(like g_F^{(n)}(f)) shall satisfy Nyquist's criterion. Hint: One such construction is
suggested by the infinite geometric series Σ_{i=1}^{∞} (1/2)^i, which adds to 1.
ties:
(i) it is T for f ∈ [0, 1/(2T) − β/(2T));
(ii) it equals s(f) for f ∈ [1/(2T) − β/(2T), 1/(2T) + β/(2T)];
(iii) it is 0 for f ∈ (1/(2T) + β/(2T), ∞);
(iv) it is an even function.
exercise 5.11 (Peculiarity of the sinc pulse) Let {U_k}_{k=0}^{n} be an iid sequence
of uniformly distributed bits taking value in {±1}. Prove that for certain values of
t and for n sufficiently large, s(t) = Σ_{k=0}^{n} U_k sinc(t − k) can become larger than
any given constant. Hint: The series Σ_{k=1}^{∞} 1/k diverges and so does Σ_{k=1}^{∞} 1/(k − a) for
any constant a ∈ (0, 1). Note: This implies that the eye diagram of s(t) is closed.
Miscellaneous exercises
(b) Now suppose that the (noiseless) channel outputs the input plus a delayed and
scaled replica of the input. That is, the channel output is w(t) + ρw(t − T ) for
some T and some ρ ∈ [−1, 1]. At the receiver, the channel output is filtered
by ψ(−t). The resulting waveform ỹ(t) is again sampled at multiples of T .
Determine the samples ỹ(mT ), for 1 ≤ m ≤ K.
(c) Suppose that the kth received sample is Yk = dk + αdk−1 + Zk , where Zk ∼
N (0, σ 2 ) and 0 ≤ α < 1 is a constant. Note that dk and dk−1 are realizations
of independent random variables that take on the values 1 and −1 with equal
probability. Suppose that the receiver decides dˆk = 1 if Yk > 0, and decides
dˆk = −1 otherwise. Find the probability of error for this receiver.
exercise 5.13 (Communication link design) Specify the block diagram for a
digital communication system that uses twisted copper wires to connect devices that
are 5 km apart from each other. The cable has an attenuation of 16 dB/km. You
are allowed to use the spectrum between −5 and 5 MHz. The noise at the receiver
input is white and Gaussian, with power spectral density N0 /2 = 4.2×10−21 W/Hz.
The required bit rate is 40 Mbps (megabits per second) and the bit-error probability
should be less than 10−5 . Be sure to specify the symbol alphabet and the waveform
former of the system you propose. Give precise values or bounds for the bandwidth
used, the power of the channel input signal, the bit rate, and the error probability.
Indicate which bandwidth definition you use.
where ψ(t) is normalized and orthogonal to its T -spaced time-translates. The signal
is sent over the AWGN channel of power spectral density N0 /2 and at the receiver
is passed through the matched filter of impulse response ψ ∗ (−t). Let Yi be the filter
output at time iT .
(a) Determine RX[k], k ∈ Z, assuming an infinite sequence {Xi}_{i=−∞}^{∞}.
(b) Describe a method to estimate Di from Yi and Yi−1 , such that the performance
is the same if the polarity of Yi is inverted for all i. We ask for a simple
decoder, not necessarily ML.
(c) Determine (or estimate) the error probability of your decoder.
exercise 5.15 (Mixed questions)
(a) Consider the signal x(t) = cos(2πt) (sin(πt)/(πt))². Assume that we sample x(t)
with sampling period T. What is the maximum T that guarantees signal
recovery?
(b) You are given a pulse p(t) with spectrum pF(f) = T(1 − |f|T), |f| ≤ 1/T.
What is the value of ∫ p(t)p(t − 3T)dt?
exercise 5.16 (Properties of the Fourier transform) Prove the following properties
of the Fourier transform. The sign ⇐⇒ relates Fourier transform pairs,
with the function on the right being the Fourier transform of that on the left. The
Fourier transforms of v(t) and w(t) are denoted by vF(f) and wF(f), respectively.
(a) Linearity: αv(t) + βw(t) ⇐⇒ αvF(f) + βwF(f).
(b) Time-shifting: v(t − t0) ⇐⇒ vF(f) e^{−j2πf t0}.
(c) Frequency-shifting: v(t) e^{j2πf0 t} ⇐⇒ vF(f − f0).
(d) Convolution in time: (v ⋆ w)(t) ⇐⇒ vF(f) wF(f).
(e) Time scaling by α ≠ 0: v(αt) ⇐⇒ (1/|α|) vF(f/α).
(f) Conjugation: v*(t) ⇐⇒ v*F(−f).
(g) Time-frequency duality: vF(t) ⇐⇒ v(−f).
Hint: Use Parseval’s relationship on the expression on the right and interpret
the result.
6 Convolutional coding
and Viterbi decoding:
First layer revisited
6.1 Introduction
In this chapter we shift focus to the encoder/decoder pair. The general setup is
that of Figure 6.1, where N (t) is white Gaussian noise of power spectral density
N0 /2. The details of the waveform former and the n-tuple former are immaterial
for this chapter. The important fact is that the channel model from the encoder
output to the decoder input is the discrete-time AWGN channel of noise variance
σ 2 = N0 /2.
The study of encoding/decoding methods has been an active research area
since the second half of the twentieth century. It is called coding theory. There
are many coding techniques, and a general introduction to coding can easily
occupy a one-semester graduate-level course. Here we will just consider an example
of a technique called convolutional coding. By considering a specific example,
we can considerably simplify the notation. As seen in the exercises, applying
the techniques learned in this chapter to other convolutional encoders is fairly
straightforward. We choose convolutional coding for two reasons: (i) it is well
suited to the discrete-time AWGN channel; (ii) it allows us to
introduce various instructive and useful tools, notably the Viterbi algorithm to
do maximum likelihood decoding and generating functions to upper bound the
bit-error probability.
Figure 6.1. (Block diagram: (b1, . . . , bk) → Encoder → √E (x1, . . . , xn) → Waveform Former → waveform channel with additive noise N(t) → R(t) → Baseband Front End → (y1, . . . , yn) → Decoder → (b̂1, . . . , b̂k).)
The source symbols enter the encoder sequentially, at regular intervals deter-
mined by the encoder clock. During the jth epoch, j = 1, 2, . . . , the encoder takes
bj and produces two output symbols, x2j−1 and x2j , according to the encoding
map 1
x2j−1 = bj bj−2
x2j = bj bj−1 bj−2 .
To produce x1 and x2 the encoder needs b−1 and b0 , which are assumed to be 1
by default.
The circuit that implements the convolutional encoder is depicted in Figure 6.2,
where “×” denotes multiplication in R. A shift register stores the past two inputs.
As implied by the indices of x2j−1 , x2j , the two encoder outputs produced during
an epoch are transmitted sequentially.
Notice that the encoder output has length n = 2k. The following is an example of
a source sequence of length k = 5 and the corresponding encoder output sequence
of length n = 10.
j                 1      2       3      4      5
bj                1     −1      −1      1      1
x2j−1, x2j       1, 1   −1, −1  −1, 1   −1, 1  −1, −1
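A direct transcription of the encoding map (a sketch; symbols take values in {±1} and b−1 = b0 = 1 as stated above) reproduces this table:

def encode(b):
    """Map source symbols b (values +-1) to the encoder output, two symbols per input."""
    b = [1, 1] + list(b)                     # b_{-1} = b_0 = 1 by default
    x = []
    for j in range(2, len(b)):
        x.append(b[j] * b[j - 2])            # x_{2j-1} = b_j b_{j-2}
        x.append(b[j] * b[j - 1] * b[j - 2]) # x_{2j}   = b_j b_{j-1} b_{j-2}
    return x

print(encode([1, -1, -1, 1, 1]))
# -> [1, 1, -1, -1, -1, 1, -1, 1, -1, -1], i.e. the output sequence of the table.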
1
We are choosing this particular encoding map because it is the simplest one that is
actually used in practice.
Figure 6.2. (The encoder circuit: a shift register stores b_{j−1} and b_{j−2}; the two outputs are x_{2j−1} = b_j b_{j−2} and x_{2j} = b_j b_{j−1} b_{j−2}.)
(The encoder state diagram: edges are labeled "input | output pair"; the state labels are t = (−1, −1), l = (−1, 1), r = (1, −1), b = (1, 1).)
The choice of letting the encoder input and output symbols be the elements
of {±1} is not standard. Most authors choose the input/output alphabet to be
{0, 1} and use addition modulo 2 instead of multiplication over R. In this case, a
memoryless mapping at the encoder output transforms the symbol alphabet from
{0, 1} to {±√Es}. The notation is different but the end result is the same. The
choice we have made is better suited for the AWGN channel. The drawback of
this choice is that it is less evident that the encoder is linear. In Exercise 6.12 we
establish the link between the two viewpoints and in Exercise 6.5 we prove from
first principles that the encoder is indeed linear.
In each epoch, the convolutional encoder we have chosen has k0 = 1 symbol
entering and n0 = 2 symbols exiting the encoder. In general, a convolutional
encoder is specified by (i) the number k0 of source symbols entering the encoder in
each epoch; (ii) the number n0 of symbols produced by the encoder in each epoch,
where n0 > k0 ; (iii) the constraint length m0 defined as the number of input
k0 -tuples used to determine an output n0 -tuple; and (iv) the encoding function,
specified for instance by a k0 × m0 matrix of 1s and 0s for each component of the
output n0 -tuple. The matrix associated to an output component specifies which
inputs are multiplied to obtain that output. In our example, k0 = 1, n0 = 2,
m0 = 3, and the encoding function is specified by [1, 0, 1] (for the first component
of the output) and [1, 1, 1] (for the second component). (See the connections that
determine the top and bottom output in Figure 6.2.) In our case, the elements of
the output n0 tuple are serialized into a single sequence that we consider to be
the actual encoder output, but there are other possibilities. For instance, we could
take the pair x2j−1 , x2j and map it into an element of a 4-PAM constellation.
To find an x that maximizes ⟨x, y⟩, we could in principle compute ⟨x, y⟩ for all 2^k
sequences that can be produced by the encoder. This brute-force approach would
be quite impractical. As already mentioned, if k = 100 (which is a relatively modest
value for k), 2^k = (2^10)^10, which is approximately (10^3)^10 = 10^30. A VLSI chip that
makes 10^9 inner products per second takes 10^21 seconds to check all possibilities.
This is roughly 4 × 10^13 years. The universe is "only" roughly 2 × 10^10 years old!
We wish for a method that finds a maximizing x with a number of operations
that grows linearly (as opposed to exponentially) in k. We will see that the so-called
Viterbi algorithm achieves this.
To describe the Viterbi algorithm (VA), we introduce a fourth way of describing
a convolutional encoder, namely the trellis. The trellis is an unfolded transition
diagram that keeps track of the passage of time. For our example, if we assume
that we start at state (1, 1), that the source sequence is b1 , b2 , . . . , b5 , and that
we complete the transmission by feeding the encoder with two “dummy bits”
b6 = b7 = 1 that make the encoder stop in the initial state, we obtain the trellis
description shown on the top of Figure 6.4, where an edge (transition) from a state
at depth j to a state at depth j+1 is labeled with the corresponding encoder output
x2j−1 , x2j . The encoder input that corresponds to an edge is the first component
of the next state.
There is a one-to-one correspondence between an encoder input sequence b, an
encoder output sequence x, and a path (or state sequence) that starts at the initial
state (1, 1) (left state) and ends at the final state (1, 1) (right state) of the trellis.
Hence we can refer to a path by means of an input sequence, an output sequence
or a sequence of states.
To decode using the Viterbi algorithm, we replace the label of each edge with
the edge metric (also called branch metric) computed as follows. The edge with
x2j−1 = a and x2j = b, where a, b ∈ {±1}, is assigned the edge metric ay2j−1 +by2j .
Now if we add up all the edge metrics along a path, we obtain the path metric
⟨x, y⟩.
example 6.1 Consider the trellis on the top of Figure 6.4 and let the decoder
input sequence be y = (1, 3), (−2, 1), (4, −1), (5, 5), (−3, −3), (1, −6), (2, −4). For
convenience, we chose the components of y to be integers, but in reality they are
real-valued. Also for convenience, we use parentheses to group the components of y
into pairs (y2j−1 , y2j ) that belong to the same trellis section. The edge metrics are
shown on the second trellis (from the top) of Figure 6.4. Once again, by adding the
edge metric along a path, we obtain the path metric ⟨x, y⟩, where x is the encoder
output associated to the path.
example 6.2 Our starting point is the second trellis of Figure 6.4, which has
been labeled with the edge metrics. We construct the third trellis in which every
state is labeled with the metric of the surviving path to that state obtained as
follows. We use j = 0, 1, . . . , k + 2 to run over the trellis depth. Depth j = 0 refers
Figure 6.4. The Viterbi algorithm. Top figure: Trellis representing the encoder
where edges are labeled with the corresponding output symbols. Second figure:
Edges are re-labeled with the edge metric corresponding to the received
sequence (1, 3), (−2, 1), (4, −1), (5, 5), (−3, −3), (1, −6), (2, −4) (parentheses have
been inserted to facilitate parsing). Third figure: Each state has been labeled
with the metric of a survivor to that state and non-surviving edges are pruned
(dashed). Fourth figure: Tracing back from the end, we find the decoded path
(bold); it corresponds to the source sequence 1, 1, 1, 1, −1, 1, 1.
to the initial state (leftmost) and depth j = k + 2 to the final state (rightmost)
after sending the k bits and the 2 “dummy bits”. Let j = 0 and to the single state
at depth j assign the metric 0. Let j = 1 and label each of the two states at depth
j with the metric of the only subpath to that state. (See the third trellis from the
top.) Let j = 2 and label the four states at depth j with the metric of the only
subpath to that state. For instance, the label to the state (−1, −1) at depth j = 2 is
obtained by adding the metric of the single state and the single edge that precedes
it, namely −1 = −4 + 3. From j = 3 on the situation is more interesting, because
now every state can be reached from two previous states. We label the state under
consideration with the largest of the two subpath metrics to that state and make
sure to remember to which of the two subpaths it corresponds. In the figure, we
make this distinction by dashing the last edge of the other path. (If we were doing
this by hand we would not need a third trellis. Rather we would label the states on
the second trellis and cross out the edges that are dashed in the third trellis.) The
subpath with the highest edge metric (the one that has not been dashed) is called
survivor. We continue similarly for j = 4, 5, . . . , k + 2. At depth j = k + 2 there is
only one state and its label maximizes ⟨x, y⟩ over all paths. By tracing back along
the non-dashed path, we find the maximum likelihood path. From it, we can read
out the corresponding bit sequence. The maximum likelihood path is shown in bold
on the fourth and last trellis of Figure 6.4.
From the above example, it is clear that, starting from the left and working
its way to the right, the Viterbi algorithm visits all states and keeps track of the
subpath that has the largest metric to that state. In particular, the algorithm finds
the path that has the largest metric between the initial state and the final state.
The complexity of the Viterbi algorithm is linear in the number of trellis sections,
i.e. in k. Recall that the brute-force approach has complexity exponential in k. The
saving of the Viterbi algorithm comes from not having to compute the metric of
non-survivors. When we dash an edge at depth j, we are in fact eliminating 2k−j
possible extensions of that edge. The brute-force approach computes the metric of
all those extensions but not the Viterbi algorithm.
A formal definition of the VA (one that can be programmed on a computer) and
a more formal argument that it finds the path that maximizes ⟨x, y⟩ is given in
the Appendix (Section 6.6).
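A compact sketch of the algorithm for this particular encoder (states are (b_{j−1}, b_{j−2}); the two dummy bits are forced to 1 so that every path ends in state (1, 1)); run on the received sequence of Example 6.1, it returns the path found in Figure 6.4.

def viterbi(y, k):
    """ML decoding for the encoder x_{2j-1} = b_j b_{j-2}, x_{2j} = b_j b_{j-1} b_{j-2}.
    y is a list of (y_{2j-1}, y_{2j}) pairs; k is the number of information bits."""
    metric = {(1, 1): 0.0}                      # survivor metric per state, start in (1, 1)
    back = []                                   # per section: next state -> (previous state, input)
    for j, (y1, y2) in enumerate(y):
        inputs = [1, -1] if j < k else [1]      # the last two inputs are the dummy bits
        new_metric, choice = {}, {}
        for (p, q), m in metric.items():
            for a in inputs:
                x1, x2 = a * q, a * p * q       # encoder output on this edge
                cand = m + x1 * y1 + x2 * y2    # accumulated path metric <x, y>
                s = (a, p)                      # next state
                if s not in new_metric or cand > new_metric[s]:
                    new_metric[s], choice[s] = cand, ((p, q), a)
        metric = new_metric
        back.append(choice)
    state, bits = (1, 1), []                    # trace back from the final state (1, 1)
    for choice in reversed(back):
        state, a = choice[state]
        bits.append(a)
    return bits[::-1], metric[(1, 1)]

y = [(1, 3), (-2, 1), (4, -1), (5, 5), (-3, -3), (1, -6), (2, -4)]
print(viterbi(y, k=5))   # -> ([1, 1, 1, 1, -1, 1, 1], 31.0), the bold path of Figure 6.4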
reference path
a sequence of k ones with initial encoder state (1, 1). The encoder output is a
sequence of 1s of length n = 2k.
The task of the decoder is to find (one of) the paths in the trellis that has the
largest ⟨x, y⟩, where x is the encoder output that corresponds to that path. The
encoder input b that corresponds to this x is the maximum likelihood message
chosen by the decoder.
The concept of a detour plays a key role in upper-bounding the bit-error
probability. We start with an analogy. Think of the trellis as a road map,
of the path followed by the encoder as the itinerary you have planned for a
journey, and of the decoded path as the actual route you follow on your journey.
Typically the itinerary and the actual route differ due to constructions that
force you to take detours. Similarly, the detours taken by the Viterbi decoder
are those segments of the decoded path that share with the reference path
only their initial and final state. Figure 6.5 illustrates a reference path and two
detours.
Errors are produced when the decoder follows a detour. To the trellis path
selected by the decoder, we associate a sequence ω0 , ω1 , . . . , ωk−1 defined as follows.
If there is a detour that starts at depth j, j = 0, 1, . . . , k − 1, we let ωj be the
number of bit errors produced by that detour. It is determined by comparing the
corresponding segments of the two encoder input sequences and by letting ωj be
the number of positions in which they differ.
If depth j does not correspond to the start of a detour, then ωj = 0. Now Σ_{j=0}^{k−1} ωj
is the number of bits that are incorrectly decoded and (1/(k k0)) Σ_{j=0}^{k−1} ωj is the
fraction of such bits (k0 = 1 in our running example; see Section 6.2).
example 6.3 Consider the example of Figure 6.4 where k = 5 bits are transmit-
ted (followed by the two dummy bits 1, 1). The reference path is the all-one path
and the decoded path is the one marked by the solid line on the bottom trellis. There
is one detour, which starts at depth 4 in the trellis. Hence ωj = 0 for j = 0, 1, 2, 3
whereas ω4 ≠ 0. To determine the value of ω4, we need to compare the encoder
input bits over the span of the detour. The input bits that correspond to the detour
are −1, 1, 1 and those that correspond to the reference path are 1, 1, 1. There is
one disagreement,
hence ω4 = 1. The fraction of bits that are decoded incorrectly
is (1/k) Σ_{j=0}^{k−1} ωj = 1/5.
Pb = E[ (1/(k k0)) Σ_{j=0}^{k−1} Ωj ] = (1/(k k0)) Σ_{j=0}^{k−1} E[Ωj].
To upper bound the above expression, we need to learn how many detours of a
certain kind there are. We do so in the next section.
In this subsection, we consider the infinite trellis obtained by extending the finite
trellis in both directions. Each path of the infinite trellis corresponds to an infinite
encoder input sequence b = . . . b−1 , b0 , b1 , b2 , . . . and an infinite encoder output
sequence x = . . . x−1 , x0 , x1 , x2 , . . .. These are sequences that belong to {±1}∞ .
Given any two paths in the trellis, we can take one as the reference and consider
the other as consisting of a number of detours with respect to the reference. To
each of the two paths there corresponds an encoder input and an encoder output
sequence. For every detour we can compare the two segments of encoder output
sequences and count the number of positions in which they differ. We denote this
number by d and call it the output distance (over the span of the detour). Similarly,
we can compare the segments of encoder input sequences and call input distance
(over the span of the detour) the number i of positions in which they differ.
example 6.4 Consider again the example of Figure 6.4 and let us choose the
all-one path as the reference. Consider the detour that starts at depth j = 0 and
ends at j = 3. From the top trellis, comparing labels, we see that d = 5. (There are
two disagreements in the first section of the trellis, one in the second, and two in
the third.) To determine the input distance i we need to label the transitions with
the corresponding encoder input. If we do so and compare we see that i = 1. As
another example, consider the detour that starts at depth j = 0 and ends at j = 4.
For this detour, d = 6 and i = 2.
We seek the answer to the following question: For any given reference path and
depth j ∈ {0, 1, . . . }, what is the number a(i, d) of detours that start at depth
j and have input distance i and output distance d, with respect to the reference
path? This number depends neither on j nor on the reference path. It does not
depend on j because the encoder is a time-invariant machine, i.e. all the sections
of the infinite trellis are identical. (This is the reason why we are considering the
infinite trellis in this section.) We will see that it does not depend on the reference
path either, because the encoder is linear in a sense that we will discuss.
example 6.5 Using the top trellis of Figure 6.4 with the all-one path as the
reference and j = 0, we can verify by inspection that there are two detours that
have output distance d = 6. One ends at j = 4 and the other ends at j = 5. The
input distance is i = 2 in both cases. Because there are two detours with parameters
d = 6 and i = 2, a(2, 6) = 2.
Figure 6.6. (The detour flow graph: start state s, intermediate states l, t, r, and end state e; the edges are labeled with terms of the form I^i D^d, namely ID², ID, I, D, and D².)
example 6.6 In Figure 6.6, the shortest path that connects s to e has length 3.
It consists of the edges labeled ID2 , D, and D2 , respectively. The product of these
labels is the path label ID5 . This path tells us that there is a detour with i = 1 (the
exponent of I) and d = 5 (the exponent of D). There is no other path with path
label ID5 . Hence, as we knew already, a(1, 5) = 1.
Our next goal is to determine the generating function T (I, D) of a(i, d) defined as
T(I, D) = Σ_{i,d} I^i D^d a(i, d).
The letters I and D in the above expression should be seen as “place holders” with-
out any physical meaning. It is like describing a set of coefficients a0 , a1 , . . . , an−1
by means of the polynomial p(x) = a0 + a1x + ··· + a_{n−1}x^{n−1}. To determine
T (I, D), we introduce auxiliary generating functions, one for each intermediate
state of the detour flow graph, namely
Tl(I, D) = Σ_{i,d} I^i D^d al(i, d),
Tt(I, D) = Σ_{i,d} I^i D^d at(i, d),
Tr(I, D) = Σ_{i,d} I^i D^d ar(i, d),
Te(I, D) = Σ_{i,d} I^i D^d ae(i, d),
where in the first line we define al (i, d) as the number of paths in the detour flow
graph that start at state s, end at state l, and have path label I i Dd . Similarly, for
x = t, r, e, ax (i, d) is the number of paths in the detour flow graph that start at
state s, end at state x, and have path label I i Dd . Notice that Te (I, D) is indeed
the T (I, D) of interest to us.
From the detour flow graph, we see that the various generating functions are
related as follows, where to simplify notation we drop the two arguments (I and
D) of the generating functions:
Tl = ID² + Tr I
Tt = Tl ID + Tt ID
Tr = Tl D + Tt D
Te = Tr D².
To write down the above equations, the reader might find it useful to apply
the following rule. The Tx of a state x is the sum of a product: the sum is over
all states y that have an edge into x and the product is Ty times the label on the
edge from y to x. The reader can verify that this rule applies to all of the above
equations except the first. When used in an attempt to find the first equation, it
yields Tl = Ts ID2 + Tr I, but Ts is not defined because there is no detour starting
at s and ending at s. If we define Ts = 1 by convention, the rule applies without
exception.
The above system can be solved for Te (hence for T ) by pure formal manipula-
tions, like solving a system of equations. The result is
T(I, D) = ID⁵ / (1 − 2ID).
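This formal manipulation can be checked symbolically; a small sketch using sympy:

import sympy as sp

I, D, Tl, Tt, Tr, Te = sp.symbols('I D T_l T_t T_r T_e')
eqs = [sp.Eq(Tl, I * D**2 + Tr * I),      # the four flow-graph equations above
       sp.Eq(Tt, Tl * I * D + Tt * I * D),
       sp.Eq(Tr, Tl * D + Tt * D),
       sp.Eq(Te, Tr * D**2)]
sol = sp.solve(eqs, [Tl, Tt, Tr, Te], dict=True)[0]
print(sp.simplify(sol[Te] - I * D**5 / (1 - 2 * I * D)))   # 0: T(I, D) = ID^5/(1 - 2ID)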
As we will see shortly, the generating function T (I, D) of a(i, d) is more useful than
a(i, d) itself. However, to show that we can indeed obtain a(i, d) from T (I, D) we
use the expansion2 1−x 1
= 1 + x + x2 + x3 + · · · to write
2
We do not need to worry about convergence issues at this stage, because for now, xi is
just a “place holder”. In other words, we are not adding up the powers of x for some
number x.
$$T(I,D) = \frac{ID^5}{1-2ID} = ID^5\left(1 + 2ID + (2ID)^2 + (2ID)^3 + \cdots\right)$$
$$= ID^5 + 2I^2D^6 + 2^2I^3D^7 + 2^3I^4D^8 + \cdots$$
This means that there is one path with parameters i = 1, d = 5, that there are
two paths with i = 2, d = 6, etc. The general expression for i = 1, 2, . . . is
$$a(i,d) = \begin{cases} 2^{i-1}, & d = i + 4\\ 0, & \text{otherwise.}\end{cases}$$
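For readers who wish to experiment, the following sketch (Python with the sympy package; the variable names are arbitrary, only the equations come from the text) solves the generating-function equations of the detour flow graph for $T_e$ and expands the result to read off a(i, d):

```python
import sympy as sp

I, D = sp.symbols('I D')
Tl, Tt, Tr, Te = sp.symbols('T_l T_t T_r T_e')

# Generating-function equations read off the detour flow graph (T_s = 1 by convention).
eqs = [sp.Eq(Tl, I*D**2 + Tr*I),
       sp.Eq(Tt, Tl*I*D + Tt*I*D),
       sp.Eq(Tr, Tl*D + Tt*D),
       sp.Eq(Te, Tr*D**2)]
T = sp.simplify(sp.solve(eqs, [Tl, Tt, Tr, Te], dict=True)[0][Te])
print(T)  # I*D**5/(1 - 2*I*D), i.e. T(I, D) of the text

# Recover a(i, d) from the power series T = I*D**5 * sum_k (2*I*D)**k.
print(sp.expand(sum(I*D**5 * (2*I*D)**k for k in range(5))))
# I*D**5 + 2*I**2*D**6 + 4*I**3*D**7 + 8*I**4*D**8 + 16*I**5*D**9
```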
$$w(t) = \sqrt{E_s}\,\sum_{j=1}^{n} x_j\, \psi_j(t),$$
$$P_b = \frac{1}{k k_0}\sum_{j=0}^{k-1} E[\Omega_j], \qquad (6.1)$$
$$E[\Omega_j] = \sum_h i(h)\,\pi(h),$$
where the sum is over all detours h that start at depth j with respect to the
reference path, π(h) stands for the probability that detour h is taken, and i(h) for
the input distance between detour h and the reference path.
Next we upper bound π(h). If a detour starts at depth j and ends at depth l = j + m, then the corresponding encoder-output symbols form a 2m-tuple $\bar u \in \{\pm 1\}^{2m}$. Let $u = x_{2j+1}, \ldots, x_{2l} \in \{\pm 1\}^{2m}$ and $\rho = y_{2j+1}, \ldots, y_{2l}$ be the corresponding sub-sequences of the reference path and of the channel output, respectively; see Figure 6.7.
A necessary (but not sufficient) condition for the Viterbi algorithm to take a detour is that the subpath metric along the detour is at least as large as the corresponding subpath metric along the reference path. An equivalent condition is that ρ is at least as close to $\sqrt{E_s}\,\bar u$ as it is to $\sqrt{E_s}\,u$. Observe that ρ has the statistic of $\sqrt{E_s}\,u + Z$ where $Z \sim \mathcal{N}\left(0, \frac{N_0}{2} I_{2m}\right)$ and 2m is the common length of u, ū, and ρ.
The probability that ρ is at least as close to $\sqrt{E_s}\,\bar u$ as it is to $\sqrt{E_s}\,u$ is $Q\left(\frac{d_E}{2\sigma}\right)$, where $d_E = 2\sqrt{E_s d}$ is the Euclidean distance between $\sqrt{E_s}\,u$ and $\sqrt{E_s}\,\bar u$. Using $d_E(h)$ to denote the Euclidean distance of detour h to the reference path, we obtain
$$\pi(h) \le Q\left(\frac{d_E(h)}{2\sigma}\right) = Q\left(\sqrt{\frac{E_s\, d(h)}{\sigma^2}}\right),$$
where d(h) is the output distance of detour h with respect to the reference path.
Figure 6.7. Detour and reference path, labeled with the corresponding
output subsequences.
$$E[\Omega_j] = \sum_h i(h)\,\pi(h)$$
$$\le \sum_h i(h)\, Q\left(\sqrt{\frac{E_s\, d(h)}{\sigma^2}}\right)$$
$$\stackrel{(a)}{=} \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\, Q\left(\sqrt{\frac{E_s d}{\sigma^2}}\right)\tilde{a}(i,d)$$
$$\stackrel{(b)}{\le} \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\, Q\left(\sqrt{\frac{E_s d}{\sigma^2}}\right) a(i,d)$$
$$\stackrel{(c)}{\le} \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\, z^d\, a(i,d).$$
To obtain equality (a) we group the terms of the sum that have the same i and
d and introduce ã(i, d) to denote the number of such terms in the finite trellis.
Note that ã(i, d) is the finite-trellis equivalent to a(i, d) introduced in Section
6.4.1. As the infinite trellis contains all the detours of the finite trellis and more,
ã(i, d) ≤ a(i, d). This justifies (b). In (c) we use
$$Q\left(\sqrt{\frac{E_s d}{\sigma^2}}\right) \le e^{-\frac{E_s d}{2\sigma^2}} = z^d, \qquad \text{for } z = e^{-\frac{E_s}{2\sigma^2}}.$$
For the final step towards the upper bound to Pb , we use the relationship
$$\sum_{i=1}^{\infty} i f(i) = \left.\frac{\partial}{\partial I}\sum_{i=1}^{\infty} I^i f(i)\right|_{I=1},$$
which holds for any function f and can be verified by taking the derivative of $\sum_{i=1}^{\infty} I^i f(i)$ with respect to I and then setting I = 1. Hence
$$E[\Omega_j] \le \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\, z^d\, a(i,d) \qquad (6.3)$$
$$= \left.\frac{\partial}{\partial I}\sum_{i=1}^{\infty}\sum_{d=1}^{\infty} I^i D^d\, a(i,d)\right|_{I=1,\,D=z}$$
$$= \left.\frac{\partial}{\partial I} T(I,D)\right|_{I=1,\,D=z}.$$
Plugging into (6.1) and using the fact that the above bound does not depend on
j yields
$$P_b = \frac{1}{k k_0}\sum_{j=0}^{k-1} E[\Omega_j] \le \left.\frac{1}{k_0}\frac{\partial}{\partial I} T(I,D)\right|_{I=1,\,D=z}. \qquad (6.4)$$
In our specific example we have $k_0 = 1$ and $T(I,D) = \frac{ID^5}{1-2ID}$, hence $\frac{\partial T}{\partial I} = \frac{D^5}{(1-2ID)^2}$. Thus
$$P_b \le \frac{z^5}{(1-2z)^2}. \qquad (6.5)$$
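As a quick numerical illustration of (6.5) (a sketch only; the dB range is an arbitrary choice, matching the range used in Exercise 6.11):

```python
import math

def pb_bound(es_over_sigma2_db):
    """Evaluate the upper bound (6.5) with z = exp(-Es/(2*sigma^2))."""
    snr = 10.0 ** (es_over_sigma2_db / 10.0)   # Es/sigma^2, linear scale
    z = math.exp(-snr / 2.0)
    assert z < 0.5, "bound (6.5) requires 0 <= z < 1/2"
    return z**5 / (1.0 - 2.0*z)**2

for db in range(2, 7):
    print(f"{db} dB: Pb <= {pb_bound(db):.3e}")
```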
The bit-error probability depends on the encoder and on the channel. Bound
(6.4) nicely separates the two contributions. The encoder is accounted for by
$T(I,D)/k_0$ and the channel by z. More precisely, $z^d$ is an upper bound to the probability that a maximum likelihood receiver makes a decoding error when the choice is between two encoder output sequences that have Hamming distance d. As
shown in Exercise 2.32(b) of Chapter 2, we can use the Bhattacharyya bound to
determine z for any binary-input discrete memoryless channel. For such a channel,
$$z = \sum_y \sqrt{P(y|a)P(y|b)}, \qquad (6.6)$$
where a and b are the two letters of the input alphabet and y runs over all the
elements of the output alphabet. Hence, the technique used in this chapter is
applicable to any binary input discrete memoryless channel.
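A minimal sketch (Python) of (6.6), applied to two standard binary-input channels; the crossover and erasure probabilities below are arbitrary illustrative values:

```python
import math

def bhattacharyya(p_a, p_b):
    """z of (6.6): p_a[y] = P(y|a), p_b[y] = P(y|b) for a binary-input DMC."""
    return sum(math.sqrt(p_a[y] * p_b[y]) for y in p_a)

# Binary symmetric channel with crossover probability 0.05: z = 2*sqrt(p*(1-p))
print(bhattacharyya({'0': 0.95, '1': 0.05}, {'0': 0.05, '1': 0.95}))   # ~0.436
# Binary erasure channel with erasure probability 0.3: z = 0.3
print(bhattacharyya({'0': 0.7, '?': 0.3, '1': 0.0},
                    {'0': 0.0, '?': 0.3, '1': 0.7}))
```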
It should be mentioned that the upper bound (6.5) is valid under the condition that there is no convergence issue associated to the various sums following (6.3). This is the case when $0 \le z \le \frac{1}{2}$, which is the case when the numerator and the denominator of (6.5) are non-negative. The z from (6.6) fulfills $0 \le z \le 1$. However, if we use the tighter Bhattacharyya bound discussed in Exercise 2.29 of Chapter 2 (which is tighter by a factor $\frac{1}{2}$), then it is guaranteed that $0 \le z \le \frac{1}{2}$.
6.5 Summary
To assess the impact of the convolutional encoder, let us compare two situations.
In both cases, the transmitted signal looks identical to an observer, namely it has
the form
$$w(t) = \sum_{i=1}^{2l} s_i\, \psi(t - iT)$$
for some positive integer l and some unit-norm pulse ψ(t) that is orthogonal to its T-spaced translates. In both cases, the symbols take values in $\{\pm\sqrt{E_s}\}$ for some fixed energy-per-symbol $E_s$, but the way the symbols are obtained differs in the two cases. In one case, the symbols are obtained from the output of the convolutional encoder studied in this chapter. We call this the coded case. In the other case, the symbols are simply the source bits, which take value in {±1}, scaled by $\sqrt{E_s}$. We
call this the uncoded case.
For the coded case, the number of symbols is twice the number of bits. Hence,
letting Rb , Rs , and Eb be the bit rate, the symbol rate, and the energy per bit,
respectively, we obtain
$$R_b = \frac{R_s}{2} = \frac{1}{2T}\ \text{[bits/s]},$$
$$E_b = 2E_s,$$
$$P_b \le \frac{z^5}{(1-2z)^2},$$
where $z = e^{-\frac{E_s}{2\sigma^2}}$. As $\frac{E_s}{2\sigma^2}$ becomes large, the denominator of the above bound for $P_b$ approaches 1.
Figure 6.8. Bit-error probability $P_b$ versus $E_s/\sigma^2$ in dB: the uncoded curve $Q\left(\sqrt{E_s/\sigma^2}\right)$, the upper bound $\frac{z^5}{(1-2z)^2}$, the simulated convolutional encoder, and an LDPC code.
From a high-level point of view, non-trivial coding is about using only selected
sequences to form the codebook. In this chapter, we have fixed the channel-input alphabet to $\{\pm\sqrt{E_s}\}$. Then our only option to introduce non-trivial coding is to
increase the codeword length from n = k to n > k. For a fixed bit rate, increasing n
implies increasing the symbol rate. To increase the symbol rate we time-compress
the pulse ψ(t) by the appropriate factor and the bandwidth expands by the same
factor. If we fix the bandwidth, the symbol rate stays the same and the bit rate
has to decrease.
It would be wrong to conclude that non-trivial coding always requires reducing
the bit rate or increasing the bandwidth. Instead of keeping the channel-input
alphabet constant, for the coded system we could have used, say, 4-PAM. Then
each pair of binary symbols produced by the encoder can be mapped into a single
4-PAM symbol. In so doing, the bit rate, the symbol rate, and the bandwidth
remain unchanged.
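As an aside, one possible mapping from pairs of encoder output symbols to 4-PAM is sketched below (Python); the particular labeling and amplitude set are arbitrary illustrative choices, not prescribed by the text.

```python
# Map each pair of binary symbols from {-1, +1} to one 4-PAM amplitude.
PAM4 = {(-1, -1): -3, (-1, +1): -1, (+1, +1): +1, (+1, -1): +3}

def pairs_to_4pam(symbols):
    """symbols: even-length sequence over {-1, +1}, e.g. a convolutional encoder output."""
    assert len(symbols) % 2 == 0
    return [PAM4[(symbols[i], symbols[i + 1])] for i in range(0, len(symbols), 2)]

print(pairs_to_4pam([+1, +1, -1, +1, -1, -1]))   # [1, -1, -3]: half as many symbols as inputs
```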
The ultimate answer comes from information theory (see e.g. [19]). Information
theory tells us that, by means of coding, we can achieve an error probability as
small as desired, provided that we send fewer bits per symbol than the channel
capacity C, which for the discrete-time AWGN channel is $C = \frac{1}{2}\log_2\left(1 + \frac{E_s}{\sigma^2}\right)$ bits/symbol. According to this expression, to send 1/2 bits per symbol as we do in
our example, we need Es /σ 2 = 1, which means 0 dB. We see that the performance
of the LDPC code is quite good. Even with the channel-input alphabet restricted to $\{\pm\sqrt{E_s}\}$ (no such restriction is imposed in the derivation of C), the LDPC code
achieves the kind of error probability that we typically want in applications at an
Es /σ 2 which is within 1 dB from the ultimate limit of 0 dB required for reliable
communication.
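A two-line check of the quoted numbers (Python; a sketch, with arbitrary naming):

```python
import math

# Capacity of the discrete-time AWGN channel in bits per symbol.
capacity = lambda es_over_sigma2: 0.5 * math.log2(1.0 + es_over_sigma2)

print(capacity(1.0))            # 0.5 bit/symbol at Es/sigma^2 = 1
print(10 * math.log10(1.0))     # Es/sigma^2 = 1 corresponds to 0 dB
```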
Convolutional codes were invented by Elias in 1955 and have been used in many
communication systems, including satellite communication, and mobile communi-
cation. In 1993, Berrou, Glavieux, and Thitimajshima captured the attention of
the communication engineering community by introducing a new class of codes,
called turbo codes, that achieved a performance breakthrough by concatenating
two convolutional codes separated by an interleaver. Their performance is not far
from that of the low-density parity-check codes (LDPC) – today’s state-of-the-art
in coding.
Thanks to its tremendous success, coding is in every modern communication
system. In this chapter we have only scratched the surface. Recommended books
on coding are [22] for a classical textbook that covers a broad spectrum of coding
techniques and [23] for the reference book on LDPC coding.
where $x_{2j-1}, x_{2j}$ is the encoder output of the corresponding edge. If there is no such edge, we let $\mu_{j-1,j}(\alpha, \beta) = -\infty$.
Since $\mu_{j-1,j}(\alpha, \beta)$ is the jth term of $\langle x, y\rangle$ for any path that goes through state α at depth j − 1 and state β at depth j, $\langle x, y\rangle$ is obtained by adding the edge metrics along the path specified by x.
The path metric is the sum of the edge metrics taken along the edges of a path.
A longest path from state (1, 1) at depth j = 0, denoted (1, 1)0 , to a state α at
depth j, denoted αj , is one of the paths that has the largest path metric. The
Viterbi algorithm works by constructing, for each j, a list of the longest paths
to the states at depth j. The following observation is key to understanding the
Viterbi algorithm. If path ∗ α_{j−1} ∗ β_j is a longest path to state β of depth j, where path ∈ Γ_{j−2} and ∗ denotes concatenation, then path ∗ α_{j−1} must be a longest path to state α of depth j − 1: if some alternatepath ∗ α_{j−1}, with alternatepath ∈ Γ_{j−2}, had a larger metric than path ∗ α_{j−1}, then alternatepath ∗ α_{j−1} ∗ β_j would have a larger metric than path ∗ α_{j−1} ∗ β_j, contradicting the assumption that the latter is a longest path. So the longest depth-j path to a state can be obtained by checking the one-edge extensions of the longest depth-(j − 1) paths.
The following notation is useful for the formal description of the Viterbi algo-
rithm. Let μj (α) be the metric of a longest path to state αj and let Bj (α) ∈
{±1}j be the encoder input sequence that corresponds to this path. We call
Bj (α) ∈ {±1}j the survivor because it is the only path through state αj that
will be extended. (Paths through αj that have a smaller metric have no chance of
extending into a maximum likelihood path.) For each state, the Viterbi algorithm
computes two things: a survivor and its metric. The formal algorithm follows,
where B(β, α) is the encoder input that corresponds to the transition from state
β to state α if there is such a transition and is undefined otherwise.
(1) Initially set $\mu_0(1, 1) = 0$, $\mu_0(\alpha) = -\infty$ for all $\alpha \ne (1, 1)$, $B_0(1, 1) = \emptyset$, and j = 1.
The reader should have no difficulty verifying (by induction on j) that μj (α) as
computed by Viterbi’s algorithm is indeed the metric of a longest path from (1, 1)0
to state α at depth j and that Bj (α) is the encoder input sequence associated to it.
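The description above translates directly into code. The following sketch (Python) implements the algorithm for a rate-1/2, four-state encoder over {±1}; the output map used here, $x_{2j-1} = b_j b_{j-2}$ and $x_{2j} = b_j b_{j-1} b_{j-2}$, is taken from the encoder description of Exercise 6.2 and is only an illustrative choice; substitute the map of the encoder at hand.

```python
from itertools import product

def outputs(b, state):
    """Encoder output for input b in state (b_{j-1}, b_{j-2}).
    Illustrative map: x_{2j-1} = b_j b_{j-2}, x_{2j} = b_j b_{j-1} b_{j-2}."""
    b1, b2 = state
    return (b * b2, b * b1 * b2)

def viterbi(y, n_steps, start=(1, 1), end=(1, 1)):
    """Maximum likelihood input sequence from the channel output y (two samples per step)."""
    states = list(product((1, -1), repeat=2))
    metric = {s: (0.0 if s == start else float('-inf')) for s in states}
    survivor = {s: [] for s in states}
    for j in range(n_steps):
        y1, y2 = y[2 * j], y[2 * j + 1]
        new_metric, new_survivor = {}, {}
        for s in states:                      # s = (b_j, b_{j-1})
            b, b1 = s
            best_m, best_prev = float('-inf'), None
            for b2 in (1, -1):                # previous state (b_{j-1}, b_{j-2})
                prev = (b1, b2)
                x1, x2 = outputs(b, prev)
                m = metric[prev] + x1 * y1 + x2 * y2     # add the edge metric
                if best_prev is None or m > best_m:
                    best_m, best_prev = m, prev
            new_metric[s] = best_m
            new_survivor[s] = survivor[best_prev] + [b]  # extend the survivor
        metric, survivor = new_metric, new_survivor
    return survivor[end]

def encode(bits, state=(1, 1)):
    out = []
    for b in bits:
        out.extend(outputs(b, state))
        state = (b, state[0])
    return out

bits = [1, -1, -1, 1, 1, 1]                       # the last two 1s return the encoder to (1, 1)
print(viterbi(encode(bits), n_steps=len(bits)))   # recovers `bits` in the noiseless case
```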
6.7 Exercises
Exercises for Section 6.2
where Ts and Es are fixed positive numbers, ψ(t) is some unit-energy function, T0
is a uniformly distributed random variable taking values in $[0, T_s)$, and $\{X_i\}_{i=-\infty}^{\infty}$
is the output of the convolutional encoder described by
X2n = Bn Bn−2
X2n+1 = Bn Bn−1 Bn−2
(a) Express the power spectral density of X(t) for a general ψ(t).
(b) Plot the power spectral density of X(t) assuming that ψ(t) is a unit-norm
rectangular pulse of width Ts .
exercise 6.3 (Viterbi algorithm) An output sequence x1 , . . . , x10 from the con-
volutional encoder of Figure 6.9 is transmitted over the discrete-time AWGN chan-
nel. The initial and final state of the encoder is (1, 1). Using the Viterbi algorithm,
find the maximum likelihood information sequence b̂1 , . . . , b̂4 , 1, 1, knowing that
b1 , . . . , b4 are drawn independently and uniformly from {±1} and that the channel
output y1 , . . . , y10 = 1, 2, −1, 4, −2, 1, 1, −3, −1, −2. (It is for convenience that we
are choosing integers rather than real numbers.)
Figure 6.9. Convolutional encoder: input $b_j \in \{\pm 1\}$, memory cells $b_{j-1}$, $b_{j-2}$, and multipliers producing the outputs $x_{2j-1}$ and $x_{2j}$.
where Bi is the ith information bit, h0 , . . . , hL are coefficients that describe the
inter-symbol interference, and Zi is zero-mean, Gaussian, of variance σ 2 , and
statistically independent of everything else. Relationship (6.7) can be described by
a trellis, and the ML decision rule can be implemented by the Viterbi algorithm.
(a) Draw the trellis that describes all sequences of the form X1 , . . . , X6 resulting
from information sequences of the form B1 , . . . , B5 , 0, Bi ∈ {0, 1}, assuming
$$h_i = \begin{cases} 1, & i = 0\\ -2, & i = 1\\ 0, & \text{otherwise.}\end{cases}$$
To determine the initial state, you may assume that the preceding informa-
tion sequence terminated with 0. Label the trellis edges with the input/output
symbols.
(b) Specify a metric $f(x_1, \ldots, x_6) = \sum_{i=1}^{6} f(x_i, y_i)$ whose minimization or maximization with respect to the valid $x_1, \ldots, x_6$ leads to a maximum likelihood
decision. Specify if your metric needs to be minimized or maximized.
(c) Assume y1 , . . . , y6 = 2, 0, −1, 1, 0, −1. Find the maximum likelihood estimate
of the information sequence B1 , . . . , B5 .
exercise 6.5 (Linearity) In this exercise, we establish in what sense the encoder
of Figure 6.2 is linear.
(a) For this part you might want to review the axioms of a field. Consider the set
F0 = {0, 1} with the following addition and multiplication tables.
+ 0 1 × 0 1
0 0 1 0 0 0
1 1 0 1 0 1
(The addition in F0 is the usual addition over R with result taken modulo 2.
The multiplication is the usual multiplication over R and there is no need to
take the modulo 2 operation because the result is automatically in F0 .) F0 ,
“+”, and “×” form a binary field denoted by F2 . Now consider F− = {±1}
and the following addition and multiplication tables.
+ 1 −1 × 1 −1
1 1 −1 1 1 1
−1 −1 1 −1 1 −1
(The addition in F− is the usual multiplication over R.) Argue that F− , “+”,
and “×” form a binary field as well. Hint: The second set of operations can be
obtained from the first set via the transformation T : F0 → F− that sends 0 to
1 and 1 to −1. Hence, by construction, for a, b ∈ F0 , T (a + b) = T (a) + T (b)
and T (a × b) = T (a) × T (b). Be aware of the double meaning of “+” and “×”
in the previous sentence.
(b) For this part you might want to review the notion of a vector space. Let
F0 , “+” and “×” be as defined in (a). Let V = F0∞ . This is the set of
infinite sequences taking values in F0 . Does V, F0 , “+” and “×” form a
vector space? (Addition of vectors and multiplication of a vector with a scalar
is done component-wise.) Repeat using F− .
(c) For this part you might want to review the notion of linear transformation.
Let f : V → V be the transformation that sends an infinite sequence b ∈ V to
an infinite sequence x ∈ V according to
x2j−1 = bj−1 + bj−2 + bj−3
x2j = bj + bj−2 ,
where the “+” is the one defined over the field of scalars implicit in V. Argue that this f is linear. Comment: When $V = F_-^{\infty}$, this encoder is the one used
throughout Chapter 6, with the only difference that in the chapter we multiply
over R rather than adding over F− , but this is just a matter of notation,
the result of the two operations on the elements of F− being identical. The
standard way to describe a convolutional encoder is to choose F0 and the
corresponding addition, namely addition modulo 2. See Exercise 6.12 for the
reason we opt for a non-standard description.
exercise 6.6 (Independence of the distance profile from the reference path)
We want to show that a(i, d) does not depend on the reference path. Recall that
in Section 6.4.1 we define a(i, d) as the number of detours that leave the reference
path at some arbitrary but fixed trellis depth j and have input distance i and output
distance d with respect to the reference path.
(a) Let b and b̄, both in {±1}∞ , be two infinite-length input sequences to the
encoder of Figure 6.2 and let f be the encoding map. The encoder is linear
in the sense that the componentwise product over the reals bb̄ is also a valid
input sequence and the corresponding output sequence is f (bb̄) = f (b)f (b̄)
(see Exercise 6.5). Argue that the distance between b and b̄ equals the distance
between bb̄ and the all-one input sequence. Similarly, argue that the distance
between f (b) and f (b̄) equals the distance between f (bb̄) and the all-one output
sequence (which is the output to the all-one input sequence).
(b) Fix an arbitrary reference path and an arbitrary detour that splits from the
reference path at time 0. Let b and b̄ be the corresponding input sequences.
Because the detour starts at time 0, bi = b̄i for i < 0 and b0 = b̄0 . Argue that
b̄ uniquely defines a detour b̃ that splits from the all-one path at time 0 and
such that:
(i) the distance between b and b̄ is the same as that between b̃ and the all-one
input sequence;
(ii) the distance between f (b) and f (b̄) is the same as that between f (b̃) and
the all-one output sequence.
(c) Conclude that a(i, d) does not depend on the reference path.
exercise 6.7 (Rate 1/3 convolutional code) For the convolutional encoder of
Figure 6.10 do the following.
× x3n = bn bn−2
bn ∈ {±1}
bn−1 bn−2
Figure 6.10.
(a) Draw the state diagram and the detour flow graph.
(b) Suppose that the serialized encoder output symbols are scaled so that the
resulting energy per bit is Eb and are sent over the discrete-time AWGN
channel of noise variance σ 2 = N0 /2. Derive an upper bound to the bit-error
probability assuming that the decoder implements the Viterbi algorithm.
exercise 6.8 (Rate 2/3 convolutional code) The following equations describe
the output sequence of a convolutional encoder that in each epoch takes k0 = 2
input symbols from {±1} and outputs n0 = 3 symbols from the same alphabet.
x3n = b2n b2n−1 b2n−2
x3n+1 = b2n+1 b2n−2
x3n+2 = b2n+1 b2n b2n−2
(a) Draw an implementation of the encoder based on delay elements and multi-
pliers.
(b) Draw the state diagram.
(c) Suppose that the serialized encoder output symbols are scaled so that the
resulting energy per bit is Eb and are sent over the discrete-time AWGN
channel of noise variance σ 2 = N0 /2. Derive an upper bound to the bit-error
probability assuming that the decoder implements the Viterbi algorithm.
exercise 6.9 (Convolutional encoder, decoder, and error probability) For the
convolutional code described by the state diagram of Figure 6.11:
(a) draw the encoder;
(b) as a function of the energy per bit Eb , upper bound the bit-error probability of
the Viterbi algorithm when the scaled encoder output sequence is transmitted
over the discrete-time AWGN channel of noise variance σ 2 = N0 /2.
Figure 6.11. State diagram with states t = (−1, −1), l = (−1, 1), r = (1, −1), b = (1, 1); each edge is labeled with the input symbol and the corresponding pair of output symbols (e.g. 1 | −1, −1).
exercise 6.10 (Viterbi for the binary erasure channel) Consider the convolu-
tional encoder of Figure 6.12 with inputs and outputs over {0, 1} and addition
modulo 2. Its output is sent over the binary erasure channel described by
PY |X (0|0) = PY |X (1|1) = 1 − ,
PY |X (?|0) = PY |X (?|1) = ,
PY |X (1|0) = PY |X (0|1) = 0,
Figure 6.12. Convolutional encoder over {0, 1}: input $b_j$, memory cells $b_{j-1}$, $b_{j-2}$, and modulo-2 adders producing $x_{2j-1}$ and $x_{2j}$.
exercise 6.11 (Bit-error probability) In the process of upper bounding the bit-
error probability, in Section 6.4.2 we make the following step
$$E[\Omega_j] \le \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\, Q\left(\sqrt{\frac{E_s d}{\sigma^2}}\right) a(i,d) \le \sum_{i=1}^{\infty}\sum_{d=1}^{\infty} i\, z^d\, a(i,d).$$
(a) Instead of upper bounding the Q function as done above, use the results of
Section 6.4.1 to substitute a(i, d) and d with explicit functions of i and get
rid of the second sum. You should obtain
$$P_b \le \sum_{i=1}^{\infty} i\, Q\left(\sqrt{\frac{E_s (i+4)}{\sigma^2}}\right) 2^{i-1}.$$
(b) Truncate the above sum to the first five terms and evaluate it numerically for
Es /σ 2 between 2 and 6 dB. Plot the results and compare to Figure 6.8.
Miscellaneous exercises
Figure 6.13. (a) Encoder over $F_0 = \{0, 1\}$: input $\bar b_j$, memory cells $\bar b_{j-1}$, $\bar b_{j-2}$, modulo-2 adders, and output maps T producing $\bar x_{2j-1}$ and $\bar x_{2j}$. (b) Encoder over $F_- = \{\pm 1\}$: input $b_j$, memory cells $b_{j-1}$, $b_{j-2}$, and multipliers producing $x_{2j-1}$ and $x_{2j}$.
Comment: The encoder of Figure 6.13b is linear over the field F− (see Exercise
6.5), whereas the encoder of Figure 6.13a is linear over F0 only if we omit the
output map T . The comparison of the two figures should explain why in this chapter
we have opted for the description of part (b) even though the standard description
of a convolutional encoder is as in part (a).
exercise 6.13 (Trellis with antipodal signals) Figure 6.14a shows a trellis
section labeled with the output symbols x2j−1 , x2j of a convolutional encoder. Notice
how branches that are the mirror-image of each other have antipodal output symbols
(symbols that are the negative of each other). The purpose of this exercise is to see
that when the trellis has this particular structure and codewords are sent through the
discrete-time AWGN channel, the maximum likelihood sequence detector further
simplifies (with respect to the Viterbi algorithm).
Figure 6.14b shows two consecutive trellis sections labeled with the branch metric.
Notice that the mirror symmetry of part (a) implies the same kind of symmetry
for part (b). The maximum likelihood path is the one that has the largest path
metric. To avoid irrelevant complications we assume that there is only one path
that maximizes the path metric.
Figure 6.14. (a) Trellis section labeled with the output symbols $x_{2j-1}, x_{2j}$; mirror-image branches carry antipodal symbols. (b) Two consecutive trellis sections labeled with the branch metrics (a, b, c, d and their negatives).
(a) Let σj ∈ {±1} be the state visited by the maximum likelihood path at depth j.
Suppose that a genie informs the decoder that σj−1 = σj+1 = 1. Write down
the necessary and sufficient condition for the maximum likelihood path to go
through σj = 1.
(b) Repeat for the remaining three possibilities of σj−1 and σj+1 . Does the nec-
essary and sufficient condition for σj = 1 depend on the value of σj−1 and
σj+1 ?
(c) The branch metric for the branch with output symbols x2j−1 , x2j is
where yj is xj plus noise. Using the result of the previous part, specify a
maximum likelihood sequence decision for σj = 1 based on the observation
y2j−1 , y2j , y2j+1 , y2j+2 .
where $\{B_i\}_{i=-\infty}^{\infty}$, $B_i \in \{1, -1\}$, is a sequence of independent and uniformly distributed bits and ψ(t) is a centered and unit-energy rectangular pulse of width T. The communication channel between the transmitter and the receiver is the AWGN channel of power spectral density $\frac{N_0}{2}$. At the receiver, the channel output Z(t) is
passed through a filter matched to ψ(t), and the output is sampled, ideally at times
tk = kT , k integer.
(a) Consider that there is a timing error, i.e. the sampling time is $t_k = kT - \tau$ where $\frac{\tau}{T} = 0.25$. Ignoring the noise, express the matched filter output
observation wk at time tk = kT − τ as a function of the bit values bk and
bk−1 .
(b) Extending to the noisy case, let rk = wk + zk be the kth matched filter
output observation. The receiver is not aware of the timing error. Compute
the resulting error probability.
(c) Now assume that the receiver knows the timing error τ (same τ as above) but
it cannot correct for it. (This could be the case if the timing error becomes
known once the samples are taken.) Draw and label four sections of a trellis
that describes the noise-free sampled matched filter output for each input
sequence $b_1, b_2, b_3, b_4$. In your trellis, take into consideration the fact that the matched filter is “at rest” before $x(t) = \sum_{i=1}^{4} b_i\, \psi(t - iT)$ enters the filter.
(d) Suppose that the sampled matched filter output consists of 2, 0.5, 0, −1. Use
the Viterbi algorithm to decide on the transmitted bit sequence.
exercise 6.15 (Simulation) The purpose of this exercise is to determine, by
simulation, the bit-error probability of the communication system studied in this
chapter. For the simulation, we recommend using MATLAB, as it has high-level func-
tions for the various tasks, notably for generating a random information sequence,
for doing convolutional encoding, for simulating the discrete-time AWGN channel,
and for decoding by means of the Viterbi algorithm. Although the actual simulation
is on the discrete-time AWGN, we specify a continuous-time setup. It is part of
your task to translate the continuous-time specifications into what you need for the
simulation. We begin with the uncoded version of the system of interest.
(a) By simulation, determine the minimum obtainable bit-error probability Pb of
bit-by-bit on a pulse train transmitted over the AWGN channel. Specifically,
the channel input signal has the form
$$X(t) = \sum_j X_j\, \psi(t - jT),$$
where the symbols are iid and take value in $\{\pm\sqrt{E_s}\}$, the pulse ψ(t) has unit
norm and is orthogonal to its T -spaced time translates. Plot Pb as a function
of Es /σ 2 in the range from 2 to 6 dB, where σ 2 is the noise variance. Verify
your results with Figure 6.8.
(b) Repeat with the symbol sequence being the output of the convolutional encoder of Figure 6.2 multiplied by $\sqrt{E_s}$. The decoder shall implement the Viterbi
algorithm. Also in this case you can verify your results by comparing with
Figure 6.8.
7 Passband communication
via up/down conversion:
Third layer
7.1 Introduction
We speak of baseband communication when the signals have their energy in some
frequency interval [−B, B] around the origin (Figure 7.1a). Much more common
is the situation where the signal’s energy is concentrated in [fc − B, fc + B] and
[−fc − B, −fc + B] for some carrier frequency fc greater than B. In this case,
we speak of passband communication (Figure 7.1b). The carrier frequency fc is
chosen to fulfill regulatory constraints, to avoid interference from other signals, or
to make the best possible use of the propagation characteristics of the medium
used to communicate.
Figure 7.1. (a) Baseband signal: energy concentrated in [−B, B]. (b) Passband signal: energy concentrated in $[f_c - B, f_c + B]$ and $[-f_c - B, -f_c + B]$.
The purpose of this chapter is to introduce a third and final layer responsible for
passband communication. With this layer in place, the upper layers are designed
for baseband communication even when the actual communication happens in
passband.
example 7.1 (Regulatory constraints) Figure 7.2 shows the radio spectrum allo-
cation for the United States (October 2003). To get an idea about its complexity, the
chart is presented in its entirety even if it is too small to read. The interested reader
can find the original on the website of the (US) National Telecommunications and
Information Administration.
Figure 7.2. Radio spectrum allocation in the United States, produced by the US Department of Commerce, National Telecommunications and Information Administration, Office of Spectrum Management (October 2003).
example 7.3 (Refraction) Radio signals are refracted by the ionosphere sur-
rounding the Earth. Different layers of the ionosphere have different ionization
densities, hence different refraction indices. As a result, signals can be bent by a
layer or can be trapped between layers. This phenomenon concerns mainly the MF
and HF range (300 kHz to 30 MHz) but can also affect the MF through the LF and
VLF range. As a consequence, radio signals emitted from a ground station can be
bent back to Earth, sometimes after traveling a long distance trapped between layers
of the ionosphere. This mode of propagation, called sky wave (as opposed to ground
wave) is exploited, for instance, by amateur radio operators to reach locations on
Earth that could not be reached if their signals traveled in straight lines. In fact,
under particularly favorable circumstances, the communication between any two
regions on Earth can be established via sky waves. Although the bending caused by
the ionosphere is desirable for certain applications, it is a nuisance for Earth-to-
satellite communication. This is why satellites use higher frequencies for which the
ionosphere is essentially transparent (typically GHz range).
= xF (−f ).
1
In principle, the notation x∗F (f ) could mean (xF )∗ (f ) or (x∗ )F (f ), but it should be
clear that we mean the former because the latter is not useful when x(t) is real-valued,
in which case (x∗ )F (f ) = xF (f ).
2
Note that h>, F (f ) is not an L2 function, but it can be made into one by setting it to
zero at all frequencies that are outside the support of xF (f ). Note also that we can
arbitrarily choose the value of h>, F (f ) at f = 0, because two functions that differ at a
single point are L2 equivalent.
Figure 7.3. The magnitude spectra $|x_F(f)|$ (passband, centered at $\pm f_c$), $|\hat x_F(f)|$ (analytic signal, supported on positive frequencies around $f_c$, height scaled by $\sqrt{2}$), and $|x_{E,F}(f)|$ (baseband-equivalent, centered at 0).
xE (t) = x̂(t)e−j2πfc t
xE,F (f ) = x̂F (f + fc ).
Figure 7.3 depicts the relationship between |xF (f )|, |x̂F (f )|, and |xE,F (f )|. We
plot the absolute value to avoid plotting the real and the imaginary components.
We use dashed lines to plot |xF (f )| for f < 0 as a reminder that it is completely
determined by |xF (f )|, f > 0.
The operation that recovers x(t) from its baseband-equivalent xE (t) is
√
x(t) = 2 xE (t)ej2πfc t . (7.6)
The circuits to go from x(t) to xE (t) and back to x(t) are depicted in Figure 7.4,
where double arrows denote complex-valued signals. Exercises 7.3 and 7.5 derive
equivalent circuits that require only operations over the reals.
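The two circuits can also be checked numerically. The sketch below (Python with numpy) uses an FFT in place of the filter $h_>(t)$; the carrier frequency, bandwidth and sampling rate are arbitrary illustrative values.

```python
import numpy as np

fs, fc = 1000.0, 100.0                         # sampling rate and carrier (arbitrary values)
t = np.arange(0, 1.0, 1 / fs)
x = np.cos(2*np.pi*95*t) + 0.5*np.sin(2*np.pi*105*t)   # a real passband signal around fc

# Analytic signal: keep positive frequencies only, scale by sqrt(2).
f = np.fft.fftfreq(len(x), 1 / fs)
x_hat = np.fft.ifft(np.sqrt(2) * np.fft.fft(x) * (f > 0))

# Baseband-equivalent and reconstruction as in (7.6).
x_E = x_hat * np.exp(-2j*np.pi*fc*t)
x_rec = np.sqrt(2) * np.real(x_E * np.exp(2j*np.pi*fc*t))

print(np.max(np.abs(x - x_rec)))               # essentially zero (numerical precision)
```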
The following theorem and the two subsequent corollaries are important in that
they establish a geometrical link between baseband and passband signals.
Figure 7.4. (a) From x(t) to the baseband-equivalent $x_E(t)$: filter with $\sqrt{2}\,h_>(t)$, then multiply by $e^{-j2\pi f_c t}$. (b) From $x_E(t)$ back to x(t): multiply by $e^{j2\pi f_c t}$, then take $\sqrt{2}\,\Re\{\cdot\}$.
theorem 7.5 (Inner product of passband signals) Let x(t) and y(t) be (real-valued) passband signals, let x̂(t) and ŷ(t) be the corresponding analytic signals, and let $x_E(t)$ and $y_E(t)$ be the baseband-equivalent signals (with respect to a common carrier frequency $f_c$). Then
$$\langle x, y\rangle = \Re\{\langle \hat x, \hat y\rangle\} = \Re\{\langle x_E, y_E\rangle\}.$$
Note 1: $\langle x, y\rangle$ is real-valued, whereas $\langle \hat x, \hat y\rangle$ and $\langle x_E, y_E\rangle$ are complex-valued in general. This helps us see/remember why the theorem cannot hold without taking the real part of the last two inner products. The reader might prefer to remember the more symmetric (and more redundant) form $\Re\{\langle x, y\rangle\} = \Re\{\langle \hat x, \hat y\rangle\} = \Re\{\langle x_E, y_E\rangle\}$.
Note 2: From the proof that follows, we see that the second equality holds also for the imaginary parts, i.e. $\langle \hat x, \hat y\rangle = \langle x_E, y_E\rangle$.
Proof Let $\hat x(t) = x_E(t)e^{j2\pi f_c t}$. Showing that $\langle \hat x, \hat y\rangle = \langle x_E, y_E\rangle$ is immediate:
$$\langle \hat x, \hat y\rangle = \langle x_E(t)e^{j2\pi f_c t}, y_E(t)e^{j2\pi f_c t}\rangle = e^{j2\pi f_c t}e^{-j2\pi f_c t}\langle x_E(t), y_E(t)\rangle = \langle x_E, y_E\rangle.$$
To prove $\langle x, y\rangle = \Re\{\langle \hat x, \hat y\rangle\}$, we use Parseval's relationship (first and last equality below), the fact that the Fourier transform of $x(t) = \frac{1}{\sqrt{2}}[\hat x(t) + \hat x^*(t)]$ is $x_F(f) = \frac{1}{\sqrt{2}}[\hat x_F(f) + \hat x_F^*(-f)]$ (second equality), the fact that $\hat x_F(f)\hat y_F(-f) = 0$ because the two functions have disjoint support and similarly $\hat x_F^*(-f)\hat y_F^*(f) = 0$ (third equality), and finally that the integral over a function is the same as the integral over the time-reversed function (fourth equality):
$$\langle x, y\rangle = \int x_F(f)\, y_F^*(f)\, df$$
$$= \frac{1}{2}\int \left[\hat x_F(f) + \hat x_F^*(-f)\right]\left[\hat y_F^*(f) + \hat y_F(-f)\right] df$$
$$= \frac{1}{2}\int \left[\hat x_F(f)\hat y_F^*(f) + \hat x_F^*(-f)\hat y_F(-f)\right] df$$
$$= \frac{1}{2}\int \left[\hat x_F(f)\hat y_F^*(f) + \hat x_F^*(f)\hat y_F(f)\right] df$$
$$= \Re\left\{\int \hat x_F(f)\,\hat y_F^*(f)\, df\right\}$$
$$= \Re\{\langle \hat x, \hat y\rangle\}.$$
3
It would be a misnomer to call xE (t) a baseband signal if x(t) is not passband.
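Theorem 7.5 is also easy to verify numerically; the sketch below (Python with numpy) approximates the inner products by Riemann sums, with arbitrary illustrative signals and parameter values.

```python
import numpy as np

fs, fc, N = 1000.0, 100.0, 1000                # arbitrary illustrative values
t = np.arange(N) / fs
f = np.fft.fftfreq(N, 1 / fs)
analytic = lambda s: np.fft.ifft(np.sqrt(2) * np.fft.fft(s) * (f > 0))
inner = lambda a, b: np.vdot(b, a) / fs        # <a, b> = integral of a(t) b*(t) dt

x = np.cos(2*np.pi*96*t) + 0.3*np.cos(2*np.pi*103*t + 0.7)
y = np.sin(2*np.pi*98*t) - 0.5*np.cos(2*np.pi*104*t)
x_hat, y_hat = analytic(x), analytic(y)
x_E, y_E = x_hat * np.exp(-2j*np.pi*fc*t), y_hat * np.exp(-2j*np.pi*fc*t)

print(inner(x, y).real)                        # <x, y>
print(inner(x_hat, y_hat).real)                # Re{<x_hat, y_hat>}: same value
print(inner(x_E, y_E).real)                    # Re{<x_E, y_E>}: same value
```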
Proof If g(t) satisfies the stated condition, then $g(t)e^{j2\pi f_c t}$ has no negative frequencies. Hence $g(t)e^{j2\pi f_c t}$ is the analytic signal x̂(t) of $x(t) = \sqrt{2}\,\Re\{g(t)e^{j2\pi f_c t}\}$, which implies that g(t) is the baseband-equivalent $x_E(t)$ of x(t).
Hereafter all passband signals are assumed to be real-valued as they represent
actual communication signals. Baseband signals can be signals that we use for
baseband communication on real-world channels or can be baseband-equivalents
of passband signals. In the latter case, they are complex-valued in general.
Figure 7.5. (a) Information signal $|b_F(f)|$, supported on [−B, B]. (b) DSB-SC modulated signal $|x_F(f)|$, consisting of $\frac{1}{\sqrt{2}}|b_F(f + f_c)|$ and $\frac{1}{\sqrt{2}}|b_F(f - f_c)|$.
baseband information signal b(t) and the modulated signal x(t). The dashed parts
of the plots are meant to remind us that they can be determined from the solid
parts.
This modulation scheme is called “double-sideband” because of the two bands on
the left and right of fc , only one is needed to recover b(t). Specifically, we could
eliminate the sideband below fc ; and to preserve the conjugacy symmetry required
by real-valued signals, we would eliminate also the sideband above −fc and still
be able to recover the information signal b(t) from the resulting passband signal.
Hence, we could eliminate one of the sidebands and thereby reduce the bandwidth
and the energy by a factor 2. (See Example 7.11.) The SC (suppressed carrier)
part of the name distinguishes this modulation technique from AM (amplitude
modulation, see next example), which is indeed a double-sideband modulation with
carrier (at ±fc ).
example 7.10 (AM modulation) AM modulation is by far the most popular
member of the family of amplitude modulations. Let b(t) be the source signal,
and assume that it is zero-mean and |b(t)| ≤ 1 for all t. AM modulation of
b(t) is DSB-SC modulation of 1 + mb(t) for some modulation index m such
that 0 < m ≤ 1. Notice that 1 + mb(t) is always non-negative. By using this
fact, the receiver can be significantly simplified (see Exercise 7.7). The possibility
of building inexpensive receivers is what made AM modulation the modulation of
choice in early radio broadcasting. AM is also a double-sideband modulation but,
unlike DSB-SC, it has a carrier at $\pm f_c$. We see the carrier by expanding $x(t) = (1 + m\,b(t))\sqrt{2}\cos(2\pi f_c t) = m\,b(t)\sqrt{2}\cos(2\pi f_c t) + \sqrt{2}\cos(2\pi f_c t)$. The carrier consumes
energy without carrying any information. It is the “price” that broadcasters are
willing to pay to reduce the cost of the receiver. The trade-off seems reasonable
given that there is one sender and many receivers.
The following two examples are bandwidth-efficient variants of double-sideband
modulation.
example 7.11 (Single-sideband modulation (SSB)) As in the previous example,
let b(t) be the real-valued baseband information signal. Let $\hat b(t) = (b * h_>)(t)$ be the analytic-equivalent of b(t). We define x(t) to be the passband signal that has $\hat b(t)$ as
its baseband-equivalent (with respect to the desired carrier frequency). Figure 7.6
shows the various frequency-domain signals. A comparison with Figure 7.5 should
suffice to understand why this process is called single-sideband modulation. Single-
sideband modulation is widely used in amateur radio communication. Instead of
removing the negative frequencies of the original baseband signal we could remove
the positive frequencies. The two alternatives are called SSB-USB (USB stands for
upper side-band) and SSB-LSB (lower side-band), respectively. A drawback of SSB
is that it requires a sharp filter to remove the negative frequencies. Amateur radio
people are willing to pay this price to make efficient use of the limited spectrum
allocated to them.
example 7.12 (Quadrature amplitude modulation (QAM)) The idea consists
of taking two real-valued baseband information signals, say bR (t) and bI (t), and
forming the signal b(t) = bR (t) + jbI (t). As b(t) is complex-valued, its Fourier
Figure 7.6. (a) Information signal $|b_F(f)|$. (b) Analytic-equivalent $|\hat b_F(f)|$ of b(t) (up to scaling), supported on [0, B]. (c) SSB modulated signal $|x_F(f)|$, consisting of $\sqrt{2}|\hat b_F(-f - f_c)|$ and $\sqrt{2}|\hat b_F(f - f_c)|$.
Figure 7.7. (a) Information signal $|b_F(f)|$. (b) Modulated signal $|x_F(f)|$, consisting of $\frac{1}{\sqrt{2}}|b_F(-f - f_c)|$ and $\frac{1}{\sqrt{2}}|b_F(f - f_c)|$.
The advantage of QAM over SSB is that it does not require a sharp filter to remove one of the two
sidebands. The drawback is that typically a sender has one, not two, analog signals
to send. QAM is not popular as an analog modulation technique. However, it is
a very popular technique for digital communication. The idea is to split the bits
into two streams, with each stream doing symbol-by-symbol on a pulse train to
obtain, say, bR (t) and bI (t) respectively, and then proceeding as described above.
(See Example 7.15.)
desired frequency band. This can be done quite effectively; we will see how. But if
we re-design the receiver starting with a new arbitrarily-selected orthonormal basis
for the new signal set, then we see that the n-tuple former as well as the decoder
could end up being totally different from the original ones. Using the results of
Section 7.2, we can find a flexible and elegant solution to this problem, so that we
can frequency-translate the signal’s band to any desired location without affecting
the n-tuple former and the decoder. (The encoder and the waveform former are
not affected either.)
Let wE,0 (t), . . . , wE,m−1 (t) be the baseband-equivalent signal constellation. We
assume that they belong to a complex inner product space and let ψ1 (t), . . . , ψn (t)
be an orthonormal basis for this space. Let ci = (ci,1 , . . . , ci,n )T ∈ Cn be the
codeword associated to $w_{E,i}(t)$, i.e.
$$w_{E,i}(t) = \sum_{l=1}^{n} c_{i,l}\, \psi_l(t),$$
and let
$$w_i(t) = \sqrt{2}\,\Re\{w_{E,i}(t)\, e^{j2\pi f_c t}\}$$
be the corresponding passband signal.
The orthonormal basis for the baseband-equivalent signal set can be lifted up to
an orthonormal basis for the passband signal set as follows.
$$w_i(t) = \sqrt{2}\,\Re\{w_{E,i}(t)\, e^{j2\pi f_c t}\}$$
$$= \sqrt{2}\,\Re\Big\{\sum_{l=1}^{n} c_{i,l}\, \psi_l(t)\, e^{j2\pi f_c t}\Big\}$$
$$= \sqrt{2}\sum_{l=1}^{n} \Re\{c_{i,l}\, \psi_l(t)\, e^{j2\pi f_c t}\}$$
$$= \sqrt{2}\sum_{l=1}^{n} \Re\{c_{i,l}\}\,\Re\{\psi_l(t)\, e^{j2\pi f_c t}\} - \sqrt{2}\sum_{l=1}^{n} \Im\{c_{i,l}\}\,\Im\{\psi_l(t)\, e^{j2\pi f_c t}\}$$
$$= \sum_{l=1}^{n} \Re\{c_{i,l}\}\,\psi_{1,l}(t) + \sum_{l=1}^{n} \Im\{c_{i,l}\}\,\psi_{2,l}(t), \qquad (7.7)$$
where $\psi_{1,l}(t) = \sqrt{2}\,\Re\{\psi_l(t)e^{j2\pi f_c t}\}$ and $\psi_{2,l}(t) = -\sqrt{2}\,\Im\{\psi_l(t)e^{j2\pi f_c t}\}$.
From (7.7), we see that the set {ψ1,1 (t), . . . , ψ1,n (t), ψ2,1 (t), . . . , ψ2,n (t)} spans
a vector space that contains the passband signals. As stated by the next
theorem, this set forms an orthonormal basis, provided that the carrier frequency
is sufficiently high.
$$w_i(t) = \sqrt{2}\,\Re\{w_{E,i}(t)e^{j2\pi f_c t}\} = \sum_{l=1}^{n} \Re\{c_{i,l}\}\,\psi_{1,l}(t) + \sum_{l=1}^{n} \Im\{c_{i,l}\}\,\psi_{2,l}(t).$$
Proof The last statement is (7.7). Hence (7.10) spans a vector space that contains the passband signals. It remains to be shown that this set is orthonormal. From Lemma 7.8, the baseband-equivalent signal of $\psi_{1,l}(t)$ is $\psi_l(t)$. Similarly, by writing $\psi_{2,l}(t) = \sqrt{2}\,\Re\{[j\psi_l(t)]e^{j2\pi f_c t}\}$, we see that the baseband-equivalent of $\psi_{2,l}(t)$ is $j\psi_l(t)$. From Corollary 7.7, $\langle\psi_{1,k}(t), \psi_{1,l}(t)\rangle = \Re\{\langle\psi_k(t), \psi_l(t)\rangle\} = \mathbb{1}\{k = l\}$, showing that the set $\{\psi_{1,l}(t) : l = 1, \ldots, n\}$ is made of orthonormal functions. Similarly, $\langle\psi_{2,k}(t), \psi_{2,l}(t)\rangle = \Re\{\langle j\psi_k(t), j\psi_l(t)\rangle\} = \Re\{\langle\psi_k(t), \psi_l(t)\rangle\} = \mathbb{1}\{k = l\}$, showing that also $\{\psi_{2,l}(t) : l = 1, \ldots, n\}$ is made of orthonormal functions.
To conclude the proof, it remains to be shown that functions from the first set are orthogonal to functions from the second set. Indeed $\langle\psi_{1,k}(t), \psi_{2,l}(t)\rangle = \Re\{\langle\psi_k(t), j\psi_l(t)\rangle\} = \Re\{\langle -j\psi_k(t), \psi_l(t)\rangle\} = 0$. The last equality holds for $k \ne l$ because $\psi_k$ and $\psi_l$ are orthogonal, and it holds for $k = l$ because $\langle\psi_k(t), \psi_k(t)\rangle = \|\psi_k(t)\|^2$ is real.
From the above theorem, we see that if the vector space spanned by the
baseband-equivalent signals has dimensionality n, the vector space spanned by the
corresponding passband signals has dimensionality 2n. However, the number of
real-valued “degrees of freedom” is the same in both spaces. In fact, the coefficients
used in the orthonormal expansion of the baseband signals are complex, hence
with two degrees of freedom per coefficient, whereas those used in the orthonormal
expansion of the passband signals are real.
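As a quick numerical illustration of the lifting ψ1,l (t) = √2 ℜ{ψl (t)e^{j2πfc t}} and ψ2,l (t) = √2 ℜ{jψl (t)e^{j2πfc t}}, the following sketch (not from the text; the rectangular pulse, carrier frequency, and sampling rate are arbitrary choices) checks by numerical integration that the two lifted functions are essentially orthonormal when fc is large compared to the pulse bandwidth.

```python
import numpy as np

# Time grid (dense enough to resolve the carrier). All choices are illustrative:
# unit-energy rectangular pulse on [0, T], carrier fc >> 1/T.
T, fc, fs = 1.0, 20.0, 4000.0
t = np.arange(0.0, T, 1.0 / fs)

psi = np.where((t >= 0) & (t < T), 1.0 / np.sqrt(T), 0.0)   # baseband pulse, unit norm

# Lifted passband functions: psi1 = sqrt(2) Re{psi e^{j2 pi fc t}},
#                            psi2 = sqrt(2) Re{j psi e^{j2 pi fc t}} = -sqrt(2) psi sin(2 pi fc t).
psi1 = np.sqrt(2) * psi * np.cos(2 * np.pi * fc * t)
psi2 = -np.sqrt(2) * psi * np.sin(2 * np.pi * fc * t)

def inner(a, b):
    """Riemann-sum approximation of the L2 inner product."""
    return np.sum(a * b) / fs

print(inner(psi1, psi1))   # ~1
print(inner(psi2, psi2))   # ~1
print(inner(psi1, psi2))   # ~0
```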
Next we re-design the receiver using Theorem 7.13 to construct an orthonormal basis for the passband signals. The 2n-tuple former now computes Y1 = (Y1,1 , . . . , Y1,n )T and Y2 = (Y2,1 , . . . , Y2,n )T , where for l = 1, . . . , n
Y1,l = ⟨R(t), ψ1,l (t)⟩   (7.11)
= ℜ{⟨√2 e^{−j2πfc t} R(t), ψl (t)⟩}   (7.12)
and similarly
Y2,l = ⟨R(t), ψ2,l (t)⟩   (7.13)
= ℑ{⟨√2 e^{−j2πfc t} R(t), ψl (t)⟩}.   (7.14)
Figure 7.8. (a) Transmitter back end: the waveform former maps ci ∈ Cn onto wE,i (t) using the basis ψl (t), l = 1, . . . , n; the up-converter multiplies by e^{j2πfc t} and takes √2 ℜ{·} to produce wi (t). (b) Receiver front end: the down-converter multiplies R(t) by √2 e^{−j2πfc t}; the n-tuple former projects onto ψl (t), l = 1, . . . , n, to produce Y ∈ Cn .
Figure 7.9. Real-valued implementation for a real-valued orthonormal basis. Transmitter: ℜ{ci } ∈ Rn and ℑ{ci } ∈ Rn each drive a waveform former with pulses ψl (t), l = 1, . . . , n; the two outputs are multiplied by √2 cos(2πfc t) and −√2 sin(2πfc t), respectively, and summed to form wi (t). Receiver: R(t) is multiplied by √2 cos(2πfc t) and by −√2 sin(2πfc t); each product is projected onto ψl (t), l = 1, . . . , n, to produce ℜ{Y } ∈ Rn and ℑ{Y } ∈ Rn .
If implemented in a DSP, the programmer might be able to rely on functions that can cope with
complex numbers. If done with analog electronics, the real and the imaginary parts
are kept separate. This is shown in Figure 7.9, for the common situation where
the orthonormal basis is real-valued. There is no loss in performance in choosing
a real-valued basis and, if we do so, the implementation complexity using analog
circuitry is essentially halved (see Exercise 7.9).
We have reached a conceptual milestone, namely the point where working with
complex-valued signals becomes natural. It is worth being explicit about how and
why we make this important transition. In principle, we are only combining two
real-valued vectors of equal length into a single complex-valued vector of the same
length (see (7.15)). Because it is a reversible operation, we can always pack a pair of real-valued n-tuples into a complex-valued n-tuple and unpack it when needed.
example 7.14 (PSK signaling via complex-valued symbols) Consider the signals
wE (t) = Σ_l sl ψ(t − lT ),
w(t) = √2 ℜ{wE (t) e^{j2πfc t}},
where the symbols are sl = √E e^{jϕl}. Then
w(t) = √2 ℜ{ Σ_l √E e^{j(2πfc t+ϕl )} ψ(t − lT ) }
= √2 √E Σ_l ℜ{ e^{j(2πfc t+ϕl )} } ψ(t − lT )
= √(2E) Σ_l cos(2πfc t + ϕl ) ψ(t − lT ).
Figure 7.10 shows a sample w(t) with ψ(t) = √(1/T) 1{0 ≤ t < T }, T = 1, fc T = 3 (there are three periods in a symbol interval T ), E = 1/2, ϕ0 = 0, ϕ1 = π, ϕ2 = π/2, and ϕ3 = 3π/2.
If we plug sl = ℜ{sl } + jℑ{sl } into wE (t) we obtain
w(t) = √2 ℜ{ [ Σ_l (ℜ{sl } + jℑ{sl }) ψ(t − lT ) ] e^{j2πfc t} }
= √2 Σ_l ℜ{ (ℜ{sl } + jℑ{sl }) e^{j2πfc t} } ψ(t − lT )
= √2 Σ_l ( ℜ{sl } ℜ{e^{j2πfc t}} − ℑ{sl } ℑ{e^{j2πfc t}} ) ψ(t − lT )
= √2 Σ_l ℜ{sl } ψ(t − lT ) cos(2πfc t)
− √2 Σ_l ℑ{sl } ψ(t − lT ) sin(2πfc t).   (7.18)
Figure 7.10. Sample PSK modulated signal.
For a rectangular pulse ψ(t), √2 ψ(t − lT ) cos(2πfc t) is orthogonal to √2 ψ(t − iT ) sin(2πfc t) for all integers l and i, provided that 2fc T is an integer.4 From (7.18), we see that the PSK signal is the superposition of two PAM signals. This view is not very useful for PSK, because ℜ{sl } and ℑ{sl } cannot be chosen independently of each other.5 Hence the two superposed signals cannot be decoded independently. It is more useful for QAM. (See next example.)
example 7.15 (QAM signaling via complex-valued symbols) Suppose that the
signaling method is as in Example 7.14 but that the symbols take value in a QAM
alphabet. As in Example 7.14, it is instructive to write the symbols in two ways.
If we write sl = al e^{jϕl }, then proceeding as in Example 7.14, we obtain
w(t) = √2 ℜ{ [ Σ_l al e^{jϕl } ψ(t − lT ) ] e^{j2πfc t} }
= √2 ℜ{ Σ_l al e^{j(2πfc t+ϕl )} ψ(t − lT ) }
= √2 Σ_l al ℜ{ e^{j(2πfc t+ϕl )} } ψ(t − lT )
= √2 Σ_l al cos(2πfc t + ϕl ) ψ(t − lT ).
4
See the argument in Example 3.10. In practice, the integer condition can be ignored because 2fc T is large, in which case the inner product between the two functions is negligible compared to 1 – the norm of both functions. For a general bandlimited ψ(t), the orthogonality between √2 ψ(t − lT ) cos(2πfc t) and √2 ψ(t − iT ) sin(2πfc t), for a sufficiently large fc , follows from Theorem 7.13.
5
Except for 2-PSK, for which ℑ{sl } is always 0.
Figure 7.11 shows a sample w(t) with ψ(t) and fc as in Example 7.14, with s0 = 1 + j = √2 e^{jπ/4}, s1 = 3 + j = √10 e^{j tan⁻¹(1/3)}, s2 = −3 + j = √10 e^{j(tan⁻¹(−1/3)+π)}, s3 = −1 + j = √2 e^{j3π/4}.
Figure 7.11. Sample QAM signal.
As for PSK, the QAM signal is the superposition of two PAM signals, but unlike for PSK, the ℜ{sl } and the ℑ{sl } of QAM can be selected independently. Hence, the two superposed PAM signals can be decoded independently, with no interference between the two because √2 ψ(t − lT ) cos(2πfc t) is orthogonal to √2 ψ(t − iT ) sin(2πfc t). Using (5.10), it is straightforward to verify that the bandwidth of the QAM signal is the same as that of the individual PAM signals. We conclude that the bandwidth efficiency (bits per Hz) of QAM is twice that of PAM.
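As a rough numerical check of this observation (a sketch with arbitrarily chosen pulse, carrier, and sampling rate, not code from the text; the symbol values are those of Figure 7.11), the following builds a QAM signal from complex symbols and recovers ℜ{sl } and ℑ{sl } independently by correlating against √2 ψ(t − lT ) cos(2πfc t) and −√2 ψ(t − lT ) sin(2πfc t).

```python
import numpy as np

# Illustrative parameters: rectangular unit-energy pulse, 2*fc*T an integer.
T, fc, fs = 1.0, 3.0, 1000.0
symbols = np.array([1 + 1j, 3 + 1j, -3 + 1j, -1 + 1j])   # symbol values as in Figure 7.11
t = np.arange(0.0, len(symbols) * T, 1.0 / fs)

def pulse(t0):
    """Unit-norm rectangular pulse psi(t - t0)."""
    return np.where((t >= t0) & (t < t0 + T), 1.0 / np.sqrt(T), 0.0)

# w(t) = sqrt(2) sum_l [Re{s_l} psi(t-lT) cos(2 pi fc t) - Im{s_l} psi(t-lT) sin(2 pi fc t)]
w = np.zeros_like(t)
for l, s in enumerate(symbols):
    w += np.sqrt(2) * (s.real * np.cos(2*np.pi*fc*t) - s.imag * np.sin(2*np.pi*fc*t)) * pulse(l * T)

# Recover the two PAM streams independently by correlation.
for l in range(len(symbols)):
    re = np.sum(w * np.sqrt(2) * pulse(l*T) * np.cos(2*np.pi*fc*t)) / fs
    im = np.sum(w * (-np.sqrt(2)) * pulse(l*T) * np.sin(2*np.pi*fc*t)) / fs
    print(l, round(re, 3), round(im, 3))   # ~ Re{s_l}, Im{s_l}
```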
Stepping back and looking at the big picture, we now view the physical layer of
the OSI model (Figure 1.1) for the AWGN channel as consisting of the three sub-
layers shown in Figure 7.12. We are already familiar with all the building blocks
of this architecture. The channel models “seen” by the first and second sub-layer,
respectively, still need to be discussed. New in these channel models is the fact
that the noise is complex-valued. (The signals are complex-valued as well, but we
are already familiar with complex-valued signals.)
Under hypothesis H = i, the discrete-time channel seen by the first (top) sub-
layer has input ci ∈ Cn and output
Y = ci + Z,
where, according to (7.16), (7.11) and (7.13), the lth component of Y is Y1,l +
jY2,l = ci,l + Zl and Zl = Z1,l + jZ2,l , where Z1,1 , . . . , Z1,n , Z2,1 , . . . , Z2,n is
a collection of iid zero-mean Gaussian random variables of variance N0 /2. We
have all the ingredients to describe the statistical behavior of Y via the pdf of
Z1,1 , . . . , Z1,n , Z2,1 , . . . , Z2,n , but it is more elegant to describe the pdf of the
complex-valued random vector Y . To find the pdf of Y , we introduce the random
Figure 7.12. The three sub-layers of the physical layer. Top: encoder (input i) and decoder (output ı̂), communicating over the discrete-time channel with input ci ∈ Cn , output Y , and noise Z ∼ NC (0, N0 In ). Middle: waveform former and n-tuple former, communicating over the baseband-equivalent channel with input wE,i (t), output RE (t), and noise NE (t). Bottom: up-converter and down-converter, communicating over the passband channel with input wi (t), output R(t), and noise N (t).
vector Ŷ that consists of the (column) n-tuple Y1 = ℜ{Y } on top of the (column) n-tuple Y2 = ℑ{Y }. This notation extends to any complex n-tuple: if a ∈ Cn (seen as a column n-tuple), then â is the element of R2n consisting of ℜ{a} on top of ℑ{a} (see Appendix 7.8 for an in-depth treatment of the hat operator). By definition, the pdf of a complex random vector Y evaluated at y is the pdf of Ŷ at ŷ (see Appendix 7.9 for a summary on complex-valued random vectors).
Hence,
fY |H (y|i) = fŶ |H (ŷ|i)
= fY1 ,Y2 |H (ℜ{y}, ℑ{y}|i)
= fY1 |H (ℜ{y}|i) fY2 |H (ℑ{y}|i)
= (1/(√(πN0 ))ⁿ) exp( − Σ_{l=1}^{n} (ℜ{yl } − ℜ{ci,l })² / N0 )
× (1/(√(πN0 ))ⁿ) exp( − Σ_{l=1}^{n} (ℑ{yl } − ℑ{ci,l })² / N0 )
= (1/(πN0 )ⁿ) exp( − ‖y − ci ‖² / N0 ).   (7.19)
(In the last step of (7.19) we use the fact that the squared norm of a complex n-tuple can be obtained by adding the squares of the real components and the squares of the imaginary components.) Similarly, ⟨ŷ, ĉi ⟩ = ℜ{⟨y, ci ⟩}. In fact, if y = yR + jyI and c = cR + jcI are (column vectors) in Cn , then ℜ{⟨y, c⟩} = yR^T cR + yI^T cI , but this is exactly the same as ⟨ŷ, ĉ⟩.6
We conclude that an ML decision rule for the complex-valued decoder-input
y ∈ Cn of Figure 7.12 is
ĤML (y) = arg min_i ‖y − ci ‖
= arg max_i ℜ{⟨y, ci ⟩} − ‖ci ‖²/2.
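A minimal simulation sketch of the discrete-time channel Y = ci + Z and of the ML rule above (the QPSK codebook, noise level, and number of trials are illustrative assumptions, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative codebook: the four QPSK points in C^1 (n = 1).
codebook = np.array([[1 + 1j], [1 - 1j], [-1 + 1j], [-1 - 1j]]) / np.sqrt(2)
N0 = 0.2

def transmit(i):
    """Y = c_i + Z with Z ~ N_C(0, N0 I_n): real and imaginary parts iid N(0, N0/2)."""
    c = codebook[i]
    z = rng.normal(0, np.sqrt(N0/2), c.shape) + 1j * rng.normal(0, np.sqrt(N0/2), c.shape)
    return c + z

def ml_decode(y):
    """ML rule: minimize ||y - c_i||, i.e. maximize Re<y, c_i> - ||c_i||^2 / 2."""
    metrics = [np.real(np.vdot(c, y)) - 0.5 * np.linalg.norm(c)**2 for c in codebook]
    return int(np.argmax(metrics))

errors, trials = 0, 10000
for _ in range(trials):
    i = rng.integers(len(codebook))
    if ml_decode(transmit(i)) != i:
        errors += 1
print("error rate ~", errors / trials)
```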
Describing the baseband-equivalent channel model, as seen by the second sub-layer
of Figure 7.12, requires slightly more work. We do this in the next section for
completeness, but it is not needed in order to prove that the receiver structure of
Figure 7.12 is completely general (for the AWGN channel) and that it minimizes
the error probability. That part is done.
6
For an alternative proof that ⟨ŷ, ĉi ⟩ = ℜ{⟨y, ci ⟩}, subtract the two equations ‖y − ci ‖² = ‖y‖² + ‖ci ‖² − 2ℜ{⟨y, ci ⟩} and ‖ŷ − ĉi ‖² = ‖ŷ‖² + ‖ĉi ‖² − 2⟨ŷ, ĉi ⟩ and use the fact that the hat on a vector has no effect on the vector's norm.
7.4 Baseband-equivalent channel model
Figure 7.13. (a) Passband channel; (b) baseband-equivalent channel.
[w(t) ⋆ h(t)] e^{−j2πfc t} = [w(t) e^{−j2πfc t}] ⋆ [h(t) e^{−j2πfc t}],   (7.20)
which says that if a signal w(t) is passed through a filter of impulse response h(t)
and the filter output is multiplied by e−j2πfc t , we obtain the same as passing the
signal w(t)e−j2πfc t through the filter with impulse response h(t)e−j2πfc t . A direct
(time-domain) proof of this result is a simple exercise,7 but it is more insightful if
we take a look at what it means in the frequency domain. In fact, in the frequency
domain, the convolution on the left becomes wF (f )hF (f ), and the subsequent
multiplication by e−j2πfc t leads to wF (f + fc )hF (f + fc ). On the right side we
multiply wF (f + fc ) with hF (f + fc ).
The above relationship should not be confused with the following equalities that
hold for any constant c ∈ C
[w(t) ⋆ h(t)]c = [w(t)c] ⋆ h(t) = w(t) ⋆ [h(t)c].   (7.21)
This holds because the left-hand side at an arbitrary time t is c times the integral
of the product of two functions. If we bring the constant inside the integral and
use it to scale the first function, we obtain the expression in the middle; whereas
we obtain the expression on the right if we use c to scale the second function. In
the derivation that follows, we use both relationships.
The up-converter, the actual channel, and the down-converter perform linear
operations, in the sense that their action on the sum of two signals is the sum of
the individual actions. Linearity implies that we can consider the signal and the
noise separately. We start with the signal part (assuming that there is no noise).
7
Relationship (7.20) is a form of distributivity law, like [a + b]c = [ac] + [bc].
Ul = ⟨[wi (t) ⋆ h(t)] √2 e^{−j2πfc t}, gl (t)⟩
= ⟨[wi (t) ⋆ √2 h(t)] e^{−j2πfc t}, gl (t)⟩
= ⟨[wi (t) √2 e^{−j2πfc t}] ⋆ [h(t) e^{−j2πfc t}], gl (t)⟩
= ⟨[wi (t) √2 e^{−j2πfc t}] ⋆ h0 (t), gl (t)⟩
= ⟨[wE,i (t) + w*E,i (t) e^{−j4πfc t}] ⋆ h0 (t), gl (t)⟩
= ⟨wE,i (t) ⋆ h0 (t), gl (t)⟩,   (7.22)
where in the second line we use (7.21), in the third we use (7.20), in the fourth we introduce the notation h0 (t) = h(t) e^{−j2πfc t}, in the fifth we use
wi (t) = (1/√2) [ wE,i (t) e^{j2πfc t} + w*E,i (t) e^{−j2πfc t} ],
and in the sixth we remove the term [w*E,i (t) e^{−j4πfc t}] ⋆ h0 (t), which is bandlimited to [−2fc − B, −2fc + B] and therefore has no frequencies in
common with gl (t). By Parseval’s relationship, the inner product of functions that
have disjoint frequency support is zero.
From (7.22), for all wE,i (t) and all gl (t) that are bandlimited to [−B, B], the
noiseless output of Figure 7.13a is identical to that of Figure 7.13b.
Notice that, not surprisingly, the Fourier transform of h0 (t) is hF (f +fc ), namely
hF (f ) frequency-shifted to the left by fc .
The reader might wonder if h0 (t) is the same as the baseband-equivalent hE (t) of h(t) (with respect to fc ). In fact it is not, but we can use hE (t)/√2 instead of h0 (t). The two functions are not the same, but it is straightforward to verify that their Fourier transforms agree for f ∈ [−B, B].
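The identity behind (7.20) also holds sample-by-sample for discrete convolutions, which makes it easy to check numerically. The sketch below (an illustration with arbitrary input, filter, and carrier; not code from the text) verifies that down-converting after the filter h gives the same result as filtering the down-converted signal with h0 (t) = h(t)e^{−j2πfc t}.

```python
import numpy as np

# Discrete-time analogue of (7.20): [w * h](t) e^{-j2 pi fc t} equals
# [w(t) e^{-j2 pi fc t}] * [h(t) e^{-j2 pi fc t}].  Parameters are arbitrary.
fs, fc = 2000.0, 200.0
t = np.arange(0, 1.0, 1/fs)

rng = np.random.default_rng(1)
w = rng.standard_normal(t.size)            # any input signal
h = np.sinc(20 * (t - 0.5)) / fs           # some real-valued impulse response

carrier = np.exp(-2j * np.pi * fc * t)
h0 = h * carrier

lhs = np.convolve(w, h)[: t.size] * carrier       # down-convert after the filter
rhs = np.convolve(w * carrier, h0)[: t.size]      # filter the down-converted signal with h0

print(np.max(np.abs(lhs - rhs)))   # ~0 up to floating-point error
```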
Next we consider the noise alone. To specify NE (t), we need the following notion
of independent noises.8
definition 7.16 (Independent white Gaussian noises) NR (t) and NI (t) are
independent white Gaussian noises if the following two conditions are satisfied.
(i) NR (t) and NI (t) are white Gaussian noises in the sense of Definition 3.4.
(ii) For any two real-valued functions h1 (t) and h2 (t) (possibly the same), the Gaussian random variables ∫ NR (t)h1 (t)dt and ∫ NI (t)h2 (t)dt are independent.
The noise at the output of the down-converter has the form
ÑE (t) = ÑR (t) + jÑI (t) (7.23)
8
The notion of independence is well-defined for stochastic processes, but we do not model
the noise as a stochastic process (see Definition 3.4).
with
ÑR (t) = N (t) √2 cos(2πfc t)   (7.24)
ÑI (t) = −N (t) √2 sin(2πfc t).   (7.25)
ÑR (t) and ÑI (t) are not independent white Gaussian noises in the sense of
Definition 7.16 (as can be verified by setting fc = 0), but we now show that they
do fulfill the conditions of Definition 7.16 when the functions used in the definition
are bandlimited to [−B, B] and B < fc .
Let hi (t), i = 1, 2, be real-valued L2 functions that are bandlimited to [−B, B] and define
Zi = ∫ ÑR (t)hi (t)dt.
Zi , i = 1, 2, is Gaussian, zero-mean, and of variance (N0 /2) ‖√2 cos(2πfc t)hi (t)‖². The function √2 cos(2πfc t)hi (t) is passband with baseband-equivalent hi (t). By Definition 3.4 and Theorem 7.5,
cov(Z1 , Z2 ) = (N0 /2) ⟨√2 cos(2πfc t)h1 (t), √2 cos(2πfc t)h2 (t)⟩
= (N0 /2) ℜ{⟨h1 (t), h2 (t)⟩}
= (N0 /2) ⟨h1 (t), h2 (t)⟩.
This proves that under the stated bandwidth limitation, ÑR (t) behaves as white Gaussian noise of power spectral density N0 /2. The proof that the same is true for ÑI (t) follows similar patterns, using the fact that −√2 sin(2πfc t)hi (t) is passband with baseband-equivalent jhi (t). It remains to be shown that ÑR (t) and ÑI (t) are independent noises in the sense of Definition 7.16. Let
Z3 = ∫ ÑI (t)h3 (t)dt.
For the purpose of computing the n-tuple former output, we can thus model the noise at the down-converter output as NE (t) = NR (t) + jNI (t), where NR (t) and NI (t) are independent white Gaussian noises of spectral density N0 /2.
This last characterization of NE (t) suffices to describe the statistic of U ∈ Ck ,
even when the gl (t) are complex-valued, provided they are bandlimited as specified.
For the statistical description of a complex random vector, the reader is referred
to Appendix 7.9 where, among other things, we introduce and discuss circularly
symmetric Gaussian random vectors (which are complex-valued) and prove that
the U at the output of Figure 7.13b is always circularly symmetric (even when the
gl (t) are not bandlimited to [−B, B]).
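A Monte Carlo sketch of the preceding claims (illustrative parameters only, not from the text): projecting ÑR (t) and ÑI (t) onto unit-norm functions bandlimited to B ≪ fc yields, empirically, uncorrelated Gaussians of variance close to N0 /2.

```python
import numpy as np

rng = np.random.default_rng(2)

# Discrete-time Monte Carlo check: projections of N_R~(t) = sqrt(2) N(t) cos(2 pi fc t)
# and N_I~(t) = -sqrt(2) N(t) sin(2 pi fc t) onto unit-norm functions bandlimited to
# B << fc behave like uncorrelated Gaussians of variance N0/2.  Parameters are arbitrary.
fs, fc, B, N0, Tobs = 8000.0, 1000.0, 50.0, 2.0, 1.0
t = np.arange(0, Tobs, 1/fs)

def unit_norm(g):
    return g / np.sqrt(np.sum(g**2) / fs)

h1 = unit_norm(np.sinc(2*B*(t - 0.4)))     # approximately bandlimited to [-B, B]
h2 = unit_norm(np.sinc(2*B*(t - 0.6)))

trials, Z_R, Z_I = 2000, [], []
for _ in range(trials):
    N = rng.normal(0, np.sqrt(N0/2 * fs), t.size)     # white noise of PSD N0/2
    NR = np.sqrt(2) * N * np.cos(2*np.pi*fc*t)
    NI = -np.sqrt(2) * N * np.sin(2*np.pi*fc*t)
    Z_R.append(np.sum(NR * h1) / fs)
    Z_I.append(np.sum(NI * h2) / fs)

Z_R, Z_I = np.array(Z_R), np.array(Z_I)
print("var Z_R:", Z_R.var(), " (N0/2 =", N0/2, ")")
print("var Z_I:", Z_I.var(), " (N0/2 =", N0/2, ")")
print("cov(Z_R, Z_I):", np.mean(Z_R * Z_I), " (~0)")
```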
7.5 Parameter estimation
The baseband-equivalent signal at the receiver is modeled as
RE (t) = a e^{jϕ} sE (t − θ) + NE (t),
where NE (t) is complex white Gaussian noise of power spectral density N0 (N0 /2 in both real and imaginary parts), a > 0 and ϕ account for an unknown attenuation and phase rotation, and θ ∈ [0, θmax ] accounts for the channel delay and the time offset. For this section, the function sE (t) represents a training signal known to the receiver, used to estimate θ, a, ϕ. Once estimated, these channel parameters are used as the true values for the communication that follows. Next we derive the joint ML estimates of θ, a, ϕ. The good news is that the solution to this joint estimation problem essentially decomposes into three separate ML estimation problems.
The derivation that follows is a straightforward generalization of what we have
done in Section 5.7, with the main difference being that signals are now complex-
valued. Accordingly, let Y = (Y1 , . . . , Yn )T be the random vector obtained by
projecting RE (t) onto the elements of an orthonormal basis9 for an inner product
space that contains sE (t − θ̂) for all possible values of θ̂ ∈ [0, θmax ]. The likelihood
function with parameters θ̂, â, ϕ̂ is
f (y; θ̂, â, ϕ̂) = (1/(πN0 )ⁿ) exp( − ‖y − â e^{jϕ̂} m(θ̂)‖² / N0 ),
where m(θ̂) is the n-tuple of coefficients of sE (t − θ̂) with respect to the chosen
orthonormal basis.
A joint maximum likelihood estimation of θ, a, ϕ is a choice of θ̂, â, ϕ̂ that
maximizes the likelihood function or, equivalently, that maximizes any of the
following three expressions
−‖y − â e^{jϕ̂} m(θ̂)‖²,
−( ‖y‖² + ‖â e^{jϕ̂} m(θ̂)‖² − 2ℜ{⟨y, â e^{jϕ̂} m(θ̂)⟩} ),
ℜ{⟨y, â e^{jϕ̂} m(θ̂)⟩} − (|â e^{jϕ̂}|²/2) ‖m(θ̂)‖².   (7.27)
Notice that ‖m(θ̂)‖² = ‖sE (t − θ̂)‖² = ‖sE (t)‖². Hence, for a fixed â, the second term in (7.27) is independent of θ̂, ϕ̂. Thus, for any fixed â, we can maximize over θ̂, ϕ̂ by maximizing any of the following three expressions
ℜ{⟨y, â e^{jϕ̂} m(θ̂)⟩},
ℜ{e^{−jϕ̂} ⟨y, m(θ̂)⟩},
ℜ{e^{−jϕ̂} ⟨rE (t), sE (t − θ̂)⟩},   (7.28)
where the last line is justified by the argument preceding (5.18). The maximum of ℜ{e^{−jϕ̂} ⟨rE (t), sE (t − θ̂)⟩} is achieved when θ̂ is such that the absolute value of ⟨rE (t), sE (t − θ̂)⟩ is maximized and ϕ̂ is such that e^{−jϕ̂} ⟨rE (t), sE (t − θ̂)⟩ is real-valued and positive. The latter happens when ϕ̂ equals the phase of ⟨rE (t), sE (t − θ̂)⟩. Thus
θ̂ML = arg max_θ̂ |⟨rE (t), sE (t − θ̂)⟩|,   (7.29)
ϕ̂ML = ∠⟨rE (t), sE (t − θ̂ML )⟩.   (7.30)
Finally, for θ̂ = θ̂ML and ϕ̂ = ϕ̂ML , the maximum of (7.27) with respect to â is achieved by
âML = arg max_â ℜ{⟨y, â e^{jϕ̂ML} m(θ̂ML )⟩} − (|â e^{jϕ̂ML}|²/2) ‖m(θ̂ML )‖²
9
As in Section 5.7.1, for notational simplicity we assume that the orthonormal basis has
finite dimension n. The final result does not depend on the choice of the orthonormal
basis.
= arg max_â â ℜ{e^{−jϕ̂ML} ⟨rE (t), sE (t − θ̂ML )⟩} − â² E/2
= arg max_â â |⟨rE (t), sE (t − θ̂ML )⟩| − â² E/2,
where E is the energy of sE (t), and in the last line we use the fact that e^{−jϕ̂ML} ⟨rE (t), sE (t − θ̂ML )⟩ is real-valued and positive (by the choice of ϕ̂ML ). Taking the derivative of â |⟨rE (t), sE (t − θ̂ML )⟩| − â² E/2 with respect to â and equating to zero yields
âML = |⟨rE (t), sE (t − θ̂ML )⟩| / E.
In the typical case that the training signal is constructed via symbol-by-symbol on a pulse train, sE (t) has the form
sE (t) = Σ_{l=0}^{K−1} cl ψ(t − lT ),   (7.31)
where the symbols c0 , . . . , cK−1 and the pulse ψ(t) can be real- or complex-valued
and ψ(t) has unit norm and is orthogonal to its T -spaced translates.
The essence of what follows applies whether the n-tuple former incorporates
a correlator or a matched filter. For the sake of exposition, we assume that it
incorporates the matched filter of impulse response ψ ∗ (−t).
Once we have determined θ̂ML according to (7.29), we sample the matched filter output at times t = θ̂ML + kT , k integer. The kth sample is
yk = ⟨rE (t), ψ(t − kT − θ̂ML )⟩.   (7.32)
With α = a e^{jϕ}, the sample can be modeled as
yk = αck + Zk .   (7.33)
If N0 is not too large compared to the signal’s power, θ̂M L should be sufficiently
close to θ for (7.33) to be a valid model.
Next we re-derive the ML estimates of ϕ and a in terms of the matched filter
output samples. We do so because it is easier to implement the estimator in a
DSP that operates on the matched filter output samples rather than by analog
technology operating on the continuous-time signals. Using (7.31) and the linearity
of the inner product, we obtain ⟨rE (t), sE (t − θ̂ML )⟩ = Σ_{l=0}^{K−1} c*l ⟨rE (t), ψ(t − lT − θ̂ML )⟩, and using (7.32) we obtain
⟨rE (t), sE (t − θ̂ML )⟩ = Σ_{l=0}^{K−1} yl c*l .
Hence
ϕ̂ML = ∠ Σ_{l=0}^{K−1} yl c*l .
It is instructive to interpret ϕ̂ML without noise. In the absence of noise, θ̂ML = θ and yk = αck . Hence Σ_{l=0}^{K−1} yl c*l = a e^{jϕ} Σ_{l=0}^{K−1} cl c*l = a e^{jϕ} E, where E is the energy of the training sequence. From (7.30), we see that ϕ̂ML is the angle of a e^{jϕ} E, i.e. ϕ̂ML = ϕ.
Proceeding similarly, we obtain
âML = |Σ_{l=0}^{K−1} yl c*l | / E.
It is immediate to check that if there is no noise and ϕ̂M L = ϕ, then âM L = a.
Notice that both ϕ̂ML and âML depend on the observation y0 , . . . , yK−1 only through the inner product Σ_{l=0}^{K−1} yl c*l .
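A small sketch of the sample-level estimators just derived, assuming the timing estimate is already correct so that the matched filter output obeys (7.33); the training sequence, channel parameters, and noise level below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Check of phi_ML = angle(sum y_l c_l*) and a_ML = |sum y_l c_l*| / E, assuming
# the timing is already correct so that y_l = a e^{j phi} c_l + Z_l as in (7.33).
K, N0 = 64, 0.1
c = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], size=K) / np.sqrt(2)   # training symbols
E = np.sum(np.abs(c)**2)                                          # training energy

a_true, phi_true = 0.7, 1.1
Z = rng.normal(0, np.sqrt(N0/2), K) + 1j * rng.normal(0, np.sqrt(N0/2), K)
y = a_true * np.exp(1j * phi_true) * c + Z

s = np.sum(y * np.conj(c))       # inner product sum_l y_l c_l*
phi_hat = np.angle(s)
a_hat = np.abs(s) / E

print(phi_hat, "vs", phi_true)
print(a_hat, "vs", a_true)
```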
Depending on various factors and, in particular, on the duration of the trans-
mission, the stability of the oscillators and the possibility that the delay and/or
the attenuation vary over time, a one-time estimate of θ, a, and ϕ might not be
sufficient.
In Section 5.7.2, we have presented the delay locked loop to track θ, assuming
real-valued signals. The technique can be adapted to the situation of this section.
In particular, if the symbol sequence c0 , . . . , cK−1 that forms the training signal is
as in Section 5.7.2, once ϕ has been estimated and accounted for, the imaginary
part of the matched filter output contains only noise and the real part is as in
Section 5.7.2. Thus, once again, θ can be tracked with a delay locked loop.
The most critical parameter is ϕ because it is very sensitive to channel delay
variations and to instabilities of the up/down-converter oscillators.
example 7.17 A communication system operates at a symbol rate of 10 Msps
(mega symbols per second) with a carrier frequency fc = 1 GHz. The local oscillator
that produces ej2πfc t is based on a crystal oscillator and a phase locked loop (PLL).
The frequency of the crystal oscillator can only be guaranteed up to a certain
precision and it is affected by the temperature. Typical precisions are in the range
where the approximation holds for small values of |Δϕ|. Assuming that |Δϕ| is indeed small, the idea is to decode yk ignoring the rotation by Δϕ. With high probability the decoded symbol ĉk equals ck and ℑ{yk ĉ*k } ≈ Δϕ|ck |². The feedback signal ℑ{yk ĉ*k } can be used by the local oscillator to correct the phase error. Alternatively, the decoder can use the feedback signal to find an estimate Δϕ̂ of Δϕ and to rotate yk by −Δϕ̂. This method works well also in the presence of noise, assuming that the noise is zero-mean and independent from sample to sample. Averaging over subsequent samples helps to mitigate the effect of the noise.
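The decision-directed idea can be sketched in a few lines (illustrative constellation, drift rate, noise level, and loop gain; this is not a prescribed design): decode while ignoring the residual rotation, use ℑ{yk ĉ*k } as the error signal, and correct the phase with a small first-order gain.

```python
import numpy as np

rng = np.random.default_rng(4)

# Decision-directed tracking of a slowly drifting phase (illustrative sketch).
alphabet = np.array([1+1j, 1-1j, -1+1j, -1-1j]) / np.sqrt(2)
n, N0, gain = 400, 0.02, 0.1

phase = 0.0                 # receiver's running phase estimate
true_phase = 0.0
errors = 0
for k in range(n):
    true_phase += 0.01      # slow phase drift per symbol (illustrative)
    c = alphabet[rng.integers(4)]
    z = rng.normal(0, np.sqrt(N0/2)) + 1j * rng.normal(0, np.sqrt(N0/2))
    y = c * np.exp(1j * true_phase) + z

    y_corr = y * np.exp(-1j * phase)                        # de-rotate with current estimate
    c_hat = alphabet[np.argmin(np.abs(alphabet - y_corr))]  # decode ignoring residual rotation
    errors += c_hat != c
    phase += gain * np.imag(y_corr * np.conj(c_hat))        # Im{y chat*} ~ residual phase error

print("symbol errors:", int(errors), " final phase error:", true_phase - phase)
```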
Another possibility of tracking ϕ is to use a phase locked loop – a technique
similar to the delay locked loop discussed in Section 5.7.2 to track θ.
Differential encoding is a different technique to deal with a constant or slowly
changing phase. It consists in encoding the information in the phase difference
between consecutive symbols.
When the phase ϕk is either constant or varies slowly, as assumed in this section,
we say that the phase comes through coherently. In the next section, we will see
what we can do when this is not the case.
10
There is much literature on spread spectrum. The interested reader can find introduc-
tory articles on the Web.
The steps to maximize the likelihood function mimic what we have done in the
previous section. Let ci be the codeword associated with wE,i (t) (with respect
to some orthonormal basis). Let y be the n-tuple former output. The likelihood
function is
f (y; ı̂, â, ϕ̂) = (1/(πN0 )ⁿ) exp( − ‖y − â e^{jϕ̂} cı̂ ‖² / N0 ).
We seek the ı̂ that maximizes
g(cı̂ ) = max_{â,ϕ̂} ℜ{⟨y, â e^{jϕ̂} cı̂ ⟩} − (1/2) ‖â e^{jϕ̂} cı̂ ‖²
= max_{â,ϕ̂} â ℜ{e^{−jϕ̂} ⟨y, cı̂ ⟩} − (â²/2) ‖cı̂ ‖².
The ϕ̂ that achieves the maximum is the one that makes e^{−jϕ̂} ⟨y, cı̂ ⟩ real-valued and positive. Let ϕ̂ML be the maximizing ϕ̂ and observe that ℜ{e^{−jϕ̂ML} ⟨y, cı̂ ⟩} = |⟨y, cı̂ ⟩|. Hence,
g(cı̂ ) = max_â â |⟨y, cı̂ ⟩| − (â²/2) ‖cı̂ ‖².   (7.34)
By taking the derivative of â |⟨y, cı̂ ⟩| − (â²/2) ‖cı̂ ‖² with respect to â and setting to zero, we obtain the maximizing â
âML = |⟨y, cı̂ ⟩| / ‖cı̂ ‖².
Inserting into (7.34) yields
g(cı̂ ) = (1/2) |⟨y, cı̂ ⟩|² / ‖cı̂ ‖².
Hence
ı̂ML = arg max_ı̂ |⟨y, cı̂ ⟩| / ‖cı̂ ‖.   (7.35)
If the channel only scales the signal by an unknown positive factor (no phase rotation), a similar derivation leads to the decoder that chooses the ı̂ that maximizes ℜ{⟨y, cı̂ ⟩}/‖cı̂ ‖. Next, assume that the channel can also rotate the signal by an arbitrary phase ϕ (i.e. the channel multiplies the signal by e^{jϕ}). As we increase the phase by π/2, the imaginary part of the new inner product becomes the real part of the old (with a possible sign change). One way to make the decoder insensitive to the phase is to substitute ℜ{⟨y, cı̂ ⟩} with |⟨y, cı̂ ⟩|. The result is the decoding rule (7.35).
example 7.19 (A bad choice of signals) Consider m-ary phase-shift keying, i.e. wE,i (t) = ci ψ(t), where ci = √Es e^{j2πi/m}, i = 0, . . . , m − 1, and ψ(t) is a unit-norm pulse. If we plug into (7.35), we obtain
ı̂ML = arg max_ı̂ |⟨y, √Es e^{j2πı̂/m}⟩| / √Es
= arg max_ı̂ |e^{−j2πı̂/m} ⟨y, 1⟩|
= arg max_ı̂ |⟨y, 1⟩|
= arg max_ı̂ |y| ,
which means that the decoder has no preference among the ı̂ ∈ H, i.e. the error
probability is the same independently of the decoder’s choice. In fact, a PSK
constellation is a bad choice for a codebook, because it conveys information in
the phase and the phase information is destroyed by the channel.
example 7.20 (A good choice) Two vectors in Cn that are orthogonal to each other cannot be made equal by multiplying one of the two by a scalar a e^{jϕ}, which was the underlying issue in Example 7.19. Complex-valued orthogonal signals remain orthogonal after we multiply them by a e^{jϕ}. This suggests that they are a good choice for the channel model assumed in this section. Specifically, suppose that the ith codeword ci ∈ Cm , i = 1, . . . , m, has √Es at position i and is zero elsewhere. In this case, |⟨y, ci ⟩| = √Es |yi | and
ı̂ML = arg max_i |⟨y, ci ⟩| / √Es
= arg max_i |yi |.
(Compare this rule to the decision rule of Example 4.6, where the signaling scheme
is the same but there is neither amplitude nor phase uncertainty.)
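The rule (7.35) with the orthogonal codebook of Example 7.20 is easy to simulate. In the sketch below (illustrative values of m, Es , N0 , and of the gain/phase ranges, not taken from the text), the channel applies an unknown a e^{jϕ} and the decoder simply picks arg max_i |yi |.

```python
import numpy as np

rng = np.random.default_rng(5)

# Decision rule (7.35) with the orthogonal codebook of Example 7.20:
# c_i has sqrt(Es) in position i, so |<y, c_i>|/||c_i|| reduces to |y_i|.
m, Es, N0, trials = 8, 4.0, 0.5, 5000
codebook = np.sqrt(Es) * np.eye(m, dtype=complex)

errors = 0
for _ in range(trials):
    i = rng.integers(m)
    a, phi = rng.uniform(0.5, 1.5), rng.uniform(0, 2*np.pi)   # unknown gain and phase
    z = rng.normal(0, np.sqrt(N0/2), m) + 1j * rng.normal(0, np.sqrt(N0/2), m)
    y = a * np.exp(1j*phi) * codebook[i] + z
    i_hat = int(np.argmax(np.abs(y)))
    errors += i_hat != i

print("error rate with orthogonal codewords:", errors / trials)
```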
7.7 Summary
The fact that each passband signal (real-valued by definition) has an equivalent
baseband signal (complex-valued in general) makes it possible to separate the
communication system into two parts: a part (top two layers) that processes base-
band signals and a part (bottom layer) that implements the conversion to/from
passband. With the bottom layer in place, the top two layers are designed to
communicate over a complex-valued baseband AWGN channel. This separation
has several advantages: (i) it simplifies the design and the analysis of the top two
layers, where most of the system complexity lies; (ii) it reduces the implementation
costs; and (iii) it provides greater flexibility by making it possible to choose
the carrier frequency, simply by changing the frequency of the oscillator in the
up/down-converter. For instance, for frequency hopping (Example 7.18), as long
as the down-converter is synchronized with the up-converter, the top two layers
are unaware that the carrier frequency is hopping. Without the third layer in
place, we change the carrier frequency by changing the pulse, and the options that
we have in choosing the carrier frequency are limited. (If the Nyquist criterion is fulfilled for |ψF (f + f1 )|² + |ψF (f − f1 )|², it is not necessarily fulfilled for |ψF (f + f2 )|² + |ψF (f − f2 )|², f1 ≠ f2 .)
Theorem 7.13 tells us how to transform an orthonormal basis of size n for the
baseband-equivalent signals into an orthonormal basis of size 2n for the corres-
ponding passband signals. The factor 2 is due to the fact that the former is
used with complex-valued coefficients, whereas the latter is used with real-valued
coefficients.
For mathematical convenience, we assume that neither the up-converter nor the
down-converter modifies the signal’s norm. This is not what happens in reality,
but the system-level designer (as opposed to the hardware designer) can make this
assumption because all the scaling factors can be accounted for by a single factor
in the channel model. Even this factor can be removed (i.e. it can be made to
be 1) without affecting the system-level design, provided that the power spectral
density of the noise is adjusted accordingly so as to keep the signal-energy to
noise-power-density ratio unchanged.
In practice, the up-converter, as well as the down-converter, amplifies the signals,
and the down-converter contains a noise-reduction filter that removes the out-of-
band noise (see Section 3.6). The transmitter back end (the physical embodi-
ment of the up-converter) deals with high power, high frequencies, and a variable
carrier frequency fc . The skills needed to design it are quite specific. It is very
convenient that the transmitter back end can essentially be designed and built
separately from the rest of the system and can be purchased as an off-the-shelf
device.
With the back end in place, the earlier stages of the transmitter, which perform
the more sophisticated signal processing, can be implemented under the most
favorable conditions, namely in baseband and using voltages and currents that are
in the range of standard electronics, rather than being tied to the power of the
transmitted signal. The advantage of working in baseband is two-fold: the carrier
frequency is fixed and working with low frequencies is less tricky.
11
Realistically, a specific back/front end implementation has certain characteristics that
limit its usage to certain applications. In particular, for the back end we consider its
gain, output power, and bandwidth. For the front end, we consider its bandwidth,
sensitivity, gain, and noise temperature.
To remember the form of Â, observe that the top half of û is the real part of
Av, i.e. AR vR − AI vI . This explains the top half of Â. Similarly, the bottom half
of û is the imaginary part of Av, i.e. AR vI + AI vR , which explains the bottom half
of Â. The following lemma summarizes a number of useful properties.
lemma 7.21 The following properties hold (we write (·)^ for the hat operator applied to the expression within the parentheses)
(Au)^ = Â û   (7.39a)
(u + v)^ = û + v̂   (7.39b)
ℜ{u† v} = û^T v̂   (7.39c)
‖u‖² = ‖û‖²   (7.39d)
(AB)^ = Â B̂   (7.39e)
(A + B)^ = Â + B̂   (7.39f)
(A†)^ = (Â)^T   (7.39g)
(In )^ = I2n   (7.39h)
(A⁻¹)^ = (Â)⁻¹   (7.39i)
det(Â) = |det(A)|² = det(AA†)   (7.39j)
u† Qu = ℜ{u† (Qu)} = û^T (Qu)^ = û^T Q̂ û,
where in the last two equalities we use (7.39c) and (7.39a), respectively.
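The hat operator and the properties in Lemma 7.21 can be spot-checked numerically. The following sketch (random test matrices, purely illustrative) implements the hat of a vector and of a matrix as described above and verifies a few of the identities (7.39).

```python
import numpy as np

rng = np.random.default_rng(6)

def hat_vec(u):
    """Stack Re{u} on top of Im{u}: C^n -> R^{2n}."""
    return np.concatenate([u.real, u.imag])

def hat_mat(A):
    """The real 2n x 2n matrix with blocks [[Re A, -Im A], [Im A, Re A]]."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(np.allclose(hat_vec(A @ u), hat_mat(A) @ hat_vec(u)))               # (7.39a)
print(np.allclose((np.conj(u) @ v).real, hat_vec(u) @ hat_vec(v)))        # (7.39c)
print(np.allclose(np.linalg.norm(u)**2, np.linalg.norm(hat_vec(u))**2))   # (7.39d)
print(np.allclose(hat_mat(A @ B), hat_mat(A) @ hat_mat(B)))               # (7.39e)
print(np.allclose(np.linalg.det(hat_mat(A)), abs(np.linalg.det(A))**2))   # (7.39j)
```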
A complex random variable is of the form
Z = X + jY,
where X and Y are real-valued random variables. Associating with the pair (x, y) the complex number z = x + jy, we call
fℜ{Z},ℑ{Z} (x, y) = ∂²/(∂x∂y) Fℜ{Z},ℑ{Z} (x, y)
the joint density of (ℜ{Z}, ℑ{Z}). Similarly, for a complex random vector Z = (Z1 , . . . , Zn )^T we will call the function
FZ (z) = Pr(ℜ{Z1 } ≤ ℜ{z1 }, . . . , ℜ{Zn } ≤ ℜ{zn }, ℑ{Z1 } ≤ ℑ{z1 }, . . . , ℑ{Zn } ≤ ℑ{zn })
the cumulative distribution function of Z and
fZ (x1 + jy1 , . . . , xn + jyn ) = ∂^{2n}/(∂x1 · · · ∂xn ∂y1 · · · ∂yn ) FZ (x1 + jy1 , . . . , xn + jyn )
its probability density function.
fV (v) = (1/πⁿ) e^{−v† v}.   (7.52)
Notice that although fV (v) is derived via v̂, it can be expressed in compact form as
a function of v. Notice also that fV (v) only depends on ‖v‖. Hence e^{jθ} V , which is
V with each component rotated by the angle θ, has the same pdf as V . Gaussian
random vectors that have this property are of particular interest to us for two
reasons: (i) all the noise vectors of interest to us are of this kind, and (ii) the pdf
of a Gaussian random vector that has this property takes on a simplified form. For
these two reasons, it is worthwhile investing in the study of such random vectors,
called circularly symmetric.
The above two definitions are related. To see how, let Z be circularly symmetric.
Then it is zero-mean and, for every θ ∈ [0, 2π), E[ZZ^T] = E[(e^{jθ} Z)(e^{jθ} Z)^T] = e^{j2θ} E[ZZ^T], which implies E[ZZ^T] = 0; since the pseudo-covariance matrix of e^{jθ} Z is e^{j2θ} times that of Z, we see that the pseudo-covariance matrix of e^{jθ} Z vanishes as well. Finally, the
covariance matrix of e^{jθ} Z is
E[(e^{jθ} Z)(e^{jθ} Z)†] = E[ZZ †] = KZ ,
proving that the covariance matrices are identical. We summarize this result in a
lemma.
12
A real-valued matrix A is skew-symmetric if AT = −A.
Proof From
E[Z̃] = AE[Z] + b,
it follows that
Z̃ − E[Z̃] = A(Z − E[Z]).
Hence we have
JZ̃ = E[(Z̃ − E[Z̃])(Z̃ − E[Z̃])T ]
= E{A(Z − E[Z])(Z − E[Z])T AT }
= AJZ AT = 0.
where Λ^α is the diagonal matrix obtained by raising to the power α the diagonal elements of Λ. Clearly V is a zero-mean Gaussian random vector and Z = U Λ^{1/2} V has the form (7.53) for the nonsingular matrix A = U Λ^{1/2} ∈ Cn×n . The covariance matrix
of V is
KV = E[V V †]
= E[Λ^{−1/2} U † ZZ † U Λ^{−1/2}]
= Λ^{−1/2} U † E[ZZ †] U Λ^{−1/2}
= Λ^{−1/2} U † KZ U Λ^{−1/2}
= Λ^{−1/2} U † U ΛU † U Λ^{−1/2}
= Λ^{−1/2} Λ Λ^{−1/2}
= In .
Finally, V is proper (by Lemma 7.32) and circularly symmetric (by Lemma 7.28).
This completes the proof for the case that KZ is nonsingular. If KZ is singular,
then some of its components are linearly dependent on other components. In this
case, we can write Z = B Z̃ for some B ∈ Cn×m , where Z̃ ∈ Cm consists of linearly
independent components of Z. The covariance matrix of Z̃ is nonsingular. Hence we
can find a nonsingular matrix à ∈ Cm×m such that Z̃ = ÃV with V ∼ NC (0, Im ).
Finally, Z = B Z̃ = B ÃV = AV has the desired form with A = B Ã ∈ Cn×m .
We are now in the position to derive a general expression for a circularly
symmetric Gaussian random vector Z of nonsingular covariance matrix.
theorem 7.34 The probability density function of a circularly symmetric Gaus-
sian random vector Z ∈ Cn of nonsingular covariance matrix KZ can be written as
fZ (z) = (1/(πⁿ det(KZ ))) e^{−z† KZ⁻¹ z}.   (7.54)
Furthermore,
(A⁻¹ z)† (A⁻¹ z) = z † (A⁻¹)† A⁻¹ z
= z † (AA†)⁻¹ z
= z † KZ⁻¹ z,   (7.57)
where in the second equality we use the fact that, for nonsingular n × n matrices, (AB)⁻¹ = B⁻¹A⁻¹ and (A†)⁻¹ = (A⁻¹)†. Inserting (7.56) and (7.57) into (7.55) yields (7.54).
The above theorem justifies one of the two claims we have made at the beginning
of this appendix, specifically that the pdf of a circularly symmetric Gaussian
random vector takes on a simplified form. (Compare (7.54) and (7.41) when m̂ = 0,
keeping in mind (7.42) to compute KẐ from KX , KY , and KXY .) The next
theorem justifies the other claim: that the complex-valued noise vectors of interest
to us, those at the output of Figure 7.13b, are Gaussian and circularly symmetric.
theorem 7.35 Let NE (t) = NR (t) + jNI (t), where NR (t) and NI (t) are independent white Gaussian noises of spectral density N0 /2. For any collection of L2 functions gl (t), l = 1, . . . , k that belong to a finite-dimensional inner product space V, the complex-valued random vector Z = (Z1 , . . . , Zk )^T, Zl = ⟨NE (t), gl (t)⟩, is circularly symmetric and Gaussian.
Proof Let ψ1 (t), . . . , ψn (t) be an orthonormal basis for V. First consider the random vector V = (V1 , . . . , Vn )^T, where Vi = ⟨NE (t), ψi (t)⟩. It is straightforward to check that V ∼ NC (0, In ). Every gl (t) can be written as gl (t) = Σ_{j=1}^{n} cl,j ψj (t), where cl,j ∈ C. By the linearity of the inner product, Z = AV , where A ∈ Ck×n is the matrix that has (c*l,1 , . . . , c*l,n ) in its lth row. By Lemma 7.33, Z is circularly symmetric with KZ = AA†.
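To illustrate (7.54) and the construction Z = AV numerically, the sketch below (with an arbitrary matrix A, not from the text) samples Z = AV with V ∼ NC (0, In ), compares the empirical covariance with AA†, and checks that the density (7.54) is unchanged when its argument is rotated by e^{jθ}.

```python
import numpy as np

rng = np.random.default_rng(7)

# Sample a circularly symmetric Gaussian Z = A V with V ~ N_C(0, I_n); check that
# the empirical covariance matches K_Z = A A^dagger and that (7.54) is rotation-invariant.
n, samples = 2, 200_000
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
KZ = A @ A.conj().T

V = (rng.standard_normal((samples, n)) + 1j * rng.standard_normal((samples, n))) / np.sqrt(2)
Z = V @ A.T                                   # each row is one realization of A V

print(np.round(Z.T @ Z.conj() / samples, 2))  # empirical covariance ~ K_Z
print(np.round(KZ, 2))

def pdf(z):
    """Density (7.54) of a circularly symmetric Gaussian with covariance K_Z."""
    q = (z.conj() @ np.linalg.inv(KZ) @ z).real
    return np.exp(-q) / (np.pi**n * np.linalg.det(KZ).real)

z0 = np.array([0.3 + 0.1j, -0.2 + 0.4j])
print(pdf(z0), pdf(np.exp(1.0j) * z0))        # equal: the pdf is rotation-invariant
```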
7.10 Exercises
Exercises for Section 7.2
(b) Draw the box diagram of an implementation that uses only real-valued
signals.
exercise 7.3 (Equivalent representations) A real-valued passband signal x(t) can be written as x(t) = √2 ℜ{xE (t) e^{j2πfc t}}, where xE (t) is the baseband-equivalent signal (complex-valued in general) with respect to the carrier frequency fc . Also, a general complex-valued signal xE (t) can be written in terms of two real-valued signals, either as xE (t) = u(t) + jv(t) or as α(t) exp(jβ(t)).
(a) Show that a real-valued passband signal x(t) can always be written as
xEI (t) cos(2πfc t) − xEQ (t) sin(2πfc t)
and relate xEI (t) and xEQ (t) to xE (t). Note: This formula can be used at the
sender to produce x(t) without doing complex-valued operations. The signals
xEI (t) and xEQ (t) are called the in-phase and the quadrature components,
respectively.
(b) Show that a real-valued passband signal x(t) can always be written as
a(t) cos[2πfc t + θ(t)]
and relate xE (t) to a(t) and θ(t). Note: This explains why sometimes people
make the claim that a passband signal is modulated in amplitude and in phase.
(c) Use part (b) to find the baseband-equivalent of the signal
x(t) = A(t) cos(2πfc t + ϕ),
where A(t) is a real-valued lowpass signal. Verify your answer with Example
7.9 where we assumed ϕ = 0.
exercise 7.4 (Passband) Let fc be a positive carrier frequency and consider
an arbitrary real-valued function w(t). You can visualize its Fourier transform as
shown in Figure 7.14.
(a) Argue that there are two different functions, a1 (t) and a2 (t), such that, for i = 1, 2,
w(t) = √2 ℜ{ai (t) exp(j2πfc t)}.
This shows that, without some constraint on the input signal, the operation
performed by the circuit of Figure 7.4b is not reversible, even in the absence
of noise. This was already pointed out in the discussion preceding Lemma 7.8.
(b) Argue that if we limit the input of Figure 7.4b to signals a(t) such that
aF (f ) = 0 for f < −fc , then the circuit of Figure 7.4a will retrieve a(t)
when fed with the output of Figure 7.4b.
(c) Find an example showing that the condition of part (b) is necessary. (Can
you find an example with a real-valued a(t)?)
(d) Argue that if we limit the input of Figure 7.4b to signals a(t) that are real-
valued, then the input of Figure 7.4b can be retrieved from the output. Hint
1: we are not claiming that the circuit of Figure 7.4a will retrieve a(t).
Hint 2: You may argue in the time domain or in the frequency domain.
7.10. Exercises 277
If you argue in the time domain, you can assume that a(t) is continuous.
In the frequency-domain argument, you can assume that a(t) has finite
bandwidth.
Figure 7.14. |wF (f )| versus f , with spectral content around ±fc .
exercise 7.5 (From passband to baseband via real-valued operations) Let the signal xE (t) be bandlimited to [−B, B] and let x(t) = √2 ℜ{xE (t) e^{j2πfc t}}, where 0 < B < fc . Show that the circuit of Figure 7.15, when fed with x(t), recovers the real and imaginary part of xE (t). (The two boxes are ideal lowpass filters of cutoff frequency B.) Note: The circuit uses only real-valued operations.
Figure 7.15. x(t) is multiplied by √2 cos(2πfc t) and by −√2 sin(2πfc t); each product is passed through an ideal lowpass filter 1{−B ≤ f ≤ B}, producing ℜ{xE (t)} and ℑ{xE (t)}, respectively.
exercise 7.6 (Reverse engineering) Figure 7.16 shows a toy passband signal. (Its carrier frequency is unusually low, to make the figure readable.) The horizontal time scale is 1 ms per square and the vertical scale is 1 unit per square. Specify the three layers of a transmitter that generates the given signal, namely the following.
(a) The carrier frequency fc used by the up-converter.
(b) The orthonormal basis used by the waveform former to produce the baseband-
equivalent signal wE (t).
(c) The symbol alphabet, seen as a subset of C.
(d) An encoding map, the encoder input sequence that leads to w(t), the bit rate,
the encoder output sequence, and the symbol rate.
Figure 7.16. The toy passband signal w(t).
exercise 7.7 (AM receiver) Let x(t) = (1 + mb(t)) √2 cos(2πfc t) be an AM modulated signal as described in Example 7.10. We assume that 1 + mb(t) > 0, that b(t) is bandlimited to [−B, B], and that fc > 2B.
(a) Argue that the envelope of |x(t)| is (1 + mb(t)) √2 (a drawing will suffice).
(b) Argue that with a suitable choice of components, the output in Figure 7.17 is
essentially b(t). Hint: Draw, qualitatively, the voltage on top of R1 and that
on top of R2 .
(c) As an alternative approach, prove that if we pass the signal |x(t)| through
an ideal lowpass filter of cutoff frequency f0 , we obtain 1 + mb(t) scaled by
some factor. Specify a suitable interval for f0 . Hint: Expand | cos(2πfc t)|
as a Fourier series. No need to find explicit values for the Fourier series
coefficients.
Figure 7.17. Circuit with input x(t), components C1 , R1 , C2 , R2 , and an output terminal.
exercise 7.8 (Alternative down-converter) Assuming that all the ψl (t) are
bandlimited to [−B, B] and that 0 < B < fc , show that the n-tuple former output
remains unchanged if we substitute the down-converter of Figure 7.8b with the
block diagram of Figure 7.4a.
exercise 7.9 (Real-valued implementation) Draw a block diagram for the imple-
mentation of the transmitter and receiver of Figure 7.8 by means of real-valued
operations. Unlike in Figure 7.9, do not assume that the orthonormal basis is real-
valued.
(a) Suppose X and Y are real-valued iid random variables with probability density
function fX (s) = fY (s) = c exp(−|s|α ), where α is a parameter and c = c(α)
is the normalizing factor.
(i) Draw the contour of the joint density function for α = 0.5, α = 1, α = 2,
and α = 3. Hint: For simplicity, draw the set of points (x, y) for which
fX,Y (x, y) equals the constant c²(α) e⁻¹.
(ii) For which value of α is the joint density function invariant under rota-
tion? What is the corresponding distribution?
(b) In general we can show that if X and Y are iid random variables and
fX,Y (x, y) is circularly symmetric, then X and Y are Gaussian. Use the
following steps to prove this.
(i) Show that if X and Y are iid and fX,Y (x, y) is circularly symmetric then fX (x) fY (y) = ψ(r), where ψ is a univariate function and r = √(x² + y²).
(ii) Take the partial derivative with respect to x and y to show that
fX′ (x)/(x fX (x)) = ψ′(r)/(r ψ(r)) = fY′ (y)/(y fY (y)).
(iii) Argue that the only way for the above equalities to hold is that they be equal to a constant value, i.e. fX′ (x)/(x fX (x)) = ψ′(r)/(r ψ(r)) = fY′ (y)/(y fY (y)) = −1/σ².
(iv) Integrate the above equations and show that X and Y should be Gaus-
sian random variables.
Miscellaneous exercises
The symbols are mapped into a signal via symbol-by-symbol on a pulse train,
where the pulse is real-valued, normalized, and orthogonal to its shifts by multiples
of T . The channel adds white Gaussian noise of power spectral density N0 /2.
receiver implements an ML decoder. For the two systems, determine (if possible)
and compare the following.
(c) The variance σ 2 of the noise seen by the decoder. Note: when the symbols are
real-valued, the decoder disregards the imaginary part of Y . In this case, what
matters is the variance of the real part of the noise.
(d) The symbol-to-noise power ratio Es /σ². Write them also as a function of the
power P and N0 .
(e) The bandwidth.
(f ) The expression for the signals at the output of the waveform former as a
function of the bit sequence produced by the source.
(g) The bit rate R.
Summarize, by comparing the two systems from a user’s point of view.
exercise 7.12 (Smoothness of bandlimited signals) We show that a continuous
signal of small bandwidth cannot vary much over a small interval. (This fact is used
in Exercise 7.13.) Let w(t) be a finite-energy continuous-time passband signal and
let wE (t) be its baseband-equivalent signal. We assume that wE (t) is bandlimited
to [−B, B] for some positive B.
(a) Show that the baseband-equivalent of w(t − τ ) can be modeled as wE (t − τ )ejφ
for some φ.
(b) Let hF (f ) be the frequency response of the ideal lowpass-filter, i.e. hF (f ) = 1
for |f | ≤ B and 0 otherwise. Show that
wE (t + τ ) − wE (t) = ∫ wE (ξ)[h(t + τ − ξ) − h(t − ξ)]dξ.   (7.58)
Figure 7.18. Five points, labeled 5, 4, 3, 2, 1, with spacing d.
plus noise. Choose the vector β = (β1 , β2 , . . . , βL )^T that maximizes the energy ∫ |rE (t)|² dt, subject to the constraint ‖β‖ = 1. Hint: Use the Cauchy–Schwarz inequality.
(c) Assume that f0 − B/2 = B and consider the infinite set of functions {ψ(t − lT )}l∈Z . Do they form an orthonormal set for T = 1/(2B)? (Explain.)
(d) Determine all possible values of f0 − B/2 so that {ψ(t − lT )}l∈Z forms an orthonormal set for T = 1/(2B).
Figure 7.19. pF (f ): height 1 over the bands [−f0 − B/2, −f0 + B/2] and [f0 − B/2, f0 + B/2].
where the latter is the periodic extension of wF (f ). Prove that for all f ∈ R,
wF (f ) = w̃F (f )hF (f ).
Hint: Write wF (f ) = wF⁻(f ) + wF⁺(f ), where wF⁻(f ) = 0 for f ≥ 0 and wF⁺(f ) = 0 for f < 0, and consider the support of wF⁺(f − k/(2T )) and that of wF⁻(f − k/(2T )), k integer.
where
h(t) = (1/T ) sinc(t/(2T )) cos(2πfc t)
is the inverse Fourier transform of hF (f ) and fc = l/(2T ) + 1/(4T ) is the center frequency of the interval [l/(2T ), (l + 1)/(2T )]. Hint: Neglect convergence issues, use the Fourier series to write
w̃F (f ) = l.i.m. Σ_k Ak e^{j2πf T k}
Figure 7.20. hF (f ) versus f [MHz], with marks at ±10 and ±15 MHz.