Cs-2 Nptel Notes
Cs-2 Nptel Notes
Cs-2 Nptel Notes
Lecture - 01
Introduction to Digital Communication Systems
Hello, welcome to another module in this massive open online course. So, this course will
introduce you to the typical structure and working of a digital communication system.
So, this module will focus on introduction to digital communications systems and if you look
at typical digital communication system; A schematic diagram of a digital communication
system will be as follows.
1
(Refer Slide Time: 01:31)
I have a message signal that I want to communicate m(t). In a typical analog signal, you simply
modulate it and transmit this message signal m(t); However, in a digital communication system,
the process is very elaborate. First it has to go it has to has to pass through an encoder which
basically the encoder converts the message into a stream of bits.
So, what the encoder is doing is taking your message and it is converting into the stream of bits
and you might have seen in previous modules, this is achieved through the first sampling
followed by quantization of the samples and representing the quantized samples, alright, using
an appropriate number of bits..
Now, this stream of bits from the encoder is given as input to a digital modulation scheme
which is passed through the; which is passed through a channel. This then subsequently passes
through a channel. The channel can be various types, it can be a wire line, two state pair or a
coaxial cable. It can be also be a wireless channel that is it can be a radio channel. We will look
at some of these different channel types. The output is then passed through a demodulator.
So, we have modulation operation at the transmitter and at the receiver we have a demodulator.
So, at this point now you have the demodulated bits. So, basically you have 0, 1, 1, 0, 1, 1.
Now notice that purposefully I have written a different set of bits, for instance if you look at
the input the fifth bit is a 0, but the fifth bit at the output is a 1 and this is a very important point
The demodulated stream of the bits need not be exactly equal to the transmitted stream of the
bits or the modulated stream of bits. This arises from bit errors at the receiver.
2
Bit errors are a major challenge in digital communication system. It is desirable to have
minimum number of bit errors, that is to lower or minimize the probability of bit error. The
probability of bit error that is the probability with which a received bit is in error is a very
important metric to characterize the performance of communication system. This arises due to
various factors such as the noise at the receiver the feeding wire the fading channel if the
channel is wireless.
The stream of bits from a demodulator is followed by a decoder which gives you your estimated
signal 𝑚
̂ (𝑡).
3
(Refer Slide Time: 07:50)
Now, the key parts of the theory digital communications systems are basically how to design
the various modules. Namely what is the optimal transmit processing. how to receive for
instance what is the optimal filtering operation that can be employed in the transmitter what is
the optimal modulation that can be employed at the transmitter.
What are the modulation schemes for instance when you look at transmission? One thing you
can look at is transmit filter what is the energy or power of the transmit signal. How to design
a filter such that the signal power is maximized noise power is minimized. So, that the signal
to noise power ratio is maximized what is the proper performance of the different modulation
schemes for instance what is the probability of error associated with the modulation schemes
which characterizes the ultimate performance.
So, there are several issues which we are going to look at through the various modules.
Now, how to transmit the signal? In a digital communication signal, the transmitted signal
corresponds to a series of symbols.
4
(Refer Slide Time: 10:11)
Let us take a simple example to start quickly. We take the information symbol 0 we map it. We
have 2 information; binary information symbol 0 and information symbol 1. In order to transmit
the signal these have to be mapped to a physical signal. So, we have to map it to some voltage
levels. Suppose 0 is mapped to +A and 1 is mapped to -A.
So, let us say these are 2 symbols belonging to the digital constellation.
So, there are 2 different voltage level minus A and plus A and both of them are opposite in
phase. So, one has phase 0 the other has phase 180 degree alright plus A and minus A.
5
We have to 2 different phases. So, binary phase constellation; This is known as the binary phase
shift keying in constellation which is one of the simplest constellations we will look at. There
will be several others as we proceed through the various modules. This mapping operation is
termed as digital modulation.
Although conventionally we can simply call it as modulation more specifically this is digital
modulation as we are basically mapping the information bits belonging into a digital
constellation.
Suppose we have a bit-stream 0,1,1,0,01…..., this stream can be mapped to digitally modulates
symbols by assigning +A to 0 and -A to 1. This gives us the stream of digitally modulated
symbols A, -A, -A, A, A, -A.
Now, how do you transmit this; obviously, we cannot just simply transmit the stream of
digitally modulated symbols. I cannot simply transmit voltage level. I have to transmit it over
some signal right a some physical signal that can be transmitted over the channel, alright. So,
this is this signal, this particular template signal that is used to transmit the; this voltage levels
of the digital modulation termed as a pulse.
6
(Refer Slide Time: 16:25)
This particular signal is termed as a pulse. How do you transmit the digital modulated symbols?
So, we transmit the digitally modulated symbols using a pulse. Suppose a pulse 𝑃𝑇 (𝑡).Pulse
can be of height 1 from -T/2 to T/2. T is the width of the pulse.
7
(Refer Slide Time: 18:15)
Suppose the first symbol is 𝑎0 .This is multiplied by the original pulse. This results in 𝑎0 𝑃𝑇 (𝑡).
For the Second symbol the pulse is shifted by T and then, the symbol is multiplied by the shifted
pulse this gives, 𝑎1 𝑝𝑇 (𝑡 − 𝑇). For the third symbol, the pulse is shifted by 2T, this gives us:
𝑎2 𝑃𝑇 (𝑡 − 2𝑇). And the process is continued for further symbols.
So, what you are doing is corresponding to the kth pulse, shifting the pulse by kT times and
multiplying it by ak. Each symbol ak is transmitted in the kth duration.
8
The kth symbol is transmitted as 𝑎𝑘 𝑃𝑇 (𝑡 − 𝑘𝑇)
𝑥(𝑡) = ∑ 𝑎𝑘 𝑃𝑇 (𝑡 − 𝑘𝑇)
𝑘=−∞
So, basically in this module we have seen a brief introduction to a digital communication
system at typical schematic diagram illustrating the various components or modules at the
9
transmitter and receiver and all also incorporating the also showing the place of the relevance
of the channel in this typical schematic of a digital communication system and also we have
seen overview or brief introduction to the process of mapping or modulation of the information
bits to a digital modulation consultation followed by the transmitted the structure of a typical
digital digitally modulated communication signal employing a pulse shaping filter, alright.
So, this PT which is basically your pulse is a filter. It can be implemented as a filter. This is
also known as pulse shipping filter, alright. So, we will stop here and look at a; look at other
aspect in the subject with modules.
Thank you.
10
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 02
Spectrum of Transmitted Digital Communication Signal, Wide Sense Stationarity
Hello, welcome to another module in this massive open online course. In this module, let
us start looking at the spectrum of the transmitted signal in the digital communication
signal. In the last module we saw that a digital transmitted signal is given by:
𝑥(𝑡) = ∑ 𝑎𝑘 𝑃𝑇 (𝑡 − 𝑘𝑇)
𝑘=−∞
11
(Refer Slide Time: 01:53)
In general, PT(t) can be any pulse. In this example a rectangular pulse has been considered
The kth bit can randomly be 0 or 1. Thus ak can randomly be A or -A depending on the kth
bit. Now, we want to characterize the spectrum of the transmitted signal x(t). The PT(t) in
general can be any pulse but for this example we consider a rectangular pulse given by:
1 𝑖𝑓 |𝑡| ≤ 𝑇/2
𝑃𝑇 (𝑡) = {
0 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
12
(Refer Slide Time: 04:19)
𝑇𝑠𝑖𝑛(𝜋𝐹𝑇)
𝑃𝑇 (𝑡) ↔ 𝑇𝑠𝑖𝑛𝑐𝐹𝑇 =
𝜋𝐹𝑇
Now, we will invoke our knowledge of the properties of probability. Since ak is random,
we will set probability ak equals A equal to half, and probability ak equals -A equal to half.
So, we are assuming this ak and A are equiprobable. Now when we use our knowledge of
probability if you calculate the expected value of ak that will be:
13
𝐸{𝑎𝑘 } = 0.5 × 𝐴 + 0.5 × (−𝐴)
Now, we will also assume that the symbols are independently distributed. I.e. symbols ak
are IID (Independent identically distributed). Identical distribution means each takes
minus A or plus A with probability half. Independent means basically these random
variables are independent, and it follows from independence that they are uncorrelated.
For the digital signals in our example E{ak} and E{am} are equal to zero thus,
𝐸{𝑎𝑘 𝑎𝑚 } = 0
14
(Refer Slide Time: 09:07)
Now, let us find the spectra. Let us employ a conventional approach. Whenever we want
to find the spectrum of a signal, we use the Fourier transform all right.
15
(Refer Slide Time: 12:13)
∞
𝑋(𝑓) = ∫ 𝑥(𝑡)𝑒 −𝑗2𝜋𝐹𝑡 𝑑𝑡
−∞
∞
𝐸{𝑋(𝑓)} = ∫ 𝐸{𝑥(𝑡)}𝑒 −𝑗2𝜋𝐹𝑡 𝑑𝑡 = 0 (𝑠𝑖𝑛𝑐𝑒 𝐸{𝑥(𝑡)} = 0)
−∞
16
Thus, the expectation value or the average value of the Fourier transform of the transmitted
signal is zero. This gives rise to a fallacy as this would mean we do not need a signal in
order to transmit (as the average value of the signal is zero). This is because Fourier
transform cannot be used to calculate the spectrum of a random signal.
To calculate the spectrum of a random signal we use another technique which uses power
spectral density.
17
Power spectral density (PSD) gives us the spectral distribution of power of a random signal
In order to compute PSD, we first need to compute the autocorrelation of the signal.
For a wide sense stationary (WSS) process the autocorrelation is defined as,
18
We will continue with this discussion in the next module.
19
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 03
Spectrum of Transmitted Digital Communication Signal, Autocorrelation Function
and Power Spectral Density
Hello, welcome to another module in this massive open online course. So, we are looking
at the autocorrelation function of the transmitted digital communication signal that is,
∞ ∞
20
(Refer Slide Time: 01:35)
∞ ∞
= 𝐸{ ∑ ∑ 𝑎𝑘 𝑎𝑚 𝑃𝑇 (𝑡 − 𝑘𝑇)𝑃𝑇 (𝑡 + 𝜏 − 𝑚𝑇)}
𝑘=−∞ 𝑚=−∞
21
∞ ∞
22
(Refer Slide Time: 05:05)
Therefore,
= 𝐴2
23
(Refer Slide Time: 06:33)
This is our autocorrelation function which depends only on t as well. Thus, the resulting
process is not wide sense stationary. We would now like to convert this process into a wide
24
sense stationary process because WSS signals are very helpful for study and we can
calculate their PSD toc characterize the spectral distribution of power.
In order to achieve this, we introduce a random delay to. This gives us,
𝑥(𝑡) = ∑ 𝑎𝑘 𝑃𝑇 (𝑡 − 𝑘𝑇 − 𝑡𝑜 )
𝑘=−∞
25
(Refer Slide Time: 10:15)
1
(𝑡 )
𝑓𝑇𝑜 𝑜 = { 𝑇 𝑖𝑓(0 ≤ 𝑡𝑜 ≤ 𝑇)
0 𝑜𝑡𝑒𝑟ℎ𝑤𝑖𝑠𝑒
26
∞
To take the average we multiply the expression by the distribution of the random delay.
∞ ∞
= 𝑃𝑑 ∑ ∫ 𝑓𝑇0 (𝑡𝑜 )𝑃𝑇 (𝑡 − 𝑘𝑇 − 𝑡𝑜 )𝑃𝑇 (𝑡 + 𝜏 − 𝑘𝑇 − 𝑡𝑜 )𝑑𝑡𝑜
𝑘=−∞ −∞
27
(Refer Slide Time: 13:47)
∞ 𝑇
𝑃𝑑
= ∑ ∫ 𝑃𝑇 (𝑡 − 𝑘𝑇 − 𝑡𝑜 )𝑃𝑇 (𝑡 + 𝜏 − 𝑘𝑇 − 𝑡𝑜 )𝑑𝑡𝑜
𝑇 0
𝑘=−∞
28
𝑡𝑜 + 𝑘𝑇 = 𝑡̃𝑜 𝑎𝑛𝑑 𝑑𝑡𝑜 = 𝑑𝑡̃𝑜
∞(𝑘+1)𝑇
𝑃𝑑
= ∑ ∫ 𝑃𝑇 (𝑡 − 𝑡̃𝑜 )𝑃𝑇 (𝑡 + 𝜏 − 𝑡̃𝑜 )𝑑𝑡̃𝑜
𝑇 𝑘𝑇
𝑘=−∞
Each integral in the above sum is going from kT to (k+1) T. and k itself goes from negative
infinity to infinity. Thus, the sum can be replaced by a single integral going from −∞ 𝑡𝑜 ∞
29
The summation now equals:
𝑃𝑑 ∞
= ∫ 𝑃𝑇 (𝑡 − 𝑡̃𝑜 )𝑃𝑇 (𝑡 + 𝜏 − 𝑡̃𝑜 )𝑑𝑡̃𝑜
𝑇 −∞
Then we substitute,
Note here that 𝑡𝑜 ′ goes from ∞ to −∞ (reverse order) but there will be minus sign in 𝑑𝑡𝑜 ′
Which will reverse the integration range again (Refer Slide Time: 20:19)
30
Thus, the autocorrelation will now be,
𝑃𝑑 ∞
= ∫ 𝑃𝑇 (𝑡𝑜′ )𝑃𝑇 (𝑡𝑜 ′ − 𝜏)𝑑𝑡𝑜 ′
𝑇 −∞
The integral in the above expression is independent of t and is in fact the autocorrelation
of PT(t) and can be termed as ‘𝑅𝑃𝑇 𝑃𝑇 (𝜏)’.
31
𝑃𝑑
𝑅𝑥𝑥 (𝜏) = 𝑅 (𝜏)
𝑇 𝑃𝑇 𝑃𝑇
32
Fourier transform of the autocorrelation will give the power spectral density of the signal.
Thus,
𝑃𝑑
𝑆𝑥𝑥 (𝑓) = 𝑆 (𝑓)
𝑇 𝑃𝑇 𝑃𝑇
Where, 𝑆𝑥𝑥 (𝑓) is the power spectral density of the transmitted signal and 𝑆𝑃𝑇 𝑃𝑇 (𝑓) is the
energy spectral density of the pulse (since the pulse is deterministic).
Remember if you take the Fourier transform the autocorrelation function of a deterministic
signal such as this pulse, you will get the energy spectral density. For a random signal it
gives the power spectral density.
In this module we have shown that the power spectral density of the transmitted signal
depends on the energy spectral density of the pulse, given by the expression above.
So, we stop here and look at this further in the subsequent modules.
33
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 04
Spectrum of Transmitted Digital Communication Signal, Relation to Energy
Spectral Density, Introduction to AWGN Channel
Hello, welcome to another module in this massive open online course on communication
We are looking at the power spectral density of a transmitted signal. In the last module we
derived the following expression for the autocorrelation of a transmitted signal:
𝑃𝑑
𝑅𝑥𝑥 (𝜏) = 𝑅 (𝜏)
𝑇 𝑃𝑃
Where 𝑅𝑃𝑃 (𝜏) is the autocorrelation function of the pulse. In general P(t) can be any pulse
not necessarily a rectangular pulse which has ben taken as example.
34
(Refer Slide Time: 01:13)
35
∞
𝑅𝑃𝑃 (𝜏) = ∫ 𝑃(𝑡)𝑃(𝑡 − 𝜏) 𝑑𝑡
−∞
Energy spectral density of the pulse is the Fourier transform of the autocorrelation function
of the pulse,
∞
𝑆𝑃𝑃 (𝑓) = ∫ 𝑅𝑃𝑃 (𝑡)𝑅𝑃𝑃 (𝑡 − 𝜏) 𝑑𝑡
−∞
∞
𝑃(𝑓) = ∫ 𝑃(𝑡)𝑒 −𝑗2𝜋𝑓𝑡 𝑑𝑡
−∞
36
(Refer Slide Time: 05:18)
For a wide sense stationary process, the power spectral density is given by Fourier
transform of the autocorrelation function. i.e.,
∞
𝑆𝑥𝑥 (𝑓) = ∫ 𝑅𝑥𝑥 (𝜏)𝑒 −𝑗2𝜋𝑓𝜏 𝑑𝜏
−∞
37
Thus we can say:
𝑃𝑑
𝑆𝑥𝑥 (𝑓) = 𝑆 (𝑓)
𝑇 𝑃𝑃
𝑃𝑑
= |𝑃(𝑓)|2
𝑇
So, basically what we have is a very interesting result that the power spectral density of
the transmitted signal of transmitted signal is proportional to energy spectral density of
pulse. This is a very important result that is the power spectral density of the transmitted
digital communication signal is basically proportional to the energy spectral density of the
pulse P. This i-s an important property which helps us characterize what is the power what
is the spectral distribution of power of the transmitted digital communication signal.
38
(Refer Slide Time: 08:54)
For a simple example consider the rectangular pulse 𝑃𝑇 (𝑡) defined by,
39
Fourier transform of this pulse is given by:
∞
𝑃𝑇 (𝑓) = ∫ 𝑃𝑇 (𝑡)𝑒 −𝑗2𝜋𝑓𝑡 𝑑𝑡
−∞
40
= 𝑇𝑠𝑖𝑛𝑐(𝑓𝑇)
𝑇𝑠𝑖𝑛𝑐(𝜋𝑓𝑇)
=
𝜋𝑓𝑇
𝑃𝑑
𝑃𝑆𝐷 = |𝑃 (𝑓)|2
𝑇 𝑇
This completes our analysis of power spectral density of the transmitted signal.
41
(Refer Slide Time: 13:36)
And now let us look at the properties of the digital communication channel. How do we
model a digital communication. Now, simply put the channel is the medium through which
the transmitted signal propagates from the signal propagates or signal traverses from the
transmitter to receiver in a communication system.
42
(Refer Slide Time: 15:41)
For instance, we can have some examples. Common examples are a for instance your
telephone lines, coaxial cables which connect basically your set top boxes okay all right.
Coaxial cables which transmit which are used for the transmission of TV signals correct
or your cable basically. And also the wireless channel, when there is no physical or there
is no particular guided propagation medium for the electromagnetic waves.. In fact, this is
turning out to be the most dominant mode of communication there is for instance your all
your broadcast services such as AM, FM your cellular at all this thing, cellular, Bluetooth,
Wi-Fi set all of these are based on the wireless channel.
43
(Refer Slide Time: 17:47)
Suppose we have a transmitted signal 𝑥(𝑡) which passes through a channel and at the
receiver we have a signal 𝑦(𝑡).
44
One of the simplest ways to model a communication channel is AWGN or Additive White
Gaussian Noise channel
This is a simple but very useful model that is used not only in digital communication but
also in analog communications.
45
Where 𝑛(𝑡) is the noise which is added to the transmitted signal (hence the name). And of
course, there are other aspect that is the white and the Gaussian which need to be defined
in order to complete the definition of the noise as well as the definition of this channel. So,
we will look at these aspects in the next module.
46
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 05
Additive White Gaussian Noise (AWGN) Properties, Gaussian Noise, White Noise
In this module we will look at the properties of Additive white gaussian noise.
In an AWGN channel, noise is added to the transmitted signal 𝑥(𝑡) to get the received
signal 𝑦(𝑡).
47
(Refer Slide Time: 00:59)
A popular model for a random noise process is the Gaussian random process model.
48
(Refer Slide Time: 02:45)
𝑁(𝑡) is considered to be a gaussian random process if the statistics of 𝑁(𝑡) of all orders
are gaussian.
This means, f we look at random noise samples, 𝑁(𝑡1 ), 𝑁(𝑡2 )…. 𝑁(𝑡𝐾 )
49
(Refer Slide Time: 04:26)
50
(Refer Slide Time: 06:23)
If this distribution follows multivariate gaussian density for all values of 𝑘 then it will be
considered as gaussian random process.
For a random process to be gaussian the above should be valid for all possible values of 𝑘
not just particular values of 𝑘.
51
Most naturally occurring random processes are Gaussian random process. To model the
thermal noise the circuits, we use Gaussian random process. This is one of the most popular
models popular noise models for a digital communication system and is employed almost
throughout a dominant fraction of literature.
If a Gaussian random process is wide sense stationary then it is also strict sense stationary.
This property is not true in general for random processes, but it holds for Gaussian random
processes.
52
So, strict sense stationary processes are a strict subset of white sense stationary processes.
So, if you look at the set of random processes, if we look at the set of random processes
then there are several random processes that are wide sense stationary, but only a few of
them are strict sense stationary. So, typically if it is strict sense stationary, it implies wide
sense stationary. Typically, wide sense stationarity does not because wide sense
stationarity is a much more relaxed condition. So, typically wide sense stationarity does
not imply strict sense stationary.
Strict sense stationary processes will be a subset of wide sense stationary process. The
criteria for wide sense stationary process are as follows,
𝐸{𝑁(𝑡)} = 𝜇 ∀ 𝑡
Thus, a process is wide sense stationary if it’s expected value is independent of time and
autocorrelation process depends only on the time shift between two samples.
53
(Refer Slide Time: 15:10)
Now let us look at the property of white noise. This property is motivated by the
observation that, noise generally has a very low temporal correlation. This is because noise
is in general very erratic, unlike a smooth signal.
54
(Refer Slide Time: 18:11)
𝜂
𝑅𝑁𝑁 (𝜏) = 𝐸{𝑁(𝑡)𝑁(𝑡 + 𝜏)} = 𝛿(𝑡)
2
55
(Refer Slide Time: 21:09)
𝑅𝑁𝑁 (𝜏) is an impulse at time 𝜏 = 0 scaled by 𝜂/2. This means value of the autocorrelation
is zero for 𝜏 ≠ 0.
This means that noise samples at two different time instants (𝜏 ≠ 0) are uncorrelated.
56
(Refer Slide Time: 25:44)
Power spectral density of the noise signal is given by the Fourier transform of the
autocorrelation function. i.e.,
𝜂
𝑆𝑁𝑁 (𝑓) =
2
Fourier transform of the delta function gives unity and thus we obtain that PSD is
uniformly distributed over the entire frequency range.
57
(Refer Slide Time: 29:01)
This is similar to white light which is a combination of radiation of all frequencies, hence
the name white noise.
58
(Refer Slide Time: 30:41)
Note that these three properties are independent of each other and do not follow from one
another. For a channel to be called AWGN channel it has to have all of the three properties
(i) Additive noise ii) Gaussian distribution of noise, iii) White noise or uniform distribution
of power over the entire frequency range.
59
(Refer Slide Time: 32:39)
Let me just quickly draw a simple schematic. This is a very simple yet a very powerful
channel model which can be used to characterize a general digital communication channel.
AWGN channel implies that basically noise is additive.
So, the noise is additive in addition if the noise is white and Gaussian this is known as an
additive white Gaussian noise channel. One of the simplest, one of the most frequently
used and one of the most popular and also one of the most practically applicable channel
models or models for a typical digital communication system.
60
So, we will stop here; and based on this model, we will analyse the performance look at
optimal schemes for this digital communication system, for a digital communication
system and also their performance in the subsequent modules.
61
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 06
Structure of Digital Communication Receiver, Receive
Filter, Signal-to-Noise Power Ratio (SNR)
Welcome to another module in this massive open online course, in this module we will
start looking at a digital communication receiver.
So, let us start looking at the operation and implementation of a digital communication
receiver.
At the receiver end the received signal 𝑦(𝑡) which is the sum of transmitted signal 𝑥(𝑡)
and noise signal 𝑛(𝑡), is passed through a LTI filter with impulse response ℎ(𝑡). On the
output of this filter (let’s call it 𝑟(𝑡)) a sampling operation is performed with a sampling
time T (which is the symbol rate). The transmitted signal is extracted from the sampling
of the signal 𝑟(𝑡)
62
(Refer Slide Time: 05:41)
𝑟(𝑡) will be given by the convolution of the received signal with the impulse response of
the filter.
∞
𝑟(𝑡) = ∫ 𝑦(𝑡 − 𝜏)ℎ(𝜏)𝑑𝜏
−∞
63
(Refer Slide Time: 07:02)
= 𝑎0 𝑝(𝑡) + 𝑛(𝑡)
Here for the sake of simplicity we will consider the transmission of only a single symbol
𝑎0 transmitted by a single pulse 𝑝(𝑡).
64
Substituting the value of y(t) in the expression for r(t) we get,
∞
𝑟(𝑡) = ∫ (𝑎0 𝑝(𝑡 − 𝜏) + 𝑛(𝑡 − 𝜏))ℎ(𝜏)𝑑𝜏
∞
∞
𝑟(𝑇) = ∫ (𝑎0 𝑝(𝑇 − 𝜏) + 𝑛(𝑇 − 𝜏))ℎ(𝜏)𝑑𝜏
−∞
Next, we separate the above expression into signal and noise components
65
(Refer Slide Time: 11:59)
∞ ∞
𝑟(𝑇) = 𝑎0 ∫ 𝑝(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏 + ∫ 𝑛(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏
−∞ −∞
The first term in the above expression is the signal component and the second term are the
noise component. We want to design a filter ℎ(𝜏) in order to minimize the noise
component and maximize the signal component.
66
Thus, we have two fundamental objectives, i) minimizing the noise power and ii)
maximizing the signal power. Or we can say we want to maximize the ratio of signal power
to the noise power. This ratio is called the Signal to Noise Ratio or SNR.
𝑠𝑖𝑔𝑛𝑎𝑙 𝑝𝑜𝑤𝑒𝑟
𝑆𝑁𝑅 =
𝑛𝑜𝑖𝑠𝑒 𝑝𝑜𝑤𝑒𝑟
SNR is a fundamental quantity for the analysis of digital as well as analog communication
systems. As noise will inevitably be a part of any communication system.
67
First let us calculate the signal power. The signal is given by,
∞
𝑟(𝑡) = 𝑎0 ∫ 𝑝(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏
−∞
∞ 2
𝑠𝑖𝑔𝑛𝑎𝑙 𝑝𝑜𝑤𝑒𝑟 = 𝐸{|𝑎0 ∫ 𝑝(𝑡 − 𝜏)ℎ(𝜏) 𝑑𝜏| }
−∞
∞ 2
= 𝐸{|𝑎02 |}𝐸{|∫ 𝑝(𝑡 − 𝜏)ℎ(𝜏) 𝑑𝜏| }
−∞
Here, the second term is a real signal and thus modulus can be removed, and the first term
is the data symbol power or 𝑃𝑑 .
68
(Refer Slide Time: 20:06)
i.e.,
𝐸(|𝑎0 |2 ) = 𝑃𝑑
𝑝(𝑡) and ℎ(𝑡) are deterministic signals and not random signals thus their expected value
will be signal itself.
69
∞ 2
𝑠𝑖𝑔𝑛𝑎𝑙 𝑝𝑜𝑤𝑒𝑟 = 𝑃𝑑 (∫ 𝑝(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏)
−∞
We have now computed the numerator term of the SNR. We will continue this discussion
further in the next module and see how can we design ℎ(𝜏) in order to maximize SNR.
70
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 07
Digital Communication Receiver, Noise Properties, Output
Noise Power
Hello. Welcome to another module in this massive open online course. So, we are looking
at the signal processing or how to process the receive signal at the receiver of the digital
communication system .
So, in this module let us start looking at how to characterize the noise power.
∞
𝑛̃ = ∫ 𝑛(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏
−∞
𝑛̃ is the noise after passing through the receive filter sampled at time T.
71
(Refer Slide Time: 02:12)
𝐸{|𝑛|2 } = 𝐸{𝑛2 }
Till now we have assumed that the noise in additive in nature. Now we invoke the other
two aspects of our model, namely white noise and gaussian nature.
72
(Refer Slide Time: 05:04)
The mean of the noise will be zero and the autocorrelation will be a delta function at zero.
i.e.,
𝐸{𝑛(𝑡)} = 0
𝜂
𝐸{𝑛(𝑡)𝑛(𝑡 + 𝜏)} = 𝑅𝑛𝑛 (𝜏) = 𝛿(𝜏)
2
We also assume that the noise process is a wide sense stationary process.
73
Note that a gaussian process will pass through a linear filter, the resulting signal will still
be a gaussian. Thus 𝑛̃ is also a gaussian.
∞
𝑛̃ = ∫ 𝑛(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏
∞
74
(Refer Slide Time: 12:59)
∞
𝐸{𝑛̃} = 𝐸 {∫ 𝑛(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏}
−∞
Since the expectation operator is linear and the impulse response ℎ(𝜏) is deterministic, we
can write,
∞
𝐸{𝑛̃} = ∫ 𝐸{𝑛(𝑇 − 𝜏)}ℎ(𝜏)𝑑𝜏}
−∞
75
Since 𝐸{𝑛(𝑡)} = 0 we get,
𝐸{𝑛̃} = 0
𝐸{𝑛̃2 } = 𝐸{𝑛̃.̃}
𝑛
∞ ∞
= 𝐸{∫−∞ 𝑛(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏 × ∫−∞ 𝑛(𝑇 − 𝜏̃ )ℎ(𝜏̃ )𝑑𝜏̃ }.
76
(Refer Slide Time: 17:49)
∞ ∞
= 𝐸{∫ ∫ 𝑛(𝑇 − 𝜏)𝑛(𝑇 − 𝜏̃ )ℎ(𝜏)ℎ(𝜏̃ )𝑑𝜏𝑑𝜏̃ }
−∞ −∞
Now we make use of the fact that the expectation operator is linear and move it inside.
Also the impulse response are not random and thus can be taken outside the expectation
operator.
77
Thus we have,
∞ ∞
= ∫ ∫ 𝐸{𝑛(𝑇 − 𝜏)𝑛(𝑇 − 𝜏̃ )}ℎ(𝜏)ℎ(𝜏̃ )𝑑𝜏𝑑𝜏̃
−∞ −∞
𝜂 ∞ ∞
∫ ∫ ℎ(𝜏)ℎ(𝜏̃ )𝛿(𝜏̃ − 𝜏)𝑑𝜏̃ 𝑑𝜏
2 −∞ ∞
78
(Refer Slide Time: 23:47)
∞ ∞
= ∫ ℎ(𝜏)(∫ ℎ(𝜏̃ )𝛿(𝜏̃ − 𝜏)𝑑 𝜏̃ )𝑑𝜏
−∞ −∞
𝜂 ∞
= ∫ |ℎ(𝜏)|2
2 −∞
79
(Refer Slide Time: 26:31)
Thus,
𝜂
𝐸{𝑛̃2 } = 𝐸ℎ
2
This relation gives us a method to calculate the noise power at the output of the receiving
filter. We will continue our discussion on SNR in the next module.
80
Thank you very much.
81
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 08
Digital Communication Receiver, Optimal SNR, Matched Filter
Hello, welcome to another module in this massive open online course. So, we are looking
at the SNR at the output that is after the processing, after the filtering and the sampling at
the receiver in the digital communication system, alright.
82
(Refer Slide Time: 01:59)
∞
𝑠𝑖𝑔𝑛𝑎𝑙 = ∫ 𝑎0 𝑝(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏
−∞
∞
𝑛𝑜𝑖𝑠𝑒 = ∫ 𝑛(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏
−∞
83
SNR is given by the ratio of signal power to noise power,
𝑠𝑖𝑔𝑛𝑎𝑙 𝑝𝑜𝑤𝑒𝑟
𝑆𝑁𝑅 =
𝑛𝑜𝑖𝑠𝑒 𝑝𝑜𝑤𝑒𝑟
From the previous modules we substitute the value of the signal and noise powers.
∞ 2
𝑃𝑑 (∫−∞ 𝑝(𝑇 − 𝜏)ℎ(𝜏)𝑑𝜏)
= 𝜂 ∞
|ℎ(𝜏)|2 𝑑𝜏
2 ∫−∞
For optimal communication, we want to maximize the SNR. For this we will use Cauchy
Schwartz inequality. First for simplicity of notation, let use the following notation,
𝑝(𝑇 − 𝜏) = 𝑢(𝜏) and ℎ(𝜏) = 𝑣(𝜏).
84
(Refer Slide Time: 06:00)
∞ 2 ∞ ∞
(∫ 𝑢(𝑡)𝑣(𝑡)𝑑𝑡) ≤ ∫ 𝑢2 (𝑡)𝑑𝑡 × ∫ 𝑣 2 (𝑡)𝑑𝑡
−∞ −∞ −∞
∞ ∞
𝑃𝑑 ∫−∞ 𝑝2 (𝑇 − 𝜏)𝑑𝜏 . ∫∞ ℎ2 (𝜏)𝑑𝜏
𝑆𝑁𝑅 ≤ 𝜂 ∞ 2
(𝜏)𝑑𝜏
2 ∫−∞ ℎ
85
(Refer Slide Time: 09:20)
Or,
∞
𝑃𝑑 ∫−∞ 𝑝2 (𝑇 − 𝜏)𝑑𝜏
𝑆𝑁𝑅 ≤ 𝜂
2
The SNR will be maximized when the equality condition of the Cauchy Schwarz
inequality is satisfied. i.e., when u(t) and v(t) are proportional to each other. Or,
𝑝(𝑇 − 𝜏) ∝ ℎ(𝜏)
86
(Refer Slide Time: 11:48)
Or
𝑝(𝑇 − 𝜏) = 𝑘ℎ(𝜏)
Since the above will be valid for any k, without loss of generality, we can assume k=1.
ℎ(𝜏) = 𝑝(𝑇 − 𝜏)
87
For maximum SNR, the receiver filter has to be “matched” to the pulse shaping filter.
This is termed as matched filter Principle and one of the fundamental concepts of digital
communication.
88
(Refer Slide Time: 14:58)
89
(Refer Slide Time: 15:57)
Consider an exponential pulse shaping filter. The receiver filter will simply be the pulse
shaping filter, inverted in the time domain (see figure) and shifted by T (along the time
axis.)
The above two operations will give us the matched receive filter.
90
(Refer Slide Time: 19:35)
∞
𝑃𝑑 ∫−∞ 𝑝2 (𝑇 − 𝜏)𝑑𝜏
𝑆𝑁𝑅 = 𝜂
2
∞ ∞
2 (𝑇
∫ 𝑝 − 𝜏)𝑑𝜏 = ∫ 𝑝2 (𝜏)𝑑𝜏
−∞ −∞
91
Thus, we get,
𝑃𝑑 ∞ 2
𝑆𝑁𝑅 = 𝜂 ∫ ℎ (𝜏)𝑑𝜏
2 −∞
𝑃𝑑 ∞ 2
= 𝜂 ∫ 𝑝 (𝜏)𝑑𝜏
−∞
2
Thus, maximum SNR is proportional to the total energy of the pulse shaping filter. And
more importantly this value is achieved when the receive filter is matched to the pulse
shaping filter, with inverted and shifted time. i.e.,
ℎ(𝜏) = 𝑝(𝑇 − 𝜏)
Thank you.
92
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 09
Probability of Error in Digital Communication, Probability Density
Functions of Output
Hello. Welcome to another module in this massive open online course. So, we are
looking at the receiver of a digital communication system, in today's module this module
let us start looking at.
The probability of error how to characterize or how to or basically what is the probability
of error corresponding to a particular scheme at the digital communication receiver. So,
we would like to look at the probability of error. Now the probability of error remember
in a digital communication system we are transmitting several symbols, the probability
of error is the probability that a certain symbol is received in error, and this is a very
important metric to characterize the performance of a digital communication system and
of course, naturally the performance is better if the probability of error is lower alright.
So, we always try to achieve the lowest possible probability of error alright. So, the aim
is to design a reliable communication system which means the probability of error that is
we want the symbols to be received accurately at the receiver therefore, the probability
93
of error that the probability that the same received symbol is in error must be as low as
possible. So, the probabilities of error describe it we transmit. So, probability of error
now the probability of error this is an important metric; is an important metric correct to
characterize the performance of a digital communication system. Characterize or
quantify of a digital communication system performance of a digital communication
system.
And of course, the lower the probability so we try to minimize we aim to minimize the
probability, the aim is to minimize probability of error; typical values of probability of
error approximately 10 to the power of minus 6 to 10 to the power of minus 8 these are
the typical probability we also know that we are going to look at this is the known as the
bit error rates or symbol error rates etcetera, these are the typical values of probability of
error.
94
(Refer Slide Time: 04:41)
Now, how do we characterize now how to characterize the probability of error that is the
next question how do we measure or how to for a given scheme that is the idea for a
given scheme how to characterize the probability of error for a given scheme, how to
characterize the probability of error. Now basically we have seen that our received
sample r of T is given as a naught, after sampling. Remember after filtering and sampling
you have seen this in the previous modules after remember after filtering with h of t and
sampling at t equal to after filtering and sampling this is a naught correct integral minus
infinity to infinity, P T minus tau, h tau d tau plus n tilde this is the noise remember we
also verified that the noise is Gaussian.
95
(Refer Slide Time: 06:39)
We also verified that this noise is Gaussian in nature and further we have seen that to
maximize the signal to noise power ratio we have to use the optimal filter which is the
match filter. To maximize SNR we have to choose h of t equals h of tau equals P T
minus tau, this is termed as the matched filter and this is one of the important most
important principles in digital communication.
This is termed as the this is a very important principle this is termed as the matched filter
and therefore, employing the matched filter we have that is h of t minus tau equals p of
96
tau we have r of T equals minus infinity to infinity, that is substituting h of tau we get h
of tau equals P of T minus tau, we get P square T minus tau d tau plus n tilde.
Now, remember we have also seen this P square d minus T minus tau d tau is nothing but
this is also equal to this is nothing but this is also equal to integral minus infinity to
infinity P square tau d tau which is basically like a denote it as E P the energy of the
pulse. So, this quantity here integral minus infinity to infinity is nothing but the energy of
the pulse shaping filter, this is the energy of the pulse shaping filter. So, I can say the
output is very simply is simply a naught E P plus n tilde where a naught remember is a
transmitted symbol, E P is the energy of the pulse shaping filter that is integral minus
infinity to infinity, P square tau d tau plus n tilde where n tilde is a Gaussian noise.
Now, let us look at these aspects further. So, first we have this is the transmitted symbol,
E P is energy of the pulse shaping filter we have already seen this is the pulse let me
simply write this as the pulse energy in tilde is Gaussian noise.
97
(Refer Slide Time: 09:57)
And remember we have also seen n tilde we also proved in the previous modules that the
mean of n tilde equals 0, that it is 0 mean and further the variance of n tilde that is
expected value of n tilde square for additive white Gaussian noise is eta naught by 2
integral minus infinity mod h tau square d tau, but we have also seen that the energy of
the pulse shaping filter is nothing but the energy of the receive filter is nothing but the
energy of the pulse shaping filter for a matched filter. So, this is eta by naught by 2
integral minus infinity to infinity, P square tau d tau which is eta naught by 2 times E P
that is the expected value of n square.
98
And therefore, we can write n tilde is Gaussian, basically this can be summarized as n
tilde this n calligraphic n denotes normal distribution or Gaussian. So, n tilde is Gaussian
with mean 0 and variance eta naught by 2 E P.
So, this is basically what we have derived for the Gaussian noise. So, n tilde this is the
noise after sampling correct noise after sampling or noise at output of sampler, this
denotes Gaussian this denotes that the mean equals 0, this is the mean that is the first
entry and this is basically the power of the noise or the variance that is we are using the
notation for a Gaussian which is standard notation, Gaussian of mean mu and variance
sigma square is denoted by n mu comma sigma square, that is Gaussian of mean equal to
mu variance equal to sigma square. So, here our noise n tilde remembers the noise n tilde
is the argument when test follows that is n t that is n t the noise additive white Gaussian
noise is passed through a linear filter that is h t alright.
Hence the output noise process n tilde t is also Gaussian and when we sample it the
sample of a Gaussian random process, Gaussian random noise process is a Gaussian
random variable. Therefore, n tilde is a Gaussian random variable we have derived the
mean of that, the mean of that is 0 the variance of that is a eta by naught by 2 integral
minus infinity to infinity, h square tau d tau that is the energy of the filter, but when you
use a matched filter the matched filter is same as the pulse therefore, the energy of the
matched filter is same with the energy of the pulse therefore, the variance of the
Gaussian noise at the output of the sampler is eta naught by 2 times E P where E P is the
pulse.
So, that is the complete argument and I think it is important that you understand that. So,
please go through the entire if it is not clear I urge you to go through the previous
modules where we calculated the nature of the gauss noise at the output of the sampler,
and also the mean and the variance once again.
99
(Refer Slide Time: 13:54)
Now more importantly the other thing that one can also notice note here that if I have a
Gaussian random variable that is as an aside, you can note that if I have a Gaussian
random variable with mean 0 and variance one that is Gaussian random variable mean 0
variance mean equal to 0 variance unity. Such a Gaussian random variable this is known
as a standard Gaussian or standard normal variable standard normal random variable, this
is known as the standard normal or the standard Gaussian; this is known as the standard
normal or standard Gaussian random variable or standard Gaussian random variable.
100
Now therefore, basically we have a very elegant model if you look at this we have the
received symbol r T that is not the received symbol, but the sample r T which is equal to
a naught E P plus n tilde, let us now of course, we said that the symbols must belong to
digital constellation. Now the very first modules we have realized that this symbols in a
digital communication system symbols must belong to a digital constellation in a digital
communication.
So, let us choose a very simple system in which basically we have very simple
constellation in which a naught equals either minus A or plus A. Remember we can have
remember how this arises we have the information symbols if you go back, we have
these are your information stream is the bits basically the raw bits information bits. Now
I can map 0 to the voltage level A and 1 to the voltage level minus A for instance or so I
will get a minus A, A minus A 0 is mapped to a one is mapped to minus A, A minus A 0
So, these are basically the symbols. So, I will just let me write it clearly. So, A minus A
minus A, A minus A, A and so on here we have 0 mapped to a, one mapped to minus A,
one mapped minus A, 0 mapped to A, one mapped to minus A. So, we are using the
mapping 0 is mapping to A and information bit 1 is mapping to minus A and now,
therefore, if you observe this at the output you will have r naught r T equals plus or
minus A times E P plus n tilde.
101
(Refer Slide Time: 17:28)
So, what you will receive is the signal level plus A. So, since you are transmitting either
plus A or minus A. So, what you will receive after matched filtering and sampling these
are the plus A times E P, where E P is energy of the pulse or minus A times E P in the
presence of this noise n tilde.
So, let us consider these 2 scenarios let us write them clearly. So, we have r T equals A E
P plus n tilde, if a naught is equal to A that is corresponding to the information bit 0 or r
T equals minus A E P plus n tilde if a naught equals minus A. So, this is basically a
complete characterization of the output.
102
(Refer Slide Time: 18:38)
So, this is output after sampling and now you will realize here something interesting that
is if you transmit a naught equal to A what you get is A times E P plus n tilde, which
means n tilde. Now look at this A times E P is a constant you are adding it to the
Gaussian noise n tilde which are 0 mean. So, the mean of this Gaussian noise n tilde is
going to shift to A times E P.
Similarly, when you transmit a naught equal to minus A output is minus A times E P
plus n tilde, which means the in this case the mean of the noise n tilde will shift to minus
A times E p. So, if you look at the output right r of t which is either A times E P plus A n
tilde or minus A times E P plus n tilde corresponding to a or minus A the mean will be
shifted to basically a times E P or minus A times E P because n tilde itself is 0 mean
Gaussian noise, and the variance will remain unchanged because when you simply add a
constant quantity to the Gaussian noise to the Gaussian random variable, the mean shifts
the variance which is the spread of the Gaussian random variable remains unchanged.
So, the variance of the Gaussian random variable will remain fixed that is your eta
naught by 2 times E P and that is an important point.
So, let us note that. So, if we take a Gaussian random will. So, let us look at this property
I think because it is a fairly important property which we might invoke later also, if you
have n which is distributed as a Gaussian random variable according to mean, mu,
variance sigma square, then n plus K where K is a constant is distributed according to the
103
Gaussian random variable the mean is shifted by K variance is unchanged to variances is
a this is an important property, variance is unchanged because variance is remember it is
related to the spread of the Gaussian random variable variance is a.
So, basically what we are saying is if you have an initial Gaussian random variable
which has this probability density function, which is centered at the mean which has
mean mu if you add. So, this is let us say your n probability density. In fact, this is not n,
but this is the probability density function of n.
Now, the probability density function of n plus k will be something that is shifted. So,
this is basically your mu plus K, but when we say the variance the variance is related to
the spread its pop it is proportional to the spread. So, this spread remains unchanged or
basically the shape of the Gaussian remains unchanged, it is basically simply translated.
So, the spread here is sigma square, spread here will also be sigma square this is
unchanged.
104
(Refer Slide Time: 22:51)
Now, what is the relevance of this 2 a digital communication system problem now what
we are seeing is something very interesting go back and look at r T equals either plus A
E P plus n tilde or minus A E P plus n tilde, which means this is n tilde shifted by E p.
So, this will be simply Gaussian with mean shifted because n tilde is 0 mean. So, it is it
is mean will be 0 plus A t e p which is A E P sigma square of course, sigma square we
also know what is sigma square, sigma square is eta naught by 2 times E P the variance
is unchanged this is also Gaussian with A E P minus A E P; minus A E P plus n tilde is
Gaussian with mean shifted to minus A E P and variance sigma square.
So, basically corresponding to both these, so the way to distinguish the outputs is
basically corresponding to the transmission of plus A, the output is a Gaussian random
variable. Remember because you are always observing the output in the presence of
noise n tilde which is random.
105
(Refer Slide Time: 24:35)
So, naturally the output is also a random variable correct; however, when the symbol
plus A is transmitted the output is a Gaussian random variable with mean A times E P
when the symbol minus A is transmitted the output is a Gaussian random variable with
mean, minus A times E P and in both cases the variance is the same variance is identical
which is sigma square equal to eta naught by 2 times E p. So, let us note that.
So, in this case r T is Gaussian mean equals minus A times E P and in both cases the
variance is the same, we will call it sigma square as a eta naught by 2 for the matched
106
filter this is a eta naught by 2 times E P as we have shown above alright. So, that is what
we are going to. So, what we are going to do.
So, let me just draw it I think it makes sense to draw it. So, we have 2 Gaussians and the
other Gaussian looks something like this, both of them have the same shape. So, this is
mean equal to. So, this is basically at minus A or let me just this is minus A times E P
and this is A times E P, variance is the same and by symmetric this for symmetry this
point of intersection is 0 this is purely you can seen by symmetry.
So, the point of intersection is 0 by symmetry and of course, now you can see this
corresponds to this Gaussian distribution corresponds to a naught equals minus A, this
Gaussian density corresponds to a naught equals plus A. So, naturally you can see that
when you are transmitting a naught equal to A the Gaussian is shifted towards the right
that is shifted towards a times E P, when you are transmitting a naught equals minus A
the Gaussian is shifted towards the left that is the mean A times or minus A times E P ok.
So, we have 2 Gaussians with the same variance that is eta naught over 2 times E p, but
one is shifted to minus A times E P, the other is shifted to A times E p. So, this is
basically the 2 the output the probability density function of the output basically
comprises of 2 shifted Gaussians.
107
(Refer Slide Time: 28:23)
That is something which is very interesting and both have the same variance and again I
am making the same point I am repeating this point both have the same variance, sigma
square equals eta naught by 2 times E P and this is an important property. So, please go
over this to understand this better and we will complete the derivation of this from this
probability density function the different probability density functions of r T
corresponding to plus A and minus A, we are going to compute the decision rule at the
output at the receiver and also what is the corresponding probability of error. So, let us
stop here.
Thank you.
108
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 10
Probability of Error in Digital Communication, Optimal Decision
Rule, Gaussian Q function
Hello. Welcome to another module in this massive open online course. So, we are looking
at the performance of a digital communication system and we have said that the output
statistic after sampling can be described as follows.
𝜂
𝐴𝐸𝑃 + 𝑛̃ ~ 𝑁(𝐴𝐸𝑃 , 𝐸𝑃 )
𝑟(𝑇) = { 2
𝜂
−𝐴𝐸𝑃 + 𝑛̃ ~ 𝑁(−𝐴𝐸𝑃 , 𝐸𝑃 )
2
109
(Refer Slide Time: 01:46)
𝜂𝐸
For simplicity in writing symbols we assume𝐴𝐸𝑃 = 𝐴̃ and 2𝑃 = 𝜎 2
̃ 𝜎 2 ) 𝑓𝑜𝑟 𝑎𝑜 = 𝐴
𝑁(𝐴,
𝑟(𝑇) {
̃ 𝜎 2 ) 𝑓𝑜𝑟 𝑎𝑜 = −𝐴
𝑁(−𝐴,
Also, from symmetry of the two functions we can say that intersection point will lie at
zero.
110
From the above figure (see pict. above) we can see that for r(T)>0 PDF of +A has higher
value and for r(T)<0 PDF of -A has higher value. Thus, we formulate the following
detection rule or the decision rule.
𝑟(𝑇) ≥ 0 ⇒ 𝑑𝑒𝑡𝑒𝑐𝑡 𝐴
{
𝑟(𝑇) < 0 ⇒ 𝑑𝑒𝑡𝑒𝑐𝑡 − 𝐴
Detection rule is simply a rule that helps us decide the received symbol from the sampled
output of the transmitted signal.
111
(Refer Slide Time: 11:37)
We can intuitively say that an error would occur when ao =A is transmitted but ao=-A is
detected. Or when 𝑟(𝑇) < 0 for 𝑎𝑜 = 𝐴 or vice versa.
112
(Refer Slide Time: 14:24)
𝑃𝑒 = 𝑃𝑟 (𝑛̃ ≥ 𝐴̃)
∞
= ∫ 𝑓𝑁 (𝑛̃)𝑑𝑛̃
𝐴̃
113
(Refer Slide Time: 16:55)
𝑛̃ 𝑛̃2
1 −
=∫ 𝑒 2𝜎2 𝑑𝑛
̃
𝐴̃ √2𝜋𝜎 2
𝑛̃
In the above expression we substitute, 𝜎 = 𝑛′ we get,
𝑛̃ (𝑛′)2
1 −
= ∫̃ 𝑒 2 𝑑𝑛′
𝐴 √2𝜋
𝜎
114
The above expression is simply the integration of a gaussian PDF with mean 0 and variance
1.
∞
1 2 /2
𝑄(𝑢) = ∫ 𝑒 −𝑡 𝑑𝑡
𝑢 √{2𝜋}
Q(u) is the probability that the standard gaussian random variable is greater than u.
115
Thus we have,
𝐴̃
𝑃𝑒 = 𝑄 ( )
𝜎
𝐴2 𝐸𝑃
= 𝑄(√ 𝜂 )
2
Thank you.
116
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 11
Introduction to Binary Phase Shift Keying (BPSK) Modulation, Optimal Decision
Rule and Probability of Bit-Error or Bit-Error Rate (BER)
Hello, welcome to another module in this massive open online course in this module. Let
us start looking at binary phase shift keying. So, we want to look at this digital modulation
scheme which is known as binary phase shift keying.
This scheme also known as BPSK is one of the many modulation schemes.
117
(Refer Slide Time: 01:18)
In this scheme the pulse shaping filter or the pulse is given by:
2
𝑝(𝑡) = √ cos(2𝜋𝑓𝑐 𝑡) 𝑓𝑜𝑟 0 ≤ 𝑡 ≤ 𝑇
𝑇
118
(Refer Slide Time: 02:58)
𝑘
𝑇= 𝑤ℎ𝑒𝑟𝑒 𝑘 𝑖𝑠 𝑎𝑛 𝑖𝑛𝑡𝑒𝑔𝑒𝑟
𝑓𝑐
For, k=2, the pulse will have two cycles in the symbol duration
119
(Refer Slide Time: 05:15)
For k=2, it can be shown that the pulse p(t) is normalized to unit energy.
120
(Refer Slide Time: 06:29)
∞
𝐸𝑃 = ∫ 𝑝2 (𝑡)𝑑𝑡
−∞
𝑇
2
=∫ cos 2 (2𝜋𝑓𝑐 𝑡)𝑑𝑡
0 𝑇
121
2 𝑇 1 + 𝑐𝑜𝑠4𝜋𝑓𝑐 𝑡
= ∫ 𝑑𝑡
𝑇 0 2
=1
Each a0 carries one bit of information. Throughout all schemes we use a constant energy
per bit.
122
(Refer Slide Time: 09:49)
123
(Refer Slide Time: 10:55)
1 1
𝑎𝑣𝑒𝑟𝑎𝑔𝑒 𝑒𝑛𝑒𝑟𝑔𝑦 = 𝐸𝑏 = 𝐴2 𝐸𝑃 + 𝐴2 𝐸𝑃
2 2
= 𝐴2 𝐸𝑃 = 𝐴2
𝐴 = √𝐸𝑏
124
(Refer Slide Time: 13:48)
2
𝑥(𝑡) = √𝐸𝑏 √ cos(2𝜋𝑓𝑐 𝑡) 0≤𝑡≤𝑇
𝑇
125
(Refer Slide Time: 15:05)
2𝐸𝑏
𝑥(𝑡) = −√ 𝑐𝑜𝑠2𝜋𝑓𝑐 𝑡
𝑇
2𝐸𝑏
=√ cos(2𝜋𝑓𝑐 𝑡 + 𝜋)
𝑇
126
(Refer Slide Time: 17:22)
Note that the waveforms for 0 and one are similar but are phase shifted by 𝜋. Thus, if we
phase shift bit 0 by 𝜋 we get 1 and shifting 1 by 𝜋 we get 0.
127
(Refer Slide Time: 20:20)
This is what gives the scheme its name. Since waveforms for different bits are phase
shifted from each other.
128
(Refer Slide Time: 21:42)
129
(Refer Slide Time: 23:11)
ℎ(𝜏) = 𝑝(𝑇 − 𝜏)
130
(Refer Slide Time: 24:23)
Here if the filter looses is not shifted by T, then it is no longer a causal filter. This makes
it important to shift the filter by T.
131
(Refer Slide Time: 28:41)
𝐴2 𝐸𝑃
𝑃𝑒 = 𝑄 (√ 𝜂 )
2
132
2𝐸𝑏
𝑃𝑒 = 𝑄(√ )
𝜂
This expression gives the probability of error with BPSK scheme and average energy per
bit Eb
133
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 12
Introduction to Amplitude Shift Keying (ASK) Modulation
Hello, welcome to another module in this massive open online course. So, in this module
let us start looking at another digital modulation scheme that is amplitude shift key.
So, previously we have seen phase binary phase shift keying, now, we will look at a
different modulation scheme which as amplitude shift key or ASK
134
(Refer Slide Time: 01:31)
The carrier pulse for this scheme the same as before i.e.,
2
𝑝(𝑡) = √ cos(2𝜋𝑓𝑐 𝑡) 𝑓𝑜𝑟 0 ≤ 𝑡 ≤ 𝑇
𝑇
135
(Refer Slide Time: 03:20)
𝑎𝑜 𝜖{0, 𝐴}
136
(Refer Slide Time: 05:04)
If ao =0 then the transmitted pulse p(t) for that symbol will also be zero. Thus the signal
will be “on” (or equal to p(t)) for some time (when ao=0) and it will be zero otherwise.
137
(Refer Slide Time: 07:53)
138
(Refer Slide Time: 09:09)
𝐴𝑝(𝑡)𝑖𝑓 𝑎𝑜 = 0, 𝑏𝑖𝑡 = 1
𝑥(𝑡) {
0 𝑖𝑓 𝑎𝑜 = 1, 𝑏𝑖𝑡 = 0
Each value has the probability half. Thus, the average energy per bit is given by,
1 2 1
𝑎𝑣𝑒𝑟𝑎𝑔𝑒 𝑒𝑛𝑒𝑟𝑔𝑦 𝑝𝑒𝑟 𝑏𝑖𝑡 = 𝐴 𝐸𝑃 + 0 = 𝐴2 𝐸𝑃
2 2
139
(Refer Slide Time: 10:31)
𝐴 = √2𝐸𝑏
So, this ensures energy per bit is constant across the schemes.
140
(Refer Slide Time: 12:07)
Now, we want to calculate what is the probability of the resulting probability of error for
this scheme or the probability of bit error.
141
(Refer Slide Time: 13:51)
= 𝐴𝑝(𝑡) + 𝑛(𝑡)
142
(Refer Slide Time: 15:53)
𝑟(𝑇) = 𝐴𝐸𝑝 + 𝑛̃
Where 𝑛̃ is the gaussian noise with mean zero and variance 𝜂/2 (for 𝐸𝑃 = 1)
143
(Refer Slide Time: 18:11)
144
(Refer Slide Time: 20:05)
For ao=A we have 𝑟(𝑇) = 𝐴𝐸𝑃 + 𝑛̃. Thus, the overall expression for r(T) is,
𝐴𝐸𝑃 + 𝑛̃ 𝑖𝑓 𝑎𝑜 = 𝐴
𝑟(𝑇) = {
𝑛̃ 𝑖𝑓 𝑎𝑜 = 0
However in this r T simply n tilde which means it is a Gaussian with mean equal to 0. The
variance is the same in both case.
145
(Refer Slide Time: 22:17)
So, if you look at this, this is not shifted to minus n plus a but it is something very different.
If you look at this, this is still two Gaussians, in first case the mean is 0 that is or in one
case the first case rather the mean is A times E p, this corresponds to the bit 0 and this
corresponds to the bit equals 1 and now you can see the midpoint is not 0, but by symmetry
again midpoint is A E p divided by 2.
So, this is the Gaussian corresponding to bit 1, this is the Gaussian for bit equal to 0 by
symmetry the midpoint is A E p divided by 2 which is the detection threshold which is
146
also the detection which is also your detection threshold. So, by symmetry the midpoint is
A E p divided by 2 which is also the detection threshold. Therefore, the optimal decision
rule is not to compare r T with 0.
So, the optimal decision rule is not to compare r T with 0, but to compare r T with this
threshold A E p divided by 2 which E p is 1. So, this is a by 2. If r T is greater than or
equal to A E p divided by 2 implies, this implies the decision is a naught equal to A.
147
On the other hand if r T is less than A E p divided by 2 decision is a naught equals 0. So,
this is basically our by symmetry this is our optimal decision rule where we have exploited
the symmetry in the system, alright.
So, basically what we have seen so far is we have started looking at different digital
modulation scheme that is amplitude shift keying in which the different waveforms shifted
in terms of amplitude. One of them is amplitude A, other has an amplitude 0. We have
seen; what is the amplitude level to keep the energy constant at E b or the energy per bit
constant at E b? Again we have seen what are the different received what is the received
symbol, what is the probability density function or what are the probability density
functions of the sample r T corresponding the transmission of the waveform for bit 0 and
the waveform corresponding to bit 1. We have seen that one is the Gaussian with mean A
E p where E A is amplitude level E p is the pulse energy other is an a Gaussian with mean
at 0. Therefore by symmetry the midpoint is at A E p divided by 2 which is the detection
threshold.
148
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 13
Optimal Decision Rule for Amplitude Shift Keying (ASK), Bit-Error Rate (BER)
and Comparison with Binary Phase Shift Keying (BPSK) Modulation
Hello, welcome to another module in this massive open online course. In the last module
we looked at optimal decision rule for ASK. In this module we will look at probability of
error in ASK.
𝐴𝐸𝑃 + 𝑛̃ 𝑖𝑓 𝑎𝑜 = 𝐴
𝑟(𝑇) = {
𝑛̃ 𝑖𝑓 𝑎𝑜 = 0
149
(Refer Slide Time: 02:17)
𝑟(𝑇) is simply a combination of two gaussian distributions one with mean 0 and other with
mean 𝐴𝐸𝑝 and both with variance 𝜂𝐸𝑝 /2.
By symmetry we concluded that midpoint (or intersection point) of these two will lie on
𝐴𝐸𝑃 /2.
150
(Refer Slide Time: 03:22)
𝐴𝐸𝑃
𝐷𝑒𝑐𝑖𝑑𝑒 𝑎𝑜 = 𝐴 𝑖𝑓 𝑟(𝑇) ≥
{ 2
𝐴𝐸𝑃
𝐷𝑒𝑐𝑖𝑑𝑒 𝑎𝑜 = 0 𝑖𝑓 𝑟(𝑇) ≤
2
151
𝐴𝐸𝑃
𝑟̃ (𝑇) = 𝑟(𝑇) −
2
𝐴𝐸𝑃
+ 𝑛̃ 𝑖𝑓 𝑎𝑜 = 𝐴
𝑟̃ (𝑇) = { 2
𝐴𝐸𝑃
− + 𝑛̃ 𝑖𝑓 𝑎𝑜 = 0
2
Now you can see there is symmetry, it is symmetric about 0. So, now, you can see this is
similar to BPSK; similar to BPSK with A E p replaced by A E p by this is similar to BPSK
with A E p replaced by A E p by 2. Therefore, again now we have something that similar
to BPSK that is A E p by 2 plus n tilde minus A E p by 2 plus n tilde. Remember for BPSK
binary phase shift keying we had A E p by A E p plus n tilde minus A E p plus n tilde
therefore, in that sense this is similar to binary phase shift keying with A E p replaced by
A E p divided by 2.
152
(Refer Slide Time: 06:15)
Hence, the probability of error or bit error rate in ASK will be given by an expression
similar to BPSK with 𝐴𝐸𝑃 replaced by 𝐴𝐸𝑃 /2.
153
(Refer Slide Time: 08:38)
𝐴2 𝐸𝑃
𝑃𝑒 = 𝑄 (√ )
2𝜂
154
𝐸𝑏
𝑃𝑒 = 𝑄 (√ )
𝜂
Now, substitute E p equal to 1 that is pulse normalized to unit energy and remember we
have calculated in amplitude shift keying we have said that for average bit energy E b A
square must be equal to 2 E b. So, substituting a square equals 2 E b we get P e equals Q
square root of A square is 2 E b in to E p is 1 divided by 2 n naught divided by 2 n naught
which is equal to Q square root of E b over N naught that is your probability of error.
155
(Refer Slide Time: 11:04)
𝐸𝑏
𝑃𝑒,𝐴𝑆𝐾 = 𝑄 (√ )
𝜂
2𝐸𝑏
𝑃𝑒,𝐵𝑃𝑆𝐾 = 𝑄 (√ )
𝜂
156
Here, it is important to remember that Q function is a decreasing function as it is the
integration of the tail of a gaussian from its argument.
Thus higher the argument of the tail Q function, lower its value.
Thus for the same value of average energy per bit, we have,
𝑃𝑒,𝐵𝑃𝑆𝐾 ≤ 𝑃𝑒,𝐴𝑆𝐾
157
(Refer Slide Time: 16:36)
1
√ 𝐸𝑏 𝐸𝑏
𝑄 ( 2. 2 ) = 𝑄 (√ )
𝜂 𝜂
158
(Refer Slide Time: 18:43)
For same BER, BPSK needs half the average bit energy compared to that of ASK.
1
𝐸𝑏,𝐵𝑃𝑆𝐾 = 𝐸𝑏,𝐴𝑆𝐾
2
1
⇒ 10 log10 𝐸𝑏,𝐵𝑃𝑆𝐾 = 10 log10 + 10 log10 𝐸𝑏,𝐴𝑆𝐾
2
159
(Refer Slide Time: 20:04)
Or,
160
(Refer Slide Time: 21:57)
i.e., for same BER BPSK requires 3dB lower average energy per bit.
So, we will stop here and continue with other aspects in the subsequent modules.
161
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 14
Introduction to Signal Space Concept, Orthonormal Basis Signals
Hello, welcome to another module in this massive open online course. So, in today’s
module this module let us start looking at another concept in digital communication that
is a very important concept and in fact also one of the most fundamental frameworks which
is termed as a signal space. Slightly abstract but has several applications especially in
digital communications.
162
(Refer Slide Time: 01:06)
Consider two pulses 𝑃1 (𝑡) and 𝑝2 (𝑡) such that they are orthogonal to each other and have
unit energy.
163
(Refer Slide Time: 02:04)
i.e.,
∞ ∞
∫ 𝑝12 (𝑡)𝑑𝑡 = ∫ 𝑝22 (𝑡)𝑑𝑡 = 𝐸𝑃 = 1
−∞ −∞
∞
∫ 𝑝1 (𝑡)𝑝2 (𝑡)𝑑𝑡 = 0
−∞
164
(Refer Slide Time: 04:08)
165
(Refer Slide Time: 06:07)
A signal space is simply similar to a vector , a space that is formed by a linear combination.
𝛼1 𝑝1 (𝑡) + 𝛼2 𝑝2 (𝑡)
166
(Refer Slide Time: 07:02)
All signals which can be represented by above forms will form the signal space S of 𝑝1 (𝑡)
and 𝑝2 (𝑡). Let us consider an example.
2
𝑝1 (𝑡) = √ cos(2𝜋𝑓1 𝑡) 0≤𝑡≤𝑇
𝑇
2
𝑝2 (𝑡) = √ cos(2𝜋𝑓2 𝑡) 0 ≤ 𝑡 ≤ 𝑇
𝑇
167
(Refer Slide Time: 09:22)
Here we assume that T contains integer number of cycles of both 𝑓1 and 𝑓2 . i.e.,
𝑘1 𝑘2
𝑇= = (𝑤ℎ𝑒𝑟𝑒 𝑘1 ≠ 𝑘2 )
𝑓1 𝑓2
We can assume that 𝑓1 and 𝑓2 are the multiples of the same fundamental frequency 𝑓𝑜 .
𝑘1
𝑓1 = = 𝑘1 𝑓𝑜
𝑇
168
𝑘2
𝑓2 = = 𝑘2 𝑓𝑜
𝑇
Where,
1
𝑓𝑜 =
𝑇
169
∞
𝐸1 = ∫ 𝑝12 (𝑡)𝑑𝑡
−∞
𝑇
2
=∫ cos 2 (2𝜋𝑓1 𝑇)𝑑𝑡
0 𝑇
2 𝑇 1 + cos(4𝜋𝑓1 𝑡)
= ∫ 𝑑𝑡
𝑇 0 2
= 1 = 𝐸𝑃
170
(Refer Slide Time: 13:49)
∞
𝐸2 = ∫ 𝑝22 (𝑡)𝑑𝑡 = 1 = 𝐸𝑝
−∞
171
(Refer Slide Time: 14:38)
∞
2 𝑇
∫ 𝑝1 (𝑡)𝑝2 (𝑡)𝑑𝑡 = ∫ cos(2𝜋𝑓1 𝑡) cos(2𝜋𝑓2 𝑡) 𝑑𝑡
−∞ 𝑇 0
1 𝑇
= ∫ {cos(2𝜋(𝑓1 + 𝑓2 ) 𝑡) + cos(2𝜋(𝑓1 − 𝑓2 )𝑡)}𝑑𝑡
𝑇 0
172
(Refer Slide Time: 16:07)
Replacing 𝑓1 and 𝑓2 by 𝑘𝑓1 and k𝑓2 we get and integrating from 0 to T we get
So, this is equal to I can write this as 1 over T sin 2 pi F 1 plus F 2 times T divided by 2 pi
times F 1 plus F 2 evaluated between the limits 0 to T plus 1 over T sin 2 pi F 1 minus F 2
times T divided by 2 pi times F 1 minus F 2 0 to T. We already seen F 1 plus F 2 is k times
F naught.
173
(Refer Slide Time: 17:31)
= sin 2π(𝑘1 − 𝑘2 ) 𝑓𝑜 𝑇 = 0
174
(Refer Slide Time: 20:05)
𝑝1 and 𝑝2 are hence, orthogonal and have unit energy, thus they will form the basis of a
signal space.
175
(Refer Slide Time: 21:33)
2 2
𝛼1 𝑝1 (𝑡) + 𝛼2 𝑝2 (𝑡) = 𝛼1 √ cos(2𝜋𝑓1 𝑡) + 𝛼2 √ cos(2𝜋𝑓2 𝑡)
𝑇 𝑇
In the next modules we will try to build modulation scheme using this vector space.
176
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 15
Introduction to Frequency Shift Keying (FSK)
Hello, welcome to another module in this massive open online course. In this module, let
us look at another digital modulation technique that is frequency shift key.
So, we will be looking at a new digital modulation scheme which is termed as frequency
shift key. This is a digital modulation scheme as we have already said and it is based on
the signal space concept that we have previously described.
177
(Refer Slide Time: 01:14)
Suppose we have two orthonormal pulses as discussed in the previous lecture,𝑝1 (𝑡) and
𝑝2 (𝑡).
2
𝑝1 (𝑡) = √ cos(2𝜋𝑓1 𝑡) 0≤𝑡≤𝑇
𝑇
2
𝑝2 (𝑡) = √ cos(2𝜋𝑓2 𝑡) 0≤𝑡≤𝑇
𝑇
178
(Refer Slide Time: 02:07)
Both 𝑝1 (𝑡) and 𝑝2 (𝑡) have unit energy and are orthogonal to each other.
179
(Refer Slide Time: 04:38)
2
𝐴𝑝1 (𝑡) = 𝐴√ cos(2𝜋𝑓1 𝑡) 𝑓𝑜𝑟 𝑏𝑖𝑡 0
𝑇
𝑥(𝑡) =
2
𝐴𝑝2 (𝑡) = 𝐴√ cos(2𝜋𝑓2 𝑡) 𝑓𝑜𝑟 𝑏𝑖𝑡 1
{ 𝑇
180
Note here that both 𝑝1 (𝑡) and 𝑝2 (𝑡) differ in frequency.
One waveform can be obtained from other by shifting the frequency. Hence, this scheme
is known as frequency shift keying or FSK.
Assuming both 0 and 1 appear with probability half each, we get the average energy per
bit,
181
1 2 1
𝐴𝑣𝑒𝑟𝑎𝑔𝑒 𝑒𝑛𝑒𝑟𝑔𝑦 𝑝𝑒𝑟 𝑏𝑖𝑡 = 𝐴 𝐸𝑃 + 𝐴2 𝐸𝑃 = 𝐸𝑏
2 2
Assuming 𝐸𝑝 = 1 we get,
𝐴 = √𝐸𝑏
182
2𝐸
√ 𝑏 cos(2𝜋𝑓1 𝑡) 𝑓𝑜𝑟 𝑏𝑖𝑡 0
𝑇
𝑥(𝑡) =
2𝐸
√ 𝑏 cos(2𝜋𝑓1 𝑡) 𝑓𝑜𝑟 𝑏𝑖𝑡 1
{ 𝑇
183
(Refer Slide Time: 11:52)
184
(Refer Slide Time: 13:39)
Now that we have two types of signal, we need to find out how to match the receive filter.
𝑝1 (𝑡) + 𝑝2 (𝑡)
𝑦̃(𝑡) = 𝑦(𝑡) − 𝐴 ( )
2
185
(Refer Slide Time: 15:41)
𝑝1 (𝑡) − 𝑝2 (𝑡)
= 𝐴( ) + 𝑛(𝑡)
2
= 𝐴 𝑝̃(𝑡) + 𝑛(𝑡)
186
(Refer Slide Time: 17:10)
𝑝1 (𝑡) + 𝑝2 (𝑡)
𝑦̃(𝑡) = 𝑦(𝑡) − 𝐴 ( )
2
= −𝐴 𝑝̃(𝑡) + 𝑛(𝑡)
187
(Refer Slide Time: 18:33)
Thus,
188
(Refer Slide Time: 20:15)
189
(Refer Slide Time: 21:32)
The factor of two is simply a constant and will not affect the SNR.
ℎ(𝜏) = 𝑝1 (𝑇 − 𝜏) − 𝑝2 (𝑇 − 𝜏)
190
(Refer Slide Time: 22:44)
2 2
= √ cos(2𝜋𝑓1 (𝑇 − 𝜏)) − √ cos(2𝜋𝑓2 (𝑇 − 𝜏))
𝑇 𝑇
Thus, we have derived the receive filter for optimal SMR, in the next module we will
derive the SNR and bit error rate for this scheme.
191
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 16
Optimal Decision Rule for FSK, Bit-Error Rate (BER) and Comparison with
BPSK, ASK
Hello. Welcome to another module in this massive open online course. So, we are looking
at frequency shift keying all right. And in frequency shift keying well we are considering
2 different wave forms, to represent the information bit 0 and one which is shifted in
frequency correct.
192
(Refer Slide Time: 01:23).
ℎ(𝑡) = 𝑝1 (𝑇 − 𝑡) − 𝑝2 (𝑇 − 𝑡)
193
(Refer Slide Time: 03:02)
∞
𝑟(𝑇) = ∫ 𝑦(𝜏)ℎ(𝑇 − 𝜏)𝑑𝜏
−∞
∞
= ∫ (𝐴𝑝1 (𝜏) + 𝑛(𝜏)) × (𝑝1 (𝜏) − 𝑝2 (𝜏))𝑑𝜏
−∞
194
∞ ∞
= ∫ 𝐴𝑝1 (𝜏)(𝑝1 (𝜏) − 𝑝1 (𝜏))𝑑𝜏 + ∫ 𝑛(𝜏)(𝑝1 (𝜏) − 𝑝2 (𝜏))𝑑𝜏
−∞ −∞
The first term in the above expression is the signal component and the second term are the
noise component.
195
∞
𝑠𝑖𝑔𝑛𝑎𝑙 = ∫ 𝐴𝑝1 (𝜏)(𝑝1 (𝜏) − 𝑝2 (𝜏))𝑑𝜏
−∞
∞ ∞
=∫ 𝐴𝑝12 (𝜏)𝑑𝜏 − ∫ 𝐴𝑝1 (𝜏)𝑝2 (𝜏)𝑑𝜏
−∞ −∞
= 𝐴𝐸𝑃
∞
𝑛̃ = ∫ 𝑛(𝜏)(𝑝1 (𝜏) − 𝑝2 (𝜏))𝑑𝜏
−∞
196
(Refer Slide Time: 08:15)
We know from our previous discussions the above expression will be a gaussian with mean
zero (since n(t) is a gaussian with mean zero).
𝜂 ∞
𝑣𝑎𝑟 = ∫ |ℎ(𝜏)|2 𝑑𝜏
2 −∞
197
𝜂 ∞ 2
= ∫ (𝑝1 (𝜏) − 𝑝2 (𝜏)) 𝑑𝜏
2 −∞
𝜂 ∞ 2
= ∫ (𝑝1 (𝜏) + 𝑝22 (𝜏))𝑑𝜏
2 −∞
Since the signals are orthonormal the last term 2𝑝1 (𝑡)𝑝2 (𝑡) will be zero.
=𝜂
198
This is the noise power after sampling
Corresponding to bit 1,
𝑟(𝑇) = −𝐴𝐸𝑃 + 𝑛̃
199
= −𝐴 + 𝑛̃
𝐴 + 𝑛̃ 𝑓𝑜𝑟 𝑏𝑖𝑡 0
𝑟(𝑇) = {
−𝐴 + 𝑛̃ 𝑓𝑜𝑟 𝑏𝑖𝑡 1
The above expression is similar to BPSK except 𝜎 2 = 𝜂 the noise term is 𝜂/2 for BPSK.
By symmetry we can say that the optimal decision rule will be,
200
𝑟(𝑇) ≥ 0 ⇒ 𝑑𝑒𝑐𝑖𝑑𝑒 𝑏𝑖𝑡 0
{
𝑟(𝑇) < 0 ⇒ 𝑑𝑒𝑐𝑖𝑑𝑒 𝑏𝑖𝑡 1
𝐴
𝐵𝐸𝑅 = 𝑄 ( )
𝜎
201
𝐴2
= 𝑄 (√ )
𝜂
Or,
𝐸𝑏
𝑃𝑒 = 𝑄 (√ )
𝜂
202
Hence BER for FSK is equal to BER for ASK (for equal average energy per bit). And both
of these are greater than BER of BPSK.
We need twice the average bit energy for ASK and FSK to achieve the same BER as
BPSK.
Or we can say ASK and FSK need 3dB more power to achieve the same BER as BPSK
So, BPSK is the amongst these 3 BPSK binary phase shift keying is the most efficient
modulations scheme and the reason for that if you explore it will be because binary phase
shift keying for the same average bit error rate uses antipodal signalling that is it uses
signals plus A and minus A all right which maximises the distance between the
constipation points.
You can note this in formally all though we have not shown this rigorously it maximises
the distance between the conciliation points for the same average bit error rate. While both
amplitude shift keying and frequency shift keying do not do this
203
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 17
Introduction to Quadrature Phase Shift Keying (QPSK)
Hello. Welcome to another module in this massive open online course. So, in the previous
module, we have looked at different digital modulation schemes. In this module we will
start looking at another digital modulation scheme that is quadrature phase shift keying or
QPSK.
204
(Refer Slide Time: 01:16)
2
𝑝1 (𝑡) = √ cos(2𝜋𝑓𝑐 𝑡) 𝑓𝑜𝑟 0 ≤ 𝑡 ≤ 𝑇
𝑇
205
2
𝑝2 (𝑡) = √ sin(2𝜋𝑓𝑐 𝑡) 𝑓𝑜𝑟 0 ≤ 𝑡 ≤ 𝑇
𝑇
Where 𝑡 = 𝑘/𝑓𝑐 .
Observe that 𝑝2 (𝑡) is a phase shifted of version of 𝑝1 (𝑡). With a phase shift of 𝜋/2.
206
Also both of these have unit energy (this we can say from previous modules).
i.e.,
𝐸𝑝 = 1 for p1 and p2
207
(Refer Slide Time: 07:35)
∞
∫ 𝑝1 (𝑡)𝑝2 (𝑡)𝑑𝑡
−∞
208
(Refer Slide Time: 08:19)
2 ∞
= ∫ sin(2𝜋𝑓𝑐 𝑡) cos(2𝜋𝑓𝑐 𝑡) 𝑑𝑡
𝑇 −∞
1 𝑇
= ∫ sin(4𝜋𝑓𝑐 𝑡) 𝑑𝑡
𝑇 0
=0
209
Thus, the inner product of the two pulses is zero. They are then orthonormal pulses (unit
energy + orthogonal)
Note that these two basis signals are similar to those in Quadrature carrier multiplexing,
in which the signals are modulated over 𝑚1 cos(2𝜋𝑓𝑐 𝑡) − 𝑚2 sin(2𝜋𝑓𝑐 𝑡)
210
(Refer Slide Time: 12:31)
211
(Refer Slide Time: 16:11)
It is possible to have, 𝑎1 = ±𝐴 and 𝑎2 = ±𝐴, with average bit energy=𝐸𝑏 . Thus, 𝐴 = √𝐸𝑏 .
212
(Refer Slide Time: 18:22)
Here we see that a signals has two orthogonal pulses, thus each can carry 2 bits of
information in one symbol cycle, whereas in previous schemes only one symbol per cycle
could be transmitted.
213
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 18
Waveforms of Quadrature Phase Shift Keying (QPSK)
Hello. Welcome to another module in this massive open online course. So, we are looking
at a different digital modulation scheme that is Quadrature Phase Shift Keying.
214
(Refer Slide Time: 00:56)
215
2 2
𝑥(𝑡) = 𝐴√ cos(2𝜋𝑓𝑐 𝑡) + 𝐴√ sin(2𝜋𝑓𝑐 𝑡)
𝑇 𝑇
2 1 1
=𝐴 { cos(2𝜋𝑓𝑐 𝑡) + sin(2𝜋𝑓𝑐 𝑡)}
√𝑇 √2 √2
2𝐴 𝜋
= cos (2𝜋𝑓𝑐 𝑡 − )
√𝑇 4
216
(Refer Slide Time: 04:40)
Similarly the second waveform (𝐴𝑝1 (𝑡) − 𝐴𝑝2 (2)) we can write,
2𝐴 𝜋
= cos (2𝜋𝑓𝑐 𝑡 + )
√𝑇 4
The third waveform (−𝐴𝑝1 (𝑡) + 𝐴𝑝2 (𝑡)) can be written as,
2𝐴 3𝜋
= cos (2𝜋𝑓𝑐 𝑡 + )
√𝑇 4
217
And the fourth, (−𝐴𝑝1 (𝑡) − 𝐴𝑝2 (𝑡)),
2𝐴 3𝜋
= cos (2𝜋𝑓𝑐 𝑡 + )
√𝑇 4
2𝐴 𝜋
cos (2𝜋𝑓𝑐 𝑡 − )
√𝑇 4
2𝐴 𝜋
cos (2𝜋𝑓𝑐 𝑡 + )
4
𝑥(𝑡) = √𝑇
2𝐴 3𝜋
cos (2𝜋𝑓𝑐 𝑡 + )
√𝑇 4
2𝐴 5𝜋
cos (2𝜋𝑓𝑐 𝑡 + )
{√𝑇 4
218
(Refer Slide Time: 08:53)
Note here that each consecutive waveform is shifted from the other by a phase difference
of 𝜋/2. Also the last waveform is shifted form the first with a phase shift of, 𝜋/2. Each
waveform is shifted form it’s neighbouring by a phase of 𝜋/2 or 90𝑜 or a quadrature. This
is the reason why, this scheme is known as Quadrature phase shift keying.
219
(Refer Slide Time: 13:01)
These different waveforms can be represented using a phasor diagram. The four signals
can be represented on a circle of radius 2𝐴/√𝑇 (see figure above.). Each is shifted from
its consecutive by a phase of 𝜋/2. The points at an angle of (from the positive x-axis),
𝜋/4, 3𝜋/4, 5𝜋/4 (-3𝜋/4) and 7𝜋/4 (−𝜋/4).
220
(Refer Slide Time: 16:35)
The four points will be, (A,A), (-A,A), (-A,-A) and (A,-A).
In the subsequent week, we will look into matched filter design for QPSK and its BER
221
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 19
Matched Filtering, Bit-Error Rate and Symbol Error Rate for
Quadrature Phase Shift Keying (QPSK)
Hello. Welcome to another module in this massive open online course. So, we are looking
at quadrature phase shift keying and we have looked at several aspects first how to
construct the modulated signal, what are the different pulse wave forms in quadrature shift
keying correct, what is the property that they satisfy that is these different waveforms are
phase shifted from each other by 𝜋/2. Now let us look at the detection the processing at
the receiver and the resulting performance in terms of it is bit error rate.
222
(Refer Slide Time: 01:13)
We have 𝑥(𝑡) = 𝑎1 𝑝1 (𝑡) + 𝑎2 𝑝2 (𝑡). Where, 𝑝1 (𝑡) and 𝑝2 (𝑡) are independent symbols.
Thus, the optimal receive filter for each will be 𝑝1 (𝑇 − 𝜏) and 𝑝2 (𝑇 − 𝜏).
So, this will give us A 1 times E P where E P is the pulse energy which is equal to 1 plus
noise and tilde and this will give us A 2 times E P. So, when you match filter with respect
to P 2 the component with respect to component along the signal space along the signal
space direction P 1 T, that is component along pulse P 1 T will vanish and we get A 2
223
times E p, A 2 times that is the A 2 is a component along the signal space function P 2 or
the signal space pulse P 2 T E P is 1 plus the noise into tilde.
For instance, let us take. So, what we are saying is that basically we match filter with
respect to both. So, first we match filter with respect to P 1 that is the signal space basis
signal P 1 T alright. To make a decision with respect to A 1 alright and make a decision
with respect to the transmitted signal A 2 we match filter with respect to P 2 that is a pulse
P 2 T which is another orthogonal basis function for the signal space.
Now, let us look at how this operates. So, matched filter with respect to. So, when we
match filter for instance. So, let us say we match filter with respect to match filter with
respect to h t equals P 1 T minus tau.
Where, n(t) is the white noise with zero mean and variance 𝑁𝑜 𝛿(𝜏)/2.
224
(Refer Slide Time: 05:02)
∞
= ∫ 𝑎1 𝑝1 (𝜏) + 𝑎2 𝑝2 (𝜏)ℎ1 (𝑡 − 𝜏)𝑑𝜏
−∞
225
∞
= ∫ 𝑎1 𝑝1 (𝜏) + 𝑎2 𝑝2 (𝜏)𝑝1 (𝑡 + 𝜏 − 𝑇)𝑑𝜏
−∞
∞
∫ (𝑎1 𝑝1 (𝜏) + 𝑎2 𝑝2 (𝜏))𝑝1 (𝜏)𝑑𝜏
−∞
226
(Refer Slide Time: 10:09)
The inner product of 𝑝1 (𝑡) and 𝑝2 (𝑡) will be zero, leaving us with inner product of p1 with
p1 which is energy of the pulse 𝐸𝑃 . Thus, we have,
= 𝑎1 𝐸𝑝 = 𝑎1 (𝑠𝑢𝑏𝑠𝑡𝑖𝑡𝑢𝑡𝑖𝑛𝑔 𝐸𝑝 = 1)
227
(Refer Slide Time: 13:11)
𝑟1 (𝑇) = 𝑎1 + 𝑛
̃1
Where, 𝑎1 will be ±𝐴 (where 𝐴 = √𝐸𝑏 ). 𝐸𝑏 being the average energy per bit.
228
By symmetry the optimal detection rule will be,
Similarly, for the second pulse (symbol), we use the match filter ℎ2 (𝜏) on 𝑥(𝑡).
229
(Refer Slide Time: 16:59)
𝑦(𝑡) = 𝑎2 + 𝑛
̃2
Schematically the QPSK scheme can be represented by two independent paths for 𝑝1 and
𝑝2 (see figure above). The paths have received filter ℎ1 (𝑡) and ℎ2 (𝑡), both are sampled at
t=T. and the output of the samplers are 𝑟1 (𝑇) and 𝑟2 (𝑇).
230
(Refer Slide Time: 21:00)
The output of each sampler is similar to BPSK. Both 𝑎1 and 𝑎2 can be ±𝐴.
The bit error rate (BER) of each component will be similar to BPSK.
231
(Refer Slide Time: 22:08)
2𝐸𝑏
𝑝𝑒1 = 𝑝𝑒2 = 𝑝𝑒 = 𝑄 (√ )
𝜂
232
(Refer Slide Time: 24:33)
233
(Refer Slide Time: 26:37)
Probability that the symbol is received correctly in any channel is given by,
𝑝𝑐 = 1 − 𝑝𝑒
Probability that both the channels receive the correct bit will be given by,
234
(Refer Slide Time: 27:54)
235
(Refer Slide Time: 29:54)
= 2𝑝𝑒 − 𝑝𝑒2
236
(Refer Slide Time: 30:53)
2𝐸𝑏 2𝐸𝑏
𝑃𝑒 = 2𝑄 (√ ) − 𝑄 2 (√ )
𝜂 𝜂
√2𝐸𝑏
𝑄( )≪1
𝜂
237
2𝐸𝑏 2𝐸𝑏
⇒ 𝑄 2 (√ ) ≪ 𝑄 (√ )
𝜂 𝜂
2𝐸𝑏
𝑃𝑒 =
̃ 2𝑄 (√ )
𝜂
So, basically roughly this means there are 2 bit is each bit can be in error, bit error rate is
twice roughly twice that of BPSK although it is not exactly true because P e of because
using the prob using the results from probability theory, we know that probability of A
union B is probability of A plus probability of B minus probability of A intersection B. In
fact, that is what the square term represents, but since the square term is much smaller, we
can neglect it at high SNR, and we can simply write it as twice Q of square root of 2 E b
over N naught. That is the overall probability of symbol error for this QPSK modulation
scheme with average bit energy E b alright.
So, we will stop here and explore other schemes in subsequent modules.
238
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 20
Introduction to M-ary PAM (Pulse Amplitude Modulation), Average
Symbol Power and Decision Rules
Hello. Welcome to another module. So, in this module let us start looking at M-ary another
digital modulation scheme based on M levels termed as M-ary pulse amplitude
modulation.
239
(Refer Slide Time: 01:39)
𝑥(𝑡) = 𝑎𝑜 𝑝(𝑡)
240
Example if we have 4 levels we need 2 bits for representation, each level can be
represented by one of (00, 01, 10 or 11).
For further discussion, let us take an example of 8 level system. And let the 8 possible
values be ,
Here all the levels in the constellation are separated by 2A. So, this is a M-ary that is 8 ary
PAM pulse amplitude modulated constellation.
241
(Refer Slide Time: 06:15)
𝑀
𝑎𝑜 𝜖 ± (2𝑖 + 1) 𝑖 = 0,1,2, … … . , −1
2
242
(Refer Slide Time: 07:51)
𝑀
−1
2
1
𝐸𝑠 = ∑ (2𝑖 + 1)2 𝐴2 𝑎𝑜 = (2𝑖 + 1)𝐴
𝑀
2 𝑖=0
Here, by symmetry we can say that +𝐴 and −𝐴. Will have the same energy so average can
be calculated by taking one side itself.
243
𝑀
−1
2 2
2𝐴
= ∑ (4𝑖 2 + 4𝑖 + 1)
𝑀
𝑖=0
Using the formula for sum of n integers and sum of square of n integers we can simplify
the above expression.
244
(Refer Slide Time: 13:12)
𝐴2
= (𝑀2 − 1)
3
𝐴2
𝐸𝑠 = (𝑀2 − 1)
3
245
(Refer Slide Time: 15:01)
3𝐸𝑠
𝐴=√ 2
𝑀 −1
246
(Refer Slide Time: 16:37)
2 𝑘
𝑝(𝑡) = √ cos(2𝜋𝑓𝑐 𝑡) 𝑓𝑜𝑟 0 ≤ 𝑡 ≤
𝑇 𝑓𝑐
Where k is an integer.
247
(Refer Slide Time: 18:04)
ℎ(𝑡) = 𝑝(𝑇 − 𝑡)
We can say this because for the matched filter principle we did not assume any specific
modulation scheme or number of levels, only the pulse shape. Thus, for M-ary scheme the
matched filter will be given by the above expression.
248
Output after sampling from the matched filter will be given by,
𝑟(𝑇) = 𝑎𝑜 + 𝑛̃ (𝑎𝑠𝑠𝑢𝑚𝑖𝑛𝑔 𝐸𝑝 = 1)
𝜂𝐸𝑝
Where 𝑛̃ is the additive white gaussian noise process with mean zero and variance =
2
𝜂/2.
Let us derive the decision rule now. For the symbol A the decision rule will be,
𝑖𝑓 0 ≤ 𝑟(𝑇) ≤ 2𝐴 ⇒ 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝐴
𝑖𝑓 2𝐴 ≤ 𝑟(𝑇) ≤ 4𝐴 ⇒ 𝑑𝑒𝑐𝑖𝑠𝑖𝑜𝑛 𝐴
Thus, here instead of a single boundary of decision we will have two boundaries for each
level and these will be at the middle points at each level.
249
(Refer Slide Time: 26:18)
The overall decision rule will be a combination of M different cases, one for each level
So, if R t 0 less than equal to R t less than equal to I am sorry again not 0, this is if 0 less
if 2 A less than equal to R t less than equal to 4 a implies decision equals 3 A.
Such decision rule is also known as the nearest neighbor rule as the decision for a symbol
is based on the level nearest to the sampled or received signal mean.
250
The boundary points however will have a slightly different rule as they only have one
neighbor.
Thank you.
251
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 21
M-ary PAM (Pulse Amplitude Modulation) - Part II, Optimal Decision Rule,
Probability of Error
Hello, welcome to another module in this massive open online course. So, we are looking
at M-ary PAM: that is M-ary pulse amplitude modulation.
We looked at the optimal decision rule for M-ary PAM scheme, and concluded that for
interior points, the decision is based on nearest neighbour rule.
252
(Refer Slide Time: 01:38)
Nearest neighbour rule means 𝑎𝑜 is decided to be the nearest neighbour of the detected
(output after the filter) value of 𝑟(𝑇).
At he two end points however the situation is slightly different. The end points only have
one neighbour and are given by, −(𝑀 − 1)𝐴 and (𝑀 − 1)𝐴. Thus we can form the
decision rule for end points,
253
(Refer Slide Time: 03:14)
Together with the previously discussed decision rules for interior points these form the
complete set of decision rules.
254
(Refer Slide Time: 06:02)
255
We will now calculate the probability of error.
𝑎𝑜 = 𝐴
Thus, we will have, 0 ≤ 𝑟(𝑇) < 2𝐴. Error occurs if 𝑎𝑜 = 𝐴 but 𝑟(𝑇) ≥ 2𝐴 or 𝑟(𝑇) < 0
256
(Refer Slide Time: 12:13)
Since the above two events are mutually exclusive, we can write,
𝑟(𝑇) = 𝑎𝑜 + 𝑛̃
257
= 𝐴 + 𝑛̃
𝑟(𝑇) ≥ 2𝐴
⇒ 𝑛̃ ≥ 𝐴
And
258
Thus, the probability of error is given by,
Now, 𝑛̃ is a gaussian with zero mean thus is symmetric around zero (or it is an even
function). Thus, the two probabilities in the above sum will be equal. i.e.,
Here we have assumed that the probability of a single point is zero (𝑃𝑟 (𝑛̃ = 𝐴) = 0)
259
(Refer Slide Time: 18:30)
𝐴2
= 2𝑄(√ 𝜂 )
2
260
(Refer Slide Time: 21:32)
Q function is the tail integral of a gaussian or the probability that value of a standard
gaussian is greater than x,
261
(Refer Slide Time: 23:22)
The same can be said for all the interior points. Now, considered the exterior points.
𝑎𝑜 = (𝑀 − 1)𝐴
𝑟(𝑇) = (𝑀 − 1)𝐴 + 𝑛̃
262
(Refer Slide Time: 25:05)
⇒ 𝑛̃ < −𝐴
Thus,
263
(Refer Slide Time: 26:26)
Thus,
𝐴
𝑃𝑒 = 𝑄 ( )
𝜎
264
(Refer Slide Time: 27:53)
1
Pr(𝑎𝑜 = ±(2𝑖 + 1)𝐴) =
𝑀
2
Pr(𝐸𝑛𝑑 𝑝𝑜𝑖𝑛𝑡) =
𝑀
𝑀−2
Pr(𝐼𝑛𝑡𝑒𝑟𝑖𝑜𝑟 𝑝𝑜𝑖𝑛𝑡) =
𝑀
265
𝑃𝑒 = Pr(𝑒|𝑒𝑛𝑑𝑝𝑜𝑖𝑛𝑡) Pr(𝐸𝑛𝑑𝑝𝑜𝑖𝑛𝑡) + Pr(𝑒|𝑖𝑛𝑡𝑒𝑟𝑖𝑜𝑟 𝑝𝑜𝑖𝑛𝑡) Pr(𝑖𝑛𝑡𝑒𝑟𝑖𝑜𝑟 𝑝𝑜𝑖𝑛𝑡)
𝐴 2 𝐴 𝑀−2
= 𝑄 ( ) + 2𝑄 ( )
𝜎 𝑀 𝜎 𝑀
1 𝐴
= 2 (1 − )𝑄 ( )
𝑀 𝜎
𝜂 3𝐸𝑠
𝜎= 𝑎𝑛𝑑 𝐴 = √ 2
2 𝑀 −1
266
(Refer Slide Time: 32:01)
Thus,
1 6𝐸𝑠
𝑃𝑒 = 2 (1 − ) 𝑄 (√ )
𝑀 𝜂(𝑀2 − 1)
267
No. of bits per symbol is given by log 2 𝑀.
In this modulation scheme by loading a large number of bits per symbol, we can increase
the overall data transfer rate without increasing the symbol rate. This is extremely useful
in modern high speed communications like 4G and 5G.
So, average probability of error substituting this expression for both A and n naught by 2
that is twice 1 minus 1 by M Q square root of A square which is 3 E s divided by n square
minus 1 divided by n naught by 2. So, that will be 6, E s divided by n naught into M square
minus 1. So, this is basically the average probability of error for M-ary PAM it is slightly
elaborate, but it is a very interesting derivation it is one of the fundamental expressions
because M-ary PAM is a very general digital modulation scheme.
So, this is let me describe a little bit more about it, now if you substitute M equal to 2 if
you substitute M equal to 2, you can see that probability of error will be twice 1 minus 1
over twice 1 minus half is basically half twice into 2 into half is 1. So, this reduces to Q
square root of well n square minus 1 is M equal to 2. So, M square minus 1 is 3. So, six
divided by 3 is 2 twice E s by n naught which is same as what we had before what we had
before is E b Q of square root of 2 E b over n naught, but for 2 PAM that is when M equal
to 2 the average bit energy is the same as average symbol energy because number of bits
per symbol is 1, alright.
So, average bit energy is the same as average symbol energy and therefore, what you have
seen is we have derived the average bit energy for M-ary PAM as I have said it is a very
general modulation because M-ary PAM, you can transmit log M to the base 2 bits per
symbol. So, unlike the previous modulation schemes such as binary phase shift keying
etcetera you are not limited to 1 bit per symbol you can load a larger number of bits on
each symbol and that is helpful especially in modern high speed wireless communication
systems because if you have a good channel where the SNR is good you can keep loading
a large number of bits on each symbol such as 4 PAM, 8 PAM, 128 PAM.
In fact, we are going to see a much more general modulation scheme next that is known
as quam which is similar to QPSK that is we have 2 basis functions and you can load any
number of bits on each on each basis function correct. Each orthogonal basis function or
each orthogonal basis symbol thereby by increasing the number of bits per symbol for the
same symbol rate you can significantly increase the bit rate and that is how one achieves
268
higher and higher data rates for instance in 4 G wireless communication system or now
people are talking about somewhat five g wireless communication system. So, by loading
by using higher these are known as higher order modulations by using higher values of M
by loading large number of bits on each symbol one is able to achieve higher data rate that
is why the performance of this general modulation scheme is important.
So, by loading; so the final point, I want to mention is number of bits again emphasize
number of bits per symbol equals log M to the base 2 by loading large number of bits per
symbol rather than simply one bit per symbol using large of number of bits per symbol
one is able to achieve higher data rates in for instance 4 G and 5 G. So, these are very
important 4 G and 5 G that is the whole point. So, this is something. So, this is basically
summarizes the probability of error for M-ary PAM, let me just write it over here. So,
write its clear probability of error for M-ary PAM.
So, basically that wraps up this module where we had the previous module we have
introduced M-ary PAM partly discussed the decision rules for the internal points. Now we
have completed the discussion regarding the decision rules for the end points there are 2
end points and what is the probability of error for each point if it is an internal point
probability of error for an end point and therefore, what is the average probability of error
and we have derived an expression for that all right.
269
So, please go through this derivation again because it involves several steps a series of
logical arguments that we have to make and the series of computations that we have to
make at every step and we have used several different properties such as properties of
random variables such as properties of mutually exclusive events properties of mutually
exclusive and exhaustive events the total probability rule etcetera.
So, this is a very interesting derivation that also that of course, tells you what is the average
probability of error what is optimal decision rule what is an M-ary PAM constellation,
what is optimal decision rule, what is a resulting probability of error and also explains or
gives or a also this practical example helps you better understand several principles of
probability and random processes alright.
So, we will stop here. We will look at other aspects in the subsequent modules.
270
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 22
M-ary QAM (Quadrature Amplitude Modulation) – Part I, Introduction,
Transmitted Waveform, Average Symbol Energy
Hello. Welcome to another module in this massive open online course. In this module we
will discuss the another digital modulation scheme known as Quadrature amplitude
modulation or QAM.
271
(Refer Slide Time: 01:38)
And to be a bit more precise this is known as M-ary QAM, M-ary quadrature amplitude
modulation.
This is similar to QPSK where we had two pulses, here too we have two pulses 𝑝1 (𝑡) and
𝑝2 (𝑡).
272
(Refer Slide Time: 03:12)
Where, 𝑎1 and 𝑎2 are two separate and independant symbols from a PAM constellation.
2
𝑝1 (𝑡) = √ cos(2𝜋𝑓𝑐 𝑡)
𝑇
273
2
𝑝2 (𝑡) = √ sin(2𝜋𝑓𝑐 𝑡)
𝑇
Where 𝑇 = 𝑘/𝑓𝑐 .
These pulses are identical to QPSK, they have unit pulse energy (𝐸𝑝 = 1) and are shifted
by a face difference of 𝜋/2.
274
For an M-ary QAM the total number of symbols will be M, and we have two sets for 𝑎1
and 𝑎2 . Thus both 𝑎1 and 𝑎2 will be derived from a constellation of √𝑀-ary PAM.
𝑇𝑜𝑡𝑎𝑙𝑛𝑢𝑚𝑏𝑒𝑟𝑜𝑓𝑠𝑦𝑛𝑏𝑜𝑙𝑠𝑖𝑛𝑄𝐴𝑀 = √𝑀 × √𝑀 = 𝑀
275
(Refer Slide Time: 11:29)
𝑎1 , 𝑎2 𝜖√𝑀 − 𝑎𝑟𝑦𝑃𝐴𝑀
√𝑀
𝑎𝑗 = ±(2𝑖 + 1)𝐴0 ≤ 𝑖 ≤ − 1, 𝑗𝜖{1,2}
2
276
𝑀 = 64
√𝑀 = 8
√𝑀
=4
2
=6
277
(Refer Slide Time: 15:59)
𝐸𝑠
𝐴𝑣𝑒𝑟𝑎𝑔𝑒𝑠𝑦𝑚𝑏𝑜𝑙𝑒𝑛𝑒𝑟𝑔𝑦𝑝𝑒𝑟𝑎1 =
2
𝐸𝑠
𝐴𝑣𝑒𝑟𝑎𝑔𝑒𝑠𝑦𝑚𝑏𝑜𝑙𝑒𝑛𝑒𝑟𝑔𝑦𝑝𝑒𝑟𝑎2 =
2
278
𝐸𝑠 𝐴2 2
⇒ = (√𝑀 − 1)
2 3
3𝐸𝑠
⇒𝐴=
2(𝑀 − 1)
Here we have Es over 2 equals a square by 3 into square root of M square minus 1 which
basically implies Es is equal to well this basically implies or the other way round. Rather
a s or rather a is equal to 3 Es by 2 M minus 1 and this is an important.
279
So, the amplitude a or the distance 2 a is such that a for each square root of M-ary PAM a
for each square root of M-ary PAM constituent square root of M-ary PAM is chosen as 3
Es by 2 3 Es by 2 M minus 1. So, this ensures remember what is the constraint here ensures
average energy Es for average energy Es for M-ary for the M-ary QAM. So, alright so,
what we have done here is basically we have introduced a different modulation which is
generalized version of PAM. In fact, for that matter generalization of several scheme
VPSK, QPSK is generalization of BPSK PAM is in a generalization of BPSK and QAM
basically is a generalization of all these such modulation scheme.
So, it contains a combination of 2 PAMs there are 2 constituent PAM constellation pulse
amplitude modulation constitution. Each of containing square root of M symbols therefore,
that QAM is M-ary, alright. So, M-ary QAM is form from 2 constituents square root M-
ary PAMs, each transmitted on one is transmitted on the cosine 2 pi F c t component and
the other is transmitted on the sin 2 pi F c t pulse both of the pulses are in quadrature. So,
it is quadrature amplitude modulation all right, previously we have pulse amplitude
modulation.
Now, we are using 2 pulses which are in quadrature. So, it is quadrature amplitude
modulation each constituent PAM is square root of M symbols. And what we have derived
is basically to ensure an each constituent PAM has a average symbol energy Es by 2 that
ensures energy Es for the a M-ary QAM. And then we have derived a relation for the for
the constant A all right correspond to each constituent A square root M-ary PAM such that
the average symbol energy is Es for the overall QAM constellation.
So, we will stop here and in the next module you will look at what is a appropriate receiver
the processing at the receiver, and the resulting bit error rate and symbol error rate and
also the this overall symbol error rate for the QAM constellation.
280
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 23
M-ary QAM (Quadrature Amplitude Modulation) - Part II, Optimal Decision Rule,
Probability of Error, Constellation Diagram
Hello, welcome to another module in this massive open online course. So, we are looking
at M-ary QAM that is M-ary quadrature amplitude modulation, and we are about start over
discussion on the receiver and receive processing and the bit error rate for M-ary QAM,
but before we do that yesterday we derived the value of amplitude A.
3𝐸𝑠
𝐴=√
2(𝑀 − 1)
281
(Refer Slide Time: 00:51)
282
Here, 𝑛(𝑡) is AWGN with autocorrelation 𝜂𝛿(𝜏)/2. The receive filter will be similar to
that of QPSK. i.e., we will have two filters ℎ1 (𝑡) and ℎ2 (𝑡) with ℎ1 (𝑡) = 𝑝1 (𝑇 − 𝑡) and
ℎ2 (𝑡) = 𝑝2 (𝑇 − 𝑡).
The output of the filter after sampling will be a combination of two functions,
𝑟1 (𝑇) = 𝑎1 + 𝑛
̃1
𝑟2 (𝑇) = 𝑎2 + 𝑛̃2
283
(Refer Slide Time: 06:57)
𝑛
̃1 and 𝑛̃2 are AWGN with mean zero and variance 𝜂/2.
284
(Refer Slide Time: 08:06)
Now consider the decision rules. Decision for 𝑎1 will be based on 𝑟1 (𝑇) and decision for
𝑎2 will be based on 𝑟2 (𝑇). These are independent PAM symbols (but are √𝑀 − 𝑎𝑟𝑦-PAM)
not M-ary-PAM.
285
𝐼𝑓 0 ≤ 𝑟𝑖 (𝑇) < 2𝐴 𝑑𝑒𝑐𝑖𝑑𝑒 𝑎𝑖 = 𝐴
.
𝑖𝑛𝑡𝑒𝑟𝑖𝑜𝑟 𝑝𝑜𝑖𝑛𝑡𝑠 .
.
{ .
𝑟𝑖 (𝑇) ≥ (√𝑀 − 2)𝐴 𝑑𝑒𝑐𝑖𝑑𝑒 𝑎𝑖 = (√𝑀 − 1)𝐴
𝑒𝑛𝑑 𝑝𝑜𝑖𝑛𝑡𝑠 {
{ 𝑟𝑖 (𝑇) < −(√𝑀 − 2)𝐴 𝑑𝑒𝑐𝑖𝑑𝑒 𝑎𝑖 = −(√𝑀 − 1)𝐴
equals A if r i T greater than or equal to square root of M minus 2 A decide well a i equal
to square root of M minus where a M minus 1 is and further i equals either 1 or ok.
286
(Refer Slide Time: 14:04)
Simply, we can state that we choose 𝑎𝑖 as the closest constellation point in the √𝑀-ary
PAM to 𝑟𝑖 (𝑇).
Now, the probability of error; remember the probability of the error for each PAM.
287
1 𝐴
𝑃𝑒 = 2 (1 − )𝑄
√𝑀 𝜂
√
( 2)
288
3𝐸𝑠
𝐴=√
2(𝑀 − 1)
The above value of A ensures average energy 𝐸𝑠 for the overall M-ary QAM.
289
Substituting the above gives us,
1 3𝐸𝑠
𝑃𝑒 = 2 (1 − ) 𝑄 (√ )
√𝑀 𝜂(𝑀 − 1)
290
The overall error rate for M-ary QAM is given by,
2
𝑃𝑒,𝑄𝐴𝑀 = 1 − (1 − 𝑃𝑒,𝑃𝐴𝑀 )
2
= 2𝑃𝑒,𝑃𝐴𝑀 − 𝑃𝑒,𝑃𝐴𝑀
𝑃𝑒,𝑄𝐴𝑀 =
̃ 2𝑃𝑒,𝑃𝐴𝑀
291
Dropping the approximate sign for simplicity we have,
1 3𝐸𝑠
𝑃𝑒,𝑄𝐴𝑀 = 2 (1 − ) 𝑄 (√ )
√𝑀 𝜂(𝑀 − 1)
We will now look at a simple constellation diagram for M-ary QAM. Assume M=16. This
means 𝑎1 and 𝑎2 belong to 4-PAM.
292
𝑎1 𝜖 {−3𝐴, −𝐴, 𝐴, 3𝐴}
These can be represented on a x-y plane like format. We take x-axis to be 𝑎1 axis and 𝑎2
to be y-axis. The points can be represented as in figure above as a constellation of 16 points
(4 points on each axis). Not that QAM will always have a square constellation.
293
(Refer Slide Time: 31:02)
𝑀 = 16
#𝑏𝑖𝑡𝑠 = log 2 16
= 4 𝑏𝑖𝑡𝑠
294
In the QAM scheme we can increase M in order to increase the bit-rate without increasing
the symbol transfer rate.
So, there are all several different possibilities. So, you have M equal to of course, if M is
equal to 4 that is QPSK, 16 QAM, 64 QAM, 256 QAM, 1024 and so on. So, these are all
the QAM. So, this is of course, your 4 symbols means 2 bits means one bit on each in
phase and quadrature this is of course, your QPSK; and number of bits equals, 2 4 that is
log to the base to of this quantity 2, 4, 6, 8, 10 of course, you can add 2, 0, 4, 8, 4, 0, 9, 6
and you will have 12 bits. So, the number of bits increases.
We can choose higher order modulation adaptively in order to meet the rate requirements.
So, higher order modulation and adaptive modulation higher order modulation plus.
295
(Refer Slide Time: 35:53)
Higher order and adaptive modulation are important to achieve higher data rates in 3G, 4G
and 5G.
But higher order modulation usage of higher order module higher order modulation is one
of the strategy is one of the main strategies that can be employed to multiply right
multiplied the bit rates several times that is for the same symbol rate right for the same
symbol rate for instance, if you have a symbol rate of 10 mega symbol per second if you
use 4 QAM that is QPSK which has 2 bits per symbol, you get a data rate of 20 bits or 20
296
megabits per second 20 Mbps, and if you can succeed in using 1024 QAM, which has 10
bits per symbol you can get a data rate of 100 Mbps. So, you can see from 20 Mbps to 100
Mbps that is a factor of 5 into 5. So, that is a factor of 5 in data rate.
So, you can go from. So, let us say this can 3G data rate this can be 4G data rate. So, 3G
systems typically or data rates around Mbps or at most 10 Mbps, in 4G you can go up to
data you can go to data rates of 100 Mbps and in fact, more you can go to data rates of
about 200 to 300 Mbps. So, it all depends on the modulation scheme use along of course,
along with several other technologies, but modulation right and adaptively choosing the
modulation depending on the user at user requirements. So, the modulation scheme
depends on the requirement of the user right what kind of the rate the user requires and
also the channel condition, if the wireless channel is able to support such a data rate. So,
adaptively one can modulate on can chose an appropriate modulation right to meet the data
rate requirement and it is of course, deliver the maximum data rate to each user.
297
So, in this module we have covered comprehensively covered QAM that is what is the
constellation what is the structure of QAM what is the structure of the transmitted
waveform what is the processing at the receiver and the associated probability the
probability of error for M-ary QAM and also, the importance of this QAM with respect to
the modern wireless technologies all right. So, we will stop here and continue with the
other aspects in subsequent modules.
298
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 24
M-ary PSK (Phase Shift Keying) – Part I, Introduction, Transmitted
Waveform, Constellation Diagram
Hello. Welcome to another module in this massive open online course. So, today in this
module we are going to look at a different modulation scheme that is M-ary PSK or M-ary
phase shift keying, similar to M-ary QAM which is quadrature amplitude modulation and
as M symbols. A M-ary PSK or phase shift keying has M symbols and it is a phase shift
keying based digital modulation scheme.
299
(Refer Slide Time: 01:30)
300
(Refer Slide Time: 02:06)
This is another digital modulation scheme and again this also generalizes the concept of
binary phase shift keying which has 2 phases.
301
(Refer Slide Time: 04:48)
For M-ary PSK, 360o circle is divided into M components, the M possible phases are,
2𝜋 4𝜋 2𝜋(𝑀 − 1)
0, , ,…….,
𝑀 𝑀 𝑀
2𝜋𝑖
(𝑖 + 1)𝑡ℎ 𝑝ℎ𝑎𝑠𝑒 = 0≤𝑖 ≤𝑀−1
𝑀
302
(Refer Slide Time: 07:18)
The scheme reduces to BPSK for M=2, where phases are 0 and 𝜋.
The M phases can be represented on a circle of radius A. The first phase (i=0) will be at
the x-axis and the second will be at 2𝜋/𝑀, then 4𝜋/𝑀 and so on.
303
(Refer Slide Time: 11:10)
2𝜋𝑖 2𝜋
The ith point will have projection 𝐴𝑐𝑜𝑠 ( 𝑀 ) on the x-axis and 𝐴𝑠𝑖𝑛 ( 𝑀 ) on the y-axis.
2𝜋𝑖
𝑎1 = 𝐴𝑐𝑜𝑠 ( )
𝑀
2𝜋𝑖
𝑎2 = 𝐴𝑠𝑖𝑛 ( )
𝑀
304
(Refer Slide Time: 13:09)
2
𝑝1 (𝑡) = √ cos(2𝜋𝑓𝑐 𝑡) 0 ≤ 𝑡 ≤ 𝑇
𝑇
2
𝑝2 (𝑡) = √ sin(2𝜋𝑓𝑐 𝑡) 0 ≤ 𝑡 ≤ 𝑇
𝑇
Where 𝑇 = 𝑘/𝑓𝑐 .
2𝜋𝑘
𝑎1 𝜖 {𝐴𝑐𝑜𝑠 ( ) , 𝑘 = 0, 1, 2, … . , 𝑀 − 1}
𝑀
2𝜋𝑘
𝑎2 𝜖 {𝐴𝑠𝑖𝑛 ( ) , 𝑘 = 0, 1, 2, … . , 𝑀 − 1}
𝑀
305
(Refer Slide Time: 13:37)
306
(Refer Slide Time: 16:06)
2𝜋𝑘 2𝜋𝑘
𝑆𝑘 = (𝐴𝑐𝑜𝑠 ( ) , 𝐴𝑠𝑖𝑛 ( ))
𝑀 𝑀
∞
𝐴𝑣𝑒𝑟𝑎𝑔𝑒 𝑠𝑦𝑚𝑏𝑜𝑙 𝑒𝑛𝑒𝑟𝑔𝑦 = 𝐸 {∫ 𝑥 2 (𝑡)𝑑𝑡}
−∞
307
∞
2
= 𝐸 {∫ (𝑎1 𝑝1 (𝑡) + 𝑎2 𝑝2 (𝑡)) 𝑑𝑡}
−∞
Since the inner product of 𝑝1 (𝑡) and 𝑝2 (𝑡) is zero and the total pulse energy for both is
𝐸𝑝 = 1.
= 𝐸{𝑎12 𝐸𝑝 + 𝑎22 𝐸𝑝 }
= 𝐸{𝑎12 + 𝑎22 }
308
2 2
2𝜋𝑘 2𝜋𝑘
= 𝐸 {(𝐴𝑐𝑜𝑠 ( )) + (𝐴𝑠𝑖𝑛 ( )) }
𝑀 𝑀
= 𝐴2
𝐴 = √𝐸𝑏
309
(Refer Slide Time: 22:55)
Thus the radius of the circle for a fixed symbol 𝐸𝑠 will be √𝐸𝑠 .
So, will stop here and look at the receiver design, the detection and also the ensuing
probability of error and other aspects is a subsequent modules.
310
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 25
M-ary PSK (Phase Shift Keying) – Part II, Optimal Decision Rule, Nearest
Neighbour Criterion, Approximate Probability of Error
Hello, welcome to another module in this massive on open online course. So, we are
looking at M-ary PSK modulation, and we have said that in M-ary PSK constellation
points are arranged inside as on a circle right circle of radius A equal’s square root of 𝐸𝑠 .
2𝜋𝑘 2𝜋𝑘
𝑆𝑘 = (√𝐸𝑠 cos ( ) , √𝐸𝑠 sin ( ))
𝑀 𝑀
311
(Refer Slide Time: 01:22)
𝑟1 (𝑡) = 𝑎1 𝐸𝑝 + 𝑛̃ = 𝑎1 + 𝐸𝑝
312
𝑟2 (𝑡) = 𝑎2 𝐸𝑝 + 𝑛̃ = 𝑎2 + 𝐸𝑝
𝑎1 and 𝑎2 can be arranged on a circle with radius 𝐴 = √𝐸𝑠 , similarly we can re[resent
𝑟1 (𝑇) and 𝑟2 (𝑇) on the same circle as a different point 𝑟(𝑇) = (𝑟1 (𝑇), 𝑟2 (𝑇))
The decision rule will be to look for the nearest constellation point from 𝑟(𝑇)
313
(Refer Slide Time: 07:54)
min 𝑘||𝑟(𝑇) − 𝑆𝑘 ||
314
(Refer Slide Time: 09:55)
Or,
2𝜋𝑘 2 2𝜋𝑘 2
min 𝑘 √(𝑟1 (𝑇) − √𝐸𝑠 cos ( )) + (𝑟2 (𝑇) − √𝐸𝑠 sin ( ))
𝑀 𝑀
The expression cannot be solved in any manner for an accurate analytical expression of
BER. We will thus employ an indirect approach for BER.
315
(Refer Slide Time: 14:23)
Consider QPSK, each point in the constellation has 2 nearest neighbours and distance
between them is 2√𝐸𝑏 .
𝑑𝑚𝑖𝑛
= 𝛼𝑄 ( )
√2𝜂
316
Here 𝛼 is the number of nearest neighbours and 𝑑𝑚𝑖𝑛 is the distance between such
neighbours.
In M-ary PSK a constellation point will have two nearest neighbours and the distance will
be,
𝜋
𝑑𝑚𝑖𝑛 = 2√𝐸𝑠 sin ( )
𝑀
317
This is a simple expression for distance between a chord.
318
(Refer Slide Time: 23:41)
𝜋
2√𝐸𝑠 sin ( )
𝑃𝑒 = 2𝑄 ( 𝑀 )
√𝜂
𝜋
2𝐸𝑠 sin2 𝑀
= 2𝑄 (√ )
𝜂
This is an approximate expression for the SER but is a very tight approximation (or very
close to the actual value) for high values of 𝐸𝑠 /𝜂.
319
(Refer Slide Time: 24:42)
320
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 26
Introduction of Information Theory, Relevance of Information Theory,
Characterization of Information
Hello, welcome to another module in this massive open online course and in this module,
we will change tracks and start looking at an extremely different topic and an extremely
important topic in understanding the performance and behaviour of communication
systems which is termed as information theory.
321
(Refer Slide Time: 01:18)
Let us look at the basic schematic of a digital communication system. It consists of a source
which generates a message signal 𝑚(𝑡) which is then converted into a stream of bits using
an encoder.
These bits are now transmitted over a channel and require a certain bandwidth. Naturally,
we would like to minimizes the usage of resources in the communication system, thus
minimize the bandwidth required for a particular message. We want to convey the
maximum amount of information using the minimum amount of resources. Wo we want
322
to encode a certain amount of information in the minimum number of bits possible.
Information theory helps us systematically achieve this, by helping in maximizing the
information content per bit.
323
(Refer Slide Time: 08:18)
We know how to characterize physical quantities such as weight height even colour. But
how does one characterize abstract quantities such as information? Information helps us
achieve exactly that. It helps us to quantitatively and objectively characterize information.
i.e., it provides a tangible measure of information. It provides us a quantitative and
mathematical framework to characterize information.
324
(Refer Slide Time: 10:51)
325
(Refer Slide Time: 14:37)
A central notion in information theory is the concept of an information source. i.e., the
information to be encoded is generated by a source.
Let’s assume we have a discreet source which generates the symbol set
𝑆 𝜖 {𝑆𝑜 , 𝑆1 , 𝑆2 , 𝑆3 … . 𝑠𝑀−1 }. The set S is known as source alphabet.
326
(Refer Slide Time: 17:40)
327
(Refer Slide Time: 19:36)
Pr(𝑆𝑖 ) = 𝑃𝑖
For a continuous source, the information will be a continuous function and will have a
continuous probability function. But here we are considering a discreet source. From
probability theory we can say that,
𝑃𝑖 ≥ 0
𝑀−1
Σ𝑖=0 𝑃𝑖 = 0
328
(Refer Slide Time: 20:31)
Let us now discuss the characterize the information content of a symbol. The information
content is given by,
1
𝐼(𝑆𝑖 ) = log 2
𝑃𝑖
329
(Refer Slide Time: 22:24)
330
(Refer Slide Time: 25:37)
This means higher the probability of occurrence of a symbol, lower is its information
content.
𝐼𝑓 𝑃𝑖 → 1 ⇒ 𝐼(𝑠𝑖 ) → 0
{
𝐼𝑓 𝑃𝑖 → 0 ⇒ 𝐼(𝑠𝑖 ) → ∞
331
This is intuitively as well. Think of news, rare events make news, regular events do not.
Thus, rare events contain more information than regular ones. 1/𝑃𝑖 is a measure of
infrequency or rarity of an event. Thus, rarer an event, higher the value of 1/𝑃𝑖 and higher
the amount of information it contains.
We will start with this definition and further keep refining it progressively and look at
various other norms, various other measures to characterize the information, content of a
source and the relevance of all these aspects and the relevance in fact of this whole area of
information theory in communications in our future modules.
332
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 27
Definition of Entropy, Average Information/ Uncertainty of Source, Properties of
Entropy
Hello. Welcome to another module in this massive open online. So, in this module
alright, let us start looking at a new topic that is entropy which is one of the most
fundamental aspects of information theory, which we will going to define shortly. So,
this is entropy.
So, we have seen in our discussion that, I si the information per symbol with the
corresponding to source alphabet S I is log 1 to the base log 1 over Pi to the base 2 bit is
correct right. And well Pi we have seen this is nothing, but the probability of the symbol
S I and we have seen that Isi this quantity tends to infinity as Pi S the probability as base
equally Pi tends to 0. And I si tends to 0 that is for certain events that is when the event
occurs with probability 1 I si tends to 0 information tends to 0 alright. So, frequently
occurring event have less information rare events have higher amount of information
associated with them.
333
(Refer Slide Time: 02:14)
Now, the average information the average information of the source can now be defined
as the average information of source S well average information of source S is simply the
expected value this is I of S which is expected value of I of well expected value of I of S
which is basically nothing, but well this is summation i equal to 0 summation i equal to 0
to m minus 1 log 2. So, this is Pi, log 2 or let me write it this way. This is Pi times the
information that probability of symbol i times the information associated with symbol i.
334
So, we are computing the weighted average. So, this is equal to Pi log to the base log 2 to
the base 1 over Pi. And this quantity it is expected value of s this quantity is termed as H
of S H of S equals Pi log 1 over Pi to the base 2. This is termed as it is a fundamental
quantity of information theory is termed as the entropy. This is termed as the entropy of
sources.
So, this is termed as the entropy of source S. So, it is basically information of symbol i
this is your information of symbol i weighted by the probability Pi, weighted by it is
probability of occurrence. So, it is the expected value as a average information of the
source which is obtained by weighing the information of each symbol S I that is I si with
it is probability Pi and summing over all the alphabets summing over the entire source
alphabet, alright.
So, this quantity is termed as the entropy of the source and as we have said is a
fundamental quantity in information theory. It can characterizes the average information
per source average it characterizes. So, this characterizes entropy characterizes or
quantifies average information content, average information of the source average
information of the source as in the information average information per symbol one can
also think of average information per symbol emitted by the source. Emitted or generated
emitted or generated. This can also be thought of as entropy can also be thought of as
average uncertainty.
335
(Refer Slide Time: 07:16)
It is the average the average uncertainty associated with the source. So, more uncertainty
also implies basically more randomness or so more uncertainty.
So, more uncertainty more uncertainty implies more information. So, you can also be
thought of as the average uncertainty right. So, this symbol what is uncertain about the
next symbol that is going to be generated about the source all right. There is which
means that there is greater uncertainty, which captures the fact that that is created
information in the symbols generated by the source. If this if there is a greater certainty if
we know for sure what the or if we show or if you know with great certainty what the
next symbol that is generated by the source is going to be; that means, on an average
there is less information in the symbols or less information content in the symbols that is
generated by the source.
We are going to see this also of course, as we proceed through subsequent aspects or
subsequent examples of entropy.
336
(Refer Slide Time: 09:10)
Now, let us start by looking at some simple properties of entropy, if you look at the
definition of entropy which is Pi log on or p an observed that well Pi greater than equal
to 0 less than equal to 1 over Pi less than or equal to 1 this implies well one less than or
equal to 1 over Pi less than. So, 1 over Pi is always basically greater than or equal to 1
which implies log 1 over Pi to the base 2 is greater than or equal to 0. Since 1 over Pi is
greater than equal to 1. So, Pi is greater than equal to 1 log to the 2 to the base 1 over Pi.
Log to the base 2 1 over p is greater than equal to 0. So, Pi is greater than equal to 0
correct log 1, over Pi to the base 2 is greater than equal to 0, this implies both of these
are non negative quantities.
337
(Refer Slide Time: 10:33)
So, therefore, Pi log 1 over Pi to the base 2 is greater than or equal to 0 for each i. So,
this is basically this entropy is the sum of non negative terms, implies H of S is also
greater than or equal to 0. So, entropy is non negative correct. Entropy can be zero, but it
cannot be negative. In fact, we will see examples where entropy is 0. So, entropy has to
be greater than equal to. So, this also means that the information content of a source
cannot be cannot be negative. It is non negative either 0 or greater than 0 entropy of a
source the information content of a source has to be greater than or equal to 0.
338
Okay. So, this basically means that the information content, information content of
source is greater than equal to 0.
Further observe that if Pi equal to 1, then Pi log to the base 2 or Pi equal to 1 times log to
the base 2 1 over 1 this is 0. So, this is equal to 0.
So, this quantity equals 0 for Pi well for Pi equal to 1. Now what happens if Pi equal to 0
what happens to this quantity Pi equals 0 Pi equals 0, let us look at this quantity Pi log to
the base 2, 1 over Pi, which is equal to I can write it as log to the base 2, 1 over log to the
base 2 1 over Pi divided by 1 over p i.
339
(Refer Slide Time: 13:34)
Now, you can see here that both the numerator that is if you look at this quantity here,
numerator and the denominator tend to infinity as Pi tends to 0 as Pi tends to 0 both the
numerator and denominator tend to tend to infinity. So, this is undefined. So, this
expression at 0 in that sense it is undefined.
So, let us look at the limit as Pi tends to 0. And naturally if both the numerator and
denominator tend to infinity as Pi tends to 0, we can use the lthopital's rule right which is
the rule in calculus to evaluate the limit. So, correct.
340
So, now let us calculate the limit of this as Pi tends to 0, that is log the base to 1 over Pi
which is equal to well, I will write the numerator as minus log to the base 2 Pi 1 over Pi
which is equal to.
Now, again I can differentiate this I can take the derivative of the numerator since it is ill
defined on substitution of 0, I can take minus log 2 to the base Pi. I can differentiate the
numerator and denominator differentiate the numerator and denominator with respect to
Pi. This is termed as a L'Hospital's Rule which is basically from calculus to evaluate
correct the limit of precise with such quantities which are ill defined.
341
(Refer Slide Time: 16:18)
Now, if I substitute this that becomes limit Pi tends to 0, if you look at this that becomes
well the numerator is well d by d minus log Pi to the base 2. I can write this as Pi 2 the
base that is your traditional logarithm to the base e, to log e to the base 2 divided by well
I can differentiate with respect to Pi that gives me minus 1 over Pi square. And finally,
differentiating the numerator, I get minus log minus derivative of log is 1 over Pi minus
log in to log e to the base 2 into Pi square from the denominator will come to the
numerator. So this is basically limit Pi tends to 0, well Pi into log 2 p to the base 2 which
is equal to 0. So, basically what we have established is if you look at this is limit that is.
342
Since we cannot directly substitute Pi equal to 0 into Pi log 1 over Pi limit, p d Pi tends
to 0 Pi log to the base 2 1 over Pi also tends to 0.
So, you can say. So, and also remember we have also seen before that Pi log to the base
2 1 over Pi equals also equals 0 for certain events that is for Pi equal to 1. So, we have
seen that for both the what you see is that as Pi tends to 0 all for Pi equal to 1 the average
information that is Pi log to the base 2 1 over Pi is basically 0 that is for both events
which occur with probability one, and events which tend to occur with probabilities close
to 0 or the rarest of the rare events which rarely occur which means that the probability
of occurrence is close to 0 for both such events both types of such events the average
information associated with 0. And this is this can be understood as follows that of
course, as we have seen events which occur very frequently that is the probability close
to one naturally the information associated with them is 0.
So, the average information associated with them is 0 as we have seen before, but what
we have seen is that although information associated with the rare events which are
probability close to 0 is very high, since they occur extremely infrequently the average
information associated with them is also 0 all right. So, Pi log to the 2 log to the base 2 1
over Pi is 0 when Pi equal to 1 or when Pi is also close to 0. So, that is an important
point.
343
So, this basically shows that this is basically equal to. So, 1 over Pi log to the base 2 to
the base. So, what we have seen is Pi intuitively this is close to 0 for both very common
and very rare very common, and very common meaning, Pi approximately equal to 1,
very rare meaning Pi approximately equal to 0. So, that is what we have, so all right.
So, in this module what we have seen is basically we have defined a very important
quantity, which is the entropy all right which is basically the average information
associated with each symbol of the source which is expected value of the information
expected value of the information per symbol that is I si of the source which denotes the
average information that is associated per each symbol of the source. Which also said
this characterizes the uncertainty associated with the source all right, which is also
basically used to characterize or quantify the information content of a particular source
and we have also seen that Pi log to the base 2 1 over Pi is equal to 0 when Pi is 1 or P is
very close to 0.
So, we will stop here and continue with other aspects in the subsequent modules.
344
Principles of Communication Systems – Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 28
Entropy Example – Binary Source, Maximum and Minimum Entropy of Binary
Source
Hello, welcome to another module in this massive open online course. So, we are looking
at various aspects of information theory and we have also defined the concept of entropy
correct and we have seen various properties of the entropy of the source which based
characterizes the average information or also the average uncertainty associated with this
source. Average information per symbol of a source or the average uncertainty that is
associated with this (Refer Time: 00:41). So, let us now look at an example to understand
this better.
𝑆 = {𝑆𝑜 , 𝑆1 }
Pr(𝑆𝑜 ) = 𝑝
345
⇒ Pr(𝑆1 ) = 1 − 𝑝 (𝑠𝑖𝑛𝑐𝑒 𝑠𝑢𝑚 𝑜𝑓 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑖𝑒𝑠 𝑖𝑠 𝑜𝑛𝑒)
1 1
𝐻(𝑠) = Pr(𝑆𝑜 ) log 2 + Pr(𝑆1 ) log 2
Pr(𝑆𝑜 ) Pr(𝑆1 )
1 1
𝐻(𝑠) = 𝑝 log 2 + (1 − 𝑝) log 2
𝑝 1−𝑝
346
(Refer Slide Time: 03:40)
If we replace p by 1-p,
1 1
𝐻(1 − 𝑝) = (1 − 𝑝) log 2 + (1 − (1 − 𝑝)) log 2
1−𝑝 1 − (1 − 𝑝)
1 1
= (1 − 𝑝) log 2 + 𝑝 log 2
1−𝑝 𝑝
347
Now let as find the entropy for p=0,
1 1
lim 𝐻(𝑝) = lim 𝑝 log 2 + (1 − 𝑝) log 2
𝑝→0 𝑝→0 𝑝 1−𝑝
1
= lim 𝑝 log 2 ( ) = 0 (𝑓𝑟𝑜𝑚 𝑝𝑟𝑒𝑣𝑖𝑜𝑢𝑠 𝑚𝑜𝑑𝑢𝑙𝑒)
𝑝→0 𝑝
Thus we have,
lim 𝐻(𝑠) = 0
𝑝→0
348
(Refer Slide Time: 07:40)
= lim 𝐻(𝑝) = 0
𝑝→0
349
(Refer Slide Time: 10:07)
Now 𝑝 = 0 means Pr(𝑠𝑜 ) = 0 and Pr(𝑠1 ) = 1, i.e., source always generates 𝑠1 . And
similarly, for p=1, source always generated 𝑠𝑜 .
In both the cases there is no uncertainty. And hence the information content and thereby
the entropy both are zero.
350
(Refer Slide Time: 13:07)
𝑑 𝐻(𝑝)
𝐻 ′ (𝑝) =
𝑑𝑝
𝑑
= (−𝑝 log 2 𝑝 − (1 − 𝑝) log 2 (1 − 𝑝))
𝑑𝑝
= − log 2 𝑝 + log 2 (1 − 𝑝)
351
(Refer Slide Time: 15:21)
𝐻 ′ (𝑠) = 0
log 2 𝑝 = log 2 (1 − 𝑝)
⇒ p= 1−p
1
𝑜𝑟, 𝑝 =
2
352
The entropy is maximized when, both the symbols have equal probability, i.e.,
1
Pr(𝑠𝑜 ) = Pr(𝑠1 ) =
2
𝑑 ′
𝐻 ′′ (𝑠) = 𝐻 (𝑝)
𝑑𝑝
1 1 1
=− ( + )≤0
ln 2 𝑝 1 − 𝑝
Hence the point will be maxims (since 𝐻 ′ (𝑠) = 0 and 𝐻 ′′ (𝑠) < 0).
1 1 1
𝐻 (𝑝 = ) = log 2 2 + log 2 2 = 1
2 2 2
353
(Refer Slide Time: 18:50)
354
(Refer Slide Time: 20:22)
355
(Refer Slide Time: 22:06)
𝐻(0) = 𝐻(1) = 0
Plotting the function for H(s), will be a convex function, zero at 0 and one, and achieving
a maximum value of 1 at p=0.5. Entropy is a measure of the number of bits required per
symbol. For p=0.5, we will need one bit per symbol. For other values of p, at least
theoretically we will need less than 1 bit per symbol.
So, for instance less than 1 implies lower than 1 symbol can be represented using less than.
So, implies can be represented and this is what the encoding process does right, the
encoding process expected to do is take this source symbols and come up with an efficient
code conduct with an efficiency stream write as information bits stream to represent this
using the minimum number of bits right take to binary symbols S naught and S 1 correct,
S naught and S 1 and come up with an information bit stream to represent this efficiently
using the lowest the least number of possible bits that is the lowest possible bits in an
average sense. That is the average number of bits used to represent each symbol must be
the smallest and that is what we are going to this process termed as coding or encoding. In
fact, that is the theory regarding this is what we are going to look at in detail as you go
through the rest of the modules of this course.
356
So, basically that is what we have seen we have seen an interesting example of a binary
source with 2 symbols S naught an S 1 probabilities P and 1 minus P we have looked at
what happens at the extremes that is P equal to 0 P equal to 1 there is no uncertainty, the
information is 0, maximum information or entropy occurs at P equal to half. And of course,
if you plot it, it looks like a nice concave function which rises starts from 0 at P equal to 0
rises to 1 at P equal to half and falls down again to 0 at P equal to 1, all right.
So, will stop here and look at explorer other aspects, other properties study this behavior
of entropy further in a subsequent modules.
357
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 29
Maximum Entropy of Source with M-ary Alphabets,
Concave/ Convex function, Jensen’s Inequality
Hello, welcome to another module in this massive open online course. So, we are looking
at entropy, entropy of a source and the various properties of entropy all right. Let us now
look at maximum entropy that is when is the entropy maximized.
So, let us look at entropy maximization or maximum entropy basically into understand
when is entropy maximized.
358
(Refer Slide Time: 00:48)
𝑆 = {𝑠𝑜 , 𝑠1 , 𝑠2 𝑠3 }
The probabilities of each symbol are given by, 𝑝𝑜 = 0.5,𝑝1 = 0.25, 𝑝2 = 𝑝3 = 0.125
1 1 1 7
𝐻(𝑠) = 0.5 log 2 + 0.25 log 2 + 2 × 0.125 log 2 =
0.5 0.25 0.125 4
359
Thus, we have on average 1.75 buts per symbol.
= 1.75
The symbols which occur with smaller probability have a larger number of bits (larger
amount of information). There is a fundamentally relation between entropy and the
average number of bits. Average number of bits in a system is equal to the average entropy.
This relation will be explored more rigorously in future modules.
360
(Refer Slide Time: 04:32)
361
Now suppose that each symbol in the alphabet is equiprobable, i.e.,
1
𝑝1 = 𝑝2 = 𝑝3 = 𝑝4 =
4
1
𝐻(𝑠) = 4 × log 2 4
4
362
(Refer Slide Time: 08:42)
Let us now define a concave function. Suppose a function F(x) (refer to pict. above). And
𝑥1 and 𝑥2 be two points on the x-axis in the domain of the function. Then, for a concave
function, the cord joining the points (𝑥1 , 𝐹(𝑥1 )) and (𝑥2 , 𝐹(𝑥2 )) always lies bellow the
function.
Similarly, a function is concave for the inverse relation (the cord is above the function).
363
(Refer Slide Time: 11:17)
364
(Refer Slide Time: 13:29)
For a log function it can be easily observed that it is a concave function (see pict. above).
365
(Refer Slide Time: 15:31)
𝐹(𝑥) = log 2 𝑥
It can be shown that. if the second derivative of a function is negative, then the function is
a concave function.
1 1
𝐹 ′′ (𝑥) = (− 2 ) ≤ 0
ln 2 𝑥
366
For a concave function, suppose we substitute (𝜃 = 𝑝1 ) and (1 − 𝜃 = 𝑝2 ). Then 𝑝1 and 𝑝2
will be probabilities. Thus, we can write,
𝑀−1 𝑀−1
𝐹 ( ∑ 𝑝𝑖 𝑥𝑖 ) ≥ ∑ 𝑝𝑖 𝐹(𝑥𝑖 )
𝑖=0 𝑖=0
367
Now,
𝑀−1
∑ 𝑝𝑖 𝑥𝑖 = 𝐸{𝑥}
𝑖=0
Thus, we have,
𝐹(𝐸{𝑥}) ≥ 𝐸{𝐹(𝑥)}
368
(Refer Slide Time: 22:20)
𝑀−1 𝑀−1
1 1
𝐻(𝑠) = ∑ 𝑝𝑖 log 2 ≤ log 2 ∑ 𝑝𝑖 . = log 2 𝑀
𝑝𝑖 𝑝𝑖
𝑖=0 𝑖=0
369
(Refer Slide Time: 23:47)
Hence, we have,
𝐻(𝑠) ≤ log 2 𝑀
1
Pr(𝑠𝑖 ) = ∀ 𝑖 𝜖 {0,1,2,3,4, … . 𝑀 − 1}
𝑀
370
(Refer Slide Time: 26:01)
371
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 30
Joint Entropy - Definition of Joint Entropy of Two Sources, Simple Example for
Joint Entropy Computation
Hello, welcome to another module in this massive open online course. So, in this module
let us start looking at another new topic that is entropy. So, far we have looked at different
properties of entropy when is entropy maximized let us start looking at another concept
that is joint entropy.
So, we will now start looking at a new concept which is the joint entropy you can think of
this has the joint information into the combined information. So, basically you can think
of this as the; we will talk more about this later you can think of this as the combined
information when you have multiple sources.
372
(Refer Slide Time: 01:09)
For example, consider communication channel with input 𝑋 and output 𝑌 with size M and
N respectively having the alphabet,
373
Arranging both the input and the output in a tabular form we can have a 𝑀 × 𝑁 table, each
cell of the table will have an input symbol 𝑠𝑖 and output symbol 𝑟𝑗 and suppose it has a
probability 𝑝𝑖𝑗 .
𝑝𝑖𝑗 = Pr(𝑋 = 𝑠𝑖 ∩ 𝑌 = 𝑟𝑗 ) 0 ≤ 𝑖 ≤ 𝑀 − 1; 0 ≤ 𝑗 ≤ 𝑁 − 1
One can think of (𝑋, 𝑌) as a 2- dimensional source of symbols with total number of
symbols=MN.
374
(Refer Slide Time: 09:35)
𝑀−1 𝑁−1
1
𝐻(𝑋, 𝑌) = ∑ ∑ 𝑝𝑖𝑗 log 2
𝑝𝑖𝑗
𝑖=0 𝑗=0
375
(Refer Slide Time: 11:57)
376
(Refer Slide Time: 15:05)
𝑀−1 𝑁−1
∑ ∑ 𝑝𝑖𝑗 = 1
𝑖=0 𝑗=0
377
1 1 1 1
= +2× +6× +4× =1
4 8 16 32
Secondly,
𝑀−1
∑ Pr(𝑠𝑖 , 𝑟𝑗 ) = Pr(𝑟𝑗 )
𝑖=0
378
(Refer Slide Time: 20:05)
379
(Refer Slide Time: 22:25)
Similarly,
𝑁−1
∑ Pr(𝑠𝑖 , 𝑟𝑗 ) = Pr(𝑠𝑖 )
𝑗=0
1
𝐻(𝑠) = ∑ ∑ 𝑝𝑖𝑗 log 2
𝑝𝑖𝑗
𝑖 𝑗
380
1 1 1 4
= 2 × log 2 8 + 6 × log 2 16 + log 2 4 + log 2 32
8 16 4 32
27
= = 3.375 𝑏𝑖𝑡𝑠
8
Well this will give you 3 over 4 plus. So, this will be 16 4 divide 1 over 4. So, this will be
3 over 2 plus half plus 5 over 8 that is if you look at this; this will be over 8 you have 6
plus 12 plus 4 plus 5 that is basically you have express for 10 22 plus 5 that is 27 over 8
which is equal to 3.375 bit.
381
(Refer Slide Time: 27:11)
So, basically what we have done now is we have computed the joint entropy of X and Y
the (Refer Time: 27:51) is denoted by h of X comma Y. So, in this module basically we
will introduce we motivated this problem of joint entropy as considering a simple example
of a communication system in which your transmitting symbols X receiving symbols Y.
So, we would like to understand the interplay better understand the interplay or the
information various aspects of the information between of corresponding to these 2 point
it is just a import X and the output Y. We have defined this notion of joint entropy and we
have seen a simple example and computed the joint entropy. We have further explored the
properties of joint entropy and their relevance in the context of communication in the
subsequent modules.
382
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 31
Properties of Joint Entropy, Relation between Joint Entropy and Marginal
Entropies
1
𝐻(𝑋, 𝑌) = ∑ ∑ 𝑝𝑖𝑗 log 2
𝑝𝑖𝑗
𝑖 𝑗
383
(Refer Slide Time: 01:23)
In the previous module we saw an example with joint entropy of 3.375 with total 16
symbols.
384
(Refer Slide Time: 02:42)
1 1 1 1
𝐻(𝑥) = 𝐻 ( , , , )
4 4 4 4
1
= 4 × log 2 4 = 2 𝑏𝑖𝑡𝑠
4
385
(Refer Slide Time: 04:14)
1 1 1 1
𝐻(𝑌) = 𝐻 ( , , , )
2 4 8 8
1 1 1
= log 2 2 + log 2 4 + 2 × log 2 8
2 4 8
= 1.75 𝑏𝑖𝑡𝑠
386
(Refer Slide Time: 06:41)
This property is intuitive as well, as the joint information or the combined information will
be greater than or equal to the individual information or,
𝐻(𝑋, 𝑌) ≥ 𝐻(𝑋)
387
(Refer Slide Time: 09:32)
388
1
𝐻(𝑋) = ∑ Pr(𝑠𝑖 ) log 2
Pr(𝑠𝑖 )
𝑖
1
= ∑ ∑ 𝑝𝑖𝑗 log 2 (𝑏𝑦 𝑡𝑜𝑡𝑎𝑙 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑟𝑢𝑙𝑒)
𝑝𝑖
𝑖 𝑗
1
𝐻(𝑌) = ∑ 𝑝𝑗 log 2
𝑝𝑗
𝑗
1
= ∑ ∑ 𝑝𝑖𝑗 log 2
𝑝𝑖𝑗
𝑗 𝑖
389
(Refer Slide Time: 14:51)
1 1 1
= ∑ ∑ 𝑝𝑖𝑗 log 2 − ∑ ∑ 𝑝𝑖𝑗 log 2 − ∑ ∑ 𝑝𝑖𝑗 log 2
𝑝𝑖𝑗 𝑝𝑖 𝑝𝑗
𝑖 𝑗 𝑖 𝑗 𝑖 𝑗
𝑝𝑖 𝑝𝑗
= ∑ ∑ 𝑝𝑖𝑗 log 2
𝑝𝑖𝑗
𝑖 𝑗
390
(Refer Slide Time: 15:55)
391
(Refer Slide Time: 18:22)
392
From Jensen’s inequality we know that,
∑ 𝑝𝑖 log 2 𝑥𝑖 ≤ log 2 ∑ 𝑝𝑖 𝑥𝑖
𝑖 𝑖
𝑝𝑖 𝑝𝑗 𝑝𝑖 𝑝𝑗
⇒ ∑ 𝑝𝑖𝑗 log 2 ( ) ≤ log 2 ∑ 𝑝𝑖𝑗 ( )
𝑝𝑖 𝑝𝑗 𝑝𝑖 𝑝𝑗
𝑖,𝑗 𝑖,𝑗
= log 2 ∑ 𝑝𝑖 𝑝𝑗
𝑖,𝑗
= log 2 (∑ 𝑝𝑖 ∑ 𝑝𝑗 )
𝑖 𝑗
= log 2 1 = 0
393
(Refer Slide Time: 21:23)
394
(Refer Slide Time: 24:15)
Hence, we have,
Joint information joint entropy is less than or equal to the sum of the entropies of the
individual sources
395
Let us, we will stop this module here and look at other aspects in the subsequent modules.
396
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 32
Conditional Entropy, Example of Conditional Entropy, Properties of Conditional
Entropy
Hello, welcome to another module in this massive open online course. So far we have seen
several concepts in information theory; we have seen the definition of entropy, joint
entropy.
397
(Refer Slide Time: 01:12)
𝑀−1
𝑀−1 𝑁−1
1
= ∑ Pr(𝑋 = 𝑠𝑖 ) ∑ Pr(𝑌 = 𝑟𝑗 |𝑋 = 𝑠𝑖 ) × log 2
𝑃(𝑌 = 𝑟𝑗 |𝑋 = 𝑠𝑖 )
𝑖=0 𝑗=0
1
= ∑ ∑ Pr(𝑠𝑖 ) Pr(𝑟𝑗 |𝑠𝑖 ) log 2
𝑃𝑟(𝑟𝑗 |𝑠𝑖 )
𝑖 𝑗
1
= ∑ ∑ Pr(𝑠𝑖 , 𝑟𝑗 ) log 2
Pr(𝑟𝑗 , 𝑠𝑖 )
𝑖 𝑗
398
(Refer Slide Time: 02:09)
399
(Refer Slide Time: 05:14)
𝐻(𝑌|𝑋) + 𝐻(𝑋)
1 1
= ∑ ∑ Pr(𝑠𝑖 , 𝑟𝑗 ) log 2 + ∑ Pr(𝑠𝑖 ) log 2
Pr(𝑟𝑗 |𝑠𝑖 ) Pr(𝑠𝑖 )
𝑖 𝑗 𝑖
1 1
= ∑ ∑ Pr(𝑠𝑖 , 𝑟𝑗 ) log 2 + ∑ ∑ Pr(𝑠𝑖 , 𝑟𝑗 ) log 2
Pr(𝑟𝑗 |𝑠𝑖 ) Pr(𝑠𝑖 )
𝑖 𝑗 𝑖 𝑗
400
1
= ∑ ∑ Pr(𝑠𝑖 , 𝑟𝑗 ) log 2
Pr(𝑟𝑗 |𝑠𝑖 ) Pr(𝑠𝑖 )
𝑖 𝑗
1
= ∑ ∑ Pr(𝑠𝑖 , 𝑟𝑗 ) log 2
Pr(𝑠𝑖 , 𝑟𝑗 )
𝑖 𝑗
= 𝐻(𝑋, 𝑌)
401
(Refer Slide Time: 08:24)
402
(Refer Slide Time: 10:18)
403
(Refer Slide Time: 13:15)
Consider the 4x4 table of X and Y probabilities form the previous example (see pict.
above).
404
(Refer Slide Time: 17:37)
𝑃(𝑌|𝑋 = 𝑠𝑜 ) = 𝑃(𝑌 = 𝑟𝑗 |𝑋 = 𝑠𝑜 )
Pr(𝑌 = 𝑟𝑗 , 𝑋 = 𝑠𝑜 )
=
Pr(𝑋 = 𝑠𝑜 )
1/8
Now, Pr(𝑌 = 𝑟𝑜 |𝑋 = 𝑠𝑜 ) = 1/4=1/2. Similarly, 𝑃(𝑌 = 𝑟1 |𝑋 = 𝑠𝑜 ) = 1/4 and
1
Pr(𝑌 = 𝑟2|𝑋 = 𝑠𝑜 ) = 8 = Pr(𝑋 = 𝑟3 |𝑋 = 𝑠𝑜 ).
405
(Refer Slide Time: 21:10)
1 1 1 1
𝐻(𝑌|𝑋 = 𝑠𝑜 ) = 𝐻 ( , , , ) = 1.75𝑏𝑖𝑡𝑠
2 4 8 8
1 1 1
𝐻(𝑌|𝑋 = 𝑠1 ) = 𝐻 ( , , , 0) = 1.7𝑏𝑖𝑡𝑠
4 2 8
1 1 1 1
𝐻(𝑌|𝑋 = 𝑠2) = 𝐻 ( , , , ) = 2
4 4 4 4
406
𝐻(𝑌|𝑋 = 𝑠3 ) = 𝐻(1,0,0,0) = 0
= 1.375𝑏𝑖𝑡𝑠
407
(Refer Slide Time: 25:48)
𝐻(𝑋) = 2𝑏𝑖𝑡𝑠
𝐻(𝑋, 𝑌) = 3.375𝑏𝑖𝑡𝑠(𝑝𝑟𝑒𝑣𝑖𝑜𝑢𝑠𝑚𝑜𝑑𝑢𝑙𝑒)
We can now easily verify that 𝐻(𝑌|𝑋) + 𝐻(𝑋) = 𝐻(𝑋, 𝑌). Similarly we can also verify
that 𝐻(𝑋, 𝑌) = 𝐻(𝑋|𝑌) + 𝐻(𝑌).
408
(Refer Slide Time: 27:59)
So, we will stop here and continue with other aspects in the subsequent modules.
409
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 33
Mutual Information, Diagrammatic Representation, Properties of Mutual
Information
Hello welcome to another module in this massive open online course. Today let us look at
another new concept information theory that is a concept of mutual information ok.
410
(Refer Slide Time: 01:15).
Suppose we have two sources of information, 𝑋 and 𝑌. The mutual information of these
sources can be defined as,
= 𝐻(𝑋) − 𝐻(𝑋|𝑌)
Suppose we have communication channel with transmitted signal 𝑋 and received signal 𝑌.
411
(Refer Slide Time: 03:17)
= 𝐻(𝑌) − 𝐻(𝑌|𝑋)
412
(Refer Slide Time: 05:49).
𝐻(𝑋) is the uncertainty in the transmitted signal X and 𝐻(𝑋|𝑌) is the uncertainty
remaining in X after observing Y. thus the expression for 𝐼(𝑋; 𝑌) denotes the uncertainty
in X that is resolved after observing Y.
413
tell X on observing Y.This quantity plays an important role in characterizing the
fundamental limits of a communication channel.
And this therefore, has a very fundamental role to play in communication the mutual
information therefore, plays as we shall see later plays a fundamental role in characterizing
the performance of a, plays a fundamental role in characterizing communication across the
channel.
414
(Refer Slide Time: 13:06).
𝐼(𝑋; 𝑌) = 𝐼(𝑌; 𝑋)
𝐼(𝑋; 𝑌) ≤ 0
415
(Refer Slide Time: 14:07).
416
(Refer Slide Time: 16:53)
Now, if you look at these components individually. If you look at this component this is
the uncertainty remaining this is uncertainty remaining in X. If you look at this shaded
region that is in y, but not in X this is uncertainty remaining in Y on observing X.
So, we can say this is the uncertainty in X resolved on observing Y. And if you look at this
total quantity, now this is the joint entropy information in X and Y this is joint entropy to
get both this together. This is information in X comma Y. This is the joint entropy that is
information in X and Y. And if you observe this figure, now you can see the joint entropy
H X Y is a sum of 3 mutually disjoint components, H this is H X given Y mutual
information between X and Y and H Y given X.
417
(Refer Slide Time: 20:33).
So, the now you can write you can see from the figure H X comma Y H X given Y plus
the mutual information i X Y plus the information H i Y H Y given X. And this you can
see from the figure and this you can verify also for instance, i X given Y this is equal to H
Y minus H Y given X.
418
you look at this H X given Y plus H Y. This is nothing but the joint entropy H X comma
Y.
So, we can derive such useful results using the mutual information. So, this is the property
and you can, and you can look this these are 3 disjoint components H X given Y the mutual
information between X Y, and H Y given X that is the uncertainty in Y on observing X.
And therefore, that brings us end of this module, basically what you have seen in this
module is we have defined a new measure of which, I would rather key or a rather critical
measure to understand the subsequent theory that we are going to develop to characterize
the fundamental rate of information transfer that is fundamental rate at which information
can be transmitted across the channel. This is termed as the mutual information.
And further we have seen a pictorial way to sort of comprehensively represent the various
quantities that we have seen so far the entropies the joint entropy the conditional entropy
and now also the mutual information between 2 sources alright.
So, we will stop here and continue with the other aspects in the subsequent modules.
419
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 34
Simple Example of Mutual Information, Practical Example of Mutual Information
- Binary Symmetric Channel
Hello, welcome to another module in this massive open online course. So, we are looking
at mutual information, correct.
420
(Refer Slide Time: 01:29)
3
𝐼(𝑋; 𝑌) = 𝐻(𝑌) − 𝐻(𝑌|𝑋) =
8
421
(Refer Slide Time: 04:00)
13
𝐻(𝑋|𝑌) =
8
422
(Refer Slide Time: 05:58)
13 3
= 𝐼(𝑋; 𝑌) = 2 − =
8 8
423
(Refer Slide Time: 08:02)
Consider another example, of a binary symmetric channel. Binary means the input is a
binary 0 or 1. Symmetric because the probability of both 0 and 1 is same and equal to 𝑝.
(see pict. above)
424
(Refer Slide Time: 11:26)
𝑝 is then the flip probability and 1 − 𝑝 is the probability of correct transmission. Let 𝑠𝑜
and 𝑠1 be the transmission probability of 0 and 1. 𝑟0 and 𝑟1 be the probability of receiving
0 or 1 respectively.
425
Suppose,
Pr(𝑋 = 𝑠0 ) = 𝛼
Then,
Pr(𝑋 = 𝑠1 ) = 1 − 𝛼
426
(Refer Slide Time: 14:46)
Pr(𝑌 = 𝑟0 |𝑠0) = 1 − 𝑝
Pr(𝑌 = 𝑟1 |𝑠1) = 𝑝
1 1
= 𝑝 log 2 + (1 − 𝑝) log 2
𝑝 1−𝑝
𝑃(𝑌 = 𝑟0|𝑠1 ) = 𝑝
𝑃(𝑌 = 𝑟1|𝑠1 ) = 1 − 𝑝
1 1
= 𝑝 log 2 + (1 − 𝑝) log 2
𝑝 1−𝑝
427
(Refer Slide Time: 15:53)
= 𝛼𝐻(𝑝) + (1 − 𝛼)𝐻(𝑝)
= 𝐻(𝑝)
428
(Refer Slide Time: 18:06)
= (1 − 𝑝)𝛼 + 𝑝(1 − 𝛼)
= 𝛼 + 𝑝 − 2𝛼𝑝
429
For receiving the symbol 𝑟1 we can write,
𝑃(𝑌 = 𝑟1 ) = 1 − Pr(𝑌 = 𝑟0 )
= 1 + 2𝛼𝑝 − 𝛼 − 𝑝
430
= 𝐻(𝛼 + 𝑝 − 2𝛼𝑝) − 𝐻(𝑝)
This expression gives us the mutual information of a binary symmetric channel with flip
probability p and source probabilities 𝛼 and 1 − 𝛼.
431
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 35
Channel Capacity, Implications of Channel Capacity, Claude E. Shannon- Father
of Information Theory, Example of Capacity of Binary Symmetric Channel
Welcome to another module in this massive open online course, alright. So, we are looking
at the mutual information. We have looked at the properties of the mutual information, the
definition the properties and the relevance of mutual information, alright and let us now
look at one of the most important and landmark results in information theory which relates
to channel capacity and which depends on the mutual information.
432
(Refer Slide Time: 01:28)
433
(Refer Slide Time: 05:43)
𝐶 = max 𝐼(𝑋; 𝑌)
Pr(X)
C gives the capacity of the channel. i.e., the maximum of mutual information (over all
probabilities of input distribution) between input X and Y gives the capacity of the
communication channel. The above result was first published in a paper titled a
mathematical theory of communication, by Shannon.
434
(Refer Slide Time: 10:11)
435
(Refer Slide Time: 13:13)
This is the most fundamental result in information theory. However, this does give a
method of achieving this maximum capacity. However, this result has led to a number of
techniques and error correction codes for maximum capacity in wireline as well as wireless
communication channels.
C gives the maximum rate at which information can transmitted over the channel with
arbitrarily low probabilities.
436
(Refer Slide Time: 17:32)
Let just take an example for channel capacity, let us go back to our Binary Symmetric
Channel and quickly look at what we had yesterday.
437
(Refer Slide Time: 20:03)
For a BSC channel with the mutual information was given by,
Here 𝛼 is the probability of 0 at the source and 𝑝 is the flip probability of the channel.
438
(Refer Slide Time: 22:09)
Maxima of the first term occurs for 𝛼 + 𝑝 − 2𝛼𝑝 = 1/2. Considering symmetry at the
1 1 1
source assume 𝛼 = 1/2. Then, 𝛼 + 𝑝 − 2𝛼𝑝 = + 𝑝 − 2 × × 𝑝 =
2 2 2
439
(Refer Slide Time: 25:55)
The unit of this capacity will be bits per channel use or bits per symbol (which can later
be translated to bits per second).
440
(Refer Slide Time: 29:36)
𝐶 = 1 − 𝐻(𝑝) = 1 − 𝐻(1 − 𝑝)
𝑝 = 0 𝐶 = 1 − 𝐻(0) = 1
𝑝 = 1 𝐶 = 1 − 𝐻(1) = 1
Thus, for both p=0 and p=1, the capacity is maximum and equal to 1. This means with no
flip and 100% flip the capacity is maximized. This is because with 100% flip also we can
know for sure what was the source symbol. The minimum will be for p=0.5
1 1
𝑝= 𝐶 = 1−𝐻( ) = 0
2 2
i.e., with 50% flip probability the capacity is effectively zero.
441
(Refer Slide Time: 30:13)
IN the next set of modules, we will discuss the capacity of more channels, and chiefly the
AWGN channel.
442
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 36
Differential Entropy, Example for Uniform Probability Density function
Hello, welcome to another module in this massive open online course. So, we looking at
information theory we have looked at Shannon's landmark result that is the channel coding
theorem alright which is basically gives result for the fundamental rate at which the
information can be transmitted across a channel, alright.
443
(Refer Slide Time: 01:05)
So far we have considered sources with discreet alphabet only. However, a source can be
continuous source as well, with output 𝑥(𝑡) being a continuous signal.
This can take values from a continuous domain instead of a finite-number symbols.
444
(Refer Slide Time: 03:58)
445
(Refer Slide Time: 06:06)
For example, 𝑓𝑋 (𝑥) can be a gaussian function 𝑁(𝜇, 𝜎) with mean 𝜇 and variance 𝜎.
1 −
𝑥−𝜇
𝑓𝑋 (𝑥) = 𝑒 2𝜎2
√2𝜋𝜎 2
1 𝑥
𝑓𝑋 (𝑥) = 𝑒 −𝜆
𝜆
446
(Refer Slide Time: 07:41)
447
(Refer Slide Time: 08:50)
∞
1
ℎ(𝑋) = ∫ 𝑓𝑋 (𝑥) log 2 𝑑𝑥
−∞ 𝑓𝑋 (𝑥)
This is represented using small ℎ in place of 𝐻, which is used for discreet sources.
448
(Refer Slide Time: 11:41).
Suppose the output the output probability is uniformly distributed between (0, 𝑎). Then we
have,
1 1
𝑓𝑋 (𝑥) = {𝑎 𝑓𝑜𝑟 0 ≤ ≤𝑎
𝑎
0 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
449
(Refer Slide Time: 13:20)
450
(Refer Slide Time: 14:53)
𝑎
1
ℎ(𝑋) = ∫ log 2 𝑎 𝑑𝑥
0 𝑎
𝑎
1
= log 2 𝑎 ∫ 𝑑𝑥
𝑎 0
= log 2 𝑎
451
Not that unlike entropy for discreet systems, the battery can be negative too. For example,
in this case for 𝑎 < 1 the entropy will be negative.
We will stop with this and we look at other aspects in the subsequent modules.
452
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 37
Differential Entropy of Gaussian Source, Insights
Hello, welcome to another module in this massive open online course. So, we are looking
at the concept of differential entropy for a source which can take continuous set of values
alright. So, source alphabet which can take a continuous set of values right now let us look
at the differential entropy of a Gaussian source that is a source output which follows the
Gaussian which has the Gaussian probability density function and this is very important
because this is as we are going to see later this is one of the; I mean this is one of the most
commonly occurring such source or one can say that the source output at the output one
of the most commonly occurring probability density functions for the source output and
has a fundamental role to play in the context of information theory as we are going to see
later.
So, we are going to look at the differential entropy differential entropy of a; this is the
differential entropy of a of a Gaussian source. This is one of the most commonly occurring
and practically relevant sources in communication systems.
453
(Refer Slide Time: 01:42)
1 (𝑥−𝜇)2
−
𝑓𝑋 (𝑥) = 𝑒 2𝜎2
√2𝜋𝜎 2
∞
1
ℎ(𝑋) = ∫ 𝑓𝑋 (𝑥) log 2 𝑑𝑥
−∞ 𝑓𝑋 (𝑥)
454
For the gaussian function,
1 1
log 2 = ln log 𝑒
𝑓𝑋 (𝑥) 𝑓𝑋 (𝑥) 2
1 1
= log 2 𝑒 { ln 2𝜋𝜎 2 + 2 (𝑥 − 𝜇)2 }
2 2𝜎
455
(Refer Slide Time: 04:41)
456
(Refer Slide Time: 07:08)
Substituting the above in the expression for differential entropy we will have,
∞ (𝑥 − 𝜇)2
1 2
ℎ(𝑋) = ∫ 𝑓𝑋 (𝑥) log 2 𝑒 { ln 2𝜋𝜎 + } 𝑑𝑥
−∞ 2 2𝜎 2
∞ ∞ (𝑥 − 𝜇)2
1
= log 2 𝑒 ∫ 𝑓𝑋 (𝑥) ln 2𝜋𝜎 2 𝑑𝑥 + log 2 𝑒 ∫ 𝑓𝑋 (𝑥) 𝑑𝑥
−∞ 2 −∞ 2𝜎 2
457
∞
1 1
= log 2 2𝜋𝜎 2 + 2 log 2 𝑒 ∫ 𝑓𝑋 (𝑥)(𝑥 − 𝜇)2 𝑑𝑥
2 2𝜎 −∞
1 1
= log 2 2𝜋𝜎 2 + log 2 𝑒
2 2
1
= log 2 2𝜋𝜎 2 𝑒
2
458
(Refer Slide Time: 12:23)
Hence,
1
ℎ(𝑋) = log 2𝜋𝜎 2 𝑒
2 2
Note that the differential entropy depends only on the standard deviation or the width of
the gaussian distribution, not the mean value. Thus shifting the mean, without changing
the variance will not change the differential entropy of the source.
459
(Refer Slide Time: 14:00).
460
(Refer Slide Time: 17:02)
So, let us stop here and we look at other aspects in the subsequent modules.
461
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 38
Joint Conditional/ Differential Entropies, Mutual Information
Hello. Welcome to another module in this massive open online course. So, we are looking
at the differential entropy of a continuous source and in that on the same lines, let us extend
also the concept similar to the entropy. We had the concept of joint entropy and conditional
entropy. So, let us extend the differential entropy to also the other aspects. So, there is the
joint and the conditional entropies, ok.
In this lecture we will define the concepts of joint and conditional differential entropies
(i.e., joint and conditional entropy for continuous sources).
462
(Refer Slide Time: 01:21)
463
(Refer Slide Time: 02:56)
Let us say we have two continuous sources X and Y with marginal pdf’s 𝑓𝑋 (𝑥), 𝑓𝑌 (𝑦).
These sources can be input and output of a communication channel respectively.
464
(Refer Slide Time: 04:59)
∞ ∞
1
ℎ(𝑋, 𝑌) = ∫ ∫ 𝑓𝑋,𝑌 (𝑥, 𝑦) log 2 𝑑𝑥𝑑𝑦
−∞ −∞ 𝑓𝑋,𝑌 (𝑥, 𝑦)
465
∞
ℎ(𝑋|𝑌) = ∫ 𝑓𝑌 (𝑦)ℎ(𝑋|𝑌 = 𝑦)𝑑𝑦
−∞
∞ ∞
1
= ∫ 𝑓𝑌 (𝑦) ∫ 𝑓𝑋|𝑌 (𝑥|𝑌 = 𝑦) log 2 𝑑𝑥 𝑑𝑦
−∞ −∞ 𝑓𝑋|𝑌 (𝑥|𝑌 = 𝑦)
In the above expression we substitute,
𝑓𝑋,𝑌 (𝑥, 𝑦)
𝑓(𝑋|𝑌 = 𝑦) =
𝑓𝑌 (𝑦)
466
(Refer Slide Time: 08:42)
The conditional probability density function can be defined as follows that is F of X given
Y of X given Y. We just write the definition for your convenience F of X given Y X given
Y equals Y equals F of X, Y divided by F of F of X. So, this is the conditional probability
density function and now if you look at, you can write this again as the following thing.
You can basically write this as follows. Of course, there has to be an integral, sorry there
has to be an integral with respect to here the integral with respect to X followed by another
integral with respect to Y.
467
Now, if we look at this F of Y, the probability density function of Y into F of X, Y given
F of X given Y is nothing, but the joint probability density function F of X, Y.
So, therefore, you will have combining these two that is if we use the property that where
we use the property that F of X F of X F of X given Y times, this is equal to the joint
probability density function. So, F of X is the conditional probability density function, F
of X given y times the probability density function equal to the joint probability density
function.
Therefore if you look at this joint therefore, if you look at this conditional entropy, the
definition will boil down to minus infinity to infinity minus infinity to infinity F of X, Y
correct X, Y times log to the base two 2 over F of X, Y. I am sorry F of X, this will still be
F of X given Y X given Y times dx dy, ok.
468
(Refer Slide Time: 13:48)
469
The definitions for joint and mutual information for both the cases is similar to those for
the case of discreet sources.
So, we will stop here and look at other aspects in subsequent modules.
470
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 39
Capacity of the Gaussian Channel – Part I
Hello, welcome to another module in this massive open online course. So, in this module
let us look at another fundamental property or fundamental result for the Gaussian channel
which is the capacity characterized the capacity of the Gaussian channel.
Gaussian channels are the most commonly occurring and practically relevant models for
digital communication channels. And capacity is a fundamental characterization of such
channels.
471
(Refer Slide Time: 01:10)
Suppose the channel has input X and output Y. Assuming an AWGN channel with noise
N,
𝑋 =𝑌+𝑁
472
(Refer Slide Time: 03:44)
We want to find the capacity of the AWGN channel. It is the maximum rate of information
transfer at arbitrarily low probability of error. The capacity is given by,
𝐶= max 𝐼(𝑋; 𝑌)
𝑓𝑋 (𝑥):𝐸{𝑋 2 }≤𝑃
The capacity is given by the maximum of the mutual information of X and Y over the pdf
of X. One constraint that has been introduced is the power constraint at input. A
communication system will be powered by a power source like a battery, this source will
limit the power available for the communication channel. i.e., the power source limits the
power consumption by the input communication channel.
473
(Refer Slide Time: 05:06)
474
(Refer Slide Time: 09:37)
= ℎ(𝑌) − ℎ(𝑌|𝑋)
From the above discussion we know, 𝑌 = 𝑋 + 𝑁. The noise 𝑁 is gaussian noise with mean
0 and variance, 𝜎 2 . Given X, 𝑋 + 𝑁 simply shifts the gaussian distribution by X without
changing the variance. Thus, Y|X is a random variable with mean 𝑋 and variance 𝜎 2 .
𝑌|𝑋~𝑁(𝑋, 𝜎 2 )
ℎ(𝑌|𝑋) = ℎ(𝑁(𝑋, 𝜎 2 ))
1
= log 2 2𝜋𝜎 2 𝑒
2
Now, since 𝑌 = 𝑋 + 𝑁
= 𝐸(𝑋) = 0
475
𝐸{𝑌 2 } = 𝐸(𝑋 2 + 𝑁 2 + 2𝑋𝑁)
= 𝐸{𝑋 2 } + 𝜎 2
𝐸{𝑌 2 } ≤ 𝑃 + 𝜎 2 = 𝜎̃ 2
1
= ℎ(𝑌) − log 2 2𝜋𝜎 2 𝑒
2
1
𝐶= max 2 ℎ(𝑌) − log 2 2𝜋𝜎 2 𝑒
𝑓𝑋 (𝑥):𝐸{𝑋 }≤𝑝 2
1
= max2 ℎ(𝑌) − log 2𝜋𝜎 2 𝑒
𝑓𝑌 (𝑦):𝐸{𝑌 ̃2
}≤𝜎 2 2
The second term in the above expression is a constant, thus we only have to maximize the
entropy of 𝑌 over it’s pdf’s bounded by the power constraint 𝐸{𝑌 2 } ≤ 𝜎̃ 2 .
476
(Refer Slide Time: 10:46)
477
(Refer Slide Time: 12:34)
478
(Refer Slide Time: 13:58)
479
(Refer Slide Time: 15:18)
480
(Refer Slide Time: 18:39)
481
(Refer Slide Time: 21:40)
So, we will stop here basically where we are trying to characterize the capacity the
maximum rate at which information can be transmitted across in Gaussian channel that is
the channel with Gaussian noise. So, we will stop here and continue in the subsequent
modules.
482
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 40
Capacity of the Gaussian Channel – Part II, Practical Implications, Maximum Rate
in Bits/Sec
We are mid way in calculating the capacity of a gaussian noise channel. In the previous
module we have shown that, the capacity will be given by,
1
( max2 ℎ(𝑦)) − log 2 2𝜋𝜎 2 𝑒
𝑓𝑌 (𝑦):𝐸{𝑌 ̃ 2}
≤𝜎 2
The second term in the above expression is constant, hence, we will focus on maximizing the first
term. How for this we will first define a quantity called Kullback Leibler divergence.
483
(Refer Slide Time: 01:10)
Consider,
∞
𝑓𝑌 (𝑦)
∫ 𝑔𝑌 (𝑦) log 2 ( ) 𝑑𝑦
−∞ 𝑔𝑌 (𝑦)
Where, 𝑔𝑌 (𝑦) and 𝑓𝑌 (𝑦) are two different probability density functions. To the above
quantity, we can apply Jensen’s inequality for the log function.
484
Jensen’s inequality is given by (for a log function),
𝑀−1 𝑀−1
∞ ∞
∫ 𝑝𝑋 (𝑥) log 2 𝑔(𝑥)𝑑𝑥 ≤ log 2 ∫ 𝑝𝑋 (𝑥)𝑔(𝑥)𝑑𝑥
−∞ −∞
485
(Refer Slide Time: 08:59)
∞ ∞
𝑓𝑌 (𝑦) 𝑔𝑌 (𝑦)𝑓𝑌 (𝑦)
∫ 𝑔(𝑦) log 2 ( ) 𝑑𝑦 ≤ log 2 (∫ 𝑑𝑦)
−∞ 𝑔𝑌 (𝑦) −∞ 𝑔𝑌 (𝑦)
∞
= log 2 (∫ 𝑓𝑌 (𝑦)𝑑𝑦) = log 2 1 = 0
∞
486
(Refer Slide Time: 10:52)
Thus we have,
∞
𝑓𝑌 (𝑦)
∫ g Y (y)log 2 ( ) 𝑑𝑦 ≤ 0
−∞ 𝑔𝑌 (𝑦)
∞
𝑔𝑌 (𝑦)
⇒ ∫ g Y (y)log 2 ( ) 𝑑𝑦 ≥ 0
−∞ 𝑓𝑌 (𝑦)
487
The above quantity is defined as Kullback Leibler Divergence represented by 𝐷(𝑔𝑌 ||𝑓𝑌 ).
Let 𝑔𝑌 (𝑦) be any arbitrary pdf of a random variable Y with 𝐸{𝑌 2 } = 𝜎̃ 2 and 𝑓𝑌 (𝑦) be the
gaussian pdf given by,
1 𝑦2
−
𝑓𝑌 (𝑦) = 𝑒 2𝜎̃2
√2𝜋 𝜎̃ 2
488
(Refer Slide Time: 16:03)
∞ ∞ ∞
𝑔𝑌 (𝑦) 1
∫ 𝑔𝑌 (𝑦) log 2 𝑑𝑦 = ∫ g Y (y)log 2 𝑔𝑌 (𝑦)𝑑𝑦 − ∫ 𝑔𝑌 (𝑦) log 2 𝑑𝑦
−∞ 𝑓𝑌 (𝑦) −∞ −∞ 𝑓𝑌 (𝑦)
∞
1
= −ℎ(𝑔𝑌 ) − ∫ 𝑔𝑌 (𝑦) log 2 𝑑𝑦
−∞ 𝑓𝑌 (𝑦)
489
(Refer Slide Time: 18:39)
Now,
1 𝑦2
√ 2
= 2𝜋 𝜎̃ 𝑒 ̃ 2
2𝜎
𝑓𝑌 (𝑦)
1 1 log 2 𝑒 𝑦 2
⇒ log 2 = log 2 2𝜋𝜎̃ 2 +
𝑓𝑌 (𝑦) 2 2 𝜎̃ 2
490
(Refer Slide Time: 20:09)
So, we are focusing on this now the first integral is there focus now on this integral.
∞ ∞
1 1 2
𝑦2
∫ 𝑔𝑌 (𝑦) log 2 𝑑𝑦 = ∫ 𝑔𝑌 (𝑦) { log 2 2𝜋𝜎̃ 𝑒 + log 2 𝑒 2 } 𝑑𝑦
−∞ 𝑓𝑌 (𝑦) −∞ 2 2𝜎̃
1 log 2 𝑒 ∞
= log 2 2𝜋𝜎̃ 2 𝑒 + ∫ 𝑔 (𝑦)𝑦 2 𝑑𝑦
2 2𝜎̃ 2 −∞ 𝑌
491
∞
Here we have used that ∫−∞ 𝑔𝑌 (𝑦) = 1, since 𝑔𝑌 (𝑦) is a pdf.
The second integral is the expectation value of 𝑦 2 for pdf 𝑔𝑌 (𝑦) i.e., 𝐸{𝑦 2 } = 𝜎̃ 2 . The
integral then is,
1 log 2 𝑒
= log 2 2𝜋𝜎̃ 2 𝑒 +
2 2
1
= log 2 2𝜋𝜎̃ 2 𝑒
2
492
(Refer Slide Time: 23:16)
493
(Refer Slide Time: 25:46)
Hence, we have,
1
0≤ log 2𝜋𝜎̃ 2 𝑒 − ℎ(𝑔𝑌 )
2 2
1
⇒ ℎ(𝑔𝑌 ) ≤ log 2𝜋𝜎̃ 2 𝑒
2 2
In the above expression, LHS is the entropy if a gaussian source with zero mean and RHS
is the entropy of a arbitrary source with zero mean.
494
(Refer Slide Time: 28:58)
1
𝐶=( max2 ℎ(𝑦)) − log 2 2𝜋𝜎 2 𝑒
𝑓𝑌 (𝑦):𝐸{𝑌 ̃2
}≤𝜎 2
1
From the above derivation we know that max ℎ(𝑦) = 2 log 2 2𝜋𝜎̃ 2 𝑒
Thus we have,
1 1
𝐶= log 2 2𝜋𝜎̃ 2 𝑒 − log 2 2𝜋𝜎 2 𝑒
2 2
1 𝜎̃ 2
= log 2 2
2 𝜎
1 𝑃 + 𝜎̃ 2
log 2
2 𝜎2
1 𝑃
= log 2 (1 + 2 )
2 𝜎
1
𝐶 = log 2 (1 + 𝑆𝑁𝑅)
2
495
(Refer Slide Time: 30:30)
The final result above is a very fundamental and popular result in information theory and
theory of communication systems.
496
(Refer Slide Time: 32:42)
The above result characterizes a channel in terms of bits per channel use or bits per symbol.
Suppose we a channel with bandwidth 2W (from -W and +W), band limited to W. Then
by Nyquist criteria the signal is characterized by a sampling rate of 2W.
497
(Refer Slide Time: 35:44)
𝑃
𝐶̃ = 𝑊 log 2 (1 + )
𝜎2
= 𝑊 log 2 (1 + 𝑆𝑁𝑅)
498
(Refer Slide Time: 38:22)
This is also knows as the Shannon Hartley law and is a very fundamental and practically
relevant result for communication systems.
So, we will stop here and look at another aspect in the subsequent modules.
499
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 41
Introduction to Source Coding and Data Compression, Variable Length Codes,
Unique Decodability
Hello. Welcome to another module in this massive open online course. In this module we
will start another important topic in information theory and communication systems which
is source coding.
Source coding is used for data compression and is thus also useful in storage of large
volumes of multimedia content, along with their transmission.
500
(Refer Slide Time: 03:00)
Suppose we have a source X and it generates symbols 𝑥(0), 𝑥(1), 𝑥(2) … … …. with time.
The symbols belong to a finite alphabet,
501
(Refer Slide Time: 05:33)
A basic assumption that we have made throughout this course till now is that the source is
a discreet memory less source. This means that the probability of a symbol is independent
of the symbol produced before a symbol. i.e.,
502
(Refer Slide Time: 11:37)
Source coding is the process of converting each symbol into a sequence of binary symbols
which are known as codewords. For efficient coding we would like to minimize average
number of bits per symbol. The mapping from symbols to words is known as code.
503
(Refer Slide Time: 15:47)
For example, consider a simple code, for an alphabet of M=4 symbols, {𝑠0 , 𝑠1 , 𝑠2 , 𝑠3 }
𝑠0 → 11
𝑠 → 10
{ 1
𝑠2 → 01
𝑠3 → 00
504
Here each symbol is represented by 2 bits such a code is known as a fixed length code.
Also observe that the code can be mapped to the alphabet one to one. One of the benefits
of such a code is the low complexity of the code, which allows for an easy decoding.
𝑠0 → 0
𝑠 →1
{ 1
𝑠2 → 00
𝑠3 → 11
505
Here, each symbol is represented by a variable number of bits. Such a code is known as a
variable length code. Such a code reduces the average number of bits per symbol. But then
consider the following sequence of bits, (000). This sequence can be decoded as 𝑠0 𝑠0 𝑠0 or
as 𝑠2 𝑠0 . Thus, the coding scheme does not provide a one to one mapping. This will be an
issue for a communication system. i.e., we cannot uniquely decode the symbol from the
received codes.
In the next module we will see how to get a variable length code from which we can uniquely
decode the symbols.
Now, in a variable length code for instance, again let us take an example of M equal to 4.
For instance, we have s0. Let us take an example s0 represented by 0, s1 represented by 1,
s2 represented by 0 0, s 3 represented by 1 1, ok.
Now, the variable length code, obviously you can see different symbols are represented
using different numbers of bits and further observed in this case, there is a problem. It is a
variable length code. For instance, here we have for 0, we have 1 bit. This is 1 bit, 2 bits
and 3 bits. Now, there is a problem for instance, what is the problem?
Let us look at the problem and the problem should be apparent to you. Here if you look at
the transmitted sequence 0, either correspond to s0 followed by s2 or it can correspond to
s0 s0 s0 or it can correspond to s2 followed by s0, ok.
506
So, the same sequence 0 0 0, you can correspond to either s0 s2 s0 s0 s0 or s2 s0. Note this
problem. So, same sequence what is the problem here in this example, the same sequence
can correspond either to s0 s2 s2 s0 or s0 s0. So, these three sequences, given the received
bits sequence 0 0 0, either the stored bit sequence or the bit sequence 0 0 0 at the receiver,
one cannot uniquely decode the transmitted symbol sequence or the transmitted source
coded sequence. It can either correspond to s2 s0 s0 s2 or s0 s0 s0.
So, this code is not uniquely decodable. So, this code, hence code is not uniquely
decodable. So, original sequence what do we mean by the original sequence? What do we
mean by uniquely decodable? Let us now formally define what we mean by, we see a code
is uniquely decodable. If the original symbol sequence is uniquely decodable if the original
if the original source sequence or the original source symbol sequence, original source
sequence can be reconstructed perfectly from the coded sequence. In this case, we have
seen that there exist at least 1 that is 0 0 0 at corresponding to which the symbol sequence,
the corresponding symbol sequence cannot be decoded uniquely. There are multiple
symbol sequence which give raise to the same code sequence, right so many to one
mapping.
So, not one to one mapping and therefore, this code is not uniquely decodable which
creates problems for us because either when the symbols have been received at the receiver
or if the symbols have been stored and you try or they have the bits have been stored and
507
you are trying to reconstruct the original symbol sequence, then it cannot be the symbol
sequence is unknown. It cannot be reconstructed uniquely, all right. So, there is an
information loss that happens in the source coding process and naturally this is not
preferable. So, therefore, what we are going to do? So, what we have seen here is, we have
seen now that does not mean that all variable length codes are not uniquely decodable here.
We have considered a variable for instance, therefore symbols are different numbers of
bits. So, this is a variable length code or what is also termed as VLC. Sometimes now of
course, it is clear that for a fixed length code is always going to be uniquely decodable,
correct as long as many symbols are not mapped to the same fix length codeword, the fix
length code because its nature is uniquely decodable, correct. How are the variable length
code? This is not the case. Now, what are the conditions to be satisfied by a code, variable
length code? For that matter, any code for it to be uniquely decodable and how do we
construct such codes. So, these are some of the aspects that we look at in the subsequent
modules.
508
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 42
Uniquely Decodable Codes, Prefix-free Code, Instantaneous Code, Average Code
length
In the previous module we took an example of a variable length code given by,
𝑠0 → 0
𝑠 →1
{ 1
𝑠2 → 00
𝑠3 → 11
509
(Refer Slide Time: 01:40)
We also observed that this code is not uniquely decodable as we can the sequence 000 can
be decoded as 𝑠0 𝑠2 as well as 𝑠2 𝑠0 . This will be a problem for a practical communication
system as the original message will be lost in decoding.
510
(Refer Slide Time: 04:55)
𝑠0 → 1
𝑠1 → 01
{
𝑠2 → 001
𝑠3 → 000
The above code will be uniquely decodable. For e.g., 1001 can only be 𝑠0 𝑠2 . How do we
actually get a uniquely decodable variable length code? Observe in this example that no
code is a prefix to another codeword. Such a code is known as a prefix free code or a prefix
code.
511
(Refer Slide Time: 06:00)
In the previous example from the last module code word for 𝑠0 was 0 which was a prefix
for codeword 00 for 𝑠2 .
512
(Refer Slide Time: 08:43)
Prefix free codes are also known as instantaneous codes as a codeword as each codeword
is decodable by itself, it is not required to wait for the next codeword.
513
(Refer Slide Time: 12:28)
It can be shown that prefix free codes and instantaneous codes are same and are uniquely
decodable.
514
(Refer Slide Time: 16:09)
Let the bit length of 𝑖 𝑡ℎ code be 𝑙𝑖 and the probability of occurrence be 𝑝𝑖 . Then the average
length of a codeword is given by,
𝑀−1
𝐿̅ = ∑ 𝑝𝑖 𝑙𝑖
𝑖=0
515
(Refer Slide Time: 18:29)
We would now like to minimize the average codelength as well as develop fundamental
insights into the process of code design.
So, with this we will stop this module here, and look at other aspects in the sub segment
modules.
516
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 43
Binary Tree Representation of code, Example Kraft Inequality
Hello, welcome to another module in this massive open online course. We are looking at
source coding and are trying to define different kinds of codes.
In this module we will look at binary tree representation of a code. This representation is
useful for designing as well decoding of codes.
517
(Refer Slide Time: 02:10)
𝑠0 − 10
𝑠 − 00
{ 1
𝑠2 − 11
𝑠3 − 110
518
The above code can be represented as binary tree. Suppose starting from the root, and we
move above for bit 1 and bellow for bit 2. Then the tree will look as follows (see pict
above).
In the above tree observe that the code for symbol 𝑠2 is an “ancestor” of code for symbol
𝑠3 , i.e., 𝑠2 occurs on the path of 𝑠3 . Also, not the code for 𝑠2 is a prefix to the code for 𝑠3 .
In general we can say that if a codeword lies along the path to another codeword, then the
code is not a prefix free code.
𝑠0 − 1
𝑠 − 01
{ 1
𝑠2 − 001
𝑠3 − 000
519
(Refer Slide Time: 08:44)
Now if we look at the binary tree for the above code, we will observe that (see pict above),
All the symbols lie on the final ends of the tree or the “leaves” of the tree. In this case no
codeword lies on the path to another codeword.
520
(Refer Slide Time: 13:55)
The above binary tree has three levels. Suppose we ‘complete’ the above tree, then we can
see that the maximum number of leaves or the maximum number of symbols that can be
represented using the above code is equal to 8. (𝑙𝑚𝑎𝑥 = 3). In general, is the depth if the
tree is 𝑙𝑚𝑎𝑥 the maximum number of leaves will be 2𝑙𝑚𝑎𝑥 .
The number of nodes eliminated at each leaf can be written as 2𝑙𝑚𝑎𝑥 −𝑙𝑖 where i is the level
(depth) of an actual leaf of the tree.
521
If we look at lengths of each leaf, it can be seen that the depth corresponds to the number
of bits in the codeword,
𝑠0 − 1 𝑙0 = 1
𝑠 − 01 𝑙1 = 2
{ 1
𝑠2 − 001 𝑙2 = 3
𝑠3 − 000 𝑙3 = 3
In a prefix free code each codeword for 𝑠𝑖 corresponds to 2𝑙𝑚𝑎𝑥 −𝑙𝑖 at depth 𝑙𝑚𝑎𝑥 of the
binary tree. From this we observe that
𝑀−1
522
(Refer Slide Time: 23:05)
523
(Refer Slide Time: 24:33)
𝑀−1
𝑀−1
⇒ ∑ 2−𝑙𝑖 ≤ 1
𝑖=0
This result is known as Kraft’s inequality. And can be used to check prefix free ness of a
code.
524
(Refer Slide Time: 26:42)
If a set of lengths 𝑙𝑖 satisfies the kraft’s inequality, then there exists a codeword with
lengths 𝑙𝑖
525
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 44
Lower Bound on Average Code Length, Kullback-Leibler Divergence
526
We now want to minimize the average codelength. Consider a symbols set,
𝑋 𝜖𝑆 = {𝑠0 , 𝑠1 , 𝑠2 , 𝑠3 , … . 𝑠𝑀−1 }
Pr(𝑠𝑖 ) = 𝑝𝑖
𝑀−1
𝐿̅ = ∑ 𝑝𝑖 𝑙𝑖
𝑖=0
The lengths 𝑙𝑖 will be bound by craft’s inequality,
𝑀−1
∑ 2−𝑙𝑖 ≤ 1
𝑖=0
527
(Refer Slide Time: 4:20)
We define a quantity
2−𝑙𝑖
𝑞𝑖 = −𝑙𝑖
∑𝑀−1
𝑗=0 2
Since denominator and numerator both are positive thus, 𝑞𝑖 ≥ 0. Further,
𝑀−1
2−𝑙𝑖 ≤ ∑ 2−𝑙𝑖
𝑗=0
Thus,
2−𝑙𝑖
−𝑙𝑖
≤1
∑𝑀−1
𝑗=0 2
⇒ 0 ≤ 𝑞𝑖 ≤ 1
Furthermore,
𝑀−1 −𝑙𝑖
2−𝑙𝑖 ∑𝑀−1
𝑗=0 2
∑ −𝑙𝑗 = −𝑙𝑗 =1
∑𝑀−1
𝑗=0 2 ∑𝑀−1
𝑗=0 2
𝑖=0
𝑀−1
⇒ ∑ 𝑞𝑖 = 1
𝑖=0
Thus, we have that 𝑞𝑖 ′𝑠 can be treated as set of probabilities
528
(Refer Slide Time: 05:43)
529
(Refer Slide Time: 07:59)
530
(Refer Slide Time: 10:07)
𝑀−1
𝑝𝑖
𝐷(𝑝||𝑞) = ∑ 𝑝𝑖 log 2 ≥0
𝑞𝑖
𝑖=0
531
(Refer Slide Time: 13:12)
𝑀−1
𝑝𝑖
∑ 𝑝𝑖 log 2 ≥0
𝑞𝑖
𝑖=0
𝑀−1 𝑀−1
1
⇒ ∑ 𝑝𝑖 log 2 𝑝𝑖 + ∑ 𝑝𝑖 log 2 ≥0
𝑞𝑖
𝑖=0 𝑖=0
−𝑙𝑗
1 ∑𝑀−1
𝑗=0 2
log2 = log2
𝑞𝑖 2−𝑙𝑖
𝑀−1
= 𝑙𝑖 + log 2 ∑ 2−𝑙𝑗
𝑗=0
= 𝑙𝑖 − 𝜖
532
(Refer Slide Time: 15:06)
𝜖 = − log 2 ∑ 2−𝑙𝑗
𝑗
533
(Refer Slide Time: 18:35)
𝑀−1
−𝐻(𝑋) + ∑ 𝑝𝑖 (𝑙𝑖 − 𝜖) ≥ 0
𝑖=0
𝑀−1 𝑀−1
⇒ ∑ 𝑝𝑖 𝑙𝑖 ≥ 𝐻(𝑋) + ∑ 𝑝𝑖 𝜖
𝑖=0 𝑖=0
⇒ 𝐿̅ = 𝐻(𝑋) + 𝜖
Assuming that 𝜖 can take arbitrary low values and as it approaches zero, we will have,
̅ ≥ 𝐻(𝑋)
𝐿
534
Thus, the entropy is a fundamental lower bound of the average codeword length for any
prefix free code.
535
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 45
Optimal Code Length, Constrained Optimization, Morse Code Example
Hello, welcome to another module in this massive open online course. So we are looking
at the average code length and we saw a fundamental bound for average codeword length
In this module we will look at how to get optimal code-lengths in order to approach the
lower bound H(x).
536
(Refer Slide Time: 01:54)
We know that any optimal set of lengths will have to satisfy kraft’s inequality,
𝑀−1
∑ 2−𝑙𝑖 ≤ 1
𝑖=0
𝑀−1
𝑙𝑎𝑣𝑔 = ∑ 𝑝𝑖 𝑙𝑖
𝑖=0
537
(Refer Slide Time: 03:39).
𝑀−1
𝑚𝑖𝑛. ∑ 𝑝𝑖 𝑙𝑖
𝑖=0
𝑀−1
𝑠. 𝑡. ∑ 2−𝑙𝑖 ≤ 1
𝑖=0
538
(Refer Slide Time: 05:09).
We can use the lagrange multiplier framework in order to achieve this above. Define
𝑀−1 𝑀−1
𝑓 = ∑ 𝑝𝑖 𝑙𝑖 + 𝜆 ( ∑ 2−𝑙𝑖 − 1)
𝑖=0 𝑖=0
In the above expression 𝜆 is known as the Lagrange multiplier. We differentiate the above
w.r.t 𝑝𝑖 and set them to 0 for each 𝑝𝑖 .
𝜕𝑓
=0
𝜕𝑝𝑖
⇒ 𝑝𝑖 + 𝜆2−𝑙𝑖 (−1) ln 2 = 0
𝜆𝑙𝑛2
⇒ 𝑙𝑖 = log 2
𝑝𝑖
𝑀−1
∑ 2−𝑙𝑖 = 1
𝑖=0
𝑀−1
𝑝𝑖
⇒ ∑ =1
𝜆 ln 2
𝑖=0
539
𝑀−1
1
⇒𝜆= ∑ 𝑝𝑖
ln 2
𝑖=0
1
⇒𝜆=
ln 2
540
(Refer Slide Time: 11:33)
The optimal codeword length which minimizes the average codeword length is given by,
1
𝑙𝑖∗ = log 2
𝑝𝑖
541
(Refer Slide Time: 14:29)
i.e., the codelength is inversely proportional to the probability of occurrence of the symbol.
This leads to an effective and efficient representation.
Consider an example of the Morse-code, which formed the backbone of the telegraph
system. Look at the representation of the following alphabets.
𝑇→−
𝑄 → −−. −
542
𝑍 → −−..
T is a much more frequently occurring alphabet than Q or Z. Thus, it’s length of code is
shorter. It is also intuitive that if a frequently occurring alphabet is shorter in code, the
overall statement will be more compact.
Of course, we have shown that using a constrained optimization frame work all, right?
Using the KKT frame work formulating the lagrangian using the lagrangian multiplier the
Lagrange multiplier solving this, right? Solving the lagrangian to obtained the optimal
543
codeword lengths l i which is given by log to the base 2 1 over p i all right. So, will stop
here and in the subsequent models we look at schemes to design efficient codes in order
to achieve this lower bound all right.
544
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 46
Approaching Lower Bound on Average code length, Block Coding
Hello, welcome to another module in this massive open online course. We are looking at
achieving the minimum possible codelengths.
545
In the last module we saw that minimum code length can be achieved (for which the
average codelength is equal to the entropy of the source),
1
𝑙𝑖 = log 2
𝑝𝑖
The length of a code has to be an integer, but the quantity on the right side might not be
an integer. And since this the minimum we can approach, we select for 𝑙𝑖 the greatest
integer function or the ceiling function, i.e.,
1
𝑙𝑖 = ⌈log 2 ⌉
𝑝𝑖
The ceiling function is the closest integer to the input, hence, we can write,
𝑥 ≤ ⌈𝑥⌉ ≤ 𝑥 + 1
So, I can choose l i equals the ceiling function because if you choose the lower integer then
you can violet because you cannot decrease we have a lower bound I cannot decrease it
beyond that is I cannot make it in the lower bound I can always choose an average length
I can always choose a length which is higher.
546
(Refer Slide Time: 04:56)
So, this is remember this is the integer ceiling function, this is the integer ceiling function
correct gives the next highest integer example for examples ceiling of 4.7 is equal to 5
etcetera so on and it is a very function alright, which you take the next highest integer
because it is not an integer you approximated by the next highest integer.
So, therefore, what I have is basically now if you look at this log 2 to the base 1 over P i
which is your l i which your setting your l i, it naturally follows that this is less than or
equal to log two to the base log to the base 2 1 over P i plus 1 and log to the base 2 1 over
547
P i, and that can be easily seen for example, ceiling of 4.7 is 5 and this is less than or equal
to 5 plus 4.7 plus 1 this is less than or equal to 4.7 plus 1 equals 5.7 and this is greater than
or equal to 4 and this is greater than and equal to I am sorry 4.7 this is greater than. So, we
have inequality that x is less than or equal to ceiling of x, less than or equal to x plus 1,
naturally because the ceiling of x is the next highest integer. So, it is greater than equal to
x and the next highest integer lies within 1 within a distance of 1 from the given quantity.
So, it is obvious less than equal to x by ceiling of x is less than equal to x plus 1 ok.
⇒ 𝐻(𝑥) ≤ 𝐿̅ ≤ 𝐻(𝑥) + 1
Hence, we have shown that in a practical scenario we can chose the average code length
within one bit of the entropy of the source.
548
(Refer Slide Time: 08:37)
549
Now let us see how to achieve this code length.
𝑋̅ = 𝑋1 , 𝑋2 , 𝑋3 , 𝑋4 , … … . , 𝑋𝑛
That is a group of n symbols. Suppose we encode this block of symbols (instead of each
symbols). Assume that these are IID symbols from a memoryless discreet source.
Let the average length be 𝐿̅𝑛 .Since the symbols are IID each has the same entropy,
𝐻(𝑋̅) = 𝑛𝐻(𝑋𝑖 )
𝐻(𝑋̅) ≤ ̅̅̅
𝐿𝑛 ≤ 𝐻(𝑋̅) + 1
⇒ 𝐻(𝑋) ≤ ̅̅̅
𝐿𝑛 ≤ 𝑛𝐻(𝑋) + 1
𝐻(𝑥) ̅̅̅
𝐿𝑛 𝐻(𝑥) 1
⇒ ≤ ≤ +
𝑛 𝑛 𝑛 𝑛
̅̅̅
𝐿𝑛 /𝑛 is the average number of bits per symbol for a set of n symbols. Now if we increase
the block length n, we can make the average length of a codeword, arbitrarily closer to the
lower bound (or entropy). Thus encoding larger and larger blocks of symbols, we can
achieve arbitrarily low lengths of code.
550
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 47
Huffman Code, Algorithm, Example, Average Code Length
Hello, welcome to another module in this massive open online course. So, we were looking
at various aspects of source coding the fundamental bound. On the lowest possible average
length of the source code for a given source and also we have seen the mechanism by
which we can approach this lower bound, that is by coding symbols across larger and
larger block lengths all right.
So, in this module we look at a practical scheme when algorithm to construct an efficient
source code for a given source which is a very simple and elegant scheme termed as the
Huffman code. So, that is the algorithm we look at in this module.
551
(Refer Slide Time: 01:16).
Huffman code is a popular and simple algorithm, to construct an efficient prefix free code
for a source with given probability distribution for the source alphabet.
552
(Refer Slide Time: 04:12)
𝑋𝜖𝑆 = {𝑠0 , 𝑠1 , 𝑠2 , 𝑠3 , 𝑠4 }
Pr(𝑠0 ) = 0.15
Pr(𝑠1 ) = 0.2
Pr(𝑠2 ) = 0.15
Pr(𝑠3 ) = 0.25
{Pr(𝑠4 ) = 0.25
553
(Refer Slide Time: 06:14)
𝑠3 = 0.25
𝑠4 = 0.25
𝑠1 = 0.2
𝑠2 = 0.15
{𝑠𝑜 = 0.15
554
Now we add the probabilities of the last two (lowest probabilities). One branch assigned
0 and another branch 1.
Then we continue the process with these two combined and the next lowest probability
symbol and continue above to form a Hoffman tree.
In order to get the Huffman code for each symbol, start from the end point (the root), and
reach the symbol node.
555
(Refer Slide Time: 19:58)
𝑠0 − 001
𝑠1 − 11
𝑠2 − 000
𝑠3 − 01
{ 𝑠4 − 10
556
Note that the two lowest probability symbols will have a code of equal length (since they
are combined first).
In the binary tree representation we can see that the code is a prefix free code.
557
𝑀−1
𝐿̅ = ∑ 𝑝𝑖 𝑙𝑖
𝑖=0
558
1 1 1
𝐻(𝑋) = 2 × 0.15 × log 2 + 0.2 × log 2 + 2 × 0.25 × log 2
0.15 0.2 0.25
= 2.2855
The difference between the lower bound and the actual average length is 0.0145
bits/symbol. By coding over large blocks we can go arbitrarily close to the entropy.
559
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 48
Introduction to Channel Coding, Rate of Code, Repetition Code, Hamming
Distance
560
Above is a schematic diagram of a digital communication system. In any digital
communication system, there is a probability of error due to a noise, resulting in a bit flip.
Eg, we send the bits, 011001 and the received signal is 011101.
We need to be able to correct these errors to the best possible extent. But how do we
achieve this?
561
(Refer Slide Time: 08:24)
Consider a simple example we need to transmit 011001. Now in the transmission we send
each bit three times, that is we transmit, 000 111 111 000 000 111. Suppose the received
bit stream is 000 011 011 000 000 110. Notice that some bits have been inverted.
Now we reconstruct the original bit sequence using maximum bit rule. That is the bit read
from a group of 3 bits is the bit that has maximum number of occurrences in that group.
Thus the reconstructed message is, 011001. Hence, in this case we are able to receive a
message and reconstruct it correctly even with the presence of noise.
562
(Refer Slide Time: 12:47)
One observation to draw here is that we need redundancy of bits in order to be able to
correct the errors.
563
(Refer Slide Time: 16:17)
This process of introducing systematic redundancy in the code for the purpose of error
correction is known as channel coding. A particular code is termed as a error control code.
In the above scheme for each 3 bits we have 1 info bits which means,
1 #𝑖𝑛𝑓𝑜𝑟𝑚𝑎𝑡𝑖𝑜𝑛 𝑏𝑖𝑡𝑠
𝑅𝑎𝑡𝑒 𝑜𝑓 𝑐𝑜𝑑𝑒 = =
3 # 𝑜𝑓 𝑡𝑜𝑡𝑎𝑙 𝑏𝑖𝑡𝑠
564
(Refer Slide Time: 20:20)
As redundancy increases the rate decreases which decreases the bandwidth efficiency.
Moreover, suppose to transit 1, we send 111 but, in the channel, two bits get flipped, and
we get 010. This group will be reconstructed as 0 and we will be unable to get the correct
code.
565
(Refer Slide Time: 25:51)
Thus the maximum number of bit errors in a group that can be handled is 1 for this case.
This is known as the hamming distance (𝑑𝐻 ). The hamming distance is the number of
dissimilar bits in the received message.
566
(Refer Slide Time: 29:10)
What we will see next is there are much better and much more efficient codes.
567
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 49
Introduction to Convolutional Code, Binary Field Arithmetic, Linear Codes
Hello welcome to another module in this massive open online course. So, we are looking
at channel coding, and in this module, we start looking at a different class of channel codes
termed as convolutional codes.
568
(Refer Slide Time: 01:31)
Consider a bit sequence 𝑥(0), 𝑥(1), 𝑥(2), 𝑥(3), … … The sequence can be possibly infinite.
569
𝐶0 (𝑗) and 𝐶1 (𝑗) are code bits at time j. Since two code bits are generated at each time
instant the rate is given by
1
𝑅𝑎𝑡𝑒 =
2
The addition of bits above will be according to binary field arithmetic. In binary field
arithmetic the addition operation is similar to an XOR operation and a multiplication is
similar to a AND logic operation.
0+0=0
0+1=1+0=0
1+1=0
570
(Refer Slide Time: 06:53)
The binary field operations described above are commutative, associative and distributive,
i.e., if a, b, c are three bits, then,
𝑎+𝑏 =𝑏+𝑎
𝑐𝑜𝑚𝑚𝑢𝑡𝑎𝑡𝑖𝑣𝑒 − {
𝑎. 𝑏 = 𝑏. 𝑎
𝑎 + (𝑏 + 𝑐) = (𝑎 + 𝑏) + 𝑐
𝑎𝑠𝑠𝑜𝑐𝑖𝑡𝑖𝑣𝑒 {
𝑎. (𝑏. 𝑐) = (𝑎. 𝑏). 𝑐
𝑎. (𝑏 + 𝑐) = 𝑎. 𝑏 + 𝑎. 𝑐
𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑣𝑒 {
(𝑎 + 𝑏). 𝑐 = 𝑎. 𝑐 + 𝑏. 𝑐
571
(Refer Slide Time: 08:04)
Now consider two information bit sequences, 𝑥(0), 𝑥(1), 𝑥(2), … . .. and
𝑦(0), 𝑦(1), 𝑦(2) … ….
𝑦
𝐶0 (𝑗) = 𝑥(𝑗) + 𝑥(𝑗 − 2)
572
(Refer Slide Time: 10:08)
Now,
𝑦
𝐶0𝑥 (𝑗) + 𝐶0 (𝑗) = 𝑥(𝑗) + 𝑥(𝑗 − 2) + 𝑦(𝑗) + 𝑦(𝑗 − 2)
𝑥+𝑦
= 𝐶0 (𝑗)
573
(Refer Slide Time: 11:50)
The coded bit for sum of two bits is same as sum of two coded bits. Thus, the scheme
follows additive law and hence is linear.
574
(Refer Slide Time: 14:47)
575
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 50
Example of Convolutional Code Output, Convolution Operation for Code
Generation
Hello. Welcome to another module in this massive open online course. We are looking at
convolutional codes correct which are class of error control codes to recover or correct
errors that occur over the channel in a communication system.
576
(Refer Slide Time: 01:31)
We will consider an example of a convolutional code. Consider the following bit sequence
from 𝑥(0) to 𝑥(9). 1 0 1 1 0 0 1 0 1 1. We take x (-1) =x (-2) =0. The coded bit 𝐶0 will be
given by binary field addition,
577
(Refer Slide Time: 04:04)
578
(Refer Slide Time: 05:52)
𝐶0 = 1001111001
𝐶1 = 1100011100
579
(Refer Slide Time: 08:42)
10 1
𝑟𝑎𝑡𝑒 = =
20 2
Let us now look at why these codes are called convolution codes.
𝐶0 = 𝑥(𝑗) + 𝑥(𝑗 − 2)
= 1. 𝑥(𝑗) + 0. 𝑥(𝑗 − 1) + 1. 𝑥(𝑗 − 2)
580
∞
= ∑ 𝑥(𝑗 − 𝑘)ℎ(𝑘)
−∞
Where h(k) is given by,
1 𝑘 = 0,2
ℎ(𝑘) = {
0 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
the coded sequence is given by convolution of 𝑥(𝑗) with a filter ℎ(𝑗) whose impulse response is
given as above.
581
𝐶1 (𝑗) = 𝑥(𝑗) + 𝑥(𝑗 − 1) + 𝑥(𝑗 − 2)
∞
The Codes are termed convolution codes since they are generated by convolution operation
of the input bitstream by convolution operation.
582
So, we will stop here and continue in the subsequent modules.
Thank you.
583
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 51
Matrix Representation of Convolutional Code, Generator Matrix,
Transform Domain Representation, Shift Register Architecture
Hello. Welcome to another module in this massive open online course. We are looking at
convolutional codes. Let us look at the matrix representation of a convolutional code.
In this module we will look at the concept of a generator matrix. Consider the input bit
sequence, 𝑥(0), 𝑥(1), 𝑥(2), 𝑥(3), … … … . We can represent the sequence using a row
vector, [𝑥(0) 𝑥(1) 𝑥(2) 𝑥(3) 𝑥(4) … … … … ].
584
(Refer Slide Time: 01:19)
Now let us write a matrix, with 𝑥 as it’s column, 𝐶0 𝐶1 are consecutive rows (see pict
above). The cell of a matrix corresponds to the coefficient of the row bit for the column of
the coded bit. For e.g.,
585
(Refer Slide Time: 06:56)
Now observe the following block at the top in 𝐶0 (2) and 𝐶1 (2) columns
11
{0 1
11
If we observe the previous and the next column, we can see that the this block is slowly
shifting downwards as we move forward in time (in the previous columns, the coded bits
are hidden for earlier (negative) times).
586
IF we call the matrix so formed as the generation matrix, we can write
̅=𝑥
𝐶 ̅𝐺
̅
Where 𝐺 is the generation matrix? 𝐶 is the code output and 𝑥̅ is the input vector (bit stream)
587
∞
𝑗
∑ 𝑥(𝑗)𝐷
𝑗=0
(From a D transform we can obtain a z transform by substituting 𝐷 = 𝑧 −1 )
= ∑ 𝑥(𝑗)𝐷 𝑗
𝑗=0
(since x(j)=0 for j<0)
588
(Refer Slide Time: 14:28)
If we substitute 𝑗 − 𝑘 = 𝑗̃ then 𝑗 = 𝑗̃ + 𝑘.
𝑋(𝐷) = ∑ 𝑥(𝑗̃)𝐷 𝑗̃ +𝑘
𝑗̃ =0
∞
= 𝐷 𝑘 ∑ 𝑥(𝑗̃)𝐷 𝑗̃
𝑗̃ =0
⇒ 𝑥(𝑗 − 𝑘) ←→ 𝐷 𝑘 𝑋(𝐷)
589
⇒ 𝑥(𝑗 − 1) → 𝐷𝑋(𝐷)
590
(Refer Slide Time: 19:02)
Convolution codes can be represented as shift register architecture. In this module we look
at a feed forward shift register architecture.
591
(Refer Slide Time: 21:28)
𝑥(𝑗 − 1) can simply be obtained by passing 𝑥(𝑗) through a delay filter and passing 𝑥(𝑗 −
1) through another delay block will give us 𝑥(𝑗 − 2). Then the required outputs of the
delay blocks and the input can be added to get the coded output. (see pict above)
So, we will stop here and look at other aspects in the subsequent modules.
592
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 52
State Diagram Representation of Convolutional Code, State Transitions, Example
of Code Generation using State Diagram
Hello, welcome to another module in this massive open online course. We are looking at
convolutional codes and we are look at various forms of representation convolutional
codes such as the generator matrix.
At any time instant we need knowledge of only 𝑥(𝑗 − 1) and 𝑥(𝑗 − 2), that is the previous
two bits (along with the current bit). Knowledge of earlier bits is not required.
593
(Refer Slide Time: 01:51)
The current state of the convolution code can therefore be taken as 𝑥(𝑗 − 1), 𝑥(𝑗 − 2).We
thus have four possible states.
𝑥(𝑗 − 1) 𝑥(𝑗 − 2)
0 0
0 1
594
1 0
1 1
595
(Refer Slide Time: 07:55)
To generate the output and the transition of the encoder, we need knowledge of the state
and the current bit at any given time instant.
596
(Refer Slide Time: 11:05)
The convolution code can be represented by a state transition diagram. The code, at any
time instant, receives the current bit and transitions into the next state. (the next state
depends on the current input bits). The mechanism of state transition can be represented
by a state transition diagram.
597
(Refer Slide Time: 14:27)
Now we will illustrate the state transition diagram of the example code that we have taken.
There will be four states, 00,01,10,11. If the current bit is one, then 00 transitions to 10, if
it is 0, it transitions to itself. Similarly, all the transitions can be sketched. (each will state
will have two arrows going out of it). The output is also represented with a slash (along
the current bit)
598
(Refer Slide Time: 22:52)
The above pict. Shows the complete state transition diagram, with the output.
599
(Refer Slide Time: 24:39)
600
(Refer Slide Time: 28:36)
Now we consider what happens for the input sequence, 1011001011. The coded bits and
the states are shown in bellow table,
0 1 00 11
1 0 10 01
2 1 01 00
3 1 10 10
4 0 11 10
5 0 01 11
6 1 00 11
601
7 0 10 01
8 1 01 00
9 1 10 10
11
602
(Refer Slide Time: 34:21)
𝐶0 : 1 0 0 1 1 1 1 0 0 1
𝐶1 : 1 1 0 0 0 1 1 1 0 0
The output can be verified using individual binary field addition operation
603
(Refer Slide Time: 35:48)
604
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 53
Trellis Representation of Convolutional Code and Valid Code Words
605
(Refer Slide Time: 02:27)
For a trellis diagram (see pict above), the following all the states are written in a single
column and these columns are repeated in a line. Each repetition (column) represents a
time instant. Arrows are drawn from one column to the next using the state representation.
These arrows too, will be repeated (since the convolution code is same for all instants).
606
(Refer Slide Time: 13:15)
607
(Refer Slide Time: 17:43)
Note that any code sequence will correspond a path on the trellis.
E.g.: the state transition 00 → 10 → 11. Corresponds to a path, with codeword output 11
10 ….
608
(Refer Slide Time: 19:24)
609
(Refer Slide Time: 22:42)
610
(Refer Slide Time: 27:29)
611
(Refer Slide Time: 30:46)
For the receiver, we need to find the input information bit sequence, this sequence will
correspond to a particular path on the trellis diagram. Thus, the process of decoding of
convolutional codes will involve finding a path on the trellis diagram from the received
code words
612
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 54
Decoding of Convolutional Code, Minimum Hamming Distance, Maximum
Likelihood Codeword Estimate
𝑟̅ = 11 11 10 01 10 10 11
= 𝑟̅0 𝑟̅1 𝑟̅2 … … … . 𝑟̅6
Corresponding to codeword sequence 𝑐̅0 𝑐̅1 … … … . 𝑐̅6 .
613
(Refer Slide Time: 02:20)
What we want now to decode the codeword received. However, when the codeword is
received, it might actually not be valid codeword, due to bit flips resulting from channel
noise. In such a scenario, we want to find the most probable codeword that has been
transmitted.
614
(Refer Slide Time: 07:11)
In order to find the most probable code word transmitted, we use the concept of hamming
distance that we used earlier. Hamming distance between two codewords (equal length),
is the number of dissimilar bits between them. E.g., hamming distance between 001001
and 101000 is 2.
𝑑𝐻 = 2
615
(Refer Slide Time: 09:07)
The decoding principle of a maximum likelihood decoder would be finding 𝑐̅ such that 𝑑𝐻
is minimized.
min 𝑑𝐻 (𝑟̅ , 𝑐̅ )
616
(Refer Slide Time: 13:08)
Further,
𝑑𝐻 (𝑐̅ , 𝑟̅ ) = 𝑑(̅̅̅
𝑐0 , ̅̅̅)
𝑟0 + 𝑑(̅̅̅
𝑐1 , ̅̅̅)
𝑟1 + 𝑑(̅̅̅
𝑐2 , ̅̅̅)
𝑟2 + ⋯
𝑁−1
= ∑ 𝑑(𝑐̅,
𝑖 𝑟
̅)
𝑖
𝑖=0
617
(Refer Slide Time: 17:32)
Observe that 𝑐𝑖 will correspond to a branch in the trellis at the 𝑖 𝑡ℎ stage. Therefore optimal
distance codeword involves finding a path through the trellis in which the sum of this brach
metric is minimum.
618
Principles of Communication Systems - Part II
Prof. Aditya K. Jagannatham
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture – 55
Principles of Decoding of Convolutional Code
Hello, welcome to another module in this massive online course. In the previous few
modules, we have discussed state representation of convolution codes and trellis
representation. We also discussed that the decoding of received codes involves finding the
maximum probable codeword that was transmitted. In this module we will discuss two
basic principles for convolutional decoding.
619
(Refer Slide Time: 02:24)
A maximum likelihood decoder involves a minimum hamming distance code word, i.e.,
𝑑𝐻 (𝑟̅ , 𝑐̅) has to be minimized. Where 𝑟̅ is the received codeword and 𝑐̅ is the decoded
codeword. Decoding involves path finding through the trellis diagram i.e., finding the
sequence of states of the encoder.
620
(Refer Slide Time: 04:15)
621
(Refer Slide Time: 08:31)
If there exists an alternate path 𝑒′1 𝑒′2 𝑒′3 … . 𝑒′𝑗−1 which is the minimum path between 𝑠0
and 𝑠𝑗−1, while the minimum path from 𝑠0 to 𝑠𝑗 is 𝑒1 𝑒2 𝑒3 … … 𝑒𝑗 .
𝑗−1
𝑗−1
≥ ∑ 𝑑(𝑒0′ ) + 𝑑(𝑒𝑗 )
𝑖=0
This is a contradiction since 𝑒0 𝑒1 … is the minimum distance path. Thus, there will be no
alternate path 𝑒0′ 𝑒1′ …. Which has a lower 𝑑𝑚𝑖𝑛 .
What happens then is that for a particular node or state, we only need to store the
information regarding the minimum path, not all the possible paths. This is helpful since
the total number of paths will grow exponentially with time.
622
(Refer Slide Time: 09:55)
623
(Refer Slide Time: 13:01)
624
(Refer Slide Time: 16:50)
625
(Refer Slide Time: 19:18)
If the minimum distance between two words 𝑠𝑖 and 𝑠𝑗 and a third node 𝑠𝑗 is taken to be
𝑑𝑖𝑘 and 𝑑𝑗𝑘 the distance to 𝑠𝑖 , 𝑠𝑗 and 𝑠𝑘 is given by 𝑑𝑖 , 𝑑𝑗 respectively. Then the
minimum distance to 𝑠𝑘 is given by,
626
(Refer Slide Time: 25:18)
627
These principles which seem to be abstract here, are highly useful in implementation of
Viterbi algorithm for decoding. That is these are used in order to find the minimum
distance at time 𝑚 given the minimum distance nodes till time 𝑚 − 1.
628
Principles of Communication Systems - Part II
Prof. Adithya K. Jagannathan
Department of Electrical Engineering
Indian Institute of Technology, Kanpur
Lecture - 56
Viterbi Decoder for Maximum Likelihood Decoding of Convolutional
Code Using Trellis Representation, Branch Metric Calculation,
State Metric Calculation, Example
Hello. Welcome to another module in this massive open online course. So, we have looked
at the principles of decoding convolutional code.
𝑟̅ = 11 11 10 01 10 10 11
629
(Refer Slide Time: 02:16)
We draw the trellis diagram corresponding to the above message and convolution.
𝑑𝑚𝑖𝑛 (00) = 0
Then, for j=1 we find the paths with minimum hamming distance. For example, for 00 to
00 path, the output is 00, and the received code is 11 thus the hamming distance for this
path is 2 (hamming distance for each are written above the path). For (00) at j=1, there are
two paths leading to it. The shortest path is from (00) at j=0, (the other has d of infinity).
The shortest path at each node are marked with curly lines. The process is repeated till j=7.
Here we use the two principles we discussed in the previous module. That the minimum
path is the sum of the minimum paths from the previous time instants to the current node.
630
(Refer Slide Time: 07:38)
631
(Refer Slide Time: 11:35)
632
(Refer Slide Time: 26:15)
633
(Refer Slide Time: 33:46)
At the final instant the minimum distance is 𝑑 = 2 for state (00). Thus, now we trace back
our path from this node. The rectangles in the figure show the minimum distance path.
From the end point we trace the minimum distance path at each state, and the
corresponding outputs will be our received bits.
𝑐̅ = 11 10 10 00 10 10 11
634
Comparing to the received codeword,
𝑟̅ = 11 11 10 01 10 10 11
We can see that two bits are mismatched (marked red) . Thus the hamming distance
between the received and the decoded codeword ids 2
𝑑𝐻 (𝑐̅, 𝑟̅ ) = 2
1 1 0 1 1 01 0
635
THIS BOOK IS NOT FOR SALE
NOR COMMERCIAL USE