Implementation of an SDR
in Verilog
Anders Skärpe
Master of Science Thesis in Communication systems
LiTH-ISY-EX–16/5001–SE
Acknowledgments
I would first like to express my gratitude to the people at Syntronic for giving
me the opportunity to make this thesis possible. I would also like to thank my
supervisors at ISY, Håkan Johansson and Kent Palmkvist, for discussing ideas and
providing feedback. I would also like to thank Altera for providing a license for
DSP Builder and Quartus Prime which was very helpful when implementing the
system.
Contents
Notation
List of Figures
List of Tables
1 Introduction
1.1 Problem statement
1.2 Delimitations
1.3 Thesis outline
2 Theory
2.1 Radio
2.1.1 Software Defined Radio
2.2 Distortions
2.2.1 Additive White Gaussian Noise
2.2.2 Inter Symbol Interference
2.2.3 Phase and Frequency Offset
2.2.4 Time Delay
2.3 Measurement
2.3.1 Bit Error Rate
2.3.2 Signal Noise Ratio
2.3.3 Eye Diagram
2.4 Receiver
2.5 Digital Decimation Filter
2.5.1 Polyphase Filter
2.5.2 Multistage Filter
2.5.3 Square Root Raised Cosine Filter
2.5.4 Matched Filter
2.6 Synchronization
2.6.1 Carrier Synchronization
2.6.2 Symbol Timing
2.6.3 Packet Detection
3 Method
3.1 Tools
3.2 Simulink Reference Model
3.3 DSP Builder Reference Model
3.4 Transmitter
3.5 Receiver
3.5.1 Decimation Filter
3.5.2 Carrier Synchronization
3.5.3 Time Synchronization
3.5.4 Package Detection
3.6 Power Optimization
4 Result
4.1 Carrier Synchronization
4.2 Timing Synchronization
4.3 Package Detection
4.4 Bit Reduction
4.5 Clock Rate Reduction
4.6 Resource Allocation
5 Discussion
5.1 Result
5.2 Method
6 Conclusion
6.1 Discussion
6.2 Future Work
6.2.1 Unimplemented work
6.2.2 Lower Power Consumption
Bibliography
Notation
Abbreviations
Abbreviation Meaning
ADC Analog to Digital Converter
AGC Automatic Gain Control
ALUT Adaptive Lookup Table
ASK Amplitude Shift Keying
AWGN Additive White Gaussian Noise
BER Bit Error Rate
dB Decibel
DD Decision Direct
DSP Digital Signal Processing
F-FH Fast Frequency Hopping
FHSS Frequency Hopping Spread Spectrum
FPGA Field-Programmable Gate Array
FSK Frequency Shift Keying
HDL Hardware Description Language
I In-phase
ICI Inter Carrier Interference
IDFT Inverse Discrete Fourier Transform
IFFT Inverse Fast Fourier Transform
ISI Inter Symbol Interference
LSB Least Significant Bit
ML Maximum Likelihood
MSK Minimum Shift Keying
NCO Numerically Controlled Oscillator
NDA Non Data Aided
OFDM Orthogonal Frequency Division Multiplexing
OSI Open System Interconnection
PSK Phase Shift Keying
Q Quadrature
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase Shift Keying
S-FH Slow Frequency Hopping
SDR Software Defined Radio
SNR Signal to Noise Ratio
SRRC Square Root Raised Cosine
TED Timing Error Detector
List of Figures
4.1 (a) Eye diagram before synchronization. (b) Eye diagram after synchronization.
4.2 Constellation plot comparing (a) before carrier synchronization (b) after synchronization.
4.3 Constellation plot after time synchronization (a) Gardner (b) Linn.
4.4 Magnitude comparison.
4.5 Graph over power consumption decrease with bit reduction.
4.6 Graph over SNR when reducing the number of bits.
1 Introduction
This chapter starts with outlining the problem for this thesis and its delimitations.
At the end it goes through the structure of the report and what each chapter
contains.
How can optimization in software and hardware reduce the energy consumption of a
software defined radio receiver while still maintaining a bit error rate (BER) at $10^{-6}$?
1.2 Delimitations
Only a part of the SDR will be implemented in this thesis, more precisely the
software part. The software for the SDR will consist of the following systems:
• Oscillator
• Decimation filter
• Synchronization
• Demodulation
The software will receive a signal from an analog-to-digital converter (ADC). The
signal received will have the following specifications:
• Bandwidth is 1 MHz
• Bandwidth of the signal that shall be retrieved is 25 kHz
• Center frequency is 2 MHz
• Modulation with quadrature phase shift keying (QPSK)
Furthermore, the sampling frequency of the ADC is 5 MHz.
When transmitting the signal the channel will distort the signal with the following
distortions:
• Additive white Gaussian noise (AWGN)
• Phase and frequency shift
• Time delay
The signal can also be distorted by other 25 kHz signals sent in the proximity of
the signal that shall be retrieved.
The final restriction is that the system will only be implemented and optimized
for Cyclone V, a field-programmable gate array (FPGA).
• Chapter 3, Method: This chapter describes how the different problems were
solved. It first explains how the transmitter and receiver were set up and
then goes through how the power consumption was reduced.
• Chapter 4, Result: In this chapter the results from the different methods
used to reduce the power consumption are presented.
• Chapter 5, Discussion: This chapter discusses the results gained and the
methodology used to implement the system.
• Chapter 6, Conclusion: This chapter discusses the conclusions that can be
drawn from the results gained. After the discussion future work for the
system is presented.
2 Theory
This chapter presents a theoretical overview of the problem. The chapter starts
with defining what a radio and an SDR are. Then the different distortions that
will affect the signal and the different measurements are presented.
Then the purpose of the different subsystems and different solutions to these sub-
systems are presented. Finally the theory behind OFDM and FHSS is presented.
2.1 Radio
To understand what an SDR is, the first step is to define what a radio is. IEEE and
the Wireless Innovation Forum (formerly SDR Forum) define a radio as one of the
following [4]:
(a) Technology for wirelessly transmitting or receiving electromagnetic radia-
tion to facilitate transfer of information.
(b) System or device incorporating technology as defined in (a).
(c) A general term applied to the use of radio waves.
In this thesis the first definition (a) is the most applicable definition of the radio.
Figure 2.1 displays a simple overview of a transmitter sending information over
a channel to a receiver.
A radio in which some or all the physical layer functions are software defined.
The physical layer is one of the seven different layers defined by the Open System
Interconnection (OSI) model [6]. Each layer in the model is a small part of a
communication system where a layer provides services to the layer above and
receives services from the layer below it. The physical layer is the lowest of the
seven layers [4].
By this definition many radios are not software defined but rather software controlled.
A software controlled radio is a radio that, for example, has different modulation
blocks between which the software can switch. The difference between
software controlled and software defined is not entirely black and white: depending
on how much of the hardware the software can control, a software controlled
radio will eventually become a software defined radio instead [4].
The line between the two comes down to this: software controlled functionality
is limited by its design, whereas software defined functionality may be
reprogrammed for new functionality [4].
2.2 Distortions
Distortion of the signal can occur in many ways and in different places when
transmitting and receiving the signal. When preparing to transmit the signal,
the signal can be distorted by inter symbol interference (ISI). While being transmitted
through a channel the signal can be exposed to different kinds of noise
sources, e.g. AWGN. When receiving the signal, the clocks at the receiver and
transmitter might be unsynchronized, which causes phase and frequency offset,
and time delay.
AWGN can for example simulate thermal noise or radiation noise. Thermal noise
is caused by random movements of electrons in the transmitter and receiver. Ra-
diation noise is caused by the atmospheric and earth blackbody electromagnetic
radiation [7].
In electrical engineering the noise is often assumed to have a Gaussian (normal)
distribution with zero mean. The noise is assumed to have zero mean because
otherwise the mean could just be subtracted [7].
Figure 2.2: ISI, the black lines are the sample instant.
the transmitter and receiver are asynchronous and the transmission time varies
the receiver has to synchronize its clock to the received signal [7].
2.3 Measurement
To evaluate how the different algorithms compare to each other it is necessary to
measure more parameters than the power consumption. Other than the power
consumption the BER and signal to noise ratio (SNR) will be measured. To inspect
how well the synchronizations are executing, eye diagrams will be used.
where B is the bandwidth and T is the time the signal is transmitted. SNR is then
defined, in decibel (dB), as
$$\mathrm{SNR} = 10 \log_{10} \frac{E(x[n]^2)}{E(w[n]^2)} \ \text{[dB]} \tag{2.2}$$
where E(x) is the expected value of x. If x(t) is modeled as a random signal then,
in many cases, its samples x[n] have the same distribution. It is then possible
to generate many signals and calculate a mean value by using only one sample from
each signal to calculate the SNR, which gives [7]
$$\mathrm{SNR} = 10 \log_{10} \frac{E[x[1]^2]}{E[w[1]^2]} \ \text{[dB]} \tag{2.3}$$
Figure 2.3: (a) Eye with a large amount of ISI (b) Eye with a small amount of
ISI.
and the sensitivity in sample time depends on how wide the eye is. A narrow eye
means that there is less time to be certain that the correct value is sampled [9].
When receiving a signal it can also be useful to plot a constellation diagram. The
constellation diagram displays the sampled values and without noise and ISI it
should display a number of distinct points. The number of points depends on the
chosen modulation, for QPSK the number of points is four. Figure 2.4(a) displays
a constellation diagram with a large amount of ISI and Fig. 2.4(b) displays a
constellation diagram with a small amount of ISI.
2.4 Receiver
This section will explain the different subsystems that will be implemented in the
receiver. Figure 2.5 displays an overview of the receiver and how these systems
will be connected. The subsystems inside the black border are implemented in this
thesis.
Figure 2.4: (a) Constellation with a large amount of ISI (b) Constellation
with a small amount of ISI.
$$h_i^{(M)}(n) = \begin{cases} h_i\left(\frac{n}{M}\right), & \text{if } n = 0, \pm M, \pm 2M, \ldots \\ 0, & \text{otherwise} \end{cases} \tag{2.5}$$
Finally $h(n)$ is obtained as the sum of shifted versions of $h_i^{(M)}(n)$:
$$h(n) = \sum_{i=0}^{M-1} h_i^{(M)}(n - i) \tag{2.6}$$
To represent the polyphase form for the z-transform first the z-transformation of
the signal h(n) is required:
$$H(z) = \sum_{n=0}^{\infty} h(n) z^{-n} \tag{2.7}$$
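$$\begin{aligned} H(z) &= \sum_{n=0}^{\infty} \sum_{i=0}^{M-1} h_i^{(M)}(n - i) z^{-n} = \sum_{i=0}^{M-1} \sum_{n=0}^{\infty} h_i^{(M)}(n - i) z^{-n} \\ &= \sum_{i=0}^{M-1} z^{-i} \sum_{n=0}^{\infty} h_i^{(M)}(n) z^{-n} = \sum_{i=0}^{M-1} z^{-i} H_i^{(M)}(z) = \sum_{i=0}^{M-1} z^{-i} H_i(z^M) \end{aligned} \tag{2.8}$$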
The polyphase representation can then be used to create a polyphase filter. Each
$H_i(z^M)$ represents a filter and $z^{-i}$ represents the delay before each filter. Figure
2.6 displays how an M-fold decimation filter is converted to a polyphase decimation
filter [10].
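The equivalence between the direct and the polyphase decimator can be checked numerically. The following is a minimal sketch, assuming NumPy and a filter with at least M taps; the function names are hypothetical and are not taken from the thesis implementation. Each branch filters an input that has already been delayed by i samples and downsampled by M, so all multiplications run at the low output rate.

```python
import numpy as np

def decimate_direct(x, h, M):
    """Filter with h at the input rate, then keep every M-th output sample."""
    return np.convolve(x, h)[::M]

def decimate_polyphase(x, h, M):
    """Same output, but every branch filter runs at the low output rate.

    Assumes len(h) >= M so that no polyphase branch is empty.
    """
    n_out = -(-(len(x) + len(h) - 1) // M)       # ceil(full length / M)
    y = np.zeros(n_out)
    for i in range(M):
        h_i = h[i::M]                            # branch coefficients h(nM + i)
        # Branch input x(mM - i): delay by i samples, then downsample by M.
        x_i = np.concatenate((np.zeros(i), x))[::M]
        branch = np.convolve(x_i, h_i)
        n = min(n_out, len(branch))
        y[:n] += branch[:n]
    return y
```

Both functions return the same samples, but the direct form computes M outputs for every one it keeps.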
For example, in reference [11] the complexity of a single stage filter is compared
to different multistage solutions. There a one stage decimation filter of conversion
rate 20 is compared to a two stage decimation filter with conversion rates 5 and
4. The number of multiplications is reduced from 42 multiplications in the one
stage design to 13 multiplications in the two stage design [11].
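$$H_T(f) = H_R(f) = \sqrt{H_{RC}(f)} \tag{2.9}$$
$$H_{RC}(f) = \begin{cases} T_s, & 0 \le |f| \le \frac{1-\alpha}{2T_s} \\ \frac{T_s}{2}\left[1 - \sin\left(\frac{\pi T_s}{\alpha}\left(|f| - \frac{1}{2T_s}\right)\right)\right], & \frac{1-\alpha}{2T_s} \le |f| \le \frac{1+\alpha}{2T_s} \\ 0, & \text{otherwise} \end{cases} \tag{2.10}$$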
$1/T_s$ is the symbol rate and $\alpha$ is the rolloff factor, limited to $0 \le \alpha \le 1$. The greater
the value of $\alpha$, the larger the excess bandwidth of the filter [9].
A signal without excess bandwidth is confined to the frequency band $\left[-\frac{1}{2T_s}, \frac{1}{2T_s}\right]$,
but with the excess bandwidth it becomes confined to $\left[-\frac{1}{2T_s} - \epsilon, \frac{1}{2T_s} + \epsilon\right]$, where
$T_s$ is the sample interval and $\epsilon$ is the excess bandwidth. With this extra bandwidth
it is possible to choose the pulse shape and receiver filter more freely [7].
2.6 Synchronization
The synchronizations required in a radio can vary depending on how the radio
shall be implemented. In the radio implemented in this thesis the following syn-
chronizations are required:
• Carrier synchronization
• Symbol synchronization
• Packet synchronization
$$s[n] = A[n] \cos\left[2\pi \frac{f_c}{F_s} n + \theta + \frac{2\pi}{M}(m - 1)\right] \tag{2.11}$$
where $m = 1, 2, \ldots, M$, $f_c$ is the carrier frequency, $F_s$ is the sampling rate, and
$\frac{2\pi}{M}(m - 1)$ is the information bearing component of the signal phase. By using a power of
M carrier recovery the goal is to remove the information bearing component and
estimate the error with the unmodulated carrier $\cos\left(2\pi \frac{f_c}{F_s} n + \theta\right)$. By raising $s[n]$
to the power of M, $(s[n])^M$ is obtained. This yields
$$(s[n])^M = \left[\cos\left(2\pi \frac{f_c}{F_s} n + \theta + \frac{2\pi}{M}(m - 1)\right)\right]^M = \cos\left(2\pi M \frac{f_c}{F_s} n + M\theta\right) \tag{2.12}$$
The term
$$\frac{2\pi}{M}(m - 1)M = 2\pi(m - 1) \tag{2.13}$$
loses all its information and can therefore be excluded [9].
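The removal of the data phase is easiest to see on complex baseband symbols. The sketch below is an illustration under that assumption (the text above uses the real-valued cosine form), with a hypothetical phase error `theta` and NumPy as the only dependency. Raising each symbol to the power M turns the data phase into a multiple of 2π, so only M times the phase error remains.

```python
import numpy as np

M = 4                                    # QPSK
theta = 0.2                              # hypothetical phase error in radians
rng = np.random.default_rng(0)
m = rng.integers(1, M + 1, size=1000)    # symbol indices m = 1..M

# Phase error plus the information bearing phase 2*pi*(m - 1)/M.
s = np.exp(1j * (theta + 2 * np.pi * (m - 1) / M))

# After the M-th power the data term is 2*pi*(m - 1), which has no effect.
s_M = s ** M
theta_hat = np.angle(np.mean(s_M)) / M
```

The estimate is only unambiguous while |Mθ| < π, since phase errors that differ by 2π/M give the same M-th power.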
Luise algorithm is a maximum likelihood (ML) estimation presented in [13]. Many
other algorithms exist for estimating the coarse frequency; a few of these
are presented in [14]. They are similar to Luise algorithm in implementation but
require more computations because the maximum of different summations has
to be found. Therefore Luise algorithm is implemented to find out the performance
and power consumption of the algorithm.
To estimate the frequency with ML the maximum of the equivalent function has
to be found
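$$\Lambda(\Delta\tilde{f}) = \left|\sum_{i=1}^{N} y_i e^{-j2\pi\Delta\tilde{f} i T_s}\right|^2 = \sum_{k=1}^{N} \sum_{m=1}^{N} y_k y_m^* e^{-j2\pi\Delta\tilde{f} T_s (k - m)} \tag{2.14}$$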
where
is the observed signal, Ts is the sample time, θ is the phase error, w is gaussian
noise, and N is the number of samples. The problem with finding the maximum
is that it is a time consuming and complex task [13].
The calculation for estimating the error in Luise algorithm is
$$\Delta\hat{f} = \frac{1}{\pi T_s (M + 1)} \arg\left\{\sum_{k=1}^{M} R(k)\right\} \tag{2.16}$$
$$R(k) = \frac{1}{N - k} \sum_{i=k+1}^{N} y_i y_{i-k}^*, \quad 0 \le k \le N - 1 \tag{2.17}$$
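Equations (2.16) and (2.17) translate almost directly into code. The sketch below is an illustration only, assuming NumPy, a noiseless complex exponential as the observed signal, and hypothetical values for the sample time, the offset and the block length.

```python
import numpy as np

def autocorr(y, k):
    """R(k) = 1/(N-k) * sum_{i=k+1}^{N} y_i * conj(y_{i-k}), Eq. (2.17)."""
    return np.mean(y[k:] * np.conj(y[:len(y) - k]))

def luise_estimate(y, Ts, M):
    """Frequency offset estimate according to Eq. (2.16)."""
    r_sum = sum(autocorr(y, k) for k in range(1, M + 1))
    return np.angle(r_sum) / (np.pi * Ts * (M + 1))

Ts = 1e-5                    # hypothetical sample time in seconds
df = 500.0                   # hypothetical frequency offset in Hz
N = 64
i = np.arange(1, N + 1)
y = np.exp(1j * (2 * np.pi * df * i * Ts + 0.3))   # offset plus a phase error
df_hat = luise_estimate(y, Ts, M=N // 2)
```

Only one arg computation is needed per block, and the result is unambiguous while |Δf| < 1/(T_s(M + 1)).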
A decision direct (DD) variation is presented in [16] and displayed in Fig. 2.9.
This variation will be called DD QPSK Costas loop. DD means that the detector
takes a decision on how the signal should be interpreted at the demodulation.
This variation performs both carrier synchronization and timing synchronization.
This works if the in-phase (I) and quadrature (Q) signals are near their correct
values, because the sign then ignores small deviations caused by noise or imperfect
synchronization. The ±1 symbol decision estimates the carrier error and feeds the
estimation through the filter H(z) to a numerically controlled oscillator (NCO)
[16].
By removing the sign from the signal that is sent forward in the system, the carrier
synchronization is no longer DD and a timing synchronization can then
be performed after it. This variation will be called QPSK Costas loop. The sign
only feeds its signal forward to the multiplications to produce a carrier error
estimation.
Figure 2.10 displays a multirate timing recovery. The signal is interpolated based
on the value µ. The interpolation calculates an intermediate sample between two
samples and µ affects where this intermediate sample is located. To calculate µ a
timing error detector (TED) is first used to estimate the error. The estimation is
then filtered and passed to an NCO [9].
The calculations for the multirate timing recovery are presented in [17]. The
calculation for the NCO is
The modulo 1 is because η[n] shall be positive and only the fractional part shall
remain. The NCO receives w[n] from the filters, which should be nearly constant.
The NCO will on average underflow every 1/w[n] clock cycles. This gives the NCO
a period of $T_i = T_s / w[n]$ which gives
$$w[n] \approx \frac{T_s}{T_i} \tag{2.19}$$
where Ts is the sample time before the interpolation and Ti is the sample time
after. The value from the NCO is then used to calculate
$$\mu[n] = \frac{\eta[n]}{w[n]} \tag{2.20}$$
$$\alpha = \frac{T_i}{T_s} \approx \frac{1}{w[n]} \tag{2.21}$$
Although the exact value of 1/w[n] is unknown, as seen in equation (2.19) it can
be approximated to a known value [17]. With α, µ can be approximated by
$$\mu[n] \approx \alpha\,\eta[n] \tag{2.22}$$
To remove any timing jitter that can occur, the underflow of the NCO provides a
timing clock. When the NCO underflows it indicates correct data clocking [17].
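The behaviour of the NCO can be illustrated with a small simulation. The recursion used here, η[n+1] = (η[n] − w[n]) mod 1 with an underflow flagged whenever η[n] < w[n], is an assumption consistent with the description above, since the NCO equation itself is not reproduced in this text. The function name is hypothetical.

```python
def nco_strobes(w, n_cycles, eta0=0.0):
    """Return (cycle index, mu) for every underflow of a modulo-1 down-counter."""
    eta = eta0
    strobes = []
    for n in range(n_cycles):
        if eta < w:                        # the subtraction below will wrap
            strobes.append((n, eta / w))   # mu[n] = eta[n] / w[n], Eq. (2.20)
        eta = (eta - w) % 1.0              # keep only the positive fractional part
    return strobes
```

With w = 0.5 the counter underflows every second cycle, and with w = 0.25 every fourth, which matches the period of 1/w[n] clock cycles stated above.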
With a working interpolation it is the TED that will affect the result gained from
the timing recovery. There are many different TEDs that use different numbers
of samples to estimate the error. A popular TED is Gardner, which is presented
below together with a similar TED.
Gardner
In [18] a TED algorithm is presented that requires double the symbol frequency
to estimate the timing error. The algorithm given is
where
The Gardner TED algorithm is a non-data aided (NDA) detector that can be trans-
formed to a DD detector. NDA means that the detector does not use a pilot signal
to train the detector in the beginning.
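As an illustration, the commonly cited form of the Gardner detector multiplies the midpoint sample by the difference between the current and the previous symbol sample, for I and Q separately, and adds the two products. The sketch below assumes that form, NumPy, and an indexing convention chosen for this example (even indices are symbol strobes); it is not taken from the thesis implementation.

```python
import numpy as np

def gardner_ted(x):
    """Timing error per symbol for a signal with two samples per symbol.

    Even indices of x are treated as symbol strobes and odd indices as the
    midpoints between them (a convention chosen for this sketch).
    """
    x = np.asarray(x, dtype=complex)
    strobes, mids = x[0::2], x[1::2]
    n = len(strobes) - 1
    cur, prev, mid = strobes[1:], strobes[:-1], mids[:n]
    # Re{conj(mid) * (cur - prev)} equals the I product plus the Q product.
    return np.real(np.conj(mid) * (cur - prev))

def alternating_wave(tau, n_symbols=20):
    """Alternating +1/-1 symbols shaped as cos(pi*t), sampled with offset tau."""
    t = np.arange(0, n_symbols, 0.5) + tau
    return np.cos(np.pi * t)
```

For this test signal the error is zero with perfect timing and changes sign with the sign of the offset, which is what the loop filter needs in order to steer the NCO.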
Linn
QPSK it will be two tables for the I signal and two tables for the Q signal [19].
The calculation for the error is
The result from the computations eI (n) and eQ (n) is limited to [−0.5, 0.5] and can
therefore easily be stored in lookup tables [19].
In reference [19] the performance of Gardner’s algorithm and Linn’s algorithm are
compared. Linn’s algorithm performs better than Gardner’s algorithm,
but the power consumption of the two algorithms is not compared. By using
lookup tables Linn’s algorithm removes many of the heavy computations and could
therefore consume less power than Gardner’s algorithm.
For M-PSK it is possible to calculate the BER for a certain SNR. The calculation
for the BER is [22]:
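$$\mathrm{BER} = \frac{2}{\max(\log_2 M, 2)} \sum_{i=1}^{\max(M/4,1)} Q\left(\sqrt{2\,\mathrm{SNR}\,\log_2 M}\,\sin\left(\frac{(2i - 1)\pi}{M}\right)\right) \tag{2.29}$$
where Q is defined as
$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt \tag{2.30}$$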
As can be seen in Fig. 2.12, a BER of $10^{-6}$ for QPSK requires an SNR of about 11 dB.
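Equation (2.29) can be evaluated with the standard library alone by writing Q(x) in terms of the complementary error function, Q(x) = erfc(x/√2)/2. The function names below are hypothetical and the sketch simply restates the formula.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x), Eq. (2.30)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mpsk_ber(M, snr_db):
    """Approximate BER for M-PSK according to Eq. (2.29)."""
    snr = 10.0 ** (snr_db / 10.0)
    k = math.log2(M)
    terms = max(M // 4, 1)
    total = sum(
        q_func(math.sqrt(2.0 * snr * k) * math.sin((2 * i - 1) * math.pi / M))
        for i in range(1, terms + 1)
    )
    return 2.0 / max(k, 2.0) * total
```

For QPSK the sum has a single term and the expression reduces to Q(√(2·SNR)). Evaluating it at 10 dB and 11 dB brackets the target BER of $10^{-6}$, in line with the value read from the figure.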
Figure 2.13 displays a constellation diagram of the four symbols represented by
QPSK. Through the use of both the I and Q four symbols can be represented.
for the implementation is fixed the static power consumption cannot be reduced
and therefore only the dynamic power consumption will be measured and evalu-
ated.
The first step to reduce the power consumption is to choose algorithms that have
few computations and give a result that complies with the requirements.
After efficient algorithms are chosen it is possible to reduce the number of bits
used when receiving the signal. Fewer bits can decrease the SNR
of the signal but will also reduce the power consumption, because
fewer bits change on each clock pulse.
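The SNR side of this trade-off can be estimated before touching the hardware. The sketch below is an illustration only, assuming NumPy, a sine test signal at 90 % of full scale and a simple rounding quantizer; it is not the fixed-point format used in the implementation. It measures the signal to quantization noise ratio for a given word length.

```python
import numpy as np

def quantize(x, bits):
    """Round x in [-1, 1) to a signed fixed-point grid with the given word length."""
    scale = 2 ** (bits - 1)
    return np.clip(np.round(x * scale), -scale, scale - 1) / scale

def sqnr_db(bits, n=10000):
    """Signal to quantization noise ratio for a sine at 90 % of full scale."""
    t = np.arange(n)
    x = 0.9 * np.sin(2 * np.pi * 0.01234 * t)
    err = quantize(x, bits) - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(err ** 2))
```

Each removed bit costs roughly 6 dB of quantization SNR, so the word length can be reduced until the quantization noise approaches the channel noise that the BER requirement already tolerates.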
The amount of computations per time unit required by an algorithm can differ,
e.g. the straightforward filter solution compared to the polyphase filter. By choos-
ing algorithms that can use fewer samples it is possible to reduce the clock rate
in certain areas of the FPGA.
As stated before the reason for OFDM is to split a high-rate datastream into a
number of lower rate datastreams. A signal in OFDM consists of a sum of subcarriers
that are modulated by either PSK or QAM. To generate the subcarriers
used when transmitting the lower rate datastreams the inverse Fourier transform
of N QAM or PSK symbols is used, where N is the number of subcarriers. The
transform can be implemented by the inverse fast Fourier transform (IFFT) in-
stead of the inverse discrete Fourier transform (IDFT) which reduces the number
of multiplications [3].
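A short numerical check shows that the IFFT produces the same time-domain samples as the IDFT sum. The sketch assumes NumPy, a hypothetical N = 8 subcarriers and randomly drawn QPSK symbols.

```python
import numpy as np

N = 8                                     # hypothetical number of subcarriers
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(N, 2))
# One QPSK symbol per subcarrier.
X = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

# Direct IDFT: x[n] = (1/N) * sum_k X[k] * exp(j*2*pi*k*n/N), N*N multiplications.
n = np.arange(N)
k = n.reshape(-1, 1)
idft_matrix = np.exp(2j * np.pi * k * n / N) / N
x_idft = idft_matrix @ X

# IFFT: same samples with on the order of N*log2(N) multiplications.
x_ifft = np.fft.ifft(X)
```

The two results agree to floating point precision, so the choice between them only affects the amount of computation.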
To minimize the ISI a guard time is used before each OFDM symbol. To calculate
the guard time the maximum time spread has to be known; the guard
time is then chosen larger than the expected time spread. The guard time could
introduce inter carrier interference (ICI) if the guard time consists of no signal at all.
ICI is a distortion that occurs when the subcarriers are not orthogonal, which is
caused by the OFDM symbols not having an integer number of cycles in the IFFT
interval. By cyclically extending the OFDM symbol in the guard time the ICI will be
eliminated [3].
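The effect of the cyclic extension can be demonstrated with a toy multipath channel. The sketch below assumes NumPy and hypothetical values for the number of subcarriers, the guard length and the channel taps. With a cyclic guard every subcarrier is only scaled by the channel response at its own frequency; with an empty guard the subcarriers leak into each other.

```python
import numpy as np

N, G = 16, 4                              # hypothetical subcarriers and guard length
rng = np.random.default_rng(2)
X = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, size=N)))  # QPSK
x = np.fft.ifft(X)
h = np.array([1.0, 0.5, 0.2])             # hypothetical channel, delay spread < G
H = np.fft.fft(h, N)

# Guard time filled with a cyclic extension of the symbol.
tx_cyclic = np.concatenate((x[-G:], x))
Y_cyclic = np.fft.fft(np.convolve(tx_cyclic, h)[G:G + N])

# Guard time with no signal at all.
tx_empty = np.concatenate((np.zeros(G), x))
Y_empty = np.fft.fft(np.convolve(tx_empty, h)[G:G + N])

no_ici = np.allclose(Y_cyclic, H * X)     # each subcarrier is only scaled by H[k]
ici = not np.allclose(Y_empty, H * X)     # subcarriers interfere with each other
```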
After choosing the guard time larger than the delay spread, the symbol duration
has to be decided. It is desirable to have a large symbol duration because there is
a loss in SNR caused by the guard time. But the symbol duration cannot be too
large because it requires larger implementation complexity and is more sensitive
to phase noise and frequency offset. Therefore the symbol duration is often cho-
sen as five times larger than the guard time, which gives a 1 dB SNR loss because
of the guard time [3].
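The 1 dB figure can be reproduced with a short calculation, under the assumption that the symbol duration $T_{sym}$ includes the guard time $T_g$ and that the receiver discards the signal energy in the guard time. With $T_{sym} = 5T_g$ the fraction of useful energy is 4/5, so

```latex
\text{SNR loss} = 10 \log_{10} \frac{T_{sym}}{T_{sym} - T_g}
                = 10 \log_{10} \frac{5}{4} \approx 0.97\ \text{dB}
```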
This chapter goes through how the transmitter and receiver were implemented
and which tools were used to implement and test the system.
3.1 Tools
This section will give an overview of the different tools that have been used when
implementing and analyzing the system.
Simulink
Simulink is an add-on to Matlab that can be used to simulate hardware and has
a set of predefined blocks. A few extensions to Simulink contain
more predefined blocks. DSP System Toolbox and DSP Builder are the only
extensions to Simulink used in the thesis.
DSP Builder
Quartus
Quartus is used to analyze and compile the HDL description, and to fit the design
to the FPGA. Quartus is also used to estimate the power consumption of the
design. To get a better power estimation the design has to be simulated with an
appropriate input.
ModelSim
3.4 Transmitter
The base for the transmitter comes from Simulink's model "HDL Optimized QPSK
Transmitter". This model produces bits and modulates them with QPSK. To create
the required transmitter the sample time was changed to produce a 25 kHz signal
and the interpolation filter was changed. After the signal had been interpolated
the real part of the signal was multiplied by a 25 kHz cosine and the imaginary
part was multiplied by a 25 kHz sine. An overview of the transmitter can be seen
in Fig. 3.1.
The two parts were then added together and multiplied by a 2 MHz sine, which puts
the signal at 2.025 MHz. The signal was then exposed to AWGN, phase and
frequency shift, and time delay. As a final distortion a second, randomly
generated 25 kHz QPSK signal with the same amplitude was added to the transmission
75 kHz away from the first signal. Figure 3.2 displays the frequency spectrum of
the two signals. The frequency spectrum is mirrored and is therefore identical at
-2 MHz.
3.5 Receiver
The receiver also started with a base from Simulink, the model "HDL Optimized
QPSK Receiver With Captured Data". This model contains a decimation filter,
carrier and time synchronization and package detection.
To create the receiver the first step was to create a matched SRRC filter. To create
the filter the signal was sent undistorted through the transmitter and receiver
filter and without being modulated onto the carrier frequency. The filters were
tweaked until there was only a small amount of ISI in the transmitted signal.
Because matched filters were used it was not possible to remove all of the ISI
from the signal.
The Matlab implementation was analyzed to understand the different subsystems.
Some of the systems were implemented only with code and not Simulink
blocks; to understand how these systems worked the code was replaced with
Simulink blocks. When the whole system was implemented with Simulink
blocks the transmitter was tweaked to add distortions to the signal. Then the
subsystems at the receiver were changed to newer implementations that might have
a lower power consumption. An overview of the receiver implemented in DSP
Builder and an overview of how the different subsystems are connected are
displayed in Appendix A.
a filter that did not introduce ISI to the signal. Both the order and rolloff could
probably be decreased; the effect of decreasing the order should be a reduction of
the power consumption. The reduction in power consumption should be close to
linear because it reduces the number of computations the filter calculates.
The signal was interpolated by a factor of 8 at the transmitter. At the receiver
the ADC then samples at 5 MHz, which gives 200 samples per symbol. After
the 25 kHz signal has been extracted by multiplying the received signal with a 2
MHz signal, the received signal is decimated by a factor of 25. This decimation
is accomplished by throwing away samples, although a decimation filter would
increase the performance of the receiver. Unfortunately there was not enough
time to change this part, and it might not be possible to keep this decimation
with FHSS. The reason is that with FHSS the carrier frequency is not known,
and it can therefore not be determined how many samples are required.
Then a decimation by a factor of 4
was accomplished by the decimation filter at the receiver. This leaves two samples
per symbol: the extra sample per symbol gives a better synchronization, and the
chosen time synchronization algorithms require two samples per symbol when
estimating the timing error. The signal is decimated by a further factor of 2 when
performing the time synchronization.
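The rate chain described above can be summarized with a small calculation. The helper name is hypothetical; the numbers are the ones given in the text.

```python
def samples_per_symbol(fs_adc, symbol_rate, factors):
    """Samples per symbol at the ADC and after each decimation stage."""
    sps = fs_adc // symbol_rate
    history = [sps]
    for m in factors:
        assert sps % m == 0, "factor must divide the current rate"
        sps //= m
        history.append(sps)
    return history

# ADC, sample dropping (25), decimation filter (4), time synchronization (2)
chain = samples_per_symbol(5_000_000, 25_000, [25, 4, 2])   # [200, 8, 2, 1]
```

The two samples per symbol that the timing error detectors need are therefore available after the decimation filter, and one sample per symbol remains after the time synchronization.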
$$\frac{1}{511} + \frac{1}{510} + \frac{1}{509} = 0.005882 \tag{3.1}$$
By passing the value one through the implementation with one division the result
is
$$\frac{3}{512} \approx 0.0058593 \tag{3.2}$$
This shows that there is a small difference in the result between the two
computations, although a more detailed study would be
required to be certain how the change affects the synchronization.
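The size of the difference can be quantified directly. In the sketch below, reading Eq. (3.2) as a single division by 512 that replaces the three separate divisions is an interpretation of the text, not a description of the actual hardware.

```python
exact = 1 / 511 + 1 / 510 + 1 / 509      # three separate divisions, Eq. (3.1)
approx = 3 / 512                         # one division by a power of two, Eq. (3.2)
rel_error = (exact - approx) / exact

print(f"exact={exact:.6f} approx={approx:.7f} error={100 * rel_error:.2f} %")
```

The relative difference is roughly 0.4 %, and a division by 512 can be implemented as a 9-bit right shift instead of a divider.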
The fine carrier synchronization was tested with three different algorithms, Costas
loop, DD QPSK Costas loop and QPSK Costas loop. The different versions of
Costas loop were chosen because Costas loop has a very simple design with few
computations, and the different versions also have very few computations. If the
system meets the requirements with one of these algorithms it would be hard to
decrease the number of computations in the fine carrier synchronization without
removing it.
When the fine carrier synchronization was implemented as Costas loop it did not
give a sufficiently good BER, 0.1769%, early in the simulation.
The DD QPSK Costas loop was implemented without a timing synchronization
after it because the loop does both the carrier synchronization and timing syn-
chronization. It did fairly well when there was a small time delay. When the
time delay increased, the synchronization could not perfectly synchronize and
the BER increased. After 6300 symbols the BER was 0.0006349%. Although more
symbols would be required to ascertain the BER it was possible to notice during
the simulation that the algorithm could not synchronize large time delays. The
BER could be sufficient depending on the application but for this thesis the BER
shall be below 0.0001%.
The DD QPSK Costas loop was then changed to only use the signed signal for the
error estimation and a timing recovery was added after the carrier synchroniza-
tion. This implementation fulfilled the required BER.
The QPSK Costas loop was implemented in two different ways. The first imple-
mentation was according to Fig. 2.9. The other implementation was according
to Fig. 3.3. The difference between these two implementations is the computations
required, while the result remains the same. Table 3.1 shows the difference
in calculations. The table shows the signed I and Q value, the most significant
bit (MSB) of I and Q, what the multiplexer will output, and what the summation
with multiplier will output.
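The equivalence of the two implementations can be illustrated as follows. The sketch assumes the commonly used QPSK phase error e = Q·sgn(I) − I·sgn(Q) for the multiplier version and a multiplexer that selects between a value and its negation based on the sign bit. Both are assumptions, because the referenced figures and table are not reproduced here, and the function names are hypothetical.

```python
def sign(v):
    """+1 when the sign bit (MSB) is 0, -1 when it is 1."""
    return 1 if v >= 0 else -1

def error_multiplier(i, q):
    """Phase error using two multiplications by the +/-1 decisions."""
    return q * sign(i) - i * sign(q)

def error_mux(i, q):
    """Same phase error using sign bits, multiplexers and negation only."""
    msb_i = i < 0
    msb_q = q < 0
    a = -q if msb_i else q        # multiplexer controlled by the MSB of I
    b = -i if msb_q else i        # multiplexer controlled by the MSB of Q
    return a - b
```

Both functions return the same value for every input, but the second form needs no multipliers, which is the kind of saving that matters for the power consumption.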
As can be seen in Fig. 3.4 the error estimations from the coarse and fine carrier
synchronization are added together and then forwarded to an NCO. The phase
error detector contains the computations in the fine carrier synchronization. The
NCO calculates a sine and cosine wave with a frequency based on the received
value. The waves are then multiplied by the signal to remove any phase and
frequency errors. A more detailed overview of how the coarse and fine frequency
carrier synchronizations are connected is shown in Appendix A.
the sample rate is not changed due to the interpolation. With $w[n] \approx 1/\alpha$, the NCO
will underflow on average every second clock cycle. The underflow will be used
to signal that the current sample is valid, therefore the signal will be
downsampled by a factor of two.
Two algorithms were tested for the TED, Gardner and Linn. Gardner is a popular
detector because it only requires two samples per symbol and has a good
performance [19]. In [19] Gardner’s and Linn’s performance are compared and
Linn’s has a better performance in general. Therefore it was of interest to know
if Linn’s implementation also had a lower power consumption.
Figure 3.6 displays the implementation of Gardner. This implementation was
very easy to implement and it also met the BER requirements.
When implementing Linn’s algorithm it had to be changed slightly. In DSP Builder
the lookup tables do not support two inputs and therefore a slightly different im-
plementation than the one proposed in [19] was implemented, see Fig. 3.7 for
the implemented TED. In order to find a matching value in the lookup table, the
eight least significant bits (LSB) are used because the lookup table only contains
64 values. Because the signal has been synchronized with coarse and fine car-
rier synchronization the signal does not contain a large amount of error. The
largest amount of error is contained in the LSBs and therefore they are used for
the lookup table.
An overview of the timing synchronization is displayed in Appendix A. To get
a better correlation value, the sign of each sample was extracted and sent
to the package detection, which means that a sample value could only be 1 or
−1. This gives a better result because fluctuations in the sample values, of
around ±0.1, reduce the correlation and increase the chance of missing a
package. The downside of extracting the sign is a higher chance of false
detection, because the correlation increases for patterns similar to the
frame header.
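The effect of the sign extraction on the correlation can be sketched as follows. The five-symbol header and the sample values are hypothetical, and real-valued samples are used for brevity even though the receiver operates on I and Q.

```python
def sign_of(x):
    """Hard decision: +1 for non-negative samples, -1 for negative ones."""
    return 1 if x >= 0 else -1


def correlate(samples, header):
    """Correlation of a sample window with a known header of equal length."""
    return sum(s * h for s, h in zip(samples, header))


header = [1, -1, 1, 1, -1]            # hypothetical frame header
noisy = [0.9, -0.9, 1.0, 0.9, -0.9]   # header received with small fluctuations

raw_corr = correlate(noisy, header)
hard_corr = correlate([sign_of(s) for s in noisy], header)
```

In this example the hard-decision correlation reaches the full header length, while the correlation of the raw samples is lower because of the amplitude fluctuations.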
\[ |V| = \sqrt{C_1^2 + C_2^2} \tag{3.3} \]
This calculation requires many multiplications and the square root is a complex
and expensive calculation. An algorithm for estimating the magnitude, called
"alpha max plus beta min", is presented in [23].
The accuracy of this estimate depends on the chosen values for α and β. Max is
the larger of |C1| and |C2|, and Min is the smaller [23]. Because the magnitude
is only compared with a threshold, it would be possible to use I² + Q² instead,
but the magnitude calculation is used so that it can be compared with the
αMax + βMin estimates. Comparing I² + Q² with αMax + βMin, the latter should
consume less power, because fewer bits are required to represent its result.
Depending on the choice of α and β, the multiplications in αMax + βMin can also
be implemented with bit shifts.
For the first implementation of the αMax+βMin, the values α = 1 and β = 0.4
were used, which will be called Max+0.4Min. This gave a good enough estimation
of the magnitude to decide if a package was detected or not.
The values α = 1 and β = 0.5 were also tested, which will be called Max+0.5Min.
With these values it is possible to use bit shift instead of multiplication which
could require less power.
The values for αMax+βMin were chosen from [23] because they give good results.
Other values for α and β are also available, but because these two gave a good
result no other values were required. The αMax+βMin algorithm was chosen to
calculate the magnitude because it gives an accurate enough result for the
package detection and reduces the number of calculations required in the
package detection.
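The two parameter choices can be compared against the exact magnitude with a short sketch. The function names are hypothetical, and the integer variant only illustrates how β = 0.5 reduces to a one-bit right shift; it does not model the bit widths of the implemented system.

```python
import math


def alpha_max_beta_min(c1, c2, alpha=1.0, beta=0.5):
    """Estimate sqrt(c1^2 + c2^2) as alpha*Max + beta*Min."""
    a, b = abs(c1), abs(c2)
    return alpha * max(a, b) + beta * min(a, b)


def max_plus_half_min_int(c1, c2):
    """Integer Max+0.5Min: the multiplication by 0.5 becomes a right shift."""
    a, b = abs(c1), abs(c2)
    return max(a, b) + (min(a, b) >> 1)


exact = math.hypot(3, 4)                          # exact magnitude
est_04 = alpha_max_beta_min(3, 4, 1.0, 0.4)       # Max+0.4Min
est_05 = alpha_max_beta_min(3, 4, 1.0, 0.5)       # Max+0.5Min
est_shift = max_plus_half_min_int(3, 4)           # Max+0.5Min with a shift
```

For the pair (3, 4) the exact magnitude is 5, while the two estimates are 5.2 and 5.5, which is close enough when the value is only compared with a threshold.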
By observing the correlation value, an appropriate value for the threshold could
be decided. The threshold value was set to 21, and the largest correlation value
is 25. If more knowledge of the signal were available, [20] presents a
calculation for the threshold value based on the requirements on false
detections and missed packages.
After a package has been detected the conjugated value of the filter output is
stored and multiplied by the samples for that package. This is performed to
rotate the samples correctly if the carrier synchronization has locked on to the
wrong phase. To demodulate the signal and get the correct bit the MSB is taken
from each sample. The MSB is the correct demodulated bit because of the chosen
constellation points. An overview of the package detection and demodulation
block, including the package detection itself, can be seen in Appendix A.
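The rotation and the MSB decision can be sketched with complex arithmetic. This is a minimal sketch: the bit order (I before Q), the mapping of a negative value to bit 1, and the function names are assumptions, chosen to mimic the sign bit of a two's-complement value.

```python
def derotate(samples, filter_output):
    """Multiply each sample by the conjugate of the stored filter output,
    which removes the phase that the filter output carries."""
    ref = filter_output.conjugate()
    return [s * ref for s in samples]


def demodulate(samples):
    """Take the sign bit of I and Q: 1 for negative values, otherwise 0."""
    bits = []
    for s in samples:
        bits.append(1 if s.real < 0 else 0)
        bits.append(1 if s.imag < 0 else 0)
    return bits


tx = [1 + 1j, -1 + 1j]           # transmitted constellation points
rx = [s * 1j for s in tx]        # carrier locked 90 degrees off
bits = demodulate(derotate(rx, 25j))
```

Here the filter output `25j` carries the same 90 degree rotation as the samples, so the conjugate multiplication moves them back to the transmitted quadrants before the bits are read.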
ble.
To measure the SNR the modulus of the expected value, |x_e[n]|, of the signal was
calculated by sending the signal through the transmitter and receiver filter
without any distortions. Then the distorted signal was transmitted and the
absolute value of the signal, |x_r[n]|, was calculated. With these two values
the noise could be calculated by
\[ w[n] = |x_r[n]| - |x_e[n]| \tag{3.5} \]
With the calculated noise it was possible to estimate the SNR by
\[ \mathrm{SNR} = 10 \log_{10} \left( \frac{\sum_{n=0}^{N} (x_e[n])^2}{\sum_{n=0}^{N} (w[n])^2} \right) \ \text{[dB]} \tag{3.6} \]
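Equations (3.5) and (3.6) can be evaluated with a few lines of code. The sketch below assumes that the expected and received sequences are already time aligned; the function name is hypothetical, and the absolute value is squared so that complex samples are also handled.

```python
import math


def estimate_snr_db(x_expected, x_received):
    """SNR estimate in dB from the expected and the received samples."""
    # (3.5): noise as the difference between the two moduli.
    noise = [abs(r) - abs(e) for e, r in zip(x_expected, x_received)]
    # (3.6): ratio of the summed signal power to the summed noise power.
    signal_power = sum(abs(e) ** 2 for e in x_expected)
    noise_power = sum(w ** 2 for w in noise)
    return 10.0 * math.log10(signal_power / noise_power)


snr_db = estimate_snr_db([1, -1, 1, -1], [1.1, -0.9, 1.1, -0.9])
```

In this example every sample deviates by 0.1 from a unit-amplitude signal, which corresponds to an SNR of about 20 dB.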
In this chapter the results from the different evaluations and measurements are
presented. The chapter starts by comparing the power consumption of the system
with different algorithms implemented and, when applicable, the SNR. The power
consumption is measured for the whole system rather than for individual
components, with the exception of the oscillator and filter. There was not
enough time to lower the power consumption in these subsystems and therefore it
can be valuable to know how much power they consume. Table 4.1 displays the
power consumption of the filter and oscillator with 10 bits.
The final design has a power consumption of 1.05 mW and a BER of 0 after 37440
bits had been retrieved. To be certain that the BER is 10⁻⁶, more than 10⁶ bits
would have to be received, but this took too long to simulate: the simulation
had been running for over 24 hours for the 37440 bits. With 10 bits the SNR was
around 14 dB, which according to (2.29) yields a BER of 1.5 × 10⁻⁷.
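How much an error-free run of a given length says about the BER can be estimated with a simple binomial argument. The sketch below assumes independent bit errors and a 95% confidence level; both are assumptions made for illustration and are not part of the measurements.

```python
import math


def ber_upper_bound(n_bits, confidence=0.95):
    """Upper bound on the BER when n_bits were received without errors.

    Solves (1 - p)^n_bits = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_bits)


def bits_needed(target_ber, confidence=0.95):
    """Number of error-free bits needed to support target_ber."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-target_ber))


bound = ber_upper_bound(37440)
needed = bits_needed(1e-6)
```

Under these assumptions, 37440 error-free bits only support a BER of roughly 8 × 10⁻⁵, and about 3 × 10⁶ error-free bits would be needed to support a BER of 10⁻⁶.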
Table 4.2 displays the subsystems implemented in the final system. For the most
part the signal was represented using 10 bits.
When comparing the algorithms, all parts except one were the same as in the
final system: Max+0.4Min was used instead of Max+0.5Min. In each comparison
only one part is replaced, i.e. when comparing Gardner and Linn the rest of the
system stays the same.
Table 4.2: Subsystems implemented in the final system.
Subsystems
Oscillator
Digital decimation filter
Luise algorithm with one division
QPSK Costas loop with multiplexer
Linn's TED
Matched filter
Max+0.5Min
The transmitter adds a random signal as a distortion and therefore the received
signal can differ slightly which can affect the estimated power consumption. There-
fore the same signal is used when comparing algorithms in the same subsystem.
It is also important to note that it is only an estimation of the power consumption
and therefore the power consumption could differ when implementing the actual
system.
As stated earlier two implementations of the fine carrier synchronization did not
meet the required BER. Although the DD QPSK Costas loop did not meet the re-
quirements the power consumption of the system was measured because it could
be used if a slightly higher BER was acceptable. Table 4.4 compares the DD QPSK
Costas loop, QPSK Costas loop with multipliers, and QPSK Costas loop with mul-
tiplexer.
The result of the carrier synchronization can be seen in the two eye diagrams in
Fig. 4.1. The first eye diagram is taken before the synchronization and the second
eye diagram is taken after the carrier synchronization. The carrier synchroniza-
tions used in the second eye diagram are the Luise algorithm and QPSK Costas
loop with multiplexer.
The constellation plots before and after the carrier synchronization were also
Figure 4.1: (a) Eye diagram before synchronization. (b) Eye diagram after
synchronization.
Figure 4.3 displays two constellation diagrams after the time synchronization.
Figure 4.3(a) is after Gardner and Fig. 4.3(b) is after Linn. The constellation
diagrams are after the time synchronization but before the sign is extracted from
the sample.
The αMax+βMin algorithm with two different β values was compared to the exact
calculation of the magnitude. Figure 4.4 displays the estimated magnitude of the
three implementations, with the magnitude rounded to the nearest integer.
Because of this rounding the calculations have a small quantization error.
Table 4.6 compares the power consumption of these three algorithms. Both of
the αMax+βMin implementations use nine bits to represent the value while the
magnitude calculation requires more bits to represent the squared value.
Figure 4.3: Constellation plot after time synchronization (a) Gardner (b)
Linn.
The number of bits was reduced until the BER became too high or until the
algorithm prevented a further reduction. A few algorithms limit the number of
bits because of divisions, e.g. in the Luise algorithm the bits are shifted by
nine bits and it would therefore be impractical to reduce the number of bits
below ten.
Figure 4.5 displays a graph comparing the number of bits used and the power
consumption. When 20 bits are used the whole system uses the same bit length,
but when the number of bits is lowered some computations require more bits and
their bit length can therefore not be lowered.
Figure 4.5: Graph over power consumption decrease with bit reduction.
It is also important to compare the number of bits with the SNR. The SNR is
measured after the timing synchronization but before the sign is extracted.
Figure 4.6 displays a graph with the number of bits and the SNR measured with
500 symbols.
Figure 4.6: Graph over SNR when reducing the number of bits.
5.1 Result
The power consumption of the filter and oscillator together is 0.66 mW, which is
about a third of the power consumption of the whole system when the clock rate
is not reduced, 1.86 mW. The power consumption of the oscillator and filter is
not reduced when the clock rate is reduced and it therefore becomes an even
larger share of the total. The filter and oscillator consume 63% of the total
power consumption of the system with a reduced clock after the filter.
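The shares quoted above can be checked with simple arithmetic. The sketch assumes that the 1.05 mW reported for the final design is the total with the reduced clock; the variable names are hypothetical.

```python
filter_osc_mw = 0.66      # filter and oscillator together
full_clock_mw = 1.86      # whole system, clock rate not reduced
reduced_clock_mw = 1.05   # assumed total with the reduced clock (final design)

# Share of the filter and oscillator in each case, in percent.
share_full = 100.0 * filter_osc_mw / full_clock_mw
share_reduced = 100.0 * filter_osc_mw / reduced_clock_mw
```

The first share is about 35%, i.e. roughly a third, and the second rounds to the 63% stated above.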
When comparing the fine carrier synchronization it was expected that the QPSK
Costas loop with multiplexer would have a lower power consumption than the
implementation with the multiplier. The results from the two measurements were
1.91 mW for the multiplier and 1.90 mW for the multiplexer, a difference of
0.01 mW. Because this is only a power estimation, the difference could be
within the error margin. Both implementations can therefore be regarded as
having the same power consumption.
When comparing the TEDs it was interesting that Linn’s implementation with the
lookup tables had a lower power consumption than Gardner even though Linn’s
required a few calculations before the lookup tables.
Comparing the two αMax+βMin algorithms to the magnitude calculation it can
5.2 Method
When implementing both the transmitter and receiver a model from Matlab was
used as base. When implementing the transmitter this was the best choice be-
cause the focus for the thesis is the receiver but to test the receiver a transmitter
is required. By using the transmitter from Matlab as a base and changing the
necessary parts it took less time setting up the transmitter compared to imple-
menting the transmitter from scratch.
The receiver could have been implemented from scratch, which might have yielded
a different design. Using the Matlab receiver as a base gave a better and
quicker understanding of the system and of which subsystems were necessary. It
was also good to have a base when testing new algorithms because the whole
system could be simulated and the retrieved signal could be compared to the
transmitted signal. If the receiver had been implemented from scratch this
would only have been possible after the design was working correctly.
Changing the design first in Simulink and then switching over to DSP Builder
might have been an unnecessary step to take. It could have been better to use
DSP Builder directly from the start but because the base for the receiver was in
Simulink it was easier to continue using Simulink until a final design had been
implemented.
The choice to measure the power consumption of the whole system with a fixed
setup and then change a certain algorithm makes it a bit harder to estimate how
much the power consumption has been lowered by using different algorithms.
It would have been possible to implement the receiver with what could have been
assumed to be the most power consuming algorithms and then lower the power
consumption from this setup. This would of course give the final design a much
lower power consumption when comparing the final design to the first design.
But to intentionally create a bad design first does not give an accurate result of
what is actually accomplished.
Another way to measure the difference in power consumption between the algo-
rithms would have been to isolate each subsystem and only measure the power of
that subsystem. This would be good if only the algorithms were the focus and not
the whole system. Changing one part of the whole system will probably affect the
following parts of the system and could slightly change the power consumption
6.1 Discussion
Most of the results were as expected, although the gain from some changes was
larger than expected. As can be seen when comparing the bit reduction, it
reduces the power consumption by 28%, which is a large reduction. Comparing
the power consumption before and after the clock reduction also shows a large
power reduction, of 44%. It therefore seems important to choose algorithms
that allow a lower clock rate and fewer bits to be used.
When comparing the gain from the different algorithms the power was often re-
duced about 5-15 mW which adds up if multiple power efficient algorithms are
used. Therefore the conclusion can be made that reducing the clock and number
of bits is the first important step to take when reducing the power consumption.
When the clock rate and number of bits cannot be reduced any more then it is
important to compare different algorithms.
As can be seen when comparing the two implementations of the Luise algorithm,
the power consumption can be reduced by reducing the number of arithmetic
operations. Three divisions are exchanged for one division through bit shifting,
which yields a reduction in the power consumption from 2.12 mW to 1.91 mW. A
similar effect can be seen when comparing the different values for αMax+βMin.
When switching out a division for a bit shift the power consumption is reduced
from 1.90 mW to 1.84 mW. Through this it can be seen that the power consumption
can be reduced by reducing the number of arithmetic operations and by choosing
appropriate values that can be implemented with bit shifting instead of
multiplication and division.
[1] J. Mitola, “Software radios: Survey, critical evaluation and future direc-
tions,” IEEE Aerospace and Electronic Systems Mag., vol. 8, no. 4, pp. 25–36,
1993.
[2] T. Ulversoy, “Software defined radio: Challenges and opportunities,” IEEE
Commun. Surveys & Tutorials, vol. 12, no. 4, pp. 531–550, 2010.
[3] R. V. Nee and R. Prasad, OFDM For Wireless Multimedia Communications.
Artech House Publishers, 2000.
[4] E. Grayver, Implementing Software Defined Radio. Springer, 2013.
[5] SDRF, “SDRF cognitive radio definitions working document, SDRF –06-R-
001-V 1.0.0,”
[6] H. Zimmermann, “OSI reference model–the ISO model of architecture for
open systems interconnection,” IEEE Trans. on Commun., vol. 28, no. 4,
pp. 425–432, 1980.
[7] E. G. Larsson, Signals, Information and Communications. Liu Press, 2012.
[8] I. A. Glover and P. M. Grant, Digital Communications. Prentice Hall Europe,
1998.
[9] G. J. Miao, Signal Processing in Digital Communications. Artech House,
2007.
[10] L. Wanhammar and H. Johansson, Digital Filter. Linköping University,
2000.
[11] M. Renfors and T. Saramäki, “Recursive Nth-band digital filters-Part II:
Design of multistage decimators and interpolators,” IEEE Trans. on Circuits
and Systems, 1987.
[12] J. G. Proakis and M. Salehi, Digital Communications. McGraw-Hill, 2008.
[13] M. Luise and R. Reggiannini, “Carrier frequency recovery in all-digital