Signal Processing For Wireless Communications
Vol. 1. Dietl, G. K. E.
Linear Estimation and Detection in Krylov Subspaces, 2007
ISBN 978-3-540-68478-7
Vol. 2. Dietrich, F. A.
Robust Signal Processing for Wireless Communications, 2008
ISBN 978-3-540-74246-3
Frank A. Dietrich
Series Editors:
Rudolf Mathar
RWTH Aachen University
Institute of Theoretical
Information Technology
52056 Aachen, Germany
Author:
Frank A. Dietrich
TU Munich
Associate Institute for Signal
Processing
Arcisstrasse 21
80290 Munich, Germany
E-mail: Dietrich@mytum.de
DOI 10.1007/978-3-540-74249-4
Springer Series in
Foundations in Signal Processing, Communications and Networking ISSN 1863-8538
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable for prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
springer.com
To my family
Preface
dependently from each other since they include the necessary signal models.
Chapter 5 additionally requires the signal model from Sections 4.1 and 4.2
and some of the ideas presented in Section 4.3.
Finally, the author would like to emphasize that the successful realization
of this book project was enabled by many people and a very creative and
excellent environment.
First of all, I deeply thank Prof. Dr.-Ing. Wolfgang Utschick for being my
academic teacher and a good friend: His steady support, numerous intensive
discussions, his encouragement, as well as his dedication to foster fundamen-
tal research on signal processing methodology have enabled and guided the
research leading to this book.
I am indebted to Prof. Dr. Björn Ottersten from Royal Institute of Tech-
nology, Stockholm, for reviewing the manuscript and for his feedback. More-
over, I thank Prof. Dr. Ralf Kötter, Technische Universität München, for his
support.
I would like to express my gratitude to Prof. Dr. techn. Josef A. Nossek for
his guidance in the first phase of this work and for his continuous support.
I thank my colleague Dr.-Ing. Michael Joham who has always been open to
share his ideas and insights and has spent time listening; his fundamental
contributions in the area of precoding have had a significant impact on the
second part of this book.
The results presented in this book were also stimulated by the excellent
working environment and good atmosphere at the Associate Institute for Sig-
nal Processing and the Institute for Signal Processing and Network Theory,
Technische Universität München: I thank all colleagues for the open exchange
of ideas, their support, and friendship. Thanks also to my students for their
inspiring questions and their commitment.
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Robust Signal Processing under Model Uncertainties . . . . . . . . 1
1.2 Overview of Chapters and Selected Wireless Applications . . . . 4
1.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Chapter 1
Introduction
1 It is defined as the data rate normalized by the utilized frequency band.
Fig. 1.1 Conventional approach to deal with model uncertainties: Treat the esti-
mated parameters and the model as if they were true and perfectly known. The
parameters for the signal processing algorithm are optimized under these idealized
conditions.
Fig. 1.2 Sensitivity of different design paradigms to the size of the model uncertainty
or the parameter error.
Fig. 1.4 Robust optimization of signal processing in the physical layer of a wireless
communication system: Interfaces between signal processing tasks are extended by a
suitable model for the parameter errors or uncertainties.
The first clearly shows the enhanced interface compared to Figure 1.1 and
is suitable for treating parameter errors (Figure 1.3(b)). The second version
guarantees a worst-case performance for the uncertainty set C employing a
maxmin or minimax criterion: In a first step, it chooses the least-favorable
model or parameters in C w.r.t. the conventional optimization criterion of
the considered signal processing task. Thus, optimization of the algorithm is
identical to Figure 1.1, but based on the worst-case model. In important prac-
tical cases, this is identical to the problem of designing a maximally robust
algorithm (see Section 2.4).
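As a toy numerical illustration of the minimax idea described above, consider estimating a scalar x from y = h x + n when the gain h is only known to lie in an uncertainty set C. The following sketch (all numbers are illustrative assumptions, not taken from the text) compares the conventional design, which optimizes for a nominal parameter h0, with the worst-case design over C:

```python
import numpy as np

# Minimax design of a scalar linear estimator x_hat = w*y for y = h*x + n.
# The gain h lies in the uncertainty set C = [0.5, 1.5]; the conventional
# design assumes the nominal value h0 = 1. All constants are illustrative.
sx2, sn2 = 1.0, 0.1                    # signal and noise variances
h_set = np.linspace(0.5, 1.5, 101)     # gridded uncertainty set C
h0 = 1.0                               # nominal parameter

def mse(w, h):
    # MSE of x_hat = w*y as a function of the true gain h
    return (w * h - 1.0) ** 2 * sx2 + w ** 2 * sn2

w_naive = h0 * sx2 / (h0 ** 2 * sx2 + sn2)   # MMSE design at the nominal h0
w_grid = np.linspace(0.0, 2.0, 2001)
worst = np.array([mse(w, h_set).max() for w in w_grid])
w_robust = w_grid[worst.argmin()]            # minimax design over C

worst_naive = mse(w_naive, h_set).max()
worst_robust = mse(w_robust, h_set).max()
assert worst_robust <= worst_naive   # minimax guarantees worst-case performance
```

The robust design accepts a slightly worse MSE at the nominal parameter in exchange for a guaranteed worst-case MSE over the whole set C, which is exactly the trade-off sketched in Figure 1.2.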
Finally, we would like to emphasize that the systematic approach to robust
design has a long history and many applications: Since the early 1960s, robust
optimization has been treated systematically in mathematics, engineering,
and other sciences. Important references are [97, 226, 122, 61, 16, 17, 171]
which give a broader overview of the subject than this brief introduction.
Other relevant classes of uncertainties and their applications are treated
there.
2 This serves as an example for adaptive processing of the signal containing the information-bearing data.
Chapters 2 and 3 deal with estimation of the channel state information (CSI).
As signal processing application which deals with the transmission of data,
we consider precoding of the data symbols for the wireless broadcast chan-
nel.3 For simplicity, we treat the case of a single antenna at the receiver and
multiple transmit antennas for a frequency flat channel4 . Because precoding
takes place at the transmitter, the availability of channel state information
is a crucial question.
The last two chapters present robust precoding approaches based on MSE
which can deal with partial channel state information at the transmitter. Its
foundation is the availability of the channel state information from estimators
of the channel realization and correlations which we present in Chapters 2
and 3.
• Chapter 4: Linear Precoding with Partial Channel State Information
For linear precoding, we introduce different optimum and suboptimum
performance measures, the corresponding approximate system models, and
emphasize their relation to the information rate (Section 4.3). We elaborate
on the relation between SINR-type measures to MSE-like expressions. It
includes optimization criteria for systems with a common training channel
instead of user-dedicated training sequences in the forward link.
The central aspect of this new framework is the choice of an appropri-
ate receiver model which controls the trade-off between performance and
complexity.
3 If restricted to be linear, precoding is also called preequalization or beamforming.
4 This is a finite impulse response (FIR) channel model of order zero.
Linear precoding is optimized iteratively for the sum MSE alternating be-
tween the receiver models and the transmitter. This intuitive approach
allows for an interesting interpretation. But optimization of sum perfor-
mance measures does not give quality of service (QoS) guarantees to the
receivers or provide fairness. We address these issues briefly in Section 4.6
which relies on the MSE-dualities between the broadcast and virtual mul-
tiple access channel developed in Section 4.5.
• Chapter 5: Nonlinear Precoding with Partial Channel State Information
Nonlinear precoding for the broadcast channel, i.e., Tomlinson-Harashima
precoding or the more general vector precoding, is a rather recent ap-
proach; an introduction is given in Section 5.1. We introduce novel robust
performance measures which incorporate partial as well as statistical chan-
nel state information (Section 5.2). As for linear precoding, the choice of
an appropriate receiver model is the central idea of this new framework.
For statistical channel state information, the solution can be considered
the first approach to nonlinear beamforming for communications.
The potential of Tomlinson-Harashima precoding (THP) with only sta-
tistical channel state information is illustrated in examples. For chan-
nels with rank-one spatial correlation matrix, we show that THP based
on statistical channel state information is not interference-limited: Every
signal-to-interference-plus-noise (SINR) ratio is achievable as for the case
of complete channel state information.
Vector precoding and Tomlinson-Harashima precoding are optimized
based on the sum MSE using the same paradigm as for linear precod-
ing: Iteratively, we alternate between optimizing the receiver models and
the transmitter (Section 5.3).
As a practical issue, we address precoding for the training sequences to
enable the implementation of the optimum receiver, which results from
nonlinear precoding at the transmitter (Section 5.4). The numerical per-
formance evaluation in Section 5.5 shows the potential of a robust design
in many interesting scenarios.
• Appendix:
In Appendix A, we briefly survey the necessary mathematical background.
The completion of covariance matrices with unknown elements and the ex-
tension of band-limited sequences is reviewed in Appendix B. Moreover,
these results are combined to prove a generalization to the band-limited
trigonometric moment problem which is required to interpolate autoco-
variance sequences in Section 3.4.
The robust optimization criteria which we apply for precoding with partial
channel state information are discussed more generally in Appendix C from
the perspective of estimation theory. The appendix concludes with detailed
derivations, the definition of the channel scenarios for the performance
evaluations, and a list of abbreviations.
1.3 Notation
Sets
Matrix Operations
Random Variables
Random vectors and matrices are denoted by lower and upper case sans-
serif bold letters (e.g., a, A, h), whereas their realizations or deterministic
variables are, e.g., a, A, h. To describe their properties we define:
pa (a) probability density function (pdf) of random vector a
evaluated for a
pa|y (a) conditional pdf of random vector a which is
conditioned on the realization y of a random vector y
Ea [f (a)] expectation of f (a) w.r.t. random vector a
Ea|y [f (a)] conditional expectation of f (a) w.r.t. random vector a
Ea [f (a)|y; x] expectation w.r.t. random vector a
for a conditional pdf with parameters x
µa = Ea [a] mean of a random vector a
µa|y = Ea|y [a] conditional mean of random vector a
a ∼ Nc (µa , C a ) random vector with complex Gaussian pdf (Appendix A.1)
Covariance matrices and conditional covariance matrices of a random vector
a are C a = Ea [(a − µa )(a − µa )H ] and C a|y = Ea|y [(a − µa|y )(a − µa|y )H ].
For a random matrix A, we define C A = EA [(A − EA [A])(A − EA [A])H ] and
C AH = EA [(A − EA [A])H (A − EA [A])], which are the sum of the covariance
matrices for its columns and the complex conjugate of its rows, respectively.
Correlation matrices are denoted by Ra = Ea [aa H ] and RA = EA [AAH ].
Other Definitions
Introduction to Chapter
Fig. 2.1 Different categories of models for the communication channel and references
to channel estimators based on the respective model.
1 Assuming that the channel order increases with the symbol rate.
2 The angles and delays of a wireless channel change slowly compared to the complex path attenuations (fading) [157, 229, 227] and are referred to as long-term or large-scale channel properties.
[14, 144]. This stochastic model allows for application of Bayesian estima-
tion methods [124, 182] and leads to the conditional mean (CM) estimator
in case of a mean square error (MSE) criterion. For jointly Gaussian
observation and channel, it reduces to the linear minimum mean square
error (LMMSE) estimator. Similar to the subspace approaches, the MMSE
estimator exploits the long-term structure of the channel which is now de-
scribed by the channel covariance matrix.
Fig. 2.2 On an absolute time scale the signals are sampled at t = qTb + nTs , where
q is the block index and n the relative sample index. Within one block with index q
we only consider the relative time scale in n.
Outline of Chapter
3 See also the survey [123] by Kassam and Poor with many references.
Fig. 2.4 Transmission of the nth symbol s[n] ∈ B^K in block q over a frequency selective MIMO channel described by the finite impulse response (FIR) H[q, n] ∈ C^{M×K} (cf. (2.1) and (2.2)).
time and τ the delay. We describe the signals and channel by their complex
baseband representations (e.g. [173, 129]).
Periodically, every transmitter sends a training sequence of Nt symbols
which is known to the receivers. The period between blocks of training se-
quences is Tb . The receive signal is sampled at the symbol period Ts (Fig-
ure 2.2). The channel is assumed constant within a block of Nt symbols. Given
an impulse response H(t, τ ) with finite support in τ , i.e., H(t, τ ) = 0M ×K
for τ < 0 and τ > LTs , the corresponding discrete time system during the
qth block is the finite impulse response (FIR) of order L
H[q, i] = H(qT_b, iT_s) = \sum_{ℓ=0}^{L} H^{(ℓ)}[q] δ[i − ℓ] ∈ C^{M×K}. (2.1)
The training sequences are multiplexed in time with a second signal (e.g.
a data channel), which interferes with the first L samples of the received
training sequence as depicted in Figure 2.3. For all transmitters, the sequences
are chosen from the (finite) set B and collected in s[n] ∈ BK for n ∈ {1 −
L, 2− L, . . . , N }, with Nt = N + L. In every block the same training sequence
s[n] is transmitted. Including additive noise, the sampled receive signal y (t)
during block q is (Figure 2.4)
Note that the symbol index n is the relative sample index within one block.
Only samples for 1 ≤ n ≤ N are not distorted due to intersymbol interference
from the other multiplexed signal (Figure 2.3). The relation between the
absolute time t ∈ R, the sampling of the signals at t = qTb + nTs , and the
relative sample index within one block is illustrated in Figure 2.2.
The noise is modeled as a stationary, temporally uncorrelated, and zero-
mean complex Gaussian random sequence n[q, n] ∼ Nc (0M , C n,s ). The spa-
tial covariance matrix of the noise is C n,s = E[n[q, n]n[q, n]H ].
where H[q] = [H^{(0)}[q], H^{(1)}[q], . . . , H^{(L)}[q]] ∈ C^{M×K(L+1)} and
S̄ = \begin{bmatrix} s[1] & s[2] & ⋯ & s[1+L] & ⋯ & s[N] \\ s[0] & s[1] & & & & s[N−1] \\ ⋮ & & ⋱ & & & ⋮ \\ s[1−L] & ⋯ & & s[1] & ⋯ & s[N−L] \end{bmatrix} ∈ B^{K(L+1)×N},
which has block Toeplitz structure. Defining y[q] = vec[Y[q]], we get the standard linear model in Figure 2.5 (cf. (A.15)) with system matrix S = S̄^T ⊗ I_M ∈ C^{MN×P}. The channel is described by P = MK(L+1) parameters h[q] = vec[H[q]] ∈ C^P, where h_p[q] is the pth element of h[q].
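A minimal numerical sketch of this signal model (with small illustrative dimensions and Gaussian placeholders for the channel): it builds S̄ column by column from the training symbols, forms S = S̄^T ⊗ I_M, and checks the vectorization identity vec[H S̄] = S vec[H] behind the standard linear model:

```python
import numpy as np

# Illustrative dimensions (assumptions, not values from the text)
K, M, L, N = 2, 3, 1, 8
rng = np.random.default_rng(0)
# binary training symbols s[n] in B^K for n = 1-L, ..., N
s = {n: rng.choice([-1.0, 1.0], size=K) for n in range(1 - L, N + 1)}

# column n of S_bar stacks s[n], s[n-1], ..., s[n-L]  (block Toeplitz)
S_bar = np.column_stack(
    [np.concatenate([s[n - l] for l in range(L + 1)]) for n in range(1, N + 1)]
)
assert S_bar.shape == (K * (L + 1), N)

S = np.kron(S_bar.T, np.eye(M))        # system matrix S = S_bar^T kron I_M
P = M * K * (L + 1)                    # number of channel parameters
assert S.shape == (M * N, P)

# y = S h reproduces Y = H S_bar stacked column-wise (vec identity)
H = rng.standard_normal((M, K * (L + 1)))
h = H.reshape(-1, order="F")           # h = vec(H)
Y = H @ S_bar
assert np.allclose(S @ h, Y.reshape(-1, order="F"))
```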
We model the noise n[q] = vec[N[q]] ∈ CM N as n[q] ∼ Nc (0M N , C n ) with
two different choices of covariance matrices:
1. n[q] is spatially and temporally uncorrelated with C n = cn I M N and vari-
ance cn , or
C n = I N ⊗ C n,s . (2.5)
where C_{h^{(ℓ)}} ∈ S^{MK}_{+,0} is the covariance matrix of the ℓth channel delay h^{(ℓ)}[q] = vec[H^{(ℓ)}[q]] ∈ C^{MK}.
with S T = I Q ⊗ S.
In addition to the covariance matrix C h in space and delay domain, we
have to define the channel parameters’ temporal correlations. In general,
the temporal correlations (autocovariance sequences) are different for ev-
ery element in h[q], and C h [i] = E[h[q]h[q − i]H ] does not have a special
structure. Because h[q] is a stationary random sequence (equidistant samples of the continuous-time channel), the auto- and crosscovariance matrices C_{h_{T,p} h_{T,p′}} = E[h_{T,p}[q] h_{T,p′}[q]^H] of h_{T,p}[q] = [h_p[q−1], h_p[q−2], . . . , h_p[q−Q]]^T ∈ C^Q are Toeplitz with first row [c_{h_p h_{p′}}[0], c_{h_p h_{p′}}[1], . . . , c_{h_p h_{p′}}[Q−1]] and c_{h_p h_{p′}}[i] = E[h_p[q] h_{p′}[q−i]^∗] for p, p′ ∈ {1, 2, . . . , P}. Therefore, C_{h_T} is block-Toeplitz and can be written as
C_{h_T} = \sum_{p=1}^{P} \sum_{p′=1}^{P} C_{h_{T,p} h_{T,p′}} ⊗ e_p e_{p′}^T. (2.10)
4 This definition is in analogy to the notion of separable transmitter and receiver correlations for MIMO channel models resulting in the “Kronecker model” for the spatial covariance matrix (see e.g. [231]) and is exploited in [192, 30].
5 Separability according to this definition is a valid model for a physical propagation channel if the following conditions hold: The Doppler frequencies do not depend on the angles of arrival at the receiver or angles of departure at the transmitter, which determine the spatial correlations; and they can be modeled as statistically independent.
C_{h_T} = \sum_{k=1}^{K} C_{T,k} ⊗ e_k e_k^T ⊗ C_{h_k}, (2.11)
where C T,k is Toeplitz with first row [ck [0], ck [1], . . . , ck [Q − 1]]. For ex-
ample, this describes the situation of K mobile terminals with generally
different velocity and geographical location and M receivers located in the
same (time-invariant) environment and on the same object.
2. If all elements of h[q] have an identical autocovariance sequence c[i] and
the correlations are separable, we can write E[h[q]h[q − i]H ] = c[i]C h . This
defines c[i] implicitly, which we normalize to c[0] = 1. These assumptions
result in
C hT = C T ⊗ C h , (2.12)
where C T is Toeplitz with first row equal to [c[0], c[1], . . . , c[Q − 1]].
Separability of spatial and temporal correlations is not possible in general,
e.g., when the mobile terminals have multiple antennas and are moving. In
this case the Doppler spectrum as well as the spatial correlation are deter-
mined by the same angles of departure/arrival at the mobile terminal.
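The separable Kronecker structure of (2.12) can be sketched numerically; the exponential temporal autocovariance c[i] = ρ^i below is purely an illustrative assumption (a Clarke/Jakes model would be used for a physical fading channel):

```python
import numpy as np

# Separable space-time correlations C_hT = C_T kron C_h as in (2.12),
# with an assumed exponential autocovariance c[i] = rho^i (c[0] = 1).
P_, Q, rho = 3, 4, 0.9
c = rho ** np.arange(Q)                 # temporal autocovariance sequence
C_T = np.array([[c[abs(i - j)] for j in range(Q)] for i in range(Q)])  # Toeplitz

A = np.random.default_rng(1).standard_normal((P_, P_))
C_h = A @ A.T + P_ * np.eye(P_)         # some valid spatial/delay covariance

C_hT = np.kron(C_T, C_h)                # block-Toeplitz, Q*P x Q*P
assert C_hT.shape == (Q * P_, Q * P_)
# block (i, j) of C_hT equals c[|i-j|] * C_h:
assert np.allclose(C_hT[:P_, P_:2 * P_], c[1] * C_h)
```

The Kronecker product makes every temporal lag share the same spatial covariance C_h, which is exactly the separability assumption and fails in the multi-antenna mobile case described above.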
y = Sh + n ∈ CM N (2.13)
ĥ = W y ∈ CP . (2.14)
is often called the minimum MSE (MMSE) estimator or the conditional mean
(CM) estimator. Thus, in general f is a nonlinear function of y. The MMSE
E_y[tr[R_{h|y}] − ‖µ_{h|y}‖_2^2] = tr[E_y[C_{h|y}]] (2.19)
So far in this section we have considered a general pdf ph,y (h, y); now we
assume h and y to be zero-mean and jointly complex Gaussian distributed.
For this assumption, the conditional mean is a linear function of y and reads
(cf. (A.2))
W_MMSE = C_{hy} C_y^{−1}
The first equality follows from the system model (2.13) and the second from
the matrix inversion lemma (A.19) [124, p. 533]. The left side of this system
of linear equations, i.e., the matrix to be “inverted”, is smaller in the third
line in (2.22). If N > K(L + 1) and C −1 n is given or C n = I N ⊗ C n,s , the
order of complexity to solve it is also reduced [216].
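For the Gaussian model, the LMMSE estimator can be sketched directly from the system model (2.13). The Monte Carlo comparison with the unbiased least-squares (ML) estimator below illustrates the bias-variance trade-off discussed later in this section; dimensions, covariances, and the use of real-valued quantities are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
P, MN, cn = 4, 12, 1.0                     # assumed sizes and noise variance
S = rng.standard_normal((MN, P))           # stand-in for the training matrix
A = rng.standard_normal((P, P))
C_h = A @ A.T / P + 0.1 * np.eye(P)        # assumed channel covariance

# LMMSE / conditional-mean estimator: W = C_h S^H (S C_h S^H + C_n)^(-1)
W = C_h @ S.T @ np.linalg.inv(S @ C_h @ S.T + cn * np.eye(MN))

# Monte Carlo comparison with the unbiased least-squares estimator S^+
W_ls = np.linalg.pinv(S)
T = 20000
Lh = np.linalg.cholesky(C_h)
H_s = Lh @ rng.standard_normal((P, T))                 # channel realizations
Ys = S @ H_s + np.sqrt(cn) * rng.standard_normal((MN, T))
mse_mmse = np.mean(np.sum((W @ Ys - H_s) ** 2, axis=0))
mse_ls = np.mean(np.sum((W_ls @ Ys - H_s) ** 2, axis=0))
assert mse_mmse < mse_ls       # MMSE trades a small bias for a lower total MSE
```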
For the Gaussian assumption, the mean error correlation matrix in (2.20)
is equivalent to the conditional error covariance matrix, as the latter is independent of y. It is the Schur complement of C_y in the covariance matrix \begin{bmatrix} C_h & C_{hy} \\ C_{hy}^H & C_y \end{bmatrix} of [h^T, y^T]^T (see (A.17)):
C_{h|y} = C_h − W_MMSE C_{hy}^H. (2.23)
For every linear estimator W , the MSE can be decomposed in the mean
squared norm of the estimator’s bias h − W Sh and the variance of the
estimator:
7 The channel covariance matrix C_h is determined by the angles of arrival and departure, delays, and mean path loss. These are long-term or large-scale properties of the channel and change slowly compared to the realization h. Thus, in principle it is not necessary to estimate these parameters directly as in [224] to exploit the channel’s structure for an improved channel estimate ĥ.
E_{h,y}[‖h − W y‖_2^2] = E_{h,n}[‖h − W(Sh + n)‖_2^2] (2.24)
= E_h[‖h − W S h‖_2^2] + E_n[‖W n‖_2^2], (2.25)
where the first term is the mean squared norm of the bias and the second the variance.
The MMSE estimator finds the best trade-off among both contributions be-
cause it exploits the knowledge about the channel and noise covariance matrix
C h and C n , respectively.
Due to its linearity, W MMSE is called linear MMSE (LMMSE) estimator
and could also be derived from (2.16) restricting f to the class of linear
functions as in (2.24). In Section 2.4 we discuss the robust properties of the
LMMSE estimator for general (non-Gaussian) pdf ph,y (h, y).
The estimate ĥMMSE is also the solution of the regularized weighted least-
squares problem
y ∼ Nc (Sh, C n ). (2.27)
ĥ_ML = W_ML y, W_ML = S^†
Ĉ_{n,s} = (1/N)(Y − Ĥ_ML S̄)(Y − Ĥ_ML S̄)^H (2.31)
with Ĥ_ML = unvec[ĥ_ML] ∈ C^{M×K(L+1)}. The estimates exist only if S has
full column rank, i.e., necessarily MN ≥ P. If C_n = c_n I_{MN}, we have ĉ_n = (1/(MN)) ‖y − S ĥ_ML‖_2^2. Introducing the orthogonal projector P = S̄^H(S̄ S̄^H)^{−1} S̄ on the row space of S̄ and the projector P^⊥ = I_N − P on the space orthogonal to the row space of S̄, we can write [129]
Ĉ_{n,s} = (1/N) Y P^⊥ Y^H (2.32)
ĉ_n = (1/(MN)) ‖Y P^⊥‖_F^2. (2.33)
The space orthogonal to the row space of S̄ contains only noise (“noise sub-
space”), because Y P ⊥ = N P ⊥ (cf. Eq. 2.3).
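The ML estimate and the projector-based noise statistics of (2.31)–(2.33) can be sketched as follows; real-valued data and a Gaussian stand-in for the binary training matrix are illustrative assumptions, and the "noise subspace" identity Y P⊥ = N P⊥ is checked numerically:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, L, N = 3, 2, 1, 10               # assumed dimensions
KL = K * (L + 1)
S_bar = rng.standard_normal((KL, N))   # stand-in for a full-rank training matrix
H = rng.standard_normal((M, KL))
Noise = 0.3 * rng.standard_normal((M, N))
Y = H @ S_bar + Noise

# ML channel estimate and the projectors of (2.32)/(2.33)
G = np.linalg.inv(S_bar @ S_bar.T)
H_ml = Y @ S_bar.T @ G                 # least-squares / ML fit of Y = H S_bar
Pr = S_bar.T @ G @ S_bar               # projector on the row space of S_bar
Pr_perp = np.eye(N) - Pr               # projector on the noise subspace

C_ns_hat = (Y @ Pr_perp @ Y.T) / N     # (2.32); biased by (N - K(L+1))/N
cn_hat = np.linalg.norm(Y @ Pr_perp, "fro") ** 2 / (M * N)   # (2.33)

# the residual contains no signal component: Y P_perp = Noise P_perp
assert np.allclose(Y @ Pr_perp, Noise @ Pr_perp)
# and equals the ML fitting residual
assert np.allclose(Y - H_ml @ S_bar, Y @ Pr_perp)
```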
In contrast to (2.21), ĥML in (2.31) is an unbiased estimate of h, whereas
the bias of Ĉ n,s and ĉn is
E_n[Ĉ_{n,s}] − C_{n,s} = ((N − K(L+1))/N − 1) C_{n,s} (2.34)
E_n[ĉ_n] − c_n = ((N − K(L+1))/N − 1) c_n. (2.35)
It results from the scaling of the estimates in (2.32) and (2.33) by N and MN, respectively. On average, Ĉ_{n,s} and ĉ_n underestimate the true noise covariance matrix, because E[Ĉ_{n,s}] ⪯ C_{n,s}. Note that the rank of P^⊥ is N − K(L+1). Thus, for unbiased estimates, a scaling with N − K(L+1) and M[N − K(L+1)] instead of N and MN, respectively, would be sufficient.
These unbiased estimators are derived in Chapter 3.
For the general class C_n = {C_n | C_n ∈ S^{MN}_{+,0}} and if we assume C_n to be known, the ML problem (2.28) is equivalent to the weighted least-squares problem [124]
min_h ‖y − Sh‖_{C_n^{−1}}^2, (2.36)
which is solved by
ĥ_ML = W_ML y, W_ML = (S^H C_n^{−1} S)^{−1} S^H C_n^{−1}. (2.37)
Note that S^H C_n^{−1} y is a sufficient statistic for h. Applying Tikhonov regularization [154, 24, 212] to (2.36), we obtain (2.26) and the LMMSE estimator (2.21).
For (2.31) and (2.37), the covariance matrix of the zero-mean estimation error ε = ĥ − h is
C_ε = (S^H C_n^{−1} S)^{−1} (2.38)
with the eigenvalue decomposition of C_ε^{−1/2} C_h C_ε^{−1/2} = V Σ V^H and [Σ]_{i,i} ≥ [Σ]_{i+1,i+1}. Based on this decomposition of W_MMSE, the MMSE estimator can be interpreted as follows: The first stage is an ML channel estimator followed by a whitening of the estimation error. In the next stage, the basis in C^P is changed and the components are weighted by the diagonal elements of (Σ + I_P)^{−1} Σ, which describe the reliability of the estimate in every dimension. The ith diagonal element of Σ may be interpreted as the signal to noise ratio in the one-dimensional subspace spanned by the ith column of C_ε^{1/2} V.
is of reduced rank R ≤ P.
8 The eigenvalues are assumed to be sorted in decreasing magnitude.
W_C = (1/(N c_s)) S^H, (2.43)
for estimating a scalar signal [108, p.131]. Thus, the MF assumes a stochastic
model for h with correlation matrix9 Rh .
The optimization problem reads
max_W E_{h,n}[ĥ^H h]^2 / E_n[‖W n‖_2^2] (2.44)
and is solved by
W_MF = α R_h S^H C_n^{−1}, α ∈ C. (2.45)
W_MF |_{α=1, C_n=C′_n} = lim_{c′_n→∞} c′_n W_MMSE. (2.46)
9 Although it is equal to the covariance matrix for our assumption of zero-mean h, in general the correlation matrix has to be used here.
E_{h,n}[‖ε‖_2^2] = E_{h,n}[‖h − W(Sh + n)‖_2^2]
= E_h[‖(I_P − W S)h‖_2^2] + E_n[‖W n‖_2^2]
= tr[(I_P − W S) C_h (I_P − W S)^H] + c_n ‖W‖_F^2, (2.47)
where the first term is the mean of the squared norm of the bias and the second the variance of the estimate ĥ = W y with C_n = c_n I_{MN}. The estimators presented in the previous sections aim at a reduced variance compared to the ML estimator, at the price of introducing and increasing the bias.
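The bias-variance decomposition (2.47) can be verified numerically for an arbitrary linear estimator W: the closed-form bias and variance terms match a Monte Carlo estimate of the total MSE. All dimensions and covariances below are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
P, MN, cn = 3, 8, 0.4                      # assumed sizes and noise variance
S = rng.standard_normal((MN, P))
A = rng.standard_normal((P, P))
C_h = A @ A.T / P + 0.2 * np.eye(P)
W = 0.1 * rng.standard_normal((P, MN))     # an arbitrary linear estimator

# closed-form bias and variance terms of (2.47)
B = np.eye(P) - W @ S
bias2 = np.trace(B @ C_h @ B.T)            # tr[(I - WS) C_h (I - WS)^H]
var = cn * np.linalg.norm(W, "fro") ** 2   # c_n ||W||_F^2

# Monte Carlo estimate of E[||h - W y||^2] for y = S h + n
T = 50000
Lh = np.linalg.cholesky(C_h)
H_s = Lh @ rng.standard_normal((P, T))
Ys = S @ H_s + np.sqrt(cn) * rng.standard_normal((MN, T))
emp = np.mean(np.sum((H_s - W @ Ys) ** 2, axis=0))
assert abs(emp - (bias2 + var)) / (bias2 + var) < 0.05
```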
Assuming orthogonal training sequences with S^H S = N c_s I_P, the basic behavior of the estimators w.r.t. this bias-variance trade-off can be explained. With this assumption and the EVD C_h = U Λ U^H, Λ = diag[λ_1, . . . , λ_P], the estimators read
W_ML = W_C = (1/(N c_s)) S^H
W_MMSE = U (Λ + (c_n/(N c_s)) I_P)^{−1} Λ U^H W_ML (optimal weighting of subspaces)
W_MF = (1/λ_1) U Λ U^H W_ML
W_RML = U F_RML U^H W_ML.
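Under the orthogonal-training assumption S^H S = N c_s I_P, the subspace-weighted form of W_MMSE can be checked against the direct LMMSE expression. The sketch below uses assumed real-valued quantities and arbitrary small dimensions:

```python
import numpy as np

rng = np.random.default_rng(5)
P, MN = 4, 12
Ncs, cn = 8.0, 0.5                    # assumed values of N*c_s and noise variance
Qm, _ = np.linalg.qr(rng.standard_normal((MN, P)))
S = np.sqrt(Ncs) * Qm                 # orthogonal training: S^T S = N c_s I_P

A = rng.standard_normal((P, P))
C_h = A @ A.T / P + 0.1 * np.eye(P)   # assumed channel covariance
lam, U = np.linalg.eigh(C_h)          # EVD C_h = U Lam U^T

W_ml = S.T / Ncs                      # ML estimator (= correlator here)
# subspace-weighted MMSE: each eigen-direction scaled by lam/(lam + cn/Ncs)
w = lam / (lam + cn / Ncs)
W_mmse_evd = U @ np.diag(w) @ U.T @ W_ml
# direct LMMSE form for comparison
W_mmse = C_h @ S.T @ np.linalg.inv(S @ C_h @ S.T + cn * np.eye(MN))
assert np.allclose(W_mmse_evd, W_mmse)
```

The weighting factors λ_i/(λ_i + c_n/(N c_s)) shrink the unreliable (low-SNR) subspaces toward zero, which is exactly the "optimal weighting of subspaces" noted above.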
10 In the figures the MSE E_{h,n}[‖ε‖_2^2] is normalized by tr[C_h] to achieve the same maximum normalized MSE equal to one, for all scenarios.
MSE, i.e., min_β E_{h,n}[‖h − β W y‖_2^2]. Thus, we focus on the estimators’ capability to estimate the channel vector h up to its norm.11
First, we consider a scenario with M = K = 8, L = 0 (frequency-flat), and
N = 16 in Figure 2.6. The binary training sequences satisfy S H S = N cs I P .
The covariance matrix C h is given by the model in Appendix E with mean
angles of arrival ϕ̄ = [−45◦ , −32.1◦ , −19.3◦ , −6.4◦ , 6.4◦ , 19.3◦ , 32.1◦ , 45◦ ] and
Laplace angular power spectrum with spread σ = 5◦ .
Due to a small N, the ML estimator requires an SNR = c_s/c_n at a relative MSE of 10^{−1} that is about 5 dB larger than for the MMSE estimator. With
the RML estimator, this gap can be closed over a wide range of SNR reducing
the rank from R = P = 64 for the ML estimator to R = 40. For reference, we
also give the RML performance with the optimum rank w.r.t. MSE obtained
from a brute-force optimization. The MF estimator is only superior to the
ML estimator at very low SNR, as its bias is substantial.
In Figures 2.7 and 2.8, performance in a frequency selective channel with
L = 4, M = 8, K = 1, and N = 16 is investigated. The covariance matrix
C h is given by the model in Appendix E with mean angles of arrival ϕ̄ =
[−45◦ , −22.5◦ , 0◦ , 22.5◦ , 45◦ ] per delay and Laplace angular power spectrum
with spread σ = 5◦ .
In this channel scenario the MF estimator reduces the MSE compared
to the correlator (with comparable numerical complexity) for an interesting
SNR region (Figure 2.7). Asymptotically, for high SNR it is outperformed
by the correlator. The RML estimator achieves a performance close to the
MMSE approach if the rank is chosen optimally for every SNR (Figure 2.8).
For full rank (R = P = 40), the RML is equivalent to the ML estimator. For
a fixed rank of R = 5 and R = 10, the MSE saturates at high SNR due to the
11 If the application using the channel estimates is sensitive to a scaling of ĥ, estimators are proposed in the literature (e.g. [15]) which also introduce a proper scaling for ML estimators and do not rely on the knowledge of C_h.
Table 2.2 Order of computational complexity of channel estimators assuming C_n = c_n I_{MN}.
MMSE: M^3 K^3 (L+1)^3
Red.-Rank ML (RML): M^3 K^3 (L+1)^3
bias. At low SNR a smaller rank should be chosen than at high SNR. Thus,
an adaptation of the rank R to the SNR is required. The MMSE estimator
performs the necessary bias-variance trade-off based on C h and C n .
The correlator and MF estimator are simple estimators with a low and
comparable numerical complexity. As discussed in [56], only a suboptimum
estimate of C h is required for the MF. Regarding the possibilities to reduce
the complexity of the RML and MMSE estimators with efficient implemen-
tations and approximations, it is difficult to draw a final conclusion for their
complexity. Roughly, the complexity of the MMSE and RML estimators is
also similar [56], but the RML approach requires an additional EVD of C h .
The ML estimator is considerably less complex. The order of required float-
ing point operations is given in Table 2.2 for C n = cn I M N .12 For the MF,
RML, and MMSE estimators, the covariance matrix C h has to be estimated
(Chapter 3). This increases their complexity compared to the ML estimator.
A low-complexity tracking algorithm for the MMSE estimator is introduced
12 This assumption influences only the order of the matched filter.
They are observed indirectly via known training sequences which determine
S T . This observation is disturbed by additive noise nT [q]. To estimate h[q]
based on y T [q], we first assume that the joint pdf ph,yT (h[q], y T [q]) is given.
Given the stochastic model ph,yT (h[q], y T [q]), Bayesian parameter estimation
gives the framework for designing optimum predictors. As in Section 2.2.1,
we choose the MSE as average measure of the estimation quality. The main
ideas are reviewed in Section 2.2.1 and can be applied directly to the prob-
lem of channel prediction. First we treat the general case of predicting the
channel vector h[q] ∈ CP before focusing on the prediction of a single chan-
nel parameter in h[q]. The restriction to the prediction of a scalar allows
us to include standard results about prediction of parameters or time series
from Wiener and Kolmogorov (see references in [182, chapter 10]) such as
asymptotic performance results for Q → ∞.13
For the model (2.48), the estimator W ∈ C^{P×QMN}, which can also be interpreted as a MIMO FIR filter of length Q, reads [124]
W = C_{h y_T} C_{y_T}^{−1} = C_{h h_T} S_T^H (S_T C_{h_T} S_T^H + C_{n_T})^{−1}. (2.51)
13 The general case of vector or multivariate prediction is treated in [234, 235]. The insights which can be obtained for the significantly simpler scalar or univariate case are sufficient in the context considered here.
The advantage of this decomposition is twofold: On the one hand, it allows for
a less complex implementation of W exploiting the structure of the related
system of linear equations (2.51). On the other hand, we can obtain further
insights separating the tasks of channel estimation and prediction; we focus
on the prediction stage W P of the estimator W in the sequel. Denoting the
ML channel estimates by h̃[q] = W ML y[q] and collecting them in h̃T [q] =
[h̃[q − 1]T , h̃[q − 2]T , . . . , h̃[q − Q]T ]T ∈ CQP we can rewrite (2.48)
i.e., the prediction stage is decoupled for every parameter hp [q], and we can
write
ĥ_p[q] = w_p^T h̃_{T,p}[q] (2.56)
Fig. 2.9 Linear prediction of h_p[q] one step ahead based on Q past noisy observations h̃_p[q] and the FIR filter w_p[q] with non-zero coefficients w_p = [w_p[1], w_p[2], . . . , w_p[Q]]^T: ĥ_p[q] = \sum_{ℓ=1}^{Q} w_p[ℓ] h̃_p[q − ℓ].
The prediction problem can also be treated separately for every parameter
hp [q], i.e., it is understood as a univariate prediction problem. This is equiv-
alent to restricting the structure of W in optimization problem (2.50) to
W = W_P (I_Q ⊗ W_ML) with
W_P = \sum_{p=1}^{P} w_p^T ⊗ e_p e_p^T. (2.57)
Of course, the MMSE achieved by this solution is larger than for the general
multivariate prediction (2.50) because the correlations C h and C ε (2.38) are
not exploited. The MMSEs are equal for the assumptions made in Exam-
ple 2.1 at the end of the previous subsection. The advantage of this solution
is its simplicity and, consequently, reduced complexity.
The estimate of h_p[q] is ĥ_p[q] = w_p^T h̃_{T,p}[q] (2.56). It is based on the estimated channel parameters in Q previous blocks h̃_{T,p}[q] = [h̃_p[q − 1], h̃_p[q − 2], . . . , h̃_p[q − Q]]^T, which we model as
with e_p[q] = e_p^T W_ML n[q], e_p[q] ∼ N_c(0, c_{e_p}), and
c_{e_p} = [(S^H C_n^{−1} S)^{−1}]_{p,p} ≥ c_n/(N c_s), (2.60)
where the inequality holds for C_n = c_n I_{MN}. The last inequality gives a lower bound, which is achieved for S^H S = N c_s I_P.
The terms in the cost function (2.58) are decoupled for every parameter
and the solution is
w_p^T = c_{h_p h_{T,p}} (C_{h_{T,p}} + c_{e_p} I_Q)^{-1}, (2.61)

with estimation error ε_{Q,p}[q] = ĥ_p[q] − h_p[q] (Figure 2.9).
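A sketch of (2.61) for a single parameter, using the band-limited sinc autocovariance that appears later in (2.100); the values of f_max, c_e, and Q are illustrative assumptions, not taken from the book's figures:

```python
import numpy as np

def wiener_predictor(c_h, c_e, Q):
    """Univariate one-step predictor (2.61):
    w_p^T = c_{h_p h_{T,p}} (C_{h_{T,p}} + c_{e_p} I_Q)^{-1}.

    c_h : vectorized autocovariance function c_h[l] (assumed real here)
    c_e : variance c_{e_p} of the ML channel-estimation error
    """
    lags = np.arange(Q)
    C_hT = c_h(np.abs(lags[:, None] - lags[None, :]))  # Toeplitz, [C]_{ij} = c_h[|i-j|]
    c_cross = c_h(np.arange(1, Q + 1))                 # [c_h[1], ..., c_h[Q]]
    return np.linalg.solve(C_hT + c_e * np.eye(Q), c_cross)

# np.sinc(x) = sin(pi*x)/(pi*x), matching the sinc convention of (2.100).
fmax, c_e, Q = 0.1, 1e-3, 5
w = wiener_predictor(lambda l: np.sinc(2 * fmax * l), c_e, Q)
mmse = 1.0 - np.dot(w, np.sinc(2 * fmax * np.arange(1, Q + 1)))  # c_h[0] - c^T w
print(w.shape, bool(0.0 < mmse < 1.0))  # -> (5,) True
```

The achieved MMSE c_h[0] − c^T (C + c_e I)^{-1} c lies strictly between 0 and the variance c_h[0] = 1 whenever c_e > 0 and the sequence is correlated.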
Considering prediction based on observations of all past samples of the
sequence h̃p [q], i.e., for a predictor length Q → ∞, we arrive at the classi-
cal univariate prediction theory of Wiener and Kolmogorov [182]. For this
asymptotic case, the MMSE of the predictor can be expressed in terms of the
power spectral density

S_{h_p}(ω) = Σ_{ℓ=−∞}^{∞} c_{h_p}[ℓ] exp(−jωℓ) (2.63)

of h_p[q] with autocovariance sequence c_{h_p}[ℓ] = E[h_p[q] h_p[q−ℓ]*] and
reads [196]

E[|ε_{∞,p}[q]|²] = c_{e_p} ( exp( (1/2π) ∫_{−π}^{π} ln(1 + S_{h_p}(ω)/c_{e_p}) dω ) − 1 )
≤ E[|ε_{Q,p}[q]|²]. (2.64)
An interpretation of this relation is given by Scharf [182, pp. 431 and 434]:
E[|ε_{∞,p}[q]|²]/c_{h_p}[0] is a measure for the flatness of the spectrum, because in
the case of S_{h_p}(ω) = c_{h_p}[0], −π ≤ ω ≤ π, we have E[|ε_{∞,p}[q]|²]/c_{h_p}[0] = 1,
i.e., the sequence is white and not predictable. Moreover, the expo-
nent in (2.65) determines the entropy rate of a stationary Gaussian random
sequence [40, p. 274].
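The flatness interpretation can be checked numerically: evaluating (2.64) by quadrature for a flat spectrum S_{h_p}(ω) = c_{h_p}[0] returns the full variance, i.e., no prediction gain. A sketch with illustrative values for c_h[0] and c_e:

```python
import numpy as np

def mmse_infinite(S_h, c_e, n_grid=20001):
    """Asymptotic prediction MMSE (2.64), evaluated by trapezoidal
    quadrature of ln(1 + S_h(omega)/c_e) over omega in [-pi, pi]."""
    w = np.linspace(-np.pi, np.pi, n_grid)
    vals = np.log1p(S_h(w) / c_e)
    integral = np.sum((vals[:-1] + vals[1:]) * np.diff(w) / 2) / (2 * np.pi)
    return c_e * (np.exp(integral) - 1.0)

# Flat spectrum S_h(omega) = c_h[0]: the sequence is white, so (2.64)
# evaluates to the full variance c_h[0].
c0, c_e = 1.0, 1e-3
e_flat = mmse_infinite(lambda w: c0 * np.ones_like(w), c_e)
print(bool(abs(e_flat - c0) < 1e-9))  # -> True
```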
2.3 Channel Prediction 35
ω_max,p = 2πf_max,p < π, S_{h_p}(ω) ≥ 0, c_{h_p}[0] = (1/2π) ∫_{−ω_max,p}^{ω_max,p} S_{h_p}(ω) dω }.
(2.66)
a finite interval is given. These references do not refer to the classical prop-
erty of perfect predictability for deterministic random processes; however, the
new proofs do not require knowledge about the shape of the power spectral
density.
Consequently, perfect prediction based on the MSE criterion requires per-
fect knowledge of the power spectral density, but perfect prediction can be
achieved without any knowledge about the power spectral density except that
it belongs to the set Bhp ,fmax,p with given maximum frequency. Algorithms
and examples are given in [27, 222]. The practical problem with perfect pre-
diction is that the norm of the prediction filter is not bounded as Q → ∞
[164, p. 198]. Thus, perfect prediction is an ill-posed problem in the sense of
Hadamard [212].
The necessary a priori information for prediction can be even reduced
further as in [28, 153, 44]: Prediction algorithms for band-limited sequences
are developed which do not require knowledge about the bandwidth, but the
price is a larger required sampling rate.
The following two observations provide more understanding of the perfect
predictability of band-limited random processes.
1. A realization of an ergodic random process with band-limited power spec-
tral density is also band-limited with probability one, i.e., every realization
itself has a rich structure.
The argumentation is as follows: The realization of a random process has
finite power (second order moment) but is generally not of finite energy,
i.e., it is not square summable. Thus, its Fourier transform does not exist
[178, 169] and the common definition of band-limitation via the limited
support of the Fourier transform of a sequence is not applicable. An alter-
native definition of band-limitation is reported in [178]: A random sequence
with finite second order moment is called band-limited if it passes a filter
g[q], whose Fourier transform is G(ω) = 1, |ω| ≤ ωmax , zero elsewhere,
without distortion for almost all realizations. Employing this definition we
can show that the realization of a random process with band-limited power
spectral density is band-limited in the generalized sense with probability
one.14
This result is important because all realizations can now be treated equiv-
alently to deterministic band-limited sequences — except that they are
power-limited and not energy-limited. Thus, the results in the literature,
e.g., [27, 222, 28, 153], for deterministic signals are also valid for random
processes with probability one.
14 Proof: Consider an ergodic random sequence h[q] with power spectral density
S_h(ω) band-limited to ω_max, i.e., S_h(ω)G(ω) = S_h(ω). The filter g[q] is defined above.
Because E[|h[q] − Σ_{ℓ=−∞}^{∞} g[ℓ]h[q−ℓ]|²] = 0, i.e., the output of g[q] ∗ h[q] converges to
h[q] in the mean-square sense, the filter output is also equal to the random sequence
h[q] with probability one [200, p. 432]. Thus, almost all realizations h[q] of h[q] pass
the filter undistorted, i.e., are band-limited in this generalized sense.
2.4 Minimax Mean Square Error Estimation 37
For ω_max,p = π, we get E[|ε_{∞,p}[q]|²] = c_{h_p}[0], whereas for ω_max,p < π and
c_{e_p} → 0 perfect prediction is possible, i.e., E[|ε_{∞,p}[q]|²] = 0. □
When deriving the MMSE channel estimator and predictor in Sections 2.2.1
and 2.3.1 we assume perfect knowledge of the joint probability distribution
for the observation and the parameters to be estimated. For the Gaussian
distribution, the estimators are linear in the observation and depend only on
the first and second order moments which are assumed to be known perfectly.
The performance of the Bayesian approach depends on the quality of the
a priori information in terms of this probability distribution. The probability
distribution is often assumed to be Gaussian for simplicity; the covariance
matrices are either estimated or a fixed “typical” or “most likely” covariance
matrix is selected for the specified application scenario. It is the hope of
the engineer that the errors are small enough such that the performance
degradation from the optimum design for the true parameters is small.
The minimax approach to MSE estimation aims at a guaranteed MSE
performance for a given class of probability distributions optimizing the esti-
mator for the worst case scenario. Thus, for a well chosen class of covariance
matrices, e.g., based on a model of the estimation errors or restriction to
we estimate h[q + d] with the causal IIR filter w[ℓ] (w[ℓ] = 0, ℓ ≤ 0)^16

ĥ[q + d] = Σ_{ℓ=1}^{∞} w[ℓ] h̃[q − ℓ]. (2.71)
15 This is called an inverse minimax problem [226].
16 In [225], the more general case of estimating the output of a general linear filter
applied to the sequence h[q] is addressed.
Fig. 2.10 Linear prediction of h[q] based on Q past noisy observations h̃[q].
F(W, S_h, S_e) = E[|ε_∞[q]|²] = E[ |h[q + d] − ĥ[q + d]|² ]
= (1/2π) ∫_{−π}^{π} ( |exp(−jωd) − W(ω)|² S_h(ω) + |W(ω)|² S_e(ω) ) dω.
(2.72)
It is chosen from the space of causal and square-integrable transfer functions
with ∫_{−∞}^{∞} |W(ω)|² dω < ∞, which is called Hardy space [225] and denoted by
H_+^2. The power spectral densities of the uncorrelated signal h[q] and noise
e[q] are
S_h(ω) = Σ_{ℓ=−∞}^{∞} c_h[ℓ] exp(−jωℓ) ∈ C_h (2.74)
S_e(ω) = Σ_{ℓ=−∞}^{∞} c_e[ℓ] exp(−jωℓ) ∈ C_e. (2.75)
Both power spectral densities are not known perfectly, but are assumed to
belong to the uncertainty classes Ch and Ce , respectively.
An estimator W is called robust if there exists a finite upper bound on the
MSE F(W, Sh , Se ) over all Sh ∈ Ch and Se ∈ Ce . Thus, the performance
max_{S_h ∈ C_h, S_e ∈ C_e} F(W, S_h, S_e)

can be guaranteed.
The most robust transfer function WR minimizes this upper bound and is
the solution of the minimax problem
The solutions S_h^L and S_e^L to the maxmin problem, which is the dual problem
to (2.76),
are termed least-favorable for causal estimation for the given uncertainty
classes. If the values of (2.76) and (2.77) are equal, then S_h^L and S_e^L are least
favorable if and only if S_h^L, S_e^L, and W^L form a saddle-point solution. This
means they satisfy
As for the infinite dimensional case, the optimal values of both problems
are identical if and only if f^L and p^L_{h,y}(h, y) form a saddle point^18, i.e.,
The equality of (2.80) and (2.81) (minimax equality) holds, because of the
following arguments:
1. The MSE in (2.80) is a linear function in the second order moments and
convex in the parameters of f^L if f^L is a linear function. Due to the minimax
theorem [193], problems (2.80) and (2.81) are equal.
2. For fixed C_z = C_z^L (C = {C_z^L}), the left inequality in (2.82) is an equality
because the MSE only depends on the second order moments of h and
y for linear f^L. For general C, the left inequality is true if we choose C_z^L
according to (2.83).
3. For all linear estimators and all distributions p_{h,y}(h, y), p^L_{h,y}(h, y) and f^L
form a saddle point.
4. For arbitrary nonlinear estimators f (measurable functions in F), the linear
estimate is optimum for a Gaussian distribution p^L_{h,y}(h, y) and the right
inequality in (2.82) holds. Therefore, the Gaussian distribution is least-
favorable.
17 Without loss of generality, we implicitly restrict h and y to have uncorrelated and
identically distributed real and imaginary parts.
18 E_{p_{h,y}(h,y)}[•] is the expectation operator given by the probability distribution
p_{h,y}(h, y).
The least favorable solution p^L_{h,y}(h, y) in the class P is the normal distribu-
tion
in case C_y^L is non-singular.
The Schur complement in (2.83) is concave in C_z. For example, given an
estimate Ĉ_z and the constraints ‖C_z − Ĉ_z‖_F ≤ ε and C_z ⪰ 0, problem (2.83)
can be formulated as a semidefinite program [24, p. 168] introducing P slack
variables and applying Theorem A.2 to the diagonal elements of the Schur
complement. Numerical optimization tools are available for this class of prob-
lems (e.g., [207]), but they may be too complex for online adaptation in wireless
communications at the moment. But if we can choose C such that it has a
maximum element C_z^L ∈ C, in the sense that C_z^L ⪰ C_z for all C_z ∈ C, the solution
is very simple: If the set C has a maximum element C_z^L ∈ C, then it follows
from Theorem A.1 that C_z^L is a solution to (2.83).
A few concluding remarks on minimax estimators and the solution
given by the theorem are in order:
1. Minimax robust estimators guarantee a certain performance Gmin , but
may be very conservative if the set C is “large”. For example, this is the
case when the nominal or most likely parameter from C in the considered
application deviates significantly from the least-favorable parameter. But
a certain degree of conservatism is always the price for robustness and
can be reduced by choosing the set C as small as possible. On the other
hand, if the least-favorable parameter happens to be the most likely or the
nominal parameter, then there is no performance penalty for robustness.
2. The theorem gives the justification for using a linear MMSE estimator
although the nominal joint pdf is not Gaussian: The Gaussian pdf is the
worst-case distribution.
3. The theorem also shows the importance of the constraint on the second
order moments: The efficiency of the linear estimator is determined by this
constraint and not the Gaussian assumption [223]. This is due to the fact
that any deviation from the Gaussian pdf with identical constraint set C
on the second order moments C z does not increase the cost in (2.82).
4. The difficulty lies in defining the constraint set C. Obtaining an accurate
characterization of this set based on a finite number of observations is
critical for a robust performance, whereas the assumption of a Gaussian
pdf is already the worst-case model.^19
The effort to parameterize these sets via the upper bounds α_h^{(1)} and α_n^{(1)} is
very small.^21 Because we employed the 2-norm, both sets have a maximum
element^22 given by C_h^L = α_h^{(1)} I_P and C_n^L = α_n^{(1)} I_M, respectively. Defining
C^{(1)} = {C_z | C_h ∈ C_h^{(1)}, C_n ∈ C_n^{(1)}} and due to the linear structure of C_z in
C_h and C_n, the maximum element C_z^L of C^{(1)} is determined by C_h^L and C_n^L.
The minimax MSE channel estimator is obtained from

min_W max_{C_z ∈ C^{(1)}} E_{h,y}[ ‖h − W y‖₂² ]. (2.88)
19 The example from [223] illustrates this: Consider a Gaussian distribution mixed
with a Cauchy distribution of infinite variance, where the latter occurs only with a
small probability. Obtaining a reliable estimate of the large variance would require
many observations.
20 The 2-norm ‖A‖₂ of a matrix A is defined as max_x ‖Ax‖₂/‖x‖₂, which is its
largest singular value.
21 They may be found exploiting tr[C_h]/P ≤ ‖C_h‖₂ ≤ ‖C_h‖_F ≤ tr[C_h]. Thus,
an estimate of the variances of all channel parameters is sufficient and α_h^{(1)}, in this
example, can be chosen proportional to tr[C_h].
22 W.r.t. the Loewner partial order [96], i.e., A ⪰ B if and only if A − B ⪰ 0.
If estimates Ĉ_h and Ĉ_{n,s} of the covariance matrices and a model for the
estimation errors are available, the uncertainty sets can be restricted to

C_h^{(2)} = { C_h | C_h = Ĉ_h + E, ‖E‖₂ ≤ α_h^{(2)}, C_h ⪰ 0 } (2.90)
C_n^{(2)} = { C_n | C_n = I_N ⊗ C_{n,s}, C_{n,s} = Ĉ_{n,s} + E_n, ‖E_n‖₂ ≤ α_n^{(2)}, C_n ⪰ 0 }.
(2.91)

Because Theorem 2.2 requires compact and convex sets, the errors have to
be modeled as being bounded.^23 Again we choose the 2-norm to obtain the
uncertainty classes C_h^{(2)} and C_n^{(2)} with maximum elements, which are C_h^L =
Ĉ_h + α_h^{(2)} I_P and C_n^L = I_N ⊗ C_{n,s}^L with C_{n,s}^L = Ĉ_{n,s} + α_n^{(2)} I_M. With (2.22)
this yields the robust estimator

W_MMSE^{(2)} = [ (Ĉ_h + α_h^{(2)} I_P) S^H ( I_N ⊗ (Ĉ_{n,s} + α_n^{(2)} I_M)^{-1} ) S + I_P ]^{-1}
× (Ĉ_h + α_h^{(2)} I_P) S^H ( I_N ⊗ (Ĉ_{n,s} + α_n^{(2)} I_M)^{-1} ). (2.92)
Both robust estimators W_MMSE^{(1)} and W_MMSE^{(2)} converge to the ML channel
estimator (2.31) as the 2-norm of C_h and of the estimation error of the
estimate Ĉ_h, respectively, increase: α_h^{(1)} → ∞ and α_h^{(2)} → ∞. This provides
a degree of freedom to ensure a performance of the MMSE approach for
estimated second order moments which is always better or, in the worst case,
equal to the ML channel estimator. The upper bounds in (2.90) and (2.91)
can be chosen proportional to or as a monotonically increasing function of a
rough estimate for the first order moment of ‖E‖₂² ≤ ‖E‖_F².^24
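A minimal numerical sketch of this convergence. The dimensions and covariances are hypothetical, and the noise loading α_n^{(2)} is set to zero for brevity, so only the channel loading α_h is varied:

```python
import numpy as np

rng = np.random.default_rng(1)
P, MN = 3, 10                       # hypothetical sizes, not from the book
S = rng.standard_normal((MN, P)) + 1j * rng.standard_normal((MN, P))
Ch_hat = np.eye(P)                  # estimated channel covariance (assumed)
Cn = 0.1 * np.eye(MN)               # noise covariance, here C_n = c_n * I

def w_robust(alpha_h):
    """Loaded MMSE estimator in the spirit of (2.92), with C_h^L = Ch_hat
    + alpha_h*I and the noise loading alpha_n omitted."""
    ChL = Ch_hat + alpha_h * np.eye(P)
    A = ChL @ S.conj().T @ np.linalg.inv(Cn)
    return np.linalg.solve(A @ S + np.eye(P), A)   # (A S + I)^{-1} A

# ML channel estimator (2.31): W_ML = (S^H C_n^{-1} S)^{-1} S^H C_n^{-1}.
B = S.conj().T @ np.linalg.inv(Cn)
W_ml = np.linalg.solve(B @ S, B)

# Increasing alpha_h drives the robust estimator toward the ML estimator.
err = [np.linalg.norm(w_robust(a) - W_ml) for a in (1.0, 1e2, 1e4)]
print(err[0] > err[1] > err[2])  # -> True
```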
23 For stochastic estimation errors, the true covariance matrices are only included in
the set with a certain probability. This probability is another design parameter and
gives an upper bound on the probability that the worst case MSE is exceeded (outage
probability).
24 A stochastic model for the estimation error of sample mean estimators for covari-
ance matrices is derived in [238].
Fig. 2.11 Minimax MSE prediction of band-limited random sequences with a norm
constraint on their power spectral density.
The MMSE predictor in Section 2.3.1 requires the knowledge of the autoco-
variance sequence of the channel parameters. In Section 2.3.2 we introduce the
set Bh,fmax (2.66) of power spectral densities corresponding to band-limited
autocovariance sequences with given maximum (Doppler-) frequency fmax
and variance ch [0]. We would like to obtain minimax MSE predictors for
similar classes C of autocovariance sequences, which require only knowledge
of fmax and a bound on the norm of their power spectral densities. This ro-
bust solution does not require knowledge of the autocovariance sequence and
is maximally robust, i.e., minimizes the worst-case MSE (Figure 2.11).
The cases of an observation of infinite dimension (Q → ∞) and of fi-
nite dimension Q are treated separately. The derivations are based on the
model (2.56) and (2.59), but in the sequel we omit the index p denoting the
pth channel parameter.
The models for Q → ∞ are given by (2.70) and (2.71) for d = 0. With (2.72)
and (2.76), the minimax robust predictor is given by
min_{W ∈ H_+^2} max_{S_h(ω) ∈ B_{h,f_max}} F(W, S_h, S_e) = min_{W ∈ H_+^2} max_{S_h(ω) ∈ B_{h,f_max}} E[|ε_∞[q]|²]. (2.93)

For simplicity, we assume white noise e[q] with power spectral density
S_e(ω) = c_e, i.e., known variance c_e > 0. Because c_e > 0, we have S_e(ω) > 0.
Moreover, the set Bh,fmax (2.66) is convex and ch [0] is bounded. Therefore,
the conditions of Theorem 2.1 are satisfied.^25 Thus, the least-favorable power
spectral density S_h^L(ω) and the corresponding optimum predictor W^L form
a saddle point for (2.93). Applying the closed-form expression from [196]
(see (2.64)) to (2.93), we obtain
where the inequality is due to ln(x + 1) ≤ x for x > −1. Equality holds if
and only if S_h(ω) = S_R(ω). The optimum value of (2.93) and (2.94) is (cf.
Eq. 2.69)

max_{S_h(ω) ∈ B_{h,f_max}} min_W F(W, S_h, S_e) = c_e [ (1 + π c_h[0]/(c_e ω_max))^{ω_max/π} − 1 ]. (2.99)
This is the upper bound for the prediction error for all autocovariance se-
quences with power spectral density in Bh,fmax . This argumentation yields
the following theorem.
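As a numerical cross-check (with illustrative parameter values), the closed form (2.99) can be verified against direct quadrature of (2.64) for the rectangular least-favorable power spectral density S_R(ω) = π c_h[0]/ω_max on |ω| ≤ ω_max:

```python
import numpy as np

c0, c_e, fmax = 1.0, 1e-3, 0.1     # assumed values
wmax = 2 * np.pi * fmax

# Closed form (2.99) for the rectangular (least-favorable) psd.
mmse_closed = c_e * ((1 + np.pi * c0 / (c_e * wmax)) ** (wmax / np.pi) - 1)

# Quadrature of (2.64) with S_R(w) = pi*c0/wmax on |w| <= wmax, 0 elsewhere.
w = np.linspace(-np.pi, np.pi, 400001)
S_R = np.where(np.abs(w) <= wmax, np.pi * c0 / wmax, 0.0)
vals = np.log1p(S_R / c_e)
integral = np.sum((vals[:-1] + vals[1:]) * np.diff(w) / 2) / (2 * np.pi)
mmse_quad = c_e * (np.exp(integral) - 1.0)

print(bool(abs(mmse_closed - mmse_quad) / mmse_closed < 1e-3))  # -> True
```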
25 The class B_{h,f_max} in (2.66) could also be extended to contain all power spectral
densities with L₁ norm (variance c_h[0]) upper-bounded by β. The following argumen-
tation shows that the least-favorable power spectral density w.r.t. this extended set
has an L₁ norm equal to c_h^L[0] = β.
c_h^L[ℓ] = c_h[0] sinc(2 f_max ℓ) = c_h[0] sin(2π f_max ℓ)/(2π f_max ℓ). (2.100)
(2.102)

with C_{h̃_T} = C_{h_T} + c_e I_Q, C_{h_T} ∈ T_{+,0}^Q with first row [c_h[0], …, c_h[Q−1]], and
c_{h_T h} = [c_h[1]*, …, c_h[Q]*]^T.
The set C_{h,f_max}^1 can be characterized equivalently by Theorem B.3 from
Arun and Potter [6], which yields positive semidefinite constraints on the
Toeplitz matrices C_T ∈ T_{+,0}^{Q+1} with first row [c_h[0], …, c_h[Q]] and C_T′ ∈ T_{+,0}^Q
with first row [c_h′[0], …, c_h′[Q−1]], where c_h′[ℓ] = c_h[ℓ−1] − 2 cos(2π f_max) c_h[ℓ] +
c_h[ℓ+1] and c_h[0] = c_h^L[0] = β. The least-favorable variance is c_h^L[0] = β.
Therefore, problem (2.103) can be reformulated as
max_{{c_h[ℓ]}_{ℓ=1}^{Q}} β − c_{h_T h}^H C_{h̃_T}^{-1} c_{h_T h} s.t. C_T ⪰ 0, C_T′ ⪰ 0. (2.104)
Obviously, for finite Q, uncertainty set C_{h,f_max}^1, and f_max < 0.5, the least-
favorable autocovariance sequence differs from (2.100), and the predictor cor-
responding to a rectangular power spectral density is not maximally robust.^27
26 Introducing the slack variable t we have

min_{t, {c_h[ℓ]}_{ℓ=1}^{Q}} t s.t. c_{h_T h}^H C_{h̃_T}^{-1} c_{h_T h} ≤ t, C_T ⪰ 0, C_T′ ⪰ 0, (2.105)
It has often been stated in the literature that a rectangular power spec-
tral density is a “good” choice for parameterizing an MMSE predictor for
band-limited signals. But for which uncertainty class is this choice maximally
robust?
We change the definition of the uncertainty set and derive a maximally
robust predictor which depends on the (least-favorable) autocovariance se-
quence {c_h^L[ℓ]}_{ℓ=0}^{Q} related to a band-limited rectangular power spectral den-
sity.
In the proofs of [153, 27] for perfect predictability of band-limited random
sequences, where both references consider the case of no noise ce = 0 in
the observation, the class of autocovariance sequences with band-limited and
bounded power spectral density is considered. This motivates us to define the
uncertainty set
n Z π
∞ Q 1
Ch,f = {ch [ℓ]}ℓ=0 ∃Sh (ω) ≥ 0 : ch [ℓ] = Sh (ω) exp(jωℓ)dω,
max
2π −π
o
Sh (ω) ≤ γ, Sh (ω) = 0 for ωmax < |ω| ≤ π, ωmax = 2πfmax , (2.108)
directly, where w = [w[1], w[2], …, w[Q]]^T ∈ C^Q.^28 With (2.56) and (2.59)
(omitting the index p), the estimate of h[q] reads

ĥ[q] = Σ_{ℓ=1}^{Q} w[ℓ] h̃[q−ℓ] = Σ_{ℓ=1}^{Q} w[ℓ] h[q−ℓ] + Σ_{ℓ=1}^{Q} w[ℓ] e[q−ℓ], (2.110)

where we assume white noise e[q] with variance c_e. Similar to (2.72), the MSE
E[|ε_Q[q]|²] of the estimation error ε_Q[q] = ĥ[q] − h[q] can be written as
max_{{c_h[ℓ]}_{ℓ=0}^{Q} ∈ C_{h,f_max}^1} E[|ε_Q[q]|²] (2.107)

numerically.
28 Problem (2.109) can also be solved using the minimax Theorem 2.2 together with
Theorem A.1.
E[|ε_Q[q]|²] = (1/2π) ∫_{−ω_max}^{ω_max} S_h(ω) |1 − Σ_{ℓ=1}^{Q} w[ℓ] exp(−jωℓ)|² dω + c_e ‖w‖₂².
(2.111)
E[|ε_Q[q]|²] ≤ γ (1/2π) ∫_{−ω_max}^{ω_max} |1 − Σ_{ℓ=1}^{Q} w[ℓ] exp(−jωℓ)|² dω + c_e ‖w‖₂² (2.112)
= γ 2f_max + γ Σ_{ℓ1=1}^{Q} Σ_{ℓ2=1}^{Q} w[ℓ1] w[ℓ2]* 2f_max sinc(2f_max(ℓ1 − ℓ2))
− γ Σ_{ℓ=1}^{Q} w[ℓ] 2f_max sinc(2f_max ℓ) − γ Σ_{ℓ=1}^{Q} w[ℓ]* 2f_max sinc(2f_max ℓ)
+ c_e ‖w‖₂²
= c̄_{ε_Q}({w[ℓ]}_{ℓ=1}^{Q}), (2.113)
which is convex in w. The bound on the MSE is achieved with equality for
the rectangular power spectral density S_R(ω) with variance c_h^L[0] = 2 f_max γ.
Due to the convexity of C_{h,f_max}^∞, Theorem 2.2 holds, and S_R(ω) is the least-
favorable power spectral density. This is similar to Q → ∞ and B_{h,f_max} (2.93),
but now the variance of the uncertainty set (2.108) is not fixed.
The necessary and sufficient conditions for the minimum of the upper
bound c̄_{ε_Q}({w[ℓ]}_{ℓ=1}^{Q}) form a system of Q linear equations

∂c̄_{ε_Q}({w[ℓ]}_{ℓ=1}^{Q})/∂w[ℓ]* = 2 f_max γ ( Σ_{ℓ1=1}^{Q} w[ℓ1] sinc(2 f_max (ℓ1 − ℓ)) − sinc(2 f_max ℓ) )
+ c_e w[ℓ] = 0, ℓ = 1, 2, …, Q. (2.114)
29 Applying the Levinson algorithm [170] it can be solved with a computational
complexity of order Q².
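The system (2.114) can be sketched in a few lines; the parameter values below are illustrative assumptions in the range used later in the chapter's figures:

```python
import numpy as np

def minimax_predictor(fmax, gamma, c_e, Q):
    """Solve the Q x Q system (2.114):
    (2*fmax*gamma*R + c_e*I) w = 2*fmax*gamma*r,
    with R[l,l1] = sinc(2*fmax*(l - l1)) and r[l] = sinc(2*fmax*l).
    np.sinc(x) = sin(pi*x)/(pi*x) matches the book's sinc convention."""
    idx = np.arange(1, Q + 1)
    R = np.sinc(2 * fmax * (idx[:, None] - idx[None, :]))
    r = np.sinc(2 * fmax * idx)
    return np.linalg.solve(2 * fmax * gamma * R + c_e * np.eye(Q),
                           2 * fmax * gamma * r)

fmax, gamma, c_e, Q = 0.1, 2 / 0.1, 1e-3, 5   # illustrative parameters
w = minimax_predictor(fmax, gamma, c_e, Q)
print(w.shape)  # -> (5,)
```

Since the system matrix is Toeplitz, `scipy.linalg.solve_toeplitz` could exploit the structure with O(Q²) complexity, as the Levinson remark in the footnote suggests; a plain dense solve is used here for brevity.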
∀ {c_h[ℓ]}_{ℓ=0}^{Q} ∈ C_{h,f_max}^∞. (2.117)
30 For example, when predicting a random sequence, a Tikhonov regularization [212]
of the approximation problem in equation (1.2) of [153] would also lead to (2.113).
But in the cost function (2.113), the regularization parameter is determined by the
considered uncertainty class and the noise model.
Fig. 2.12 Rectangular, band-pass, and Clarke’s power spectral density (psd) Sh (2πf )
[206, p. 40] with maximum frequency fmax = 0.1 (ωmax = 2πfmax ) and ch [0] = 1.
C_h^gen = { {c_h[ℓ]}_{ℓ=0}^{Q} | ∃ S_h(ω) ≥ 0 : c_h[ℓ] = (1/2π) ∫_{−π}^{π} S_h(ω) exp(jωℓ) dω,
L(ω) ≤ S_h(ω) ≤ U(ω) }, (2.118)
with arbitrary upper and lower bounds U(ω) ≥ 0 and L(ω) ≥ 0. The least-
favorable power spectral density is given by S_h^L(ω) = U(ω), since the MSE can
be bounded from above, similar to (2.113). Thus, in case of a nominal power
spectral density deviating significantly from the rectangular power spectral
density, the uncertainty class can be easily modified to obtain a less conserva-
tive predictor. For example, piece-wise constant power spectral densities can
be chosen whose additional parameters can be adapted measuring the signal
power in different predefined frequency bands.
Fig. 2.13 Rectangular power spectral density (psd) Sh (2πf ) with maximum frequen-
cies fmax ∈ {0.05, 0.1, 0.2} and ch [0] = 1.
and (2.117) ("Minimax MSE L₁ (upper bound)" and "Minimax MSE L∞ (up-
per bound)") are guaranteed for the corresponding minimax predictor if the
considered power spectral density belongs to C_{h,f_max}^1 or C_{h,f_max}^∞, respectively.
For C_{h,f_max}^1, we choose β = 1, which is the true variance c_h[0] = 1. When
we use Clarke's power spectral density, which is not bounded and does not
belong to C_{h,f_max}^∞ for finite γ, the choice of γ = 2/f_max yields only an approx-
imate characterization of the considered nominal power spectral density (see
Figure 2.12). At first we assume perfect knowledge of f_max.
Figure 2.14 illustrates that prediction based on Q = 5 past observations
{h̃[q−ℓ]}_{ℓ=1}^{Q} is already close to the asymptotic MSE for Q → ∞.^31 The
result is given for Clarke's power spectral density (f_max = 0.1, c_h[0] = 1)
and c_e = 10^{−3}, but Q = 5 is also a good choice w.r.t. performance and
numerical complexity for other parameters and power spectral densities. We
choose Q = 5 in the sequel.
Because Clarke’s power spectral density is rather close to a rectangular
power spectral density (Figure 2.12), except for f ≈ ±fmax , the MSEs of the
minimax predictors (γ = 2/fmax ) are close to the MMSE predictor with per-
fect knowledge (Figure 2.15). The graph for perfect knowledge is not shown,
because it cannot be distinguished from the minimax MSE performance based
1 ∞
on Ch,f max
. Furthermore, the minimax MSE based on Ch,f max
is almost iden-
1
tical to the upper bound for Ch,fmax . The upper bounds on the MSEs (2.104)
and (2.117) are rather tight for low fmax .
A different approach to the prediction or interpolation of band-limited
signals is based on discrete prolate spheroidal sequences (DPSS) [195, 48].
Prolate spheroidal sequences can be shown to be the basis which achieves
the most compact representation of band-limited sequences concentrated in
31 The MSE for Q → ∞ is given by (2.64) with Equations (27)-(30) in [134].
Fig. 2.14 MSE E[|ε_Q[q]|²] for prediction of h[q] for different numbers Q of obser-
vations {h̃[q−ℓ]}_{ℓ=1}^{Q}. (Model: Clarke's power spectral density (Figure 2.12) with
c_h[0] = 1, c_e = 10^{−3}, f_max = 0.1)
Fig. 2.15 MSE E[|ε_Q[q]|²] vs. maximum frequency f_max for prediction of h[q] based
on Q = 5 observations {h̃[q−ℓ]}_{ℓ=1}^{Q}. (Model: Clarke's power spectral density (Fig-
ure 2.12) with c_h[0] = 1, c_e = 10^{−3}, Q = 5)
the time domain with a fixed number of basis vectors, i.e., a reduced dimen-
sion. But the norm of the resulting linear filters becomes unbounded as the
observation length increases. This requires regularization methods [154] to
decrease the norm and the amplification of the observation noise. A truncated
singular value decomposition [48] or a reduction of the dimension [242] are
most common.
We use two approaches as references in Figure 2.16: "DPSS (fixed di-
mension)" chooses the approximate dimension ⌈2f_max Q⌉ + 1 of the signal
subspace of band-limited and approximately time-limited signals, which is the
time-bandwidth dimension [48, 195], and uses the algorithm in Section V of
Fig. 2.16 MSE E[|ε_Q[q]|²] vs. maximum frequency f_max for prediction of h[q] based
on Q = 5 observations {h̃[q−ℓ]}_{ℓ=1}^{Q}: Comparison of minimax prediction (2.115) with
prediction based on discrete prolate spheroidal sequences (DPSS) [48, Sec. V]. (Model:
Clarke's power spectral density (Figure 2.12) with c_h[0] = 1, c_e = 10^{−3}, Q = 5)
Fig. 2.17 MSE E[|ε_Q[q]|²] vs. maximum frequency f_max for prediction of h[q] based
on Q = 5 observations {h̃[q−ℓ]}_{ℓ=1}^{Q}: Sensitivity of minimax prediction (2.115) de-
signed for a fixed maximum frequency f_max^fix = 0.1 and γ = 20 to a different nominal
maximum frequency f_max. (Model: Rectangular power spectral density (Figure 2.13)
with c_h[0] = 1, c_e = 10^{−3}, Q = 5)
[48] without any additional regularization. This choice of dimension for the
subspace is also proposed in [242], which solves an interpolation problem.
The performance of the DPSS based predictor depends on the selected di-
mension of the signal subspace, which follows from a bias-variance trade-off.
For "DPSS (optimum dimension)", the MSE is minimized over all possible
dimensions in the set {1, 2, …, Q}, i.e., we assume the genie-aided optimum
choice of dimension w.r.t. the bias-variance trade-off. The minimax robust
predictor for C_{h,f_max}^∞ determines its regularization parameter from the noise
Fig. 2.18 Relative degradation of the MSE E[|ε_Q[q]|²] for minimax predic-
tion (2.115) designed as for Figure 2.17 w.r.t. the MMSE predictor with perfect
knowledge of the power spectral density and the nominal maximum frequency f_max.
(Model: Rectangular power spectral density (Figure 2.13) with c_h[0] = 1, c_e = 10^{−3},
Q = 5)
and signal uncertainty class. Besides its simplicity, its MSE is smaller than
for the DPSS based approach (Figure 2.16).
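The fixed-dimension rule ⌈2 f_max Q⌉ + 1 and the Slepian basis can be reproduced with SciPy. This is only a sketch of the subspace construction (the mapping NW = Q·f_max of the time-half-bandwidth product is an assumption here; the full prediction algorithm of [48, Sec. V] is not reproduced):

```python
import numpy as np
from scipy.signal.windows import dpss

fmax, Q = 0.1, 5
dim = int(np.ceil(2 * fmax * Q)) + 1   # "DPSS (fixed dimension)" rule

# First `dim` discrete prolate spheroidal sequences of length Q,
# band-limited to |f| <= fmax; rows form an orthonormal basis.
basis = dpss(Q, Q * fmax, Kmax=dim)
print(dim, basis.shape)  # -> 2 (2, 5)
```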
Next we investigate the sensitivity of the minimax robust predictor (2.115)
for C_{h,f_max}^∞ to the knowledge of f_max. As shown in Figure 2.13, we fix f_max
to f_max^fix = 0.1 for designing the robust predictor, but f_max of the nomi-
nal rectangular power spectral density changes from 0.02 to 0.2. Choosing
γ = 20, the uncertainty set contains the nominal power spectral density for
f_max ∈ [0.025, 0.1] because c_h[0] = 1 = 2 f_max γ (2.116) has to be satisfied.
Thus, the upper bound (2.117) is the guaranteed performance only in the
interval [0.025, 0.1], and in Figure 2.17 we give the upper bound only for
f_max ≤ 0.1. The shape of the graphs is also typical for other choices of f_max^fix
and types of power spectral densities. For a deviation in MSE (of the mis-
matched minimax predictor relative to the MMSE with perfect knowledge of
the power spectral density) which is smaller than two, the nominal maximum
frequency is allowed to deviate from the design point of f_max by −40% or
+60% (Figure 2.18). The relative degradation is approximately symmetric
w.r.t. f_max = 0.1. Overestimating f_max is less critical because the worst-case
MSE is given by (2.117).
Finally, we choose the band-pass power spectral density from Figure 2.12
with support in the intervals [0.9f_max, f_max] and [−f_max, −0.9f_max] and vari-
ance c_h[0] = 1. This power spectral density deviates significantly from the
rectangular power spectral density (Figure 2.12). For example, for f_max = 0.1
we have to set γ = 50 to include the power spectral density in C_{h,f_max}^∞. The
MSEs of the minimax predictors as well as their upper bounds increase with
f_max, as for Clarke's power spectral density (Figure 2.19). But the MSE of
the MMSE predictor with perfect knowledge of the power spectral density is
Fig. 2.19 MSE E[|ε_Q[q]|²] vs. maximum frequency f_max for prediction of h[q] based
on Q = 5 observations {h̃[q−ℓ]}_{ℓ=1}^{Q}. (Model: Band-pass power spectral density
(Figure 2.12) with c_h[0] = 1, c_e = 10^{−3}, Q = 5, γ = 1/(2(f_max − 0.9 f_max)))
For wireless communications, the L1 -norm is the more appropriate norm for
the channel parameters’ power spectral density Sh (ω) because the variance
ch [0] can be estimated directly. The related robust predictor from (2.104)
yields a smaller MSE than its counterpart for the L∞ -norm. Its major dis-
advantage is the computational complexity for solving (2.104). In general,
it is considerably larger (by one order in Q) than for the linear system of
equations in (2.115). But, since the results depend only on β and fmax , they
could be computed offline with sufficient accuracy. On the other hand, the
approach based on the L∞ -norm can be modified easily to decrease the size of
the corresponding uncertainty class such as in (2.118). This is very important
in practice to improve performance in scenarios deviating significantly from
the rectangular power spectral density.
Only for Q → ∞ is the rectangular power spectral density least-favorable
for C_{h,f_max}^1 (Theorem 2.3), whereas the corresponding partial autocovariance
sequence is always least-favorable for C_{h,f_max}^∞ (Theorem 2.4), with generally
larger worst-case MSE. Thus, this prominent "robust" choice of the MMSE
predictor is not maximally robust for C_{h,f_max}^1 but for C_{h,f_max}^∞.
32 c_h^L[1] results from C_T′ ⪰ 0 and maximizes the MMSE in (2.119).
Chapter 3
Estimation of Channel and Noise
Covariance Matrices
Introduction to Chapter
1 The likelihood equation is the gradient of the likelihood function set to zero.
Outline of Chapter
Ĉ_x = argmax_{C_x ∈ M} Π_{q=1}^{B} p_x(x[q]; C_x). (3.1)

The sample covariance matrix C̃_x = (1/B) Σ_{q=1}^{B} x[q] x[q]^H is a sufficient statistic
for estimating C_x.
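A minimal sketch of the sufficient statistic for the zero-mean model in (3.1); the dimensions and the test covariance C_x = 2I are illustrative assumptions:

```python
import numpy as np

def sample_covariance(X):
    """Sample covariance (1/B) * sum_q x[q] x[q]^H for the B snapshots
    stored in the columns of X (zero-mean model assumed, as in (3.1))."""
    B = X.shape[1]
    return X @ X.conj().T / B

rng = np.random.default_rng(2)
M, B = 4, 10000
# Complex snapshots with true covariance C_x = 2*I (unit-variance real
# and imaginary parts).
X = rng.standard_normal((M, B)) + 1j * rng.standard_normal((M, B))
C_tilde = sample_covariance(X)

print(bool(np.allclose(C_tilde, C_tilde.conj().T)))      # Hermitian
print(bool(np.linalg.norm(C_tilde - 2 * np.eye(M)) < 0.3))  # consistent
```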
The properties of optimization problem (3.1) for different choices of M and
assumptions on C̃ x , which can be found in the literature, are summarized in
the sequel:
1. If C̃_x ∈ S_+^M, i.e., B ≥ M necessarily, and M ⊆ S_{+,0}^M is closed, then (3.1)
has positive definite solutions Ĉ_x [29]. But if C̃_x is singular, the existence
of a positive definite solution is not guaranteed, although very often (3.1)
will yield a positive definite estimate (see Section 3.1.2).
In [81, 78, 149] conditions are given under which the likelihood func-
tion (3.2) is unbounded above, i.e., the solution of (3.1) is possibly singular.
A necessary condition is that C̃_x must be singular. Moreover, whether the
likelihood function for realizations {x[q]}_{q=1}^{B} of the random vector x is
unbounded depends on the particular realizations observed, i.e., the prob-
ability of the likelihood being unbounded may be non-zero.
3.1 Maximum Likelihood Estimation of Structured Covariance Matrices 63
Ĉ_x = C̃_x. (3.3)

The likelihood equation in C_x follows from the necessary condition that the
gradient of the log-likelihood function w.r.t. C_x must be zero and reads

C̃_x = C_x. (3.4)
If M = S_{+,0}^M, we obtain (3.3). For C_x ∈ M ⊂ S_{+,0}^M, it does not have a
solution in general.^2 Very often (3.4) is only satisfied with probability zero,
but the probability increases with B (see Section 3.5). If C̃_x ∈ M and we
have a unique parameterization of M, then the unstructured ML estimate
Ĉ_x = C̃_x is also the ML estimate for the restricted class of covariance
matrices C_x ∈ M due to the invariance principle [124, 46].
3. In general, the necessary conditions for a solution of (3.1) based on the gradient, i.e., the likelihood equation, form a nonlinear system of equations. For example, the necessary conditions for c_i in case of a linear structure C_x = Σ_{i=1}^{V} c_i S_i are [4, 5]

tr[ C_x^{−1} (C_x − C̃_x) C_x^{−1} S_i ] = 0,   i = 1, 2, ..., V.   (3.5)
Replacing the weights C_x^{−1} in (3.5) by the consistent estimate C̃_x^{−1} yields a system of equations that is linear in the parameters c_i:

tr[ C̃_x^{−1} (C_x − C̃_x) C̃_x^{−1} S_i ] = 0,   i = 1, 2, ..., V.   (3.6)

Here, the weighting matrices C̃_x^{−1} correspond to the inverses of consistent estimates for the error covariance matrix of c̃_x = vec[C̃_x], which is C_{c̃_x} = (1/B)(C_x^T ⊗ C_x).³
In general, the EXIP aims at a simple solution of the ML problem (3.1) by first extending the set M (here to S^M_{+,0}) such that a simple solution (here:
2 It is not the likelihood equation for this constrained set M.
3 Precisely, (3.6) results from (A.15) and the consistent estimate C̃_x of C_x, which yield ‖c̃_x − c_x‖²_{C̃_{c̃_x}^{−1}} = B ‖C̃_x − C_x‖²_{C̃_x^{−1}} for c_x = vec[C_x].

M = { C_x | C_x ∈ T^M_+, C_x^{−1} ∈ T^M_+ } = T^M_{c+},   (3.7)
Example 3.2. If C̃_x ∈ S^M_+, M ⊆ S^M_{+,0} is closed, and {C_x^{−1} | C_x ∈ M ∩ S^M_+} is convex, then (3.1) has a unique positive definite solution [176].
An example for M with these properties is the set of positive semidefinite doubly symmetric matrices, i.e., matrices symmetric w.r.t. their main and anti-diagonal. A subset of this choice of M is the set of positive semidefinite Toeplitz matrices. But for Toeplitz matrices the set of inverses of Toeplitz matrices is not a convex set [176]. □
4 This theorem has been applied, for example, in array processing and spectral estimation [203, 105].

The following theorem treats this case for the example of estimating the very simple choice C_x = cI_M, which is a Toeplitz covariance matrix.
Theorem 3.2 (Fuhrmann and Barton [79]). For B = 1, x[1] ∈ C^M, and C_x = cI_M, the ML estimate is given by (3.1) with M = T^M_{+,0}.⁵ The probability P that the ML estimate is positive definite is upper bounded as

P ≤ 2^{−(M−2)} / (M−1)!,   (3.10)

so that

lim_{M→∞} P = 0.   (3.11)

For circulant covariance matrices, the corresponding bound is⁶

P ≤ 1 / 2^{M−2}.   (3.12)
For C_x with general Toeplitz structure, a similar result is given in [81], but the bound decreases more slowly with M. Due to the closeness of Toeplitz to circulant matrices [81, 85], it can be expected that the probability of having a positive definite estimate for C_x ∈ T^M_{+,0} and M = T^M_{+,0} does not deviate significantly from the bound given for circulant covariance matrices.
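A quick numerical evaluation of the bound (3.10) makes the ill-posedness tangible; this sketch only evaluates the stated formula for a few illustrative values of M:

```python
from math import factorial

def pd_probability_bound(M: int) -> float:
    """Upper bound (3.10) on the probability that the Toeplitz-constrained
    ML estimate is positive definite for B = 1 and C_x = c * I_M."""
    return 2.0 ** (-(M - 2)) / factorial(M - 1)

# The bound decays faster than exponentially in M,
# so a positive definite ML estimate quickly becomes improbable.
bounds = [pd_probability_bound(M) for M in range(2, 10)]
```

Already for moderate M the bound is tiny, which motivates the regularization via circulant extensions discussed next.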
The ML problem (3.1) is typically ill-posed because the probability of
obtaining the required positive definite solution becomes arbitrarily small
for increasing M . Thus, a regularization of the problem becomes necessary.
Fuhrmann and Miller [81] propose to restrict the set M from the Toeplitz matrices T^M_{+,0} to contain only M × M Toeplitz matrices with positive semidefinite circulant extension of period M̄ > M. We denote this set as T^M_{ext+,0} ⊂ T^M_{+,0}. By this definition we mean that for all C_x ∈ T^M_{ext+,0} a positive semidefinite circulant matrix in T^{M̄}_{c+,0} exists whose upper-left M × M block is C_x. With this constrained set T^M_{ext+,0}, the ML estimate (3.1) has the desired positive definite property, as stated by the next theorem.

5 Although cI_M ∈ T^M_{+,0}, c ≥ 0, optimization is performed over M = T^M_{+,0}.
6 Note that the bound decreases exponentially with M.
is given by

C_x̄ = [ c[0]   c[1]   c[2]   c[2]*  c[1]*
        c[1]*  c[0]   c[1]   c[2]   c[2]*
        c[2]*  c[1]*  c[0]   c[1]   c[2]
        c[2]   c[2]*  c[1]*  c[0]   c[1]
        c[1]   c[2]   c[2]*  c[1]*  c[0]  ] ∈ T^5_c,   (3.14)
C x̄ = V DV H , (3.15)
C_x = V_M D V_M^H = Σ_{ℓ=1}^{M̄} d_ℓ v_{M,ℓ} v_{M,ℓ}^H,   (3.16)

sequence c[ℓ]. {c[ℓ]}_{ℓ=0}^{M̄−1} is the extension of {c[ℓ]}_{ℓ=0}^{M−1} to one full period of the periodic autocovariance sequence of the elements in x.⁷
Dembo et al. [47] and Newsam et al. [155] give sufficient conditions on C_x to be an element of T^M_{ext+,0}: they derive lower bounds on M̄, which depend on the observation length M and the minimum eigenvalue of C_x for positive definite C_x. The bound increases with decreasing smallest eigenvalue, and for a singular C_x a periodic extension does not exist. Their constructive proofs also provide algorithms that construct an extension. But the extension of {c[ℓ]}_{ℓ=0}^{M−1} is not unique, and if the bounds are loose, the computational complexity of the corresponding ML estimator increases. It has not been addressed in the literature how a positive semidefinite circulant extension can be constructed or chosen that would be a good initialization for the algorithms in Section 3.3. We address this issue at the end of Sections 3.3.3 and 3.3.5.
C_n = I_N ⊗ C_{n,s},   (3.19)

i.e., uncorrelated in time and correlated over the M receiver dimensions with C_{n,s} = E[n[q,n] n[q,n]^H]. As a special case, we also consider spatially and temporally uncorrelated noise with C_n = c_n I_{MN}.
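The Kronecker structure (3.19) is easy to materialize numerically. The following sketch uses an illustrative spatial covariance and dimensions of our own choosing:

```python
import numpy as np

M, N = 3, 4  # receiver dimensions M and temporal samples N (illustrative)

# Example spatial noise covariance C_{n,s} (symmetric positive definite)
C_ns = np.array([[2.0, 0.5, 0.0],
                 [0.5, 1.5, 0.3],
                 [0.0, 0.3, 1.0]])

# Temporally uncorrelated, spatially correlated noise: C_n = I_N kron C_{n,s}
C_n = np.kron(np.eye(N), C_ns)

# Special case of spatially and temporally white noise
c_n = 0.1
C_n_white = c_n * np.eye(M * N)
```

The block-diagonal result makes the "uncorrelated in time, correlated in space" statement visible: the diagonal blocks are copies of C_{n,s} and all off-diagonal blocks vanish.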
7 [c[0], c[1], ..., c[M−1]] is the first row of C_x.
8 In general, the rows of S̄ ∈ C^{K(L+1)×N} form a (known) basis of the signal subspace in C^N (compare with (2.3)).

3.2 Signal Models

9 ℓ_i − ℓ_{i−1} is constant for all 2 ≤ i ≤ B.
Fig. 3.1 System model for estimation of C_h and C_{n,s} with hidden data spaces h[q] and n[q] (Section 3.3.1); transformation of the observation y[q] with T yields the sufficient statistics ĥ[q] and n̄[q].
L_{y_T}({y_T[q]}_{q∈O_T}; C_{h_T}, C_{n,s}) = ln ∏_{q∈O_T} p_{y_T}(y_T[q]; C_{h_T}, C_{n_T})
  = −IBMN ln π − I ln det[ S_T C_{h_T} S_T^H + C_{n_T} ]
    − I tr[ C̃_{y_T} (S_T C_{h_T} S_T^H + C_{n_T})^{−1} ]   (3.21)

C̃_y = (1/B) Σ_{q∈O} y[q] y[q]^H.   (3.24)
where

T_T = [ S_T^† ; A_T^H ] = [ I_B ⊗ S^† ; I_B ⊗ A^H ] ∈ C^{BMN×BMN}   (3.28)

C̃_{ĥ_T} = (1/I) Σ_{q∈O_T} ĥ_T[q] ĥ_T[q]^H   (3.30)

and

C̃_{n̄_T} = (1/I) Σ_{q∈O_T} n̄_T[q] n̄_T[q]^H.   (3.31)
− BMN̄ ln π − B ln det[ I_{N̄} ⊗ C_{n,s} ] − B tr[ C̃_{n̄} (I_{N̄} ⊗ C_{n,s}^{−1}) ]   (3.33)

with

C̃_{ĥ} = (1/B) Σ_{q∈O} ĥ[q] ĥ[q]^H   (3.34)

and

C̃_{n̄} = (1/B) Σ_{q∈O} n̄[q] n̄[q]^H.   (3.35)
The problem of estimating the channel and noise covariance matrices can be formulated in the framework of estimation of a structured covariance matrix reviewed in Section 3.1: the covariance matrix of the observations {y_T[q]}_{q=1}^{I} is structured and uniquely parameterized by the channel covariance matrix C_{h_T} and noise covariance matrix C_{n,s} as C_{y_T} = S_T C_{h_T} S_T^H + I_{BN} ⊗ C_{n,s} (3.21) (Model 1). Our goal is not necessarily a more accurate estimate of C_{y_T}; we are mainly interested in its parameters C_{h_T} and C_{n,s}. The corresponding ML problem is
{Ĉ_{h_T}^ML, Ĉ_{n,s}^ML} = argmax_{C_{h_T} ∈ T^{P,B}_{+,0}, C_{n,s} ∈ S^M_{+,0}} L_{y_T}({y_T[q]}_{q∈O_T}; C_{h_T}, C_{n,s})   (3.36)
From the discussion in Section 3.1.1 (the necessary conditions (3.5) in par-
ticular), it seems that an explicit solution of these optimization problems is
impossible. Alternatives are the numerical solution of the system of nonlin-
ear equations (3.5) [5] or a numerical and iterative optimization of the un-
constrained optimization problems [124]. Numerical optimization techniques
create a sequence of optimization variables corresponding to monotonically
increasing values of the log-likelihood function.
Fig. 3.2 SAGE iterations selecting a hidden data space (HDS) in every iteration for
Model 2 (similarly for model 1).
for the chosen hidden data. We are completely free to choose a suitable sequence of hidden-data spaces, which can improve the convergence rate. The algorithm generates a monotonically increasing sequence of log-likelihood values and estimates and converges asymptotically to a local maximum of the log-likelihood function.
First we have to define suitable hidden data spaces for our estimation problems, which result in simple M-steps. We start with the following Gedankenexperiment for Model 2:¹⁰ if h[q] and n[q] were observable for q ∈ O, the corresponding ML problems would be
max_{C_h ∈ S^P_{+,0}} ∏_{q∈O} p_h(h[q]; C_h)   (3.38)

and

max_{C_{n,s} ∈ S^M_{+,0}} ∏_{q∈O} ∏_{n=1}^{N} p_{n[q,n]}(n[q,n]; C_{n,s})   (3.39)
with n[q] = [n[q, 1]T , n[q, 2]T , . . . , n[q, N ]T ]T (3.18). Due to the complex
Gaussian pdf the solution to both problems is simple and given by the sample
covariance matrices of the corresponding (hypothetical) observations
C̃_h = (1/B) Σ_{q∈O} h[q] h[q]^H,   (3.40)

C̃_{n,s} = (1/(BN)) Σ_{q∈O} Σ_{n=1}^{N} n[q,n] n[q,n]^H,   (3.41)

or c̃_n = (1/(BNM)) Σ_{q∈O} Σ_{n=1}^{N} ‖n[q,n]‖²_2 for spatially uncorrelated noise.
We define the following two hidden-data spaces (HDS) for Model 2:
• HDS 1: The random vector h[q] is an admissible HDS w.r.t. C_h because p_{y,h}(y[q], h[q]; C_h, C_{n,s}) = p_{y|h}(y[q] | h[q]; C_{n,s}) p_h(h[q]; C_h), i.e., the conditional distribution is independent of C_h [68].
• HDS 2: The random vector n[q] is an admissible HDS w.r.t. C_{n,s} (or c_n) because p_{y,n}(y[q], n[q]; C_h, C_{n,s}) = p_{y|n}(y[q] | n[q]; C_h) p_n(n[q]; C_{n,s}), i.e., the conditional distribution is independent of C_{n,s} [68].
For Model 1, we will slightly extend the definition of HDS 1 in Section 3.3.3 to be able to incorporate the block-Toeplitz structure of C_{h_T}, but the general algorithm introduced below is still applicable.
We start with an initialization C_h^{(0)} and C_{n,s}^{(0)}. Every iteration of the SAGE algorithm to solve the ML problem (3.37) (and (3.36) for Model 1) consists of three steps (Figure 3.2):
1. Choose a HDS for iteration i + 1. We propose three alternatives:
   a. Start with HDS 1 and then alternate between HDS 1 and HDS 2. Alternating with period N_iter = 1 may result in an increased estimation error for C_{n,s}^{(i)}, since the algorithm cannot distinguish sufficiently between the correlations in y[q] originating from the signal and the noise, respectively, and is trapped in a local maximum.

10 The ML problem for Model 2 is simpler than for Model 1, as C_h does not have any special structure.
   b. Choose HDS 1 in the first N_iter iterations, then switch to HDS 2 for N_iter iterations, and continue alternating with this period. This has the advantage that the estimate C_h^{(N_iter)} of C_h has already improved considerably before we start improving C_{n,s}^{(0)}. The resulting local maximum of the likelihood function corresponds to a better estimation accuracy.
   c. Choose HDS 1 and HDS 2 simultaneously, which is similar to the original EM algorithm, as we do not alternate between the HDSs. The convergence is slower than for the first variation.
2. E-step: Compute the CM estimate of the log-likelihood for the selected HDS given C_h^{(i)} and C_{n,s}^{(i)} from the previous iteration.
3. M-step: Maximize the estimated log-likelihood w.r.t. C_h for HDS 1 and C_{n,s} for HDS 2.
The algorithm is shown in Figure 3.2, where we choose a maximum number
of iterations imax to have a fixed complexity. In the following subsections we
derive the E- and M-step for both hidden data spaces explicitly: We start with
HDS 2 for the noise covariance matrix C n,s or variance cn , respectively. For
problem (3.36) and model 1, we give the iterative solution in Section 3.3.3. As
a special case, we treat model 1 with P = 1 in Section 3.3.5. In Section 3.3.4
we present both steps for estimating the channel correlations based on HDS 1,
model 2, and problem (3.37).
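The hidden-data-space schedule of alternative (b) above can be sketched as a small helper. The parameters i_max and N_iter are free choices of the algorithm, and encoding HDS 1/HDS 2 as the integers 1 and 2 is our own convention:

```python
def hds_schedule(i_max, n_iter):
    """Alternative (b): use HDS 1 for the first n_iter iterations,
    then HDS 2 for n_iter iterations, and keep alternating."""
    return [1 if (i // n_iter) % 2 == 0 else 2 for i in range(i_max)]

schedule = hds_schedule(i_max=8, n_iter=2)
# -> [1, 1, 2, 2, 1, 1, 2, 2]
```

Setting n_iter = 1 reproduces alternative (a); alternative (c) would instead update both covariances in every iteration.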
The HDS 2 is defined as n_T[q] and n[q] for Model 1 and Model 2, respectively. We derive the E- and M-step starting with the more general Model 1.
Model 1
C̃_{n,s} = (1/(IBN)) Σ_{q∈O_T} Σ_{b=1}^{B} Σ_{n=1}^{N} n[q−b,n] n[q−b,n]^H   (3.43)

based on the temporal samples n[q,n] of the noise process, which are defined in (3.18).
The CM estimate of the log-likelihood function (3.42) is

E_{n_T}[ L_{n_T}({n_T[q]}_{q∈O_T}; C_{n,s}) | {y_T[q]}_{q∈O_T}; C_{h_T}^{(i)}, C_{n,s}^{(i)} ],   (3.44)

which now substitutes C̃_{n,s} in (3.42). As n[q−b,n] = J_{b,n} n_T[q] with selection matrix J_{b,n} = e^T_{(b−1)N+n} ⊗ I_M ∈ {0,1}^{M×BMN} and the definition of O_T in Section 3.2, we obtain

E_{n_T}[ n[q−b,n] n[q−b,n]^H | y_T[q]; C_{h_T}^{(i)}, C_{n,s}^{(i)} ] = J_{b,n} R_{n_T|y_T[q]}^{(i)} J_{b,n}^T   (3.46)
for every term in (3.43). With (A.2) and (A.3), the conditional correlation matrix reads

R_{n_T|y_T[q]}^{(i)} = E_{n_T}[ n_T[q] n_T[q]^H | y_T[q]; C_{h_T}^{(i)}, C_{n,s}^{(i)} ] = n̂_T^{(i)}[q] n̂_T^{(i)}[q]^H + C_{n_T|y_T[q]}^{(i)},   (3.47)

where

n̂_T^{(i)}[q] = E_{n_T}[ n_T[q] | y_T[q]; C_{h_T}^{(i)}, C_{n,s}^{(i)} ] = C_{n_T y_T}^{(i)} C_{y_T}^{(i),−1} y_T[q] = W_{n_T}^{(i)} y_T[q]   (3.48)

is the CM estimate of the noise n_T[q] based on the covariance matrices from the previous iteration i. With the definitions in Section 3.2 we have

C_{y_T}^{(i)} = S_T C_{h_T}^{(i)} S_T^H + I_{BN} ⊗ C_{n,s}^{(i)}
C_{n,s}^{(i+1)} = (1/(IBN)) Σ_{q∈O_T} Σ_{b=1}^{B} Σ_{n=1}^{N} J_{b,n} R_{n_T|y_T[q]}^{(i)} J_{b,n}^T
  = (1/(BN)) Σ_{b=1}^{B} Σ_{n=1}^{N} J_{b,n} ( W_{n_T}^{(i)} C̃_{y_T} W_{n_T}^{(i),H} + C_{n_T|y_T[q]}^{(i)} ) J_{b,n}^T   (3.50)

with the sample covariance matrix of the observations C̃_{y_T} (3.22) and W_{n_T}^{(i)} from (3.48).
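The filter W in (3.48) is an ordinary Wiener (conditional-mean) filter built from the current model covariances. The sketch below is real-valued for brevity (use conjugate transposes for complex data), and all dimensions and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
MN, P = 6, 2  # stacked observation and channel dimensions (illustrative)

S = rng.standard_normal((MN, P))   # known system matrix
C_h = np.eye(P)                    # channel covariance iterate
C_n = 0.1 * np.eye(MN)             # noise covariance iterate

C_y = S @ C_h @ S.T + C_n          # observation covariance
W_n = C_n @ np.linalg.inv(C_y)     # CM filter for the noise, W_n = C_n C_y^{-1}

y = rng.standard_normal(MN)
n_hat = W_n @ y                    # conditional-mean noise estimate
```

For white noise, W_n has eigenvalues c_n/(λ_i + c_n) ∈ (0, 1]: directions dominated by the signal are suppressed, while the noise-only directions pass through almost unchanged.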
Assuming spatially white noise, the estimate for the noise variance is

c_n^{(i+1)} = (1/M) E_n[ tr[C̃_{n,s}] | {y_T[q]}_{q∈O_T}; C_{h_T}^{(i)}, c_n^{(i)} ]
  = (1/(IBMN)) Σ_{q∈O_T} tr[ R_{n_T|y_T[q]}^{(i)} ]
  = (1/(BMN)) tr[ W_{n_T}^{(i)} C̃_{y_T} W_{n_T}^{(i),H} + C_{n_T|y_T[q]}^{(i)} ]   (3.51)

because tr[C̃_{n,s}] = (1/(IBN)) Σ_{q∈O_T} Σ_{b=1}^{B} ‖n[q−b]‖²_2.
The final estimates are given after iteration i_max of the SAGE algorithm:

Ĉ_{n,s} = C_{n,s}^{(i_max)},   ĉ_n = c_n^{(i_max)}.   (3.52)
Model 2
with

n̂^{(i)}[q] = E_n[ n[q] | y[q]; C_h^{(i)}, C_{n,s}^{(i)} ] = C_{ny}^{(i)} C_y^{(i),−1} y[q] = W_n^{(i)} y[q]   (3.59)

C_{n|y[q]}^{(i)} = C_n^{(i)} − W_n^{(i)} C_{ny}^{(i)} = C_n^{(i)} − C_n^{(i)} C_y^{(i),−1} C_n^{(i)}   (3.60)

and

C_n^{(i)} = I_N ⊗ C_{n,s}^{(i)},   C_{ny}^{(i)} = C_n^{(i)},   C_y^{(i)} = S C_h^{(i)} S^H + C_n^{(i)}.
Initialization
C_{n,s}^{(0)} = argmax_{C_{n,s} ∈ S^M_{+,0}} Σ_{q∈O_T} Σ_{b=1}^{B} ln p_{n̄}(n̄[q−b]; C_{n,s})
  = argmax_{C_{n,s} ∈ S^M_{+,0}} IBN̄ ( −ln det[C_{n,s}] − tr[ C̃_{n̄,s} C_{n,s}^{−1} ] ),   (3.63)

where we used det[I_{BN̄} ⊗ C_{n,s}] = (det C_{n,s})^{BN̄} (A.13) in the last step. Because C_{n,s} ∈ S^M_{+,0}, the maximization yields the sample covariance matrix of the noise in the noise subspace

C_{n,s}^{(0)} = C̃_{n̄,s} = (1/(IBN̄)) Σ_{q∈O_T} Σ_{b=1}^{B} Σ_{n=1}^{N̄} n̄[q−b,n] n̄[q−b,n]^H
  = (1/(IBN̄)) Σ_{q∈O_T} Σ_{b=1}^{B} Y[q−b] Ā^H Ā Y[q−b]^H
  = (1/(IBN̄)) Σ_{q∈O_T} Σ_{b=1}^{B} (Y[q−b] − Ĥ[q−b]S̄)(Y[q−b] − Ĥ[q−b]S̄)^H.   (3.64)
Here, we define n̄[q,n] by n̄[q] = [n̄[q,1]^T, n̄[q,2]^T, ..., n̄[q,N̄]^T]^T (similar to (3.18)) and the ML channel estimate Ĥ[q] = Ĥ_ML[q] = Y[q] S̄^† from (2.31) with Ĥ_ML = unvec[ĥ_ML].¹¹ P̄^⊥ = Ā^H Ā = I_N − S̄^† S̄ is the projector onto the noise subspace, with S̄^† = S̄^H (S̄ S̄^H)^{−1}.
This estimate is unbiased, in contrast to the estimate of the noise covariance matrix in (2.32). Note that the difference is due to the scaling by N̄ instead of N.
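The noise-subspace projector and the resulting unbiased noise statistics can be sketched as follows. The sizes are illustrative, L = 0 is assumed so that N̄ = N − K, and the sketch is real-valued:

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, K = 2, 8, 3   # antennas, samples, pilot rows (illustrative; N_bar = N - K)

S_bar = rng.standard_normal((K, N))                 # known pilot matrix, full row rank
S_pinv = S_bar.T @ np.linalg.inv(S_bar @ S_bar.T)   # S_bar^dagger = S_bar^H (S_bar S_bar^H)^{-1}
P_perp = np.eye(N) - S_pinv @ S_bar                 # projector onto the noise subspace

Y = rng.standard_normal((M, N))     # one observation block
H_hat = Y @ S_pinv                  # ML channel estimate Y S_bar^dagger
residual = Y - H_hat @ S_bar        # equals Y P_perp

N_bar = N - K
c_n_hat = np.linalg.norm(residual, 'fro') ** 2 / (N_bar * M)  # scaling by N_bar, not N
```

Dividing by N̄·M rather than N·M is exactly what removes the bias: the residual lives only in the (N − K)-dimensional noise subspace per row.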
For spatially white noise, the ML estimation problem is (cf. (3.29))

max_{c_n ∈ R_{+,0}} Σ_{q∈O_T} Σ_{b=1}^{B} ln p_{n̄}(n̄[q−b]; C_{n,s} = c_n I_M).   (3.65)
(3.66)
11 Additionally, the relations n̄[q] = A^H y[q] = (Ā* ⊗ I_M) y[q], n̄[q] = vec[N̄[q]], and y[q] = vec[Y[q]] together with (A.15) are required (Section 3.2).
c_n^{(0)} = c̃_n = (1/(IBN̄M)) Σ_{q∈O_T} Σ_{b=1}^{B} ‖Y[q−b] Ā^H‖²_F
  = (1/(IBN̄M)) Σ_{q∈O_T} Σ_{b=1}^{B} ‖Y[q−b] − Ĥ[q−b]S̄‖²_F.   (3.67)
Interpretation

Consider the updates (3.51) and (3.62) for estimating the noise variance c_n. How do they change the initialization (3.67), which is based on the noise subspace only?
The estimate in iteration i + 1 (3.51) can be written as

c_n^{(i+1)} = (1/(MN)) ( MN̄ c̃_n + P c_n^{(i)} ( 1 − β c_n^{(i)} / (PB) ) ),   (3.68)
where

β = tr[ (S_T^H S_T)^{−1} C_{ĥ_T}^{(i),−1} − C_{ĥ_T}^{(i),−1} C̃_{ĥ_T} C_{ĥ_T}^{(i),−1} ] > 0   (3.69)
c_n^{(i+1)} < c̃_n,   i > 0,   (3.70)

with initialization c_n^{(0)} = c̃_n from (3.67).
For a sufficiently large noise subspace, c̃_n is already a very accurate estimate. Numerical evaluations show that the SAGE algorithm tends to introduce a bias for a sufficiently large size of the noise subspace and for this initialization.¹² In this case the MSE of the estimate c_n^{(i)} for the noise variance c_n is increased compared to the initialization.

12 The positive semidefinite least-squares approaches in Section 3.5.3 show a similar behavior, but they take into account only a few dimensions of the signal subspace.
C z = (V ⊗ I P )C a (V H ⊗ I P ), (3.72)
13 To ensure an asymptotically unbiased estimate, B̄ has to be chosen such that the true C_{h_T} has a positive definite circulant extension of size B̄. In case P = 1, sufficient lower bounds on B̄ are given by [47, 155], which depend on the minimum eigenvalue of C_{h_T}. If the observation size I and B are small, unbiasedness is not important, as the error is determined by the small sample size. Thus, we can choose the minimum B̄ = 2B − 1.
C_a = blockdiag[ C_{a[1]}, ..., C_{a[B̄]} ]   (3.73)

is the covariance matrix of a[q] = [a[q,1]^T, a[q,2]^T, ..., a[q,B̄]^T]^T, where a[q,n] ∈ C^P is given by z[q] = (V ⊗ I_P) a[q] and C_{a[n]} = E[a[q,n] a[q,n]^H]. With the unitary matrix [V]_{k,ℓ} = (1/√B̄) exp(j2π(k−1)(ℓ−1)/B̄), k, ℓ = 1, 2, ..., B̄, we define the partitioning into V_B ∈ C^{B×B̄} and V̄_B:

V = [ V_B ; V̄_B ].   (3.74)
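The unitary matrix V and the diagonalization of a circulant covariance can be verified numerically. The sizes B, B̄ and the spectrum d below are illustrative choices of our own:

```python
import numpy as np

B, B_bar = 3, 5  # B_bar = 2B - 1 is the minimum circulant extension size

k = np.arange(B_bar).reshape(-1, 1)
l = np.arange(B_bar).reshape(1, -1)
V = np.exp(2j * np.pi * k * l / B_bar) / np.sqrt(B_bar)  # [V]_{k,l} as in the text
V_B = V[:B, :]                                           # upper partition V_B

# A nonnegative, symmetric spectrum d yields a real PSD circulant matrix
d = np.array([1.0, 0.5, 0.2, 0.2, 0.5])
C_circ = V @ np.diag(d) @ V.conj().T
```

Because the eigenvector matrix V is the DFT matrix for any circulant matrix, estimating a circulant covariance reduces to estimating the eigenvalues d, which is what makes the extension approach attractive.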
with the (hypothetical) sample covariance matrix of the hidden data space C̃_z = (1/I) Σ_{q∈O_T} z[q] z[q]^H. It can be simplified to

L_z({z[q]}_{q∈O_T}; C_z) = −IB̄P ln π − I Σ_{n=1}^{B̄} ln det[C_{a[n]}] − I Σ_{n=1}^{B̄} tr[ C̃_{a[n]} C_{a[n]}^{−1} ]
  = Σ_{n=1}^{B̄} L_{a[n]}({a[q,n]}_{q∈O_T}; C_{a[n]})   (3.77)

using

tr[ C̃_z C_z^{−1} ] = tr[ C̃_a C_a^{−1} ] = Σ_{n=1}^{B̄} tr[ C̃_{a[n]} C_{a[n]}^{−1} ]   (3.78)

with C̃_{a[n]} = J'_n C̃_a J'^T_n and J'_n = [0_{P×P(n−1)}, I_P, 0_{P×P(B̄−n)}] ∈ {0,1}^{P×PB̄}. Note that the log-likelihood function (3.77) of {z[q]}_{q∈O_T} is equivalent to the log-likelihood function of the observations {a[q,n]}_{q∈O_T}, n = 1, 2, ..., B̄, of B̄ statistically independent random vectors a[q,n].
E- and M-Step
With (A.2) and (A.3), the CM estimate of the sample covariance matrix C̃_z of the HDS can be written as

E_z[ C̃_z | {ĥ_T[q]}_{q∈O_T}; C_z^{(i)}, C_{n,s}^{(i)} ] = W_z^{(i)} C̃_{ĥ_T} W_z^{(i),H} + C_{z|ĥ_T[q]}^{(i)}   (3.80)

C_{z|ĥ_T[q]}^{(i)} = C_z^{(i)} − W_z^{(i)} [I_{BP}, 0_{BP×P(B̄−B)}] C_z^{(i)}.   (3.83)

It follows from the definition of t_T[q] (3.27) that ĥ_T[q] and n̄_T[q] are statistically independent, and the conditioning on t_T[q] in (3.79) reduces to ĥ_T[q] in (3.80) and (3.81).
We can rewrite (3.79) similarly to (3.77), which corresponds to the estimation of B̄ unstructured positive definite covariance matrices C_{a[n]}, n = 1, 2, ..., B̄. Therefore, maximization of (3.79) results in the estimate of the blocks

C_{a[n]}^{(i+1)} = J'_n (V^H ⊗ I_P) ( W_z^{(i)} C̃_{ĥ_T} W_z^{(i),H} + C_{z|ĥ_T[q]}^{(i)} ) (V ⊗ I_P) J'^T_n   (3.84)

on the diagonal of C_a^{(i+1)} = (V^H ⊗ I_P) C_z^{(i+1)} (V ⊗ I_P) in iteration i + 1. The upper-left block of C_z^{(i+1)} is the estimate of interest to us:
C_{h_T}^{(i+1)} = (V_B ⊗ I_P) C_a^{(i+1)} (V_B^H ⊗ I_P)   (3.85)
Initialization
The SAGE algorithm requires an initialization C_z^{(0)} of C_z. We start with an estimate of C_{h_T} (for the choice of O_T in Section 3.2)

C_{h_T}^{(0)} = (1/(IB)) Σ_{q=1}^{I} Ĥ[q] Ĥ[q]^H ∈ T^{P,B}_{+,0},   (3.86)
where

Ĥ[q] = [ ĥ[q−1] ⋯ ĥ[q−B]
            ⋱         ⋱
              ĥ[q−1] ⋯ ĥ[q−B] ] ∈ C^{BP×(2B−1)}   (3.87)
is block Toeplitz. C_{h_T}^{(0)} is also block Toeplitz; its blocks are given by the estimates of C_h[ℓ]

C_h^{(0)}[ℓ] = (1/(IB)) Σ_{q=1}^{I} Σ_{i=1}^{B−ℓ} ĥ[q−i] ĥ[q−i−ℓ]^H,   ℓ = 0, 1, ..., B−1,   (3.88)
14 Assuming full column rank of Ĥ[q], a necessary condition for positive definite C_{h_T}^{(0)} is I ≥ P/(2 − 1/B) (at least I ≥ P/2). But non-singularity of C_{h_T}^{(0)} is not required by the algorithm.
Ĥ_ext[q] = [ ĥ[q−B]  0_{P×1}  ⋯  0_{P×1}  ĥ[q−1]  ⋯  ĥ[q−B+1]
               ⋮        ⋱                   ⋱              ⋮
             ĥ[q−2]  ⋯  ĥ[q−B]  0_{P×1}  ⋯  0_{P×1}  ĥ[q−1] ]   (3.91)
The unique block circulant extension of C_h^{(0)}[ℓ] (3.86) is

C_z^{(0)} = (1/(IB)) Σ_{q=1}^{I} \hat{\bar{H}}[q] \hat{\bar{H}}[q]^H ∈ S^{(2B−1)P}_{+,0},   (3.92)
15 If \hat{\bar{H}}[q], q ∈ O_T, have full column rank (see [214]), then I ≥ P is necessary for non-singularity of C_z^{(0)}.
Initialization

The iterations are initialized with the sample mean estimate of the covariance matrix C_h

C_h^{(0)} = C̃_{ĥ} = (1/B) Σ_{q=1}^{B} ĥ[q] ĥ[q]^H,   (3.96)

with W_z^{(i)} and C_{z|ĥ_T[q]}^{(i)} as in (3.82) and (3.83).
Initialization

As initialization for C_{h_T}^{(0)} we choose (3.86) for P = 1 and I = 1, which is a positive definite and biased estimate [169, p. 153]. For B̄ = 2B − 1, we obtain the unique positive definite circulant extension C_z^{(0)} based on Theorem 3.5.¹⁶

18 The parameterization in (3.98) is unique up to a real-valued scaling.

3.4 Completion of Partial Band-Limited Autocovariance Sequences
and the variance c_e of e[q]. We are interested in c_h[ℓ] for ℓ ∈ {0, 1, ..., 2B−1}, but from the maximum likelihood problem we can only infer c_h[ℓ] for ℓ ∈ {0, 2, ..., 2(B−1)}. Therefore, the problem is ill-posed.
Assuming that c_h[ℓ] is a band-limited sequence with maximum frequency f_max ≤ 0.25, the condition of the sampling theorem [170] is satisfied. In principle, it would be possible to reconstruct c_h[ℓ] for odd ℓ from its samples at even ℓ if the decimated sequence of infinite length were available. We would only have to ensure that the interpolated sequence remains positive semidefinite.
This problem is referred to as “missing data”, “gapped data”, or “missing
observations” in the literature, for example, in spectral analysis or ARMA
parameter estimation [203, 180]. Different approaches are possible:
• Interpolate the observations h̃[q] and estimate the autocovariance sequence based on the completed sequence. This method is applied in spectral analysis [203, p. 259]. If we apply an ML approach to the interpolated observations, we have to model the resulting error sequence, which now also depends on the autocovariance sequence c_h[ℓ]. This complicates the problem if we do not ignore this effect.
• Apply the SAGE algorithm from Section 3.3 with the assumption that
only even samples for the first 2B samples of h̃[q − ℓ] are given. Modeling
the random sequence as periodic with period B̄ ≥ 4B − 1 and a HDS
given by the full period (including the samples for odd ℓ), we can proceed
iteratively. We only require a good initialization of ch [ℓ] which is difficult to
obtain because more than half of the samples in one period B̄ are missing.
Given ch [ℓ] for the periodic pattern ℓ = 2i with i ∈ {0, 1, . . . , B − 1}, find
a completion of this sequence for ℓ ∈ {1, 3, . . . , 2B − 1} which satisfies the
following two conditions on ch [ℓ], ℓ = 0, 1, . . . , 2B − 1:
1. It has a positive semidefinite extension on Z, i.e., its Fourier transform is
real and positive.
2. An extension on Z exists which is band-limited with maximum frequency f_max ≤ 0.25.¹⁹
This corresponds to interpolation of the given decimated sequence ch [ℓ] at
ℓ ∈ {1, 3, . . . , 2B − 1}.
The required background to solve this problem is given in Appendix B. The characterization of the corresponding generalized band-limited trigonometric moment problem in Theorem B.4 gives the necessary and sufficient conditions for the existence of a solution: the Toeplitz matrices C̄_T with first row [c_h[0], c_h[2], ..., c_h[2(B−1)]] and C̄'_T with first row [c'_h[0], c'_h[2], ..., c'_h[2B−4]] for c'_h[2i] = c_h[2(i−1)] − 2cos(2π f'_max) c_h[2i] + c_h[2(i+1)] and f'_max = 2 f_max are required to be positive semidefinite. Generally, neither of the two matrices is singular, and the band-limited positive semidefinite extension is not unique.
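The two positive semidefiniteness conditions can be checked numerically. The helper below is a sketch for a real-valued sequence, and the ideal-lowpass autocovariance used to exercise it is an illustrative assumption, not data from the text:

```python
import numpy as np

def sym_toeplitz(r):
    """Symmetric Toeplitz matrix from its first row (real sequence)."""
    n = len(r)
    return np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])

def completion_feasible(c_even, f_max, tol=1e-8):
    """Check the two PSD conditions of Theorem B.4 for the decimated
    samples c_even = [c[0], c[2], ..., c[2(B-1)]] with f'_max = 2*f_max."""
    f_p = 2.0 * f_max
    c = lambda i: c_even[abs(i)]  # even symmetry: c[-l] = c[l]
    c_prime = [c(i - 1) - 2.0 * np.cos(2 * np.pi * f_p) * c(i) + c(i + 1)
               for i in range(len(c_even) - 1)]
    ok1 = np.linalg.eigvalsh(sym_toeplitz(c_even)).min() >= -tol
    ok2 = np.linalg.eigvalsh(sym_toeplitz(np.array(c_prime))).min() >= -tol
    return ok1 and ok2

# Ideal-lowpass autocovariance band-limited to f_max (illustrative)
B, f_max = 5, 0.2
ell = 2 * np.arange(B)
c_even = np.where(ell == 0, 2 * f_max,
                  np.sin(2 * np.pi * f_max * ell) / (np.pi * np.maximum(ell, 1)))
feasible = completion_feasible(c_even, f_max)
```

For samples taken from a genuinely band-limited autocovariance, both Toeplitz matrices are positive semidefinite and a completion exists, in line with the first assumption below.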
First, we assume that both conditions are satisfied, as c_h[2i] is taken from an autocovariance sequence band-limited to f_max. We are only interested in the completion on the interval {0, 1, ..., 2B−1}. This corresponds to
19 The Fourier transform of the extension is non-zero only on the interval [−2π f_max, 2π f_max].
with the expression for the MMSE parameterized in the unknown c_{h_T h}. For this uncertainty set, the minimax MSE solution is equivalent to the minimum (weighted) norm completion of the covariance matrix C_T.
According to Example B.2 in Section B.2, the trivial solution c_{h_T h} = 0_{B×1} is avoided if f_max < 0.25.
Problem (3.104) can be easily transformed into an equivalent formulation involving a linear cost function and constraints describing symmetric cones.
Fig. 3.3 Relative root mean square (RMS) error (3.106) of completed partial band-
limited autocovariance sequence: Minimum norm completion with fixed fmax = 0.25
and true fmax (adaptive).
for the completion ĉ_h[ℓ] is shown in Figure 3.3. For f_max ≤ 0.1, the relative RMS error is below 10⁻³. As the performance of both versions is very similar, the completion based on a fixed f_max = 0.25 should be preferred: it does not require estimation of f_max, which is often also based on the (decimated) autocovariance sequence c_h[2i] [211]. Obviously, constraining the partial autocovariance sequence to have a band-limited extension is a rather accurate characterization. Furthermore, its interpolation accuracy shows that most of the information on c_h[ℓ] for ℓ ∈ {0, 1, ..., 2B−1} is concentrated in this interval.
The MSE for prediction of h[q] based on the completion with true f_max (adaptive) and fixed f_max = 0.25 in Figure 3.4 proves the effectiveness of our approach. The upper bound given by the worst-case MSE (3.103) is only shown for adaptive f_max; for fixed f_max = 0.25 it is almost identical to the actual MSE given in Figure 3.4. Both upper bounds are rather tight w.r.t. the MSE for perfect knowledge of {c_h[ℓ]}_{ℓ=0}^{2B−1}. □
Fig. 3.4 MSE E[|εQ [q]|2 ] for minimax MSE prediction with filter order Q = B = 5
based on the completed partial band-limited autocovariance sequence.
If the true ch [ℓ] for ℓ ∈ {0, 2, . . . , 2(B − 1)} is not available, we propose the
following two-stage procedure:
• Estimate ch [ℓ] for ℓ ∈ {0, 2, . . . , 2(B − 1)} based on the SAGE algorithm
introduced in Section 3.3.5.
• Solve the optimization problem (3.102) or (3.104) for ĉ_h[ℓ], ℓ ∈ {0, 2, ..., 2(B−1)}, to construct an estimate of c_h[ℓ] for ℓ ∈ {0, 1, ..., 2B−1}, which results in a positive semidefinite Toeplitz matrix with first row ĉ_h[ℓ], ℓ ∈ {0, 1, ..., 2B−1}. If the completion is based on the side information f_max ≤ 0.25, i.e., we choose f_max = 0.25, a solution to (3.104) always exists (Theorem B.4).
But if we assume an f_max < 0.25, a completion of ĉ_h[ℓ] for ℓ ∈ {0, 2, ..., 2(B−1)} such that it has a positive definite and band-limited extension on Z may not exist. In this case we first project ĉ = [ĉ_h[0], ĉ_h[2], ..., ĉ_h[2(B−1)]]^T onto the space of positive semidefinite sequences band-limited to f_max by solving the optimization problem

min_c ‖c − ĉ‖_2   s.t.   C̄_T ⪰ 0, C̄'_T ⪰ 0   (3.107)

with minimizer ĉ_LS. The constraints result from Theorem B.4, where C̄_T is Toeplitz with first row [ĉ_h[0], ĉ_h[2], ..., ĉ_h[2(B−1)]] and C̄'_T is Toeplitz with first row [ĉ'_h[0], ĉ'_h[2], ..., ĉ'_h[2B−4]] for ĉ'_h[2i] = ĉ_h[2(i−1)] − 2cos(2π f'_max) ĉ_h[2i] + ĉ_h[2(i+1)] and f'_max = 2 f_max. This problem can be transformed into a cone program with second-order and positive semidefinite cone constraints.
In Section 3.3 we propose the SAGE algorithm to solve the ML problem (3.37) for estimation of C_h and C_{n,s} (Model 2) iteratively. Of course, this ML problem inherently yields an estimate of C_y taking into account its linear structure, i.e., C_y is restricted to the set

S = { C_y | C_y = S C_h S^H + I_N ⊗ C_{n,s}, C_h ⪰ 0, C_{n,s} ⪰ 0 }   (3.108)
= { C_y | C_y = S [ C_h + S^† (I_N ⊗ C_{n,s}) S^{†,H} ] S^H + P^⊥ (I_N ⊗ C_{n,s}), C_h ⪰ 0, C_{n,s} ⪰ 0 }.   (3.109)
20 Introducing a convex uncertainty set for ĉ_h[ℓ] would turn it into a robust minimax optimization problem again.

3.5 Least-Squares Approaches
S_ext2 = { C_y | C_y ⪰ 0 } = S^{MN}_{+,0}.   (3.113)
with N̄ = N − K(L+1) and Y[q] = [y[q,1], y[q,2], ..., y[q,N]] (y[q,n] defined equivalently to (3.18)). The corresponding weighted LS approximation of these parameters reads (Step 2)

The weighting matrix is W = √B C̃_y^{−1} [133], based on the consistent estimate C̃_y of C_y.
All weighted LS problems resulting from the EXIP yield estimates of C h
and C n,s (or cn ) which are linear in C̃ ĥ , C̃ n,s , or C̃ y . As this approach is only
asymptotically equivalent to ML, the estimates may be indefinite for small
B. Moreover, for small B the weighting matrices are not defined because
the corresponding matrices are not invertible. For example, C̃ y is invertible
only if B ≥ M N . But M N , e.g., M = 4 and N = 32, can be larger than
the maximum number of statistically independent observations y[q] available
within the stationarity time of the channel parameters h[q].
Therefore, we consider unweighted LS approximation with W 1 = I P ,
W 2 = I M , w2 = 1, and W = I M N in the sequel. First, we derive solu-
tions for LS approximations assuming C n,s = cn I M , which we generalize in
Section 3.5.4.
Ĉ_h^Heur = S^† C̃_y S^{†,H} = (1/B) Σ_{q=1}^{B} ĥ[q] ĥ[q]^H.   (3.121)

Implicitly, the noise variance is assumed small and neglected. Therefore, Ĉ_h^Heur is biased, with bias given by the last term of

E[ Ĉ_h^Heur ] = C_h + c_n (S^H S)^{−1}.   (3.123)
The LS approximation problem (3.116) derived from the EXIP based on S_ext1 (Step 2) is

(Ĉ_h^LS, ĉ_n^LS) = argmin_{C_h ∈ C^{P×P}, c_n ∈ R} ‖C̃_ĥ − C_h − (S^H S)^{−1} c_n‖²_F + (c̃_n − c_n)².   (3.124)

ĉ_n^LS = (1/(MN̄)) trace[ C̃_y − P C̃_y P ] = (1/(MN̄)) trace[ P^⊥ C̃_y P^⊥ ]   (3.125)

Ĉ_h^LS = S^† C̃_y S^{†,H} − (S^H S)^{−1} ĉ_n^LS

Decomposing the approximation error as²⁵

‖C̃_y − S C_h S^H − c_n I_{MN}‖²_F = ‖P C̃_y P − S C_h S^H − c_n P‖²_F + ‖P^⊥ C̃_y P^⊥ − c_n P^⊥‖²_F   (3.127)
yields ĉ_n^LS.
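The bias removal in (3.125) can be verified on a synthetic example where the exact C_y is used in place of C̃_y (the large-sample limit). All dimensions and parameter values are illustrative, and the sketch is real-valued:

```python
import numpy as np

rng = np.random.default_rng(3)
MN, P = 8, 3  # stacked observation dimension and channel dimension (illustrative)

S = rng.standard_normal((MN, P))         # known system matrix
S_pinv = np.linalg.inv(S.T @ S) @ S.T    # left pseudo-inverse S^dagger
P_sig = S @ S_pinv                       # projector onto the signal subspace
P_perp = np.eye(MN) - P_sig

# Large-sample limit: replace C_tilde_y by the exact C_y
C_h_true = np.diag([2.0, 1.0, 0.5])
c_n_true = 0.3
C_y = S @ C_h_true @ S.T + c_n_true * np.eye(MN)

# (3.125): noise variance from the noise subspace; here M * N_bar = MN - P
c_n_hat = np.trace(P_perp @ C_y @ P_perp) / (MN - P)
# Unbiased LS channel covariance: heuristic estimate minus the bias term
C_h_hat = S_pinv @ C_y @ S_pinv.T - np.linalg.inv(S.T @ S) * c_n_hat
```

With the exact C_y, the recovered parameters match the true ones exactly: subtracting c_n (S^H S)^{-1} removes precisely the bias term in (3.123). With a finite-sample C̃_y, the same formulas give unbiased but possibly indefinite estimates.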
24 Note that the SAGE algorithm (Section 3.3.4) provides Ĉ_h^Heur in the first iteration when initialized with c_n^{(0)} = 0.
25 Note that ‖A + B‖²_F = ‖A‖²_F + ‖B‖²_F for two matrices A and B with inner product tr[A^H B] = 0 ("Theorem of Pythagoras").
Both LS problems (3.124) and (3.126) result in the same estimates Ĉ_h^LS and ĉ_n^LS. The bias (3.123) of the heuristic estimator Ĉ_h^Heur is removed in Ĉ_h^LS [41], which is now unbiased. The estimate Ĉ_h^LS is indefinite with a non-zero probability which is mainly determined by C_h, B, N, and c_n. Depending on the application, this can lead to a substantial performance degradation.
On the other hand, if Ĉ_h^LS is positive semidefinite, Ĉ_h^LS and ĉ_n^LS are the ML estimates: Ĉ_h^LS and ĉ_n^LS are the unique positive semidefinite solutions to the likelihood equation (cf. (3.4)). Here, the likelihood equation is C̃_y = S C_h S^H + c_n I_{MN}, and the approximation error in (3.126) is zero in this case.²⁶
(Ĉ_h^psd2, ĉ_n^psd2) = argmin_{C_h, c_n} ‖C̃_y − S C_h S^H − c_n I_{MN}‖²_F   s.t.   C_h ⪰ 0.   (3.129)

S'^H C̃_y S' = V Σ V^H   (3.130)

X = (S^H S)^{1/2} C_h (S^H S)^{1/2} = U_x D_x U_x^H   (3.131)
26 In [60] the solution (3.125) is also obtained as a special case of the iterative solution of (3.110) with the "expectation-conditional maximization either" algorithm.
‖C̃_y − S C_h S^H − c_n I_{MN}‖²_F = ‖S' (S'^H C̃_y S' − X − c_n I_P) S'^H‖²_F + ‖P^⊥ C̃_y P^⊥ − c_n P^⊥‖²_F
  = ‖S'^H C̃_y S' − U_x D_x U_x^H − c_n I_P‖²_F + ‖P^⊥ C̃_y P^⊥ − c_n P^⊥‖²_F.   (3.132)
H
F(U x , {di }P P ′ ′ H 2
i=1 , cn , {mi }i=1 , M ) = kS C̃ y S − U x D x U x − cn I P kF
+ kP ⊥ C̃ y P ⊥ − cn P ⊥ k2F + tr M T (U H ∗ H
x U x − I P ) + tr M (U x U x − I P ) −
P
X
− d i mi . (3.133)
i=1
∂F
= U x M T + U x M ∗ − 2(S ′H C̃ y S ′ − cn I P )U x D x = 0P ×P . (3.134)
∂U ∗x
Completing the squares in (3.136) and omitting the constant, the optimization problem (3.129) becomes

min_{{d_i}_{i=1}^P, c_n} Σ_{i=1}^{P} (σ_i − d_i − c_n)² + MN̄ (c_n − ĉ_n^LS)²   s.t.   d_i ≥ 0.   (3.137)
27 The SAGE algorithm for solving the ML problem (3.110) on S also exploits the noise in the signal space to improve its estimate of c_n (Section 3.3.2).
psd1
(Ĉ h , ĉpsd1
n ) = argmin kC̃ ĥ − C h − (S H S)−1 cn k2F + (c̃n − cn )2
C h ,cn (3.140)
s.t. C h 0.
with the set Z of indices i for which d_i = 0. For the same set Z, this estimate is closer to ĉ_n^LS than ĉ_n^psd2, because ĉ_n^psd2 ≤ ĉ_n^psd1 ≤ ĉ_n^LS. If ĉ_n^LS is already accurate, i.e., for sufficiently large B and N̄, the bias increases the MSE w.r.t. c_n; but for small N̄ and B the bias reduces the estimation error (this is not observed for the selected scenarios in Section 3.6).
A Heuristic Modification
Because ĉ_n^psd1 and ĉ_n^psd2 are biased and their estimation error is typically larger than for the unbiased estimate ĉ_n^LS, the following heuristic achieves better results when applied to MMSE channel estimation:
1. Estimate c_n using ĉ_n^LS (3.125), which is unbiased and based on the true noise subspace.
2. Solve optimization problem (3.141)

min_{{d_i}_{i=1}^{MK}} Σ_{i=1}^P (σ_i − d_i − ĉ_n^LS)²  s.t.  d_i ≥ 0,   (3.143)

Ĉ_h^psd3 = V D_+ V^H,   (3.144)
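The two steps above amount to an EVD, an eigenvalue shift, and a clipping. A hedged numerical sketch, where the Hermitian test matrix A merely stands in for S′^H C̃_y S′ and all names are ours:

```python
import numpy as np

def psd3_estimate(A, c_ls):
    """Heuristic psd estimate in the spirit of (3.143)-(3.144): with c_n fixed
    to c_ls, shift the eigenvalues of A = V diag(sigma) V^H and clip at zero."""
    sigma, V = np.linalg.eigh(A)
    d_plus = np.maximum(sigma - c_ls, 0.0)        # minimizer of (3.143)
    return V @ np.diag(d_plus) @ V.conj().T       # reconstruction as in (3.144)

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = G @ G.conj().T / 3                            # stand-in for S'^H C~_y S'
C_psd3 = psd3_estimate(A, c_ls=0.2)
```

By construction the result is Hermitian and positive semidefinite, whatever the spectrum of A.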
Computational Complexity
The additional complexity for computing the positive definite solution compared to (3.125) results from the EVD of S′^H C̃_y S′ and the computation of (S^H S)^{1/2} and (S^H S)^{-1/2} (only for Algorithm 3.1). For the indefinite least-squares estimate (3.125), a tracking algorithm of very low complexity was presented in [141], whereas tracking of eigenvalues and eigenvectors is more difficult and complex.
As before, the first term is zero choosing C_h from the space of Hermitian P-dimensional matrices as

Ĉ_h^LS = S^† (C̃_y − I_N ⊗ Ĉ_n,s^LS) S^{†,H}   (3.147)

given an estimate Ĉ_n,s^LS as described below. The estimate Ĉ_h^LS is unbiased, but indefinite in general. Now, we can minimize the second term in (3.146)

Ĉ_n,s^LS = (1/(N − K)) Σ_{n=1}^N (e_n^T ⊗ I_M) P^⊥ C̃_y P^⊥ (e_n ⊗ I_M),   (3.149)

n̂[q, n] = (e_n^T ⊗ I_M) P^⊥ y[q] = y[q, n] − Ĥ[q] s[n]   (3.150)
This leads to
Ĉ_n,s^LS = (1/(B N̄)) Σ_{q∈O} Σ_{n=1}^N n̂[q, n] n̂[q, n]^H = (1/(B N̄)) Σ_{q∈O} Y[q] P̄^⊥ Y[q]^H   (3.152)
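The equality of the residual form and the projector form in (3.152), and the unbiasedness of the scaling by N̄ = N − K, can be checked by simulation. A sketch under an assumed white-noise scenario (true C_n,s = I_M) with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, N, B = 4, 2, 16, 50                    # antennas, users, symbols, blocks
N_bar = N - K
S_bar = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))
# Hermitian projector with S_bar.T @ P_perp = 0: it removes the training
# signal when applied to the received block from the right
P_perp = np.eye(N) - S_bar.conj() @ np.linalg.inv(S_bar.T @ S_bar.conj()) @ S_bar.T

C_hat = np.zeros((M, M), dtype=complex)
for _ in range(B):
    H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
    Noise = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    Y = H @ S_bar.T + Noise                  # received block; unit-variance white noise
    C_hat += Y @ P_perp @ Y.conj().T         # projector form of (3.152)
C_hat /= B * N_bar                           # unbiased scaling by N - K, not N
```

With B = 50 blocks the estimate is already close to the true identity covariance, because each block contributes N − K independent residual vectors.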
as in (3.64) and (3.114). This estimate of the noise covariance matrix is very
similar to the well-known ML estimate [158] of the noise covariance matrix,
when estimated jointly with the channel h[q] (cf. (2.32)). The difference is in
the scaling by N̄ = N − K instead of N which yields an unbiased estimate
with smaller estimation error in this case.
Optimization problem (3.115) yields the same solution. Again, both problems differ if we add a positive semidefinite constraint on C_h.
Here, a simple positive semidefinite solution for C_h can be obtained similarly to the heuristic introduced at the end of Section 3.5.3: A priori we choose Ĉ_n,s^LS as the estimate of C_n,s. Then, for (3.115) with positive semidefinite constraint, Ĉ_h is given by the positive semidefinite part of Ĉ_h^LS (3.147), i.e., setting the negative eigenvalues to zero.
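Taking the positive semidefinite part, i.e., zeroing the negative eigenvalues in the EVD, is the Frobenius-norm projection onto the psd cone. A minimal sketch with a toy indefinite matrix (our own example):

```python
import numpy as np

def psd_part(C):
    """Positive semidefinite part of a Hermitian matrix: zero the negative
    eigenvalues in the EVD (closest psd matrix in Frobenius norm)."""
    lam, V = np.linalg.eigh(C)
    return V @ np.diag(np.maximum(lam, 0.0)) @ V.conj().T

C = np.array([[1.0, 0.9], [0.9, -0.5]])   # indefinite toy example
C_plus = psd_part(C)
```

The projection moves the matrix by exactly the magnitude of its negative eigenvalues, so the change is small when the estimate is only slightly indefinite.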
The estimators for channel and noise covariance matrices which we present
in Sections 3.3 and 3.5 differ significantly regarding their computational com-
plexity. But how much do they differ in estimation quality?
Another important question is how many observations (measured by B and I) are necessary to obtain sufficient MMSE channel estimation and prediction quality, and whether this number is available within a typical stationarity time of the wireless channel.
Estimation quality is often measured by the mean square error (MSE)
of the estimates. The MSE does not reflect the requirements of a partic-
ular application on the estimation accuracy. Therefore, we apply the esti-
28. Given ML estimates of the covariance matrices, it is the ML estimate of the MMSE channel estimator according to the ML invariance principle.
29. This degree of ill-conditioning does not lead to numerical problems on a typical floating point architecture.
Fig. 3.5 Normalized MSE of noise variance cn vs. B for Scenario A (cn = 1).
nel estimator for B ≥ 50. For large B, SAGE outperforms the heuristic due to the reduced or removed bias. The MMSE performance of all other estimators³⁰ (positive semidefinite LS, SAGE with HDS for noise, and ECME) is almost identical to SAGE. In Figures 3.6 and 3.7 the heuristic sometimes outperforms SAGE, because its bias serves as loading in the inverse, similar to minimax robust channel estimators (Section 2.4.2). Figure 3.7 shows the MSE vs. SNR = −10 log(c_n) for B = 100; the degradation in MSE of the unbiased LS is large.
Performance of the MMSE estimators worse than that of ML channel estimation can be avoided by a (robust) minimax MSE design as presented in Section 2.4.2: The set size described by α_h^(2) (2.90) has to be adapted to B, C_h, and the SNR, but a good adaptation rule is not straightforward.
The number of statistically uncorrelated observations B available depends
on the ratio of stationarity time to coherence time: Channel measurements
show that this ratio is about 100 in an urban/suburban environment [227].³¹
Consequently, the relevant number of observations B is certainly below 200,
where the heuristic estimator is close to optimum for uncorrelated noise. This
changes for correlated noise.
Scenario B: We choose M = 8, K = 1, and C h for a Laplace angular power
spectrum with spread σ = 5◦ and ϕ̄ = 0◦ . The spatial noise correlations are
C_n,s = C_i + c_n I_M, which models white noise with c_n = 0.01 in addition to an interferer with mean direction ϕ̄ = 30°, a 30° uniform angular power spec-
trum, and tr[C i ] = M ci (ci is the variance of the interfering signal). Among
the LS approaches (besides the simple heuristic) only the unbiased and the
30. These results are not shown here.
31. The measure employed by [227] quantifies the time-invariance of the principal eigenmode of C_h. It is not yet clear whether these results can be generalized to the whole covariance matrix and to channel estimation.
Fig. 3.7 Normalized MSE of channel estimates ĥ = W y vs. SNR = −10 log(cn ) for
Scenario A (B = 100).
Fig. 3.8 Normalized MSE of spatial covariance matrix vs. B for correlated noise
(Scenario B, ci = 1, cn = 0.01).
Fig. 3.9 Normalized MSE of spatial covariance matrix vs. SIR = −10 log(ci ) for
correlated noise (Scenario B, B = 100).
Fig. 3.10 Normalized MSE of channel estimates ĥ vs. B for correlated noise (Sce-
nario B, ci = 1).
Fig. 3.11 Normalized MSE of channel estimates ĥ vs. SIR = −10 log(ci ) for corre-
lated noise (Scenario B, B = 100).
pleted. The minimax completion is solved via the second-order cone formulation (3.105) with SeDuMi [207].³²
The estimation accuracy is measured by the average MSE of univariate MMSE prediction E[|ε_Q[q]|²] (c_h[0] = 1) with estimation error ε_Q[q] = ĥ[q] − h[q] and ĥ[q] = w^T y_T[q] (Section 2.3.1). It is averaged over 1000 estimates.
We choose a prediction order Q = 5, which requires a minimum of B = 6
observations to be able to estimate the necessary Q + 1 = 6 samples of the
autocovariance sequence (for NP = 1). Here, we do not consider the MSE in
32. Because we choose f_max = 0.25, the optimization problem (3.102) has a solution for all estimates, and the projection of the estimates onto the set of partial sequences which have a positive semidefinite band-limited (to f_max = 0.25) extension (3.107) is not necessary.
3.6 Performance Comparison
Fig. 3.12 MSE of channel prediction (Q = 5) vs. maximum frequency fmax (Sce-
nario C, ch [0] = 1, ce = 0.01, B = 20).
Fig. 3.15 MSE of channel prediction (Q = 5) vs. maximum frequency fmax (Sce-
nario D, bandpass power spectral density, ch [0] = 1, ce = 0.01, B = 20).
Fig. 3.16 MSE of channel prediction (Q = 5) vs. maximum frequency fmax with
completion of autocovariance sequence for period NP = 2 of observations (Scenario C,
ch [0] = 1, ce = 0.01, B = 20).
Fig. 3.17 MSE of channel prediction (Q = 5) vs. maximum frequency fmax with
completion of autocovariance sequence for period NP = 2 of observations (Scenario D,
bandpass power spectral density, ch [0] = 1, ce = 0.01, B = 20).
choose the ML channel estimate based on the last observed training sequence
(“ML channel est. (Outdating)”). The lower bound is given by the MSE for
MMSE prediction with perfect knowledge of the covariance matrices.
Scenario E: Due to the high computational complexity for estimation of
C hT with block Toeplitz structure, we choose M = K = 4. The structure of
C hT is C hT = C T ⊗ C h (2.12), i.e., identical temporal correlations for all K
receivers. The block-diagonal C h is based on a Laplace angular power spec-
trum with spread σ = 5◦ and mean azimuth angles ϕ̄ = [−45◦ , −15◦ , 15◦ , 45◦ ]
(normalized to tr[C hT ] = M K). The temporal correlations are described by
Clarke’s power spectral density. We choose I = B = 20, i.e., 400 correlated
observations.
Although the number of observations is large, the performance degrada-
tion of SAGE with block Toeplitz assumption is significant and the gain over
the heuristic biased estimator too small to justify its high computational
complexity (Figure 3.18). But the assumption of C hT with Kronecker struc-
ture (3.98) improves the performance significantly and reduces the complexity
tremendously.
Chapter 4
Linear Precoding with Partial Channel
State Information
Introduction to Chapter
1. Linear precoding is also referred to as beamforming, transmit processing, preequalization (for complete CSI), or sometimes predistortion.
Fig. 4.1 Precoding for the wireless broadcast channel (BC) with multiple transmit
antennas and single antenna receivers.
They can also be distinguished by the channel state information (CSI), which
is assumed to be available at the transmitter (compare [19]): Complete CSI
(C-CSI), partial CSI (P-CSI), and statistical CSI (S-CSI) or channel distri-
bution information (see also Table 4.1 and Section 4.2 for our definitions).
For C-CSI, algorithms have been developed based on SINR, MSE, and information rate. The case of P-CSI is mostly addressed for quantized feedback [107, 188]. Recently, the sum rate was optimized in [31]. Robust optimization based on MSE, which applies the Bayesian paradigm (Appendix C) for P-CSI, is proposed in [53, 98, 58]. The standard approach related to SINR for S-CSI is also surveyed in [19, 185]. A first step towards MSE-based optimization for S-CSI which is based on receiver beamforming is introduced in [18]. A similar idea is developed for the sum MSE [112, 243] assuming a matched filter receiver. A more systematic derivation for the sum MSE is given in [58], which includes S-CSI as a special case.
A significant number of algorithms are based on the duality of the BC and the (virtual) multiple access channel (MAC), which is proved for the SINR in [185] and for the MSE (C-CSI) in [221, 146, 7].
Outline of Chapter
The system model for the broadcast channel and necessary (logical) training
channels is given in Section 4.1. After the definition of different degrees of
CSI at the transmitter and receiver in Section 4.2, we introduce the perfor-
mance measures for P-CSI in Section 4.3: The goal is to develop MSE-related
and mathematically tractable performance measures; we start from the point
of view of information theory to point out the relevance and deficiencies of
these practical optimization criteria. Our focus is the conceptual difference
between MSE and information rate; moreover, we develop approximations of
the BC which result in simple and suboptimum performance measures. We
also discuss adaptive linear precoding when only a common training chan-
nel is available, i.e., the receivers have incomplete CSI, which is important
4.1 System Model for the Broadcast Channel
Fig. 4.2 Time division duplex slot structure with alternating allocation of forward
link (FL) and reverse link (RL) time slots.
in practice. Algorithms for minimizing the sum MSE are introduced in Sec-
tion 4.4. They can be generalized to nonlinear precoding, for example. Two
dualities between the BC and the (virtual) MAC for P-CSI are developed in
Section 4.5. In Section 4.6, we briefly introduce optimization of linear pre-
coding under quality of service constraints, in particular, for systems with a
common training channel.
Fig. 4.3 Forward link system model: Linear precoding with M transmit antennas to
K non-cooperative single-antenna receivers.
Fig. 4.4 System model for the forward link in matrix-vector notation.
where h_k ∈ C^M contains the channel coefficients to the kth receiver (in the complex baseband representation). The received signal of the kth receiver for the nth data symbol (n ∈ {N_t + 1, …, N_t + N_d}) is

r_k[q_tx, n] = h_k[q_tx]^T p_k d_k[q_tx, n] + Σ_{i=1, i≠k}^K h_k[q_tx]^T p_i d_i[q_tx, n] + n_k[q_tx, n]   (4.2)

r_k = h_k^T p_k d_k + Σ_{i=1, i≠k}^K h_k^T p_i d_i + n_k = h_k^T P d + n_k.   (4.3)
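The stacked broadcast model (4.3) is straightforward to simulate for one symbol period; the sketch below (sizes, power normalization, and symbol alphabet are illustrative assumptions) separates the desired and interference terms of (4.2):

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 4, 3                                  # transmit antennas, single-antenna users
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
P = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
P *= np.sqrt(1.0 / np.trace(P @ P.conj().T).real)   # normalize total power to 1
d = np.exp(2j * np.pi * rng.random(K))       # unit-modulus data symbols
n = 0.1 * (rng.standard_normal(K) + 1j * rng.standard_normal(K))

r = H @ P @ d + n                            # stacked form (4.3): r_k = h_k^T P d + n_k
# per-user decomposition into desired signal and interference, as in (4.2)
k = 0
desired = H[k] @ P[:, k] * d[k]
interference = sum(H[k] @ P[:, i] * d[i] for i in range(K) if i != k)
```

Row k of H plays the role of h_k^T, so the stacked product reproduces each per-user sum exactly.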
d̂_k = g_k r_k.   (4.4)

d̂ = G r = G H P d + G n ∈ C^K   (4.5)

respectively.
The precoder P is designed subject to a total power constraint
E[‖P d‖₂²] = tr[P P^H] ≤ P_tx.   (4.6)
Fig. 4.6 Model of forward link training channel to receiver k for orthogonal training
sequences.
(Figure 4.5). For now, we assume one training sequence for each receiver.⁴
The training sequences are precoded with Q = [q 1 , . . . , q K ] ∈ CM ×K . For
example, Q can be chosen equal to P . The power constraint for the training
channel is
‖Q‖_F² ≤ P_tx^t.   (4.7)
at all receivers is
The noise random process is identical to the forward link data channel in (4.2)
and (4.5).
Omitting the current slot index q tx as above the signal at the kth receiver
reads
y_f,k[n] = h_k^T Q s_f[n] + n_k[n].   (4.9)

y_f,k^T = h_k^T Q S̄_f^T + n_f,k^T ∈ C^{N_t}.   (4.10)
Assuming orthogonal training sequences S̄_f^H S̄_f = N_t I_K, the receivers' ML channel estimate is (based on the corresponding model in Figure 4.6)

\widehat{h_k^T q_k} = s_{f,k}^H y_{f,k} / N_t = h_k^T q_k + e_k   (4.11)
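The correlation estimator (4.11) can be sketched with DFT-based orthogonal training sequences satisfying S̄_f^H S̄_f = N_t I_K (all sizes and the noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, Nt = 4, 2, 32
# Orthogonal training sequences: first K columns of the Nt-point DFT matrix
F = np.exp(-2j * np.pi * np.outer(np.arange(Nt), np.arange(Nt)) / Nt)
S_f = F[:, :K]                                # satisfies S_f^H S_f = Nt * I_K
Q = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
h_k = rng.standard_normal(M) + 1j * rng.standard_normal(M)
noise = 0.05 * (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt))

y_fk = (h_k @ Q) @ S_f.T + noise              # received training, model (4.10)
hq_hat = S_f[:, 0].conj() @ y_fk / Nt         # (4.11): estimate of h_k^T q_1
```

The correlator suppresses the other training sequences exactly (orthogonality) and averages the noise down by the factor N_t.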
4. Due to constraints in a standardized system, the number of training sequences may be limited (compare Section 4.2.2).
Fig. 4.7 Observations of training sequences from Q previous reverse link slots.
with S = [s[1], s[2], …, s[N]]^T ⊗ I_M (as in Section 2.1) and h[q] = vec[H[q]^T], where the channels h_k[q] of all K terminals are stacked on top of each other.
Due to our assumption of time slots which alternate between forward and
reverse link (Figure 4.2) at a period of NP = 2, observations y[q] from the
reverse link are available for q ∈ {q tx − 1, q tx − 3, . . . , q tx − (2Q − 1)} (for
some Q). All available observations about the forward link channel H[q tx ] at
time index q tx are summarized in
y_T[q_tx] = S_T h_T[q_tx] + v_T[q_tx] ∈ C^{QMN}   (4.14)

y_T = S_T h_T + v_T ∈ C^{QMN}.   (4.15)
Table 4.1 Categories of channel state information (CSI) at the transmitter and their
definitions.
4.2.1 Transmitter
5. The assumption of a TDD system and reciprocity simplifies the derivations below. In principle, the derivations are also possible for a frequency division duplex system with a (quantized) feedback of the CSI available at the receivers to the transmitter.
4.2 Channel State Information at the Transmitter and Receivers
4.2.2 Receivers
The kth receiver only has access to P and H via the received training sequence in the forward link y_f,k (4.10) and the observations of the data r_k (4.3). It is obvious that the variance c_{r_k} of the received data signal can be estimated directly, as well as the overall channel h_k^T q_k (4.11), which consists of the physical channel to receiver k and the precoder for the kth training sequence. With the methods from Chapter 3, also the sum of the noise and interference variances is available.
Therefore, the receivers do not have access to P and H separately, but only to a projection of the corresponding channel h_k on a precoding vector of the training channel q_k. If the precoding for the data and training signals is identical, i.e., Q = P, we expect to have all relevant information at a receiver for its signal processing, e.g., an MMSE receiver g_k or ML decoding (depending on the conceptual framework in Section 4.3).⁸ But these realizability constraints have to be checked when optimizing the receivers.
There are standardized systems where the number of training sequences S is smaller than the number of receivers. For example, in WCDMA of the UMTS standard [1], different training (pilot) channels are available (see the detailed discussion in [152, 167]):
6. The assumption about the noise is justified by the TDD model; in case of a low-rate quantized feedback it is not valid anymore.
7. In cellular systems with interference from adjacent cells, which is not a stationary process, this issue requires further care.
8. It may be inaccurate due to the finite number N_t of training symbols.
Fig. 4.8 Link model for kth receiver in the broadcast channel (BC) (equivalent to Figure 4.3).
Figure 4.8 gives the link model to the kth receiver, which is equivalent to the broadcast channel in (4.3) and Figure 4.3. The interference from the data {d_i}_{i≠k} designated to the other receivers is summarized in

i_k = Σ_{i=1, i≠k}^K h_k^T p_i d_i,   (4.16)
9. For example, the MMSE also determines the bit error rate for QAM modulation [160].
c_{i_k} = Σ_{i=1, i≠k}^K |h_k^T p_i|².   (4.17)
with

SINR_k(P; h_k) = |h_k^T p_k|² / ( Σ_{i=1, i≠k}^K |h_k^T p_i|² + c_{n_k} ).   (4.19)
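Evaluating (4.19) for a given channel and precoder is a one-liner; a sketch with our own variable names and random data:

```python
import numpy as np

def sinr_k(H, P, c_n, k):
    """SINR (4.19) for user k, with channel rows h_k^T = H[k] and
    precoder columns p_i = P[:, i]."""
    gains = np.abs(H[k] @ P) ** 2            # |h_k^T p_i|^2 for all i
    return gains[k] / (gains.sum() - gains[k] + c_n)

rng = np.random.default_rng(4)
K, M = 3, 4
H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
P = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
sinrs = np.array([sinr_k(H, P, c_n=1.0, k=k) for k in range(K)])
```

The same `gains` vector delivers both the numerator and the interference sum, which keeps the evaluation at one matrix-vector product per user.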
in terms of
10. The Gaussian probability distribution is capacity achieving in the broadcast channel with C-CSI at the transmitter, which requires dirty paper coding [232]. Under the restriction of P-CSI it is not known whether this distribution is capacity achieving.
4.3 Performance Measures for Partial Channel State Information
( Π_{k=1}^K MMSE_k(P; h_k) )^{1/K} ≤ ( Σ_{k=1}^K MMSE_k(P; h_k) ) / K.   (4.23)
The minimum of

MSE_k(P, g_k; h_k) = E_{r_k,d_k}[|g_k r_k − d_k|²] = 1 + |g_k|² c_{r_k} − 2Re{g_k h_k^T p_k}
= 1 + |g_k|² Σ_{i=1}^K |h_k^T p_i|² + |g_k|² c_{n_k} − 2Re{g_k h_k^T p_k}   (4.24)

w.r.t. g_k is achieved by

g_k = g_k^mmse(P, h_k) = (h_k^T p_k)^∗ / c_{r_k},   (4.25)

which yields

MMSE_k(P; h_k) = min_{g_k} MSE_k(P, g_k; h_k) = 1 − |h_k^T p_k|² / c_{r_k}.   (4.26)
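A quick numerical check of (4.24)-(4.26), including the relation MMSE_k = 1/(1 + SINR_k) referred to in (4.22); all sizes and numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
K, M = 3, 4
H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
P = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
c_n, k = 0.5, 0

c_r = np.sum(np.abs(H[k] @ P) ** 2) + c_n            # received-signal variance

def mse_k(g):
    """MSE (4.24) for user k with scalar receiver g."""
    return 1 + abs(g) ** 2 * c_r - 2 * np.real(g * (H[k] @ P[:, k]))

g_mmse = np.conj(H[k] @ P[:, k]) / c_r               # LMMSE receiver (4.25)
mmse = 1 - np.abs(H[k] @ P[:, k]) ** 2 / c_r         # minimum MSE (4.26)
```

Perturbing g_mmse in any direction increases mse_k, which is the defining property of (4.25).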
Example 4.1. An algorithm which maximizes (4.20) directly for C-CSI (see also (4.28)) is given in [59].¹² The comparison in [183] shows that other algorithms perform similarly. In Figure 4.9 we compare the sum rate obtained by [59] (“Max. sum rate”) with the solution for minimizing the sum MSE, i.e., maximizing the lower bound (4.21), given by [102]. As reference we give the sum rate for time division multiple access (TDMA) with maximum throughput (TP) scheduling, where a matched filter p_k = h_k^∗/‖h_k‖₂² is applied for the selected receiver; for M = 1, this strategy maximizes the sum rate. Fairness is not considered in any of the schemes.
The results show that the sum MSE criterion yields a performance close to maximizing the sum rate if K/M < 1. For K = M = 8, the difference is substantial: Minimizing the MSE does not yield the full multiplexing gain [106] lim_{P_tx→∞} R_sum(P; h)/log₂[P_tx] = min(M, K) at high SNR.
The gain of both linear precoders compared to TDMA (with almost identical rate for K = 8 and K = 6), which schedules only one receiver at a time, is substantial. □
Fig. 4.9 Mean sum rate for linear precoding with C-CSI: Comparison of direct max-
imization of sum rate [59], minimization of sum MSE (4.69), and TDMA with maxi-
mum throughput (TP) scheduling (Scenario 1 with M = 8, cf. Section 4.4.4).
p_{h|y_T}(h|y_T) (Table 4.1) is available for designing P. There are two standard approaches for dealing with this uncertainty in CSI:
• The description of the link by an outage rate and its outage probability, which aims at the transmission of delay sensitive data.
• The mean or ergodic information rate with the expectation of (4.18) w.r.t. the probability distribution of the unknown channel h and the observations y_T. Feasibility of this information rate requires a code long enough to include many statistically independent realizations of h and y_T.
Here, we aim at the mean information rate subject to an instantaneous power constraint on every realization of P(y_T). We maximize the sum of the mean information rates to all receivers

max_P E_{h,y_T}[R_sum(P(y_T); h)]  s.t.  ‖P(y_T)‖_F² ≤ P_tx,   (4.27)

where P : C^{MNQ} → C^{M×K}, P = P(y_T).
Thus, E_{h_k|y_T}[MMSE_k(P; h_k)] gives an inner bound on the mean rate region, which is certainly achievable, whereas the upper bound based on E_{h_k|y_T}[SINR_k(P; h_k)] does not guarantee achievability. With these bounds, the mean sum rate is lower bounded by the mean sum of the receivers' MMSEs and upper bounded by the sum rate corresponding to the mean SINRs:

K log₂[K] − K log₂[ Σ_{k=1}^K E_{h_k|y_T}[MMSE_k(P; h_k)] ] ≤ E_{h|y_T}[R_sum(P; h)] ≤ Σ_{k=1}^K log₂[ 1 + E_{h_k|y_T}[SINR_k(P; h_k)] ].   (4.30)
But the lower as well as the upper bound cannot be computed analytically either, because MMSE_k and SINR_k are ratios of quadratic forms in the complex Gaussian random vector h_k and no explicit expression for their first order moments has been found yet [161].¹³
The lower bound, although it may be loose, yields an information theoretic justification for minimizing the conditional mean estimate of the sum MMSE

min_P Σ_{k=1}^K E_{h_k|y_T}[MMSE_k(P; h_k)]  s.t.  ‖P‖_F² ≤ P_tx.   (4.31)
13. The probability distribution of the SINR [139] suggests that an analytical expression would not be tractable.
Fig. 4.10 Simplified AWGN fading BC model for the link to receiver k in Figure 4.8.
• Gaussian code books (coded data) of infinite length with rate equal to E_{h_k,y_T}[R_k(P(y_T); h_k)],
• many statistically independent realizations of h and y_T over the code length,
• knowledge of h_k^T p_k and c_{i_k} + c_{n_k} at receiver k,
• receivers with ML decoding (or decoding based on typicality).
The conditional mean estimate E_{h_k|y_T}[MMSE_k(P; h_k)] has a close relation to the mean information rate. Thinking in the framework of estimation theory (and not information theory), it assumes only LMMSE receivers.
It is interesting that the upper bound depends on the widely employed SINR definition¹⁴ (e.g. [19, 185])

SINR_k(P) = E_{h_k|y_T}[|h_k^T p_k|²] / ( Σ_{i=1, i≠k}^K E_{h_k|y_T}[|h_k^T p_i|²] + c_{n_k} ) = p_k^H R_{h_k|y_T}^∗ p_k / (c̄_{i_k} + c_{n_k}),   (4.36)

where the expected value is taken in the numerator and denominator of (4.19) independently.¹⁵
For S-CSI with μ_h = 0, we have Rayleigh fading channels to every receiver. Their mean information rate can be computed explicitly¹⁶

E_{h_k|y_T}[R_k^FAWGN(P; h_k)] = E₁(SINR_k(P)^{-1}) exp(SINR_k(P)^{-1}) log₂[e].
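The closed form can be cross-checked by Monte Carlo: for this Rayleigh model the instantaneous SINR is exponentially distributed with mean SINR_k(P), so E[log₂(1 + x)] must match E₁(ρ⁻¹) exp(ρ⁻¹) log₂[e]. A sketch using scipy.special.exp1 (the value of ρ and the sample count are our choices):

```python
import numpy as np
from scipy.special import exp1

rng = np.random.default_rng(6)
rho = 4.0                                    # assumed mean SINR_k(P) of the link
closed_form = exp1(1 / rho) * np.exp(1 / rho) * np.log2(np.e)

# Monte Carlo: instantaneous SINR is exponential with mean rho
x = rho * rng.exponential(size=200_000)
monte_carlo = np.mean(np.log2(1 + x))
```

The agreement is typically within a few thousandths of a bit at this sample count.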
Again the mean rate is determined by SINR_k.¹⁷ The dependency on only SINR_k has its origin in our Gaussian model for the interference, which is un-
14. Typically, no reasoning is given in the literature about the relevance of this rather intuitive performance measure for describing the broadcast channel in Figure 4.8.
15. This corresponds to the approximation of the first order moment of the SINR based on [161], which is accurate for P-CSI close to C-CSI (small error covariance matrix C_{h|y_T}).
16. The exponential integral is defined as [2]
E₁(x) = ∫_x^∞ (exp(−t)/t) dt.   (4.37)
17. Similarly, under the same assumption, the mean MMSE can be computed explicitly
Fig. 4.11 Simplified AWGN BC model for the link to receiver k in Figure 4.8.
correlated from the received data signal. But due to the exponential integral
the explicit expression of the mean rate for S-CSI is not very appealing for
optimization.
The minimum MMSE_k^FAWGN(P; h_k) (4.35) relies on the statistical signal model

E_{d_k,r_k}[ [d_k; r_k] [d_k^∗, r_k^∗] ] = [ 1 , (h_k^T p_k)^∗ ; h_k^T p_k , |h_k^T p_k|² + c̄_{i_k} + c_{n_k} ]   (4.38)
as a function of 1/SINR_k(P).
18. Similar approximations are made for system level considerations in a communication network. In [112, 243] a similar idea is pursued for optimization of linear precoding for rank-one covariance matrices C_{h_k}.
with a relation between MMSEk and SINRk in analogy to (4.22) for C-CSI
at the transmitter.
Returning to the information theoretic point of view, maximization of the sum rate for the AWGN BC model

max_P Σ_{k=1}^K R_k^AWGN(P)  s.t.  ‖P‖_F² ≤ P_tx   (4.44)
19. For example, it is not convex in p_k keeping p_i, i ≠ k, constant.
Fig. 4.12 Bounds on mean sum rate evaluated for the solution of (4.46) based on S-
CSI with the algorithm from Section 4.4.1: Upper bound based on mean SINR (4.30),
lower bound with mean sum MMSE (4.30), and the sum rate of the AWGN BC
model (4.44) (Scenario 1, M = K = 8 and σ = 5◦ ).
The relation of the information rate for the AWGN BC model to the rate for the original model in Figure 4.8 is unclear. Numerical evaluation of the sum rate shows that Σ_{k=1}^K R_k^AWGN(P) is typically an upper bound to E_{h,y_T}[R_sum(P(y_T); h)] (4.27).
Example 4.2. In Figure 4.12 we evaluate the different bounds on the mean sum rate (4.20) numerically for Scenario 1 defined in Section 4.4.4.²⁰ As precoder P we choose the solution given in Section 4.4.1 to problem (4.46) for S-CSI. The sum rate for the mean SINR (4.30) (upper bound) and the rate for the mean sum MMSE (4.30) (lower bound) are good approximations for small SNR. The sum rate of the AWGN BC model (4.44) is also close to the true mean sum rate for small SNR, but the good approximation quality at high SNR depends on the selected P. □
The previous two models result from a Gaussian approximation of the interference and data signal. But there is also the possibility of assuming more limited CSI at the receiver, or receivers with limited signal processing capabilities. Following this concept, we construct an alternative approach to obtain a solution of (4.46) (cf. Section 4.4). It can be viewed as a mere mathematical trick, which is motivated by a relevant receiver. Let us start by presenting some intuitive ideas.
20. The bounds for the AWGN fading BC are not shown, because they are of minor importance in the sequel.
g_k = g_k^cor(P, h_k) = (h_k^T p_k)^∗ / c̄_{r_k}.   (4.47)
maximizing the real part of the crosscorrelation between d̂_k and d_k with a regularization of the norm of g_k. This is equivalent to minimizing

COR_k(P, g_k; h_k) = 1 + |g_k|² c̄_{r_k} − 2Re{g_k h_k^T p_k},   (4.49)

which yields

MCOR_k(P; h_k) = min_{g_k} COR_k(P, g_k; h_k) = 1 − |h_k^T p_k|² / c̄_{r_k}.   (4.50)
21. It is realizable under the conditions of Section 4.2.2.
22. It is negative whenever
[ 1 , (h_k^T p_k)^∗ ; h_k^T p_k , c̄_{r_k} ]   (4.51)
ceiver cost.²³ But this argument is of limited interest, because the relevance of COR_k(P, g_k; h_k) for a communication link remains unclear: Only for large P_tx and C_{h_k} with rank one does it yield the identical cost as the MMSE receiver.
E_{h_k|y_f,k}[MSE_k(P, g_k; h_k)] = 1 + |g_k|² Σ_{i=1}^K p_i^H R_{h_k|y_f,k}^∗ p_i + |g_k|² c_{n_k} − 2Re{g_k ĥ_f,k^T p_k},   (4.53)

where ĥ_f,k = E_{h_k|y_f,k}[h_k] and R_{h_k|y_f,k} = ĥ_f,k ĥ_f,k^H + C_{h_k|y_f,k}. The minimum cost which can be achieved by the receiver
g_k = (ĥ_f,k^T p_k)^∗ / ( Σ_{i=1}^K ( |ĥ_f,k^T p_i|² + p_i^H C_{h_k|y_f,k}^∗ p_i ) + c_{n_k} )   (4.54)

is
23. Compared to the implicit assumption of an LMMSE receiver (4.25) in (4.31), less information about r_k is exploited by (4.47).
MMSE_k^icsi(P; y_f,k) = min_{g_k} E_{h_k|y_f,k}[MSE_k(P, g_k; h_k)] = 1 / ( 1 + SINR_k^icsi(P; y_f,k) ).   (4.55)
(4.56)
now includes the error covariance matrix C_{h_k|y_f,k} for the receiver's CSI in h_k.²⁴
Because the transmitter knows the distribution of y_f,k (or ĥ_f,k), it can compute the mean of MMSE_k^icsi(P; y_f,k) w.r.t. this distribution. As before, an explicit expression for the conditional expectation does not exist, and an approximation similar to the AWGN BC (Figure 4.11) or based on a scaled MF receiver can be found. The latter is equivalent to taking the conditional expectation of the numerator and denominator in SINR_k^icsi separately.
y_f,k = h_k^T q_s.   (4.57)
Given y_f,k, the conditional mean estimate of h_k and its error covariance matrix required for the cost function (4.55) are

ĥ_f,k = μ_{h_k} + C_{h_k} q_s^∗ (q_s^T C_{h_k} q_s^∗)^{-1} (h_k^T q_s − μ_{h_k}^T q_s)   (4.58)

C_{h_k|y_f,k} = C_{h_k} − C_{h_k} q_s^∗ q_s^T C_{h_k} / (q_s^T C_{h_k} q_s^∗).   (4.59)
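Equations (4.58) and (4.59) are the standard conditional mean and error covariance for the jointly Gaussian pair (h_k, h_k^T q_s); a numerical sketch with illustrative dimensions (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(7)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
C_h = A @ A.conj().T / M                     # channel covariance C_{h_k}
mu_h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
q_s = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# one channel realization and its noise-free scalar observation (4.57)
w = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
h = mu_h + np.linalg.cholesky(C_h) @ w
y = h @ q_s

c_y = q_s @ C_h @ q_s.conj()                 # variance of the observation
h_hat = mu_h + C_h @ q_s.conj() * (y - mu_h @ q_s) / c_y        # (4.58)
C_err = C_h - np.outer(C_h @ q_s.conj(), q_s @ C_h) / c_y       # (4.59)
```

The estimate reproduces the noise-free observation exactly (ĥ_f,k^T q_s = y), and the error covariance is singular along q_s^∗, reflecting that this direction is known perfectly.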
24. Alternatively, we could also consider an LMMSE receiver based on an ML channel estimate (4.11) with zero-mean estimation error and given error variance.
25. Note that E_{y_f,k}[R_{h_k|y_f,k}] = R_{h_k}.
min_{g_k} COR_k^icsi(P; y_f,k) = min_{g_k} [ 1 + |g_k|² Σ_{i=1}^K p_i^H R_{h_k}^∗ p_i + |g_k|² c_{n_k} − 2Re{g_k ĥ_f,k^T p_k} ].   (4.60)
with

SINR_k^icsi(P) = p_k^H R_{ĥ_f,k}^∗ p_k / ( Σ_{i=1}^K p_i^H R_{h_k}^∗ p_i − p_k^H R_{ĥ_f,k}^∗ p_k + c_{n_k} ),   (4.62)

R_{ĥ_f,k} = μ_{h_k} μ_{h_k}^H + C_{h_k} q_s^∗ q_s^T C_{h_k} / (q_s^T C_{h_k} q_s^∗).   (4.63)
For µh = 0, this SINR definition is equivalent to [36]. The general case yields
the rank-two correlation matrix Rĥf,k for the kth receiver’s channel.
Instead of a receiver model with a scaled MF and the conditional mean estimate (4.58), we suggest another approach: Choose a receiver model whose first part depends only on the available CSI (4.57) and whose second component is a scaling independent of the channel realization h_k. We are free to choose a suitable model for the CSI-dependent component. Motivated by standardization (UMTS FDD mode [1]), the available CSI (4.57) from the common training channel is only utilized to compensate the phase. This results in the receiver model

g_k(h_k) = β_k (h_k^T q_s)^∗ / |h_k^T q_s|.   (4.64)
The scaling β_k models a slow automatic gain control at the receiver, which is optimized together with the precoder P based on S-CSI. The relevant cost function at the transmitter is the mean of the MSE (4.24)²⁶ resulting from this receiver model
26. For rank-one channels and CDMA, a sum MSE criterion is developed in [244] based on an average signal model. An ad-hoc extension to general channels is presented in [243, 25].
E_{h_k}[ MSE_k(P, β_k (h_k^T q_s)^∗/|h_k^T q_s|; h_k) ] = 1 + |β_k|² ( Σ_{i=1}^K p_i^H R_{h_k}^∗ p_i + c_{n_k} ) − 2Re{ β_k E_{h_k}[ ((h_k^T q_s)^∗/|h_k^T q_s|) h_k ]^T p_k }.   (4.65)

The expectation of the phase compensation and the channel h_k can be computed analytically as shown in Appendix D.2.
Minimizing (4.65) w.r.t. β_k yields

min_{β_k} E_{h_k}[ MSE_k(P, β_k (h_k^T q_s)^∗/|h_k^T q_s|; h_k) ] = 1 / ( 1 + SINR_k^phase(P) )   (4.66)

with

SINR_k^phase(P) = | E_{h_k}[ ((h_k^T q_s)^∗/|h_k^T q_s|) h_k ]^T p_k |² / ( Σ_{i=1}^K p_i^H R_{h_k}^∗ p_i − | E_{h_k}[ ((h_k^T q_s)^∗/|h_k^T q_s|) h_k ]^T p_k |² + c_{n_k} ).   (4.67)
i=1
It differs from (4.62): The matrix substituting the correlation matrix of the
kth data signal’s channel in the numerator of (4.67) is of rank one (compare
to (4.62) and (4.63)). For µh = 0, we get
T 2
(hkT q s )∗ π T 2
Ehk hk pk = (q C hk q ∗s )−1 q H ∗
s C hk pk
|hkT q s | 4 s
instead of (q T ∗ −1 H ∗
s C hk q s ) |q s C hk pk |2 in (4.62) (cf. Appendix D.2). The differ-
ence is the factor π/4 ≈ 0.79 which results from a larger error in CSI at the
receiver for only a phase correction.
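The factor π/4 can be verified by Monte Carlo for a zero-mean complex Gaussian channel: it stems from E[|y|] = √(π c/4) for y ∼ CN(0, c). A sketch with illustrative dimensions and sample count:

```python
import numpy as np

rng = np.random.default_rng(8)
M, N = 3, 200_000
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
C = A @ A.conj().T / M                      # zero-mean channel covariance C_{h_k}
q = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # pilot direction q_s
p = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # precoder p_k

L = np.linalg.cholesky(C)
h = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2) @ L.T
y = h @ q                                   # h_k^T q_s for each realization
v = np.mean((y.conj() / np.abs(y))[:, None] * h, axis=0)   # E[((h^T q)^*/|h^T q|) h]
mc_value = np.abs(v @ p) ** 2

closed_form = (np.pi / 4) * np.abs(q.conj() @ C.conj() @ p) ** 2 / (q @ C @ q.conj()).real
```

At this sample count the Monte Carlo value matches the closed form to well within a percent.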
This impact of the receiver model on the transmitter's SINR definition is illustrated further in the following example.
Example 4.3. For a zero-mean channel with rank[C_{h_k}] = 1, the error covariance matrix (4.59) for the conditional mean estimator is zero, because the channel can be estimated perfectly by the receiver.²⁷ Thus, (4.63) converges to R_{h_k} and the transmitter's SINR definition (4.62) becomes identical to (4.36). On the other hand, for only a phase compensation at the receiver, the data signal's channel is always modeled as rank one (cf. numerator of (4.67)) and SINR_k^phase(P) is not identical to (4.36) in general. □
The choice of an optimization criterion for precoding with a common train-
ing channel depends on the detailed requirements of the application, i.e., the
27
Note that the observation in (4.57) is noise-free. Moreover, we assume that q s is not
orthogonal to the eigenvector of C hk which corresponds to the non-zero eigenvalue.
4.3 Performance Measures for Partial Channel State Information 147
most adequate model for the receiver. Moreover, rank-one matrices describ-
ing the desired receiver’s channel as in the numerator of (4.67) reduce the
computational complexity (Section 4.6.2 and [19]).
Above, we showed the relevance of the MSE for the achievable data rate of every receiver. In particular, the sum of the MSEs of all receivers gives a lower bound on the sum information rate (or throughput) to all receivers. Although it may be loose, this establishes the relevance of the MSE for a communication system as an alternative performance measure to SINR. Moreover, by defining simplified models for the BC (Section 4.3.2), we illustrated the relevance and intuition involved when using the suboptimum performance measure $\mathrm{SINR}_k$ (4.36) for P-CSI and S-CSI: On the one hand, it can be seen as a heuristic with a mathematical structure similar to the SINR for perfect CSI; on the other hand, it describes the data rates for the fading AWGN BC and the AWGN BC (Section 4.3.2). Numerical evaluation of the mean sum rate shows that the resulting suboptimum but analytically more tractable performance measures are good approximations for small SNR and small $K/M < 1$, whereas they are looser in the interference-limited region (Figure 4.12).
The derived cost functions $\mathrm{E}_{h_k|y_T}[\mathrm{MMSE}_k(P; h_k)]$ and $\mathrm{MMSE}_k(P)$ (equivalently $\mathrm{SINR}_k(P)$) are related to the achievable data rate on the considered link. But an optimization of precoding based on a linear receiver model and the MSE is interesting in its own right: Optimization based on the MSE can be generalized to frequency selective channels with linear finite impulse response precoding [109, 113], multiple antennas at the receivers [146], and nonlinear precoding.

As an example, we now consider the sum of the MSEs for these definitions. We compare the different possibilities for formulating optimization problems and, where available, refer to the proposed algorithms in the literature. Other important optimization problems are: minimization of the transmit power under quality of service (QoS) requirements, which are based on the MSE expression [146]; minimization of the maximum MSE (balancing of MSEs) among all receivers [101, 189, 146]. They are addressed in Section 4.6.
$$
\min_{P}\ \sum_{k=1}^{K} \mathrm{MMSE}_k(P; h_k) \quad \text{s.t.} \quad \|P\|_F^2 \le P_{tx}. \tag{4.68}
$$

²⁸ Assuming a block-wise constant channel.
$$
\min_{P,\{g_k\}_{k=1}^{K}}\ \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[\mathrm{MSE}_k(P, g_k; h_k)\right] \quad \text{s.t.} \quad \|P\|_F^2 \le P_{tx}. \tag{4.71}
$$
The relation of MMSEk to CORk and the conditional mean of MCORk (4.52)
can be exploited to develop an algorithm for solving this problem, which we
address together with (4.72) in Section 4.4.
is proposed in [58, 7]. A solution can be given directly with Appendix D.1
(see also [116]).
For the SINR definition (4.62), power minimization is addressed in [36]. Balancing of the MSEs (4.65) and power minimization based on MSE duality and different receivers $\beta_k$ are treated in [7, 8] for phase compensation at the receiver. We present a new proof for the MSE duality in Section 4.5.2 and briefly address the power minimization and the minimax MSE (balancing) problems in Section 4.6.
In Section 4.3 we stated that the mean MMSE gives a lower bound on the mean information rate (4.29) and that the mean sum rate can be lower bounded by the sum of mean MMSEs (4.30). Maximization of this lower bound with a constraint on the total transmit power reads (compare (4.31) and (4.72))

$$
\min_{P}\ \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[\mathrm{MMSE}_k(P; h_k)\right] \quad \text{s.t.} \quad \|P\|_F^2 \le P_{tx}. \tag{4.75}
$$
Assuming receivers $\{g_k^{(m-1)}\}_{k=1}^{K}$ with $g_k^{(m-1)} = g_k^{mmse}(P^{(m-1)}; h_k)$ (4.25) given by the previous iteration $m-1$, the optimum precoder together with the common scaling $\beta$ at the receivers is obtained from

$$
\min_{P,\beta}\ \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[\mathrm{MSE}_k\big(P, \beta\, g_k^{mmse}(P^{(m-1)}; h_k); h_k\big)\right] \quad \text{s.t.} \quad \|P\|_F^2 \le P_{tx}. \tag{4.77}
$$
$$
\sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[\mathrm{MSE}_k\big(P, \beta\, g_k^{mmse}(P^{(m-1)}; h_k); h_k\big)\right]
= \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[\mathrm{E}_{r_k,d_k}\!\left[\big|\beta\, g_k^{mmse}(P^{(m-1)}; h_k)\, r_k - d_k\big|^2\right]\right] \tag{4.78}
$$

which yields

$$
K + \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[|\beta|^2\,\big|g_k^{mmse}(P^{(m-1)}; h_k)\big|^2\left(\sum_{i=1}^{K} |h_k^T p_i|^2 + c_{n_k}\right)\right]
- \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[2\,\mathrm{Re}\big(\beta\, g_k^{mmse}(P^{(m-1)}; h_k)\, h_k^T p_k\big)\right]
$$
$$
= K + |\beta|^2 \bar{G}^{(m-1)} + |\beta|^2\,\mathrm{tr}\!\left(P^H\,\mathrm{E}_{h|y_T}\!\left[H^H G^{(m-1)}(h)^H G^{(m-1)}(h) H\right] P\right)
- 2\,\mathrm{Re}\!\left(\beta\,\mathrm{tr}\!\left(\mathrm{E}_{h|y_T}\!\left[G^{(m-1)}(h) H\right] P\right)\right) \tag{4.79}
$$

with

$$
\bar{G}^{(m-1)} = \sum_{k=1}^{K} c_{n_k} G_k^{(m-1)}. \tag{4.81}
$$
³⁰ If $\beta^{(m)} \ne 1$, then $g_k^{(m-1)}$ necessarily changed in the previous iteration, leading to a new $P^{(m)} \ne P^{(m-1)}$: $\beta^{(m)}$ ensures that the squared norm of the updated $P^{(m)}$ equals $P_{tx}$. On the other hand, if the algorithm has converged and $P^{(m)} = P^{(m-1)}$, then $\beta^{(m)} = 1$.
The scaling $\beta$ at the receivers is introduced for the same reason as above. The cost function is

$$
\sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[\mathrm{COR}_k\big(P, \beta\, g_k^{cor}(P^{(m-1)}; h_k); h_k\big)\right]
= K + \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[|\beta|^2\,\big|g_k^{cor}(P^{(m-1)}; h_k)\big|^2\left(\sum_{i=1}^{K} p_i^H R_{h_k|y_T}^* p_i + c_{n_k}\right)\right]
- \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\!\left[2\,\mathrm{Re}\big(\beta\, g_k^{cor}(P^{(m-1)}; h_k)\, h_k^T p_k\big)\right]
$$
$$
= K + |\beta|^2 \bar{G}^{(m-1)} + |\beta|^2\,\mathrm{tr}\!\left(P^H\left(\sum_{k=1}^{K} G_k^{(m-1)} R_{h_k|y_T}^*\right) P\right)
- 2\,\mathrm{Re}\!\left(\beta\,\mathrm{tr}\!\left(\mathrm{E}_{h|y_T}\!\left[G^{(m-1)}(h) H\right] P\right)\right) \tag{4.88}
$$

with $\bar{G}^{(m-1)} = \sum_{k=1}^{K} c_{n_k} G_k^{(m-1)}$, $[G^{(m-1)}(h)]_{k,k} = g_k^{cor}(P^{(m-1)}; h_k)$, and
4.4 Optimization Based on the Sum Mean Square Error 155
$$
G_k^{(m-1)} = \mathrm{E}_{h_k|y_T}\!\left[\big|[G^{(m-1)}(h)]_{k,k}\big|^2\right] = \frac{p_k^{(m-1),H} R_{h_k|y_T}^* p_k^{(m-1)}}{\bar{c}_{r_k}^{(m-1),2}} \tag{4.89}
$$
$$
\mathrm{E}_{h_k|y_T}\!\left[[G^{(m-1)}(h)]_{k,k}\, h_k\right] = \frac{R_{h_k|y_T}\, p_k^{(m-1),*}}{\bar{c}_{r_k}^{(m-1)}}. \tag{4.90}
$$
All expectations can be computed analytically for this receiver model. Moreover, only the conditional first and second order moments are required and no assumption about the pdf $p_{h|y_T}(h|y_T)$ is made, in contrast to problem (4.75).
The solution to (4.87) is derived in Appendix D.1 comparing (4.88) with (D.2). It reads

$$
P^{(m)} = \beta^{(m),-1}\left(\sum_{k=1}^{K} G_k^{(m-1)} R_{h_k|y_T}^* + \frac{\bar{G}^{(m-1)}}{P_{tx}}\, I_M\right)^{-1} \mathrm{E}_{h|y_T}\!\left[G^{(m-1)}(h) H\right]^H \tag{4.91}
$$

with $\beta^{(m)}$ chosen to satisfy the power constraint with equality (cf. (D.13)).
In the second step of iteration $m$ we determine the optimum receive filters by

$$
g_k^{(m)} = g_k^{cor}(P^{(m)}; h_k) = \beta^{(m),-1}\,\frac{(h_k^T p_k^{(m)})^*}{\bar{c}_{r_k}^{(m)}} \tag{4.93}
$$

based on the mean variance of the receive signal $\bar{c}_{r_k}^{(m)} = \sum_{i=1}^{K} p_i^{(m),H} R_{h_k|y_T}^* p_i^{(m)} + c_{n_k}$. In fact, we determine the receivers as a function of the random vector $h_k$, which we require for evaluating (4.88) in the next iteration.
In every iteration the cost function (4.76) decreases or stays constant:

$$
\sum_{k=1}^{K} \mathrm{MMSE}_k(P^{(m)}) \le \sum_{k=1}^{K} \mathrm{MMSE}_k(P^{(m-1)}). \tag{4.94}
$$
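The alternating optimization described above can be sketched compactly in code. The following is an illustrative implementation (the function name and interface are assumptions, not the book's reference code) of the precoder update (4.91) together with the statistics (4.89)/(4.90) of the scaled matched-filter receiver and the power-constraint scaling; real-valued symmetric correlation matrices are assumed so that all conjugates drop out:

```python
import numpy as np

def sum_mse_precoder(R, c_n, P_tx, n_iter=20):
    """Sketch of the alternating sum-MSE minimization for the scaled
    matched-filter receiver model: precoder update as in (4.91) with the
    moments (4.89)/(4.90); beta restores ||P||_F^2 = P_tx each iteration.
    Assumes real symmetric R_{h_k|y_T} (illustrative simplification).
    R: list of K (M x M) conditional correlation matrices,
    c_n: length-K noise variances, P_tx: total transmit power."""
    K, M = len(R), R[0].shape[0]
    rng = np.random.default_rng(0)
    P = rng.standard_normal((M, K))
    P *= np.sqrt(P_tx) / np.linalg.norm(P, 'fro')       # feasible start
    for _ in range(n_iter):
        # mean receive-signal variances \bar c_{r_k} (cf. below (4.93))
        c_r = np.array([sum(P[:, i] @ R[k] @ P[:, i] for i in range(K))
                        + c_n[k] for k in range(K)])
        # G_k = E[|g_k|^2] and the columns of E[G(h) H]^H
        G = np.array([P[:, k] @ R[k] @ P[:, k] / c_r[k] ** 2
                      for k in range(K)])
        A = np.column_stack([R[k] @ P[:, k] / c_r[k] for k in range(K)])
        G_bar = float(np.sum(c_n * G))
        # precoder update (4.91); then rescale to satisfy the constraint
        P = np.linalg.solve(sum(G[k] * R[k] for k in range(K))
                            + G_bar / P_tx * np.eye(M), A)
        P *= np.sqrt(P_tx) / np.linalg.norm(P, 'fro')
    return P
```

In practice the loop would be stopped by monitoring the monotonic decrease of the sum MSE in (4.94) rather than by a fixed iteration count.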
…with the effective channel $H_G^{(m-1)} = G^{(m-1)}(h)\,H$, and scaling of $P^{(m)}$…

³¹ Problem (4.76) with an equality constraint can be written as an unconstrained optimization problem in $P = \bar{P}\sqrt{P_{tx}}/\|\bar{P}\|_F$. For this equivalent problem, the iteration above can be interpreted as a fixed point solution to the necessary condition on the gradient w.r.t. $\bar{P}$ to be zero. The gradient for the kth column $\bar{p}_k$ of $\bar{P}$ reads (cf. (4.52), (4.43), and (4.36))

$$
\frac{\partial \sum_{i=1}^{K} \mathrm{MMSE}_i\!\left(\bar{P}\frac{\sqrt{P_{tx}}}{\|\bar{P}\|_F}\right)}{\partial \bar{p}_k^*}
= \frac{\sum_{i=1}^{K} c_{n_i} G_i}{P_{tx}}\,\bar{p}_k + \sum_{i=1}^{K} G_i R_{h_i|y_T}^* \bar{p}_k - \frac{1}{\bar{c}'_{r_k}}\, R_{h_k|y_T}^* \bar{p}_k
$$

with $G_i = \bar{p}_i^H R_{h_i|y_T}^* \bar{p}_i / \bar{c}'^{,2}_{r_i}$ and $\bar{c}'_{r_i} = c_{n_i}\|\bar{P}\|_F^2/P_{tx} + \sum_{j=1}^{K} \bar{p}_j^H R_{h_i|y_T}^* \bar{p}_j$. The relation to (4.91) is evident. We proved convergence to a fixed point, which is a local minimum of the cost function. In analogy, the same is true for problem (4.75) and its iterative solution (4.82) (assuming that the derivative and the integration can be interchanged, i.e., $c_{n_k} > 0$).

³² The explicit solution is given in Appendix D.1.
$$
\beta^{(m),2} = \frac{\mathrm{tr}\!\left(H_G^{(m-1)}\left(H_G^{(m-1),H} H_G^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}\, I_M\right)^{-2} H_G^{(m-1),H}\right)}{P_{tx}}. \tag{4.97}
$$
In [102, 99] it is shown that the global optimum is reached if the algorithm is initialized with non-zero $g_k^{mmse}(P^{(m-1)}; h_k)$. An algorithm with faster convergence is derived in [146].

If $y_T$ is statistically independent of $h$, the solutions from the previous section depend only on S-CSI. The iterative solution to problem (4.76) reads

$$
P^{(m)} = \beta^{(m),-1}\left(\sum_{k=1}^{K} G_k^{(m-1)} C_{h_k}^* + \frac{\bar{G}^{(m-1)}}{P_{tx}}\, I_M\right)^{-1} \mathrm{E}_H\!\left[G^{(m-1)}(h) H\right]^H \tag{4.98}
$$
4.4.3 Examples

For insights into the iterative solutions minimizing the sum MSE, (4.75) and (4.76), we consider two cases:

• P-CSI at the transmitter and channels with rank-one covariance matrices $C_{h_k}$,
• S-CSI and uncorrelated channels.

Rank-One Channels

$$
h_k = x_k v_k \tag{4.101}
$$
$$
H = X V^T \tag{4.102}
$$

$$
\mathrm{E}_{h|y_T}\!\left[H^H G^{(m-1)}(h)^H G^{(m-1)}(h) H\right]
= \sum_{k=1}^{K} v_k^* v_k^T\,\underbrace{\mathrm{E}_{x_k|y_T}\!\left[\big|g_k^{mmse}(P^{(m-1)}; x_k v_k)\big|^2\, |x_k|^2\right]}_{b_k} = V^* B V^T. \tag{4.104}
$$
The relevant parameters for the solution based on the scaled MF receiver (4.91) are

$$
\mathrm{E}_{h_k|y_T}\!\left[g_k^{cor}(P^{(m-1)}; h_k)\, h_k\right]
= \underbrace{\frac{v_k^H p_k^{(m-1),*}\,\mathrm{E}_{x_k|y_T}[|x_k|^2]}{\bar{c}_{r_k}}}_{a_k}\, v_k \tag{4.105}
$$

and

$$
\sum_{k=1}^{K} G_k^{(m-1)} R_{h_k|y_T}^*
= \sum_{k=1}^{K} \underbrace{G_k^{(m-1)}\,\mathrm{E}_{x_k|y_T}[|x_k|^2]}_{b_k}\, v_k^* v_k^T = V^* B V^T \tag{4.106}
$$

with $G_k^{(m-1)} = \mathrm{E}_{x_k|y_T}[|x_k|^2]\,|v_k^T p_k|^2/\bar{c}_{r_k}^2$, $A = \mathrm{diag}[a_1, \ldots, a_K]$, and $B = \mathrm{diag}[b_1, \ldots, b_K]$.
Therefore, both approaches result in

$$
P^{(m)} = \beta^{(m),-1}\left(\sum_{k=1}^{K} v_k^* v_k^T b_k + \frac{\bar{G}^{(m-1)}}{P_{tx}}\, I_M\right)^{-1} V^* A^* \tag{4.107}
$$
$$
= \beta^{(m),-1}\, V^*\left(V^T V^* + \frac{\bar{G}^{(m-1)}}{P_{tx}}\, B^{-1}\right)^{-1} B^{-1} A^*, \tag{4.108}
$$
where we applied the matrix inversion lemma (A.20) in the second equation. For large $P_{tx}$ and $K \le M$, the overall channel is diagonal:

$$
G^{(m)} H P^{(m)} = G^{(m)} X V^T P^{(m)} = \beta^{(m),-1}\, G^{(m)} X V^T V^* \left(V^T V^*\right)^{-1} B^{-1} A^* \tag{4.109}
$$
$$
= \beta^{(m),-1}\, G^{(m)} X B^{-1} A^*, \tag{4.110}
$$
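The step from (4.107) to (4.108) is an instance of the push-through form of the matrix inversion lemma, $(V^* B V^T + cI)^{-1} V^* A^* = V^* (V^T V^* + c B^{-1})^{-1} B^{-1} A^*$. A quick numerical check with random data (an illustrative sketch, not from the book):

```python
import numpy as np

# Numerical check of the identity behind (4.107) -> (4.108):
# (V* B V^T + c I)^{-1} V* A* = V* (V^T V* + c B^{-1})^{-1} B^{-1} A*
rng = np.random.default_rng(0)
M, K = 4, 3
V = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
b = rng.uniform(0.5, 2.0, K)                   # diagonal of B (positive)
a = rng.standard_normal(K) + 1j * rng.standard_normal(K)  # diagonal of A
c = 0.3                                        # plays the role of G/P_tx
B, A = np.diag(b), np.diag(a)

lhs = np.linalg.solve(V.conj() @ B @ V.T + c * np.eye(M),
                      V.conj() @ A.conj())
rhs = V.conj() @ np.linalg.solve(V.T @ V.conj() + c * np.linalg.inv(B),
                                 np.linalg.inv(B) @ A.conj())
print(np.allclose(lhs, rhs))                   # -> True
```

For large $P_{tx}$ the regularizer $c \to 0$, which yields the diagonalizing limit (4.109)/(4.110).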
Uncorrelated Channels

The other extreme scenario is given by S-CSI with $C_{h_k} = c_{h,k} I_M$, i.e., spatially uncorrelated channels with different mean attenuation. Defining a normalized noise variance $c'_{n_k} = c_{n_k}/c_{h,k}$ and assuming $\|P\|_F^2 = P_{tx}$, the MMSE for the AWGN BC model (Figure 4.11) reads

Both performance measures depend only on the norm $\|p_k\|_2^2$, i.e., the power allocation to every receiver. It can be shown that the maximum sum rate is reached serving only the receiver with smallest $c'_{n_k}$, which yields

$$
\sum_{k=1}^{K} R_k^{\mathrm{AWGN}} = \log_2\!\left(1 + \frac{P_{tx}}{c'_{n_k}}\right). \tag{4.113}
$$

The choice of $p_k$ is only constrained by $\|p_k\|_2^2 = P_{tx}$ and any vector is equivalent on average, which is clear intuitively.
Minimizing the sum MSE

$$
\sum_{k=1}^{K} \mathrm{MMSE}_k = K - \sum_{k=1}^{K} \frac{\|p_k\|_2^2}{P_{tx} + c'_{n_k}} \tag{4.114}
$$

yields the same solution as problem (4.112) for unequal $c'_{n_k}$, with minimum

$$
K - 1 \le \sum_{k=1}^{K} \mathrm{MMSE}_k = K - \frac{P_{tx}}{P_{tx} + c'_n} \le K. \tag{4.115}
$$

Note that the sum MSE (4.114) depends only on $\|P\|_F^2 = P_{tx}$ for identical $c'_{n_k} = c_n$. Thus, every initialization of the iterative algorithm in Section 4.4.1 is a stationary point. This is not true for the sum rate (4.112), which is maximized by selecting only one arbitrary receiver. This observation illustrates the deficiency of using this lower bound to the sum rate.³³
The proposed algorithms, which minimize the sum MSEs in (4.75) and (4.76), are evaluated w.r.t. the mean sum rate $\mathrm{E}_{h,y_T}[R_{\mathrm{sum}}(P(y_T); h)]$ (4.27) and different degrees of CSI (Table 4.1). In Section 5.5 we present a comparison with nonlinear precoding w.r.t. the mean uncoded bit error rate (BER).

The following linear precoders based on the sum MSE are considered here and in Section 5.5:³⁴

• For C-CSI, we solve (4.69) assuming MMSE receivers. The solution is given in [99, 102] and is a special case of Section 4.4.1 assuming C-CSI ("MMSE-Rx (C-CSI)"). As initialization we use the MMSE precoder from [116] (cf. Appendix D.1 with $G^{(m-1)}(h) = I_K$) for identical receivers $g_k = \beta$ and C-CSI.
• The heuristic in (4.70), where a channel estimate is plugged into the cost function for C-CSI as if it were error-free (Appendix C), is applied assuming two different types of CSI: outdated CSI $\hat{h} = \mathrm{E}_{h[q']|y[q']}[h[q']]$ with $q' = q_{tx} - 1$ (O-CSI), i.e., the channel estimate from the last reverse link time slot is used without prediction, and P-CSI with the predicted channel $\hat{h} = \mathrm{E}_{h|y_T}[h]$ ("Heur. MMSE-Rx (O-CSI)" and "Heur. MMSE-Rx (P-CSI)"). It is initialized in analogy to C-CSI but with O-CSI and P-CSI, respectively.

³³ A possible compensation would be to select the active receivers before designing the precoding $P$.
³⁴ The names employed in the legends of the figures are given in quotation marks.
³⁵ No proof of convergence is given in [31].
³⁶ Including the receiver selection proposed in [31], the total algorithm has a numerical complexity of $O(K^2 M^3)$.
Fig. 4.14 Comparison of maximum sum rate and minimum sum MSE precoding
based on the AWGN BC model with fmax = 0.2 (Scenario 1).
Discussion of Results

Because the lower bound to the sum rate, which depends on the sum MSE, is generally not tight, we expect a performance loss when minimizing the sum MSE compared to a direct maximization of the sum rate. For C-CSI, the loss grows with increasing $K/M$ and $P_{tx}/c_n$ (Figure 4.12). This is also true for P-CSI and S-CSI (Figure 4.14). But the gain compared to TDMA is significant for all methods.

The algorithm for maximizing the sum rate of the AWGN BC model converges slowly and requires 30-40 iterations, i.e., its overall complexity is significantly larger than for the sum MSE approaches. But the latter are already outperformed within the first iterations, because the sum MSE is only a lower bound to the sum rate (4.45) of the AWGN BC (compare with C-CSI in Figure 4.9).
Fig. 4.15 Mean sum rate vs. Ptx /cn for P-CSI with fmax = 0.2 (Scenario 1).
Fig. 4.16 Mean sum rate vs. fmax at 10 log10 (Ptx /cn ) = 20 dB (Scenario 1).
Still, the MSE is often considered to be practically more relevant. For example, when transmitting data with a fixed constellation size, minimizing the MSE achieves a better performance.
With respect to the mean sum rate, the algorithms based on the scaled
matched filter receiver model and the MMSE receiver show almost identical
performance. Due to its simplicity, we only consider the solution based on
the scaled matched filter.
The heuristic optimization of P based on sum MSE and P-CSI only applies
a channel estimate to the cost function for C-CSI. For a maximum Doppler
frequency below fmax = 0.15, almost the whole gain in mean sum rate of
robust linear precoding with P-CSI compared to the heuristic with O-CSI
is due to channel prediction (Figures 4.15 and 4.16): Concerning the sum
MSE as the cost function, the errors in CSI are still small enough and can
be neglected (Appendix C); there is no performance advantage when we consider the error covariance matrix in the optimization problem. Thus, accurate estimation of the channel predictor's parameters or its robust optimization is crucial (see Chapters 2 and 3). On the other hand, a systematic robust precoder design with P-CSI is important for $f_{\max} > 0.15$, where the heuristic optimization performs close to TDMA.

Fig. 4.17 Convergence in mean sum rate of alternating optimization with matched filter (MF) receiver model for C-CSI, P-CSI, and S-CSI (P-CSI with fmax = 0.2, 10 log10 (Ptx /cn ) = 30 dB, Scenario 1).

Fig. 4.18 Mean sum rate vs. Ptx /cn for P-CSI with fmax = 0.2 (Scenario 2).
A meaningful precoder solution for S-CSI is obtained taking the expected
cost (robust precoder). About twice the sum rate of TDMA can be achieved
in this scenario. As fmax increases, a smooth transition from C-CSI to S-CSI
is achieved when minimizing the CM estimate of the receivers’ cost.
For this scenario, the robust precoder for P-CSI converges within three iterations (Figure 4.17). The initialization of the iterative algorithms for C-CSI and S-CSI is already close to the (local) optimum. For C-CSI, more iterations are required as $K/M$ increases, but typically 10 iterations are sufficient.
If $C_{h_k}$ has full rank (as in Scenario 1), the system is interference-limited and the sum rate saturates for high $P_{tx}/c_n$. Eventually, it is outperformed by TDMA, which is not interference-limited; therefore, its performance does not saturate. A proper selection of the receivers which should be served in one time slot would allow for convergence to TDMA at high $P_{tx}/c_n$. In [128] the conjecture is made that the multiplexing gain of the broadcast sum capacity for P-CSI is one, i.e., only one receiver is served at a time. On the other hand, if $\sum_{k=1}^{K} \mathrm{rank}[C_{h_k}] \le M$ and the eigenvectors corresponding to non-zero eigenvalues are linearly independent, the interference can be avoided by linear precoding and the sum rate does not saturate.
In Scenario 2, the covariance matrices $C_{h_k}$ are close to rank one. Therefore, the performance for P-CSI and S-CSI is very close to C-CSI (Figure 4.18) for $M = K$. The heuristic degrades only at high $P_{tx}/c_n$, because the channel estimates $\hat{h}_k$ lie in the correct (almost) one-dimensional subspace.

In the next chapter (Section 5.5), we show the performance results for the mean uncoded bit error rate (BER), which is averaged over the $K$ receivers, for comparison with nonlinear precoding. A fixed 16QAM (QAM: quadrature amplitude modulation) constellation is chosen. Minimizing the sum MSE corresponds to an optimization of the overall transmission quality without fairness constraints. Maximization of the sum rate is not a useful criterion to optimize precoding for a fixed modulation alphabet $\mathcal{D}$. Figures 5.10 to 5.16 give the uncoded BER for robust linear precoding with P-CSI and S-CSI in both scenarios.
All previous performance results for linear precoding and P-CSI have assumed perfect knowledge of the channel covariance matrix $C_{h_T}$ and the noise covariance matrix $C_v = c_n I_M$. Now we apply the algorithms for estimating the channel and noise covariance matrices, which are derived in Chapter 3, to estimate the parameters of $p_{h|y_T}(h|y_T)$.

Only $B = 20$ correlated realizations $h[q_{tx} - \ell]$ are observed via $y[q_{tx} - \ell]$ for $\ell \in \{1, 3, \ldots, 2B - 1\}$, i.e., with a period of $N_P = 2$. They are generated according to the model (2.12); the details are described above.

For estimation, $C_{h_T}$ is assumed to have the structure in (3.98), i.e., the $K$ receivers' channels are uncorrelated and have different temporal and spatial correlations.³⁷ We estimate the Toeplitz matrix $C_{T,k}$, which describes the

³⁷ The real model (2.12) with block diagonal $C_h$ is a special case of (3.98).
Fig. 4.19 Mean sum rate vs. Ptx /cn for P-CSI with fmax = 0.1 for estimated co-
variance matrices (Scenario 1, B = 20).
Fig. 4.20 Mean sum rate vs. fmax at 10 log10 (Ptx /cn ) = 20 dB for estimated covari-
ance matrices (Scenario 1, B = 20).
³⁹ They are only given for a spacing of $N_P = 2$.
$$
R_k^{mac}(p^{mac}, u_k^{mac}; h^{mac}) = I(d_k^{mac}; \hat{d}_k^{mac})
= \log_2\!\left[1 + \mathrm{SINR}_k^{mac}(p^{mac}, u_k^{mac}; h^{mac})\right], \tag{4.116}
$$
which depends on

$$
\mathrm{SINR}_k^{mac}(p^{mac}, u_k^{mac}; h^{mac})
= \frac{p_k^{mac,2}\,|u_k^{mac,T} h_k^{mac}|^2}{\displaystyle\sum_{\substack{i=1 \\ i \ne k}}^{K} p_i^{mac,2}\,|u_k^{mac,T} h_i^{mac}|^2 + \|u_k^{mac}\|_2^2}. \tag{4.117}
$$
Similarly, $\mathrm{MSE}_k^{mac}(p^{mac}, u_k^{mac}, g_k^{mac}; h^{mac}) = \mathrm{E}_{n_k^{mac}, d^{mac}}[|\hat{d}_k^{mac} - d_k^{mac}|^2]$ can be defined, which results in the relation

$$
\min_{g_k^{mac}}\,\mathrm{MSE}_k^{mac}(p^{mac}, u_k^{mac}, g_k^{mac}; h^{mac})
= \frac{1}{1 + \mathrm{SINR}_k^{mac}(p^{mac}, u_k^{mac}; h^{mac})}. \tag{4.118}
$$
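The relation (4.118) is a standard property of the quadratic MSE in the scalar receive filter and can be checked numerically. The sketch below uses hypothetical random test data (no system simulation) with unit-variance data symbols and white noise:

```python
import numpy as np

# Check of a (4.118)-type relation: for the linear MAC receiver model,
# MSE(g) = 1 - 2 Re(g u^T h_k p_k) + |g|^2 (S + IN); its minimum over the
# scalar g equals 1/(1 + SINR). All quantities are random test data.
rng = np.random.default_rng(2)
M, K, k = 3, 4, 0
h = [rng.standard_normal(M) + 1j * rng.standard_normal(M) for _ in range(K)]
p = rng.uniform(0.5, 1.5, K)                   # MAC power allocation
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)

S = p[k] ** 2 * abs(u @ h[k]) ** 2             # desired-signal power
IN = sum(p[i] ** 2 * abs(u @ h[i]) ** 2        # interference + noise power
         for i in range(K) if i != k) + np.linalg.norm(u) ** 2
sinr = S / IN

g_opt = (p[k] * (u @ h[k])).conjugate() / (S + IN)   # MMSE receive scalar
mse_min = (1 - 2 * (g_opt * p[k] * (u @ h[k])).real
           + abs(g_opt) ** 2 * (S + IN))
print(np.isclose(mse_min, 1 / (1 + sinr)))     # -> True
```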
Fig. 4.21 Broadcast channel (BC) model with linear receivers (as in Figure 4.3).
Fig. 4.22 Dual model for P-CSI with multiple access (MAC) structure and linear
receivers corresponding to the BC model in Figure 4.21.
decoupled in $u_k^{mac}$ (cf. (4.117)). In the BC they are decoupled in $g_k$, but coupled in $p_k$.

The following proofs of BC-MAC duality for P-CSI are carried out in analogy to Utschick and Joham [221], who show the MSE duality for the MIMO-BC and C-CSI in a more general setting.
where

$$
\mathrm{COR}_k^{mac}(p^{mac}, u_k^{mac}, g_k^{mac}(h_k^{mac}); h_k^{mac})
= 1 + |g_k^{mac}(h_k^{mac})|^2\, u_k^{mac,T}\!\left(I_M + \sum_{i=1}^{K} p_i^{mac,2}\, R_{h_i^{mac}|y_T}^*\right) u_k^{mac,*}
- 2\,\mathrm{Re}\!\left(g_k^{mac}(h_k^{mac})\, u_k^{mac,T} h_k^{mac} p_k^{mac}\right). \tag{4.119}
$$

$$
\mathrm{MCOR}_k^{mac}(p^{mac}, u_k^{mac}; h_k^{mac})
= 1 - \frac{p_k^{mac,2}\,|u_k^{mac,T} h_k^{mac}|^2}{u_k^{mac,T}\!\left(I_M + \sum_{i=1}^{K} p_i^{mac,2}\, R_{h_i^{mac}|y_T}^*\right) u_k^{mac,*}} \tag{4.120}
$$

$$
g_k^{mac} = g_k^{mac}(h_k^{mac})
= \frac{\left(u_k^{mac,T} h_k^{mac} p_k^{mac}\right)^*}{u_k^{mac,T}\!\left(I_M + \sum_{i=1}^{K} p_i^{mac,2}\, R_{h_i^{mac}|y_T}^*\right) u_k^{mac,*}}. \tag{4.121}
$$
Therefore, a duality for the cost functions $\mathrm{E}_{h_k|y_T}[\mathrm{COR}_k(P, g_k(h_k); h_k)]$ and $\mathrm{E}_{h_k^{mac}|y_T}[\mathrm{COR}_k^{mac}(p^{mac}, u_k^{mac}, g_k^{mac}(h_k^{mac}); h_k^{mac})]$ also proves a duality for the AWGN BC model (Figure 4.11) w.r.t. its information rate $R_k^{\mathrm{AWGN}}$ (4.39) and $\mathrm{SINR}_k$ (4.36).⁴¹

⁴⁰ This SINR definition is also assumed in the duality theory by Schubert et al. [185]. Moreover, it is often considered when optimizing beamforming at a receiver based on S-CSI, which is derived differently. Here we see that this criterion can be justified: It is the optimization criterion for a beamformer based on P-CSI which is followed by the scaled matched filter (4.121) with knowledge of $h_k^{mac}$.

4.5 Mean Square Error Dualities of BC and MAC 171
For the proof of duality, we have to constrain the possible filters to be consistent.

Definition 4.2. For the BC, a choice of $g_k(h_k)$ and $p_k$ is consistent if $p_k = 0_M$ and $g_k(h_k) \equiv 0$ are only zero simultaneously. For the MAC, a choice of $u_k^{mac}$ and $p_k^{mac}$ is consistent if $p_k^{mac} = 0$ and $u_k^{mac} = 0_M$ (or $g_k^{mac}(h_k^{mac}) \equiv 0$) are only zero simultaneously.⁴²
With this restriction we can summarize the BC-MAC duality in a theorem.

Theorem 4.1. Assume a consistent choice of $g_k(h_k)$ and $p_k$ for all $k$ and, alternatively, of $u_k^{mac}$ and $p_k^{mac}$. Consider only the active receivers and transmitters with $p_k \ne 0_M$ or $p_k^{mac} \ne 0$. Then all values of $\mathrm{E}_{h_k|y_T}[\mathrm{COR}_k(P, g_k(h_k); h_k)]$, $\forall k \in \{1, 2, \ldots, K\}$, are achievable if and only if the same values are achievable for $\mathrm{E}_{h_k^{mac}|y_T}[\mathrm{COR}_k^{mac}(g_k^{mac}(h_k^{mac}), u_k^{mac}, p^{mac}; h_k^{mac})]$, $\forall k \in \{1, 2, \ldots, K\}$, with $\|p^{mac}\|_2^2 = \|P\|_F^2$.
To prove this result we construct a transformation of the filters from MAC to BC [221], which yields identical performance measures, and

$$
i_k^{mac} = \sum_{\substack{i=1 \\ i \ne k}}^{K} \sqrt{\mathrm{E}_{h_i^{mac}|y_T}\!\left[|u_k^{mac,T} h_i^{mac}|^2\right]}\; p_i^{mac} d_i^{mac}. \tag{4.125}
$$

Assuming complex Gaussian signaling this yields the dual MAC rate

$$
R_k^{mac}(p^{mac}, u_k^{mac}) = \log_2\!\left[1 + \mathrm{SINR}_k^{mac}(p^{mac}, u_k^{mac})\right] \tag{4.126}
$$

which is a function of $\mathrm{SINR}_k^{mac}(p^{mac}, u_k^{mac})$. Implicitly, we assume a separate decoding of the $K$ data streams in the dual MAC.

⁴² If the transmit and receive filters are zero simultaneously, the receiver's performance measure is equal to one and duality is obvious. Therefore, we can exclude these data streams from the following duality and treat them separately.
$$
\mathrm{E}_{h_k^{mac}|y_T}\!\left[\mathrm{Re}\!\left(g_k^{mac}(h_k^{mac})\, u_k^{mac,T} h_k^{mac} p_k^{mac}\right)\right]
= \mathrm{E}_{h_k|y_T}\!\left[\mathrm{Re}\!\left(g_k(h_k)\, h_k^T p_k\right)\right]. \tag{4.128}
$$

$$
h_k^{mac} = h_k\, c_{n_k}^{-1/2} \tag{4.132}
$$
$$
g_k^{mac}(h_k^{mac}) = g_k(h_k)\,\mathrm{E}_{h_k|y_T}\!\left[|g_k(h_k)|^2\right]^{-1/2}, \tag{4.133}
$$

i.e., the dual MAC channel $H^{mac} = H^T C_n^{-1/2}$ is not reciprocal to the BC channel $H$.
Now, we have to choose $\{\xi_k\}_{k=1}^{K}$ in order to guarantee (4.127). First, we rewrite

$$
W \xi = l, \tag{4.136}
$$

where $l = \left[\|p_1\|_2^2, \|p_2\|_2^2, \ldots, \|p_K\|_2^2\right]^T$ and

$$
[W]_{u,v} = \begin{cases}
-\mathrm{E}_{h_v|y_T}\!\left[|g_v(h_v)|^2\right] p_u^H R_{h_v|y_T}^* p_u, & u \ne v \\[4pt]
\mathrm{E}_{h_v|y_T}\!\left[|g_v(h_v)|^2\right] \bar{c}_{r_v} - \mathrm{E}_{h_v|y_T}\!\left[|g_v(h_v)|^2\right] p_v^H R_{h_v|y_T}^* p_v, & u = v
\end{cases}. \tag{4.137}
$$

Because of

$$
[W]_{v,v} > \sum_{u \ne v} \left|[W]_{u,v}\right| = \sum_{u \ne v} \mathrm{E}_{h_v|y_T}\!\left[|g_v(h_v)|^2\right] p_u^H R_{h_v|y_T}^* p_u, \tag{4.138}
$$
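The dominance condition (4.138) makes $W$ a matrix with positive diagonal, non-positive off-diagonal entries, and strict (column) diagonal dominance, i.e., a nonsingular M-matrix with elementwise nonnegative inverse, so $W\xi = l$ has a unique nonnegative solution for $l \ge 0$. A sketch with synthetic data (the matrix here is randomly constructed, not derived from a channel):

```python
import numpy as np

# Synthetic strictly diagonally dominant M-matrix W (cf. (4.137)/(4.138)):
# negative off-diagonals, diagonal exceeding the column off-diagonal sums.
rng = np.random.default_rng(3)
K = 4
W = -rng.uniform(0.1, 0.5, (K, K))            # negative off-diagonals
np.fill_diagonal(W, 0.0)
np.fill_diagonal(W, np.abs(W).sum(axis=0) + rng.uniform(0.5, 1.0, K))
l = rng.uniform(0.5, 2.0, K)                  # l_k = ||p_k||_2^2
xi2 = np.linalg.solve(W, l)                   # scaling factors xi_k^2
print(np.all(xi2 > 0))                        # -> True (M-matrix property)
```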
Fig. 4.24 Dual model with MAC structure corresponding to the BC model in Figure 4.23.

As dual MAC we choose the structure in Figure 4.24 with P-CSI at the receiver $G^{mac} = [g_1^{mac}, g_2^{mac}, \ldots, g_K^{mac}]^T$ and at the transmitter.⁴⁵ The conditional mean estimate of $\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; h^{mac}) = \mathrm{E}_{n^{mac}, d^{mac}}[|\hat{d}_k^{mac} - d_k^{mac}|^2]$

⁴⁵ Here we assume P-CSI at the transmitter instead of only S-CSI as in (4.65).
$$
\cdots + g_k^{mac,H}\left(\sum_{i=1}^{K} p_i^{mac,2}\,\mathrm{E}_{h_i^{mac}|y_T}\!\left[h_i^{mac,*} h_i^{mac,T}\right]\right) g_k^{mac}
- 2\,\mathrm{Re}\!\left(g_k^{mac,T}\,\mathrm{E}_{h_k^{mac}|y_T}\!\left[h_k^{mac}\right] p_k^{mac}\right) \tag{4.140}
$$
The power allocation is linked to the noise variance after the receive filters as

$$
\xi_k^2\, c_{n_k} |\beta_k|^2\,\mathrm{E}_{h_k|y_T}\!\left[|g_k(h_k)|^2\right] = |p_k^{mac}|^2 \tag{4.143}
$$
$$
\xi_k^2\,\|g_k^{mac}\|_2^2 = \|p_k\|_2^2. \tag{4.144}
$$

$$
p_k^{mac} = c_{n_k}^{1/2}\,\beta_k\,\xi_k\,\mathrm{E}_{h_k|y_T}\!\left[|g_k(h_k)|^2\right]^{1/2}, \qquad
g_k^{mac} = \xi_k^{-1} p_k, \qquad
h_k^{mac} = g_k(h_k)\, h_k\,\mathrm{E}_{h_k|y_T}\!\left[|g_k(h_k)|^2\right]^{-1/2} c_{n_k}^{-1/2}. \tag{4.145}
$$

The detailed proof and the linear system of equations to compute $\{\xi_k^2\}_{k=1}^{K} \in \mathbb{R}_{+,0}$ are given in Appendix D.5.
Maximizing throughput (sum rate) or minimizing sum MSE does not pro-
vide any quality of service (QoS) guarantees to the receivers. For example,
receivers with small SINR are not served for small Ptx . But very often a QoS
requirement is associated with a data stream and it may be different for every
receiver. We consider QoS measures which can be formulated or bounded in
terms of the mean MSE (see Section 4.3) and give a concise overview of the
related problems. The most common algorithms for their solution rely on the
BC-MAC dualities from Section 4.5.
In Section 4.3, the minimum MSE, i.e., MMSEk (P ) (4.43), for the AWGN
BC model turned out to be the most convenient measure among all presented
alternatives.
One standard problem is to minimize the required resources, i.e., the total transmit power, while providing the desired QoS $\gamma_k$ to every receiver. This power minimization problem reads

Because of the one-to-one relation (4.43) with $\mathrm{SINR}_k(P)$ (4.36), we can transform the QoS requirements $\gamma_k$ on $\mathrm{MMSE}_k(P)$ into the equivalent SINR requirements $\gamma_k^{\mathrm{sinr}} = 1/\gamma_k - 1$. There are two standard approaches for solving (4.146):

• The problem is relaxed to a semidefinite program [19], whose solution is shown to coincide with the argument of (4.146).
• The dual MAC problem is solved iteratively, alternating between power allocation and beamforming at the receiver $U^{mac}$. The algorithm by Schubert and Boche [185] converges to the global optimum.

In the sequel, we focus on the second solution based on the MAC-BC duality.
4.6 Optimization with Quality of Service Constraints 177
which can now be solved with the algorithm in [185]. Note that (4.146) may be infeasible, which is detected by the algorithm.

Alternatively, we can maximize the margin to the QoS requirement $\gamma_k$ for a given total power constraint $P_{tx}$:

$$
\mathrm{MMSE}_k(P) \le \gamma_0 \gamma_k, \quad k = 1, 2, \ldots, K. \tag{4.148}
$$

If this problem is feasible, the optimum variable $\gamma_0$ describing the margin lies in $0 \le \gamma_0 \le 1$. Otherwise, the problem is infeasible. The solution is equivalent to minimizing the normalized performance of the worst receiver

$$
\min_{P} \max_{k \in \{1, \ldots, K\}} \frac{\mathrm{MMSE}_k(P)}{\gamma_k} \quad \text{s.t.} \quad \|P\|_F^2 \le P_{tx}, \tag{4.149}
$$

which can be shown [185] to yield a balancing of the relative MSEs, $\mathrm{MMSE}_k(P)/\gamma_k = \gamma_0$. For unequal $\gamma_k$, its optimum differs from balancing $\mathrm{SINR}_k(P)$.
The corresponding problem in the dual MAC is

For incomplete CSI at the receivers, which results from a common training channel in the forward link (Section 4.2.2), we proposed to model the receivers' signal processing capabilities by $\beta_k g_k(h_k)$ in Section 4.5.2.

For S-CSI, the corresponding power minimization problem is

$$
\min\ \|p^{mac}\|_2^2 \tag{4.151}
$$
$$
\text{s.t.} \quad \mathrm{E}_{h_k^{mac}}\!\left[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; h_k^{mac})\right] \le \gamma_k, \quad k = 1, \ldots, K. \tag{4.152}
$$
$$
= 1 - p_k^{mac,2}\,\mathrm{E}_{h_k^{mac}}\!\left[h_k^{mac}\right]^H \left(I_M + \sum_{i=1}^{K} p_i^{mac,2}\, R_{h_i^{mac}}\right)^{-1} \mathrm{E}_{h_k^{mac}}\!\left[h_k^{mac}\right] \tag{4.153}
$$
With the argumentation from [236, App. II], we identify an interference function as defined by Yates [239]:⁴⁶

$$
J_k\!\left(\{p_i^{mac,2}\}_{i=1}^{K}; \gamma_k\right)
= \frac{1 - \gamma_k}{\mathrm{E}_{h_k^{mac}}\!\left[h_k^{mac}\right]^H \left(I_M + \sum_{i=1}^{K} p_i^{mac,2}\, R_{h_i^{mac}}\right)^{-1} \mathrm{E}_{h_k^{mac}}\!\left[h_k^{mac}\right]}. \tag{4.154}
$$

⁴⁶ The dual MAC channel $h_k^{mac}$ for the common training channel defined in (4.145) is not zero-mean.
Yates [239] showed that the fixed point iteration on this equation converges to a unique fixed point and minimizes the total power $\|p^{mac}\|_2^2$ if the problem is feasible: Based on $\{p_i^{mac,(m-1),2}\}_{i=1}^{K}$ we compute $p_k^{mac,(m),2} = J_k(\{p_i^{mac,(m-1),2}\}_{i=1}^{K}; \gamma_k)$ in the $m$th iteration. But its convergence is slow and, for example, an approach similar to [185] as in [7] is recommended.
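The fixed point iteration can be sketched as follows; the function name and interface are assumptions for illustration, implementing the update $p_k^2 \leftarrow J_k(\{p_i^2\}; \gamma_k)$ from (4.154):

```python
import numpy as np

def yates_power_iteration(h_bar, R, gamma, n_iter=1000, tol=1e-10):
    """Sketch of the fixed point iteration on the interference function
    (4.154). h_bar[k]: mean E[h_k^mac] (length M); R[k]: correlation
    matrix R_{h_k^mac} (M x M); gamma[k]: MSE target in (0, 1).
    Returns the per-user powers p_k^2 (minimal if targets are feasible)."""
    K, M = len(h_bar), h_bar[0].shape[0]
    p2 = np.zeros(K)                        # start from zero power
    for _ in range(n_iter):
        Cinv = np.linalg.inv(np.eye(M)
                             + sum(p2[i] * R[i] for i in range(K)))
        p2_new = np.array([(1 - gamma[k]) /
                           (h_bar[k].conj() @ Cinv @ h_bar[k]).real
                           for k in range(K)])
        if np.max(np.abs(p2_new - p2)) < tol:
            return p2_new
        p2 = p2_new
    return p2

# single-user sanity check: with E[h] = 1, R = 1, gamma = 0.5 the fixed
# point satisfies p^2 = 0.5 * (1 + p^2), i.e., p^2 = 1
p2 = yates_power_iteration([np.array([1.0])], [np.array([[1.0]])], [0.5])
print(p2)   # -> approx [1.]
```

The slow (linear) convergence visible even in this scalar example is the reason the faster approach of [185]/[7] is preferred in practice.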
Similar to the AWGN BC model, we can also maximize the MSE margin in the BC under a total power constraint:

$$
\mathrm{E}_{h_k^{mac}}\!\left[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; h_k^{mac})\right] \le \gamma_0 \gamma_k, \quad k = 1, \ldots, K.
$$
Introduction to Chapter
Linear precoding with independent coding of the receivers’ data does not
achieve the capacity of the vector broadcast channel (BC) for C-CSI at the
transmitter. It is achieved by successive encoding with dirty paper coding
[232]. At high SNR, the achievable sum rate of linear precoding has the
same slope (multiplexing gain) as dirty paper coding (e.g., [32, 106]). But
there is an additional power offset, which increases with the ratio of the
number of receivers to the number of transmit antennas K/M [106]. Perhaps
surprisingly, the sum capacity of the broadcast channel with non-cooperative
receivers is asymptotically (high transmit power) identical to the capacity of
the same channel, but with cooperating receivers (point-to-point multiple-
input multiple-output (MIMO) channel).
For dirty paper coding, the data streams for all receivers are jointly and
sequentially coded: The interference from previously encoded data streams is
known causally at the transmitter; thus, this knowledge can be used to code
the next data stream such that its achievable data rate is the same as if the
interference from previously coded data streams was not present (Writing on
dirty paper [39]). Some practical aspects and an algorithm with performance close to capacity, which relies on dirty paper coding, are discussed in [210].
An implementation of writing on dirty paper for a single-input single-output
(SISO) system with interference known at the transmitter is given by Erez
and ten Brink [65].
The nonlinear precoding schemes considered in this chapter attempt to
fill the performance gap between linear precoding and dirty paper coding,
but with a smaller computational complexity than dirty paper coding. They
are suboptimum, because they do not code the data sequence in time, i.e.,
they are “one-dimensional” [241].
The most prominent representative is Tomlinson-Harashima precoding (THP), which is proposed for the broadcast channel in [82, 241, 71]; originally, it was invented for mitigating intersymbol interference [213, 89].
182 5 Nonlinear Precoding with Partial Channel State Information
Fig. 5.1 Broadcast channel with nonlinear receivers, which include a modulo oper-
ator M(b̂k [n]) (Figure 5.3).
Besides the development for the broadcast channel, THP based on P-CSI
for cooperative receivers is treated in [135] for a SISO system. A heuristic
solution is proposed for MIMO in [72]; for S-CSI another approach is given
in [190].
Outline of Chapter

First, we derive the structure of the nonlinear vector precoder in Section 5.1. This includes the derivation of MMSE vector precoding and its relation to MMSE THP for C-CSI as an example. Optimization of nonlinear precoding with P-CSI requires new performance measures, which we introduce in Section 5.2 using the receiver models in analogy to linear precoding with P-CSI (Section 4.3). Conceptually, we also distinguish between vector precoding and THP. For these performance measures, we derive iterative algorithms optimizing the overall system performance, e.g., minimizing the sum MSE, and discuss the results for different examples (Section 5.3). For S-CSI, nonlinear precoding can be understood as nonlinear beamforming for communications. Precoding for the training channel, which provides the receivers with the necessary CSI, is addressed in Section 5.4. A detailed discussion of the performance for P-CSI and S-CSI, based on the uncoded bit error rate (BER), is given in Section 5.5.
Fig. 5.2 Broadcast channel with nonlinear receivers in matrix-vector notation and
representation of modulo operation by an auxiliary vector â [n] ∈ LK .
Fig. 5.4 Voronoi region V (shaded area), lattice L (filled circles), QPSK constellation D with variance one (4 different markers), and periodically repeated QPSK constellation for perturbations a ∈ L.
Contrary to the modulo operation, the quantizer will not influence our precoder design.4
The transmit signal x_tx[n] ∈ C^M is a nonlinear function of the modulated data d[n] ∈ D^K. To restrict the functional structure of the nonlinear precoder and to introduce a parameterization, we have to analyze the properties of the modulo operators at the receivers, which generate a nonlinear modulo channel.
The modulo operation (Figure 5.3) on x ∈ C can also be expressed as

    M(x) = x − τ ⌊Re(x)/τ + 1/2⌋ − jτ ⌊Im(x)/τ + 1/2⌋ ∈ V,   (5.1)

where the floor operation ⌊•⌋ gives the largest integer smaller than or equal to the argument and V = {x + jy | x, y ∈ [−τ/2, τ/2)} is the fundamental Voronoi region5 of the lattice L = τZ + jτZ corresponding to the modulo operation (Figure 5.4).
In the matrix-vector notation of the BC (Figure 5.2), the modulo operation on a vector M(b̂[n]) is defined element-wise, i.e., M(b̂[n]) = [M(b̂_1[n]), . . . , M(b̂_K[n])]^T ∈ V^K. As in Figure 5.2, the modulo operation can also be formulated with an auxiliary vector â[n] ∈ L^K on the 2K-dimensional lattice L^K = τZ^K + jτZ^K.
The representation with an auxiliary variable a ∈ L shows an important
property of the modulo operator: For any x ∈ C, we have M(x + a) = M(x)
4 In Figure 4.3 the symbol estimates d̂_k[n] ∈ C are complex numbers; the quantizer is omitted, because the linear precoder based on MSE is optimized for the linear model.
5 We introduce the common terminology for completeness, but do not require a deeper knowledge of lattice theory.
if a ∈ L. The precoder can generate any signal b̂_k[n] + a_k[n] at the receiver with a_k[n] ∈ L, which yields the same signal at the modulo operator's output as b̂_k[n]. For example, the estimates of the transmitted data d̂_k[n] are identical to d_k[n] if the input of the modulo operator is b̂_k[n] = d_k[n] + a_k[n] for a_k[n] ∈ L. Because of this property, it is sufficient to optimize the precoder such that b̂_k[n] ≈ d_k[n] + a_k[n] (virtual desired signal [113]) is achieved. These additional degrees of freedom can be exploited by the precoder.
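To make the operator concrete, a small NumPy sketch (not part of the original development; the function name mod_op and the test values are ours) implements (5.1) and checks the invariance M(x + a) = M(x) for lattice points a ∈ L:

```python
import numpy as np

TAU = 2 * np.sqrt(2)  # modulo base tau for unit-variance QPSK

def mod_op(x, tau=TAU):
    """Complex modulo operator M(x) of (5.1): maps x onto the fundamental
    Voronoi region V = [-tau/2, tau/2) x [-tau/2, tau/2)."""
    x = np.asarray(x)
    return (x - tau * np.floor(x.real / tau + 0.5)
              - 1j * tau * np.floor(x.imag / tau + 0.5))

# QPSK constellation D with variance one
qpsk = np.exp(1j * np.pi / 4 * np.array([-3, -1, 1, 3]))

# property M(x + a) = M(x) for any lattice point a in L = tau*Z + j*tau*Z
rng = np.random.default_rng(0)
x = rng.normal(size=100) + 1j * rng.normal(size=100)
a = TAU * (rng.integers(-3, 4, size=100) + 1j * rng.integers(-3, 4, size=100))
assert np.allclose(mod_op(x + a), mod_op(x))

# D is a subset of V: data symbols pass through the modulo operator unchanged
assert np.allclose(mod_op(qpsk), qpsk)
```

The two assertions reproduce exactly the two facts used in the text: the modulo channel is insensitive to lattice perturbations, and undisturbed data symbols are left untouched.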
This observation allows for an optimization of the precoder based on the linear part of the models in Figures 5.1 and 5.2. This is not necessary, but it simplifies the design process considerably, since the alternative is an optimization including the nonlinearity of the modulo operator or, additionally, the quantizer.
The corresponding structure of the transmitter is given in Figure 5.5, which
is referred to as vector precoder [184]. Its degrees of freedom are a linear
precoder T ∈ CM ×K and a perturbation vector a[n] ∈ LK on the lattice.
To define the receivers it remains to specify the parameter τ. It is chosen sufficiently large such that D ⊂ V. Because of this restriction on τ, no input from D is changed by the modulo operation, and the nearest-neighbor quantizer Q(•), which we have introduced already, is applicable. We choose τ such that the symbol constellation seen by the receiver in b̂_k[n] without noise (c_{n_k} = 0) and distortions (e.g., GHT = I_K) is a periodic extension of D on C (Figure 5.4): For QPSK symbols (D = {exp(jµπ/4), µ ∈ {−3, −1, 1, 3}}), this requires τ = 2√2; for rectangular 16QAM symbols with variance c_{d_k} = 1, it requires τ = 8/√10.
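These choices of τ can be verified numerically (a small sketch of our own; the periodicity check compares the symbol spacing across the Voronoi boundary with the spacing inside D):

```python
import numpy as np

# unit-variance QPSK and rectangular 16QAM with their modulo bases tau
qpsk = np.exp(1j * np.pi / 4 * np.array([-3, -1, 1, 3]))
re, im = np.meshgrid([-3, -1, 1, 3], [-3, -1, 1, 3])
qam16 = (re + 1j * im).ravel() / np.sqrt(10)

for D, tau in [(qpsk, 2 * np.sqrt(2)), (qam16, 8 / np.sqrt(10))]:
    assert np.isclose(np.mean(np.abs(D)**2), 1.0)   # variance c_dk = 1
    assert np.all(np.abs(D.real) < tau / 2)         # D lies inside V
    assert np.all(np.abs(D.imag) < tau / 2)
    # a lattice translate d + tau continues the grid seamlessly: the gap
    # across the Voronoi boundary equals the in-constellation spacing
    gap = tau - 2 * np.max(D.real)
    assert np.isclose(gap, np.min(np.diff(np.unique(D.real))))
```

The last assertion is exactly the "periodic extension" property of Figure 5.4: neighboring replicas of D tile the complex plane without visible seams.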
In the remainder of this section, we introduce the system models for non-
linear precoding (vector precoding and Tomlinson-Harashima precoding) of
the data for the example of C-CSI at the transmitter and identical receivers
G = βI K , which allows for an explicit solution [184].
Vector Precoding
The MSE for receiver k is defined between the input of the modulo operator b̂_k[n] and the desired signal b_k[n] = d_k[n] + a_k[n]:

    MSE_k^VP(T, {a[n]}_{n=1}^{N_d}, β; h_k) = (1/N_d) Σ_{n=1}^{N_d} E_{n_k}[ |b̂_k[n] − b_k[n]|² ],   (5.2)
where the expectation is performed w.r.t. the noise n_k[n] and the arithmetic mean considers the given data d[n]. The dependency on {d[n]}_{n=1}^{N_d} is not denoted explicitly.
To optimize the performance of the total system we minimize the sum MSE (similar to Section 4.3.4). The optimization problem for the precoder with a constraint on the average transmit power for the given data block is

    min_{T, {a[n]}_{n=1}^{N_d}, β} Σ_{k=1}^{K} MSE_k^VP(T, {a[n]}_{n=1}^{N_d}, β; h_k)   s.t.   tr[T R̃_b T^H] ≤ P_tx,   (5.3)

where R̃_b = (1/N_d) Σ_{n=1}^{N_d} b[n] b[n]^H is the sample correlation matrix of b[n].
With

    Σ_{k=1}^{K} MSE_k^VP(T, {a[n]}_{n=1}^{N_d}, β; h_k) = (1/N_d) Σ_{n=1}^{N_d} E_n[ ‖b̂[n] − b[n]‖² ]   (5.4)

and b̂[n] = βHT b[n] + βn[n] we obtain the explicit expression for the cost function

    Σ_{k=1}^{K} MSE_k^VP(T, {a[n]}_{n=1}^{N_d}, β; h_k) = tr[R̃_b] + |β|² tr[C_n]
        + |β|² tr[H^H H T R̃_b T^H] − 2Re(β tr[H T R̃_b]).   (5.5)
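The expansion (5.5) can be checked numerically against a direct evaluation of the block average in (5.4) (a NumPy sketch with arbitrarily drawn H, T, β; the noise expectation enters analytically as |β|² tr[C_n]):

```python
import numpy as np

rng = np.random.default_rng(8)
K = M = 3
Nd = 50
beta = 0.8 + 0.1j
H = rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))
T = rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))
b = rng.normal(size=(K, Nd)) + 1j * rng.normal(size=(K, Nd))
Cn = 0.1 * np.eye(K)

Rb = b @ b.conj().T / Nd                      # sample correlation matrix (5.3)
# closed-form sum MSE (5.5)
closed = (np.trace(Rb) + abs(beta)**2 * np.trace(Cn)
          + abs(beta)**2 * np.trace(H.conj().T @ H @ T @ Rb @ T.conj().T)
          - 2 * (beta * np.trace(H @ T @ Rb)).real).real
# direct block average (5.4); the noise expectation contributes |beta|^2 tr[C_n]
err = beta * H @ T @ b - b
direct = np.linalg.norm(err)**2 / Nd + abs(beta)**2 * np.trace(Cn).real
assert np.isclose(closed, direct)
```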
with

    P = β^{−1} H^H Π^{(O),T} L^H ∆   (5.11)
    F = I_K − L^{−1}.   (5.12)

    min_{a[n] ∈ L^K} ‖∆^{1/2} L Π^{(O)} (d[n] + a[n])‖²₂.   (5.13)

    a^{(O)}[n] = [a_{p_1}[n], . . . , a_{p_{i−1}}[n], ā_{p_i}[n], ∗, . . . , ∗]^T.   (5.15)

    e_i^T ∆^{1/2} L = δ_i^{1/2} e_i^T (I_K − F)^{−1}
                    = δ_i^{1/2} e_i^T (I_K − F + F)(I_K − F)^{−1}
                    = δ_i^{1/2} e_i^T (I_K + F (I_K − F)^{−1}).   (5.16)
Defining the signal w[n] = (I_K − F)^{−1} Π^{(O)} b[n], problem (5.14) can be expressed in terms of a quantizer Q_L(x) to the point on the 2-dimensional lattice L closest to x ∈ C:

    a_{p_i}[n] = argmin_{ā_{p_i}[n] ∈ L} δ_i |d_{p_i}[n] + ā_{p_i}[n] + e_i^T F w[n]|²
               = −Q_L(d_{p_i}[n] + e_i^T F w[n]).   (5.17)
and the output of the feedback loop in Figure 5.6 can be expressed by the modulo operation

    w[n] = Π^{(O)} b[n] + F w[n] = M(Π^{(O)} d[n] + F w[n]).   (5.19)
This relation yields the representation of the vector precoder in Figure 5.7, which is the well-known structure of THP for the BC. It is equivalent to Figures 5.5 and 5.6 if the lattice search for a[n] is restricted as in (5.14).
Note that the sample correlation matrix R̃_w = (1/N_d) Σ_{n=1}^{N_d} w[n] w[n]^H of w[n] is not diagonal in general.
7 This can be proved by comparing (5.11) and (5.12) with the solutions given in [127].
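The equivalence between the successive feedback loop and the lattice-perturbation view of (5.19) can be checked numerically (a sketch of our own, assuming the identity ordering Π^{(O)} = I_K and an arbitrary strictly lower triangular F):

```python
import numpy as np

tau = 2 * np.sqrt(2)
K = 4

def mod_op(x):
    """Modulo operator M(x) of (5.1)."""
    return (x - tau * np.floor(np.real(x) / tau + 0.5)
              - 1j * tau * np.floor(np.imag(x) / tau + 0.5))

rng = np.random.default_rng(2)
# spatially causal feedback filter: strictly lower triangular F
F = 0.3 * np.tril(rng.normal(size=(K, K)) + 1j * rng.normal(size=(K, K)), -1)
d = np.exp(1j * np.pi / 4 * rng.choice([-3, -1, 1, 3], size=K))  # QPSK data

# successive evaluation of the feedback loop (5.19): w_i = M(d_i + (F w)_i)
w = np.zeros(K, dtype=complex)
for i in range(K):
    w[i] = mod_op(d[i] + F[i, :i] @ w[:i])

# vector-precoding view: (I - F) w = d + a with a on the lattice L^K
a = (np.eye(K) - F) @ w - d
assert np.allclose(np.round(a.real / tau), a.real / tau)
assert np.allclose(np.round(a.imag / tau), a.imag / tau)
```

The final assertions confirm that the modulo operations implicitly select a perturbation vector a[n] ∈ L^K, i.e., the restricted lattice search of (5.17).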
5.1 From Vector Precoding to Tomlinson-Harashima Precoding 191
with Φ_K^{−1} = Φ^{−1} and Φ_{K−1}^{−1} = Ψ_K − l_K l_K^H / δ_K.
In the first iteration we are free to choose p_K in the permutation matrix Π^{(O)}, i.e., δ_K corresponding to the maximum or minimum element on the diagonal of Φ^{−1}. Thus, in the first step we decide for the data stream to be precoded last. We choose p_K as the index of the smallest diagonal element of Φ^{−1}, which has the smallest contribution (5.14) in (5.13): The best data stream is precoded last. Similarly, we proceed with Φ_{K−1}^{−1}.
This choice of the best data stream becomes clear in the following example:
Example 5.2. The first column of P (5.11) is p_1 = β^{−1} h*_{p_1} δ_1, which is a matched filter w.r.t. the channel of the p_1th receiver precoded first. On the other hand, the last column of P is p_K = β^{−1} H^H Π^{(O),T} L^H e_K δ_K. At high P_tx we have H p_K = β^{−1} Π^{(O),T} L^{−1} e_K = β^{−1} e_{p_K}, because

    p_K = β^{−1} H^H Π^{(O),T} L^H ∆ e_K
        = β^{−1} H^H Φ^{−1} Π^{(O),T} L^{−1} e_K
        = β^{−1} H^H (H H^H)^{−1} Π^{(O),T} L^{−1} e_K

from (5.9) for K ≤ M: Interference from the p_Kth data stream to all other receivers is avoided; the performance gain in MSE is largest if we choose the receiver with the best channel to be precoded last, i.e., it creates the smallest contribution to the MSE. ⊓⊔
Because the precoding order is determined while computing the symmetrically permuted Cholesky factorization (5.9), the computational complexity, which results from computing the inverse Φ^{−1} and the Cholesky factorization (5.9), is of order O(K³). Note that this is the same order of complexity as for linear precoding in Section 4.4 (see [127] for details).
In the previous section we introduced the system models for nonlinear precoding (vector precoding in Figure 5.5 and THP in Figure 5.7) assuming C-CSI at the transmitter and identical receivers G = βI_K. In the sequel we consider the general case:
• We assume P-CSI at the transmitter (Table 4.1) based on the model of
the reverse link training channel in a TDD system (Section 4.1) and
• different scaling at the receivers G = diag[g_1, . . . , g_K].
Optimizing the scaling g_k prior to the modulo operation at the receivers according to the MSE criterion is shown in [42] to be necessary to reach the Shannon capacity of the AWGN channel on a single link with a modulo receiver (modulo channel) as in Figure 5.1. Although the MSE criterion is thus necessary for optimizing the receivers, a capacity-achieving precoding (coding) is not based on MSE.
Because only a few information-theoretic results are known for the broadcast channel with modulo receivers or P-CSI at the transmitter, the following argumentation is pursued from the point of view of estimation theory based on MSE. The steps are motivated by linear precoding in Section 4.3.
We first discuss the cost functions for optimization of nonlinear precoding with P-CSI assuming the optimum MMSE scaling at the receivers (Section 5.2.1). To obtain a performance criterion which can be given explicitly, we assume a scaled matched filter at the receivers in Section 5.2.2 for optimization of precoding. Optimization of vector precoding (Figure 5.5) and of THP (Figure 5.7) is treated separately.
5.2 Performance Measures for Partial Channel State Information 193
Note that a relation between vector precoding and THP like the one we presented in Section 5.1 has not yet been found for the general cost functions below.
Vector Precoding
Given h_k, the MSE between the signal at the input of the modulo operator b̂_k[n] = g_k h_k^T T b[n] + g_k n_k[n] (Figure 5.1) and the (virtual) desired signal b_k[n] = d_k[n] + a_k[n] (Figure 5.5) is

    MSE_k^VP(T, {a[n]}_{n=1}^{N_d}, g_k; h_k) = (1/N_d) Σ_{n=1}^{N_d} E_{n_k}[ |b̂_k[n] − b_k[n]|² ]   (5.22)
        = r̃_{b_k} + |g_k|² c_{n_k} + |g_k|² h_k^T T R̃_b T^H h_k^* − 2Re(g_k h_k^T T r̃_{bb_k}).   (5.23)

Minimization w.r.t. g_k gives the MMSE

    MMSE_k^VP(T, {a[n]}_{n=1}^{N_d}; h_k) = r̃_{b_k} − |h_k^T T r̃_{bb_k}|² / (h_k^T T R̃_b T^H h_k^* + c_{n_k}).   (5.24)
The corresponding MMSE receiver is realizable if h_k^T T r̃_{bb_k} can be estimated at the receivers, which is discussed in Section 5.4. The denominator in (5.24) is the variance of the receive signal c_{r_k}.
Optimization of vector precoding is based on the conditional mean estimate of (5.23) (compare Section 4.3.1 and Appendix C)

    E_{h_k|y_T}[ MMSE_k^VP(T, {a[n]}_{n=1}^{N_d}; h_k) ].   (5.25)
An explicit expression is not known due to the ratio of quadratic forms with
correlated non-zero mean random vectors hk in (5.23).8
Tomlinson-Harashima Precoding
The MSE for THP (Figure 5.7) is derived from its equivalent representation in Figure 5.6. The nonlinearity of the precoder, i.e., the modulo operators or, equivalently, the restricted lattice search, is described by a stochastic model for the output w[n] of the modulo operators at the transmitter. In [69] it is shown that the outputs can be approximated as statistically independent, zero-mean, and uniformly distributed on V, which results in the covariance matrix

    C_w = E[w[n] w[n]^H] = diag[c_{w_1}, . . . , c_{w_K}]   (5.26)

with c_{w_1} = c_{d_1} = 1 and c_{w_k} = τ²/6, k > 1 (w_k[n] is the kth element of w[n]).
It is a good model for C-CSI and zero-forcing THP (MMSE THP at high Ptx )
[108, 69], where the approximation becomes more exact as the constellation
size of the modulation alphabet increases. We make this assumption also for
the MSE criterion and P-CSI, where it is ad-hoc. Its validity is discussed in
Section 5.5.
The feedback filter F is constrained to be lower triangular with zero diagonal to be realizable; this requirement is often referred to as spatial causality.
where the expectation is now also taken w.r.t. w[n]. We define the inverse function of k = p_i as i = o_k, i.e., the kth receiver's data stream is the o_kth data stream to be precoded, and the response of the permutation matrix is e_{o_k} = Π^{(O)} e_k and e_{p_i} = Π^{(O),T} e_i. Moreover, the ith row of F is denoted f̃_i^T.
With the expressions for the signals related to the kth receiver’s data
stream
8 This is also the case for linear precoding (cf. (4.31)).
the MSE is

    MSE_k^THP(P, f̃_{o_k}, g_k; h_k) = |g_k|² c_{n_k} + E_w[ |(g_k h_k^T P − e_{o_k}^T + f̃_{o_k}^T) w[n]|² ]
        = |g_k|² c_{n_k} + Σ_{i=1}^{K} c_{w_i} |g_k h_k^T p_i − e_{o_k}^T e_i + f_{o_k,i}|²   (5.31)

with f_{o_k,i} = [F]_{o_k,i} (n[n] and w[n] are uncorrelated). Before treating the case of P-CSI, we discuss the MSE for C-CSI as an example.
Example 5.3. Because transmitter and receiver have the same CSI (C-CSI), we can start optimizing for f̃_{o_k} keeping O, P, and g_k fixed. Minimizing (5.31) w.r.t. f̃_{o_k} under the constraint that F is lower triangular with zero diagonal, we obtain f_{o_k,i} = −g_k h_k^T p_i, i < o_k, and f_{o_k,i} = 0, i ≥ o_k. The feedback filter for the kth data stream removes all interference from previously encoded data streams in the cost function, because

    f̃_{o_k}^T = [ −g_k h_k^T [p_1, . . . , p_{o_k−1}], 0_{1×(K−o_k+1)} ].   (5.32)
The minimum is

    min_{f̃_{o_k}} MSE_k^THP(P, f̃_{o_k}, g_k; h_k) = c_{w_{o_k}} + |g_k|² c_{n_k}
        + Σ_{i=o_k}^{K} c_{w_i} |g_k|² |h_k^T p_i|² − 2 c_{w_{o_k}} Re(g_k h_k^T p_{o_k}).   (5.33)
Minimizing additionally w.r.t. g_k, the minimum MSE can be expressed with the SINR

    SINR_k^THP(P; h_k) = c_{w_{o_k}} |h_k^T p_{o_k}|² / ( Σ_{i=o_k+1}^{K} c_{w_i} |h_k^T p_i|² + c_{n_k} )   (5.35)

as

    min_{f̃_{o_k}, g_k} MSE_k^THP(P, f̃_{o_k}, g_k; h_k) = c_{w_{o_k}} / (1 + SINR_k^THP(P; h_k)).   (5.36)
Therefore, the interference from other data streams, which has already been removed by f̃_{o_k}, is no longer considered for the optimization of the forward filter P.
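The relation (5.36) between the residual MSE (5.33) and the SINR (5.35) can be verified numerically (a sketch with randomly drawn h_k and P, 0-based step indexing, and the QPSK variances c_{w_i}; the optimal g_k below is our closed-form minimizer of the quadratic (5.33)):

```python
import numpy as np

rng = np.random.default_rng(2)
K = M = 3
ok = 1                              # receiver k's stream precoded in step ok (0-based)
cn = 0.1
cw = np.array([1.0, 4 / 3, 4 / 3])  # c_w1 = 1, c_wk = tau^2/6 for QPSK
hk = rng.normal(size=M) + 1j * rng.normal(size=M)
P = rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))
hp = hk @ P                         # h_k^T p_i for all precoding steps i

def mse(g):
    """Residual MSE (5.33) after the optimal feedback filter (5.32)."""
    return (cw[ok] + abs(g)**2 * cn
            + sum(cw[i] * abs(g)**2 * abs(hp[i])**2 for i in range(ok, K))
            - 2 * cw[ok] * (g * hp[ok]).real)

sinr = cw[ok] * abs(hp[ok])**2 / (
    sum(cw[i] * abs(hp[i])**2 for i in range(ok + 1, K)) + cn)
g_opt = cw[ok] * np.conj(hp[ok]) / (
    cn + sum(cw[i] * abs(hp[i])**2 for i in range(ok, K)))

assert np.isclose(mse(g_opt), cw[ok] / (1 + sinr))   # relation (5.36)
assert mse(1.01 * g_opt) >= mse(g_opt)               # g_opt is the minimizer
```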
Because of the structure of SINR_k^THP(P; h_k), any value of SINR_k^THP is achievable if P_tx is arbitrary, i.e., P_tx → ∞ results in an arbitrarily large SINR_k^THP, even for K > M (e.g., [186]). This is impossible for linear precoding. To verify this, consider SINR_k^THP(P; h_k) for o_k = K, i.e., the receiver to be precoded last: Any SINR_k^THP(P; h_k) = c_{w_K} |h_k^T p_K|² / c_{n_k} can be ensured only by choosing the necessary power allocation ‖p_K‖²; for the receiver with o_i = K − 1, we have SINR_i^THP(P; h_i) = c_{w_{K−1}} |h_i^T p_{K−1}|² / (c_{w_K} |h_i^T p_K|² + c_{n_i}), and any value can be achieved by choosing ‖p_{K−1}‖².

Fig. 5.8 Representation of THP with identical estimation error ŵ[n] − w[n] = b̂[n] − b[n] as in Figure 5.6.
The SINR definition (5.35) also parameterizes the sum-capacity of the BC
with C-CSI, which can be achieved by dirty paper precoding [230] (see also
[186]).9 We will see that it is generally not possible to derive a relation similar
to (5.36) for P-CSI from MSE.
The representation of THP given in Figure 5.8 is equivalent to Figure 5.6
regarding the error signal used for the MSE definition: the error ŵk [n]−wk [n]
in Figure 5.8 is identical to b̂k [n] − bk [n] based on Figure 5.6. Figure 5.8 gives
the intuition for interpreting the solution for THP with C-CSI: The precoder
is optimized as if w[n] was partially known to the receiver via the lower
triangular filter F. ⊓⊔
For P-CSI at the transmitter, we have to optimize g_k first because of the asymmetry of CSI in the system. Intuitively, we expect that a complete cancelation of the interference from already precoded data streams is impossible for P-CSI: From Figure 5.8 we note that F cannot cancel the interference completely if it does not know the channel H perfectly.
The MMSE receiver is given by

    g_k = g_k^{mmse}(h_k) = (h_k^T P C_w (e_{o_k} − f̃_{o_k})^*)^* / c_{r_k}   (5.37)

with the variance of the receive signal c_{r_k} = Σ_{i=1}^{K} c_{w_i} |h_k^T p_i|² + c_{n_k}. The THP parameters are optimized based on the conditional mean estimate
parameters are optimized based on the conditional mean estimate
Ehk |yT MMSETHPk P , f̃ ok ; hk (5.38)
9
{cwi kpi k22 }K
i=1 are the variances of independent data streams, which have to be
optimized to achieve sum capacity.
    SIR_k^THP(P) = c_{w_{o_k}} |v_k^T p_{o_k}|² / Σ_{i=o_k+1}^{K} c_{w_i} |v_k^T p_i|².   (5.41)

As for C-CSI, any SIR_k^THP(P) is feasible and the resulting MSE can be made arbitrarily small at the expense of an increasing P_tx. This is true even for K > M. ⊓⊔
Vector Precoding
For vector precoding (Figure 5.5), the scaled matched filter (compare to (4.47))

    g_k^{cor}(h_k) = (h_k^T T r̃_{bb_k})^* / c̄_{r_k}   (5.42)

with the mean variance of the receive signal c̄_{r_k} = tr[T R̃_b T^H R*_{h_k|y_T}] + c_{n_k} minimizes

    COR_k^VP(T, {a[n]}_{n=1}^{N_d}, g_k; h_k) = r̃_{b_k} + |g_k|² c̄_{r_k} − 2Re(g_k h_k^T T r̃_{bb_k}).   (5.43)

    E_{h_k|y_T}[ MCOR_k^VP(T, {a[n]}_{n=1}^{N_d}; h_k) ] = r̃_{b_k} − r̃_{bb_k}^H T^H R*_{h_k|y_T} T r̃_{bb_k} / c̄_{r_k} ≥ 0   (5.45)
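The nonnegativity and the structure of (5.45) can be checked numerically (a sketch with surrogate values for T, R̃_b, and c_{n_k}; we assume a zero-mean conditional distribution of h_k, so that R_{h_k|y_T} is its correlation matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
M, K, cn = 4, 3, 0.3
T = rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))
B = rng.normal(size=(K, K)) + 1j * rng.normal(size=(K, K))
Rb = B @ B.conj().T + np.eye(K)             # sample correlation of b[n]
C = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
Rh = C @ C.conj().T                         # R_{h_k|y_T} (Hermitian p.d.)
k = 0
rbb, rbk = Rb[:, k], Rb[k, k].real

# mean receive-signal variance and conditional mean cost (5.45)
cbar = np.trace(T @ Rb @ T.conj().T @ Rh.conj()).real + cn
mcor = rbk - (rbb.conj() @ T.conj().T @ Rh.conj() @ T @ rbb).real / cbar
assert mcor >= 0

# Monte Carlo over h_k ~ CN(0, R_{h_k|y_T}) reproduces the quadratic form
Ns = 200_000
Lh = np.linalg.cholesky(Rh)
h = (rng.normal(size=(Ns, M)) + 1j * rng.normal(size=(Ns, M))) / np.sqrt(2) @ Lh.T
mc = rbk - np.mean(np.abs(h @ T @ rbb)**2) / cbar
assert abs(mc - mcor) < 0.02 * rbk
```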
Tomlinson-Harashima Precoding
With the same model for w[n] as in Section 5.2.1, we define the cost function (Figure 5.6)

    COR_k^THP(P, f̃_{o_k}, g_k; h_k) = |g_k|² c̄_{r_k} + (e_{o_k} − f̃_{o_k})^T C_w (e_{o_k} − f̃_{o_k})^*
        − 2Re(g_k h_k^T P C_w (e_{o_k} − f̃_{o_k})^*),   (5.46)

which yields the matched filter receiver (compare to the MMSE receiver in (5.37))

    g_k^{cor}(h_k) = (h_k^T P C_w (e_{o_k} − f̃_{o_k})^*)^* / c̄_{r_k}   (5.47)
10 It yields B(αI + AB)^{−1} = (αI + BA)^{−1} B for some matrices A, B. Here, we set A = C_w P^H and B = R*_{h_k|y_T} P_{o_k}.
11 Note that P_{o_k} C_w P^H = Σ_{i=1}^{o_k−1} c_{w_i} p_i p_i^H because the last columns of P_{o_k} are zero.
The solution of the feedback filter as well as the minimum of the cost function show a structure identical to C-CSI (cf. (5.32) and (5.36)). Any SINR_k^THP(P) is achievable for an arbitrary P_tx. This is not the case for rank[C_{h_k}] > 1. For an optimum MMSE receiver, this behavior is only achieved for c_{n_k} = 0. Therefore, for large P_tx and rank-one C_{h_k}, the achievable cost (5.53) with a scaled matched filter receiver is identical to (5.40) for the optimum MMSE receiver. But if P_tx is finite and the true receiver is an MMSE receiver, the performance criterion (5.38) is more accurate than (5.53) and gives an appropriate model for the uncertainty in CSI: This example shows that the uncertainty in CSI is not described completely by the cost function (5.49). ⊓⊔
The proposed performance measures for P-CSI are either based on the optimum MMSE receiver or on a suboptimum scaled matched filter receiver. To optimize the performance of the overall system (similar to maximizing the mean sum rate for linear precoding in Section 4.4), we minimize the conditional mean estimate of the sum MSE resulting from an MMSE receiver, or of the sum performance measure for a scaled matched filter receiver, under a total transmit power constraint.
For vector precoding, the optimization problem is
12 We use the relation (αI + a b^T)^{−1} a = a(α + b^T a)^{−1} for some vectors a, b, choose a = v̄_k^* and b^T = v̄_k^T Σ_{i=1}^{o_k−1} p_i p_i^H, and apply the definition of c̄_{r_k}.
5.3 Optimization Based on Sum Performance Measures 201
    min_{T, {a[n]}_{n=1}^{N_d}} Σ_{k=1}^{K} E_{h_k|y_T}[ MMSE_k^VP(T, {a[n]}_{n=1}^{N_d}; h_k) ]   s.t.   tr[T R̃_b T^H] ≤ P_tx   (5.55)

and

    min_{P, F, O} Σ_{k=1}^{K} E_{h_k|y_T}[ MCOR_k^THP(P, f̃_{o_k}; h_k) ]   s.t.   tr[P C_w P^H] ≤ P_tx   (5.58)

from (5.49).
Both performance measures proposed for THP in Section 5.2 yield an
explicit solution for the feedback filter F in terms of P and O. But the
resulting problem (5.57) cannot be given in explicit form and an explicit
solution of (5.58) seems impossible. A standard numerical solution would
not allow for further insights. For vector precoding, the situation is more
severe, since the lattice search is very complex and it is not known whether
a heuristic as in Section 5.1 can be found.
Therefore, we resort to the same principle as for linear precoding (Section 4.4): Given an initialization for the precoder parameters, we alternate between optimizing the nonlinear precoder and the receiver models. Thus, we improve performance over the initialization, obtain explicit expressions for every iteration, and reach a stationary point of the corresponding cost function. In contrast to linear precoding, the global minimum is not reached
for C-CSI and an arbitrary initialization. This procedure also allows for a
solution of (5.55) and (5.57) by means of numerical integration.
Vector Precoding
The general cost function for iteration m is

    F^{VP,(m)}(T, {a[n]}_{n=1}^{N_d}, β) = tr[R̃_b] + |β|² Ḡ^{(m−1)}
        + |β|² tr[R^{(m−1)} T R̃_b T^H] − 2Re(β tr[H_G^{(m−1)} T R̃_b]).   (5.59)

It is equivalent to

    Σ_{k=1}^{K} E_{h_k|y_T}[ MSE_k^VP(T, {a[n]}_{n=1}^{N_d}, β g_k^{mmse,(m−1)}(h_k); h_k) ]   (5.60)

for

    H_G^{(m−1)} = E_{h|y_T}[ G^{(m−1)}(h) H ]
    R^{(m−1)} = E_{h|y_T}[ H^H G^{(m−1)}(h)^H G^{(m−1)}(h) H ]
    Ḡ^{(m−1)} = Σ_{k=1}^{K} E_{h_k|y_T}[ |g_k^{mmse,(m−1)}(h_k)|² ] c_{n_k}   (5.61)
    G^{(m−1)}(h) = diag[ g_1^{mmse,(m−1)}(h_1), . . . , g_K^{mmse,(m−1)}(h_K) ].

Minimization of (5.60) w.r.t. the function g_k^{mmse,(m−1)}(h_k), i.e., before taking the conditional expectation, yields (5.55).
The cost function for scaled matched filter receivers is

    Σ_{k=1}^{K} E_{h_k|y_T}[ COR_k^VP(T, {a[n]}_{n=1}^{N_d}, β g_k^{cor,(m−1)}(h_k); h_k) ].   (5.62)

For this performance measure, all parameters can be given in closed form. For the optimum function g_k^{cor,(m−1)}(h_k) from below (cf. (5.71)), the cost function (5.62) is identical to (5.56).
In (5.60) and (5.62) we introduced the additional degree of freedom β
to obtain an explicit solution for T and enable an optimization alternating
between transmitter and receivers.
First we set the receiver model G^{(m−1)}(h) equal to the optimum receiver from the previous iteration. Minimization of (5.59) w.r.t. T and β subject to the total power constraint tr[T R̃_b T^H] ≤ P_tx yields

    T^{(m)} = β^{(m),−1} (R^{(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−1} H_G^{(m−1),H}
            = β^{(m),−1} (X^{(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−1} H_G^{(m−1),H} Φ^{(m−1),−1}

    β^{(m),2} = tr[ R̃_b H_G^{(m−1)} (R^{(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−2} H_G^{(m−1),H} ] / P_tx   (5.64)

with

    Φ^{(m−1)} = H_G^{(m−1)} (X^{(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−1} H_G^{(m−1),H} + I_K   (5.65)

and13

    X^{(m−1)} = R^{(m−1)} − H_G^{(m−1),H} H_G^{(m−1)}.   (5.66)

13 X^{(m−1)} can be considered as the covariance matrix of the additional correlated noise due to errors in CSI for P-CSI.
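The update (5.64) and its power normalization can be sketched in NumPy (the statistics H_G^{(m−1)}, R^{(m−1)}, Ḡ^{(m−1)} below are random stand-ins, not computed from an actual conditional density):

```python
import numpy as np

rng = np.random.default_rng(3)
K, M, Ptx, Gbar = 3, 4, 10.0, 0.5

# surrogate receiver-model statistics for one iteration (stand-ins only)
HG = rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))   # H_G^{(m-1)}
E = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
R = HG.conj().T @ HG + 0.1 * (E @ E.conj().T)                 # R^{(m-1)}
Rb = np.eye(K)                                                # sample corr. of b[n]

# forward filter and scaling from (5.64)
Ainv = np.linalg.inv(R + Gbar / Ptx * np.eye(M))
beta2 = np.trace(Rb @ HG @ Ainv @ Ainv @ HG.conj().T).real / Ptx
T = Ainv @ HG.conj().T / np.sqrt(beta2)

# the scaling beta^{(m)} enforces the transmit power constraint with equality
assert np.isclose(np.trace(T @ Rb @ T.conj().T).real, Ptx)
```

Here the term 0.1·EE^H plays the role of the regularization matrix X^{(m−1)}, i.e., the extra loading caused by CSI errors.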
The cost function for the lattice search is given by (D.14):

    F^{VP,(m)}({a[n]}_{n=1}^{N_d}) = (1/N_d) Σ_{n=1}^{N_d} ‖a[n] + d[n]‖²_{Φ^{(m−1),−1}}
        = (1/N_d) Σ_{n=1}^{N_d} ‖∆^{(m−1),1/2} L^{(m−1)} Π^{(O)} (a[n] + d[n])‖²₂.   (5.68)

Thus, the optimization w.r.t. the perturbation vectors {a[n]}_{n=1}^{N_d} on the lattice L^K is equivalent to the case of C-CSI in (5.13), based on the symmetrically permuted Cholesky factorization

    Φ^{(m−1),−1} = Π^{(O),T} L^{(m−1),H} ∆^{(m−1)} L^{(m−1)} Π^{(O)}.   (5.69)
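The factorization needed here (a reverse-ordered Cholesky with unit lower triangular L, sketched below without the ordering permutation, i.e., Π^{(O)} = I_K; the helper name reverse_cholesky is ours) and the norm identity of (5.68) can be checked as follows:

```python
import numpy as np

def reverse_cholesky(B):
    """Factor a Hermitian p.d. B as B = L^H Delta L with unit lower
    triangular L and diagonal Delta (the structure used in (5.69))."""
    K = B.shape[0]
    J = np.eye(K)[::-1]                  # exchange (flip) matrix
    C = np.linalg.cholesky(J @ B @ J)    # J B J = C C^H, C lower triangular
    d = np.real(np.diag(C))
    L = J @ (C / d).conj().T @ J         # unit lower triangular
    Delta = J @ np.diag(d**2) @ J
    return L, Delta

rng = np.random.default_rng(4)
K = 4
A = rng.normal(size=(K, K)) + 1j * rng.normal(size=(K, K))
Phi_inv = A @ A.conj().T + np.eye(K)     # stand-in for Phi^{(m-1),-1}

L, Delta = reverse_cholesky(Phi_inv)
assert np.allclose(L.conj().T @ Delta @ L, Phi_inv)

# the weighted norm in (5.68) equals the Euclidean norm after filtering
v = rng.normal(size=K) + 1j * rng.normal(size=K)
lhs = (v.conj() @ Phi_inv @ v).real
rhs = np.linalg.norm(np.sqrt(np.diag(Delta)) * (L @ v))**2
assert np.isclose(lhs, rhs)
```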
As for linear precoding in Section 4.4, the scaling β^{(m)}, which was introduced artificially in the first step of every iteration (5.64), is removed by the optimum receivers. Based on the improved receivers we compute the parameters of the general cost function (5.59) and continue with (5.64), followed by the optimization of a[n].
The computational complexity per iteration is similar to the situation of C-CSI (Section 5.1): With a lattice search based on the permuted Cholesky factorization (5.69) according to Equations (5.11) up to (5.14) (in analogy to [127]), the total number of operations is of order O(K³ + M³). The cubic order in M results from the inverse in Φ^{(m−1)} (5.65).
As for linear precoding (Section 4.4.1), the expressions for MMSE receivers require a multidimensional numerical integration, which increases the complexity considerably. This is not necessary for (5.62), which can be given in closed form as a function of the second order moments of p_{h|y_T}(h|y_T).
In summary, our iterative algorithm proceeds as follows:
• We require an initialization for the receivers ((5.70) or (5.71)) or, equivalently, for T^{(0)} and {a^{(0)}[n]}_{n=1}^{N_d}.
• The lattice search for {a^{(m)}[n]}_{n=1}^{N_d} involves an optimization of the ordering.
• This is followed by an update of the receivers (5.70) or (5.71), which yields the new cost function (5.59) for the next iteration.
• Convergence can be controlled by the distance of β^{(m)} to 1. Alternatively, we stop after a fixed number of iterations, i.e., for a fixed amount of computational complexity.
Because the cost functions are bounded below by zero, the algorithm converges to a stationary point. An improved or equal performance is guaranteed in every step, i.e.,

    Σ_{k=1}^{K} E_{h_k|y_T}[ MMSE_k^VP(T^{(m)}, {a^{(m)}[n]}_{n=1}^{N_d}; h_k) ]
        ≤ Σ_{k=1}^{K} E_{h_k|y_T}[ MMSE_k^VP(T^{(m−1)}, {a^{(m−1)}[n]}_{n=1}^{N_d}; h_k) ]   (5.72)

for (5.60), and for the suboptimum matched filter receiver (5.62)
    Σ_{k=1}^{K} E_{h_k|y_T}[ MCOR_k^VP(T^{(m)}, {a^{(m)}[n]}_{n=1}^{N_d}; h_k) ]
        ≤ Σ_{k=1}^{K} E_{h_k|y_T}[ MCOR_k^VP(T^{(m−1)}, {a^{(m−1)}[n]}_{n=1}^{N_d}; h_k) ].   (5.73)
Tomlinson-Harashima Precoding
For THP, the iterative algorithm is very similar. The general cost function

    F^{THP,(m)}(P, F, β, O) = |β|² Ḡ^{(m−1)} + |β|² tr[P C_w P^H R^{(m−1)}]
        − 2Re(β tr[H_G^{(m−1)} P C_w (I_K − F)^H Π^{(O)}]) + tr[(I_K − F) C_w (I_K − F)^H]   (5.74)

can be written as

    F^{THP,(m)}(P, F, β, O) = |β|² Ḡ^{(m−1)} + |β|² tr[P^H X^{(m−1)} P C_w]
        + E_w[ ‖(β Π^{(O)} H_G^{(m−1)} P − (I_K − F)) w[n]‖²₂ ]   (5.75)

with the definitions

    H_G^{(m−1)} = E_{h|y_T}[ G^{(m−1)}(h) H ]   (5.76)
    X^{(m−1)} = R^{(m−1)} − H_G^{(m−1),H} H_G^{(m−1)},   (5.77)

which are valid for both cost functions below. With appropriate definitions of R^{(m−1)}, Ḡ^{(m−1)}, and G^{(m−1)}(h), it includes the cost functions
    Σ_{k=1}^{K} E_{h|y_T}[ MSE_k^THP(P, f̃_{o_k}, β g_k^{mmse,(m−1)}(h_k); h_k) ]   and   (5.78)

    Σ_{k=1}^{K} E_{h|y_T}[ COR_k^THP(P, f̃_{o_k}, β g_k^{cor,(m−1)}(h_k); h_k) ],   (5.79)

which result in (5.57) and (5.58), respectively, when minimizing w.r.t. the receiver functions g_k(h_k) (before taking the conditional expectation).
For (5.78), the parameters in (5.75) are defined as

    R^{(m−1)} = E_{h|y_T}[ H^H G^{(m−1)}(h)^H G^{(m−1)}(h) H ]   (5.80)
    Ḡ^{(m−1)} = Σ_{k=1}^{K} c_{n_k} E_{h_k|y_T}[ |g_k^{mmse,(m−1)}(h_k)|² ]   (5.81)
    G^{(m−1)}(h) = diag[ g_1^{mmse,(m−1)}(h_1), . . . , g_K^{mmse,(m−1)}(h_K) ].   (5.82)

In this case X^{(m−1)} (5.77) is the sum of (complex conjugate) error covariance matrices for the conditional mean estimate of the effective channel g_k^{mmse,(m−1)}(h_k) h_k.
For (5.79), we obtain the explicit expressions for the parameters in (5.74):

    R^{(m−1)} = Σ_{k=1}^{K} E_{h_k|y_T}[ |g_k^{cor,(m−1)}(h_k)|² ] R*_{h_k|y_T}   (5.83)
    Ḡ^{(m−1)} = Σ_{k=1}^{K} c_{n_k} E_{h_k|y_T}[ |g_k^{cor,(m−1)}(h_k)|² ]   (5.84)
    G^{(m−1)}(h) = diag[ g_1^{cor,(m−1)}(h_1), . . . , g_K^{cor,(m−1)}(h_K) ].   (5.85)

Comparing the new cost function (5.75) with the cost function derived from C-CSI, where we plug in E_{h|y_T}[G^{(m−1)}(h)H] for H, we have an additional regularization term for P. It regularizes the weighted norm of P with weight given by X^{(m−1)}.
The solution for F^{(m)}, P^{(m)}, and β^{(m)} for a fixed ordering O and for given receivers G^{(m−1)}(h) is derived in Appendix D.6 with assumption (5.26). The kth columns of F^{(m)} and P^{(m)} are

    f_k^{(m)} = −β^{(m)} [0_{k×M}; B_k^{(O),(m−1)}] p_k^{(m)}   (5.86)

and

    p_k^{(m)} = β^{(m),−1} (R_k^{(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−1} A_k^{(O),(m−1),H} e_k,   (5.87)

where A_k^{(O),(m−1)} ∈ C^{k×M} contains the conditional mean estimates of the effective channels to the already precoded receivers and B_k^{(O),(m−1)} ∈ C^{(K−k)×M} those to the receivers which are precoded in the (k+1)th to Kth precoding step. The regularization term with X^{(m−1)} in (5.75) yields a non-diagonal loading in the inverse of P^{(m)} (5.87).
The precoding order O^{(m)} = [p_1^{(m)}, . . . , p_K^{(m)}] for this iteration is determined from the minimum F^{THP,(m)}(O) of (5.75), which is given by (D.72). We observe that the Kth term in the sum (i = K) depends only on A_K^{(O),(m−1)} = H_G^{(m−1)} and p_K and is independent of p_i, i < K. Therefore, we start with i = K and choose the best receiver to be precoded last: p_K^{(m)} is the argument of

    max_k e_k^T H_G^{(m−1)} (X^{(m−1)} + H_G^{(m−1),H} H_G^{(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−1} H_G^{(m−1),H} e_k,   (5.89)

which minimizes the contribution of the i = Kth term to the total remaining MSE (D.72). Similarly, we continue with the (i−1)th term, which depends only on p_i, i ∈ {K−1, K}, but not on p_i for i < K−1.
In summary, O^{(m)} is found by the arguments of

    max_k e_k^T A_i^{(O),(m−1)} (R_i^{(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−1} A_i^{(O),(m−1),H} e_k   (5.90)

with

    F^{THP,(m)}(O) = tr[∆^{(m−1)} C_w] = Σ_{k=1}^{K} δ_k^{(m−1)} c_{w_k}.   (5.91)
and (5.47)

    g_k^{cor,(m)}(h_k) = β^{(m),−1} (h_k^T P^{(m)} C_w (e_{o_k^{(m)}} − f̃_{o_k^{(m)}}^{(m)})^*)^* / ( tr[P^{(m)} C_w P^{(m),H} R*_{h_k|y_T}] + c_{n_k} ).   (5.93)
    Σ_{k=1}^{K} E_{h_k|y_T}[ MMSE_k^THP(P^{(m)}, f̃_{o_k^{(m)}}^{(m)}; h_k) ]
        ≤ Σ_{k=1}^{K} E_{h_k|y_T}[ MMSE_k^THP(P^{(m−1)}, f̃_{o_k^{(m−1)}}^{(m−1)}; h_k) ]   (5.94)

    Σ_{k=1}^{K} E_{h_k|y_T}[ MCOR_k^THP(P^{(m)}, f̃_{o_k^{(m)}}^{(m)}; h_k) ]
        ≤ Σ_{k=1}^{K} E_{h_k|y_T}[ MCOR_k^THP(P^{(m−1)}, f̃_{o_k^{(m−1)}}^{(m−1)}; h_k) ].   (5.95)
Asymptotically, β (m) → 1 as m → ∞.
The initialization is chosen similarly to vector precoding above: For P-CSI, we first compute F^{(0)}, P^{(0)}, and O^{(0)} applying the conditional mean estimate Ĥ from y_T to the THP for C-CSI with G = βI_K, which is a special case of Appendix D.6; for S-CSI we set F^{(0)} = 0_K, Π^{(O)} = I_K, and take P^{(0)} equal to the linear precoder from Appendix D.3.
The algorithms for optimization of vector precoding with the restricted lattice search and of THP, which we derived in the previous section, are closely related. We summarize their differences and similarities:
• The optimized parameters of vector precoding depend on {d[n]}_{n=1}^{N_d},
For C-CSI, the MMSE receiver and the scaled matched filter are identical. Therefore, the corresponding optimization problems for vector precoding or THP are also identical. The necessary parameters depend on the channel realization H:
5.3 Optimization Based on Sum Performance Measures 211
    H_G^{(m−1)} = G^{(m−1)}(h) H   (5.96)
    R^{(m−1)} = H^H G^{(m−1)}(h)^H G^{(m−1)}(h) H = H_G^{(m−1),H} H_G^{(m−1)}   (5.97)
    Ḡ^{(m−1)} = Σ_{k=1}^{K} |g_k^{mmse,(m−1)}(h_k)|² c_{n_k}   (5.98)
    Φ^{(m−1)} = (P_tx / Ḡ^{(m−1)}) ( H_G^{(m−1)} H_G^{(m−1),H} + (Ḡ^{(m−1)}/P_tx) I_K ).   (5.99)
with β^{(m)} chosen to satisfy tr[T^{(m)} R̃_b T^{(m),H}] = P_tx.14 The nonlinearity of the precoder results from the optimization of a[n].
For THP, the effect of the modulo operators is described by C_w. If assumption (5.26) holds, e.g., for large P_tx and size of D, THP allows for an interpretation of the solutions and of the effect of the perturbation vector.
For C-CSI, the THP solution (5.86) and (5.87) for an optimized ordering O = O^{(m)} simplifies to

    f_k^{(m)} = −β^{(m)} [0_{k×M}; B_k^{(O),(m−1)}] p_k^{(m)}   (5.102)

    p_k^{(m)} = β^{(m),−1} (A_k^{(O),(m−1),H} A_k^{(O),(m−1)} + (Ḡ^{(m−1)}/P_tx) I_M)^{−1} A_k^{(O),(m−1),H} e_k,   (5.103)

where β^{(m)} is determined by tr[P^{(m)} C_w P^{(m),H}] = P_tx and A_k^{(O),(m−1)} ∈ C^{k×M} contains the channels to the already precoded receivers and B_k^{(O),(m−1)} ∈ C^{(K−k)×M} those to the receivers which are precoded in the (k+1)th to Kth step.
Therefore, our choice of a[n], i.e., the modulo operations in Figure 5.7, can be interpreted as follows:
• f_k^{(m)} cancels the interference from the kth precoded receiver to all subsequent receivers with channel B_k^{(O),(m−1)}.
• Therefore, the forward filter p_k^{(m)} for the kth precoded data stream has to avoid interference only to the already precoded receivers with channel A_k^{(O),(m−1)}, because the interference from the kth receiver's data stream has not been known in precoding steps 1 to k−1 due to causality. Numerically, this reduces the condition number of the matrix to be "inverted" in p_k^{(m)}.15 For k = 1, the full diversity and antenna gain is achieved because p_1^{(m)} ∝ h*_{p_1}.

14 For G^{(m−1)}(h) = I_K we obtain our result (5.6) in the introduction.
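The interference-cancelation structure of the first bullet can be verified numerically for C-CSI (a sketch of our own under the assumptions C_w ≈ I_K, identity ordering, and β^{(m)} = 1; cf. (5.102), (5.103)):

```python
import numpy as np

rng = np.random.default_rng(6)
K = M = 4
Ptx, Gbar = 10.0, 1.0
H = rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))

P = np.zeros((M, K), dtype=complex)
F = np.zeros((K, K), dtype=complex)
beta = 1.0                        # beta^{(m)} set to 1 for the structural check
for k in range(K):
    Ak = H[:k + 1]                # channels of receivers precoded in steps 1..k+1
    pk = np.linalg.solve(Ak.conj().T @ Ak + Gbar / Ptx * np.eye(M),
                         Ak.conj().T[:, k])          # (5.103), identity ordering
    P[:, k] = pk
    F[k + 1:, k] = -beta * H[k + 1:] @ pk            # feedback taps (5.102)

# beta*H*P - (I - F) is upper triangular: the feedback removes all
# interference from stream k at the receivers precoded after it
Err = beta * H @ P - (np.eye(K) - F)
assert np.allclose(np.tril(Err, -1), 0)
```

The strictly lower triangular part of the error matrix vanishes, i.e., a stream never disturbs the receivers precoded after it, exactly as claimed.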
For small P_tx, this interpretation is not valid, because a[n] = 0, i.e., vector precoding and THP are equivalent to linear precoding. If the modulo operators are not removed from the receivers, which would require additional signaling to the receivers, performance is worse than for linear precoding due to the modulo loss [70, 108].
For P-CSI and in the limit for S-CSI, the channel G^{(m−1)}(h)H is not known perfectly.16 Therefore, the solutions are given in terms of the conditional mean estimate of the total channel H_G^{(m−1)} = E_{h|y_T}[G^{(m−1)}(h)H] and the regularization matrix X^{(m−1)}. For MMSE receivers, X^{(m−1)} is the sum of error covariance matrices for the estimates E_{h_k|y_T}[g_k^{mmse,(m−1)}(h_k) h_k].
To judge the effect on nonlinear precoding we discuss the THP solution (5.86) and (5.87). Contrary to C-CSI, the feedback filter F^{(m)} cannot cancel the interference, and the forward filter also has to compensate the interference to subsequently precoded data streams, which is described by X^{(m−1)} for O = O^{(m)}. As long as (5.26) is valid, the effect of a[n] can be understood as a partial cancelation of the interference described by B_k^{(O),(m−1)}. It results in (cf. (5.87))

    X^{(m−1)} + A_k^{(O),(m−1),H} A_k^{(O),(m−1)} = R^{(m−1)} − B_k^{(O),(m−1),H} B_k^{(O),(m−1)},   (5.104)

which differs in its last term from the corresponding term in (4.82) for linear precoding.
For K ≤ M and good channels, i.e., spatially well separated receivers,
it is likely that the precoder chooses a[n] = 0 for S-CSI — even for large
Ptx . If this is the case, our interpretation of THP is not valid and nonlinear
precoding is identical to linear precoding.
¹⁵ Note that c_{w_k} is close to one for increasing size of the constellation D, i.e., the input of P^{(m)} has the same variance as for a linear precoder P (Section 4.1).
¹⁶ Note that the precoder always operates with a receiver model G^{(m−1)}(h), which can deviate from the implemented MMSE receiver, for example, if the iterative procedure has not converged yet or the scaled matched filter model is assumed.
5.3 Optimization Based on Sum Performance Measures
Example 5.6. Let us consider the example of channels with rank-one covariance matrices C_{h_k}. As in Section 4.4.3, the corresponding channel is H = X V^T with X = diag[x_1, …, x_K] and V ∈ C^{M×K}, whose columns are the array steering vectors (Appendix E).
For this channel, X^{(m−1)} (5.77) simplifies to

X^{(m−1)} = V* Ξ^{(m−1)} V^T

with Ξ^{(m−1)} = diag[ξ_1^{(m−1)}, …, ξ_K^{(m−1)}].
For MMSE receivers and vector precoding, we have

ξ_k^{(m−1)} = E_{x_k,h_k|y_T}[ |x_k|² |g_k^{mmse,(m−1)}(h_k)|² ] − | E_{x_k,h_k|y_T}[ x_k g_k^{mmse,(m−1)}(h_k) ] |².

From (5.24)

g_k^{mmse,(m−1)}(h_k) = β^{(m−1),−1} ( x_k* v_k^T T^{(m−1)} r̃_{b b_k}^{(m−1)} )* / ( |x_k|² v_k^T T^{(m−1)} R̃_b T^{(m−1),H} v_k* + c_{n_k} ),

¹⁷ With [148], X^{(m−1)} can be shown to be equal to one of the terms in the expression for the sum of error covariance matrices. On the other hand, R^{(m−1)} in (5.61) for MMSE receivers is the sum of correlation matrices of g_k^{mmse,(m−1)}(h_k) h_k; with its definition in (5.66), X^{(m−1)} is identical to the sum of error covariance matrices.
with X_G^{(m−1)} = E_{h|y_T}[G^{(m−1)}(h)X] and mainly influenced by the spatial separation of the receivers.
For rank-1 channels, we conclude:
• The channel estimation errors are modeled correctly when considering MMSE receiver models. This yields an improved interference cancelation at medium to high Ptx compared to the optimization based on a scaled MF receiver.
• For Ptx → ∞, the solutions for MF receivers and MMSE receivers converge.
• As for C-CSI (5.103), the advantage of nonlinear precoding for scaled matched filter receivers compared to linear precoding becomes more evident considering THP: p_k^{(m)} only considers interference to already precoded data streams, which results in an improved condition number of the matrix to be "inverted". But in contrast to C-CSI, the interference cannot be canceled completely by F^{(m)} for finite Ptx. Still, this leads to the full antenna gain for the p_1th receiver because p_1 ∝ v*_{p_1}. Note that this interpretation is no longer valid if a[n] = 0 due to the "good" properties of H, i.e., V. □
Q = T R̃_b   (5.105)
Q = P C_w (I_K − F^H) Π^{(O)}   (5.106)
Fig. 5.9 Cumulative distribution function of ‖Q‖²_F (5.105) at 10 log10(Ptx/cn) = 20 dB and fmax = 0.2 for P-CSI (Scenario 1).
¹⁸ The mean angles correspond to the central three receivers of Scenario 2.
5.5 Performance Evaluation
Fig. 5.10 Mean uncoded BER vs. Ptx /cn with fmax = 0.2 for P-CSI (Scenario 1).
Fig. 5.11 Mean uncoded BER vs. fmax for 10 log10 (Ptx /cn ) = 20 dB (Scenario 1).
Discussion of Results
The mean uncoded bit error rate (BER), which is an average w.r.t. the K
receivers, serves as a simple performance measure.
In scenario 1, all C_{h_k} have full rank. In a typical example the largest eigenvalue is about 5 times larger than the second and 30 times larger than the third largest eigenvalue. Therefore, the transmitter does not know the effective channel (including the receivers' processing g_k) perfectly. Because Σ_{k=1}^{K} rank[C_{h_k}] = KM > M, the channels do not have enough structure to allow for a full cancelation of the interference as for C-CSI. The BER saturates for high Ptx/cn (Figure 5.10).
Fig. 5.13 Mean uncoded BER vs. Ptx/cn for P-CSI with fmax = 0.2 (Scenario 2).
The heuristic optimization for P-CSI
does not consider the interference sufficiently and the performance can be
improved by the robust optimization of THP based on the CM estimate of
the receivers’ cost, which is based on scaled matched filter receivers. Due
to the insufficient channel structure, robust THP can only achieve a small additional reduction of the interference, and the BER saturates at a lower level.
For increasing Doppler frequency, the heuristic design performs worse than
linear precoding (Figure 5.11). The robust solution for P-CSI ensures a performance that is always better than linear precoding at high Ptx/cn.
Fig. 5.14 Mean uncoded BER vs. Ptx/cn for S-CSI (Scenario 2).
As for linear precoding (Section 4.4.4), the prediction errors are small enough as long as fmax ≤ 0.15 for the heuristic to perform very close to our systematic approach.
For this scenario, approximately 10 iterations are necessary for convergence
(Figure 5.12). For S-CSI and scenario 2, the algorithms converge in the first
iteration.
The interference can already be canceled sufficiently by linear precoding
in scenario 2, for which the ratio of the first to the second eigenvalue of
C hk is about 300. Because the receivers’ channels are close to rank one,
the probability is very high that the channel realization for a receiver and
its predicted channel lie in the same one-dimensional subspace. Thus, the
performance degradation by the heuristic THP optimization is rather small
(Figure 5.13). But for S-CSI, the THP optimization based on the mean performance measures gains 5 dB at high Ptx/cn compared to linear precoding
from Section 4.4 (Figure 5.14). Figure 5.15 shows the fast transition between
a performance close to C-CSI and S-CSI which is controlled by the knowledge
of the autocovariance sequence.
Optimization based on an MMSE receiver model results in a small performance improvement at high Ptx/cn: For example, the discussion in Section 5.3.2 shows that for S-CSI and rank[C_{h_k}] = 1 the estimation error of the
effective channel is not zero, but the regularization matrix X (m−1) is zero for
optimization based on the scaled matched filter receiver model. Therefore,
the small degradation results from the mismatch to the implemented MMSE
receiver and the incomplete description of the error in CSI by X (m−1) .
Fig. 5.15 Mean uncoded BER vs. fmax for 10 log10 (Ptx /cn ) = 20 dB (Scenario 2).
Fig. 5.16 Mean uncoded BER vs. Ptx/cn with fmax = 0.2 for P-CSI (Scenario 3, M = 8, K = 4).
Fig. 5.17 Mean uncoded BER vs. Ptx/cn with fmax = 0.2 for P-CSI (Scenario 4, M = 2, K = 3).
Figure 5.18 shows the probability that a modulo operation at the transmitter is inactive for different parameters (Scenario 2). Inherently, the THP-like lattice search always chooses the first element of Π^{(O)} a[n] to be zero, i.e., the probability to be zero is always larger than 1/K. For small Ptx/cn, the probability converges to one and the nonlinear precoder is linear: In BER we see a performance degradation compared to linear precoding due to the modulo operators at the receivers.
Fig. 5.18 Probability that a modulo operation at the transmitter is inactive (linear), i.e., ak[n] = 0 (Scenario 2, M = 8).
Fig. 5.19 Mean uncoded BER vs. Ptx/cn of THP with fmax = 0.1 for P-CSI with estimated covariance matrices (Scenario 1, B = 20).
Fig. 5.20 Mean uncoded BER vs. fmax of THP at 10 log10(Ptx/cn) = 20 dB with estimated covariance matrices (Scenario 1, B = 20).
All previous performance results for nonlinear precoding and P-CSI have
assumed perfect knowledge of the channel covariance matrix C hT and the
noise covariance matrix C v = cn I M . As for linear precoding, we apply the
algorithms for estimating channel and noise covariance matrices which are
derived in Chapter 3 to estimate the parameters of ph|yT (h|y T ).
The model and the estimators of the covariance matrices are described
at the end of Section 4.4.4 (Page 165): Only B = 20 correlated realizations h[q_tx − ℓ] are observed via y[q_tx − ℓ] for ℓ ∈ {1, 3, …, 2B − 1}, i.e., with a period of N_P = 2.
The heuristic and an advanced estimator ("ML est.") for the channel covariance matrix are compared; the details can be found on page 165. Only the mean uncoded BER for THP is shown in Figures 5.19 and 5.20.
The loss in BER of THP based on the heuristic covariance estimator compared to the advanced method from Chapter 3 is significant for fmax = 0.1 (Figure 5.19): The saturation level is at about 2·10⁻³ compared to (below) 10⁻⁴. THP based on the advanced estimator performs close to the case of perfect knowledge of the channel correlations.
With the advanced estimator a performance close to C-CSI is achieved up
to fmax ≈ 0.1 (Figure 5.20). For fmax ≥ 0.15, both estimators yield a similar
BER of THP, because only the knowledge of the spatial channel covariance
matrix is relevant whose estimation accuracy in terms of BER is comparable
for both approaches.
Appendix A
Mathematical Background
p_x(x) = 1/(π^N det[C_x]) · exp( −(x − µ_x)^H C_x^{−1} (x − µ_x) ).   (A.1)
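(A.1) can be evaluated numerically; a numpy sketch (the function name is illustrative):

```python
import numpy as np

def complex_gaussian_pdf(x, mu, C):
    """Density of a circularly symmetric complex Gaussian N_c(mu, C), cf. (A.1)."""
    N = len(mu)
    d = x - mu
    expo = -np.real(d.conj() @ np.linalg.solve(C, d))  # -(x-mu)^H C^{-1} (x-mu)
    return np.exp(expo) / (np.pi ** N * np.real(np.linalg.det(C)))

# scalar sanity check: N = 1, C = 1, mu = 0 gives p(0) = 1/pi
p0 = complex_gaussian_pdf(np.zeros(1, complex), np.zeros(1, complex), np.eye(1))
```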
The trace of the square matrix A ∈ C^{N×N} with ijth element [A]_{i,j} = a_{i,j} is

tr[A] ≜ Σ_{i=1}^{N} a_{i,i}.   (A.4)
(A ⊗ B)^T = A^T ⊗ B^T,   (A.7)
A ⊗ α = α ⊗ A = αA,  α ∈ C,   (A.8)
a^T ⊗ b = b ⊗ a^T ∈ C^{N×M},  a ∈ C^M, b ∈ C^N,   (A.9)
(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD),   (A.10)
(M ⊗ N)^{−1} = M^{−1} ⊗ N^{−1},  M, N regular,   (A.11)
tr[M ⊗ N] = tr[M] tr[N],   (A.12)
det[M ⊗ N] = det[M]^N det[N]^M.   (A.13)
The operation

vec[A] ≜ [a_1^T, …, a_N^T]^T ∈ C^{MN}   (A.14)

stacks the columns a_i of A = [a_1, …, a_N] ∈ C^{M×N}. A frequently used identity is vec[ABC] = (C^T ⊗ A) vec[B] with B ∈ C^{N×P} and C ∈ C^{P×Q}.
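The identities (A.7)-(A.13) and the vec operation are easy to verify numerically (numpy sketch; the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
B = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
C = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
D = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
M = rng.standard_normal((3, 3))
N = rng.standard_normal((2, 2))

# (A.7): transposition distributes over the Kronecker product
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
# (A.10): mixed-product rule
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# (A.12), (A.13): trace and determinant (M is 3x3, N is 2x2)
assert np.isclose(np.trace(np.kron(M, N)), np.trace(M) * np.trace(N))
assert np.isclose(np.linalg.det(np.kron(M, N)),
                  np.linalg.det(M) ** 2 * np.linalg.det(N) ** 3)
# (A.14): vec[A] stacks the columns of A
vecA = A.reshape(-1, order='F')
assert np.allclose(vecA[:2], A[:, 0])
```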
∂tr[AB]/∂A = B^T   (A.24)
∂tr[A R A^H]/∂A* = A R   (A.25)
∂tr[AB]/∂A* = 0_{M×N}   (A.26)
∂tr[ABAC]/∂A = C^T A^T B^T + B^T A^T C^T.   (A.27)

For M = N, we get

∂tr[A^{−1} B]/∂A = −(A^{−1} B A^{−1})^T.   (A.28)

If A is invertible and A = A^H,

∂det[A]/∂A = det[A] A^{−T}   (A.29)
∂ln[det[A]]/∂A = A^{−T}.   (A.30)

For Hermitian and regular C(θ) with θ ∈ C, it can be shown that [124]

∂ln[det[C(θ)]]/∂θ = tr[ C^{−1}(θ) ∂C(θ)/∂θ ]   (A.31)
∂C^{−1}(θ)/∂θ = −C^{−1}(θ) (∂C(θ)/∂θ) C^{−1}(θ).   (A.32)
g(x) ≤ 0_S
h(x) = 0_P
λ ≥ 0_S   (A.35)
λ_i g_i(x) = 0,  i = 1, 2, …, S
∂L(x, λ, ν)/∂x = 0_M

are necessary for a minimal point of problem (A.33). Alternatively, the partial derivative of the Lagrange function can be evaluated as ∂L(x, λ, ν)/∂x* = 0_M, because (∂L(x, λ, ν)/∂x)* = ∂L(x, λ, ν)/∂x* as L is real-valued.
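This relation between the Wirtinger derivatives of a real-valued function can be checked numerically (a sketch with an illustrative Hermitian quadratic cost; for L(x) = x^H A x one has ∂L/∂x* = A x):

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = G + G.conj().T                      # Hermitian, so L below is real-valued
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)

L = lambda x: (x.conj() @ A @ x).real   # real-valued cost L(x) = x^H A x

# numerical Wirtinger derivative: dL/dx* = 0.5 * (dL/dRe(x) + 1j * dL/dIm(x))
eps = 1e-6
grad_conj = np.zeros(3, complex)
for i in range(3):
    e = np.zeros(3); e[i] = eps
    dre = (L(x + e) - L(x - e)) / (2 * eps)
    dim = (L(x + 1j * e) - L(x - 1j * e)) / (2 * eps)
    grad_conj[i] = 0.5 * (dre + 1j * dim)

# analytic derivative dL/dx* = A x matches the numerical one
assert np.allclose(grad_conj, A @ x, atol=1e-4)
```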
Appendix B
Completion of Covariance Matrices
and Extension of Sequences
Matrix completion problems deal with partial matrices, whose entries are
only specified for a subset of the elements. The unknown elements of these
matrices have to be completed subject to additional constraints [130]. An
important issue is the existence of a completion. If it exists and is not unique,
the unknown elements can be chosen according to a suitable criterion, e.g.,
maximum entropy. Here, we are interested in partial covariance matrices and
their completions which have to satisfy the positive semidefinite constraint.
A fully specified matrix is positive semidefinite only if all its principal
submatrices are positive semidefinite. A partial positive semidefinite matrix
is Hermitian regarding the specified entries and all its fully specified principal
submatrices1 are positive semidefinite [117]. From this follows that a partial
matrix has a positive semidefinite completion only if it is partially positive
semidefinite.
The covariance matrix

        c[0]         c[1]   …      c[P(B−1)]
        c[1]*        c[0]   ⋱      ⋮
C_T =   ⋮            ⋱      ⋱      c[1]          (B.1)
        c[P(B−1)]*   …      c[1]*  c[0]
¹ A principal submatrix is obtained by deleting a subset of the columns and the corresponding rows of a matrix. The main diagonal of a principal submatrix is on the diagonal of the original matrix.
For P = 2 and B = 3, for example, the partial matrix is (question marks denote the unknown elements)

            c[0]   ?      c[2]   ?      c[4]
            ?      c[0]   ?      c[2]   ?
C_T^(2) =   c[2]*  ?      c[0]   ?      c[2]      (B.2)
            ?      c[2]*  ?      c[0]   ?
            c[4]*  ?      c[2]*  ?      c[0]
All fully specified principal submatrices of C_T^{(P)} are the principal submatrices of

        c[0]         c[P]   …      c[P(B−1)]
        c[P]*        c[0]   ⋱      ⋮
C̄_T =   ⋮            ⋱      ⋱      c[P]          (B.3)
        c[P(B−1)]*   …      c[P]*  c[0]

which itself is a principal submatrix of C_T^{(P)}. Thus, the known elements of C_T have to form a positive semidefinite Toeplitz matrix C̄_T. This necessary condition makes sense intuitively, because its elements {c[P i]}_{i=0}^{B−1} form the decimated sequence w.r.t. c[ℓ].
A set of specified positions in a partial matrix is called a pattern. For the Toeplitz matrix C_T, it is defined by the set P of specified lags and includes the main diagonal to avoid the trivial case². A pattern P is said to be positive semidefinite completable if for every partial positive semidefinite Toeplitz matrix with pattern P a completion to a positive semidefinite Toeplitz matrix exists.
Considering all possible patterns P for C_T, the following result is given by Johnson et al. [117]:

Theorem B.1 (Johnson et al. [117]). A pattern P ⊆ {0, 1, …, P(B−1)} is positive semidefinite completable if and only if P = {0, P, 2P, …, P(B−1)}.
Alternatively to the proof given in [117], graph-theoretic arguments (e.g. [131]) lead to the same result. In general, the completion is not unique. For example, the completion with zeros, i.e., c[P i − ℓ] = 0 for ℓ ∈ {1, 2, …, P − 1} and i ∈ {1, 2, …, B − 1}, yields a positive definite Toeplitz covariance matrix [117].
² If the main diagonal were not specified, a positive definite completion could always be found.
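The zero completion for a periodic pattern can be illustrated numerically (numpy sketch; the decimated sequence values are an arbitrary positive definite example):

```python
import numpy as np

def toeplitz_h(first_row):
    """Hermitian Toeplitz matrix from its first row."""
    n = len(first_row)
    return np.array([[first_row[j - i] if j >= i else np.conj(first_row[i - j])
                      for j in range(n)] for i in range(n)])

# decimated sequence c[0], c[2], c[4] (P = 2, B = 3), chosen positive definite:
c_dec = [1.0, 0.5, 0.2]
C_bar = toeplitz_h(c_dec)                     # cf. (B.3)
# completion with zeros: c[1] = c[3] = 0
c_full = [1.0, 0.0, 0.5, 0.0, 0.2]
C_T = toeplitz_h(c_full)

assert np.linalg.eigvalsh(C_bar).min() > 0    # partial matrix is psd
assert np.linalg.eigvalsh(C_T).min() > 0      # zero completion is positive definite
```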
B.2 Band-Limited Positive Semidefinite Extension of Sequences
S(ω) ≥ 0, (B.5)
The frequency response of this filter is positive only on [−ω_max, ω_max] (Figure B.1), i.e., if the input sequence is not band-limited to ω_max or indefinite on Ω, the output is an indefinite sequence. A test for band-limited positive semidefinite extendibility can be derived based on the input and output of this filter. It results in the following characterization.
Theorem B.3 (Arun and Potter [6]).
1. The sequence {c[ℓ]}_{ℓ=0}^{N} has a positive semidefinite extension band-limited to ω_max if and only if the (N+1)×(N+1) Toeplitz matrix C_T with first row [c[0], c[1], …, c[N]] and the N×N Toeplitz matrix C′_T with first row [c′[0], c′[1], …, c′[N−1]] are both positive semidefinite, where c′[ℓ] is given by (B.6).
2. If this partial sequence is extendible in this sense, the extrapolation is unique if and only if C_T or C′_T, or both, is singular.
We illustrate this theorem with two examples: The first is required for our
argumentation in Section 2.4.3.2, the second example for Section 3.4.
Example B.1. Given the sequence c[0] = 1 and c[ℓ] = 0, ℓ ∈ {1, 2, . . . , B}, for
which maximum frequencies fmax does a positive semidefinite band-limited
extension exist?
For B = 1, the output in (B.6) is

C′_T = −2 cos(ω_max) ≥ 0,

i.e., a positive semidefinite band-limited extension exists for f_max ≥ 1/4. For B = 2, C′_T is positive semidefinite for f_max ≥ 1/3. The necessary and sufficient f_max increases further for B = 3: f_max ≥ 0.375. For B → ∞, we have f_max → 0.5, i.e., no band-limited positive semidefinite extension exists in the limit. □
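Although (B.6) itself is not reproduced above, the values appearing in Example B.1 are consistent with the filtered sequence c′[ℓ] = c[ℓ−1] − 2 cos(ω_max) c[ℓ] + c[ℓ+1] (with c[−1] = c[1]*), in analogy to the definition of c′[P i] in Theorem B.4 below; the following numpy sketch of the extendibility test is based on this assumption:

```python
import numpy as np

def toeplitz_h(r):
    n = len(r)
    return np.array([[r[j - i] if j >= i else np.conj(r[i - j])
                      for j in range(n)] for i in range(n)])

def extendible(c, f_max, tol=1e-12):
    """Test of Theorem B.3: does {c[l]}, l = 0..N, have a positive semidefinite
    extension band-limited to w_max = 2*pi*f_max?"""
    w = 2 * np.pi * f_max
    cext = np.concatenate(([np.conj(c[1])], c))        # prepend c[-1] = c[1]^*
    cp = [cext[l] - 2 * np.cos(w) * cext[l + 1] + cext[l + 2]
          for l in range(len(c) - 1)]                  # assumed filter (B.6)
    CT = toeplitz_h(c)
    CTp = toeplitz_h(np.array(cp))
    return bool(np.linalg.eigvalsh(CT).min() >= -tol and
                np.linalg.eigvalsh(CTp).min() >= -tol)

# Example B.1 with B = 2: c = [1, 0, 0] is extendible iff f_max >= 1/3
ok_high = extendible(np.array([1.0, 0.0, 0.0]), 0.35)   # True
ok_low = extendible(np.array([1.0, 0.0, 0.0]), 0.30)    # False
```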
Example B.2. The partial decimated sequence c[0], c[P ], . . . , c[P (B − 1)] is
given. Does the sequence c[0], 0, . . . , 0, c[P ], . . . , 0, . . . , 0, c[P (B − 1)], which is
the given sequence interpolated with P − 1 zeros, have a positive semidefinite
and band-limited extension?
For P = 2, the output in (B.6) is

2. and the following additional condition holds for the case f_max < 1/(2P): C̄′_T with first row [c′[0], c′[P], …, c′[P(B−2)]] is positive semidefinite, where c′[P i] is defined as c′[P i] = c[P(i−1)] − 2 cos(2π f′_max) c[P i] + c[P(i+1)] for f′_max = f_max P.

Under these conditions, the spectral distribution is unique if and only if the Toeplitz matrix C̄_T or C̄′_T, or both, is singular and f_max ≤ 1/(2P).
Proof. The proof follows directly from Theorems B.1, B.3, and a reasoning similar to the sampling theorem. We distinguish four cases of f_max:
1) f_max = 1/2: This corresponds to no band-limitation and is related to Theorem B.1, which shows that a positive definite matrix completion for all partially positive semidefinite C_T^{(P)} exists if and only if the pattern P is periodic. Then partial positive semidefiniteness is equivalent to C̄_T ⪰ 0 (B.3). Given a positive semidefinite completion, a positive semidefinite extension of this sequence can be found [6]. The extension is unique given a singular completion C_T, but the matrix completion itself is not unique.
A periodic pattern is also necessary for f_max < 1/2, because a positive semidefinite C_T is necessary for the existence of a (band-limited) positive semidefinite sequence extension. Therefore, we restrict the following argumentation to P = {ℓ | ℓ = P i, i ∈ {0, 1, …, B−1}}, N = P(B−1). (Note that Theorem B.4 requires existence for every partially positive semidefinite C_T^{(P)}.) For the remaining cases with f_max < 1/2, we first prove (band-limited) extendibility of the decimated sequence followed by arguments based on interpolation because of the periodicity of P.
B.3 Generalized Band-Limited Trigonometric Moment Problem
³ To test the decimated sequence for positive semidefinite extendibility, we have to consider the maximum frequency f′_max of the P-fold decimated sequence c[P i].
Appendix C
Robust Optimization from the
Perspective of Estimation Theory
where the cost function depends on an unknown parameter h and the con-
straint set X is independent of h. Furthermore, an observation y with the
probability density py (y; h) is available.
The question arises, whether we should estimate the unknown parameter
h and plug this estimate into the cost function or estimate the cost function
itself. In the first case an additional question arises: What is the optimum
estimator for h in the context of this optimization problem?
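The difference between the two options can be made concrete with a scalar toy problem (entirely illustrative, not from the text): for the non-quadratic cost F(x; h) = (x − h²)², minimizing the CM estimate of the cost yields x = E[h²|y], which differs from plugging in the CM estimate ĥ = E[h|y].

```python
import numpy as np

# Toy problem: h ~ N(0,1), observation y = h + n with n ~ N(0, s2).
s2, y = 0.5, 0.8
m = y / (1 + s2)              # posterior mean E[h|y]
v = s2 / (1 + s2)             # posterior variance

x_plugin = m ** 2             # argmin_x F(x; h_hat) with h_hat = m
x_cm = m ** 2 + v             # argmin_x E[F(x; h) | y] = E[h^2 | y]

# Monte Carlo check: the sample-average cost is quadratic in x and is
# minimized by the sample mean of h^2 under the posterior
rng = np.random.default_rng(2)
h_post = m + np.sqrt(v) * rng.standard_normal(200_000)
x_mc = (h_post ** 2).mean()   # close to x_cm, not to x_plugin
```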
Fig. C.1 Optimization for an application with cost function F(x; h) and unknown parameters h which can be estimated from the observations y.
x_opt^{ML}(ĥ_ML) = argmin_{x∈X} F(x; ĥ_ML).   (C.4)
Many results in the literature follow this approach — but some follow this
principle, although their estimate ĥ does not maximize the likelihood (see
first heuristic approach below). What are alternative methods?
Given the a posteriori density ph|y (h|y), the estimate minimizing the MSE
w.r.t. h is the conditional mean (CM) ĥCM = Eh|y [h] (Section 2.2.1) [124].
Because we are not interested in h directly, we optimize for the cost function
with minimum MSE. This yields the CM estimate of the cost function
Robust Optimization
i.e., all a priori information included in pε|y (ε|y) has already been exploited
in ĥ.
F(x; ĥ + ε) ≈ F(x; ĥ + E[ε]) + ( ∂F(x; ĥ + ε)/∂ε^T )|_{ε=E[ε]} (ε − E[ε])
 + ( ∂F(x; ĥ + ε)/∂ε^H )|_{ε=E[ε]} (ε − E[ε])*   (C.8)
Finding a closed form expression for the expectation of the cost function as
in (C.5) and (C.6) can be difficult. Alternatively we could also estimate the
argument of the original optimization problem (C.1)
∂L/∂P* = |β|² R P C_d − β* H_G^H C_d + µ P C_d = 0_{M×K}   (D.4)
∂L/∂β* = β Ḡ + β tr[P^H R P C_d] − tr[P^H H_G^H C_d] = 0   (D.5)
tr[P^H P C_d] ≤ P_tx   (D.6)
µ ( tr[P^H P C_d] − P_tx ) = 0.   (D.7)

¹ For example, it is equivalent to the sum MSE E_{d,n,h|y_T}[‖d̂ − d‖₂²] with d̂ = β G(h) H P d + β G(h) n based on the model in Figure 4.4.
D Detailed Derivations for Precoding with Partial CSI
β = tr[P^H H_G^H C_d] / ( Ḡ + tr[P^H R P C_d] )   (D.8)

µ = |β|² Ḡ / P_tx.   (D.11)
The expression for the minimum of (D.1) can be simplified by factoring out H_G (R + (Ḡ/P_tx) I_M)^{−1} to the left and (R + (Ḡ/P_tx) I_M)^{−1} H_G^H C_d to the right in the trace of the second, third, and fourth term in (D.2). Finally, min_{P,β} F(P, β) can be written as
tr[C_d] − tr[ H_G ( R + (Ḡ/P_tx) I_M )^{−1} H_G^H C_d ]
 = tr[ C_d ( H_G ( X + (Ḡ/P_tx) I_M )^{−1} H_G^H + I_K )^{−1} ]   (D.14)

with X = R − H_G^H H_G, where we applied the matrix inversion lemma (A.20) with C^{−1} = X + (Ḡ/P_tx) I_M and B = H_G in the second step.
D.2 Conditional Mean for Phase Compensation at the Receiver

Defining the random variable z_k = h_k^T q_s, we can rewrite it using the properties of the conditional expectation

E_{h_k|y_T}[ ((h_k^T q_s)*/|h_k^T q_s|) h_k ] = E_{h_k,z_k|y_T}[ (z_k*/|z_k|) h_k ] = E_{z_k|y_T}[ (z_k*/|z_k|) E_{h_k|z_k,y_T}[h_k] ].   (D.17)
Because p_{h_k|z_k,y_T}(h_k|z_k, y_T) is complex Gaussian, the conditional mean E_{h_k|z_k,y_T}[h_k] is equivalent to the (affine) LMMSE estimator

E_{h_k|z_k,y_T}[h_k] = E_{h_k|y_T}[h_k] + c_{h_k z_k|y_T} c_{z_k|y_T}^{−1} ( z_k − E_{z_k|y_T}[z_k] )   (D.18)

with covariances

c_{h_k z_k|y_T} = E_{h_k,z_k|y_T}[ (h_k − E_{h_k|y_T}[h_k]) (z_k − E_{z_k|y_T}[z_k])* ] = C_{h_k|y_T} q_s*   (D.19)
c_{z_k|y_T} = E_{z_k|y_T}[ |z_k − E_{z_k|y_T}[z_k]|² ] = q_s^H C*_{h_k|y_T} q_s.   (D.20)
with ĥk = Ehk |yT [hk ]. The estimator (D.18) has to be understood as the
estimator of hk given zk under the “a priori distribution” phk |yT (hk |y T ).
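A small numpy sketch (with illustrative covariances) verifies a consistency property of (D.18)-(D.20): projecting the estimate onto q_s reproduces the observed z_k exactly, since z_k = h_k^T q_s is observed without noise.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 4
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
C = G @ G.conj().T                      # C_{h_k|y_T} (illustrative psd matrix)
h_hat = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # E_{h_k|y_T}[h_k]
q = rng.standard_normal(M) + 1j * rng.standard_normal(M)      # q_s

c_hz = C @ q.conj()                     # (D.19): C_{h_k|y_T} q_s^*
c_z = (q.conj() @ C.conj() @ q).real    # (D.20): q_s^H C^*_{h_k|y_T} q_s
z_bar = q @ h_hat                       # E_{z_k|y_T}[z_k] = h_hat^T q_s

def cm_estimate(z):
    """Conditional mean (D.18) of h_k given z_k = h_k^T q_s."""
    return h_hat + c_hz / c_z * (z - z_bar)

z = 1.3 - 0.4j                          # some realization of z_k
h_est = cm_estimate(z)
assert np.isclose(h_est @ q, z)         # the estimate reproduces z_k
```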
Defining g_k(z_k) = z_k*/|z_k| and applying (D.18) to (D.17), we obtain

E_{h_k,z_k|y_T}[g_k(z_k) h_k] = E_{z_k|y_T}[g_k(z_k)] E_{h_k|y_T}[h_k]
 + c_{h_k z_k|y_T} c_{z_k|y_T}^{−1} ( E_{z_k|y_T}[|z_k|] − E_{z_k|y_T}[g_k(z_k)] E_{z_k|y_T}[z_k] ).   (D.22)

With [148] the remaining terms, i.e., the CM of the receivers' model g_k(z_k) and the magnitude |z_k|, are given as
Its explicit solution is derived in the sequel [25]. We write the cost function explicitly

E_{h_k}[ MSE_k(P, β; h_k) ] = 1 + |β|² Σ_{i=1}^{K} p_i^H C*_{h_k} p_i + |β|² c_{n_k}
 − 2 Re( β E_{h_k}[ ((h_k^T p_k)*/|h_k^T p_k|) h_k^T p_k ] ).   (D.29)

L(P, β, µ) = K + |β|² Σ_{k=1}^{K} p_k^H E_h[H^H H] p_k + |β|² Σ_{k=1}^{K} c_{n_k}
 − √π Re(β) Σ_{k=1}^{K} ( p_k^H C*_{h_k} p_k )^{1/2} + µ ( Σ_{k=1}^{K} ‖p_k‖₂² − P_tx ).   (D.31)

∂L(P, β, µ)/∂p_k* = |β|² E_h[H^H H] p_k − (√π/2) Re(β) ( p_k^H C*_{h_k} p_k )^{−1/2} C*_{h_k} p_k + µ p_k = 0_M   (D.32)
∂L(P, β, µ)/∂β* = β Σ_{i=1}^{K} p_i^H E_h[H^H H] p_i + β Σ_{k=1}^{K} c_{n_k} − (√π/2) Σ_{k=1}^{K} ( p_k^H C*_{h_k} p_k )^{1/2} = 0   (D.33)
tr[P^H P] ≤ P_tx   (D.34)
µ ( tr[P^H P] − P_tx ) = 0.   (D.35)

p_k = α_k p̃_k   (D.38)

( E_h[H^H H] + ξ I_M )^{−1} C*_{h_k} p̃_k = α_k β (2/√π) ( p̃_k^H C*_{h_k} p̃_k )^{1/2} p̃_k.   (D.39)

² Although the direct solution of the generalized eigenvalue problem is numerically more efficient.
D.4 Proof of BC-MAC Duality for AWGN BC Model
We complete the proof in Section 4.5.1 and show the converse: For a given dual MAC model, the same performance can be achieved in the BC with the same total transmit power. Using the transformations (4.131) to (4.133), the left hand side of (4.127) reads (cf. (4.49))

E_{h_k|y_T}[ COR_k(P, g_k(h_k); h_k) ] = 1 + E_{h_k^mac|y_T}[ |g_k^mac(h_k^mac)|² ]
 × ( c_{n_k} + c_{n_k} Σ_{i=1}^{K} ξ_i² u_i^{mac,T} R_{h_k^mac|y_T} u_i^{mac,*} )
 − 2 E_{h_k^mac|y_T}[ Re( g_k^mac(h_k^mac) h_k^{mac,T} u_k^mac p_k^mac ) ].   (D.44)
E_{h_k^mac|y_T}[ |g_k^mac(h_k^mac)|² ] ξ_k^{−2} p_k^{mac,2} ( 1 + Σ_{i=1}^{K} ξ_i² u_i^{mac,T} R_{h_k^mac|y_T} u_i^{mac,*} )
 = E_{h_k^mac|y_T}[ |g_k^mac(h_k^mac)|² ] ( ‖u_k^mac‖₂² + Σ_{i=1}^{K} p_i^{mac,2} u_k^{mac,T} R_{h_i^mac|y_T} u_k^{mac,*} ),   (D.45)

which simplifies to

p_k^{mac,2} = ξ_k² ( ‖u_k^mac‖₂² + Σ_{i=1}^{K} p_i^{mac,2} u_k^{mac,H} R*_{h_i^mac|y_T} u_k^mac )
 − p_k^{mac,2} Σ_{i=1}^{K} ξ_i² u_i^{mac,H} R*_{h_k^mac|y_T} u_i^mac.   (D.46)
Defining l^mac = [p_1^{mac,2}, p_2^{mac,2}, …, p_K^{mac,2}]^T and

[W^mac]_{u,v} = −p_u^{mac,2} u_v^{mac,H} R*_{h_u^mac|y_T} u_v^mac,  v ≠ u,
[W^mac]_{v,v} = ‖u_v^mac‖₂² + Σ_{i=1, i≠v}^{K} p_i^{mac,2} u_v^{mac,H} R*_{h_i^mac|y_T} u_v^mac,   (D.47)

W^mac is (column) diagonally dominant because of

[W^mac]_{v,v} > Σ_{u≠v} |[W^mac]_{u,v}| = Σ_{u≠v} p_u^{mac,2} u_v^{mac,H} R*_{h_u^mac|y_T} u_v^mac.   (D.49)

This concludes the "only-if" part of the proof for Theorem 4.1.
|β_k|² E_{h_k|y_T}[ |g_k(h_k)|² ] c_{n_k} + |β_k|² Σ_{i=1}^{K} p_i^H E_{h_k|y_T}[ h_k* h_k^T |g_k(h_k)|² ] p_i
 = ‖p_k‖₂² ξ_k^{−2} + Σ_{i=1}^{K} |β_i|² ξ_i² ξ_k^{−2} p_k^H E_{h_i|y_T}[ h_i* h_i^T |g_i(h_i)|² ] p_k.   (D.51)

It can be simplified to

‖p_k‖₂² = ξ_k² |β_k|² ( E_{h_k|y_T}[ |g_k(h_k)|² ] c_{n_k} + Σ_{i=1}^{K} p_i^H E_{h_k|y_T}[ h_k* h_k^T |g_k(h_k)|² ] p_i )
 − Σ_{i=1}^{K} |β_i|² ξ_i² p_k^H E_{h_i|y_T}[ h_i* h_i^T |g_i(h_i)|² ] p_k,   (D.52)

W ξ = l.   (D.53)
To prove the "only if" part of Theorem 4.2, we rewrite the left hand side of (4.141) with the transformations (4.145)

p_k^{mac,2} = ξ_k² ( ‖g_k^mac‖₂² + Σ_{i=1}^{K} p_i^{mac,2} g_k^{mac,H} E_{h_i^mac|y_T}[ h_i^{mac,*} h_i^{mac,T} ] g_k^mac )
 − p_k^{mac,2} Σ_{i=1}^{K} ξ_i² g_i^{mac,H} E_{h_k^mac|y_T}[ h_k^{mac,*} h_k^{mac,T} ] g_i^mac.   (D.60)
the variance

c_v = ‖g_v^mac‖₂² + Σ_{i=1}^{K} p_i^{mac,2} g_v^{mac,H} E_{h_i^mac|y_T}[ h_i^{mac,*} h_i^{mac,T} ] g_v^mac

of the output of g_v^mac, and l^mac = [p_1^{mac,2}, p_2^{mac,2}, …, p_K^{mac,2}]^T. If ‖g_k^mac‖₂² > 0 for all k, W^mac is (column) diagonally dominant due to

[W^mac]_{v,v} > Σ_{u≠v} |[W^mac]_{u,v}| = Σ_{u≠v} p_u^{mac,2} g_v^{mac,H} E_{h_u^mac|y_T}[ h_u^{mac,*} h_u^{mac,T} ] g_v^mac.   (D.62)
because of (5.26). Taking into account the constraint on its structure, the solution F = [f_1, …, f_K] reads

f_k = −β [ 0_{k×M} ; B_k^{(O)} ] p_k,   (D.65)

with A_k^{(O)} ∈ C^{k×M} and B_k^{(O)} ∈ C^{(K−k)×M}.
For minimization of (D.63) w.r.t. P, we construct the Lagrange function

L(P, F, β, O, µ) = F_THP(P, F, β, O) + µ ( tr[P C_w P^H] − P_tx )   (D.67)

β² = Σ_{k=1}^{K} c_{w_k} e_k^T A_k^{(O)} ( X + A_k^{(O),H} A_k^{(O)} + (Ḡ/P_tx) I_M )^{−2} A_k^{(O),H} e_k / P_tx.   (D.71)

For X = 0_M and high SNR, the matrix inversion lemma A.2.2 has to be applied to (D.70) and (D.71) to ensure the existence of the solution.
It remains to find the precoding order O. Applying (D.65), (D.70), and (D.71) to (D.63), we can simplify the expression for its minimum to

F_THP(O) = tr[C_w] − Σ_{k=1}^{K} c_{w_k} e_k^T A_k^{(O)} ( X + A_k^{(O),H} A_k^{(O)} + (Ḡ/P_tx) I_M )^{−1} A_k^{(O),H} e_k.   (D.72)
Fig. E.1 Uniform linear array with spacing η and mean azimuth direction ϕ̄_k of propagation channel to terminal k (with angular spread).
For the evaluation of the algorithms presented in this book, the following model for the wireless channel is employed: The Q samples of the P = MK(L+1) channel parameters h_T[q] ∈ C^{PQ} (2.8) are assumed to form a zero-mean complex Gaussian N_c(0_{PQ}, C_{h_T}) and stationary random sequence. They are generated based on the Karhunen-Loève expansion, i.e., h_T[q] = C_{h_T}^{1/2} ζ[q] with ζ[q] ∼ N_c(0, I_{PQ}). The structure of C_{h_T} is exploited to reduce the computational complexity.
We assume identical temporal correlations for every element in h[q] (see Section 2.1 for details), which results in C_{h_T} = C_T ⊗ C_h (2.12) with C_T ∈ T^Q_{+,0} and C_h ∈ S^P_{+,0}.
If not defined otherwise, Clarke's model [206, p. 40] is chosen for the temporal correlations c_h[ℓ] = c_h[0] J₀(2π f_max ℓ).¹ Its power spectral density is shown in Figure 2.12. To be consistent with the definition of C_h = E[h[q] h[q]^H] and to obtain a unique Kronecker model, the first row of the Toeplitz covariance matrix C_T in the Kronecker model is [c_h[0], …, c_h[Q−1]]^T / c_h[0].

¹ J₀(•) is the Bessel function of the first kind and of order zero.
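The temporal correlations and the Karhunen-Loève generation can be sketched in numpy for a single scalar channel tap (J₀ is evaluated via its integral representation to stay self-contained; the parameter values are illustrative):

```python
import numpy as np

def J0(x):
    """Bessel function of the first kind, order zero, via the integral
    representation J0(x) = (1/pi) * int_0^pi cos(x sin t) dt (midpoint rule)."""
    t = (np.arange(4000) + 0.5) * (np.pi / 4000)
    return np.mean(np.cos(np.outer(np.atleast_1d(x), np.sin(t))), axis=1)

# Clarke's model: c_h[l] = c_h[0] * J0(2*pi*fmax*l), here with c_h[0] = 1
fmax, Q = 0.05, 32
c_t = J0(2 * np.pi * fmax * np.arange(Q))
C_T = np.array([[c_t[abs(i - j)] for j in range(Q)] for i in range(Q)])

# Karhunen-Loeve generation of one correlated sequence h = C_T^{1/2} * zeta
lam, U = np.linalg.eigh(C_T)
sqrtC = U @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ U.conj().T
rng = np.random.default_rng(4)
zeta = (rng.standard_normal(Q) + 1j * rng.standard_normal(Q)) / np.sqrt(2)
h_seq = sqrtC @ zeta
```

Note that for small f_max the Toeplitz matrix C_T is nearly rank deficient, which is precisely the structure exploited for channel prediction.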
E Channel Scenarios for Performance Evaluation
which contains the phase shifts resulting from the array geometry. A common value of the interelement spacing is η = λ_c/2. The azimuth angle ϕ is defined w.r.t. the perpendicular on the array's orientation (Figure E.1).
Thus, the spatial covariance matrix is modeled as

C_{h_k} = c_k ∫_{−π/2}^{π/2} a(ϕ) a(ϕ)^H p_ϕ(ϕ + ϕ̄_k) dϕ   (E.1)

with mean azimuth direction ϕ̄_k and tr[C_{h_k}] = M. A widely used choice for the probability density p_ϕ(ϕ + ϕ̄_k) describing the azimuth spread is a truncated Laplace distribution

p_ϕ(ϕ) ∝ exp( −√2 |ϕ| / σ ),  |ϕ| ≤ π/2,   (E.2)
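(E.1) with the Laplace density (E.2) can be evaluated numerically; a numpy sketch (the steering-vector convention a_m(ϕ) = e^{j2π(η/λ_c) m sin ϕ} and all parameter values are assumptions for illustration):

```python
import numpy as np

M = 4
eta_over_lambda = 0.5                  # eta = lambda_c / 2 (assumed)

def steering(phi):
    """ULA steering vector (assumed convention, cf. Figure E.1)."""
    return np.exp(1j * 2 * np.pi * eta_over_lambda * np.arange(M) * np.sin(phi))

def spatial_cov(phi_bar, sigma, n=2001):
    """Numerical evaluation of (E.1) with the truncated Laplace density (E.2),
    normalized such that tr[C_hk] = M."""
    d = np.linspace(-np.pi / 2, np.pi / 2, n)      # azimuth deviation
    w = np.exp(-np.sqrt(2) * np.abs(d) / sigma)
    w /= w.sum()                                   # discrete normalization
    C = np.zeros((M, M), complex)
    for di, wi in zip(d, w):
        a = steering(phi_bar + di)
        C += wi * np.outer(a, a.conj())
    return C * (M / np.real(np.trace(C)))

C_hk = spatial_cov(phi_bar=0.3, sigma=0.05)        # small spread: near rank one
```

For a small angular spread σ the largest eigenvalue dominates, which is the near rank-one situation discussed in Section 5.5.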
F List of Abbreviations
References
13. Barton, T.A., Smith, S.T.: Structured Covariance Estimation for Space-Time
Adaptive Processing. In: IEEE International Conference on Acoustics, Speech,
and Signal Processing, pp. 3493–3496 (1997)
14. Bello, P.A.: Characterization of Randomly Time-Variant Linear Channels. IEEE
Trans. on Comm. Systems 11(12), 360–393 (1963)
15. Ben-Haim, Z., Eldar, Y.C.: Minimax Estimators Dominating the Least-Squares
Estimator. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal
Processing, vol. 4, pp. 53–56 (2005)
16. Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Mathematics of Op-
erations Research 23 (1998)
17. Ben-Tal, A., Nemirovski, A.: Robust Optimization - Methodology and Applica-
tions. Mathematical Programming Series B 92, 453–480 (2002)
18. Bengtsson, M., Ottersten, B.: Uplink and Downlink Beamforming for Fading
Channels. In: Proc. of the 2nd IEEE Workshop on Signal Processing Advances
in Wireless Communications, pp. 350–353 (1999)
19. Bengtsson, M., Ottersten, B.: Optimum and Suboptimum Transmit Beamform-
ing. In: Handbook of Antennas in Wireless Communications, chap. 18. CRC
Press (2001)
20. Bengtsson, M., Volcker, B.: On the Estimation of Azimuth Distributions and
Azimuth Spectra. In: Proc. Vehicular Technology Conference Fall, vol. 3, pp.
1612–1615 (2001)
21. Berman, A.: Nonnegative Matrices in the Mathematical Sciences. Academic
Press (1979)
22. Beutler, F.: Error Free Recovery of Signals from Irregular Sampling. SIAM Rev.
8, 322–335 (1966)
23. Bienvenu, G., Kopp, L.: Optimality of High Resolution Array Processing Using
the Eigensystem Approach. IEEE Trans. Acoust., Speech, Signal Processing
31(5), 1235–1248 (1983)
24. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press
(2004)
25. Breun, P., Dietrich, F.A.: Precoding for the Wireless Broadcast Channel based
on Partial Channel Knowledge. Tech. Rep. TUM-LNS-TR-05-03, Technische
Universität München (2005)
26. Brewer, J.W.: Kronecker Products and Matrix Calculus in System Theory. IEEE
Trans. Circuits Syst. 25, 772–781 (1978)
27. Brown, J.L.: On the Prediction of a Band-Limited Signal from Past Samples.
Proc. IEEE 74(11), 1596–1598 (1986)
28. Brown, J.L., Morean, O.: Robust Prediction of Band-Limited Signals from Past
Samples. IEEE Trans. Inform. Theory 32(3), 410–412 (1986)
29. Burg, J.P., Luenberger, D.G., Wenger, D.L.: Estimation of Structured Covari-
ance Matrices. Proc. IEEE 70(9), 500–514 (1982)
30. Byers, G.J., Takawira, F.: Spatially and Temporally Correlated MIMO Chan-
nels: Modeling and Capacity Analysis. IEEE Trans. Veh. Technol. 53(3), 634–
643 (2004)
31. Caire, G.: MIMO Downlink Joint Processing and Scheduling: A Survey of Clas-
sical and Recent Results. In: Proc. of Workshop on Information Theory and its
Applications. San Diego, CA (2006)
32. Caire, G., Shamai, S.: On the Achievable Throughput of a Multiantenna Gaus-
sian Broadcast Channel. IEEE Trans. Inform. Theory 49(7), 1691–1706 (2003)
33. Chaufray, J.M., Loubaton, P., Chevalier, P.: Consistent Estimation of Rayleigh
Fading Channel Second-Order Statistics in the Context of the Wideband CDMA
Mode of the UMTS. IEEE Trans. Signal Processing 49(12), 3055–3064 (2001)
34. Choi, H., Munson, Jr., D.C.: Analysis and Design of Minimax-Optimal Interpo-
lators. IEEE Trans. Signal Processing 46(6), 1571–1579 (1998)
References 263
35. Choi, H., Munson, Jr., D.C.: Stochastic Formulation of Bandlimited Signal In-
terpolation. IEEE Trans. Circuits Syst. II 47(1), 82–85 (2000)
36. Chun, B.: A Downlink Beamforming in Consideration of Common Pilot and
Phase Mismatch. In: Proc. of the European Conference on Circuit Theory and
Design, vol. 3, pp. 45–48 (2005)
37. Cools, R.: Advances in Multidimensional Integration. Journal of Computational
and Applied Mathematics 149, 1–12 (2002)
38. Cools, R., Rabinowitz, P.: Monomial Cubature Rules since “Stroud”: A Compi-
lation. Journal of Computational and Applied Mathematics 48, 309–326 (1993)
39. Costa, M.: Writing on Dirty Paper. IEEE Trans. Inform. Theory 29(3), 439–441
(1983)
40. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley &
Sons (1991)
41. Czink, N., Matz, G., Seethaler, D., Hlawatsch, F.: Improved MMSE Estimation
of Correlated MIMO Channels using a Structured Correlation Estimator. In:
Proc. of the 6th IEEE Workshop on Signal Processing Advances in Wireless
Communications, pp. 595–599. New York City (2005)
42. Forney, Jr., G.D.: On the Role of MMSE Estimation in Approaching the Information-Theoretic Limits of Linear Gaussian Channels: Shannon Meets Wiener. In: Proc. 2003 Allerton Conf., pp. 430–439 (2003)
43. Dahl, J., Vandenberghe, L., Fleury, B.H.: Robust Least-Squares Estimators
Based on Semidefinite Programming. In: Conf. Record of the 36th Asilomar
Conf. on Signals, Systems, and Computers, vol. 2, pp. 1787–1791. Pacific Grove,
CA (2002)
44. Dante, H.M.: Robust Linear Prediction of Band-Limited Signals. In: Proc. of
the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 4, pp.
2328–2331 (1988)
45. Degerine, S.: On Local Maxima of the Likelihood Function for Toeplitz Matrix
Estimation. IEEE Trans. Signal Processing 40(6), 1563–1565 (1992)
46. Dembo, A.: The Relation Between Maximum Likelihood Estimation of Struc-
tured Covariance Matrices and Periodograms. IEEE Trans. Acoust., Speech,
Signal Processing 34(6), 1661–1662 (1986)
47. Dembo, A., Mallows, C.L., Shepp, L.A.: Embedding Nonnegative Definite
Toeplitz Matrices in Nonnegative Definite Circulant Matrices, with Applica-
tion to Covariance Estimation. IEEE Trans. Inform. Theory 35(6), 1206–1212
(1989)
48. Dharanipragada, S., Arun, K.S.: Bandlimited Extrapolation Using Time-
Bandwidth Dimension. IEEE Trans. Signal Processing 45(12), 2951–2966 (1997)
49. Dietrich, F.A., Breun, P., Utschick, W.: Tomlinson-Harashima Precoding: A
Continuous Transition from Complete to Statistical Channel Knowledge. In:
Proc. of the IEEE Global Telecommunications Conference, vol. 4, pp. 2379–
2384. Saint Louis, U.S.A. (2005)
50. Dietrich, F.A., Breun, P., Utschick, W.: Robust Tomlinson-Harashima Precod-
ing for the Wireless Broadcast Channel. IEEE Trans. Signal Processing 55(2),
631–644 (2007)
51. Dietrich, F.A., Dietl, G., Joham, M., Utschick, W.: Robust and Reduced Rank
Space-Time Decision Feedback Equalization. In: T. Kaiser, A. Bourdoux, H. Boche,
J.R. Fonollosa, J.B. Andersen, W. Utschick (eds.) Smart Antennas—State-of-
the-Art, EURASIP Book Series on Signal Processing and Communications,
chap. Part 1: Receiver, pp. 189–205. Hindawi Publishing Corporation (2005)
52. Dietrich, F.A., Hoffmann, F., Utschick, W.: Conditional Mean Estimator for the
Gramian Matrix of Complex Gaussian Random Variables. In: Proc. of the IEEE
Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1137–1140.
Philadelphia, U.S.A. (2005)
53. Dietrich, F.A., Hunger, R., Joham, M., Utschick, W.: Robust Transmit Wiener
Filter for Time Division Duplex Systems. In: Proc. of the 3rd IEEE Int. Symp.
on Signal Processing and Information Technology, pp. 415–418. Darmstadt, Ger-
many (2003)
54. Dietrich, F.A., Ivanov, T., Utschick, W.: Estimation of Channel and Noise Cor-
relations for MIMO Channel Estimation. In: Proc. of the Int. ITG/IEEE Work-
shop on Smart Antennas. Reisensburg, Germany (2006)
55. Dietrich, F.A., Joham, M., Utschick, W.: Joint Optimization of Pilot Assisted
Channel Estimation and Equalization applied to Space-Time Decision Feedback
Equalization. In: Proc. of the IEEE Int. Conf. on Communications, vol. 4, pp.
2162–2167. Seoul, South Korea (2005)
56. Dietrich, F.A., Utschick, W.: Pilot-Assisted Channel Estimation Based on
Second-Order Statistics. IEEE Trans. Signal Processing 53(3), 1178–1193 (2005)
57. Dietrich, F.A., Utschick, W.: Robust Tomlinson-Harashima Precoding. In: Proc.
of the IEEE 16th Int. Symp. on Personal, Indoor and Mobile Radio Communi-
cations, vol. 1, pp. 136–140. Berlin, Germany (2005)
58. Dietrich, F.A., Utschick, W., Breun, P.: Linear Precoding Based on a Stochastic
MSE Criterion. In: Proc. of the 13th European Signal Processing Conference.
Antalya, Turkey (2005)
59. Dimic, G., Sidiropoulos, N.D.: On Downlink Beamforming with Greedy User
Selection: Performance Analysis and a Simple New Algorithm. IEEE Trans.
Signal Processing 53(10), 3857–3868 (2005)
60. Dogandzic, A., Jin, J.: Estimating Statistical Properties of MIMO Fading Chan-
nels. IEEE Trans. Signal Processing 53(8), 3065–3080 (2005)
61. Dorato, P.: A Historical Review of Robust Control. IEEE Control Syst. Mag.
7(2), 44–56 (1987)
62. Doucet, A., Wang, X.: Monte Carlo Methods for Signal Processing: A Review in
the Statistical Signal Processing Context. IEEE Signal Processing Mag. 22(6),
152–170 (2005)
63. Duel-Hallen, A., Hu, S., Hallen, H.: Long-Range Prediction of Fading Signals.
IEEE Signal Processing Mag. 17(3), 62–75 (2000)
64. Ekman, T.: Prediction of Mobile Radio Channels: Modeling and Design. Disser-
tation, Uppsala University (2002). ISBN 91-506-1625-0
65. Erez, U., ten Brink, S.: A Close-to-Capacity Dirty Paper Coding Scheme. IEEE
Trans. Inform. Theory 51(10), 3417–3432 (2005)
66. Ertel, R.B., Cardieri, P., Sowerby, K.W., Rappaport, T.S., Reed, J.H.: Overview
of Spatial Channel Models for Antenna Array Communication Systems. IEEE
Personal Commun. Mag. 5(1), 10–22 (1998)
67. Eyceoz, T., Duel-Hallen, A., Hallen, H.: Deterministic Channel Modeling and
Long Range Prediction of Fast Fading Mobile Radio Channels. IEEE Commun.
Lett. 2(9), 254–256 (1998)
68. Fessler, J.A., Hero, A.O.: Space-Alternating Generalized Expectation-
Maximization Algorithm. IEEE Trans. Signal Processing 42(10), 2664–2677
(1994)
69. Fischer, R.: Precoding and Signal Shaping for Digital Transmission. John Wiley
& Sons, Inc. (2002)
70. Fischer, R.F.H.: The Modulo-Lattice Channel: The Key Feature in Precoding
Schemes. International Journal of Electronics and Communications, pp. 244–253
(2005)
71. Fischer, R.F.H., Windpassinger, C., Lampe, A., Huber, J.B.: MIMO Precoding
for Decentralized Receivers. In: Proc. IEEE Int. Symp. on Information Theory,
p. 496 (2002)
72. Fischer, R.F.H., Windpassinger, C., Lampe, A., Huber, J.B.: Tomlinson-
Harashima Precoding in Space-Time Transmission for Low-Rate Backward
113. Joham, M., Schmidt, D.A., Brehmer, J., Utschick, W.: Finite-Length MMSE
Tomlinson-Harashima Precoding for Frequency Selective Vector Channels.
IEEE Trans. Signal Processing 55(6), 3073–3088 (2007)
114. Joham, M., Schmidt, D.A., Brunner, H., Utschick, W.: Symbol-Wise Order Op-
timization for THP. In: Proc. ITG/IEEE Workshop on Smart Antennas (2007)
115. Joham, M., Utschick, W.: Ordered Spatial Tomlinson Harashima Precod-
ing. In: T. Kaiser, A. Bourdoux, H. Boche, J.R. Fonollosa, J.B. Andersen,
W. Utschick (eds.) Smart Antennas—State-of-the-Art, EURASIP Book Series
on Signal Processing and Communications, chap. Part 3: Transmitter, pp. 401–
422. EURASIP, Hindawi Publishing Corporation (2006)
116. Joham, M., Utschick, W., Nossek, J.A.: Linear Transmit Processing in MIMO
Communications Systems. IEEE Trans. Signal Processing 53(8), 2700–2712
(2005)
117. Johnson, C.R., Lundquist, M., Nævdal, G.: Positive Definite Toeplitz Comple-
tions. Journal of the London Mathematical Society 59(2), 507–520 (1999)
118. Williams, D.B., Johnson, D.H.: Robust Estimation of Structured Covariance
Matrices. IEEE Trans. Signal Processing 41(9), 2891–2906 (1993)
119. Jongren, G., Skoglund, M., Ottersten, B.: Design of Channel-Estimate-
Dependent Space-Time Block Codes. IEEE Trans. Commun. 52(7), 1191–1203
(2004)
120. Jung, P.: Analyse und Entwurf digitaler Mobilfunksysteme (Analysis and Design
of Digital Mobile Radio Systems), 1st edn. B.G. Teubner (1997)
121. Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall (2000)
122. Kassam, S.A., Poor, H.V.: Robust Techniques for Signal Processing: A Survey.
Proc. IEEE 73(3), 433–481 (1985)
123. Kassam, S.A., Poor, H.V.: Robust Techniques for Signal Processing: A Survey.
Proc. IEEE 73(3), 433–481 (1985)
124. Kay, S.M.: Fundamentals of Statistical Signal Processing - Estimation Theory,
1st edn. PTR Prentice Hall (1993)
125. Krim, H., Viberg, M.: Two Decades of Array Signal Processing Research. IEEE
Signal Processing Mag. 13(4), 67–94 (1996)
126. Kusume, K., Joham, M., Utschick, W., Bauch, G.: Efficient Tomlinson-
Harashima Precoding for Spatial Multiplexing on Flat MIMO Channel. In:
Proc. of the IEEE Int. Conf. on Communications, vol. 3, pp. 2021–2025 (2005)
127. Kusume, K., Joham, M., Utschick, W., Bauch, G.: Cholesky Factorization with
Symmetric Permutation Applied to Detecting and Precoding Spatially Multi-
plexed Data Streams. IEEE Trans. Signal Processing 55(6), 3089–3103 (2007)
128. Lapidoth, A., Shamai, S., Wigger, M.: On the Capacity of Fading MIMO Broad-
cast Channels with Imperfect Transmitter Side-Information. In: Proc. of the
43rd Annual Allerton Conference on Communication, Control, and Computing,
pp. 28–30 (2005). Extended version online: http://arxiv.org/abs/cs.IT/0605079
129. Larsson, E.G., Stoica, P.: Space-Time Block Coding for Wireless Communica-
tions. Cambridge University Press (2003)
130. Laurent, M.: Matrix Completion Problems. In: The Encyclopedia of Optimiza-
tion, vol. III, pp. 221–229. Kluwer (2001)
131. Lev-Ari, H., Parker, S.R., Kailath, T.: Multidimensional Maximum-Entropy Co-
variance Extension. IEEE Trans. Inform. Theory 35(3), 497–508 (1989)
132. Li, C.K., Mathias, R.: Extremal Characterizations of the Schur Complement
and Resulting Inequalities. SIAM Review 42(2), 233–246 (2000)
133. Li, H., Stoica, P., Li, J.: Computationally Efficient Maximum Likelihood Es-
timation of Structured Covariance Matrices. IEEE Trans. Signal Processing
47(5), 1314–1323 (1999)
134. Li, Y., Cimini, L.J., Sollenberger, N.R.: Robust Channel Estimation for OFDM
Systems with Rapid Dispersive Fading Channels. IEEE Trans. Commun. 46(7),
902–915 (1998)
155. Newsam, G., Dietrich, C.: Bounds on the Size of Nonnegative Definite Circulant
Embeddings of Positive Definite Toeplitz Matrices. IEEE Trans. Inform. Theory
40(4), 1218–1220 (1994)
156. Nicoli, M., Simeone, O., Spagnolini, U.: Multislot Estimation of Fast-Varying
Space-Time Channels in TD-CDMA Systems. IEEE Commun. Lett. 6(9), 376–
378 (2002)
157. Nicoli, M., Simeone, O., Spagnolini, U.: Multi-Slot Estimation of Fast-Varying
Space-Time Communication Channels. IEEE Trans. Signal Processing 51(5),
1184–1195 (2003)
158. Nicoli, M., Sternad, M., Spagnolini, U., Ahlén, A.: Reduced-Rank Channel Es-
timation and Tracking in Time-Slotted CDMA Systems. In: Proc. IEEE Int.
Conference on Communications, pp. 533–537. New York City (2002)
159. Otnes, R., Tüchler, M.: On Iterative Equalization, Estimation, and Decoding.
In: Proc. of the IEEE Int. Conf. on Communications, vol. 4, pp. 2958–2962
(2003)
160. Palomar, D.P., Bengtsson, M., Ottersten, B.: Minimum BER Linear
Transceivers for MIMO Channels via Primal Decomposition. IEEE Trans. Signal
Processing 53(8), 2866–2882 (2005)
161. Paolella, M.S.: Computing Moments of Ratios of Quadratic Forms in Normal
Variables. Computational Statistics and Data Analysis 42(3), 313–331 (2003)
162. Papoulis, A.: A Note on the Predictability of Band-Limited Processes. Proc.
IEEE 73(8), 1332 (1985)
163. Papoulis, A.: Levinson’s Algorithm, Wold’s Decomposition, and Spectral Esti-
mation. SIAM Review 27(3), 405–441 (1985)
164. Papoulis, A.: Probability, Random Variables, and Stochastic Processes, 3rd edn.
WCB/McGraw-Hill (1991)
165. Paulraj, A., Papadias, C.B.: Space-Time Processing for Wireless Communica-
tions. IEEE Signal Processing Mag. 14(6), 49–83 (1997)
166. Pedersen, K.I., Mogensen, P.E., Fleury, B.H.: A Stochastic Model of the Tempo-
ral and Azimuthal Dispersion Seen at the Base Station in Outdoor Propagation
Environments. IEEE Trans. Veh. Technol. 49(2), 437–447 (2000)
167. Pedersen, K.I., Mogensen, P.E., Ramiro-Moreno, J.: Application and Perfor-
mance of Downlink Beamforming Techniques in UMTS. IEEE Commun. Mag.
42(10), 134–143 (2003)
168. Poor, H.V.: An Introduction to Signal Detection and Estimation, 2nd edn.
Springer-Verlag (1994)
169. Porat, B.: Digital Processing of Random Signals. Prentice-Hall (1994)
170. Porat, B.: A Course in Digital Signal Processing. John Wiley & Sons (1997)
171. Prékopa, A.: Stochastic Programming. Kluwer (1995)
172. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical
Recipes in C, 2nd edn. Cambridge University Press (1999)
173. Proakis, J.G.: Digital Communications, 3rd edn. McGraw-Hill (1995)
174. Psaltopoulos, G., Joham, M., Utschick, W.: Comparison of Lattice Search Tech-
niques for Nonlinear Precoding. In: Proc. of the Int. ITG/IEEE Workshop on
Smart Antennas. Reisensburg, Germany (2006)
175. Psaltopoulos, G., Joham, M., Utschick, W.: Generalized MMSE Detection Tech-
niques for Multipoint-to-Point Systems. In: Proc. of the IEEE Global Telecom-
munications Conference. San Francisco, CA (2006)
176. Quang, A.N.: On the Uniqueness of the Maximum-Likelihood Estimate of Struc-
tured Covariance Matrices. IEEE Trans. Acoust., Speech, Signal Processing
32(6), 1249–1251 (1984)
177. Remmert, R.: Theory of Complex Functions. Springer-Verlag (1991)
178. Requicha, A.A.G.: The Zeros of Entire Functions: Theory and Engineering Ap-
plications. Proc. IEEE 68(3), 308–328 (1980)
179. Rey, F., Lamarca, M., Vazquez, G.: A Robust Transmitter Design for MIMO
Multicarrier Systems with Imperfect Channel Estimates. In: Proc. of the 4th
IEEE Workshop on Signal Processing Advances in Wireless Communications,
pp. 546–550 (2003)
180. Rosen, Y., Porat, B.: Optimal ARMA Parameter Estimation Based on the Sam-
ple Covariances for Data with Missing Observations. IEEE Trans. Inform. The-
ory 35(2), 342–349 (1989)
181. Schafer, R.W., Mersereau, R.M., Richards, M.A.: Constrained Iterative Restora-
tion Algorithms. Proc. IEEE 69(4), 432–450 (1981)
182. Scharf, L.L.: Statistical Signal Processing: Detection, Estimation, and Time Se-
ries Analysis. Addison-Wesley (1991)
183. Schmidt, D., Joham, M., Hunger, R., Utschick, W.: Near Maximum Sum-Rate
Non-Zero-Forcing Linear Precoding with Successive User Selection. In: Proc. of
the 40th Asilomar Conference on Signals, Systems and Computers (2006)
184. Schmidt, D., Joham, M., Utschick, W.: Minimum Mean Square Error Vector
Precoding. In: Proc. of the IEEE 16th Int. Symp. on Personal, Indoor and
Mobile Radio Communications, vol. 1, pp. 107–111 (2005)
185. Schubert, M., Boche, H.: Solution of the Multiuser Downlink Beamforming
Problem With Individual SINR Constraints. IEEE Trans. Veh. Technol. 53(1),
18–28 (2004)
186. Schubert, M., Boche, H.: Iterative Multiuser Uplink and Downlink Beamform-
ing under SINR Constraints. IEEE Trans. Signal Processing 53(7), 2324–2334
(2005)
187. Schubert, M., Shi, S.: MMSE Transmit Optimization with Interference Pre-
Compensation. In: Proc. of the IEEE 61st Vehicular Technology Conference,
vol. 2, pp. 845–849 (2005)
188. Sharif, M., Hassibi, B.: On the Capacity of MIMO Broadcast Channels with
Partial Side Information. IEEE Trans. Inform. Theory 51(2), 506–522 (2005)
189. Shi, S., Schubert, M.: MMSE Transmit Optimization for Multi-User Multi-
Antenna Systems. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech, and
Signal Processing, vol. 3, pp. 409–412 (2005)
190. Simeone, O., Bar-Ness, Y., Spagnolini, U.: Linear and Non-Linear Precod-
ing/Decoding for MIMO Systems with Long-Term Channel State Information
at the Transmitter. IEEE Trans. Wireless Commun. 3(2), 373–378 (2004)
191. Simeone, O., Spagnolini, U.: Multi-Slot Estimation of Space-Time Channels. In:
Proc. of the IEEE Int. Conf. on Communications, vol. 2, pp. 802–806 (2002)
192. Simeone, O., Spagnolini, U.: Lower Bound on Training-Based Channel Esti-
mation Error for Frequency-Selective Block-Fading Rayleigh MIMO Channels.
IEEE Trans. Signal Processing 52(11), 3265–3277 (2004)
193. Sion, M.: On General Minimax Theorems. Pacific Journal of Mathematics 8(1),
171–176 (1958)
194. Slepian, D.: On Bandwidth. Proc. IEEE 64(3), 292–300 (1976)
195. Slepian, D.: Prolate Spheroidal Wave Functions, Fourier Analysis, and
Uncertainty—V: The Discrete Case. The Bell System Technical Journal 57(5),
1371–1430 (1978)
196. Snyders, J.: Error Formulae for Optimal Linear Filtering, Prediction and In-
terpolation of Stationary Time Series. The Annals of Mathematical Statistics
43(6), 1935–1943 (1972)
197. Solov’ev, V.N.: A Minimax-Bayes Estimate on Classes of Distributions with
Bounded Second Moments. Russian Mathematical Surveys 50(4), 832–834
(1995)
198. Solov’ev, V.N.: On the Problem of Minimax-Bayesian Estimation. Russian
Mathematical Surveys 53(5), 1104–1105 (1998)
199. Soloviov, V.N.: Towards the Theory of Minimax-Bayesian Estimation. Theory
Probab. Appl. 44(4), 739–754 (2000)
200. Stark, H., Woods, J.W.: Probability, Random Processes, and Estimation Theory
for Engineers, 2nd edn. Prentice Hall (1994)
201. Steiner, B., Baier, P.W.: Low Cost Channel Estimation in the Uplink Receiver
of CDMA Mobile Radio Systems. Frequenz 47(11-12), 292–298 (1993)
202. Stenger, F.: Tabulation of Certain Fully Symmetric Numerical Integration For-
mulas of Degree 7, 9 and 11. Mathematics of Computation 25(116), 935 (1971).
Microfiche
203. Stoica, P., Moses, R.: Spectral Analysis of Signals. Pearson Prentice Hall (2005)
204. Stoica, P., Söderström, T.: On Reparametrization of Loss Functions used in Es-
timation and the Invariance Principle. Signal Processing 17(4), 383–387 (1989)
205. Stoica, P., Viberg, M.: Maximum Likelihood Parameter and Rank Estimation in
Reduced-Rank Multivariate Linear Regressions. IEEE Trans. Signal Processing
44(12), 3069–3078 (1996)
206. Stüber, G.L.: Principles of Mobile Communication. Kluwer Academic Publishers
(1996)
207. Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over
symmetric cones. Optimization Methods and Software 11–12, 625–653 (1999)
208. Sultan, S.A., Tracy, D.S.: Moments of the Complex Multivariate Normal Dis-
tribution. Linear Algebra and its Applications 237–238, 191–204 (1996)
209. Taherzadeh, M., Mobasher, A., Khandani, A.K.: LLL Lattice-Basis Reduction
Achieves the Maximum Diversity in MIMO Systems. In: Proc. Int. Symp. on
Information Theory, pp. 1300–1304 (2005)
210. Tejera, P., Utschick, W., Bauch, G., Nossek, J.A.: Subchannel Allocation in Mul-
tiuser Multiple-Input Multiple-Output Systems. IEEE Trans. Inform. Theory
52(10), 4721–4733 (2006)
211. Tepedelenlioğlu, C., Abdi, A., Giannakis, G.B., Kaveh, M.: Estimation of
Doppler Spread and Signal Strength in Mobile Communications with Appli-
cations to Handoff and Adaptive Transmission. Wireless Communications and
Mobile Computing 1(1), 221–242 (2001)
212. Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. V. H. Winston
& Sons (1977)
213. Tomlinson, M.: New Automatic Equaliser Employing Modulo Arithmetic. Elec-
tronics Letters 7(5/6), 138–139 (1971)
214. Tong, L., Perreau, S.: Multichannel Blind Identification: From Subspace to Max-
imum Likelihood Methods. Proc. IEEE 86(10), 1951–1968 (1998)
215. Tong, L., Sadler, B.M., Dong, M.: Pilot-Assisted Wireless Transmissions. IEEE
Signal Processing Mag. 21(6), 12–25 (2004)
216. Trefethen, L.N., Bau, D.: Numerical Linear Algebra. SIAM - Society for Indus-
trial and Applied Mathematics (1997)
217. Tüchler, M., Mecking, M.: Equalization for Non-ideal Channel Knowledge. In:
Proc. of the Conf. on Information Sciences and Systems. Baltimore, U.S.A.
(2003)
218. Tugnait, J., Tong, L., Ding, Z.: Single-User Channel Estimation and Equaliza-
tion. IEEE Signal Processing Mag. 17(3), 17–28 (2000)
219. Utschick, W.: Tracking of Signal Subspace Projectors. IEEE Trans. Signal Pro-
cessing 50(4), 769–778 (2002)
220. Utschick, W., Dietrich, F.A.: On Estimation of Structured Covariance Matrices.
In: J. Beyerer, F.P. León, K.D. Sommer (eds.) Informationsfusion in der Mess-
und Sensortechnik (Information Fusion in Measurement and Sensor Technology),
pp. 51–62. Universitätsverlag Karlsruhe (2006). ISBN 978-3-86644-053-1
221. Utschick, W., Joham, M.: On the Duality of MIMO Transmission Techniques for
Multiuser Communications. In: Proc. of the 14th European Signal Processing
Conference. Florence, Italy (2006)
222. Vaidyanathan, P.P.: On Predicting a Band-Limited Signal Based on Past Sample
Values. Proc. IEEE 75(8), 1125 (1987)
244. Zerlin, B., Joham, M., Utschick, W., Seeger, A., Viering, I.: Linear Precoding
in W-CDMA Systems based on S-CPICH Channel Estimation. In: Proc. of
the 3rd IEEE Int. Symp. on Signal Processing and Information Technology, pp.
560–563. Darmstadt, Germany (2003)
245. Zhang, X., Palomar, D.P., Ottersten, B.: Robust Design of Linear MIMO
Transceivers Under Channel Uncertainty. In: Proc. of the IEEE Int. Conf. on
Acoustics, Speech, and Signal Processing, vol. 4, pp. 77–80 (2006)