
Foundations in Signal Processing,
Communications and Networking

Series Editors: W. Utschick, H. Boche, R. Mathar

Vol. 1. Dietl, G. K. E.
Linear Estimation and Detection in Krylov Subspaces, 2007
ISBN 978-3-540-68478-7
Vol. 2. Dietrich, F. A.
Robust Signal Processing for Wireless Communications, 2008
ISBN 978-3-540-74246-3
Frank A. Dietrich

Robust Signal Processing


for Wireless Communications

With 88 Figures and 5 Tables

Series Editors:

Wolfgang Utschick
TU Munich
Associate Institute for Signal Processing
Arcisstrasse 21
80290 Munich, Germany

Holger Boche
TU Berlin
Dept. of Telecommunication Systems
Heinrich-Hertz-Chair for Mobile Communications
Einsteinufer 25
10587 Berlin, Germany

Rudolf Mathar
RWTH Aachen University
Institute of Theoretical Information Technology
52056 Aachen, Germany

Author:

Frank A. Dietrich
TU Munich
Associate Institute for Signal Processing
Arcisstrasse 21
80290 Munich, Germany
E-mail: Dietrich@mytum.de

ISBN 978-3-540-74246-3 e-ISBN 978-3-540-74249-4

DOI 10.1007/978-3-540-74249-4

Springer Series in
Foundations in Signal Processing, Communications and Networking ISSN 1863-8538

Library of Congress Control Number: 2007937523

© 2008 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable for prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.

Cover design: eStudioCalamar S.L., F. Steinen-Broo, Girona, Spain

Printed on acid-free paper


springer.com
To my family
Preface

Optimization of adaptive signal processing algorithms is based on a mathematical description of the signals and the underlying system. In practice, the
structure and parameters of this model are not known perfectly. For example,
this is due to a simplified model of reduced complexity (at the price of its ac-
curacy) or significant estimation errors in the model parameters. The design
of robust signal processing algorithms takes into account the parameter errors
and model uncertainties. Their robust optimization therefore has to be based
on an explicit description of the uncertainties and the errors in the system’s
model and parameters.
It is shown in this book how robust optimization techniques and estima-
tion theory can be applied to optimize robust signal processing algorithms
for representative applications in wireless communications. It includes a re-
view of the relevant mathematical foundations and literature. The presenta-
tion is based on the principles of estimation, optimization, and information
theory: Bayesian estimation, Maximum Likelihood estimation, minimax op-
timization, and Shannon’s information rate/capacity. Thus, where relevant,
it includes the modern view of signal processing for communications, which
relates algorithm design to performance criteria from information theory.
Applications in three key areas of signal processing for wireless communica-
tions at the physical layer are presented: Channel estimation and prediction,
identification of correlations in a wireless fading channel, and linear and non-
linear precoding with incomplete channel state information for the broadcast
channel with multiple antennas (multi-user downlink).
This book is written for research and development engineers in indus-
try as well as PhD students and researchers in academia who are involved
in the design of signal processing for wireless communications. Chapters 2
and 3, which are concerned with estimation and prediction of system param-
eters, are of general interest beyond communication. The reader is assumed
to have knowledge in linear algebra, basic probability theory, and a familiar-
ity with the fundamentals of wireless communications. The relevant notation
is defined in Section 1.3. All chapters except for Chapter 5 can be read independently from each other since they include the necessary signal models.
Chapter 5 additionally requires the signal model from Sections 4.1 and 4.2
and some of the ideas presented in Section 4.3.
Finally, the author would like to emphasize that the successful realization
of this book project was enabled by many people and a very creative and
excellent environment.
First of all, I deeply thank Prof. Dr.-Ing. Wolfgang Utschick for being my
academic teacher and a good friend: His steady support, numerous intensive
discussions, his encouragement, as well as his dedication to foster fundamen-
tal research on signal processing methodology have enabled and guided the
research leading to this book.
I am indebted to Prof. Dr. Björn Ottersten from Royal Institute of Tech-
nology, Stockholm, for reviewing the manuscript and for his feedback. More-
over, I thank Prof. Dr. Ralf Kötter, Technische Universität München, for his
support.
I would like to express my gratitude to Prof. Dr. techn. Josef A. Nossek for
his guidance in the first phase of this work and for his continuous support.
I thank my colleague Dr.-Ing. Michael Joham who has always been open to
share his ideas and insights and has spent time listening; his fundamental
contributions in the area of precoding have had a significant impact on the
second part of this book.
The results presented in this book were also stimulated by the excellent
working environment and good atmosphere at the Associate Institute for Sig-
nal Processing and the Institute for Signal Processing and Network Theory,
Technische Universität München: I thank all colleagues for the open exchange
of ideas, their support, and friendship. Thanks also to my students for their
inspiring questions and their commitment.

Munich, August 2007 Frank A. Dietrich


Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Robust Signal Processing under Model Uncertainties . . . . . . . . 1
1.2 Overview of Chapters and Selected Wireless Applications . . . . 4
1.3 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Channel Estimation and Prediction . . . . . . . . . . . . . . . . . . . . . . . 11


2.1 Model for Training Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Channel Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Minimum Mean Square Error Estimator . . . . . . . . . . . . . 20
2.2.2 Maximum Likelihood Estimator . . . . . . . . . . . . . . . . . . . . 22
2.2.3 Correlator and Matched Filter . . . . . . . . . . . . . . . . . . . . . 25
2.2.4 Bias-Variance Trade-Off . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Channel Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.1 Minimum Mean Square Error Prediction . . . . . . . . . . . . 31
2.3.2 Properties of Band-Limited Random Sequences . . . . . . 35
2.4 Minimax Mean Square Error Estimation . . . . . . . . . . . . . . . . . . 37
2.4.1 General Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.2 Minimax Mean Square Error Channel Estimation . . . . 43
2.4.3 Minimax Mean Square Error Channel Prediction . . . . . 45

3 Estimation of Channel and Noise Covariance Matrices . . . . 59


3.1 Maximum Likelihood Estimation of Structured Covariance
Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.1.1 General Problem Statement and Properties . . . . . . . . . . 62
3.1.2 Toeplitz Structure: An Ill-Posed Problem and its
Regularization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Signal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.3 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3.1 Application of the Space-Alternating Generalized
Expectation Maximization Algorithm . . . . . . . . . . . . . . . 73
3.3.2 Estimation of the Noise Covariance Matrix . . . . . . . . . . 76


3.3.3 Estimation of Channel Correlations in the Time, Delay, and Space Dimensions . . . . . 82
3.3.4 Estimation of Channel Correlations in the Delay and
Space Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
3.3.5 Estimation of Temporal Channel Correlations . . . . . . . . 88
3.3.6 Extensions and Computational Complexity . . . . . . . . . . 89
3.4 Completion of Partial Band-Limited Autocovariance Sequences 91
3.5 Least-Squares Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.5.1 A Heuristic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
3.5.2 Unconstrained Least-Squares . . . . . . . . . . . . . . . . . . . . . . 100
3.5.3 Least-Squares with Positive Semidefinite Constraint . . 101
3.5.4 Generalization to Spatially Correlated Noise . . . . . . . . . 106
3.6 Performance Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

4 Linear Precoding with Partial Channel State Information 121


4.1 System Model for the Broadcast Channel . . . . . . . . . . . . . . . . . . 123
4.1.1 Forward Link Data Channel . . . . . . . . . . . . . . . . . . . . . . . 123
4.1.2 Forward Link Training Channel . . . . . . . . . . . . . . . . . . . . 125
4.1.3 Reverse Link Training Channel . . . . . . . . . . . . . . . . . . . . . 127
4.2 Channel State Information at the Transmitter and Receivers . 128
4.2.1 Transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.2.2 Receivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.3 Performance Measures for Partial Channel State Information 130
4.3.1 Mean Information Rate, Mean MMSE, and Mean SINR . . . . . 131
4.3.2 Simplified Broadcast Channel Models: AWGN Fading
BC, AWGN BC, and BC with Scaled Matched Filter
Receivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.3.3 Modeling of Receivers’ Incomplete Channel State
Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.3.4 Summary of Sum Mean Square Error Performance
Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
4.4 Optimization Based on the Sum Mean Square Error . . . . . . . . 150
4.4.1 Alternating Optimization of Receiver Models and
Transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.4.2 From Complete to Statistical Channel State Information . . . . . 156
4.4.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
4.4.4 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.5 Mean Square Error Dualities of BC and MAC . . . . . . . . . . . . . . 167
4.5.1 Duality for AWGN Broadcast Channel Model . . . . . . . . 169
4.5.2 Duality for Incomplete Channel State Information at
Receivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.6 Optimization with Quality of Service Constraints . . . . . . . . . . . 176
4.6.1 AWGN Broadcast Channel Model . . . . . . . . . . . . . . . . . . 176
4.6.2 Incomplete CSI at Receivers: Common Training
Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

5 Nonlinear Precoding with Partial Channel State Information . . . . . . . . . . 181
5.1 From Vector Precoding to Tomlinson-Harashima Precoding . . 183
5.2 Performance Measures for Partial Channel State Information 192
5.2.1 MMSE Receivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
5.2.2 Scaled Matched Filter Receivers . . . . . . . . . . . . . . . . . . . . 197
5.3 Optimization Based on Sum Performance Measures . . . . . . . . . 200
5.3.1 Alternating Optimization of Receiver Models and
Transmitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
5.3.2 From Complete to Statistical Channel State Information . . . . . 210
5.4 Precoding for the Training Channel . . . . . . . . . . . . . . . . . . . . . . . 214
5.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

A Mathematical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225


A.1 Complex Gaussian Random Vectors . . . . . . . . . . . . . . . . . . . . . . . 225
A.2 Matrix Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
A.2.1 Properties of Trace and Kronecker Product . . . . . . . . . . 226
A.2.2 Schur Complement and Matrix Inversion Lemma . . . . . 227
A.2.3 Wirtinger Calculus and Matrix Gradients . . . . . . . . . . . . 228
A.3 Optimization and Karush-Kuhn-Tucker Conditions . . . . . . . . . 229

B Completion of Covariance Matrices and Extension of Sequences . . . . . . . . . . 231
B.1 Completion of Toeplitz Covariance Matrices . . . . . . . . . . . . . . . 231
B.2 Band-Limited Positive Semidefinite Extension of Sequences . . 233
B.3 Generalized Band-Limited Trigonometric Moment Problem . . 235

C Robust Optimization from the Perspective of Estimation Theory . . . . . . . . . . 239

D Detailed Derivations for Precoding with Partial CSI . . . . . . 243


D.1 Linear Precoding Based on Sum Mean Square Error . . . . . . . . 243
D.2 Conditional Mean for Phase Compensation at the Receiver . . 245
D.3 Linear Precoding with Statistical Channel State Information . 246
D.4 Proof of BC-MAC Duality for AWGN BC Model . . . . . . . . . . . 249
D.5 Proof of BC-MAC Duality for Incomplete CSI at Receivers . . 250
D.6 Tomlinson-Harashima Precoding with Sum Performance
Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

E Channel Scenarios for Performance Evaluation . . . . . . . . . . . . 255

F List of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Chapter 1
Introduction

Wireless communication systems are designed to provide high data rates reliably for a wide range of velocities of the mobile terminals. One important design approach to increase the spectral efficiency1 envisions multiple transmit or receive antennas, i.e., Multiple-Input Multiple-Output (MIMO) systems.
This results in a larger number of channel parameters which have to be
estimated accurately to achieve the envisioned performance. For increasing
velocities, i.e., time-variance of the parameters, the estimation error increases
and enhanced adaptive digital signal processing is required to realize the sys-
tem. Improved concepts for estimation and prediction of the channel parame-
ters together with robust design methods for signal processing at the physical
layer can contribute to achieve these goals efficiently. They can already be
crucial for small velocities.
Before giving an overview of the systematic approaches to this problem
in three areas of physical layer signal processing which are presented in this
book, we define the underlying notion of robustness.

1.1 Robust Signal Processing under Model


Uncertainties

The design of adaptive signal processing relies on a model of the underly-


ing physical or technical system. The choice of a suitable model follows the
traditional principle: It should be as accurate as necessary and as simple as
possible. But, typically, the complexity of signal processing algorithms in-
creases with the model complexity. And on the other hand the performance
degrades in case of model-inaccuracies.

1 It is defined as the data rate normalized by the utilized frequency band.

Fig. 1.1 Conventional approach to deal with model uncertainties: Treat the esti-
mated parameters and the model as if they were true and perfectly known. The
parameters for the signal processing algorithm are optimized under these idealized
conditions.

Fig. 1.2 Sensitivity of different design paradigms to the size of the model uncertainty
or the parameter error.

For example, the following practical constraints lead to an imperfect characterization of the real system:
• To obtain an acceptable complexity of the model, some properties are not
modeled explicitly.
• The model parameters, which may be time-variant, have to be estimated.
Thus, they are not known perfectly.
Often a pragmatic design approach is pursued (Figure 1.1) which is charac-
terized by two design steps:
• An algorithm is designed assuming the model is correct and its parameters
are known perfectly.
• The model uncertainties are ignored and the estimated parameters are
applied as if they were error-free.
It yields satisfactory results as long as the model errors are “small”.
A robust algorithm design aims at minimizing the performance degrada-
tion due to model errors or uncertainties. Certainly, the first step towards a
robust performance is an accurate parameter estimation which exploits all
available information about the system. But in a second step, we would like to
find algorithms which are robust, i.e., less sensitive, to the remaining model
uncertainties.

(a) General approach to robust optimization based on a set C describing the model/parameter uncertainties.

(b) Robust optimization based on estimated model parameters and a stochastic or deterministic description C of their errors.

(c) Optimization for the least-favorable choice of parameters from an uncertainty set C for the considered optimality criterion of the algorithm: This parameterization of the standard algorithm (as in Figure 1.1) yields the maximally robust algorithm for important applications.

Fig. 1.3 General approach to robust optimization of signal processing algorithms under model uncertainties and two important special cases.

Sometimes suboptimum algorithms turn out to be less sensitive although they do not model the uncertainties explicitly: They give a fixed robust design which cannot adapt to the size of uncertainties (Figure 1.2).
An adaptive robust design of signal processing yields the optimum perfor-
mance for a perfect model match (no model uncertainties) and an improved
or in some sense optimum performance for increasing errors (Figure 1.2).
Conceptually, this can be achieved by
• defining a mathematical model of the considered uncertainties and
• constructing an optimization problem which includes these uncertainties.
Practically, this corresponds to an enhanced interface between system iden-
tification and signal processing (Figure 1.3(a)). Now, both tasks are not op-
timized independently from each other but jointly.
In this book, we focus on three important types of uncertainties in the
context of wireless communications:
1. Parameter errors with a stochastic error model,
2. parameter errors with a deterministic error model, and
3. unmodeled stochastic correlations of the model parameters.
The two underlying design paradigms are depicted in Figures 1.3(b)
and 1.3(c), which are a special case of the general approach in Figure 1.3(a):

Fig. 1.4 Robust optimization of signal processing in the physical layer of a wireless
communication system: Interfaces between signal processing tasks are extended by a
suitable model for the parameter errors or uncertainties.

The first clearly shows the enhanced interface compared to Figure 1.1 and
is suitable for treating parameter errors (Figure 1.3(b)). The second version
guarantees a worst-case performance for the uncertainty set C employing a
maxmin or minimax criterion: In a first step, it chooses the least-favorable
model or parameters in C w.r.t. the conventional optimization criterion of
the considered signal processing task. Thus, optimization of the algorithm is
identical to Figure 1.1, but based on the worst-case model. In important prac-
tical cases, this is identical to the problem of designing a maximally robust
algorithm (see Section 2.4).
Finally, we would like to emphasize that the systematic approach to robust
design has a long history and many applications: Since the early 1960s, robust
optimization has been treated systematically in mathematics, engineering,
and other sciences. Important references are [97, 226, 122, 61, 16, 17, 171]
which give a broader overview of the subject than this brief introduction.
Other relevant classes of uncertainties and their applications are treated
there.

1.2 Overview of Chapters and Selected Wireless Applications

The design of adaptive signal processing at the physical layer of a wireless communication system relies on channel state information. Employing the traditional stochastic model for the wireless fading channel, the channel state consists of the parameters of the channel parameters' probability distribution and their current or future realization. We treat three related fundamental signal processing tasks in the physical layer as shown in Figure 1.4:
• Estimation and prediction of the channel parameters (Chapter 2),
• estimation of the channel parameters’ and noise correlations (Chapter 3),
and
• linear and nonlinear precoding with partial channel state information for
the broadcast channel (Chapters 4 and 5).2
We propose methods for a robust design of every task. Moreover, every chap-
ter or main section starts with a survey of the underlying theory and literature
with the intention to provide the necessary background.

• Chapter 2: Channel Estimation and Prediction


The traditional approaches to estimation of the frequency-selective wire-
less channel using training sequences are introduced and compared regard-
ing the achieved bias-variance trade-off (Section 2.2). This includes more
recent techniques such as the reduced-rank Maximum Likelihood estimator
and the matched filter.
Prediction of the wireless channel gains in importance with the application
of precoding techniques at the transmitter relying on channel state infor-
mation. In Section 2.3, we focus on minimum mean square error (MMSE)
prediction and discuss its performance for band-limited random sequences,
which model the limited temporal dynamics of the wireless channel due to
a maximum velocity in a communication system.
Channel estimation as well as prediction relies on the knowledge of the
channel parameters’ probability distribution. It is either unknown, is only
specified partially (e.g., by the mean channel attenuation or the maximum
Doppler frequency), or estimates of its first and second order moments are
available (Chapter 3). In Section 2.4, a general introduction to minimax
mean square error (MSE) optimization is given: Its solution guarantees a
worst-case performance for a given uncertainty set, proves that the Gaus-
sian probability distribution is least-favorable, and provides the maximally
robust estimator.
We apply the minimax-results to robust MMSE channel estimation and
robust MMSE prediction for band-limited random sequences. For example,
for prediction of channels with a maximum Doppler frequency we provide
the uncertainty set which yields the following predictor: MMSE prediction
based on the rectangular and band-limited power spectral density; it is
maximally robust for this uncertainty set.
• Chapter 3: Estimation of Channel and Noise Covariance Matrices
Estimation of the covariance matrix for the channel parameters in space,
delay, and time dimension together with the spatial noise correlations can
be cast as the problem of estimating a structured covariance matrix. An

2 This serves as an example for adaptive processing of the signal containing the information-bearing data.

overview of the general Maximum Likelihood problem for structured covariance matrices is given in Section 3.1. The focus is on Toeplitz structure
which yields an ill-posed Maximum Likelihood problem.
First, we present an iterative Maximum Likelihood solution based on a gen-
eralization of the expectation-maximization (EM) algorithm (Section 3.3).
It includes the combined estimation of channel correlations in space, delay,
and time as well as special cases.
If only the decimated band-limited autocovariance sequence can be esti-
mated and the application requires the interpolated sequence, the deci-
mated sequence has to be completed. As a constraint, it has to be ensured
that the interpolated/completed sequence is still positive semidefinite. We
propose a minimum-norm completion, which can be interpreted in the
context of minimax MSE prediction (Section 3.4).
For estimation of correlations in space and delay dimensions, the Max-
imum Likelihood approach is rather complex. Least-squares approaches
are computationally less complex and can be shown to be asymptotically
equivalent to Maximum Likelihood. Including a positive semidefinite con-
straint we derive different suboptimum estimators based on this paradigm,
which achieve a performance close to Maximum Likelihood (Section 3.5).

Chapters 2 and 3 deal with estimation of the channel state information (CSI).
As signal processing application which deals with the transmission of data,
we consider precoding of the data symbols for the wireless broadcast chan-
nel.3 For simplicity, we treat the case of a single antenna at the receiver and
multiple transmit antennas for a frequency flat channel4 . Because precoding
takes place at the transmitter, the availability of channel state information
is a crucial question.
The last two chapters present robust precoding approaches based on MSE
which can deal with partial channel state information at the transmitter. Its
foundation is the availability of the channel state information from estimators
of the channel realization and correlations which we present in Chapters 2
and 3.
• Chapter 4: Linear Precoding with Partial Channel State Information
For linear precoding, we introduce different optimum and suboptimum
performance measures, the corresponding approximate system models, and
emphasize their relation to the information rate (Section 4.3). We elaborate
on the relation between SINR-type measures and MSE-like expressions. It
includes optimization criteria for systems with a common training channel
instead of user-dedicated training sequences in the forward link.
The central aspect of this new framework is the choice of an appropri-
ate receiver model which controls the trade-off between performance and
complexity.

3 If restricted to be linear, precoding is also called preequalization or beamforming.
4 This is a finite impulse response (FIR) channel model of order zero.

Linear precoding is optimized iteratively for the sum MSE alternating be-
tween the receiver models and the transmitter. This intuitive approach
allows for an interesting interpretation. But optimization of sum perfor-
mance measures does not give quality of service (QoS) guarantees to the
receivers or provide fairness. We address these issues briefly in Section 4.6
which relies on the MSE-dualities between the broadcast and virtual mul-
tiple access channel developed in Section 4.5.
• Chapter 5: Nonlinear Precoding with Partial Channel State Information
Nonlinear precoding for the broadcast channel, i.e., Tomlinson-Harashima
precoding or the more general vector precoding, is a rather recent ap-
proach; an introduction is given in Section 5.1. We introduce novel robust
performance measures which incorporate partial as well as statistical chan-
nel state information (Section 5.2). As for linear precoding, the choice of
an appropriate receiver model is the central idea of this new framework.
For statistical channel state information, the solution can be considered
the first approach to nonlinear beamforming for communications.
The potential of Tomlinson-Harashima precoding (THP) with only sta-
tistical channel state information is illustrated in examples. For chan-
nels with rank-one spatial correlation matrix, we show that THP based
on statistical channel state information is not interference-limited: Every
signal-to-interference-plus-noise ratio (SINR) is achievable as for the case
of complete channel state information.
Vector precoding and Tomlinson-Harashima precoding are optimized
based on the sum MSE using the same paradigm as for linear precod-
ing: Iteratively, we alternate between optimizing the receiver models and
the transmitter (Section 5.3).
As a practical issue, we address precoding for the training sequences to
enable the implementation of the optimum receiver, which results from
nonlinear precoding at the transmitter (Section 5.4). The numerical per-
formance evaluation in Section 5.5 shows the potential of a robust design
in many interesting scenarios.
• Appendix:
In Appendix A, we briefly survey the necessary mathematical background.
The completion of covariance matrices with unknown elements and the ex-
tension of band-limited sequences is reviewed in Appendix B. Moreover,
these results are combined to prove a generalization to the band-limited
trigonometric moment problem which is required to interpolate autoco-
variance sequences in Section 3.4.
The robust optimization criteria which we apply for precoding with partial
channel state information are discussed more generally in Appendix C from
the perspective of estimation theory. The appendix concludes with detailed
derivations, the definition of the channel scenarios for the performance
evaluations, and a list of abbreviations.

1.3 Notation

Sets

Definitions for sets of numbers, vectors, and matrices are:

Z                 (positive and negative) integers
R                 real numbers
R_{+,0}           real positive (or zero) numbers
R^M               M-dimensional vectors in R
C                 complex numbers
C^M               M-dimensional vectors in C
C^{M×N}           M × N matrices in C
S^M               M × M Hermitian matrices
S^M_{+,0}         M × M positive semidefinite matrices (equivalent to A ⪰ 0)
S^M_{+}           M × M positive definite matrices
T^M_{+,0}         M × M positive semidefinite Toeplitz matrices
T^M_{+}           M × M positive definite Toeplitz matrices
T^M_{c}           M × M circulant and Hermitian matrices
T^M_{c+,0}        M × M positive semidefinite circulant matrices
T^M_{c+}          M × M positive definite circulant matrices
T^M_{ext+,0}      M × M positive definite Toeplitz matrices with positive semidefinite circulant extension
D^M_{+,0}         M × M positive semidefinite diagonal matrices
T^{M,N}_{+,0}     MN × MN positive semidefinite block-Toeplitz matrices with blocks of size M × M
T^{M,N}_{ext+,0}  MN × MN positive semidefinite block-Toeplitz matrices with blocks of size M × M and with positive semidefinite block-circulant extension
L^M               M-dimensional rectangular lattice τZ^M + jτZ^M
V                 Voronoi region {x + jy | x, y ∈ [−τ/2, τ/2)}

Matrix Operations

For a matrix A ∈ C^{M×N}, we introduce the following notation for standard operations (A is assumed square or regular whenever required by the operation):

A^{−1}   inverse
A^†      pseudo-inverse
tr[A]    trace
det[A]   determinant
A^∗      complex conjugate
A^T      transpose
A^H      Hermitian (complex conjugate transpose)

The relation A ⪰ B is defined as A − B ⪰ 0, i.e., the difference is positive semidefinite.
The operator a = vec[A] stacks the columns of A in a vector, and
A = unvec[a] is the inverse operation. An N × N diagonal matrix with
di , i ∈ {1, 2, . . . , N }, on its diagonal is given by diag [d1 , d2 , . . . , dN ]. The
Kronecker product ⊗, the Schur complement, and more detailed definitions
are introduced in Appendix A.2.
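As a concrete illustration of these conventions, the minimal numpy sketch below (the helper names vec and unvec are ad hoc, not from the text) checks the column-stacking convention and the Kronecker identity vec[AXB] = (B^T ⊗ A) vec[X], which underlies the construction of the system matrix in Chapter 2:

```python
import numpy as np

# vec[.] stacks the columns of a matrix into a vector (column-major order);
# unvec[.] is its inverse for a known number of rows M. Both helpers are ad hoc.
def vec(A):
    return A.reshape(-1, order="F")

def unvec(a, M):
    return a.reshape(M, -1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))

# Kronecker identity: vec[A X B] = (B^T kron A) vec[X] (cf. Appendix A.2)
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))
assert np.allclose(unvec(vec(A), 3), A)
```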

Random Variables

Random vectors and matrices are denoted by lower and upper case sans-
serif bold letters (e.g., a, A, h), whereas their realizations or deterministic
variables are, e.g., a, A, h. To describe their properties we define:
pa (a) probability density function (pdf) of random vector a
evaluated for a
pa|y (a) conditional pdf of random vector a which is
conditioned on the realization y of a random vector y
Ea [f (a)] expectation of f (a) w.r.t. random vector a
Ea|y [f (a)] conditional expectation of f (a) w.r.t. random vector a
Ea [f (a)|y; x] expectation w.r.t. random vector a
for a conditional pdf with parameters x
µa = Ea [a] mean of a random vector a
µa|y = Ea|y [a] conditional mean of random vector a
a ∼ Nc (µa , C a ) random vector with complex Gaussian pdf (Appendix A.1)
Covariance matrices and conditional covariance matrices of a random vector
a are C a = Ea [(a − µa )(a − µa )H ] and C a|y = Ea|y [(a − µa|y )(a − µa|y )H ].
For a random matrix A, we define C A = EA [(A − EA [A])(A − EA [A])H ] and
C AH = EA [(A − EA [A])H (A − EA [A])], which are the sum of the covariance
matrices for its columns and the complex conjugate of its rows, respectively.
Correlation matrices are denoted by Ra = Ea [aa H ] and RA = EA [AAH ].

Other Definitions

A vector-valued function of a vector-valued argument x is written as f : C^M → C^N, x ↦ f(x) and a scalar function of vector-valued arguments as f : C^M → C, x ↦ f(x).
A sequence of vectors is h : Z → C^M, q ↦ h[q]. The (time-) index q
is omitted if we consider a fixed time instance. For a stationary sequence
of random vectors h[q], we define µh = µh[q] , C h = C h[q] , and ph (h[q]) =
ph[q] (h[q]).

The standard convolution operator is ∗. The Kronecker sequence δ[i] is defined as δ[0] = 1 and δ[i] = 0, i ≠ 0, and the imaginary unit is denoted j with j² = −1.
Chapter 2
Channel Estimation and Prediction

Estimation and prediction of the wireless channel is a very broad topic. In this chapter, we treat only a selection of important aspects and methods in
more detail. To clarify their relation to other approaches in the literature we
first give a general overview. It also addresses topics which we do not cover
in the sequel.

Introduction to Chapter

Approaches to estimation and prediction of the wireless channel can be classified according to the exploited information about the transmitted signal
and the underlying channel model.
Blind (unsupervised) methods for channel estimation do not require a
known training sequence to be transmitted but exploit the spatial or tem-
poral structure of the channel or, alternatively, properties of the transmitted
signal such as constant magnitude. The survey by Tong et al.[214] gives a
good introduction and many references in this field. Application to wireless
communications is discussed by Tugnait et al.[218] and Paulraj et al.[165].
In most communication systems (e.g., [215]) training sequences are defined
and can be exploited for channel estimation. Two important issues are the op-
timum choice of training sequences as well as their placement and the number
of training symbols (see [215] and references); the focus is on characterizing
the trade-off between signaling overhead and estimation accuracy regarding
system or link throughput and capacity. For systems which have already been
standardized, these parameters are fixed and the focus shifts towards the de-
velopment of training sequence based estimators. They can be distinguished
by the assumed model for the communication channel (Figure 2.1):

• A structured deterministic model describes the physical channel properties explicitly. It assumes discrete paths parameterized by their delay, Doppler
frequency, angle of arrival and of departure, and complex path attenuation.


Fig. 2.1 Different categories of models for the communication channel and references
to channel estimators based on the respective model.

This model is nonlinear in its parameters, which can be estimated, for example, using ESPRIT- and MUSIC-type algorithms [224, 88].
• An unstructured deterministic finite impulse response (FIR) model of the
channel is linear in the channel parameters and leads to a linear (weighted)
least-squares estimator which is a maximum likelihood (ML) estimator
for Gaussian noise [124, 144, 165, 129]. The most popular low-complexity
approximation in practice is the correlator [120, 56].
• The number of channel parameters is increased significantly in multiple-
input multiple-output (MIMO) channels and with increasing symbol rate.1
This results in a larger estimation variance of the ML estimator. At the
same time these channels provide additional spatial and temporal struc-
ture, which can be exploited to improve the estimation accuracy. One point
of view considers the (temporal) invariance of the angles and delays2 and
models it as an invariance of a low-dimensional subspace which contains
the channel realization (subspace-oriented deterministic model ). It con-
siders the algebraic structure in contrast to the structured deterministic
model from above, which approximates the physical structure.
This approach has been pursued by many authors [205, 74], but the most
advanced methods are the reduced-rank ML estimators introduced by
Nicoli, Simeone, and Spagnolini [157, 156]. A detailed comparison with
Bayesian methods is presented in [56]. Regarding channel estimation, the
prominent rake-receiver can also be viewed as a subspace-oriented im-
provement of the correlator, which can also be generalized to space-time
processing [56].
• The channel is modeled by an FIR filter whose parameters are a sta-
tionary (vector) random process with given mean and covariance matrix

1 Assuming that the channel order increases with the symbol rate.
2 The angles and delays of a wireless channel change slowly compared to the complex path attenuations (fading) [157, 229, 227] and are referred to as long-term or large-scale channel properties.

[14, 144]. This stochastic model allows for application of Bayesian estima-
tion methods [124, 182] and leads to the conditional mean (CM) estimator
in case of a mean square error (MSE) criterion. For jointly Gaussian
observation and channel, it reduces to the linear minimum mean square
error (LMMSE) estimator. Similar to the subspace approaches, the MMSE
estimator exploits the long-term structure of the channel which is now de-
scribed by the channel covariance matrix.

For time-variant channels, either interpolation of the estimated channel parameters between two blocks of training sequences (e.g. [9, 144]) or a deci-
sion directed approach [144] improves performance. Channel estimation can
also be performed iteratively together with equalization and decoding to track
the channel, e.g., [159].
Another important topic is prediction of the future channel state which we
address in more detail in Sections 2.3 and 2.4.3. We distinguish three main
categories of prediction algorithms for the channel parameters:
• Extrapolation based on a structured channel model with estimated
Doppler frequencies,
• extrapolation based on the smoothness of the channel parameters over
time, e.g., standard (nonlinear) extrapolation based on splines, and
• MMSE prediction based on a stochastic model with given autocovariance
function [64, 63] (Section 2.3).
References to algorithms in all categories are given in Duel-Hallen et al.[63].
Here we focus on MMSE prediction as it has been introduced by Wiener and
Kolmogorov [182, 234, 235] in a Bayesian framework [124].
If the Doppler frequency is bounded due to a maximum mobility in the
system, i.e., the parameters’ power spectral densities are band-limited, this
property can be exploited for channel prediction as well as interpolation. Ei-
ther the maximum Doppler frequency (maximum velocity) is known a priori
by the system designer due to a specified environment (e.g., office or urban)
or it can be estimated using algorithms surveyed in [211]. Perfect prediction
(i.e., without error) of band-limited processes is possible given a non-distorted
observation of an infinite number of past samples and a sampling rate above
the Nyquist frequency [22, 162, 138, 27, 222]. But this problem is ill-posed, be-
cause the norm of the predictor is not bounded asymptotically for an infinite
number of past observations [164, p. 380]. Thus, regularization is required.
For about 50 years, the assumption of a rectangular power spectral density,
which corresponds to prediction using sinc-kernels, has been known to be
a good ad hoc assumption [35]. Optimality is only studied for the case of
undistorted/noiseless past observations [34]. And a formal justification for
its robustness is addressed for the “filtering” problem in [134], where it is
shown to be the least-favorable spectrum.
For conditional mean or MMSE channel estimation and prediction, the
probability distribution of the channel parameters and observation is required

Fig. 2.2 On an absolute time scale the signals are sampled at t = qTb + nTs , where
q is the block index and n the relative sample index. Within one block with index q
we only consider the relative time scale in n.

to be known perfectly. Assuming a Gaussian distribution, only perfect knowledge of the first and second order moments is necessary. Estimators for the
moments of the channel parameters are introduced in Chapter 3, but still the
estimates will never be perfect. Minimax MSE estimators, which guarantee
a worst case performance for a given uncertainty class of second order statis-
tics, are introduced by Poor, Vastola, and Verdu [225, 226, 76, 75].3 These
results address the case of a linear minimax MSE estimator for an observation
of infinite length. The finite dimensional case with uncertainty in the type
of probability distribution and its second order moments is treated by Van-
delinde [223] and Soloviov [197, 198, 199], where the Gaussian distribution is
shown to be least-favorable and maximally robust.

Outline of Chapter

In this chapter we focus on training sequence based channel estimation, where we assume block-fading. An overview of different paradigms for channel es-
timation which are either based on an unstructured deterministic, subspace-
based, or stochastic model, is given in Section 2.2 (see overview in Figure 2.1).
In addition to the comparison, we present the matched filter for channel es-
timation. An introduction to multivariate and univariate MMSE prediction
is given in Section 2.3, which emphasizes the perfect predictability of band-
limited random sequences. After a brief overview of the general results on
minimax MSE estimation in Section 2.4, we apply them to MMSE channel
estimation and prediction. Therefore, we are able to consider errors in the es-
timated channel statistics. Given the channel’s maximum Doppler frequency,
different minimax MSE optimal predictors are proposed.

2.1 Model for Training Channel

A linear MIMO communication channel with K transmitters and M receivers is modeled by the time-variant impulse response H(t, τ), where t denotes

3 See also the survey [123] by Kassam and Poor with many references.

Fig. 2.3 The training sequence is time-multiplexed and transmitted periodically with period T_b. For a frequency selective channel with L > 0, the first L samples of the received training sequence are distorted by interblock interference.

Fig. 2.4 Transmission of the nth symbol s[n] ∈ BK in block q over a frequency
selective MIMO channel described by the finite impulse response (FIR) H[q, n] ∈
CM ×K (cf. (2.1) and (2.2)).

time and τ the delay. We describe the signals and channel by their complex
baseband representations (e.g. [173, 129]).
Periodically, every transmitter sends a training sequence of Nt symbols
which is known to the receivers. The period between blocks of training se-
quences is Tb . The receive signal is sampled at the symbol period Ts (Fig-
ure 2.2). The channel is assumed constant within a block of Nt symbols. Given
an impulse response H(t, τ ) with finite support in τ , i.e., H(t, τ ) = 0M ×K
for τ < 0 and τ > LTs , the corresponding discrete time system during the
qth block is the finite impulse response (FIR) of order L
H[q, i] = H(qT_b, iT_s) = Σ_{ℓ=0}^{L} H^{(ℓ)}[q] δ[i − ℓ] ∈ C^{M×K}.   (2.1)

The training sequences are multiplexed in time with a second signal (e.g.
a data channel), which interferes with the first L samples of the received
training sequence as depicted in Figure 2.3. For all transmitters, the sequences
are chosen from the (finite) set B and collected in s[n] ∈ BK for n ∈ {1 −
L, 2− L, . . . , N }, with Nt = N + L. In every block the same training sequence
s[n] is transmitted. Including additive noise, the sampled receive signal y (t)
during block q is (Figure 2.4)

y[q, n] = y(qT_b + nT_s) = H[q, n] ∗ s[n] + n[q, n] ∈ C^M,   n = 1, 2, . . . , N
        = Σ_{ℓ=0}^{L} H^{(ℓ)}[q] s[n − ℓ] + n[q, n].   (2.2)

Note that the symbol index n is the relative sample index within one block. Only samples for 1 ≤ n ≤ N are not distorted due to intersymbol interference from the other multiplexed signal (Figure 2.3). The relation between the absolute time t ∈ R, the sampling of the signals at t = qT_b + nT_s, and the relative sample index within one block is illustrated in Figure 2.2.

Fig. 2.5 Linear system model for optimization of channel estimators W ∈ C^{P×MN} for h[q] with ĥ[q] = W y[q] (Section 2.2).
The noise is modeled as a stationary, temporally uncorrelated, and zero-
mean complex Gaussian random sequence n[q, n] ∼ Nc (0M , C n,s ). The spa-
tial covariance matrix of the noise is C n,s = E[n[q, n]n[q, n]H ].

2.1.0.1 Model for one Block

With n ∈ {1, 2, . . . , N}, we consider those N samples of the receive signal which are not influenced by the signal preceding the training block
(Figure 2.3). The relevant samples of the receive signal are collected in
Y [q] = [y [q, 1], y [q, 2], . . . , y [q, N ]] (equivalently for N[q]) and we obtain

Y [q] = H[q]S̄ + N[q] ∈ CM ×N , (2.3)

where H[q] = [H^{(0)}[q], H^{(1)}[q], . . . , H^{(L)}[q]] ∈ C^{M×K(L+1)} and

S̄ = [ s[1]       s[2]   · · ·   s[1 + L]   · · ·   s[N]
      s[0]       s[1]                              s[N − 1]
        ⋮                   ⋱                        ⋮
      s[1 − L]    · · ·    s[1]            · · ·   s[N − L] ] ∈ B^{K(L+1)×N},

which has block Toeplitz structure. Defining y [q] = vec[Y [q]] we get the
standard linear model in Figure 2.5 (cf. Eq. A.15)

y [q] = Sh[q] + n[q] (2.4)

with system matrix S = S̄^T ⊗ I_M ∈ C^{MN×P}. The channel is described by P = MK(L + 1) parameters h[q] = vec[H[q]] ∈ C^P, where h_p[q] is the pth element of h[q].
We model the noise n[q] = vec[N[q]] ∈ CM N as n[q] ∼ Nc (0M N , C n ) with
two different choices of covariance matrices:
1. n[q] is spatially and temporally uncorrelated with C n = cn I M N and vari-
ance cn , or

2. n[q] is temporally uncorrelated and spatially correlated with

C n = I N ⊗ C n,s . (2.5)

This can model an interference from spatially limited directions.
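The following minimal numpy sketch instantiates this block model: it builds the block-Toeplitz matrix S̄ from QPSK training symbols, forms the system matrix S = S̄^T ⊗ I_M of (2.4), and draws temporally white, spatially correlated noise with C_n = I_N ⊗ C_{n,s} as in (2.5). The dimensions and all variable names are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, L, N = 2, 3, 2, 16              # receive antennas, transmitters, channel order, samples
P = M * K * (L + 1)                   # number of channel parameters

# QPSK training symbols s[n] in B^K for n = 1 - L, ..., N (stored with an index shift of L)
s = np.exp(1j * np.pi / 2 * rng.integers(0, 4, size=(K, N + L)))

# Block-Toeplitz training matrix S_bar in B^{K(L+1) x N}: block row l holds s[n - l], n = 1, ..., N
S_bar = np.vstack([s[:, L - l : L - l + N] for l in range(L + 1)])

# System matrix of the standard linear model y = S h + n, Eq. (2.4)
S = np.kron(S_bar.T, np.eye(M))       # S = S_bar^T kron I_M, size MN x P

# Channel H[q] = [H^(0), ..., H^(L)] and its vectorization h = vec[H]
H = (rng.standard_normal((M, K * (L + 1))) + 1j * rng.standard_normal((M, K * (L + 1)))) / np.sqrt(2)
h = H.reshape(-1, order="F")

# Temporally white, spatially correlated noise: C_n = I_N kron C_ns, Eq. (2.5)
C_ns = np.array([[1.0, 0.5], [0.5, 1.0]])
W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
n = (np.linalg.cholesky(C_ns) @ W).reshape(-1, order="F")

y = S @ h + n
# Consistency check against the matrix form Y = H S_bar + N, Eq. (2.3)
assert np.allclose(y - n, (H @ S_bar).reshape(-1, order="F"))
```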


In wireless communications it is common [173] to choose a stochastic model
for the channel parameters h[q]. A stationary complex Gaussian model

h[q] ∼ Nc (µh , C h ) (2.6)

with µh = E[h[q]] and C h = E[(h[q] − µh )(h[q] − µh )H ] is standard. It is


referred to as Rice fading because the magnitude distribution of the elements
of h[q] is a Rice distribution [173]. In the sequel we assume µh = 0P , i.e., the
magnitude of an element of h[q] is Rayleigh distributed. An extension of the
results to non-zero mean channels is straightforward. Moreover, we assume
that h[q] and n[q] are uncorrelated.
Depending on the underlying scenario, one can assume a channel covari-
ance matrix C h with additional structure:
1. If the channel coefficients belonging to different delays are uncorrelated,
e.g., due to uncorrelated scattering [14], C h is block-diagonal
 
C_h = blockdiag[C_h^{(0)}, . . . , C_h^{(L)}] = Σ_{ℓ=0}^{L} e_{ℓ+1} e_{ℓ+1}^T ⊗ C_h^{(ℓ)},   (2.7)

where C_h^{(ℓ)} ∈ S^{MK}_{+,0} is the covariance matrix of the ℓth channel delay h^{(ℓ)}[q] = vec[H^{(ℓ)}[q]] ∈ C^{MK} (see the sketch after this list).

2. If the channel coefficients belonging to different transmit antennas are uncorrelated, e.g., they belong to different mobile terminals, the covariance matrix of the ℓth channel delay has the structure C_h^{(ℓ)} = Σ_{k=1}^{K} e_k e_k^T ⊗ C_{h_k^{(ℓ)}}, where h_k^{(ℓ)}[q] ∈ C^M with covariance matrix C_{h_k^{(ℓ)}} is the kth column of H^{(ℓ)}[q]. In case of a uniform linear array at the receiver and a narrowband signal (see Appendix E and [125]), C_{h_k^{(ℓ)}} has Toeplitz structure.
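The block-diagonal structure (2.7) can be checked numerically. The sketch below assembles C_h from randomly generated per-delay covariance matrices C_h^{(ℓ)} (purely illustrative placeholders) via the Kronecker sum:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, L = 2, 3, 2
MK = M * K

def random_psd(n):
    # Illustrative positive semidefinite covariance matrix for one delay tap
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T / n

C_h_delay = [random_psd(MK) for _ in range(L + 1)]    # C_h^(l) for l = 0, ..., L

# Uncorrelated scattering, Eq. (2.7): C_h = sum_l e_{l+1} e_{l+1}^T kron C_h^(l)
E = np.eye(L + 1)
C_h = sum(np.kron(np.outer(E[l], E[l]), C_h_delay[l]) for l in range(L + 1))

# Block-diagonal result: per-delay covariances on the diagonal, zeros elsewhere
assert np.allclose(C_h[:MK, :MK], C_h_delay[0])
assert np.allclose(C_h[MK:2 * MK, :MK], 0)
```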

2.1.0.2 Model for Multiple Blocks

To derive predictors of the channel realization h[q] based on Q previously observed blocks of training sequences, we model observations and the corresponding channel vectors for an arbitrary block index q as realizations of random vectors

y_T[q] = [y[q − 1]^T, y[q − 2]^T, . . . , y[q − Q]^T]^T ∈ C^{QMN}   and   h_T[q] = [h[q − 1]^T, h[q − 2]^T, . . . , h[q − Q]^T]^T ∈ C^{QP},   (2.8)

respectively. With (2.4) the corresponding signal model is

yT [q] = S T hT [q] + nT [q] ∈ CQM N (2.9)

with S T = I Q ⊗ S.
In addition to the covariance matrix C h in space and delay domain, we
have to define the channel parameters’ temporal correlations. In general,
the temporal correlations (autocovariance sequences) are different for ev-
ery element in h[q], and C h [i] = E[h[q]h[q − i]H ] does not have a special
structure. Because h[q] is a stationary random sequence (equidistant sam-
ples of the continuous-time channel), the auto- and crosscovariance matrices
C hT,p hT,p′ = E[hT,p [q]hT,p′ [q]H ] of hT,p [q] = [hp [q − 1], hp [q − 2], . . . , hp [q −
Q]]T ∈ CQ are Toeplitz with first row [chp hp′ [0], chp hp′ [1], . . . , chp hp′ [Q − 1]]
and chp hp′ [i] = E[hp [q]hp′ [q − i]∗ ] for p, p′ ∈ {1, 2, . . . , P }. Therefore, C hT is
block-Toeplitz and can be written as
C_{h_T} = Σ_{p=1}^{P} Σ_{p′=1}^{P} C_{h_{T,p} h_{T,p′}} ⊗ e_p e_{p′}^T.   (2.10)

To obtain simplified models of the channel correlations we introduce the following definition:
Definition 2.1. The crosscovariance c_{h_p h_{p′}}[i] of two parameters h_p[q] and h_{p′}[q − i], p ≠ p′, is called separable if it can be factorized as c_{h_p h_{p′}}[i] = a_p[i] b[p, p′].4 The sequence a_p[i] = a_{p′}[i], which depends on p (or p′) and the time difference i, describes the temporal correlations, and b[p, p′] the crosscorrelation between different random sequences.
Thus, for a separable crosscovariance, we require the same normalized autocovariance sequence c_{h_p}[i]/c_{h_p}[0] = c_{h_{p′}}[i]/c_{h_{p′}}[0] for h_p[q] and h_{p′}[q], p ≠ p′. In other words, the crosscorrelations in space and delay are called separable from the temporal correlations if the corresponding physical processes are statistically independent and the temporal correlations are identical.5

4 This definition is in analogy to the notion of separable transmitter and receiver correlations for MIMO channel models resulting in the “Kronecker model” for the spatial covariance matrix (see e.g. [231]) and is exploited in [192, 30].
5 Separability according to this definition is a valid model for a physical propagation channel if the following conditions hold: The Doppler frequencies do not depend on the angles of arrival at the receiver or angles of departure at the transmitter, which determine the spatial correlations; and they can be modeled as statistically independent random variables. As an example, consider a scenario in which the reflecting or scattering object next to the fixed receiver is not moving and the Doppler spectrum results from a ring of scatterers around the geographically well-separated mobile terminals with one antenna each [30].

Depending on the physical properties of the channel, the following special cases are of interest.
1. For L = 0, the channel from the kth transmitter to the M receive antennas
is denoted hk [q] ∈ CM , which is the kth column of H[q]. Its covariance
matrix is C hk . For separability of the spatial and temporal correlation of
hk [q], the autocovariance sequence ck [i] is defined as E[hk [q]hk [q − i]H ] =
ck [i]C hk with ck [0] = 1. Additionally, we assume that channels of different
transmitters are uncorrelated, E[h_k[q] h_{k′}[q]^H] = C_{h_k} δ[k − k′], i.e., C_h = Σ_{k=1}^{K} e_k e_k^T ⊗ C_{h_k}. This yields

C_{h_T} = Σ_{k=1}^{K} C_{T,k} ⊗ e_k e_k^T ⊗ C_{h_k},   (2.11)

where C T,k is Toeplitz with first row [ck [0], ck [1], . . . , ck [Q − 1]]. For ex-
ample, this describes the situation of K mobile terminals with generally
different velocity and geographical location and M receivers located in the
same (time-invariant) environment and on the same object.
2. If all elements of h[q] have an identical autocovariance sequence c[i] and
the correlations are separable, we can write E[h[q]h[q − i]H ] = c[i]C h . This
defines c[i] implicitly, which we normalize to c[0] = 1. These assumptions
result in

C hT = C T ⊗ C h , (2.12)

where C_T is Toeplitz with first row equal to [c[0], c[1], . . . , c[Q − 1]] (see the sketch below).
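As a numerical illustration of the separable model (2.12), the following sketch assumes the Clarke/Jakes autocovariance c[i] = J_0(2π f_D T_b i), a common choice whose power spectral density is band-limited by the maximum Doppler frequency; the specific model and all parameters are illustrative assumptions, not taken from the text:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.special import j0

Q, P = 8, 4                       # past blocks, channel parameters (illustrative)
fD_Tb = 0.05                      # normalized maximum Doppler frequency (illustrative)

# Clarke/Jakes autocovariance c[i] = J_0(2 pi f_D T_b i) with c[0] = 1;
# its power spectral density is band-limited by the maximum Doppler frequency
c = j0(2 * np.pi * fD_Tb * np.arange(Q))
C_T = toeplitz(c)                 # Toeplitz with first row [c[0], ..., c[Q-1]]

rng = np.random.default_rng(3)
A = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))
C_h = A @ A.conj().T / P          # illustrative covariance in space and delay

# Separable correlations, Eq. (2.12): C_hT = C_T kron C_h
C_hT = np.kron(C_T, C_h)
assert np.allclose(C_hT[:P, P:2 * P], c[1] * C_h)   # E[h[q-1] h[q-2]^H] = c[1] C_h
```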
Separability of spatial and temporal correlations is not possible in general,
e.g., when the mobile terminals have multiple antennas and are moving. In
this case the Doppler spectrum as well as the spatial correlation are deter-
mined by the same angles of departure/arrival at the mobile terminal.

2.2 Channel Estimation

Based on the model of the receive signal6

y = Sh + n ∈ CM N (2.13)

6 To develop estimators based on y[q] which estimate the current channel realization h[q], we omit the block index q of (2.4) in this section.

we discuss different approaches to optimize a linear estimator W ∈ C^{P×MN} for h. Given a realization y of y, it yields the channel estimate

ĥ = W y ∈ CP . (2.14)

Depending on the model of h, e.g., stochastic or deterministic, different estimation philosophies are presented in the literature (Figure 2.1). Here we
give a survey of different paradigms. We emphasize their relation to classical
regularization methods [154] regarding the inverse problem (2.13).
As the system matrix S in the inverse problem (2.13) can be chosen by the
system designer via the design of training sequences (see references in [215]),
it is rather well-conditioned in general. Thus, the purpose of regularization
methods is to incorporate a priori information about h in order to improve
the performance in case of many parameters, in case of low SNR, or for a
small number of observations N of Nt training symbols.

2.2.1 Minimum Mean Square Error Estimator

In the Bayesian formulation of the channel estimation problem (2.13) we model the channel parameters h as a random vector h with given probabil-
ity density function (pdf) ph (h). Thus, for example, correlations of h can
be exploited to improve the estimator’s performance. Given the joint pdf
ph,y (h, y) and the observation y, we search a function f : CM N → CP with
ĥ = f (y) such that ĥ is, on average, as close to the true channel realization
h as possible.
Here we aim at minimizing the mean square error (MSE)

E_{h,y}[‖h − f(y)‖₂²] = E_y[E_{h|y}[‖h − f(y)‖₂²]]   (2.15)

w.r.t. all functions f [124]. This is equivalent to minimizing the conditional MSE w.r.t. ĥ ∈ C^P given the observation y

min_{ĥ∈C^P} E_{h|y}[‖h − ĥ‖₂²].   (2.16)

Writing the conditional MSE explicitly, we obtain


    H
Eh|y kh − ĥk22 = tr Rh|y + kĥk22 − ĥ µh|y − µH
h|y ĥ (2.17)

with conditional correlation matrix Rh|y = C h|y + µh|y µH h|y , conditional


covariance matrix C h|y , and conditional mean µh|y of the pdf ph|y (h|y).
The solution of (2.16)

ĥ_CM = µ_{h|y} = E_{h|y}[h]   (2.18)

is often called the minimum MSE (MMSE) estimator or the conditional mean
(CM) estimator. Thus, in general f is a nonlinear function of y. The MMSE

E_y[tr[R_{h|y}] − ‖µ_{h|y}‖₂²] = tr[E_y[C_{h|y}]]   (2.19)

is the trace of the mean error correlation matrix

C_{h−ĥ_CM} = E_y[C_{h|y}] = E_{h,y}[(h − ĥ_CM)(h − ĥ_CM)^H].   (2.20)

So far in this section we have considered a general pdf ph,y (h, y); now we
assume h and y to be zero-mean and jointly complex Gaussian distributed.
For this assumption, the conditional mean is a linear function of y and reads
(cf. (A.2))

$$ \hat{h}_{\mathrm{MMSE}} = \mu_{h|y} = W_{\mathrm{MMSE}}\, y \tag{2.21} $$

with linear estimator⁷

$$ W_{\mathrm{MMSE}} = C_{hy} C_y^{-1} = C_h S^H \left( S C_h S^H + C_n \right)^{-1} = \left( C_h S^H C_n^{-1} S + I_P \right)^{-1} C_h S^H C_n^{-1}. \tag{2.22} $$

The first equality follows from the system model (2.13) and the second from the matrix inversion lemma (A.19) [124, p. 533]. The left side of this system of linear equations, i.e., the matrix to be "inverted", is smaller in the third line in (2.22). If $N > K(L+1)$ and $C_n^{-1}$ is given or $C_n = I_N \otimes C_{n,s}$, the order of complexity to solve it is also reduced [216].
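The matrix-inversion forms in (2.22) are algebraically equivalent, which can be checked numerically. The following minimal sketch (toy dimensions and randomly generated covariance matrices, not taken from the book) verifies the first and third forms:

```python
import numpy as np

rng = np.random.default_rng(0)
P, MN = 4, 8                          # parameters, observation length (toy)
S = rng.standard_normal((MN, P)) + 1j * rng.standard_normal((MN, P))
A = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))
C_h = A @ A.conj().T                  # channel covariance (positive definite)
C_n = 0.1 * np.eye(MN)                # white noise covariance

# Form 1: inverse of size MN x MN
W1 = C_h @ S.conj().T @ np.linalg.inv(S @ C_h @ S.conj().T + C_n)
# Form 2 (matrix inversion lemma): inverse of size P x P only
C_n_inv = np.linalg.inv(C_n)
W2 = np.linalg.inv(C_h @ S.conj().T @ C_n_inv @ S + np.eye(P)) \
     @ C_h @ S.conj().T @ C_n_inv
assert np.allclose(W1, W2)            # matrix inversion lemma at work
```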
For the Gaussian assumption, the mean error correlation matrix in (2.20) is equivalent to the conditional error covariance matrix, as the latter is independent of $y$. It is the Schur complement of $C_y$ in the covariance matrix of $[h^T, y^T]^T$ (see A.17)

$$ C_{h|y} = C_h - W_{\mathrm{MMSE}} C_{hy}^H = K_{C_y}\!\left( \begin{bmatrix} C_h & C_{hy} \\ C_{hy}^H & C_y \end{bmatrix} \right). \tag{2.23} $$

For every linear estimator $W$, the MSE can be decomposed into the mean squared norm of the estimator's bias $h - WSh$ and the variance of the estimator:

$$ \mathrm{E}_{h,y}\!\left[ \|h - Wy\|_2^2 \right] = \mathrm{E}_{h,n}\!\left[ \|h - W(Sh + n)\|_2^2 \right] \tag{2.24} $$
$$ = \underbrace{\mathrm{E}_h\!\left[ \|h - WSh\|_2^2 \right]}_{\text{mean squared norm of bias}} + \underbrace{\mathrm{E}_n\!\left[ \|Wn\|_2^2 \right]}_{\text{variance}}. \tag{2.25} $$

⁷ The channel covariance matrix $C_h$ is determined by the angles of arrival and departure, delays, and mean path loss. These are long-term or large-scale properties of the channel and change slowly compared to the realization $h$. Thus, in principle it is not necessary to estimate these parameters directly as in [224] to exploit the channel's structure for an improved channel estimate $\hat{h}$.

The MMSE estimator finds the best trade-off among both contributions be-
cause it exploits the knowledge about the channel and noise covariance matrix
C h and C n , respectively.
Due to its linearity, W MMSE is called linear MMSE (LMMSE) estimator
and could also be derived from (2.16) restricting f to the class of linear
functions as in (2.24). In Section 2.4 we discuss the robust properties of the
LMMSE estimator for general (non-Gaussian) pdf ph,y (h, y).
The estimate $\hat{h}_{\mathrm{MMSE}}$ is also the solution of the regularized weighted least-squares problem

$$ \hat{h}_{\mathrm{MMSE}} = \operatorname*{argmin}_{h} \|y - Sh\|_{C_n^{-1}}^2 + \|h\|_{C_h^{-1}}^2, \tag{2.26} $$

which aims at simultaneously minimizing the weighted norm of the approximation error and of $h$ [24]. In the terminology of [154], $C_h$ describes the smoothness of $h$. Thus, smoothness and average size of $h$, relative to $C_n$, is the a priori information exploited here.

2.2.2 Maximum Likelihood Estimator

Contrary to the MMSE estimator, the Maximum Likelihood (ML) approach


treats h as a deterministic parameter. At first we assume to have no additional
a priori information about it.
Together with the noise covariance matrix C n , the channel parameters h
define the stochastic model of the observation y ,

$$ y \sim \mathcal{N}_c(Sh, C_n). \tag{2.27} $$

The ML approach aims at finding the parameters of this distribution which present the best fit to the given observation $y$ and maximize the likelihood

$$ \max_{h \in \mathbb{C}^P,\, C_n \in \mathcal{C}_n} p_y(y; h, C_n). \tag{2.28} $$

The complex Gaussian pdf is (see Appendix A.1)

$$ p_y(y; h, C_n) = \left( \pi^{MN} \det C_n \right)^{-1} \exp\!\left( -(y - Sh)^H C_n^{-1} (y - Sh) \right). \tag{2.29} $$

Restricting the noise to be temporally uncorrelated and spatially correlated, the corresponding class of covariance matrices is

$$ \mathcal{C}_n = \left\{ C_n \,\middle|\, C_n = I_N \otimes C_{n,s},\ C_{n,s} \in \mathbb{S}_{+,0}^{M} \right\}, \tag{2.30} $$

for which (2.28) yields [129]

$$ \hat{h}_{\mathrm{ML}} = W_{\mathrm{ML}}\, y, \quad W_{\mathrm{ML}} = S^{\dagger}, \qquad \hat{C}_{n,s} = \frac{1}{N} \left( Y - \hat{H}_{\mathrm{ML}} \bar{S} \right) \left( Y - \hat{H}_{\mathrm{ML}} \bar{S} \right)^H \tag{2.31} $$

with $\hat{H}_{\mathrm{ML}} = \operatorname{unvec}(\hat{h}_{\mathrm{ML}}) \in \mathbb{C}^{M \times K(L+1)}$. The estimates exist only if $S$ has full column rank, i.e., necessarily $MN \ge P$. If $C_n = c_n I_{MN}$, we have $\hat{c}_n = \frac{1}{MN} \|y - S \hat{h}_{\mathrm{ML}}\|_2^2$. Introducing the orthogonal projector $P = \bar{S}^H (\bar{S}\bar{S}^H)^{-1} \bar{S}$ on the row space of $\bar{S}$ and the projector $P^{\perp} = I_N - P$ on the space orthogonal to the row space of $\bar{S}$, we can write [129]

$$ \hat{C}_{n,s} = \frac{1}{N}\, Y P^{\perp} Y^H \tag{2.32} $$
$$ \hat{c}_n = \frac{1}{MN} \|Y P^{\perp}\|_F^2. \tag{2.33} $$

The space orthogonal to the row space of $\bar{S}$ contains only noise ("noise subspace"), because $Y P^{\perp} = N P^{\perp}$ with the noise matrix $N$ (cf. Eq. 2.3).
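A small numerical sketch (toy dimensions and random data, not from the book) of the ML estimate via the pseudoinverse and of the noise-subspace identity $Y P^{\perp} = N P^{\perp}$ used above:

```python
import numpy as np

rng = np.random.default_rng(1)
M, KL1, N = 2, 3, 8                 # antennas, K(L+1), training length (toy)
S_bar = rng.standard_normal((KL1, N)) + 1j * rng.standard_normal((KL1, N))
H = rng.standard_normal((M, KL1)) + 1j * rng.standard_normal((M, KL1))
Noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
Y = H @ S_bar + Noise               # block training model, cf. (2.3)

H_ml = Y @ np.linalg.pinv(S_bar)    # ML estimate: right pseudoinverse of S_bar
P_proj = S_bar.conj().T @ np.linalg.inv(S_bar @ S_bar.conj().T) @ S_bar
P_perp = np.eye(N) - P_proj         # projector onto the "noise subspace"

assert np.allclose(Y @ P_perp, Noise @ P_perp)   # only noise survives
C_ns_hat = (Y @ P_perp @ Y.conj().T) / N         # biased estimate (2.32)
```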
In contrast to (2.21), $\hat{h}_{\mathrm{ML}}$ in (2.31) is an unbiased estimate of $h$, whereas the bias of $\hat{C}_{n,s}$ and $\hat{c}_n$ is

$$ \mathrm{E}_n\!\left[ \hat{C}_{n,s} \right] - C_{n,s} = \left( \frac{N - K(L+1)}{N} - 1 \right) C_{n,s} \tag{2.34} $$
$$ \mathrm{E}_n\!\left[ \hat{c}_n \right] - c_n = \left( \frac{N - K(L+1)}{N} - 1 \right) c_n. \tag{2.35} $$

It results from the scaling of the estimates in (2.32) and (2.33) by $N$ and $MN$, respectively. On average, $\hat{C}_{n,s}$ and $\hat{c}_n$ underestimate the true noise covariance matrix, because $\mathrm{E}[\hat{C}_{n,s}] \preceq C_{n,s}$. Note that the rank of $P^{\perp}$ is $N - K(L+1)$. Thus, for unbiased estimates, a scaling with $N - K(L+1)$ and $M[N - K(L+1)]$ instead of $N$ and $MN$, respectively, would be sufficient. These unbiased estimators are derived in Chapter 3.
For the general class $\mathcal{C}_n = \{ C_n \mid C_n \in \mathbb{S}_{+,0}^{MN} \}$ and if we assume $C_n$ to be known, the ML problem (2.28) is equivalent to the weighted least-squares problem [124]

$$ \min_{h} \|y - Sh\|_{C_n^{-1}}, \tag{2.36} $$

which is solved by

$$ \hat{h}_{\mathrm{ML}} = W_{\mathrm{ML}}\, y, \quad W_{\mathrm{ML}} = \left( S^H C_n^{-1} S \right)^{-1} S^H C_n^{-1}. \tag{2.37} $$

Note that $S^H C_n^{-1} y$ is a sufficient statistic for $h$. Applying Tikhonov regularization [154, 24, 212] to (2.36), we obtain (2.26) and the LMMSE estimator (2.21).
For (2.31) and (2.37), the covariance matrix of the zero-mean estimation error $\varepsilon = \hat{h} - h$ is

$$ C_\varepsilon = \left( S^H C_n^{-1} S \right)^{-1} \tag{2.38} $$

and the MSE is $\mathrm{E}[\|\varepsilon\|_2^2] = \operatorname{tr}[C_\varepsilon]$, since $\hat{h}_{\mathrm{ML}}$ is unbiased.


The relationship between $W_{\mathrm{ML}}$ from (2.37) and the MMSE estimate (2.21) is given by

$$ W_{\mathrm{MMSE}} = \left( C_\varepsilon C_h^{-1} + I_P \right)^{-1} W_{\mathrm{ML}} \tag{2.39} $$
$$ = C_\varepsilon^{1/2} \left( C_\varepsilon^{1/2} C_h^{-1} C_\varepsilon^{1/2} + I_P \right)^{-1} C_\varepsilon^{-1/2} W_{\mathrm{ML}} $$
$$ = C_\varepsilon^{1/2} \underbrace{V \left( \Sigma + I_P \right)^{-1} \Sigma\, V^H}_{\text{matrix of filter factors } F}\, \underbrace{C_\varepsilon^{-1/2} W_{\mathrm{ML}}}_{\text{whitening}} \tag{2.40} $$

with the eigenvalue decomposition $C_\varepsilon^{-1/2} C_h C_\varepsilon^{-1/2} = V \Sigma V^H$ and $[\Sigma]_{i,i} \ge [\Sigma]_{i+1,i+1}$. Based on this decomposition of $W_{\mathrm{MMSE}}$, the MMSE estimator can be interpreted as follows: The first stage is an ML channel estimator followed by a whitening of the estimation error. In the next stage, the basis in $\mathbb{C}^P$ is changed and the components are weighted by the diagonal elements of $(\Sigma + I_P)^{-1} \Sigma$, which describe the reliability of the estimate in every dimension. The $i$th diagonal element of $\Sigma$ may be interpreted as the signal to noise ratio in the one-dimensional subspace spanned by the $i$th column of $C_\varepsilon^{1/2} V$.

2.2.2.1 Reduced-Rank Maximum Likelihood Estimator

In contrast to this "soft" weighting of the subspaces, the reduced-rank ML (RML) solution proposed in [191, 157] (for $C_n = I_N \otimes C_{n,s}$) results in the new matrix of filter factors $F$ in (2.40), chosen as

$$ F_{\mathrm{RML}} = \begin{bmatrix} I_R & 0 \\ 0 & 0_{P-R \times P-R} \end{bmatrix}, \tag{2.41} $$

which selects the strongest $R$ dimensions and truncates the weakest $P - R$ dimensions in the basis $C_\varepsilon^{1/2} V$.⁸ The resulting RML estimator

$$ W_{\mathrm{RML}} = C_\varepsilon^{1/2} V F_{\mathrm{RML}} V^H C_\varepsilon^{-1/2} W_{\mathrm{ML}} \tag{2.42} $$

is of reduced rank $R \le P$.
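The filter-factor decomposition (2.40) and the RML truncation (2.41)-(2.42) can be reproduced numerically. The following sketch (toy dimensions, numpy only; all values assumed for illustration) also cross-checks the result against the direct LMMSE form (2.22):

```python
import numpy as np

rng = np.random.default_rng(2)
P, MN, R, cn = 4, 8, 2, 0.2
S = rng.standard_normal((MN, P)) + 1j * rng.standard_normal((MN, P))
A = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))
C_h = A @ A.conj().T                          # toy channel covariance
C_n = cn * np.eye(MN)                         # white noise

W_ml = np.linalg.pinv(S)                      # for white noise: (2.37)
C_eps = np.linalg.inv(S.conj().T @ S / cn)    # error covariance (2.38)

# Hermitian square root of C_eps via its EVD (numpy only)
w_e, U_e = np.linalg.eigh(C_eps)
Ce_h = U_e @ np.diag(np.sqrt(w_e)) @ U_e.conj().T
Ce_h_inv = np.linalg.inv(Ce_h)

lam, V = np.linalg.eigh(Ce_h_inv @ C_h @ Ce_h_inv)
idx = np.argsort(lam)[::-1]                   # decreasing order (footnote 8)
lam, V = lam[idx], V[:, idx]

F_mmse = np.diag(lam / (lam + 1.0))           # "soft" filter factors (2.40)
W_mmse = Ce_h @ V @ F_mmse @ V.conj().T @ Ce_h_inv @ W_ml
F_rml = np.diag((np.arange(P) < R).astype(float))        # truncation (2.41)
W_rml = Ce_h @ V @ F_rml @ V.conj().T @ Ce_h_inv @ W_ml  # (2.42)

# cross-check against the direct LMMSE form (2.22)
W_direct = C_h @ S.conj().T @ np.linalg.inv(S @ C_h @ S.conj().T + C_n)
assert np.allclose(W_mmse, W_direct)
```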

⁸ The eigenvalues are assumed to be sorted in decreasing magnitude.

Truncating the singular value decomposition of the system matrix S is a


standard technique for regularizing the solution of inverse problems [154, 205].
Here, a non-orthogonal basis is chosen which describes additional a priori
information available for this problem via C h .
The relation of the RML to the MMSE estimator was established in [56], which allows for the heuristic derivation of the RML estimator given here. In the original work [191, 157], a rigorous derivation of this estimator is presented. It assumes a reduced-rank model of dimension $R$ for $h = U_R \xi$ with $\xi \in \mathbb{C}^R$. The optimum choice for $U_R \in \mathbb{C}^{P \times R}$ is the non-orthogonal basis $U_R = C_\varepsilon^{1/2} V F$. This basis does not change over $B$ blocks of training sequences, whereas $\xi$ is assumed to change from block to block. Moreover, based on the derivation in [191, 157] for finite $B$, the estimate $\hat{C}_h = \frac{1}{B} \sum_{q=1}^{B} \hat{h}_{\mathrm{ML}}[q] \hat{h}_{\mathrm{ML}}[q]^H$ of $C_h$ is optimum, where $\hat{h}_{\mathrm{ML}}[q]$ is the ML estimate (2.31) based on $y[q]$ (2.4) in block $q$.
In practice, the choice of R is a problem, as C h is generally of full rank.
Choosing R < P results in a bias. Thus, the rank R has to be optimized
to achieve a good bias-variance trade-off (see Section 2.2.4, also [56, 157]).
The MMSE estimator’s model of h is more accurate and results in a superior
performance.

2.2.3 Correlator and Matched Filter

For orthogonal columns in $S$ and white noise, i.e., $S^H S = N c_s I_P$ and $C_n = c_n I_{MN}$, the ML estimator in (2.31) and (2.37) simplifies to

$$ W_C = \frac{1}{N c_s}\, S^H, \tag{2.43} $$

which is sometimes called "correlator" in the literature. It is simple to implement, but if $S^H S = N c_s I_P$ does not hold, the channel estimate becomes biased and the MSE saturates for high SNR. Orthogonality of $S$ can be achieved by appropriate design of the training sequences, e.g., choosing orthogonal sequences for $L = 0$ or with the choice of [201] for $L > 0$. The correlator does not exploit any additional a priori information of the channel parameters such as correlations.
The matched filter (MF) aims at maximizing the cross-correlation between the output of the estimator $\hat{h} = Wy$ and the signal $h$ to be estimated. The correlation is measured by $|\mathrm{E}_{h,n}[\hat{h}^H h]|^2 = |\operatorname{tr}[R_{h\hat{h}}]|^2$, which is additionally normalized by the noise variance in $\hat{h}$ to avoid $\|W\|_F \to \infty$. This can be interpreted as a generalization of the SNR which is defined in the literature for estimating a scalar signal [108, p. 131]. Thus, the MF assumes a stochastic model for $h$ with correlation matrix⁹ $R_h$.

⁹ Although it is equal to the covariance matrix for our assumption of zero-mean $h$, in general the correlation matrix has to be used here.
The optimization problem reads

$$ \max_{W} \frac{\left| \mathrm{E}_{h,n}\!\left[ \hat{h}^H h \right] \right|^2}{\mathrm{E}_n\!\left[ \|Wn\|_2^2 \right]} \tag{2.44} $$

and is solved by

$$ W_{\mathrm{MF}} = \alpha R_h S^H C_n^{-1}, \quad \alpha \in \mathbb{C}. \tag{2.45} $$

The solution is unique up to a complex scaling $\alpha$ because the cost function is invariant to $\alpha$. A possible choice is $\alpha = \frac{c_n}{N c_s}$ to achieve convergence to the correlator (2.43) in case $R_h = I_P$ and $C_n = c_n I_{MN}$, where $c_n = \operatorname{tr}[C_n]/(MN)$ and $c_s$ is defined by $c_s = \|s[n]\|_2^2 / K$ for training sequences with constant $\|s[n]\|_2$.
The MF produces a sufficient statistic $S^H C_n^{-1} y$ for $h$ in the first stage before weighting the estimate by the correlation matrix $R_h$. This weighting with $R_h = U \Lambda U^H$ corresponds to changing the basis with $U$, which is followed by a weighting by the eigenvalues (signal variances in the transformed domain) and a transformation back to the original basis. Thus, the estimate is amplified more in those dimensions for which the channel parameters are stronger.
For high noise variance, the MMSE estimator (2.22) scaled by $c_n'$, which is defined via $C_n = c_n' C_n'$ with $C_n' = \lim_{c_n' \to \infty} C_n / c_n' \succ 0$, converges to the MF in (2.45) with $\alpha = 1$ and $C_n = C_n'$:

$$ W_{\mathrm{MF}}\big|_{\alpha=1,\, C_n = C_n'} = \lim_{c_n' \to \infty} c_n'\, W_{\mathrm{MMSE}}. \tag{2.46} $$

If the diagonal of $C_n$ is constant and equal to $c_n'$, then we have $C_n' = I_{MN}$.
In systems operating at low SNR and with short training sequences, the
MF is a low-complexity implementation of the MMSE estimator with similar
performance.
Although the correlator for channel estimation is often called a MF, the
relation of the MF based on (2.44) to the MMSE estimator (2.21) in (2.46)
suggests using this terminology for (2.45).
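The limit (2.46) is easy to verify numerically. A toy sketch (made-up dimensions and covariances, assuming $C_n' = I_{MN}$) shows the scaled LMMSE filter approaching the MF as the noise variance grows:

```python
import numpy as np

rng = np.random.default_rng(5)
P, MN = 4, 8
S = rng.standard_normal((MN, P)) + 1j * rng.standard_normal((MN, P))
A = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))
R_h = A @ A.conj().T                 # correlation matrix of h (zero mean)

W_mf = R_h @ S.conj().T              # (2.45) with alpha = 1 and C_n' = I
for cn in (1e1, 1e3, 1e5):           # growing noise variance c_n'
    W_mmse = R_h @ S.conj().T @ np.linalg.inv(S @ R_h @ S.conj().T
                                              + cn * np.eye(MN))
    dev = np.linalg.norm(cn * W_mmse - W_mf) / np.linalg.norm(W_mf)
    print(f"c_n' = {cn:.0e}: relative deviation from W_MF = {dev:.2e}")
```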

2.2.4 Bias-Variance Trade-Off

The MSE can be decomposed as

$$ \mathrm{E}_{h,n}\!\left[ \|\varepsilon\|_2^2 \right] = \mathrm{E}_{h,n}\!\left[ \|h - W(Sh+n)\|_2^2 \right] = \mathrm{E}_h\!\left[ \|(I_P - WS)h\|_2^2 \right] + \mathrm{E}_n\!\left[ \|Wn\|_2^2 \right] = \underbrace{\operatorname{tr}\!\left[ (I_P - WS)\, C_h\, (I_P - WS)^H \right]}_{\text{mean squared norm of bias}} + \underbrace{c_n \|W\|_F^2}_{\text{variance}}, \tag{2.47} $$

where the first term is the mean of the squared norm of the bias and the sec-
ond the variance of the estimate ĥ = W y with C n = cn I M N . The estimators
presented in the previous sections aim at a reduced variance compared to the
ML estimator, at the price of introducing and increasing the bias.
Assuming orthogonal training sequences with $S^H S = N c_s I_P$, the basic behavior of the estimators w.r.t. this bias-variance trade-off can be explained.
With this assumption and the EVD $C_h = U \Lambda U^H$, $\Lambda = \operatorname{diag}[\lambda_1, \dots, \lambda_P]$, the estimators read

$$ W_{\mathrm{ML}} = W_C = \frac{1}{N c_s}\, S^H $$
$$ W_{\mathrm{MMSE}} = U \underbrace{\left( \Lambda + \frac{c_n}{N c_s} I_P \right)^{-1} \Lambda}_{\text{optimal weighting of subspaces}} U^H W_{\mathrm{ML}} $$
$$ W_{\mathrm{MF}} = \frac{1}{\lambda_1}\, U \Lambda U^H W_{\mathrm{ML}} $$
$$ W_{\mathrm{RML}} = U F_{\mathrm{RML}} U^H W_{\mathrm{ML}}. $$

The scaling for the MF is chosen as α = cn /(λ1 N cs ) to obtain a reduced


variance compared to the ML estimator. The correlator and ML estimator
are identical in this case.
For all estimators, the mean bias and variance as defined in (2.47) are
given in Table 2.1. The MMSE, MF, and RML estimators reduce the bias
compared to the ML estimator. By definition, the MMSE estimator achieves
the optimum trade-off, whereas the performance of the RML approach crit-
ically depends on the choice of the rank R. The bias-variance trade-off for
the MF estimator cannot be adapted to the channel correlations, number of
training symbols N , and SNR. Thus, the MF approach only yields a perfor-
mance improvement for large correlation, small N , and low SNR where it
approaches the MMSE estimator.
To assess the estimators' MSE numerically,¹⁰ all estimators $W$ are scaled by a scalar $\beta = \operatorname{tr}[W S C_h]^* / \operatorname{tr}[W S C_h S^H W^H + W C_n W^H]$ minimizing the MSE, i.e., $\min_\beta \mathrm{E}_{h,n}\!\left[ \|h - \beta W y\|_2^2 \right]$. Thus, we focus on the estimators' capability to estimate the channel vector $h$ up to its norm.¹¹

¹⁰ In the figures the MSE $\mathrm{E}_{h,n}[\|\varepsilon\|_2^2]$ is normalized by $\operatorname{tr}[C_h]$ to achieve the same maximum normalized MSE equal to one, for all scenarios.
| Channel Estimator | Mean Bias | Variance |
|---|---|---|
| ML / Correlator | $0$ | $P \dfrac{c_n}{N c_s}$ |
| Matched Filter | $\sum_{p=1}^{P} \lambda_p \left( \dfrac{\lambda_p}{\lambda_1} - 1 \right)^2$ | $\sum_{p=1}^{P} \left( \dfrac{\lambda_p}{\lambda_1} \right)^2 \dfrac{c_n}{N c_s}$ |
| MMSE | $\sum_{p=1}^{P} \lambda_p \left( \dfrac{1}{1 + \frac{c_n}{\lambda_p N c_s}} - 1 \right)^2$ | $\sum_{p=1}^{P} \left( \dfrac{1}{1 + \frac{c_n}{\lambda_p N c_s}} \right)^2 \dfrac{c_n}{N c_s}$ |
| Red.-Rank ML (RML) | $\sum_{p=R+1}^{P} \lambda_p$ | $R \dfrac{c_n}{N c_s}$ |

Table 2.1 Mean bias and variance (2.47) of channel estimators for $S^H S = N c_s I_P$ and $C_n = c_n I_{MN}$.
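The closed-form entries of Table 2.1 are easy to evaluate. The following toy sketch (made-up eigenvalues, not from the book) computes bias, variance, and total MSE for each estimator; the MMSE weighting minimizes the per-component sum by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
P, N, cs, cn, R = 4, 16, 1.0, 0.5, 2
lam = np.sort(rng.uniform(0.1, 2.0, P))[::-1]   # eigenvalues of C_h
t = cn / (N * cs)                               # noise-to-training-energy term

def bias_var(a):
    # W = U diag(a) U^H W_ML with orthogonal training; terms of (2.47)
    return np.sum(lam * (1.0 - a) ** 2), np.sum(a ** 2) * t

weights = {
    "ML":   np.ones(P),
    "MF":   lam / lam[0],
    "MMSE": lam / (lam + t),
    "RML":  (np.arange(P) < R).astype(float),
}
for name, a in weights.items():
    b, v = bias_var(a)
    print(f"{name:5s} bias = {b:.4f}  variance = {v:.4f}  MSE = {b + v:.4f}")
```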

 
First, we consider a scenario with $M = K = 8$, $L = 0$ (frequency-flat), and $N = 16$ in Figure 2.6. The binary training sequences satisfy $S^H S = N c_s I_P$. The covariance matrix $C_h$ is given by the model in Appendix E with mean angles of arrival $\bar{\varphi} = [-45°, -32.1°, -19.3°, -6.4°, 6.4°, 19.3°, 32.1°, 45°]$ and Laplace angular power spectrum with spread $\sigma = 5°$.
Due to a small $N$, the ML estimator requires an SNR $= c_s/c_n$ at a relative MSE of $10^{-1}$ which is about 5 dB larger than for the MMSE estimator. With the RML estimator, this gap can be closed over a wide range of SNR, reducing the rank from $R = P = 64$ for the ML estimator to $R = 40$. For reference, we also give the RML performance with the optimum rank w.r.t. MSE obtained from a brute-force optimization. The MF estimator is only superior to the ML estimator at very low SNR, as its bias is substantial.
In Figures 2.7 and 2.8, performance in a frequency selective channel with
L = 4, M = 8, K = 1, and N = 16 is investigated. The covariance matrix
C h is given by the model in Appendix E with mean angles of arrival ϕ̄ =
[−45◦ , −22.5◦ , 0◦ , 22.5◦ , 45◦ ] per delay and Laplace angular power spectrum
with spread σ = 5◦ .
In this channel scenario the MF estimator reduces the MSE compared to the correlator (with comparable numerical complexity) for an interesting SNR region (Figure 2.7). Asymptotically, for high SNR it is outperformed
by the correlator. The RML estimator achieves a performance close to the
MMSE approach if the rank is chosen optimally for every SNR (Figure 2.8).
For full rank (R = P = 40), the RML is equivalent to the ML estimator. For
a fixed rank of R = 5 and R = 10, the MSE saturates at high SNR due to the
¹¹ If the application using the channel estimates is sensitive to a scaling of $\hat{h}$, estimators are proposed in the literature (e.g. [15]) which also introduce a proper scaling for ML estimators and do not rely on the knowledge of $C_h$.
Fig. 2.6 Normalized MSE of channel estimators for $M = K = 8$, $L = 0$ (frequency-flat), and $N = 16$.

| Channel Estimator | Order of complexity |
|---|---|
| ML | $K^3 (L+1)^3$ |
| Matched Filter | $M^3 K^2 (L+1)^2 N$ |
| MMSE | $M^3 K^3 (L+1)^3$ |
| Red.-Rank ML (RML) | $M^3 K^3 (L+1)^3$ |

Table 2.2 Order of computational complexity of channel estimators assuming $C_n = c_n I_{MN}$.

bias. At low SNR a smaller rank should be chosen than at high SNR. Thus,
an adaptation of the rank R to the SNR is required. The MMSE estimator
performs the necessary bias-variance trade-off based on C h and C n .
The correlator and MF estimator are simple estimators with a low and
comparable numerical complexity. As discussed in [56], only a suboptimum
estimate of C h is required for the MF. Regarding the possibilities to reduce
the complexity of the RML and MMSE estimators with efficient implemen-
tations and approximations, it is difficult to draw a final conclusion for their
complexity. Roughly, the complexity of the MMSE and RML estimators is
also similar [56], but the RML approach requires an additional EVD of C h .
The ML estimator is considerably less complex. The order of required floating point operations is given in Table 2.2 for $C_n = c_n I_{MN}$.¹²
RML, and MMSE estimators, the covariance matrix C h has to be estimated
(Section 3). This increases their complexity compared to the ML estimator.
A low-complexity tracking algorithm for the MMSE estimator is introduced

¹² This assumption influences only the order of the matched filter.
in [141], whereas tracking of the projector on the $R$-dimensional subspace is possible for the RML (e.g. [219]).

Fig. 2.7 Normalized MSE of channel estimators for $M = 8$, $K = 1$, $L = 4$ (frequency-selective), and $N = 16$.

Fig. 2.8 Normalized MSE of channel estimators for $M = 8$, $K = 1$, $L = 4$ (frequency-selective), and $N = 16$. Comparison with reduced-rank ML estimators.

2.3 Channel Prediction

The observation of $Q$ previous realizations of the channel parameters $h_T[q] = [h[q-1]^T, h[q-2]^T, \dots, h[q-Q]^T]^T \in \mathbb{C}^{QP}$ is described by the stochastic signal model (2.9)

$$ y_T[q] = S_T h_T[q] + n_T[q] \in \mathbb{C}^{QMN}. \tag{2.48} $$



They are observed indirectly via known training sequences which determine
S T . This observation is disturbed by additive noise nT [q]. To estimate h[q]
based on y T [q], we first assume that the joint pdf ph,yT (h[q], y T [q]) is given.

2.3.1 Minimum Mean Square Error Prediction

Given the stochastic model ph,yT (h[q], y T [q]), Bayesian parameter estimation
gives the framework for designing optimum predictors. As in Section 2.2.1,
we choose the MSE as average measure of the estimation quality. The main
ideas are reviewed in Section 2.2.1 and can be applied directly to the prob-
lem of channel prediction. First we treat the general case of predicting the
channel vector h[q] ∈ CP before focusing on the prediction of a single chan-
nel parameter in h[q]. The restriction to the prediction of a scalar allows
us to include standard results about prediction of parameters or time series
from Wiener and Kolmogorov (see references in [182, chapter 10]) such as
asymptotic performance results for Q → ∞.13

2.3.1.1 Multivariate Prediction

As discussed in Section 2.2.1, the MMSE estimate $\hat{h}[q]$ is the (conditional) mean of the conditional pdf $p_{h|y_T}(h[q] \mid y_T[q])$. If $p_{h,y_T}(h[q], y_T[q])$ is the zero-mean complex Gaussian pdf, the conditional mean is linear in $y_T[q]$

$$ \hat{h}[q] = \mathrm{E}_{h|y_T}[h[q]] = W y_T[q]. \tag{2.49} $$

Thus, the assumption of a linear estimator is not a restriction, and $W$ can also be obtained from

$$ \min_{W \in \mathbb{C}^{P \times QMN}} \mathrm{E}_{h,y_T}\!\left[ \|h[q] - W y_T[q]\|_2^2 \right]. \tag{2.50} $$

For the model (2.48), the estimator $W \in \mathbb{C}^{P \times QMN}$, which can also be interpreted as a MIMO FIR filter of length $Q$, reads [124]

$$ W = C_{hy_T} C_{y_T}^{-1} = C_{hh_T} S_T^H \left( S_T C_{h_T} S_T^H + C_{n_T} \right)^{-1} \tag{2.51} $$

where $C_{hh_T} = \mathrm{E}[h[q]\, h_T[q]^H]$. It can be decomposed into a ML channel estimation stage (2.37), which also provides a sufficient statistic for $\{h[i]\}_{i=q-Q}^{q-1}$,

¹³ The general case of vector or multivariate prediction is treated in [234, 235]. The insights which can be obtained for the significantly simpler scalar or univariate case are sufficient in the context considered here.
and a prediction stage with $W_P \in \mathbb{C}^{P \times QP}$

$$ W = \underbrace{C_{hh_T} \left( C_{h_T} + (S_T^H C_{n_T}^{-1} S_T)^{-1} \right)^{-1}}_{W_P} \left( I_Q \otimes W_{\mathrm{ML}} \right). \tag{2.52} $$

The advantage of this decomposition is twofold: On the one hand, it allows for
a less complex implementation of W exploiting the structure of the related
system of linear equations (2.51). On the other hand, we can obtain further
insights separating the tasks of channel estimation and prediction; we focus
on the prediction stage $W_P$ of the estimator $W$ in the sequel. Denoting the ML channel estimates by $\tilde{h}[q] = W_{\mathrm{ML}}\, y[q]$ and collecting them in $\tilde{h}_T[q] = [\tilde{h}[q-1]^T, \tilde{h}[q-2]^T, \dots, \tilde{h}[q-Q]^T]^T \in \mathbb{C}^{QP}$, we can rewrite (2.48)

$$ \hat{h}[q] = W_P\, \tilde{h}_T[q] = W_P \left( I_Q \otimes W_{\mathrm{ML}} \right) y_T[q]. \tag{2.53} $$

Example 2.1. Consider the case of white noise $C_{n_T} = c_n I_{QMN}$ and orthogonal training sequences $S_T^H S_T = N c_s I_{QP}$, where $c_s$ denotes the power of a single scalar symbol of the training sequence. We assume $h[q] = [h_1[q], h_2[q], \dots, h_P[q]]^T$ to be uncorrelated, i.e., $C_h = \operatorname{diag}[c_{h_1}[0], c_{h_2}[0], \dots, c_{h_P}[0]]$, with autocovariance sequence $c_{h_p}[i] = \mathrm{E}[h_p[q]\, h_p[q-i]^*]$ for its $p$th element. Thus, we obtain $C_{h_T} = \sum_{p=1}^{P} C_{h_{T,p}} \otimes e_p e_p^T$ from (2.10), where $C_{h_{T,p}} \in \mathbb{T}_{+,0}^{Q}$ is Toeplitz with first row $[c_{h_p}[0], c_{h_p}[1], \dots, c_{h_p}[Q-1]]$. With $c_{h_p h_{T,p}} = [c_{h_p}[1], c_{h_p}[2], \dots, c_{h_p}[Q]]$, the crosscovariance matrix in (2.52) reads

$$ C_{hh_T} = \sum_{p=1}^{P} c_{h_p h_{T,p}} \otimes e_p e_p^T. \tag{2.54} $$

Applying these assumptions to the predictor $W_P$ from (2.52), we get

$$ W_P = \sum_{p=1}^{P} \underbrace{c_{h_p h_{T,p}} \left( C_{h_{T,p}} + \frac{c_n}{N c_s} I_Q \right)^{-1}}_{w_p^T} \otimes\, e_p e_p^T, \tag{2.55} $$

i.e., the prediction stage is decoupled for every parameter $h_p[q]$, and we can write

$$ \hat{h}_p[q] = w_p^T\, \tilde{h}_{T,p}[q] \tag{2.56} $$

with $\tilde{h}_{T,p}[q] = [\tilde{h}_p[q-1], \tilde{h}_p[q-2], \dots, \tilde{h}_p[q-Q]]^T$. □




Fig. 2.9 Linear prediction of $h_p[q]$ one step ahead based on $Q$ past noisy observations $\tilde{h}_p[q]$ and the FIR filter $w_p[q]$ with non-zero coefficients $w_p = [w_p[1], w_p[2], \dots, w_p[Q]]^T$: $\hat{h}_p[q] = \sum_{\ell=1}^{Q} w_p[\ell]\, \tilde{h}_p[q-\ell]$.

2.3.1.2 Univariate Prediction

The prediction problem can also be treated separately for every parameter $h_p[q]$, i.e., it is understood as a univariate prediction problem. This is equivalent to restricting the structure of $W$ in optimization problem (2.50) to $W = W_P (I_Q \otimes W_{\mathrm{ML}})$ with

$$ W_P = \sum_{p=1}^{P} w_p^T \otimes e_p e_p^T. \tag{2.57} $$

With this restriction on the domain of $W$, problem (2.50) is equivalent to

$$ \min_{\{w_p\}_{p=1}^{P}} \sum_{p=1}^{P} \mathrm{E}_{h_p, \tilde{h}_{T,p}}\!\left[ \left| h_p[q] - w_p^T\, \tilde{h}_{T,p}[q] \right|^2 \right]. \tag{2.58} $$

Of course, the MMSE achieved by this solution is larger than for the general
multivariate prediction (2.50) because the correlations C h and C ε (2.38) are
not exploited. The MMSEs are equal for the assumptions made in Exam-
ple 2.1 at the end of the previous subsection. The advantage of this solution
is its simplicity and, consequently, reduced complexity.
The estimate of $h_p[q]$ is $\hat{h}_p[q] = w_p^T\, \tilde{h}_{T,p}[q]$ (2.56). It is based on the estimated channel parameters in $Q$ previous blocks $\tilde{h}_{T,p}[q] = [\tilde{h}_p[q-1], \tilde{h}_p[q-2], \dots, \tilde{h}_p[q-Q]]^T$, which we model as

$$ \tilde{h}_p[q] = h_p[q] + e_p[q] \tag{2.59} $$

with $e_p[q] = e_p^T W_{\mathrm{ML}}\, n[q]$, $e_p[q] \sim \mathcal{N}_c(0, c_{e_p})$, and

$$ c_{e_p} = \left[ \left( S^H C_n^{-1} S \right)^{-1} \right]_{p,p} \ge \frac{c_n}{N c_s}, \tag{2.60} $$

where the inequality holds for $C_n = c_n I_{MN}$; it gives a lower bound, which is achieved for $S^H S = N c_s I_P$.

The terms in the cost function (2.58) are decoupled for every parameter and the solution is

$$ w_p^T = c_{h_p h_{T,p}} \left( C_{h_{T,p}} + c_{e_p} I_Q \right)^{-1}. \tag{2.61} $$

The MMSE for the univariate predictor of $h_p[q]$ is

$$ \mathrm{E}\!\left[ |\varepsilon_{Q,p}[q]|^2 \right] = c_{h_p}[0] - w_p^T\, c_{h_{T,p} h_p} \tag{2.62} $$

with estimation error $\varepsilon_{Q,p}[q] = \hat{h}_p[q] - h_p[q]$ (Figure 2.9).
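As an illustration of (2.61) and (2.62), the following sketch computes the one-step predictor for a Jakes-type Doppler autocovariance $c_h[\ell] = J_0(2\pi f_D \ell)$; this particular autocovariance model and all parameter values are assumptions made only for this example:

```python
import numpy as np
from scipy.special import j0          # Bessel function of the first kind
from scipy.linalg import toeplitz

Q, fD, ce = 10, 0.05, 0.01            # filter length, Doppler, noise variance
ch = j0(2 * np.pi * fD * np.arange(Q + 1))      # ch[0..Q], ch[0] = 1
C_hT = toeplitz(ch[:Q])               # Toeplitz, first row [ch[0..Q-1]]
c_hTh = ch[1:]                        # [ch[1], ..., ch[Q]]

w = np.linalg.solve(C_hT + ce * np.eye(Q), c_hTh)   # predictor (2.61)
mmse = ch[0] - w @ c_hTh                            # MMSE (2.62)
print(f"one-step prediction MMSE = {mmse:.4f} (variance ch[0] = {ch[0]:.1f})")
```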
Considering prediction based on observations of all past samples of the
sequence h̃p [q], i.e., for a predictor length Q → ∞, we arrive at the classi-
cal univariate prediction theory of Wiener and Kolmogorov [182]. For this
asymptotic case, the MMSE of the predictor can be expressed in terms of the
power spectral density

$$ S_{h_p}(\omega) = \sum_{\ell=-\infty}^{\infty} c_{h_p}[\ell]\, e^{-j\omega\ell} \tag{2.63} $$

of $h_p[q]$ with autocovariance sequence $c_{h_p}[\ell] = \mathrm{E}[h_p[q]\, h_p[q-\ell]^*]$ and reads [196]

$$ \mathrm{E}\!\left[ |\varepsilon_{\infty,p}[q]|^2 \right] = c_{e_p} \left[ \exp\!\left( \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\!\left( 1 + \frac{S_{h_p}(\omega)}{c_{e_p}} \right) d\omega \right) - 1 \right] \le \mathrm{E}\!\left[ |\varepsilon_{Q,p}[q]|^2 \right]. \tag{2.64} $$

It is a lower bound for $\mathrm{E}[|\varepsilon_{Q,p}[q]|^2]$ in (2.62).


In case the past observations of the channel $\tilde{h}_p[q]$ are undistorted, i.e., $c_{e_p} \to 0$ and $\tilde{h}_p[q] = h_p[q]$, the MMSE (2.64) converges to [196]

$$ \mathrm{E}\!\left[ |\varepsilon_{\infty,p}[q]|^2 \right] = \exp\!\left( \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln S_{h_p}(\omega)\, d\omega \right). \tag{2.65} $$

An interpretation of this relation is given by Scharf [182, pp. 431 and 434]: $\mathrm{E}[|\varepsilon_{\infty,p}[q]|^2] / c_{h_p}[0]$ is a measure for the flatness of the spectrum because in the case of $S_{h_p}(\omega) = c_{h_p}[0]$, $-\pi \le \omega \le \pi$, we have $\mathrm{E}[|\varepsilon_{\infty,p}[q]|^2] / c_{h_p}[0] = 1$, i.e., it is a white and not predictable random sequence. Moreover, the exponent in (2.65) determines the entropy rate of a stationary Gaussian random sequence [40, p. 274].

2.3.2 Properties of Band-Limited Random Sequences

In practice, the time-variance of the channel parameters is limited, since


the mobility or velocity of the transmitter, receiver, or their environment
is limited. For a sufficiently high sampling rate, autocovariance sequences
with a band-limited power spectral density are an accurate model. The set of band-limited power spectral densities of $h_p[q]$ with maximum frequency $\omega_{\max,p} = 2\pi f_{\max,p}$ and variance $c_{h_p}[0]$ is denoted

$$ \mathcal{B}_{h_p, f_{\max,p}} = \left\{ S_{h_p}(\omega) \,\middle|\, S_{h_p}(\omega) = 0 \text{ for } \omega_{\max,p} < |\omega| \le \pi,\ \omega_{\max,p} = 2\pi f_{\max,p} < \pi,\ S_{h_p}(\omega) \ge 0,\ c_{h_p}[0] = \frac{1}{2\pi} \int_{-\omega_{\max,p}}^{\omega_{\max,p}} S_{h_p}(\omega)\, d\omega \right\}. \tag{2.66} $$

In wireless communications, fmax,p is the maximum Doppler frequency of the


channel normalized by the sampling period, i.e., the block period Tb in our
case.
Random processes with band-limited power spectral density ($\omega_{\max,p} < \pi$) belong to the class of deterministic processes (in contrast to regular processes) because [169]

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\!\left( S_{h_p}(\omega) \right) d\omega = -\infty. \tag{2.67} $$

Applying this property to (2.65) we obtain $\mathrm{E}[|\varepsilon_{\infty,p}[q]|^2] = 0$ if $c_{e_p} \to 0$, i.e., deterministic processes are perfectly predictable. For $c_{e_p} > 0$, the power spectral density of $\tilde{h}_p[q]$ is not band-limited anymore and perfect prediction is impossible.
At first glance, this property is surprising: Intuitively, we do not expect
a “regular” random process to have enough structure such that perfect pre-
diction could be possible. In contrast to random processes, the values of
deterministic functions can be predicted perfectly as soon as we know the
class of functions and their parameters. This is one reason why the formal
classification into regular and deterministic random processes was introduced.
In the literature about prediction (or extrapolation) of band-limited ran-
dom sequences this classical property of perfect predictability, which goes
back to Szegö-Kolmogorov-Krein [233], seems to be less known and alterna-
tive proofs are given: For example, Papoulis [162],[164, p. 380], [163] shows
that a band-limited random process can be expressed only by its past sam-
ples if sampled at a rate greater than the Nyquist rate. But Marvasti [138]
states that these results go back to Beutler [22]. In [27], an alternative proof
to Papoulis based on the completeness of a set of complex exponentials on

a finite interval is given. These references do not refer to the classical prop-
erty of perfect predictability for deterministic random processes, however, the
new proofs do not require knowledge about the shape of the power spectral
density.
Consequently, perfect prediction based on the MSE criterion requires per-
fect knowledge of the power spectral density, but perfect prediction can be
achieved without any knowledge about the power spectral density except that
it belongs to the set Bhp ,fmax,p with given maximum frequency. Algorithms
and examples are given in [27, 222]. The practical problem with perfect pre-
diction is that the norm of the prediction filter is not bounded as Q → ∞
[164, p. 198]. Thus, perfect prediction is an ill-posed problem in the sense of
Hadamard [212].
The necessary a priori information for prediction can be even reduced
further as in [28, 153, 44]: Prediction algorithms for band-limited sequences
are developed which do not require knowledge about the bandwidth, but the
price is a larger required sampling rate.
The following two observations provide more understanding of the perfect
predictability of band-limited random processes.
1. A realization of an ergodic random process with band-limited power spec-
tral density is also band-limited with probability one, i.e., every realization
itself has a rich structure.
The argumentation is as follows: The realization of a random process has
finite power (second order moment) but is generally not of finite energy,
i.e., it is not square summable. Thus, its Fourier transform does not exist
[178, 169] and the common definition of band-limitation via the limited
support of the Fourier transform of a sequence is not applicable. An alter-
native definition of band-limitation is reported in [178]: A random sequence
with finite second order moment is called band-limited if it passes a filter
g[q], whose Fourier transform is G(ω) = 1, |ω| ≤ ωmax , zero elsewhere,
without distortion for almost all realizations. Employing this definition we
can show that the realization of a random process with band-limited power
spectral density is band-limited in the generalized sense with probability
one.14
This result is important because all realizations can now be treated equiv-
alently to deterministic band-limited sequences — except that they are
power-limited and not energy-limited. Thus, the results in the literature,
e.g., [27, 222, 28, 153], for deterministic signals are also valid for random
processes with probability one.

¹⁴ Proof: Consider an ergodic random sequence $h[q]$ with power spectral density $S_h(\omega)$ band-limited to $\omega_{\max}$, i.e., $S_h(\omega)G(\omega) = S_h(\omega)$. The filter $g[q]$ is defined above. Because $\mathrm{E}\!\left[ |h[q] - \sum_{\ell=-\infty}^{\infty} g[\ell]\, h[q-\ell]|^2 \right] = 0$, i.e., the output of $g[q] * h[q]$ converges to $h[q]$ in the mean-square sense, the filter output is also equal to the random sequence $h[q]$ with probability one [200, p. 432]. Thus, almost all realizations $h[q]$ of the process pass the filter undistorted, i.e., are band-limited in this generalized sense.

2. For a sequence $h[q]$ band-limited to $\omega_{\max} < \pi$, we construct a high-pass filter $f[q]$ with cut-off frequency $\omega_{\max}$ and $f[0] = 1$. The sequence at the filter output $e[q] = \sum_{\ell=0}^{Q} f[\ell]\, h[q-\ell]$ has "small" energy, i.e., $e[q] \approx 0$. For $Q \to \infty$, the stop-band of the filter $f[q]$ can be designed with arbitrary accuracy, which results in $e[q] = 0$. We can rewrite this equation as $h[q] = -\sum_{\ell=1}^{\infty} f[\ell]\, h[q-\ell]$, i.e., the low-pass prediction filter $w[q] = -f[q]$, $q = 1, 2, \dots, \infty$, achieves perfect prediction [222]. Thus, it can be shown that perfect prediction for band-limited (equivalently for bandpass) processes can be achieved knowing only the maximum frequency $\omega_{\max}$ if the sampling rate is strictly larger than the Nyquist rate [222]. The design of a high-pass filter is one possible approach.

Example 2.2. Let us consider the rectangular power spectral density

$$ S_R(\omega) = \begin{cases} \dfrac{\pi}{\omega_{\max,p}}\, c_{h_p}[0], & |\omega| \le \omega_{\max,p} \\ 0, & \omega_{\max,p} < |\omega| \le \pi, \end{cases} \tag{2.68} $$

which yields the prediction error (2.64)

$$ \mathrm{E}\!\left[ |\varepsilon_{\infty,p}[q]|^2 \right] = c_{e_p} \left[ \left( 1 + \frac{\pi c_{h_p}[0]}{c_{e_p}\, \omega_{\max,p}} \right)^{\omega_{\max,p}/\pi} - 1 \right]. \tag{2.69} $$

For $\omega_{\max,p} = \pi$, we get $\mathrm{E}[|\varepsilon_{\infty,p}[q]|^2] = c_{h_p}[0]$, whereas for $\omega_{\max,p} < \pi$ and $c_{e_p} \to 0$ perfect prediction is possible, i.e., $\mathrm{E}[|\varepsilon_{\infty,p}[q]|^2] = 0$. □
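Example 2.2 can be checked against a finite-length predictor: with the sinc autocovariance of $S_R(\omega)$, the finite-$Q$ MMSE (2.62) must stay above the asymptotic value (2.69), cf. (2.64). A toy sketch (all parameter values made up for illustration):

```python
import numpy as np
from scipy.linalg import toeplitz

ch0, fmax, ce, Q = 1.0, 0.1, 0.01, 20
ch = ch0 * np.sinc(2 * fmax * np.arange(Q + 1))  # np.sinc(x) = sin(pi x)/(pi x)

w = np.linalg.solve(toeplitz(ch[:Q]) + ce * np.eye(Q), ch[1:])
mse_Q = ch0 - w @ ch[1:]                         # finite-Q MMSE (2.62)
# asymptotic MMSE (2.69) with omega_max = 2*pi*fmax:
mse_inf = ce * ((1 + ch0 / (2 * ce * fmax)) ** (2 * fmax) - 1)
print(f"{mse_inf:.4f} <= {mse_Q:.4f}")           # lower bound (2.64) holds
```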

2.4 Minimax Mean Square Error Estimation

When deriving the MMSE channel estimator and predictor in Sections 2.2.1
and 2.3.1 we assume perfect knowledge of the joint probability distribution
for the observation and the parameters to be estimated. For the Gaussian
distribution, the estimators are linear in the observation and depend only on
the first and second order moments which are assumed to be known perfectly.
The performance of the Bayesian approach depends on the quality of the
a priori information in terms of this probability distribution. The probability
distribution is often assumed to be Gaussian for simplicity; the covariance
matrices are either estimated or a fixed “typical” or “most likely” covariance
matrix is selected for the specified application scenario. It is the hope of
the engineer that the errors are small enough such that the performance
degradation from the optimum design for the true parameters is small.
The minimax approach to MSE estimation aims at a guaranteed MSE
performance for a given class of probability distributions optimizing the esti-
mator for the worst case scenario. Thus, for a well chosen class of covariance
matrices, e.g., based on a model of the estimation errors or restriction to

possible scenarios, a robust estimator is obtained. If a large class is chosen,


the resulting robust estimator is very conservative, i.e., the guaranteed MSE
may be too large.
An important question is: For which class of probability distributions is
the linear MMSE solution assuming a Gaussian distribution minimax ro-
bust?15 This will provide more justification for this standard approach and
is discussed below.

2.4.1 General Results

Two cases are addressed in the literature:


• A minimax formulation for the classical Wiener-Kolmogorov theory is
given by Vastola and Poor [225], which treats the case of univariate linear
estimation based on an observation of infinite dimension. See also the re-
view of Kassam and Poor [123] or the more general theory by Verdu and
Poor [226].
• Multivariate minimax MSE estimation based on an observation of finite
dimension is discussed by Vandelinde [223], where robustness w.r.t. the
type of probability distribution is also considered. Soloviov [199, 198, 197]
gives a rigorous derivation of these results.
We briefly review both results in this section.

2.4.1.1 Observation of Infinite Dimension

Given the sequence of scalar noisy observations of the random sequence $h[q]$ (2.59)

$$ \tilde{h}[q'] = h[q'] + e[q'], \quad q' = q-1, q-2, \dots, \tag{2.70} $$

we estimate $h[q+d]$ with the causal IIR filter $w[\ell]$ ($w[\ell] = 0$, $\ell \le 0$)¹⁶

$$ \hat{h}[q+d] = \sum_{\ell=1}^{\infty} w[\ell]\, \tilde{h}[q-\ell]. \tag{2.71} $$

The estimation task is often called prediction if d = 0 (Figure 2.9), filtering


if d = −1, and smoothing in case d < −1. Formulating this problem in the
frequency domain [200, 225] the MSE reads

¹⁵ This is called an inverse minimax problem [226].
¹⁶ In [225], the more general case of estimating the output of a general linear filter applied to the sequence $h[q]$ is addressed.

Fig. 2.10 Linear prediction of $h[q]$ based on $Q$ past noisy observations $\tilde{h}[q]$.

$$ \mathcal{F}(\mathcal{W}, S_h, S_e) = \mathrm{E}\!\left[ |\varepsilon_\infty[q]|^2 \right] = \mathrm{E}\!\left[ \left| h[q+d] - \hat{h}[q+d] \right|^2 \right] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \left| e^{-j\omega d} - \mathcal{W}(\omega) \right|^2 S_h(\omega) + |\mathcal{W}(\omega)|^2 S_e(\omega) \right) d\omega. \tag{2.72} $$

The transfer function $\mathcal{W}$ of $w[\ell]$ is

$$ \mathcal{W}(\omega) = \sum_{\ell=1}^{\infty} w[\ell]\, e^{-j\omega\ell} \in \mathcal{H}_+^2. \tag{2.73} $$

It is chosen from the space of causal and square-integrable transfer functions with $\int_{-\infty}^{\infty} |\mathcal{W}(\omega)|^2\, d\omega < \infty$, which is called Hardy space [225] and denoted by $\mathcal{H}_+^2$. The power spectral densities of the uncorrelated signal $h[q]$ and noise $e[q]$ are

$$ S_h(\omega) = \sum_{\ell=-\infty}^{\infty} c_h[\ell]\, e^{-j\omega\ell} \in \mathcal{C}_h \tag{2.74} $$
$$ S_e(\omega) = \sum_{\ell=-\infty}^{\infty} c_e[\ell]\, e^{-j\omega\ell} \in \mathcal{C}_e. \tag{2.75} $$

Both power spectral densities are not known perfectly, but are assumed to belong to the uncertainty classes $\mathcal{C}_h$ and $\mathcal{C}_e$, respectively.
An estimator $\mathcal{W}$ is called robust if there exists a finite upper bound on the MSE $\mathcal{F}(\mathcal{W}, S_h, S_e)$ over all $S_h \in \mathcal{C}_h$ and $S_e \in \mathcal{C}_e$. Thus, the performance

$$ \max_{S_h \in \mathcal{C}_h,\, S_e \in \mathcal{C}_e} \mathcal{F}(\mathcal{W}, S_h, S_e) $$

can be guaranteed.
The most robust transfer function $\mathcal{W}^R$ minimizes this upper bound and is the solution of the minimax problem

$$ \min_{\mathcal{W} \in \mathcal{H}_+^2} \max_{S_h \in \mathcal{C}_h,\, S_e \in \mathcal{C}_e} \mathcal{F}(\mathcal{W}, S_h, S_e). \tag{2.76} $$

The solutions $S_h^L$ and $S_e^L$ to the maxmin problem, which is the dual problem to (2.76),

$$ \mathcal{F}(\mathcal{W}^L, S_h^L, S_e^L) = \max_{S_h \in \mathcal{C}_h,\, S_e \in \mathcal{C}_e}\ \min_{\mathcal{W} \in \mathcal{H}_+^2} \mathcal{F}(\mathcal{W}, S_h, S_e) \tag{2.77} $$

are termed least-favorable for causal estimation for the given uncertainty classes. If the values of (2.76) and (2.77) are equal, then $S_h^L$ and $S_e^L$ are least favorable if and only if $S_h^L$, $S_e^L$, and $\mathcal{W}^L$ form a saddle-point solution. This means they satisfy

$$ \mathcal{F}(\mathcal{W}^L, S_h, S_e) \le \mathcal{F}(\mathcal{W}^L, S_h^L, S_e^L) \le \mathcal{F}(\mathcal{W}, S_h^L, S_e^L), \tag{2.78} $$

and $\mathcal{W}^L$ is a most robust causal transfer function.


The following theorem by Vastola and Poor [225] states the conditions on $\mathcal{C}_h$ and $\mathcal{C}_e$ under which the optimal transfer function for a least-favorable pair of power spectral densities is most robust.

Theorem 2.1 (Vastola, Poor [225]). If the spectral uncertainty classes $\mathcal{C}_h$ and $\mathcal{C}_e$ are such that the following three conditions hold:

1. $\sup_{S_h \in \mathcal{C}_h} \frac{1}{2\pi} \int_{-\pi}^{\pi} S_h(\omega)\, d\omega < \infty$,
2. $\mathcal{C}_h$ and $\mathcal{C}_e$ are convex,
3. at least one of the following properties holds for some $\varepsilon > 0$:
   a. every $S_h \in \mathcal{C}_h$ satisfies $S_h(\omega) \ge \varepsilon > 0$,
   b. every $S_e \in \mathcal{C}_e$ satisfies $S_e(\omega) \ge \varepsilon > 0$,

then the power spectral densities $S_h^L \in \mathcal{C}_h$ and $S_e^L \in \mathcal{C}_e$ and their optimal transfer function $\mathcal{W}^L$ form a saddle-point solution to the minimax problem (2.76) if and only if $S_h^L$, $S_e^L$ are least favorable for causal estimation, i.e., they solve (2.77). In this case $\mathcal{W}^L$ is a most robust transfer function.
The maximization in (2.77) is easier to solve than the original minimax problem (2.76) if closed-form expressions for the minimum in (2.77) are available, e.g., (2.64) and (2.65). Among others (see [225]), Snyders [196] gives a general method for finding these expressions. With this theorem and closed-form expressions, often a closed-form robust estimator $\mathcal{W}^R$ can be found (see examples in [196]).

2.4.1.2 Observation of Finite Dimension

Next we treat the case of estimating a parameter vector $h \in \mathbb{C}^P$ based on an $N_y$-dimensional observation $y \in \mathbb{C}^{N_y}$. The joint probability distribution is not known exactly, but belongs to an uncertainty class $\mathcal{P}$ of possible probability distributions.

$$ \mathcal{P} = \left\{ p_{h,y}(h, y) \,\middle|\, \mu_z = 0_{P+N_y},\ C_z = \begin{bmatrix} C_h & C_{hy} \\ C_{yh} & C_y \end{bmatrix} \in \mathcal{C} \subseteq \mathbb{S}_{+,0}^{P+N_y} \right\} \tag{2.79} $$

is the set of all zero-mean distributions of $z = [h^T, y^T]^T \in \mathbb{C}^{P+N_y}$ having a covariance matrix $C_z$ from an uncertainty class $\mathcal{C}$ of positive semidefinite matrices.¹⁷ Although we present the case of zero-mean $z$ and non-singular $C_y$, the derivations in [199, 198, 197] include non-zero mean and singular $C_y$, and in [197] uncertainties in the mean are discussed. The estimator $\hat{h} = f(y)$ is chosen from the class of measurable functions $\mathcal{F} = \{ f : \mathbb{C}^{N_y} \to \mathbb{C}^P,\ \hat{h} = f(y) \}$.
The most robust estimator $f^R$ is the solution of the minimax problem

$$ G_{\min} = \min_{f \in \mathcal{F}}\ \max_{p_{h,y}(h,y) \in \mathcal{P}} \mathrm{E}_{h,y}\!\left[ \|h - \hat{h}\|_2^2 \right]. \tag{2.80} $$

The least-favorable probability distribution $p_{h,y}^L(h, y)$ is defined by

$$ G_{\min} = G\!\left( f^L, p_{h,y}^L(h, y) \right) = \max_{p_{h,y}(h,y) \in \mathcal{P}}\ \min_{f \in \mathcal{F}} \mathrm{E}_{h,y}\!\left[ \|h - \hat{h}\|_2^2 \right]. \tag{2.81} $$

As for the infinite-dimensional case, the optimal values of both problems are identical if and only if $f^L$ and $p_{h,y}^L(h, y)$ form a saddle point¹⁸, i.e.,

$$ \mathrm{E}_{p_{h,y}(h,y)}\!\left[ \|h - f^L(y)\|_2^2 \right] \le \mathrm{E}_{p_{h,y}^L(h,y)}\!\left[ \|h - f^L(y)\|_2^2 \right] \le \mathrm{E}_{p_{h,y}^L(h,y)}\!\left[ \|h - f(y)\|_2^2 \right]. \tag{2.82} $$

The equality of (2.80) and (2.81) (minimax equality) holds because of the following arguments:

1. The MSE in (2.80) is a linear function in the second order moments and convex in the parameters of $f^L$ if $f^L$ is a linear function. Due to the minimax theorem [193], problems (2.80) and (2.81) are equal.
2. For fixed $C_z = C_z^L$ ($\mathcal{C} = \{C_z^L\}$), the left inequality in (2.82) is an equality because the MSE only depends on the second order moments of $h$ and $y$ for linear $f^L$. For general $\mathcal{C}$, the left inequality is true if we choose $C_z^L$ according to (2.83).
3. For all linear estimators and all distributions $p_{h,y}(h,y)$, $p_{h,y}^L(h,y)$ and $f^L$ form a saddle point.
4. For arbitrary nonlinear estimators $f$ (measurable functions in $\mathcal{F}$), the linear estimate is optimum for a Gaussian distribution $p_{h,y}^L(h,y)$ and the right inequality in (2.82) holds. Therefore, the Gaussian distribution is least-favorable.

¹⁷ Without loss of generality, we implicitly restrict $h$ and $y$ to have uncorrelated and identically distributed real and imaginary parts.
¹⁸ $\mathrm{E}_{p_{h,y}(h,y)}[\cdot]$ is the expectation operator given by the probability distribution $p_{h,y}(h,y)$.

This yields the following theorem due to Soloviov.

Theorem 2.2 (Soloviov [199]). If $\mathcal{C}$ is a compact and convex set of positive semidefinite matrices, then

$$ G_{\min} = G\!\left( f^L, p_{h,y}^L(h, y) \right) = \max_{C_z \in \mathcal{C}} \operatorname{tr}\!\left[ K_{C_y}(C_z) \right] = \max_{C_z \in \mathcal{C}} \operatorname{tr}\!\left[ C_h - C_{hy} C_y^{-1} C_{yh} \right]. \tag{2.83} $$

The least-favorable solution $p_{h,y}^L(h, y)$ in the class $\mathcal{P}$ is the normal distribution

$$ z = [h^T, y^T]^T \sim \mathcal{N}_c\!\left( 0_{P+N_y}, C_z^L \right), \tag{2.84} $$

where $C_z^L$ is any solution of (2.83). The maximally robust solution of the original problem (2.80) is given by the linear estimator

$$ \hat{h} = f^R(y) = C_{hy}^L C_y^{L,-1}\, y \tag{2.85} $$

in case $C_y^L$ is non-singular.
The Schur complement in (2.83) is concave in $C_z$. For example, given an estimate $\hat{C}_z$ and the constraints $\|C_z - \hat{C}_z\|_F \le \varepsilon$ and $C_z \succeq 0$, problem (2.83) can be formulated as a semidefinite program [24, p. 168] introducing $P$ slack variables and applying Theorem A.2 to the diagonal elements of the Schur complement. Numerical optimization tools are available for this class of problems (e.g., [207]) that may be too complex for online adaptation in wireless communications at the moment. But if we can choose $\mathcal{C}$ such that it has a maximum element $C_z^L \in \mathcal{C}$, in the sense that $C_z^L \succeq C_z$, $\forall C_z \in \mathcal{C}$, the solution is very simple: if the set $\mathcal{C}$ has a maximum element $C_z^L \in \mathcal{C}$, then it follows from Theorem A.1 that $C_z^L$ is a solution to (2.83).
A few concluding remarks on minimax estimators and the solution given by the theorem are in order:
1. Minimax robust estimators guarantee a certain performance Gmin , but
may be very conservative if the set C is “large”. For example, this is the
case when the nominal or most likely parameter from C in the considered
application deviates significantly from the least-favorable parameter. But
a certain degree of conservatism is always the price for robustness and
can be reduced by choosing the set C as small as possible. On the other
hand, if the least-favorable parameter happens to be the most likely or the
nominal parameter, then there is no performance penalty for robustness.
2. The theorem gives the justification for using a linear MMSE estimator,
although the nominal joint pdf is not Gaussian: The Gauss pdf is the
worst case distribution.
3. The theorem also shows the importance of the constraint on the second
order moments: The efficiency of the linear estimator is determined by this

constraint and not the Gaussian assumption [223]. This is due to the fact
that any deviation from the Gaussian pdf with identical constraint set C
on the second order moments C z does not increase the cost in (2.82).
4. The difficulty lies in defining the constraint set $\mathcal{C}$. Obtaining an accurate characterization of this set based on a finite number of observations is critical for a robust performance, whereas the assumption of a Gaussian pdf is already the worst-case model.¹⁹

2.4.2 Minimax Mean Square Error Channel Estimation

The general results on minimax MSE estimation based on an observation of finite dimension are now applied to obtain robust MMSE channel estimators (Section 2.2). In channel estimation we have $C_y = S C_h S^H + (I_N \otimes C_{n,s})$ and $C_{hy} = C_h S^H$ (2.13). Thus, the covariance matrix $C_z$ of $z = [h^T, y^T]^T \in \mathbb{C}^{P+MN}$ (2.79) is determined by $C_h$ and $C_{n,s}$.
We consider two types of uncertainty sets for $C_h$ and $C_n$. The first assumes that only upper bounds on the 2-norm²⁰ of the covariance matrices are available:

$$ \mathcal{C}_h^{(1)} = \left\{ C_h \,\middle|\, \|C_h\|_2 \le \alpha_h^{(1)},\ C_h \succeq 0 \right\}, \tag{2.86} $$
$$ \mathcal{C}_n^{(1)} = \left\{ C_n \,\middle|\, C_n = I_N \otimes C_{n,s},\ \|C_{n,s}\|_2 \le \alpha_n^{(1)},\ C_{n,s} \succeq 0 \right\}. \tag{2.87} $$

The effort to parameterize these sets via the upper bounds $\alpha_h^{(1)}$ and $\alpha_n^{(1)}$ is very small.²¹ Because we employed the 2-norm, both sets have a maximum element²² given by $C_h^L = \alpha_h^{(1)} I_P$ and $C_n^L = \alpha_n^{(1)} I_M$, respectively. Defining $\mathcal{C}^{(1)} = \{ C_z \mid C_h \in \mathcal{C}_h^{(1)},\ C_n \in \mathcal{C}_n^{(1)} \}$ and due to the linear structure of $C_z$ in $C_h$ and $C_n$, the maximum element $C_z^L$ of $\mathcal{C}^{(1)}$ is determined by $C_h^L$ and $C_n^L$.
The minimax MSE channel estimator is obtained from

$$ \min_{W}\ \max_{C_z \in \mathcal{C}^{(1)}} \mathrm{E}_{h,y}\!\left[ \|h - Wy\|_2^2 \right]. \tag{2.88} $$

¹⁹ The example from [223] illustrates this: Consider a distribution of a Gauss mixed with a Cauchy distribution of infinite variance, where the latter occurs only with a small probability. To obtain a reliable estimate of the large variance would require many observations.
²⁰ The 2-norm $\|A\|_2$ of a matrix $A$ is defined as $\max_x \|Ax\|_2 / \|x\|_2$, which is its largest singular value.
²¹ They may be found exploiting $\operatorname{tr}[C_h]/P \le \|C_h\|_2 \le \|C_h\|_F \le \operatorname{tr}[C_h]$. Thus, an estimate of the variances of all channel parameters is sufficient and $\alpha_h^{(1)}$, in this example, can be chosen proportional to $\operatorname{tr}[C_h]$.
²² W.r.t. the Loewner partial order [96], i.e., $A \succeq B$ if and only if $A - B \succeq 0$.

According to Theorem 2.2, it is the optimum estimator for the least-favorable parameters. With Theorem A.1 the least-favorable parameters w.r.t. $\mathcal{C}^{(1)}$ are $C_z^L$ and the minimax robust estimator is given by (cf. Eq. 2.22)

$$ W_{\mathrm{MMSE}}^{(1)} = \left( S^H S + \frac{\alpha_n^{(1)}}{\alpha_h^{(1)}} I_P \right)^{-1} S^H. \tag{2.89} $$
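A minimal sketch (toy sizes, random training matrix) of the robust estimator (2.89), cross-checked against the generic LMMSE form (2.22) evaluated at the least-favorable covariance matrices:

```python
import numpy as np

rng = np.random.default_rng(4)
P, MN = 4, 8
S = rng.standard_normal((MN, P)) + 1j * rng.standard_normal((MN, P))
alpha_h, alpha_n = 2.0, 0.5           # bounds on ||C_h||_2 and ||C_n,s||_2

W_rob = np.linalg.inv(S.conj().T @ S + (alpha_n / alpha_h) * np.eye(P)) \
        @ S.conj().T                  # robust estimator (2.89)

# Same filter from the LMMSE form with the least-favorable parameters:
C_hL, C_nL = alpha_h * np.eye(P), alpha_n * np.eye(MN)
W_chk = C_hL @ S.conj().T @ np.linalg.inv(S @ C_hL @ S.conj().T + C_nL)
assert np.allclose(W_rob, W_chk)
```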

If estimates $\hat{C}_h$ and $\hat{C}_{n,s}$ of the covariance matrices and a model for the estimation errors are available, the uncertainty sets can be restricted to

$$ \mathcal{C}_h^{(2)} = \left\{ C_h \,\middle|\, C_h = \hat{C}_h + E,\ \|E\|_2 \le \alpha_h^{(2)},\ C_h \succeq 0 \right\} \tag{2.90} $$
$$ \mathcal{C}_n^{(2)} = \left\{ C_n \,\middle|\, C_n = I_N \otimes C_{n,s},\ C_{n,s} = \hat{C}_{n,s} + E_n,\ \|E_n\|_2 \le \alpha_n^{(2)},\ C_n \succeq 0 \right\}. \tag{2.91} $$

Because Theorem 2.2 requires compact and convex sets, the errors have to be modeled as being bounded.²³ Again we choose the 2-norm to obtain the uncertainty classes $\mathcal{C}_h^{(2)}$ and $\mathcal{C}_n^{(2)}$ with maximum elements $C_h^L = \hat{C}_h + \alpha_h^{(2)} I_P$ and $C_n^L = I_N \otimes C_{n,s}^L$ with $C_{n,s}^L = \hat{C}_{n,s} + \alpha_n^{(2)} I_M$. With (2.22) this yields the robust estimator

$$ W_{\mathrm{MMSE}}^{(2)} = \left[ \left( \hat{C}_h + \alpha_h^{(2)} I_P \right) S^H \left( I_N \otimes (\hat{C}_{n,s} + \alpha_n^{(2)} I_M)^{-1} \right) S + I_P \right]^{-1} \left( \hat{C}_h + \alpha_h^{(2)} I_P \right) S^H \left( I_N \otimes (\hat{C}_{n,s} + \alpha_n^{(2)} I_M)^{-1} \right). \tag{2.92} $$

Both robust estimators $W_{\mathrm{MMSE}}^{(1)}$ and $W_{\mathrm{MMSE}}^{(2)}$ converge to the ML channel estimator (2.31) as the 2-norm of $C_h$ and of the estimation error of the estimate $\hat{C}_h$, respectively, increase: $\alpha_h^{(1)} \to \infty$ and $\alpha_h^{(2)} \to \infty$. This provides a degree of freedom to ensure a performance of the MMSE approach for estimated second order moments which is always better or, in the worst case, equal to the ML channel estimator. The upper bounds in (2.90) and (2.91) can be chosen proportional to or as a monotonically increasing function of a rough estimate for the first order moment of $\|E\|_2^2 \le \|E\|_F^2$.²⁴

²³ For stochastic estimation errors, the true covariance matrices are only included in the set with a certain probability. This probability is another design parameter and gives an upper bound on the probability that the worst case MSE is exceeded (outage probability).
²⁴ A stochastic model for the estimation error of sample mean estimators for covariance matrices is derived in [238].

Fig. 2.11 Minimax MSE prediction of band-limited random sequences with a norm
constraint on their power spectral density.

2.4.3 Minimax Mean Square Error Channel Prediction

The MMSE predictor in Section 2.3.1 requires the knowledge of the autoco-
variance sequence of the channel parameters. In Section 2.3.2 we introduce the
set Bh,fmax (2.66) of power spectral densities corresponding to band-limited
autocovariance sequences with given maximum (Doppler-) frequency fmax
and variance ch [0]. We would like to obtain minimax MSE predictors for
similar classes C of autocovariance sequences, which require only knowledge
of fmax and a bound on the norm of their power spectral densities. This ro-
bust solution does not require knowledge of the autocovariance sequence and
is maximally robust, i.e., minimizes the worst-case MSE (Figure 2.11).
The case of an observation of infinite dimension Q → ∞ and of fi-
nite dimension Q are treated separately. The derivations are based on the
model (2.56) and (2.59), but in the sequel we omit the index p denoting the
pth channel parameter.

2.4.3.1 Minimax MSE Prediction for Q → ∞ and L1 -Norm

The models for $Q \to \infty$ are given by (2.70) and (2.71) for $d = 0$. With (2.72) and (2.76), the minimax robust predictor is given by

$$ \min_{\mathcal{W} \in \mathcal{H}_+^2}\ \max_{S_h(\omega) \in \mathcal{B}_{h,f_{\max}}} \mathcal{F}(\mathcal{W}, S_h, S_e) = \min_{\mathcal{W} \in \mathcal{H}_+^2}\ \max_{S_h(\omega) \in \mathcal{B}_{h,f_{\max}}} \mathrm{E}\!\left[ |\varepsilon_\infty[q]|^2 \right]. \tag{2.93} $$

For simplicity, we assume white noise $e[q]$ with power spectral density $S_e(\omega) = c_e$, i.e., known variance $c_e > 0$. Because $c_e > 0$, we have $S_e(\omega) > 0$. Moreover, the set $\mathcal{B}_{h,f_{\max}}$ (2.66) is convex and $c_h[0]$ is bounded. Therefore,

the conditions of Theorem 2.1 are satisfied.²⁵ Thus, the least-favorable power spectral density $S_h^L(\omega)$ and the corresponding optimum predictor $\mathcal{W}^L$ form a saddle point for (2.93). Applying the closed-form expression from [196] (see (2.64)) to (2.93), we obtain

$$ \max_{S_h(\omega) \in \mathcal{B}_{h,f_{\max}}}\ \min_{\mathcal{W} \in \mathcal{H}_+^2} \mathcal{F}(\mathcal{W}, S_h, S_e) = \max_{S_h(\omega) \in \mathcal{B}_{h,f_{\max}}} c_e \left[ \exp\!\left( \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\!\left( 1 + \frac{S_h(\omega)}{c_e} \right) d\omega \right) - 1 \right]. \tag{2.94} $$

This is equivalent to maximizing the spectral entropy [123]

$$ \max_{S_h(\omega) \in \mathcal{B}_{h,f_{\max}}} \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\!\left( 1 + \frac{S_h(\omega)}{c_e} \right) d\omega, \tag{2.95} $$

which is a variational problem. To solve it we follow the steps in [134], where the least-favorable power spectral density for the filtering problem $d = -1$ is derived. We compare this cost function with the spectral entropy of the rectangular power spectral density $S_R(\omega)$ (2.68) with cut-off frequency $f_{\max}$:

$$ \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\!\left( 1 + \frac{S_h(\omega)}{c_e} \right) d\omega - \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\!\left( 1 + \frac{S_R(\omega)}{c_e} \right) d\omega \tag{2.96} $$

$$ = \frac{1}{2\pi} \int_{-\omega_{\max}}^{\omega_{\max}} \ln\!\left( 1 + \frac{S_h(\omega) - \frac{\pi}{\omega_{\max}} c_h[0]}{c_e \left( 1 + \frac{\pi c_h[0]}{\omega_{\max} c_e} \right)} \right) d\omega \tag{2.97} $$

$$ \le \frac{1}{2\pi} \int_{-\omega_{\max}}^{\omega_{\max}} \frac{S_h(\omega) - \frac{\pi}{\omega_{\max}} c_h[0]}{c_e \left( 1 + \frac{\pi c_h[0]}{\omega_{\max} c_e} \right)}\, d\omega = 0, \tag{2.98} $$

where the inequality is due to $\ln(x+1) \le x$ for $x > -1$. Equality holds if and only if $S_h(\omega) = S_R(\omega)$. The optimum value of (2.93) and (2.94) is (cf. Eq. 2.69)

$$ \max_{S_h(\omega) \in \mathcal{B}_{h,f_{\max}}}\ \min_{\mathcal{W}} \mathcal{F}(\mathcal{W}, S_h, S_e) = c_e \left[ \left( 1 + \frac{\pi c_h[0]}{c_e\, \omega_{\max}} \right)^{\omega_{\max}/\pi} - 1 \right]. \tag{2.99} $$

This is the upper bound for the prediction error for all autocovariance sequences with power spectral density in $\mathcal{B}_{h,f_{\max}}$. This argumentation yields the following theorem.
²⁵ The class $\mathcal{B}_{h,f_{\max}}$ in (2.66) could also be extended to contain all power spectral densities with $L_1$ norm (variance $c_h[0]$) upper-bounded by $\beta$. The following argumentation shows that the least-favorable power spectral density w.r.t. this extended set has an $L_1$ norm equal to $c_h^L[0] = \beta$.

Theorem 2.3. The worst-case or least-favorable autocovariance sequence for problem (2.93) is the inverse Fourier transform of $S_h^L(\omega) = S_R(\omega)$

$$ c_h^L[\ell] = c_h[0]\, \mathrm{sinc}(2 f_{\max} \ell) = c_h[0]\, \frac{\sin(2\pi f_{\max} \ell)}{2\pi f_{\max} \ell}. \tag{2.100} $$

The corresponding random sequence is the least predictable with maximum entropy rate [40] in this class. $S_R(\omega)$ is the maximally flat power spectral density in $\mathcal{B}_{h,f_{\max}}$, since $\mathrm{E}[|\varepsilon_\infty[q]|^2]/c_h[0]$ is a measure for the flatness of the spectrum [182].
It is often mentioned in the literature that this choice of power spectral
density yields results close to optimum performance in various applications
(see [144, p. 658], [9]), but no proof is given for its robustness or optimality
for prediction: Franke, Vastola, Verdu, and Poor [123, 75, 76, 225, 226] either
discuss the general minimax MSE problem or other cases of minimax MSE
prediction (for ce = 0 and no band-limitation); in [134] it is shown that
the rectangular power spectral density is also least-favorable for the causal
filtering problem (2.76) with d = −1 in (2.71).

2.4.3.2 Minimax MSE Prediction for Finite Q and L1 -Norm

For prediction based on $Q$ past observations $\{\tilde{h}[q-\ell]\}_{\ell=1}^{Q}$, the MSE $\mathrm{E}[|\varepsilon_Q[q]|^2]$ depends only on $Q+1$ samples $\{c_h[\ell]\}_{\ell=0}^{Q}$ of the autocovariance sequence. We define the set $\mathcal{C}_{h,f_{\max}}^{1}$ of partial sequences $\{c_h[\ell]\}_{\ell=0}^{Q}$ for which a power spectral density $S_h(\omega)$ with support limited to $[-2\pi f_{\max}, 2\pi f_{\max}]$ and $L_1$-norm $c_h[0] = \|S_h(\omega)\|_1$ bounded by $\beta$ exists:

$$ \mathcal{C}_{h,f_{\max}}^{1} = \left\{ \{c_h[\ell]\}_{\ell=0}^{Q} \,\middle|\, \exists\, S_h(\omega) \ge 0 : c_h[\ell] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_h(\omega)\, e^{j\omega\ell}\, d\omega,\ c_h[0] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_h(\omega)\, d\omega \le \beta,\ S_h(\omega) = 0 \text{ for } 2\pi f_{\max} < |\omega| \le \pi \right\}. \tag{2.101} $$

Thus, the partial sequence can be extended to a sequence with a band-limited power spectral density (Appendix B.2). With Theorem 2.2, the maximally robust predictor is the solution of

$$ \min_{w}\ \max_{\{c_h[\ell]\}_{\ell=0}^{Q} \in \mathcal{C}_{h,f_{\max}}^{1}} \mathrm{E}\!\left[ |\varepsilon_Q[q]|^2 \right] = \max_{\{c_h[\ell]\}_{\ell=0}^{Q} \in \mathcal{C}_{h,f_{\max}}^{1}}\ \min_{w} \mathrm{E}\!\left[ |\varepsilon_Q[q]|^2 \right]. \tag{2.102} $$

The equivalent maxmin optimization problem can be written as (cf. (2.62))

$$ \max_{\{c_h[\ell]\}_{\ell=0}^{Q} \in \mathcal{C}_{h,f_{\max}}^{1}} \beta - c_{h_T h}^H C_{\tilde{h}_T}^{-1} c_{h_T h} \tag{2.103} $$

with $C_{\tilde{h}_T} = C_{h_T} + c_e I_Q$, $C_{h_T} \in \mathbb{T}_{+,0}^{Q}$ with first row $[c_h[0], \dots, c_h[Q-1]]$, and $c_{h_T h} = [c_h[1]^*, \dots, c_h[Q]^*]^T$.
The set $\mathcal{C}_{h,f_{\max}}^{1}$ can be characterized equivalently by Theorem B.3 from Arun and Potter [6], which yields positive semidefinite constraints on the Toeplitz matrices $C_T \in \mathbb{T}_{+,0}^{Q+1}$ with first row $[c_h[0], \dots, c_h[Q]]$ and $C_T' \in \mathbb{T}_{+,0}^{Q}$ with first row $[c_h'[0], \dots, c_h'[Q-1]]$, where $c_h'[\ell] = c_h[\ell-1] - 2\cos(2\pi f_{\max})\, c_h[\ell] + c_h[\ell+1]$ and $c_h[0] = c_h^L[0] = \beta$. The least-favorable variance is $c_h^L[0] = \beta$.
Therefore, problem (2.103) can be reformulated as

$$ \max_{\{c_h[\ell]\}_{\ell=1}^{Q}} \beta - c_{h_T h}^H C_{\tilde{h}_T}^{-1} c_{h_T h} \quad \text{s.t.} \quad C_T \succeq 0,\ C_T' \succeq 0. \tag{2.104} $$

It can be transformed into a semidefinite program²⁶ which has a unique solution and can be solved numerically by SeDuMi [207], for example.
If the partial sequence $c_h[0] = \beta$, $c_h[\ell] = 0$ for $1 \le \ell \le Q$, is an element of $\mathcal{C}_{h,f_{\max}}^{1}$, the maximally robust predictor is trivial, $w = 0_Q$, and the maximum MSE (2.103) equals $c_h^L[0] = \beta$. Whether this partial sequence has a band-limited positive semidefinite extension, i.e., belongs to $\mathcal{C}_{h,f_{\max}}^{1}$, depends on the values of $Q$ and $f_{\max}$. Its existence can be verified by Theorem B.3, which results in the necessary condition $f_{\max} \ge 1/3$ (in the case of $Q \ge 2$). For example, for $Q = 3$ it exists for all $f_{\max} \ge 0.375$ (Example B.1; see also the numerical check below). The minimum $f_{\max}$ for its existence increases as $Q$ increases. For $Q \to \infty$, the autocovariance sequence which is zero for $\ell \ne 0$ is not band-limited anymore and its power spectral density is not an element of $\mathcal{B}_{h,f_{\max}}$ for $f_{\max} < 0.5$ and $c_h[0] = \beta$.
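The existence check can be carried out numerically. The following sketch evaluates the two positive semidefinite conditions stated above (from Theorem B.3 as cited) for the partial sequence $[\beta, 0, \dots, 0]$ with $Q = 3$; the parameter values are assumptions for illustration:

```python
import numpy as np
from scipy.linalg import toeplitz

def psd_conditions_hold(ch, fmax, tol=1e-10):
    # ch = [ch[0], ..., ch[Q]] (real); build C_T and C_T' as in the text
    Q = len(ch) - 1
    C_T = toeplitz(ch)                               # (Q+1) x (Q+1)
    cpr = [ch[abs(l - 1)] - 2 * np.cos(2 * np.pi * fmax) * ch[l] + ch[l + 1]
           for l in range(Q)]
    C_Tp = toeplitz(cpr)                             # Q x Q
    return (np.linalg.eigvalsh(C_T).min() >= -tol and
            np.linalg.eigvalsh(C_Tp).min() >= -tol)

beta, Q = 1.0, 3
for fmax in (0.30, 0.375, 0.45):
    ok = psd_conditions_hold([beta] + [0.0] * Q, fmax)
    print(f"fmax = {fmax}: band-limited extension exists -> {ok}")
```

Running this prints False for $f_{\max} = 0.30$ and True for $f_{\max} = 0.375$ and $0.45$, consistent with the threshold quoted above.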

2.4.3.3 Minimax MSE Prediction for Finite Q and L∞ -Norm

Obviously, for finite $Q$, uncertainty set $\mathcal{C}_{h,f_{\max}}^{1}$, and $f_{\max} < 0.5$, the least-favorable autocovariance sequence differs from (2.100) and the predictor corresponding to a rectangular power spectral density is not maximally robust.²⁷

²⁶ Introducing the slack variable $t$ we have

$$ \min_{t,\, \{c_h[\ell]\}_{\ell=1}^{Q}} t \quad \text{s.t.} \quad c_{h_T h}^H C_{\tilde{h}_T}^{-1} c_{h_T h} \le t,\ C_T \succeq 0,\ C_T' \succeq 0, \tag{2.105} $$

which is equivalent to the semidefinite program

$$ \min_{t,\, \{c_h[\ell]\}_{\ell=1}^{Q}} t \quad \text{s.t.} \quad \begin{bmatrix} t & c_{h_T h}^H \\ c_{h_T h} & C_{\tilde{h}_T} \end{bmatrix} \succeq 0,\ C_T \succeq 0,\ C_T' \succeq 0 \tag{2.106} $$

by means of Theorem A.2.


²⁷ The robustness of a given predictor $w$, e.g., a good heuristic, can be evaluated by solving

$$ \max_{\{c_h[\ell]\}_{\ell=0}^{Q} \in \mathcal{C}_{h,f_{\max}}^{1}} \mathrm{E}\!\left[ |\varepsilon_Q[q]|^2 \right] \tag{2.107} $$

numerically.

It has often been stated in the literature that a rectangular power spec-
tral density is a “good” choice for parameterizing an MMSE predictor for
band-limited signals. But for which uncertainty class is this choice maximally
robust?
We change the definition of the uncertainty set and derive a maximally robust predictor which depends on the (least-favorable) autocovariance sequence $\{c_h^L[\ell]\}_{\ell=0}^{Q}$ related to a band-limited rectangular power spectral density.
In the proofs of [153, 27] for perfect predictability of band-limited random sequences, where both references consider the case of no noise $c_e = 0$ in the observation, the class of autocovariance sequences with band-limited and bounded power spectral density is considered. This motivates us to define the uncertainty set

$$ \mathcal{C}_{h,f_{\max}}^{\infty} = \left\{ \{c_h[\ell]\}_{\ell=0}^{Q} \,\middle|\, \exists\, S_h(\omega) \ge 0 : c_h[\ell] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_h(\omega)\, e^{j\omega\ell}\, d\omega,\ S_h(\omega) \le \gamma,\ S_h(\omega) = 0 \text{ for } \omega_{\max} < |\omega| \le \pi,\ \omega_{\max} = 2\pi f_{\max} \right\}, \tag{2.108} $$

which is based on a bounded infinity norm $L_\infty$: $\|S_h(\omega)\|_\infty = \max_\omega S_h(\omega)$ of the band-limited power spectral density $S_h(\omega)$. The upper bound on the variance of $h[q]$ is $c_h^L[0] = 2 f_{\max} \gamma$, which results from the upper bound $\gamma$ on the $L_\infty$ norm.
Instead of applying a minimax theorem from above, we solve the minimax problem

$$ \min_{w}\ \max_{\{c_h[\ell]\}_{\ell=0}^{Q} \in \mathcal{C}_{h,f_{\max}}^{\infty}} \mathrm{E}\!\left[ |\varepsilon_Q[q]|^2 \right] \tag{2.109} $$

directly, where $w = [w[1], w[2], \dots, w[Q]]^T \in \mathbb{C}^Q$.²⁸ With (2.56) and (2.59) (omitting the index $p$), the estimate of $h[q]$ reads

$$ \hat{h}[q] = \sum_{\ell=1}^{Q} w[\ell]\, \tilde{h}[q-\ell] = \sum_{\ell=1}^{Q} w[\ell]\, h[q-\ell] + \sum_{\ell=1}^{Q} w[\ell]\, e[q-\ell], \tag{2.110} $$

where we assume white noise $e[q]$ with variance $c_e$. Similar to (2.72), the MSE $\mathrm{E}[|\varepsilon_Q[q]|^2]$ of the estimation error $\varepsilon_Q[q] = \hat{h}[q] - h[q]$ can be written as

²⁸ Problem (2.109) can also be solved using the minimax Theorem 2.2 together with Theorem A.1.
50 2 Channel Estimation and Prediction

Z Q 2
ωmax
  1 X 2
E |εQ [q]|2 = Sh (ω) 1 − w[ℓ] exp(−jωℓ) dω + ce kwk2 .
2π −ωmax ℓ=1
(2.111)

Because Sh (ω) ≤ γ, we can bound the MSE by

Z Q 2
ωmax
  1 X 2
E |εQ [q]|2 ≤ γ 1− w[ℓ] exp(−jωℓ) dω + ce kwk2 (2.112)
2π −ωmax ℓ=1
Q X
X Q
= γ 2fmax + γ w[ℓ1 ]w[ℓ2 ]∗ 2fmax sinc(2fmax (ℓ1 − ℓ2 ))
ℓ1 =1 ℓ2 =1
Q
X Q
X
−γ w[ℓ] 2fmax sinc(2fmax ℓ) − γ w[ℓ]∗ 2fmax sinc(2fmax ℓ)
ℓ=1 ℓ=1
2
+ ce kwk2
 
= c̄εQ {w[ℓ]}Q
ℓ=1 , (2.113)

which is convex in w. The bound on the MSE is achieved with equality for
the rectangular power spectral density SR (ω) with variance cLh [0] = 2fmax γ.

Due to the convexity of Ch,f max
, Theorem 2.2 holds, and SR (ω) is the least-
favorable power spectral density. This is similar to Q → ∞ and Bh,fmax (2.93),
but now the variance of the uncertainty set (2.108) is not fixed.
The necessary and sufficient conditions for the minimum of the upper
bound c̄εQ [q] ({w[ℓ]}Q
ℓ=1 ) form a system of Q linear equations

 
Q Q
!
∂c̄εQ [q] {w[ℓ]}ℓ=1 X
= 2fmax γ w[ℓ1 ]sinc(2fmax (ℓ1 − ℓ)) − sinc(2fmax ℓ)
∂w[ℓ]∗
ℓ1 =1

+ ce w[ℓ] = 0, ℓ = 1, 2, . . . , Q. (2.114)

In matrix-vector notation it reads29



C LhT + ce I Q w = cLhT h , (2.115)

where cLhT h = cLh [0][sinc(2fmax ) , sinc(2fmax 2) , . . . , sinc(2fmax Q)]T and C LhT ∈


TQ L
+,0 is Toeplitz with first row ch [0] · [1, sinc(2fmax ) , . . . , sinc(2fmax (Q − 1))].
We collect these results in a theorem:
Theorem 2.4. The solution of (2.109) is identical to (2.61) with the least-
favorable autocovariance sequence

29
Applying the Levinson algorithm [170] it can be solved with a computational
complexity of order Q2 .
2.4 Minimax Mean Square Error Estimation 51

cLh [ℓ] = 2fmax γ sinc(2fmax ℓ) (2.116)

corresponding to the least-favorable rectangular power spectral density SR (ω).



Over the class Ch,f max
the achievable MSE is upper bounded by
  −1
min E |εQ [q]|2 = ch [0] − cH
hT h (C hT + ce I Q ) chT h
w
 
≤ min max E |εQ [q]|2
w {c [ℓ]}Q ∈C ∞
h ℓ=0 h,f max
 −1 
T
= 2fmax γ − cLhT h C LhT + ce I Q cLhT h

∀ {ch [ℓ]}Q ∞
ℓ=0 ∈ Ch,fmax . (2.117)

Obviously, it can also be shown in a similar fashion that the rectangu-


lar power spectral density is least-favorable for the finite dimensional filter-
ing and smoothing/interpolation problem and the same uncertainty class.
Knowing the literature, this result is not very surprising, but confirms the
empirical observations of many authors: Interpolation (extrapolation) with
the sinc-kernel as in (2.115) or the parameterization of an MMSE estimator
with (2.116) is widely considered a “good choice” [35, 9]. To our knowledge,
the optimality and minimax robustness of this choice has not been shown ex-
plicitly before. Many publications dealing with extrapolation or interpolation
of band-limited signals assume that a noiseless observation (ce = 0) of the
signal is available [34]. For the noiseless case, it is well known that perfect
prediction (interpolation) can be achieved with (2.115) for Q → ∞. But it is
also clear that the system of equations (2.115) becomes ill-conditioned (for
ce = 0) as Q grows [195, 194]. Thus, standard regularization methods can
and must be applied [154].30
Moreover, the minimax robustness of the rectangular power spectral den-

sity for Ch,f max
shows that the guaranteed MSE (2.117) for this robust pre-
dictor is very conservative if the nominal power spectral density deviates
significantly from the rectangular power spectral density. An example is the
power spectral density with band-pass shape in Figure 2.12, which is included

in the uncertainty set Ch,f max
with parameters γ = 50 for fmax = 0.1. If the

uncertainty class Ch,fmax should also contain these power spectral densities,
the upper bound γ must be large. Thus, the guaranteed MSE (2.117) will
be large in general. A smaller MSE can only be guaranteed by choosing the
1
more appropriate uncertainty class Ch,f max
.
It is straightforward to generalize our results for the uncertainty class

Ch,f max
to

30
For example, when predicting a random sequence, a Tikhonov regularization [212]
of the approximation problem in equation (1.2) of [153] would also lead to (2.113).
But in the cost function (2.113), the regularization parameter is determined by the
considered uncertainty class and the noise model.
52 2 Channel Estimation and Prediction

Fig. 2.12 Rectangular, band-pass, and Clarke’s power spectral density (psd) Sh (2πf )
[206, p. 40] with maximum frequency fmax = 0.1 (ωmax = 2πfmax ) and ch [0] = 1.

n Z π
Q 1
Chgen = {ch [ℓ]}ℓ=0 ∃Sh (ω) ≥ 0 : ch [ℓ] = Sh (ω) exp(jωℓ)dω,
2π −π
o
L(ω) ≤ Sh (ω) ≤ U(ω) , (2.118)

with arbitrary upper and lower bounds U(ω) ≥ 0 and L(ω) ≥ 0. The least-
favorable power spectral density is given by SLh (ω) = U(ω), since the MSE can
be bounded from above, similar to (2.113). Thus, in case of a nominal power
spectral density deviating significantly from the rectangular power spectral
density, the uncertainty class can be easily modified to obtain a less conserva-
tive predictor. For example, piece-wise constant power spectral densities can
be chosen whose additional parameters can be adapted measuring the signal
power in different predefined frequency bands.

2.4.3.4 Performance Evaluation

The performance of the minimax robust predictors (2.104) and (2.115) is


evaluated numerically for the power spectral densities shown in Figures 2.12
and 2.13 and for different maximum frequencies fmax . In the figures, “Mini-
max MSE L1 ” denotes the robust predictor based on (2.104) (L1 -norm) and
“Minimax MSE L∞ ” based on (2.115). They are compared with the MMSE
predictor assuming perfect knowledge of the autocovariance sequence and fi-
nite Q (“MMSE (perfect knowledge)”), for Q = 5. The MMSE for prediction
with perfect knowledge and Q → ∞ is denoted “MMSE (Q → ∞)”. As a
reference, we give the MSE if the outdated observation is used as an estimate
ĥ[q] = h̃[q−1] (“Outdating”). This simple “predictor” serves as an upper limit
for the MSE, which should not be exceeded by a more sophisticated approach.
The upper bounds for the MSEs of the minimax predictors given by (2.104)
2.4 Minimax Mean Square Error Estimation 53

Fig. 2.13 Rectangular power spectral density (psd) Sh (2πf ) with maximum frequen-
cies fmax ∈ {0.05, 0.1, 0.2} and ch [0] = 1.

and (2.117) (“Minimax MSE L1 (upper bound)” and “Minimax MSE L∞ (up-
per bound)”) are guaranteed for the corresponding minimax predictor if the
1 ∞
considered power spectral density belongs to Ch,f max
or Ch,f max
, respectively.
1
For Ch,fmax , we choose β = 1 which is the true variance ch [0] = 1. When
we use Clarke’s power spectral density, which is not bounded and does not

belong to Ch,f max
for finite γ, the choice of γ = 2/fmax yields only an approx-
imate characterization of the considered nominal power spectral density (see
Figure 2.12). At first we assume perfect knowledge of fmax .
Figure 2.14 illustrates that prediction based on Q = 5 past observations
{h̃[q − ℓ]}Q ℓ=1 is already close to the asymptotic MSE for Q → ∞.
31
The
result is given for Clarke’s power spectral density (fmax = 0.1, ch [0] = 1)
and ce = 10−3 , but Q = 5 is also a good choice w.r.t. performance and
numerical complexity for other parameters and power spectral densities. We
choose Q = 5 in the sequel.
Because Clarke’s power spectral density is rather close to a rectangular
power spectral density (Figure 2.12), except for f ≈ ±fmax , the MSEs of the
minimax predictors (γ = 2/fmax ) are close to the MMSE predictor with per-
fect knowledge (Figure 2.15). The graph for perfect knowledge is not shown,
because it cannot be distinguished from the minimax MSE performance based
1 ∞
on Ch,f max
. Furthermore, the minimax MSE based on Ch,f max
is almost iden-
1
tical to the upper bound for Ch,fmax . The upper bounds on the MSEs (2.104)
and (2.117) are rather tight for low fmax .
A different approach to the prediction or interpolation of band-limited
signals is based on discrete prolate spheroidal sequences (DPSS) [195, 48].
Prolate spheroidal sequences can be shown to be the basis which achieves
the most compact representation of band-limited sequences concentrated in

31
The MSE for Q → ∞ is given by (2.64) with Equations (27)-(30) in [134].
54 2 Channel Estimation and Prediction

Fig. 2.14 MSE E[|εQ [q]|2 ] for prediction of h[q] for different number Q of obser-
vations {h̃[q − ℓ]}Q
ℓ=1 . (Model: Clarke’s power spectral density (Figure 2.12) with
ch [0] = 1, ce = 10−3 , fmax = 0.1)

Fig. 2.15 MSE E[|εQ [q]|2 ] vs. maximum frequency fmax for prediction of h[q] based
on Q = 5 observations {h̃[q − ℓ]}Q ℓ=1 . (Model: Clarke’s power spectral density (Fig-
ure 2.12) with ch [0] = 1, ce = 10−3 , Q = 5)

the time domain with a fixed number of basis vectors, i.e., a reduced dimen-
sion. But the norm of the resulting linear filters becomes unbounded as the
observation length increases. This requires regularization methods [154] to
decrease the norm and amplification of the observation noise. A truncated
singular valued decomposition [48] or a reduction of the dimension [242] are
most common.
We use two approaches as references in Figure 2.16: “DPSS (fixed di-
mension)” chooses the approximate dimension ⌈2fmax Q⌉ + 1 of the signal
subspace of band-limited and approximately time-limited signal, which is the
time-bandwidth dimension [48, 195], and uses the algorithm in Section V of
2.4 Minimax Mean Square Error Estimation 55

Fig. 2.16 MSE E[|εQ [q]|2 ] vs. maximum frequency fmax for prediction of h[q] based
on Q = 5 observations {h̃[q − ℓ]}Q
ℓ=1 : Comparison of minimax prediction (2.115) with
prediction based on discrete prolate spheroidal sequences (DPSS) [48, Sec. V]. (Model:
Clarke’s power spectral density (Figure 2.12) with ch [0] = 1, ce = 10−3 , Q = 5)

Fig. 2.17 MSE E[|εQ [q]|2 ] vs. maximum frequency fmax for prediction of h[q] based
on Q = 5 observations {h̃[q − ℓ]}Q ℓ=1 : Sensitivity of minimax prediction (2.115) de-
fix
signed for a fixed maximum frequency fmax = 0.1 and γ = 20 to a different nominal
maximum frequency fmax . (Model: Rectangular power spectral density (Figure 2.13)
with ch [0] = 1, ce = 10−3 , Q = 5)

[48] without any additional regularization. This choice of dimension for the
subspace is also proposed in [242], which solves an interpolation problem.
The performance of the DPSS based predictor depends on the selected di-
mension of the signal subspace, which is given from a bias-variance trade-off.
For “DPSS (optimum dimension)”, the MSE is minimized over all all possible
dimensions in the set {1, 2, . . . , Q}, i.e., we assume the genie-aided optimum
choice of dimension w.r.t. the bias-variance trade-off. The minimax robust

predictor for Ch,f max
determines its regularization parameter from the noise
56 2 Channel Estimation and Prediction

Fig. 2.18 Relative degradation of the MSE E[|εQ [q]|2 ] for minimax predic-
tion (2.115) designed as for Figure 2.17 w.r.t. the MMSE predictor with perfect
knowledge of the power spectral density and the nominal maximum frequency fmax .
(Model: Rectangular power spectral density (Figure 2.13) with ch [0] = 1, ce = 10−3 ,
Q = 5)

and signal uncertainty class. Besides its simplicity, its MSE is smaller than
for the DPSS based approach (Figure 2.16).
Next we investigate the sensitivity of the minimax robust predictor (2.115)

for Ch,f max
to the knowledge of fmax . As shown in Figure 2.13, we fix fmax
fix
to fmax = 0.1 for designing the robust predictor, but fmax of the nomi-
nal rectangular power spectral density changes from 0.02 to 0.2. Choosing
γ = 20, the uncertainty set contains the nominal power spectral density for
fmax ∈ [0.025, 0.1] because ch [0] = 1 = 2fmax γ (2.116) has to be satisfied.
Thus, the upper bound (2.117) is the guaranteed performance only in the
interval [0.025, 0.1] and in Figure 2.17 we give the upper bound only for
fix
fmax ≤ 0.1. The shape of the graphs is also typical for other choices of fmax
and types of power spectral densities. For a deviation in MSE (of the mis-
matched minimax predictor relative to the MMSE with perfect knowledge of
the power spectral density) which is smaller than two, the nominal maximum
frequency is allowed to deviate from the design point of fmax by −40% or
+60% (Figure 2.18). The relative degradation is approximately symmetric
w.r.t. fmax = 0.1. Overestimating fmax is less critical because the worst-case
MSE is given by (2.117).
Finally, we choose the band-pass power spectral density from Figure 2.12
with support in the intervals [0.9fmax , fmax ] and [−fmax , −0.9fmax ] and vari-
ance ch [0] = 1. This power spectral density deviates significantly from the
rectangular power spectral density (Figure 2.12). For example, for fmax = 0.1

we have to set γ = 50 to include the power spectral density into Ch,f max
. The
MSEs of the minimax predictors as well as their upper bounds increase with
fmax , as for Clarke’s power spectral density (Figure 2.19). But the MSE of
the MMSE predictor with perfect knowledge of the power spectral density is
2.4 Minimax Mean Square Error Estimation 57

Fig. 2.19 MSE E[|εQ [q]|2 ] vs. maximum frequency fmax for prediction of h[q] based
on Q = 5 observations {h̃[q − ℓ]}Q ℓ=1 . (Model: Band-pass power spectral density
(Figure 2.12) with ch [0] = 1, ce = 10−3 , Q = 5, γ = 1/(fmax − 0.9fmax )/2)

almost independent of fmax . Both minimax predictors are very conservative.


For example, the mismatch of the band-pass to the worst-case power spectral

density of the uncertainty class Ch,f max
which is rectangular is significant.
The choice of a more detailed model for the power spectral density Sh (ω) or
direct estimation of the autocovariance sequence ch [ℓ], which is presented in
Chapter 3, reduces the degradation significantly.

2.4.3.5 Summary of Minimax MSE Prediction

For wireless communications, the L1 -norm is the more appropriate norm for
the channel parameters’ power spectral density Sh (ω) because the variance
ch [0] can be estimated directly. The related robust predictor from (2.104)
yields a smaller MSE than its counterpart for the L∞ -norm. Its major dis-
advantage is the computational complexity for solving (2.104). In general,
it is considerably larger (by one order in Q) than for the linear system of
equations in (2.115). But, since the results depend only on β and fmax , they
could be computed offline with sufficient accuracy. On the other hand, the
approach based on the L∞ -norm can be modified easily to decrease the size of
the corresponding uncertainty class such as in (2.118). This is very important
in practice to improve performance in scenarios deviating significantly from
the rectangular power spectral density.
Only for Q → ∞, the rectangular power spectral density is least-favorable
1
for Ch,f max
(Theorem 2.3). Whereas the corresponding partial autocovariance

sequence is always least-favorable for Ch,f max
(Theorem 2.4) with generally
larger worst-case MSE. Thus, this prominent “robust” choice of the MMSE
1 ∞
predictor is not maximally robust for Ch,f max
but for Ch,f max
.
58 2 Channel Estimation and Prediction

To conclude, we illustrate the robust solutions for the uncertainty sets


1 ∞
Ch,f max
and Ch,f max
by an example.
1
Example 2.3. Consider the case of Q = 1, i.e., ĥ[q] = wh̃[q−1]. For Ch,f max
, the
L
robust predictor reads w = ch [1]/(β + ce ) with the least-favorable correlation
coefficients cLh [0] = β and cLh [1] = max(β cos(2πfmax ), 0).32 The worst-case
MSE

max ch [0] − |ch [1]|2 (ch [0] + ce )−1 =


{ch [ℓ]}1ℓ=0 ∈Ch,f
1
max
(  
β
β 1− β+ce cos2 (2πfmax ) , fmax < 1/4
= (2.119)
β, fmax ≥ 1/4

is bounded by β, which is reached for fmax ≥ 1/4.



For Ch,f max
, we get w = 2fmax γsinc(2fmax ) /(2fmax γ + ce ) and the maxi-
mum guaranteed MSE

max ch [0] − |ch [1]|2 (ch [0] + ce )−1 =


{ch [ℓ]}1ℓ=0 ∈Ch,f

max
 
2fmax γ 2
= 2fmax γ 1 − sinc (2fmax ) ≤ 2fmax γ (2.120)
2fmax γ + ce

is bounded by 2fmax γ > ch [0].


The maximally robust predictors w differ for fmax < 0.5 and the maximum
∞ 1
MSE for Ch,f max
is larger than for Ch,f max
for 2fmax γ > β. ⊓

32
cL ′
h [1] results from C T  0 and maximizes the MMSE in (2.119).
Chapter 3
Estimation of Channel and Noise
Covariance Matrices

Introduction to Chapter

Estimation of channel and noise covariance matrices is equivalent to estima-


tion of the structured covariance matrix of the receive signal [103, 54, 41, 220].
We start with an overview of the theory behind this problem, previous ap-
proaches, and selected aspects.
Estimation of a general structured covariance matrix has been studied
intensively in the literature:
• The properties of the Maximum Likelihood (ML) problem for estimation
of a structured covariance matrix are presented by Burg et al.[29] and,
in less detail for a linear structure, by Anderson [4, 5]. Both references
also propose algorithmic solutions for the generally nonlinear likelihood
equation.1
• Uniqueness of a ML estimate for doubly symmetric structure is established
in [176]. Other structures are treated in [46, 23].
• Robust ML estimation in case of an uncertain probability distribution is
discussed by Williams et al.[118].
Covariance matrices with Toeplitz structure are an important class in many
applications. Several properties of the corresponding ML estimation problem
have been described:
• Conditions for existence of an ML estimate are presented by Fuhrmann
and Miller [81, 78, 77, 79]. The existence of local maxima of the likelihood
function is established by [45]. The EM algorithm is applied to this prob-
lem by [81, 149, 47, 150]. This algorithm is generalized to block-Toeplitz
matrices in [80, 12, 11]. Finally, estimation of a Toeplitz covariance matrix
together with the (non-zero) mean of the random sequence is addressed in
[151].

1
The likelihood equation is the gradient of the likelihood function set to zero.

59
60 3 Estimation of Channel and Noise Covariance Matrices

• Least-squares approximation of sample covariance matrices in the class of


Toeplitz matrices is another approach to include information about their
structure (see [105] and references). This is based on the extended invari-
ance principle (EXIP) introduced by Stoica and Söderström [204], which is
applied to estimation of Toeplitz matrices in [133]. The issue of indefinite
estimates in case of least-squares (LS) approximation is mentioned in [13]
and, briefly, in [133].
Estimation of channel and noise covariance matrices has attracted little
attention in wireless communications. In context of CDMA systems an es-
timator for the channel covariance matrix is presented in [33]. Also for a
CDMA system, estimators for the channel and the noise covariance matrix
are obtained solving a system of linear equations in [228] which generally
yields indefinite estimates. A first solution to the ML problem for estimating
the channel covariance matrix and noise variance is given in [60]. It is based
on the “expectation-conditional maximization either algorithm”. A different
approach based on the space-alternating expectation maximization (SAGE)
algorithm, which also covers the case of spatially correlated noise, is intro-
duced in [220]. The estimate of the noise variance in this approach is superior
to [60]. The details of this iterative ML solution and a comparison with [60]
are given in Section 3.3 and 3.6, respectively. A computationally less com-
plex estimator based on LS approximation of the sample covariance matrix
and the EXIP of [204] yields indefinite estimates of the channel covariance
matrix for a small number of observations [54]; this can be avoided includ-
ing a positive semidefinite constraint, which results in an efficient algorithm
[103, 54] (the least-squares criterion was proposed independently by [41] for
uncorrelated noise). A computationally very efficient and recursive implemen-
tation together with the MMSE channel estimator is possible if the positive
semidefinite constraint is not included in the optimization [141].
The previous references address estimation of channel correlations in the
space and delay dimensions. But except identification of a classical autore-
gressive model for the temporal channel correlations [67] or estimation of the
maximum Doppler frequency for a given class of autocovariance sequences
[211], no results are known to the author which estimate the temporal cor-
relations in wireless communications. We also address this problem in this
chapter.
Another interesting problem in estimation of covariance matrices is a sce-
nario where only some elements of the covariance matrix can be inferred
from the observations. The remaining unknown elements are to be completed.
Consider a Toeplitz covariance matrix with some unspecified diagonals: Ob-
viously, this is an interpolation problem in the class of autocovariance se-
quences. But it can also be understood as a (covariance) matrix completion
problem. In this field, similar problems have been addressed already: General
conditions for existence and uniqueness of a positive semidefinite completion
are provided in [131, 130]; algorithms for maximum entropy completion are
derived by Lev-Ari et al.[131].
3 Estimation of Channel and Noise Covariance Matrices 61

When estimating channel and noise correlations we assume stationarity of


the channel and noise process within the observation time. In a wireless chan-
nel the channel parameters are only approximately stationary. This can be
quantified defining a stationarity time [140]. The larger the stationarity time
of the channel the more observations of a periodically transmitted training se-
quence are available to estimate the covariance matrices. On the other hand,
the coherence time of the channel is a measure for the time span between
two approximately uncorrelated channel realizations. The ratio of stationar-
ity time and coherence time gives the approximate number of uncorrelated
channel realizations available for estimation of the covariance matrices. A pro-
found theory for non-stationary channels and the associated stationarity time
is developed by Matz [140] for single-input single-output systems. First steps
towards a generalization to multiple-input multiple-output (MIMO) channels
are presented in [91]. Moreover, only few channel measurement campaigns
address this topic [94, 227]. Based on an ad-hoc and application-oriented
definition of stationarity, Viering [227] estimates that approximately 100 un-
correlated channel realization are available in urban/suburban scenarios and
only 10 realization in an indoor scenario (within a stationarity time period
according to his definition). On the other hand, in an 8 × 8 MIMO system
with 64 channel parameters a channel covariance matrix with 642 = 4096
independent and real-valued parameters has to be estimated. Therefore, it is
necessary to derive computationally efficient estimators which perform well
for a small number of observations.

Outline of Chapter

In Section 3.1 we give a review of previous results on ML estimation of struc-


tured covariance matrices, which provide the foundation for our estimation
problem. The necessary assumptions and the system model for estimation
of the channel and noise covariance matrices are introduced in Section 3.2.
Based on the SAGE algorithm for ML estimation we propose estimators for
the channel correlations in space, delay, and time domain as well as the noise
correlations (Section 3.3). A new approach to the completion of a partially
known or observable band-limited autocovariance sequence is presented in
Section 3.4 with application to MMSE prediction. Efficient estimators based
on LS approximation and the EXIP, which we derive in Section 3.5, provide
almost optimum estimates for the channel correlation in space and delay do-
main as well as the spatial noise correlations. Finally, the performance of all
estimators is compared in Section 3.6.
62 3 Estimation of Channel and Noise Covariance Matrices

3.1 Maximum Likelihood Estimation of Structured


Covariance Matrices: A Review

3.1.1 General Problem Statement and Properties

Given B statistically independent observations x[q], q = 1, 2, . . . , B, of the


M -dimensional random vector x, we would like to estimate its covariance
matrix C x = E[xx H ] ∈ M ⊆ SM + . If the set of admissible covariance matrices
M is a proper subset of the set of all positive semidefinite matrices SM +,0
(M ⊂ SM+,0 ), we call it a structured covariance matrix.
Assuming x ∼ Nc (0M , C x ) the Maximum Likelihood (ML) method maxi-
mizes the likelihood for given observations {x[q]}B q=1

B
Y
Ĉ x = argmax px (x[q]; C x ) . (3.1)
C x ∈M q=1

It is often chosen due to its asymptotic properties:


• The estimate Ĉ x is asymptotically (for large B) unbiased and consistent.
• The estimate Ĉ x is asymptotically efficient, i.e., the variances for all matrix
elements achieve the Cramér-Rao lower bound [168, 124].
For the zero-mean complex Gaussian assumption, the log-likelihood func-
tion reads (A.1)
"B #
Y  
ln px (x[q]; C x ) = −M B ln[π] − B ln[det C x ] − B tr C̃ x C −1
x . (3.2)
q=1

PB
The sample covariance matrix C̃ x = B1 q=1 x[q]x[q]H is a sufficient statistic
for estimating C x .
The properties of optimization problem (3.1) for different choices of M and
assumptions on C̃ x , which can be found in the literature, are summarized in
the sequel:
1. If C̃ x ∈ SM M
+ , i.e., B ≥ M necessarily, and M ⊆ S+,0 is closed, then (3.1)
has positive definite solutions Ĉ x [29]. But if C̃ x is singular, the existence
of a positive definite solution is not guaranteed, although very often (3.1)
will yield a positive definite estimate (see Section 3.1.2).
In [81, 78, 149] conditions are given under which the likelihood func-
tion (3.2) is unbounded above, i.e., the solution of (3.1) is possibly singular.
A necessary condition is that C̃ x must be singular. Moreover, whether the
likelihood function for realizations {x[q]}B q=1 of the random vector x is
unbounded depends on the particular realizations observed, i.e., the prob-
ability of the likelihood to be unbounded may be non-zero.
3.1 Maximum Likelihood Estimation of Structured Covariance Matrices 63

2. If C x does not have structure, i.e., M = SM


+,0 , the solution is unique and
given by the sample covariance matrix [29]

Ĉ x = C̃ x . (3.3)

The likelihood equation in C x follows from the necessary condition that the
gradient of the log-likelihood function w.r.t. C x must be zero and reads

C̃ x = C x . (3.4)

If M = SM M
+,0 , we obtain (3.3). For C x ∈ M ⊂ S+,0 , it does not have a
2
solution in general. Very often (3.4) is only satisfied with probability zero,
but the probability increases with B (see Section 3.5). If C̃ x ∈ M and we
have a unique parameterization of M, then the unstructured ML estimate
Ĉ x = C̃ x is also the ML estimate for the restricted class of covariance
matrices C x ∈ M due to the invariance principle [124, 46].
3. In general, the necessary conditions, i.e., the likelihood equation, for a so-
lution of (3.1) based on the gradient form a nonlinear system of equations.
For example,
PV the necessary conditions for ci in case of a linear structure
C x = i=1 ci S i are [4, 5]
 
tr C −1 −1
x (C x − C̃ x )C x S i = 0, i = 1, 2, . . . , V. (3.5)

Iterative solutions were proposed based on the Newton-Raphson algorithm


[4] or inverse iterations [29].
4. According to the extended invariance principle (EXIP) for parameter es-
timation [204], an asymptotic (for large B) ML estimate of a structured
covariance matrix in M can be obtained from a weighted least-squares ap-
proximation of the sample covariance matrix C̃ x . For a complex Gaussian
distribution of x (3.2) and ML estimation, the asymptotically equivalent
problem reads [133]
h −1 −1
i
min kC̃ x − C x k2C̃ −1 = min tr (C̃ x − C x )C̃ x (C̃ x − C x )C̃ x .
C x ∈M x C x ∈M

(3.6)
−1
Here, the weighting matrices C̃ x correspond to the inverses
  of consistent
estimates for the error covariance matrix of c̃x = vec C̃ x which is C c̃x =
1 T 3
B (C x ⊗ C x ).
In general, the EXIP aims at a simple solution of the ML problem (3.1) by
first extending the set M (here to SM
+,0 ) such that a simple solution (here:

2
It is not the likelihood equation for this constrained set M.
3
Precisely, (3.6) results from (A.15) and the consistent estimate C̃ x of C x which
yield kc̃x − cx k2 −1 = BkC̃ x − C x k2 −1 for cx = vec[C x ].
C̃ c̃ C̃ x
x
64 3 Estimation of Channel and Noise Covariance Matrices

C̃ x ) is obtained. In the second step, a weighted least squares approximation


is performed; the weighting matrix describes the curvature of the original
(ML) cost function.
Let us consider a few more examples for classes of covariance matrices M for
which the solution of (3.1) has been studied.
Example 3.1. If the eigenvectors of C x are known, i.e., M = {C x |C x =
U x Λx U H M
x , Λx ∈ D+,0 }, and only the eigenvalues have to be estimated, the
ML solution is Ĉ x = U x Λ̂x U x with [Λ̂x ]i,i = [U H
x C̃ x U x ]i,i .
For circulant covariance matrices C x , which can be defined as

M = {C x |C x ∈ TM −1 M M
+ , C x ∈ T+ } = Tc+ , (3.7)

this estimate is equivalent to the periodogram estimate of the power spectral


density [46] because the eigenvectors of a circulant
√ matrix form the FFT
matrix [U x ]m,n = exp(−j2π(m − 1)(n − 1)/M )/ M , m, n ∈ {1, 2, . . . , M }.
Note that a (wide-sense) periodic random sequence with period M , i.e., its
first and second order moments are periodic [200, p. 408], is described by a
circulant covariance matrix C x . ⊓

Example 3.2. If C̃ x ∈ SM M −1 M
+ , M ⊆ S+,0 is closed, and {C x |C x ∈ M ∩ S+ } is
convex, then (3.1) has a unique positive definite solution [176].
An example for M with these properties is the set of positive semidefinite
doubly symmetric matrices, i.e., matrices symmetric w.r.t. their main and
anti-diagonal. A subset for this choice of M is the set of positive semidefinite
Toeplitz matrices. But for Toeplitz matrices the set of inverses of Toeplitz
matrices is not a convex set [176]. ⊓

Example 3.3. For the important class of Toeplitz covariance matrices M =


TM+,0 , no analytical solution to (3.1) is known. In [45] it is proved that (3.2) can
have local maxima for M = 3 and examples are given for M ∈ {3, 4, 5}. The
local maxima may be close to the global optimum. Thus, a good initialization
is required for iterative algorithms. We discuss the question of existence of a
positive definite estimate Ĉ x in the next section. ⊓

3.1.2 Toeplitz Structure: An Ill-Posed Problem and its


Regularization

For a better understanding of the issues associated with estimation of covari-


ance matrices with Toeplitz structure, we first summarize some important
properties of Toeplitz matrices.
• Consider a Hermitian Toeplitz matrix with first row [c[0], c[1], . . . , c[M −
1]]. The eigenvalues of a Hermitian Toeplitz matrix can be bounded from
3.1 Maximum Likelihood Estimation of Structured Covariance Matrices 65

below and above by the (essential) infimum and supremum, respectively,


of a function whose inverse discrete Fourier transform yields the matrix
M −1
elements {c[i]}i=0 . If a power spectral density which is zero only on an
interval of measure zero [85], i.e., only at a finite number of points, the
corresponding positive semidefinite Toeplitz matrix is regular. Conversely,
if a positive semidefinite Toeplitz matrix is singular, its elements can be
associated with a power spectral density which is zero on an interval of
measure greater than zero [85]. Random sequences with such a power spec-
tral density are referred to as “deterministic” random sequences (see also
Section 2.3.2). But the following result from Caratheodory is even stronger.

• According to the representation theorem of Caratheodory (e.g. [81, 86])


every positive semidefinite Toeplitz matrix has a unique decomposition
N
X
Cx = pi γ i γ H 2
i + σ IM , (3.8)
i=1

where pi > 0, N < M , σ 2 ≥ 0, and

γ i = [1, exp(jωi ), exp(j2ωi ), . . . , exp(j(M − 1)ωi )]T (3.9)

with frequencies ωi ∈ [−π, π]. For singular C x , we have σ 2 = 0.


For rank[C x ] = R < M , there exists a unique extension of {c[i]}M −1
i=0 on Z
composed of R complex exponentials [6, 163] which is a valid covariance
sequence. The associated power spectral density consists only of N discrete
frequency components {ωi }N i=1 .
4

From the both properties we conclude: Assuming that C x describes a stochas-


tic process which is not composed by fewer than M complex exponentials or
is a regular (non-deterministic) stochastic process, we would like to ensure a
positive definite estimate Ĉ x .
For estimating positive definite covariance matrix C x with Toeplitz struc-
ture, several theorems were presented by Fuhrmann et al.[81, 78, 79, 149, 77].
The relevant results are reviewed in the sequel.
Theorem 3.1 (Fuhrmann [77]). For statistically independent {x[q]}B q=1
with x ∼ N (0M , C x ) and C x ∈ TM + , the ML estimate (3.1) of C x yields
a positive definite solution if and only if B ≥ ⌈M/2⌉.
If M ≥ 3, we already need B ≥ 2 statistically independent observations.
But if C x contains the samples of the temporal autocovariance function of a
random process, the adequate choice to describe the observations is B = 1
because successive samples of the random process are correlated and sta-
tistically independent observations are often not available. The next theo-

4
This theorem has been, for example, in array processing and spectral estimation
[203, 105].
66 3 Estimation of Channel and Noise Covariance Matrices

rem treats this case for the example of estimating the very simple choice of
C x = cI M , which is a Toeplitz covariance matrix.
Theorem 3.2 (Fuhrmann and Barton [79]). For B = 1, x[1] ∈ CM , and
C x = cI M the ML estimate is given by (3.1) with M = TM 5
+,0 . The probability
P that the ML estimate is positive definite is upper bounded as

2−(M −2)
P≤ . (3.10)
(M − 1)!

As the length M of the data vector increases we get

lim P = 0. (3.11)
M →∞

The bound on the probability is one for M = 2 as required by the previous


theorem. For large data length M , it is very likely that we do not obtain
a positive definite estimate, i.e., even in the asymptotic case M → ∞ the
estimate of the covariance matrix does not converge to the true positive
definite covariance matrix.6
For a real-valued random sequence x[q] ∈ RM , more general results are
available:
Theorem 3.3 (Fuhrmann and Miller [81, 78]). For B = 1 and x[q] ∼
N (0M , C x ) with circulant C x ∈ TMc+ , the probability P that the ML esti-
mate (3.1) is positive definite for M = TM +,0 is bounded by

1
P≤ . (3.12)
2M −2
For C x with general Toeplitz structure, a similar result is given in [81], but
the bound decreases more slowly with M . Due to the closeness of Toeplitz to
circulant matrices [81, 85] it can be expected that the probability of having
a positive definite estimate for C x ∈ TM M
+,0 and M = T+,0 does not deviate
significantly from the bound given for circulant covariance matrices.
The ML problem (3.1) is typically ill-posed because the probability of
obtaining the required positive definite solution becomes arbitrarily small
for increasing M . Thus, a regularization of the problem becomes necessary.
Fuhrmann and Miller [81] propose to restrict the set M from Toeplitz matrices
TM+,0 to contain only M × M Toeplitz matrices with positive semidefinite
circulant extension of period M̄ > M . We denote this set as TM M
ext+,0 ⊂ T+,0 .
By this definition we mean that for all C x ∈ TM ext+,0 a positive semidefinite
circulant matrix in TM̄
c+,0 exists whose upper-left M × M block is C x . With
M
this constrained set Text+,0 the ML estimate (3.1) has the desired positive
definite property as stated by the next theorem.
5
Although cI M ⊂ TM M
+,0 , c ≥ 0, optimization is performed over M = T+,0 .
6
Note that the bound decreases exponentially with M .
3.1 Maximum Likelihood Estimation of Structured Covariance Matrices 67

Theorem 3.4 (Fuhrmann and Miller [81]). For B ≥ 1 and x[q] ∼


N (0M , C x ) with C x ∈ SM+ , the probability P that the ML estimate (3.1)
is positive definite for M = TM
ext+,0 is one.

For this reason, M = TM M


ext+,0 is a good choice in (3.1) if C x ∈ T+ .
For example, the minimal (Hermitian) circulant extension with period
M̄ = 2M − 1 = 5 of the Toeplitz matrix
 
c[0] c[1] c[2]
C x =  c[1]∗ c[0] c[1]  (3.13)
c[2]∗ c[1]∗ c[0]

is given by
 
c[0] c[1] c[2] c[2]∗ c[1]∗
 c[1]∗ c[0] c[1] c[2] c[2]∗ 
 ∗
 5
 c[2]
C x̄ =  c[1]∗ c[0] c[1] c[2]  ∈ Tc , (3.14)
 c[2] c[2]∗ c[1]∗ c[0] c[1] 
c[1] c[2] c[2]∗ c[1]∗ c[0]

which is indefinite in general. Choosing M̄ > 2M − 1 the circulant extension


is not unique, e.g., generated padding with zeros. The question arises, for
which M̄ a positive semidefinite circulant extension exists and how it can
be constructed. This will also give an answer to the question, whether we
introduce a bias when restricting M in this way.
The eigenvalue decomposition of the M̄ × M̄ circulant matrix C x̄ is

C x̄ = V DV H , (3.15)

where D = diag[d1 , d2 , . . . , dM̄ ] is a diagonal matrix with nonnegative ele-


T
ments and the columns of V = [V T T
M , V̄ M ] , with [V ]k,ℓ = exp(−j2π(k −
M ×M̄
1)(ℓ − 1)/M̄ ) and V M ∈ C , are the eigenvectors. Therefore, we can
represent all C x ∈ TM
ext+,0 as


X
H
C x = V M DV M = dℓ v M,ℓ v H
M,ℓ , (3.16)
ℓ=1

where v M,ℓ = [1, exp(−j2π(ℓ − 1)/M̄ ), . . . , exp(−j2π(M − 1)(ℓ − 1)/M̄ )]T is


the ℓth column of V M . This parameterization shows a close relation to (3.8)
from the theorem of Caratheodory: Here, we have ωℓ = −2π(ℓ − 1)/M̄ and
M̄ > M , i.e., a restriction to a fixed set of equidistant discrete frequencies.
The extension of x to dimension M̄ is denoted x̄ ∈ CM̄ , where the ele-
ments of x̄ form a period of a periodic random sequence with autocovariance
68 3 Estimation of Channel and Noise Covariance Matrices

M̄ −1
sequence c[ℓ]. {c[ℓ]}ℓ=0 is the extension of {c[ℓ]}M −1
ℓ=0 to one full period of
the periodic autocovariance sequence of the elements in x.7
Dembo et al.[47] and Newsam et al.[155] give sufficient conditions on C x
to be element of TM ext+,0 : They derive lower bounds on M̄ which depend on
the observation length M and the minimum eigenvalue of C x for positive
definite C x . The bound increases with decreasing smallest eigenvalue and for
a singular C x a periodic extension does not exist. Their constructive proofs
also provide algorithms which construct an extension. But the extension of
{c[ℓ]}M −1
ℓ=0 is not unique. And if the bounds are loose, the computational
complexity of the corresponding ML estimator increases. It has not been
addressed in the literature, how a positive semidefinite circulant extension
can be constructed or chosen that would be a good initialization for the
algorithms in Section 3.3. We address this issue at the end of Sections 3.3.3
and 3.3.5.

3.2 Signal Models for Estimation of Channel and Noise


Covariance Matrices

For the purpose of estimating channel and noise covariance matrices, we


define the underlying signal model and its probability distribution in this
section. It follows from the detailed derivations presented in Section 2.1.
Therefore, we only summarize the necessary definitions in the sequel.
The observation in block q is (cf. (2.4))

y [q] = Sh[q] + n[q] ∈ CM N , (3.17)


T
where S = S̄ ⊗ I M contains the training sequence in a wireless communi-
cation system.8 As in (2.2) and (2.3) we partition it into

n[q] = [n[q, 1]T , n[q, 2]T , . . . , n[q, N ]T ]T (3.18)

and equivalently for y [q].


The zero-mean additive noise n[q] is complex Gaussian distributed with
covariance matrix

C n = I N ⊗ C n,s , (3.19)

i.e., uncorrelated in time and correlated over the M receiver dimensions with
C n,s = E[n[q, n]n[q, n]H ]. As as special case, we also consider spatially and
temporally uncorrelated noise with C n = cn I M .

7
[c[0], c[1], . . . , c[M − 1]] is the first row of C x .
8
In general the rows of S̄ ∈ CK(L+1)×N form a (known) basis of the signal subspace
in CN (compare with (2.3)).
3.2 Signal Models 69

Summarizing the observations of B blocks, we obtain

yT [q] = S T hT [q] + nT [q] ∈ CBM N (3.20)

with definitions yT [q] = [y [q − ℓ1 ]T , y [q − ℓ2 ]T , . . . , y [q − ℓB ]T ]T , nT [q] =


[n[q−ℓ1 ]T , n[q−ℓ2 ]T , . . . , n[q−ℓB ]T ]T , hT [q] = [h[q−ℓ1 ]T , h[q−ℓ2 ]T , . . . , h[q−
ℓB ]T ]T ∈ CBP , and S T = I B ⊗ S as in (2.8) and (2.9). We assume that
the observations are sorted according to ℓi < ℓj for i < j, e.g., ℓi = i.
Furthermore, it follows that the noise covariance matrix is C nT = I BN ⊗C n,s .
The channel parameters hT [q] are assumed to be zero mean for simplicity.
The channel covariance matrix C hT is block Toeplitz, i.e, C hT ∈ TP,B +,0 , if
9
the observations h[q−ℓi ] are equidistant. More detailed models are discussed
in Section 2.1.
For estimation of the channel and noise covariance matrix, we focus on
two models for the observations and the corresponding channel covariance
matrix:
1. Model 1: I statistically independent observations of yT [q] are available.
As an example, we choose ℓi = i and q ∈ OT = {1, B + 1, . . . , (I − 1)B + 1}
in the sequel if not stated otherwise. Thus, we assume that adjacent ob-
servations of frames containing B blocks are statistically independent. If
a frame yT [q] containing B blocks is not well separated from the next
frame yT [q + 1], this assumption serves as an approximation to reduce
the complexity of estimation algorithms in the next section. The associ-
ated channel covariance matrix C hT to be estimated is block Toeplitz as
in (2.10).
2. Model 2: This is a special case of model 1. Although some aspects and
derivations will be redundant, we introduce it with its own notation due to
its importance for estimating the channel correlations in space and delay
domain. Estimation of the covariance matrices is based on B statistically
independent observations of y [q] (3.17) for q ∈ O = {1, 2, . . . , B} (I = 1).
This corresponds to the assumption of a block diagonal covariance matrix
C hT = I B ⊗C h with identical blocks due to stationarity of the channel. We
would like to estimate the covariance matrix C h which has no additional
structure besides being positive semidefinite.
For ML estimation, the observations {yT [q]}Iq=1 and {y [q]}B
q=1 are zero-
mean and complex Gaussian. The log-likelihood function for model 1 reads
(cf. (A.1))

9
ℓi − ℓi−1 is constant for all 2 ≤ i ≤ B.
70 3 Estimation of Channel and Noise Covariance Matrices

Fig. 3.1 System model for estimation of C h and C n,s with hidden data spaces h [q]
and n [q] (Section 3.3.1); transformation of the observation y [q] with T yields the
sufficient statistic ĥ [q] and n̄ [q].

 
Y
LyT ({y T [q]}q∈OT ; C hT , C n,s ) = ln pyT (y T [q]; C hT , C nT )
q∈OT
 
= −IBM N ln π − I ln det S T C hT S H T + C nT
h −1 i
− Itr C̃ yT S T C hT S H
T + C nT (3.21)

with C nT = I BN ⊗ C n,s and a sufficient statistic given by the sample covari-


ance matrix
1 X
C̃ yT = y T [q]y T [q]H . (3.22)
I
q∈OT

The covariance matrix C yT of the observation has the structure C yT =


S T C hT S H
T + I N ⊗ C n,s .
For model 2, we have
 
Y
Ly ({y[q]}q∈O ; C h , C n,s ) = ln py (y[q]; C h , C n )
q∈O
 
= −BM N ln π − B ln det SC h S H + C n
h −1 i
− Btr C̃ y SC h S H + C n (3.23)

with C n = I N ⊗ C n,s and sufficient statistic

1 X
C̃ y = y[q]y[q]H . (3.24)
B
q∈O

For derivation of the algorithms and to obtain further insights, we also


require alternative sufficient statistics t[q], q ∈ O, and tT [q], q ∈ OT , where
the former is defined as
3.2 Signal Models 71
 
ĥ[q]
t[q] = = T y [q], q ∈ O, (3.25)
n̄[q]

with the invertible transformation (Figure 3.1)


 †
S
T = . (3.26)
AH

This transformation is introduced to separate signal and noise subspace.


ĥ[q] = ĥML [q] is the ML estimate (2.31) of h[q] for the model in Section 2.2.2.
Defining A ∈ CM N ×M N̄ such that it spans the space orthogonal to the col-
umn space of S, i.e., AH S = 0M N̄ ×P , the signal n̄[q] = AH y [q] = AH n[q] ∈
CM N̄ only contains noise. The columns of A are a basis for the M N̄ di-
mensional noise subspace (N̄ = N − K(L + 1)) and are chosen such that
T T
AH A = I M N̄ . From the structure of S = S̄ ⊗ I M , it follows A = Ā ⊗ I M
H
and S̄ Ā = 0K(L+1)×N̄ .
The sufficient statistic for model 1 is defined equivalently by
 
ĥT [q]
tT [q] = = T T yT [q], q ∈ OT , (3.27)
n̄T [q]

where
   
S †T I B ⊗ S†
TT = = ∈ CBM N ×BM N (3.28)
AH T I B ⊗ AH

and its properties, e.g., AH


T S T = 0BM N̄ ×BP , follow from above.
The log-likelihood function of {tT [q]}q∈OT is
 
Y
LtT ({tT [q]}q∈OT ; C hT , C n,s ) = ln ptT (tT [q]; C hT , C n,s )
q∈OT
X n  o
= ln pĥT ĥT [q]; C hT , C n,s + ln pn̄T (n̄T [q]; C n,s )
q∈OT
h ∗ T
i
= −IBP ln π − I ln det C hT + I B ⊗ (S̄ S̄ )−1 ⊗ C n,s
  −1 
∗ T
− Itr C̃ ĥT C hT + I B ⊗ (S̄ S̄ )−1 ⊗ C n,s

− IBM N̄ ln π − I ln det[I B N̄ ⊗ C n,s ]


 
− Itr C̃ n̄T I B N̄ ⊗ C −1
n,s (3.29)

with sample covariance matrices


72 3 Estimation of Channel and Noise Covariance Matrices

1 X
C̃ ĥT = ĥT [q]ĥT [q]H (3.30)
I
q∈OT

and
1 X
C̃ n̄T = n̄T [q]n̄T [q]H . (3.31)
I
q∈OT

For the second model, we get


 
Y
Lt ({t[q]}q∈O ; C h , C n,s ) = ln pt (t[q]; C h , C n,s ) (3.32)
q∈O
X 
= ln pĥ ĥ[q]; C h , C n,s + ln pn̄ (n̄[q]; C n,s )
q∈O
h ∗ T
i
= −BP ln π − B ln det C h + (S̄ S̄ )−1 ⊗ C n,s
  −1 
∗ T
− Btr C̃ ĥ C h + (S̄ S̄ )−1 ⊗ C n,s

− BM N̄ ln π − B ln det[I N̄ ⊗ C n,s ]
 
− Btr C̃ n̄ I N̄ ⊗ C −1
n,s (3.33)

with
1 X
C̃ ĥ = ĥ[q]ĥ[q]H (3.34)
B
q∈O

and
1 X
C̃ n̄ = n̄[q]n̄[q]H . (3.35)
B
q∈O

3.3 Maximum Likelihood Estimation

The problem of estimating the channel and noise covariance matrix can be
formulated in the framework of estimation of a structured covariance matrix
reviewed in Section 3.1: The covariance matrix of the observations {y T [q]}Iq=1
is structured and uniquely parameterized by the channel covariance matrix
C hT and noise covariance matrix C n,s as C yT = S T C hT S H
T +I N ⊗C n,s (3.21)
(Model 1). Our goal is not necessarily a more accurate estimation of C yT , but
we are mainly interested in its parameters C hT and C n,s . The corresponding
ML problem is
3.3 Maximum Likelihood Estimation 73
ML ML
{Ĉ hT , Ĉ n,s } = argmax LyT ({y T [q]}q∈OT ; C hT , C n,s )
C hT ∈TP,B M
+,0 ,C n,s ∈S+,0

= argmax LtT ({tT [q]}q∈OT ; C hT , C n,s ) (3.36)


C hT ∈TP,B M
+,0 ,C n,s ∈S+,0

with the log-likelihood functions of the direct observations {y T [q]}q∈OT and


their transformation {tT [q]}q∈OT (3.27) given in (3.21) and (3.29).
For model 2, the ML problem is based on (3.23) and (3.33) and reads
ML ML
{Ĉ h , Ĉ n,s } = argmax Ly ({y[q]}q∈O ; C h , C n,s )
C h ∈SP M
+,0 ,C n,s ∈S+,0

= argmax Lt ({t[q]}q∈O ; C h , C n,s ). (3.37)


C h ∈SP M
+,0 ,C n,s ∈S+,0

From the discussion in Section 3.1.1 (the necessary conditions (3.5) in par-
ticular), it seems that an explicit solution of these optimization problems is
impossible. Alternatives are the numerical solution of the system of nonlin-
ear equations (3.5) [5] or a numerical and iterative optimization of the un-
constrained optimization problems [124]. Numerical optimization techniques
create a sequence of optimization variables corresponding to monotonically
increasing values of the log-likelihood function.

3.3.1 Application of the Space-Alternating Generalized


Expectation Maximization Algorithm

One important iterative method is the expectation maximization (EM) algo-


rithm [142]: It ensures convergence to a local maximum of the log-likelihood
function. Typically, its formulation is intuitive, but this comes at the price
of a relatively slow convergence [68]. It relies on the definition of a complete
data space, whose complete knowledge would significantly simplify the opti-
mization problem. The complete data space is not fully accessible. Only an
incomplete data is available, which is a non-invertible (deterministic) func-
tion of the complete data space. Thus, in the expectation step (E-step) of the
EM algorithm a conditional mean (CM) estimate of the complete data log-
likelihood function is performed followed by the maximization step (M-step)
maximizing the estimated log-likelihood function.
A generalization of the EM algorithm is the space-alternating general-
ized expectation maximization (SAGE) algorithm [68, 142], which aims at
an improved convergence rate. Its key idea is the definition of (multiple)
hidden-data spaces generalizing the concept of complete data. The observa-
tion may be a stochastic function of a hidden-data space. The iterative proce-
dure chooses a subset of the parameters to be estimated and the correspond-
ing hidden-data spaces for every iteration and performs an E- and M-Step
74 3 Estimation of Channel and Noise Covariance Matrices

Fig. 3.2 SAGE iterations selecting a hidden data space (HDS) in every iteration for
Model 2 (similarly for model 1).

for the chosen hidden-data. We are completely free in choosing a suitable se-
quence of hidden-data spaces. This can improve convergence rate. Generating
a monotonically increasing sequence of log-likelihood values and estimates, it
converges asymptotically to a local maximum of the log-likelihood function.
First we have to define suitable hidden data spaces for our estimation prob-
lems, which result in simple M-Steps. We start with the following Gedanken-
3.3 Maximum Likelihood Estimation 75

experiment for model 2:10 If h[q] and n[q] were observable for q ∈ O, the
corresponding ML problems would be
Y
max ph (h[q]; C h ) (3.38)
C h ∈SP
+,0 q∈O

and
B
YY
max pn[q,n] (n[q, n]; C n,s ) (3.39)
C n,s ∈SM
+,0 q∈O n=1

with n[q] = [n[q, 1]T , n[q, 2]T , . . . , n[q, N ]T ]T (3.18). Due to the complex
Gaussian pdf the solution to both problems is simple and given by the sample
covariance matrices of the corresponding (hypothetical) observations
1 X
C̃ h = h[q]h[q]H , (3.40)
B
q∈O

N
1 XX
C̃ n,s = n[q, n]n[q, n]H , (3.41)
BN n=1
q∈O

P PN
or c̃n = BN1 M q∈O n=1 kn[q, n]k22 for spatially uncorrelated noise.
We define the following two hidden-data spaces (HDS) for model 2 :
• HDS 1: The random vector h[q] is an admissible HDS w.r.t. C h be-
cause py ,h (y[q], h[q]; C h , C n,s ) = py |h[q] (y[q]|h[q]; C n,s ) ph (h[q]; C h ), i.e.,
the conditional distribution is independent of C h [68].
• HDS 2: The random vector n[q] is an admissible HDS w.r.t. C n,s (or cn )
because py ,n (y[q], n[q]; C h , C n,s ) = py |n[q] (y[q]|n[q]; C h ) pn (n[q]; C n,s ),
i.e., the conditional distribution is independent of C n,s [68].
For model 1, we will slightly extend the definition of HDS 1 in Section 3.3.3
to be able to incorporate the block-Toeplitz structure of C hT , but the general
algorithm introduced below is still applicable.
(0) (0)
We start with an initialization C h and C n,s . Every iteration of the SAGE
algorithm to solve the ML problem (3.37) (and (3.36) for model 1) consists
of three steps (Figure 3.2):
1. Choose a HDS for iteration i + 1. We propose three alternatives:
a. Start with HDS 1 and then alternate between HDS 1 and HDS 2. Alter-
nating with period Niter = 1 may result in an increased estimation error
(i)
for C n,s , since the algorithm cannot distinguish sufficiently between the
10
The ML problem for model 2 is simpler than for model 1 as C h does not have any
special structure.
76 3 Estimation of Channel and Noise Covariance Matrices

correlations in y [q] origining from the signal and noise, respectively, and
is trapped in a local maximum.
b. Choose HDS 1 in the first Niter iterations, then switch to HDS 2 for Niter
iterations. Now alternate with this period. This has the advantage that
(N )
the estimate C h iter of C h has already improved considerably before
(0)
we start improving C n,s . The resulting local maximum of the likelihood
function corresponds to a better estimation accuracy.
c. Choose HDS 1 and HDS 2 simultaneously, which is similar to the orig-
inal EM algorithm as we do not alternate between the HDSs. The con-
vergence is slower than for the first variation.
2. E-step: Perform the CM estimate of the log-likelihood for the selected HDS
(i) (i)
given C h and C n,s from the previous iteration.
3. M-step: Maximize the estimated log-likelihood w.r.t. C h for HDS 1 and
C n,s for HDS 2.
The algorithm is shown in Figure 3.2, where we choose a maximum number
of iterations imax to have a fixed complexity. In the following subsections we
derive the E- and M-step for both hidden data spaces explicitly: We start with
HDS 2 for the noise covariance matrix C n,s or variance cn , respectively. For
problem (3.36) and model 1, we give the iterative solution in Section 3.3.3. As
a special case, we treat model 1 with P = 1 in Section 3.3.5. In Section 3.3.4
we present both steps for estimating the channel correlations based on HDS 1,
model 2, and problem (3.37).

3.3.2 Estimation of the Noise Covariance Matrix

The HDS 2 is defined as nT [q] and n[q] for model 1 and model 2, respectively.
We derive the E- and M-step starting with the more general model 1.

Model 1

The log-likelihood function of the HDS 2 is


B X
X X N
LnT ({nT [q]}q∈OT ; C n,s ) = ln pn (n[q − b, n]; C n,s ) (3.42)
q∈OT b=1 n=1
 
= IBN −M ln π − ln det[C n,s ] − tr C̃ n,s C −1
n,s

with sample covariance matrix


3.3 Maximum Likelihood Estimation 77

B N
1 X XX
C̃ n,s = n[q − b, n]n[q − b, n]H (3.43)
IBN n=1
q∈OT b=1

based on the temporal samples n[q, n] of the noise process, which are defined
in (3.18).
The CM estimate of the log-likelihood function (3.42)
h i
(i)
EnT LnT ({nT [q]}q∈OT ; C n,s )|{y T [q]}q∈OT ; C hT , C (i)
n,s (3.44)

forms the E-step in a SAGE iteration based HDS 2. It results in the CM


estimate of the sample covariance matrix (3.43)
h i
(i)
En C̃ n,s |{y T [q]}q∈OT ; C hT , C (i)
n,s , (3.45)

which now substitutes C̃ n,s in (3.42). As n[q −b, n] = J b,n nT [q] with selection
matrix J b,n = eT
(b−1)N +n ⊗ I M ∈ {0, 1}
M ×BM N
and the definition of OT in
Section 3.2, we obtain
h i
(i) (i)
En n[q − b, n]n[q − b, n]H |y T [q]; C hT , C (i) T
n,s = J b,n RnT |y T [q] J b,n (3.46)

for every term in (3.43). With (A.2) and (A.3) the conditional correlation
matrix reads
h i
(i) (i) (i) (i) (i)
RnT |yT [q] = EnT nT [q]nT [q]H |y T [q]; C hT , C (i) H
n,s = n̂T [q]n̂T [q] + C nT |y T [q] ,
(3.47)

where
h i
(i) (i)
n̂T [q] = EnT nT [q]|y T [q]; C hT , C (i)
n,s

= C (i) (i),−1
nT yT C yT y T [q] = W (i)
nT y T [q] (3.48)

is the CM estimate of the noise nT [q] based on the covariance matrices from
the previous iteration i. With the definitions in Section 3.2 we have
(i)
C (i) H (i)
yT = S T C hT S T + I BN ⊗ C n,s

C (i) (i) (i)


nT yT = C nT = I BN ⊗ C n,s .

The conditional covariance matrix is (cf. (A.3))


(i)
C nT |yT [q] = C (i) (i) (i),H (i) (i) (i),−1 (i)
nT − W nT C nT yT = C nT − C nT C yT C nT . (3.49)

Maximization of the CM estimate of the log-likelihood for the HDS 2


in (3.44) leads to the improved estimate of C n,s in step i + 1, which is the
CM estimate of the sample covariance matrix (3.45)
78 3 Estimation of Channel and Noise Covariance Matrices

B N
1 X XX (i)
C (i+1)
n,s = J b,n RnT |yT [q] J T
b,n
IBN n=1
q∈OT b=1
(3.50)
B N  
1 XX (i)
= J b,n W (i) (i),H
nT C̃ yT W nT + C nT |yT [q] J T
b,n
BN n=1
b=1

(i)
with sample covariance matrix of the observations C̃ yT (3.22) and W nT
from (3.48).
Assuming spatially white noise the estimate for the noise variance is
1 h   i
(i)
c(i+1)
n = En tr C̃ n,s |{y T [q]}q∈OT ; C hT , c(i)
n
M
1 X h (i) i
= tr RnT |yT [q]
IBM N
q∈OT
1 h i
(i)
= tr W (i) (i),H
nT C̃ yT W nT + C nT |yT [q] (3.51)
BM N
1
P PB 2
because tr[C̃ n,s ] = IBN q∈OT b=1 kn[q − b]k2 .
The final estimates are given after iteration imax of the SAGE algorithm

Ĉ n,s = C (i
n,s
max )

ĉn = c(i
n
max )
. (3.52)

Model 2

Model 2 is a special case of model 1 for I = 1 and C hT = I B ⊗ C h . The


log-likelihood function of HDS 2 is
N
XX
Ln ({n[q]}q∈O ; C n,s ) = ln pn (n[q, n]; C n,s )
q∈O n=1
 
= −BN M ln π − BN ln det[C n,s ] − BN tr C̃ n,s C −1
n,s
(3.53)

with sample covariance matrix


N
1 XX
C̃ n,s = n[q, n]n[q, n]H . (3.54)
BN n=1q∈O

Maximization of the CM estimate from the E-step

$$ \mathrm{E}_{n}\!\left[ L_n(\{n[q]\}_{q\in\mathcal{O}}; C_{n,s}) \,\middle|\, \{y[q]\}_{q\in\mathcal{O}}; C_h^{(i)}, C_{n,s}^{(i)} \right] \qquad (3.55) $$

leads to the estimates in iteration $i+1$:

$$ C^{(i+1)}_{n,s} = \mathrm{E}_{n}\!\left[ \tilde{C}_{n,s} \,\middle|\, \{y[q]\}_{q\in\mathcal{O}}; C_h^{(i)}, C_{n,s}^{(i)} \right] = \frac{1}{BN}\sum_{q\in\mathcal{O}}\sum_{n=1}^{N} J_n\, R^{(i)}_{n|y[q]}\, J^{\mathrm{T}}_n, \qquad (3.56) $$

$$ c^{(i+1)}_{n} = \frac{1}{M}\,\mathrm{E}_{n}\!\left[ \mathrm{tr}\big[\tilde{C}_{n,s}\big] \,\middle|\, \{y[q]\}_{q\in\mathcal{O}}; C_h^{(i)}, c_n^{(i)} \right] = \frac{1}{BMN}\sum_{q\in\mathcal{O}} \mathrm{tr}\big[ R^{(i)}_{n|y[q]} \big] \qquad (3.57) $$

with $n[q,n] = J_n n[q]$ and $J_n = e^{\mathrm{T}}_n \otimes I_M \in \{0,1\}^{M\times MN}$. The conditional correlation matrix based on the previous iteration $i$ reads

$$ R^{(i)}_{n|y[q]} = \mathrm{E}_{n}\!\left[ n[q]\, n[q]^{\mathrm{H}} \,\middle|\, y[q]; C_h^{(i)}, C_{n,s}^{(i)} \right] = \hat{n}^{(i)}[q]\,\hat{n}^{(i)}[q]^{\mathrm{H}} + C^{(i)}_{n|y[q]} \qquad (3.58) $$

with

$$ \hat{n}^{(i)}[q] = \mathrm{E}_{n}\!\left[ n[q] \,\middle|\, y[q]; C_h^{(i)}, C_{n,s}^{(i)} \right] = C^{(i)}_{ny} C^{(i),-1}_{y}\, y[q] = W^{(i)}_{n}\, y[q], \qquad (3.59) $$

$$ C^{(i)}_{n|y[q]} = C^{(i)}_{n} - W^{(i)}_{n} C^{(i),\mathrm{H}}_{ny} = C^{(i)}_{n} - C^{(i)}_{n} C^{(i),-1}_{y} C^{(i)}_{n} \qquad (3.60) $$

and

$$ C^{(i)}_{n} = I_N \otimes C^{(i)}_{n,s}, \qquad C^{(i)}_{ny} = C^{(i)}_{n}, \qquad C^{(i)}_{y} = S C_h^{(i)} S^{\mathrm{H}} + C^{(i)}_{n}. $$

In summary, the improved estimates in iteration $i+1$ read

$$ C^{(i+1)}_{n,s} = \frac{1}{N}\sum_{n=1}^{N} J_n \left( W^{(i)}_{n} \tilde{C}_{y} W^{(i),\mathrm{H}}_{n} + C^{(i)}_{n|y[q]} \right) J^{\mathrm{T}}_n, \qquad (3.61) $$

$$ c^{(i+1)}_{n} = \frac{1}{MN}\,\mathrm{tr}\big[ W^{(i)}_{n} \tilde{C}_{y} W^{(i),\mathrm{H}}_{n} + C^{(i)}_{n|y[q]} \big] \qquad (3.62) $$

with sample covariance matrix $\tilde{C}_y$ (3.24).
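To see the bookkeeping in (3.59)–(3.62) concretely, the following NumPy sketch (our own helper; the function name and argument layout are assumptions, not from the text) performs one noise update for model 2:

```python
import numpy as np

def sage_noise_update(C_h, C_ns, S, C_y_tilde):
    """One SAGE noise update for model 2, cf. (3.59)-(3.62).

    C_h       : (P, P)     channel covariance estimate C_h^(i)
    C_ns      : (M, M)     spatial noise covariance estimate C_n,s^(i)
    S         : (M*N, P)   training matrix
    C_y_tilde : (M*N, M*N) sample covariance of the observations, (3.24)
    """
    MN = S.shape[0]
    M = C_ns.shape[0]
    N = MN // M

    C_n = np.kron(np.eye(N), C_ns)            # C_n^(i) = I_N (x) C_n,s^(i)
    C_y = S @ C_h @ S.conj().T + C_n          # C_y^(i)
    # W_n^(i) = C_n C_y^{-1}; both matrices are Hermitian, so
    # C_n C_y^{-1} = (C_y^{-1} C_n)^H
    W_n = np.linalg.solve(C_y, C_n).conj().T
    C_n_post = C_n - W_n @ C_n                # (3.60), since C_ny = C_n
    R = W_n @ C_y_tilde @ W_n.conj().T + C_n_post

    # (3.61): average the N diagonal M x M blocks J_n R J_n^T
    C_ns_new = sum(R[n*M:(n+1)*M, n*M:(n+1)*M] for n in range(N)) / N
    c_n_new = np.real(np.trace(R)) / MN       # (3.62), spatially white case
    return C_ns_new, c_n_new
```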

Initialization

We propose to initialize the SAGE algorithm with the ML estimate of the noise covariance matrix $C_{n,s}$ based on the observations in the noise subspace $\{\bar{n}_T[q]\}_{q\in\mathcal{O}_T}$ (3.27). With (3.29) this optimization problem reads

$$ C^{(0)}_{n,s} = \operatorname*{argmax}_{C_{n,s}\in\mathbb{S}^M_{+,0}} \sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B} \ln p_{\bar{n}}(\bar{n}[q-b]; C_{n,s}) = \operatorname*{argmax}_{C_{n,s}\in\mathbb{S}^M_{+,0}} IB\bar{N}\left( -\ln\det[C_{n,s}] - \mathrm{tr}\big[\tilde{C}_{\bar{n},s} C^{-1}_{n,s}\big] \right), \qquad (3.63) $$

where we used $\det[I_{B\bar{N}} \otimes C_{n,s}] = (\det C_{n,s})^{B\bar{N}}$ (A.13) in the last step. Because $C_{n,s} \in \mathbb{S}^M_{+,0}$, its maximization yields the sample covariance matrix of the noise in the noise subspace

$$ C^{(0)}_{n,s} = \tilde{C}_{\bar{n},s} = \frac{1}{IB\bar{N}}\sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B}\sum_{n=1}^{\bar{N}} \bar{n}[q-b,n]\,\bar{n}[q-b,n]^{\mathrm{H}} = \frac{1}{IB\bar{N}}\sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B} Y[q-b]\,\bar{A}^{\mathrm{H}}\bar{A}\,Y[q-b]^{\mathrm{H}} = \frac{1}{IB\bar{N}}\sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B} \big(Y[q-b] - \hat{H}[q-b]\bar{S}\big)\big(Y[q-b] - \hat{H}[q-b]\bar{S}\big)^{\mathrm{H}}. \qquad (3.64) $$

Here, we define $\bar{n}[q,n]$ by $\bar{n}[q] = [\bar{n}[q,1]^{\mathrm{T}}, \bar{n}[q,2]^{\mathrm{T}}, \ldots, \bar{n}[q,\bar{N}]^{\mathrm{T}}]^{\mathrm{T}}$ (similar to (3.18)) and the ML channel estimate $\hat{H}[q] = \hat{H}_{\mathrm{ML}}[q] = Y[q]\bar{S}^{\dagger}$ from (2.31) with $\hat{H}_{\mathrm{ML}} = \mathrm{unvec}[\hat{h}_{\mathrm{ML}}]$.¹¹ $\bar{P}^{\perp} = \bar{A}^{\mathrm{H}}\bar{A} = I_N - \bar{S}^{\dagger}\bar{S}$ is the projector on the noise subspace with $\bar{S}^{\dagger} = \bar{S}^{\mathrm{H}}(\bar{S}\bar{S}^{\mathrm{H}})^{-1}$.

This estimate is unbiased in contrast to the estimate of the noise covariance matrix in (2.32). Note that the difference is due to the scaling by $\bar{N}$ instead of $N$.

For spatially white noise, the ML estimation problem is (cf. (3.29))

$$ \max_{c_n\in\mathbb{R}_{+,0}} \sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B} \ln p_{\bar{n}}(\bar{n}[q-b]; C_{n,s} = c_n I_M). \qquad (3.65) $$

It is solved by the unbiased estimate of the variance

$$ c^{(0)}_{n} = \tilde{c}_n = \frac{1}{IB\bar{N}M}\sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B}\sum_{n=1}^{\bar{N}} \|\bar{n}[q-b,n]\|_2^2, \qquad (3.66) $$

which can be written as

$$ c^{(0)}_{n} = \tilde{c}_n = \frac{1}{IB\bar{N}M}\sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B} \big\|Y[q-b]\bar{A}^{\mathrm{H}}\big\|_F^2 = \frac{1}{IB\bar{N}M}\sum_{q\in\mathcal{O}_T}\sum_{b=1}^{B} \big\|Y[q-b] - \hat{H}[q-b]\bar{S}\big\|_F^2. \qquad (3.67) $$

¹¹ Additionally, the relations $\bar{n}[q] = A^{\mathrm{H}}y[q] = (\bar{A}^{*} \otimes I_M)y[q]$ and $\bar{n}[q] = \mathrm{vec}[\bar{N}[q]]$, $y[q] = \mathrm{vec}[Y[q]]$ together with (A.15) are required (Section 3.2).
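In code, the initialization (3.67) amounts to the energy of the training residuals. A minimal NumPy sketch (our own helper with assumed names, for the frequency flat case $L = 0$):

```python
import numpy as np

def init_noise_variance(Y_list, S_bar):
    """Initial noise-variance estimate c_n^(0) = c~_n, cf. (3.67).

    Y_list: list of received blocks Y[q-b], each of shape (M, N)
    S_bar : (K, N) training matrix (L = 0 assumed)
    """
    M, N = Y_list[0].shape
    N_bar = N - S_bar.shape[0]                    # noise-subspace dimension
    # ML channel estimate H^[q] = Y[q] S_bar^+ (2.31), residual Y - H^ S_bar
    S_pinv = S_bar.conj().T @ np.linalg.inv(S_bar @ S_bar.conj().T)
    res = [Y - (Y @ S_pinv) @ S_bar for Y in Y_list]
    return sum(np.linalg.norm(R, 'fro')**2 for R in res) / (len(Y_list) * N_bar * M)
```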

Interpretation

Consider the updates (3.51) and (3.62) for estimation of the noise variance $c_n$. How do they change the initialization from (3.67), which is based on the noise subspace only?

The estimate in iteration $i+1$ (3.51) can be written as

$$ c^{(i+1)}_{n} = \frac{1}{MN}\left( M\bar{N}\tilde{c}_n + P c^{(i)}_n \left( 1 - \frac{\beta c^{(i)}_n}{PB} \right) \right), \qquad (3.68) $$

where

$$ \beta = \mathrm{tr}\!\left[ \big(S_T^{\mathrm{H}} S_T\big)^{-1} \left( C^{(i),-1}_{\hat{h}_T} - C^{(i),-1}_{\hat{h}_T} \tilde{C}_{\hat{h}_T} C^{(i),-1}_{\hat{h}_T} \right) \right] > 0 \qquad (3.69) $$

is the trace of a generalized Schur complement [132]. We observe that the observations of the noise in the signal subspace are taken into account in order to improve the estimate of the noise variance (similarly for general $C_{n,s}$).¹²

The last inequality in $\beta < \mathrm{tr}\big[(S_T^{\mathrm{H}} S_T)^{-1} C^{(i),-1}_{\hat{h}_T}\big] \le PB/c^{(i)}_n$ follows from $C^{(i)}_{\hat{h}_T} = C^{(i)}_{h_T} + c^{(i)}_n (S_T^{\mathrm{H}} S_T)^{-1}$ omitting $C^{(i)}_{h_T}$. Together with $\beta > 0$ this relation yields a bound on the weight for $c^{(i)}_n$ in (3.68): $0 < 1 - \beta c^{(i)}_n/(PB) < 1$. Together with $MN = M\bar{N} + P$ (by definition) and interpreting (3.68) as a weighted mean, we conclude that

$$ c^{(i+1)}_{n} < \tilde{c}_n, \quad i > 0, \qquad (3.70) $$

with initialization $c^{(0)}_n = \tilde{c}_n$ from (3.67).

For a sufficiently large noise subspace, $\tilde{c}_n$ is already a very accurate estimate. Numerical evaluations show that the SAGE algorithm tends to introduce a bias for a sufficiently large size of the noise subspace and for this initialization. In this case the MSE of the estimate $c^{(i)}_n$ for the noise variance $c_n$ is increased compared to the initialization.

¹² The positive semidefinite Least-Squares approaches in Section 3.5.3 show a similar behavior, but they take into account only a few dimensions of the signal subspace.

3.3.3 Estimation of Channel Correlations in the Time, Delay, and Space Dimensions

The channel covariance matrix $C_{h_T}$ of all channel parameters corresponding to $B$ samples in time, $M$ samples in space, $K$ transmitters, and for a channel order $L$ has block Toeplitz structure $\mathbb{T}^{P,B}_{+,0}$ (Model 1 in Section 3.2). Fuhrmann et al. [80] proposed the EM algorithm for the case that the parameters are observed directly (without noise). This is an extension of their algorithm for ML estimation of a covariance matrix with Toeplitz structure [81, 149]. The idea for this algorithm stems from the observation that the ML problem for a positive definite covariance matrix with Toeplitz structure is very likely to be ill-posed, i.e., yields a singular estimate, as presented in Theorems 3.2 and 3.3. They also showed that a useful regularization is the restriction to covariance matrices with positive definite circulant extension of length $\bar{B}$, which results in a positive definite estimate with probability one (Section 3.1.2). A Toeplitz covariance matrix with circulant extension is the upper left block of a circulant covariance matrix, which describes the autocovariance function of a (wide-sense) periodic random sequence with period $\bar{B}$. Moreover, the ML estimate of a circulant covariance matrix can be obtained explicitly [46] (see Section 3.1.1). Based on these two results Fuhrmann et al. choose a full period of the assumed periodic random process as complete data space and solve the optimization problem iteratively based on the EM algorithm. In [80] this is extended to block Toeplitz structure with the full period of a multivariate periodic random sequence as complete data.

We follow their approach and choose a hidden data space for $C_{h_T}$ which extends $h_T \in \mathbb{C}^{BP}$ to a full period $\bar{B}$ of a $P$-variate periodic random sequence. This is equivalent to the assumption of a $C_{h_T}$ with positive definite block circulant extension $\mathbb{T}^{P,B}_{\mathrm{ext}+,0}$. Necessarily we have $\bar{B} \ge 2B-1$.¹³ Formally, the hidden data space is the random vector

$$ z[q] = \begin{bmatrix} h_T[q] \\ h_{\mathrm{ext}}[q] \end{bmatrix} \in \mathbb{C}^{\bar{B}P}, \qquad (3.71) $$

where $h_{\mathrm{ext}}[q] \in \mathbb{C}^{(\bar{B}-B)P}$ contains the remaining $\bar{B}-B$ samples of the $P$-variate random sequence. Its block circulant covariance matrix reads

$$ C_z = (V \otimes I_P)\, C_a\, (V^{\mathrm{H}} \otimes I_P), \qquad (3.72) $$

where the block diagonal matrix

$$ C_a = \begin{bmatrix} C_{a[1]} & & \\ & \ddots & \\ & & C_{a[\bar{B}]} \end{bmatrix} \qquad (3.73) $$

is the covariance matrix of $a[q] = [a[q,1]^{\mathrm{T}}, a[q,2]^{\mathrm{T}}, \ldots, a[q,\bar{B}]^{\mathrm{T}}]^{\mathrm{T}}$; $a[q,n] \in \mathbb{C}^P$ is given by $z[q] = (V \otimes I_P)a[q]$, and $C_{a[n]} = \mathrm{E}[a[q,n]a[q,n]^{\mathrm{H}}]$. For $V_B \in \mathbb{C}^{B\times\bar{B}}$, we define the partitioning of the unitary matrix $[V]_{k,\ell} = \frac{1}{\sqrt{\bar{B}}}\exp(\mathrm{j}2\pi(k-1)(\ell-1)/\bar{B})$, $k,\ell = 1,2,\ldots,\bar{B}$,

$$ V = \begin{bmatrix} V_B \\ \bar{V}_B \end{bmatrix}. \qquad (3.74) $$

The upper left $BP \times BP$ block of $C_z$ is the positive definite block Toeplitz matrix

$$ C_{h_T} = (V_B \otimes I_P)\, C_a\, (V_B^{\mathrm{H}} \otimes I_P) \in \mathbb{T}^{P,B}_{\mathrm{ext}+,0}, \qquad (3.75) $$

which we would like to estimate.

¹³ To ensure an asymptotically unbiased estimate, $\bar{B}$ has to be chosen such that the true $C_{h_T}$ has a positive definite circulant extension of size $\bar{B}$. In case $P=1$, sufficient lower bounds on $\bar{B}$ are given by [47, 155], which depend on the minimum eigenvalue of $C_{h_T}$. If the observation sizes $I$ and $B$ are small, unbiasedness is not important, as the error is determined by the small sample size. Thus, we can choose the minimum $\bar{B} = 2B-1$.


The log-likelihood of the hidden data space (3.71) is
 
Lz ({z[q]}q∈OT ; C z ) = −I B̄P ln π − I ln det[C z ] − Itr C̃ z C −1
z (3.76)

with
P (hypothetical) sample covariance matrix of the hidden data space C̃ z =
1 H
I q∈OT z[q]z[q] . It can be simplified to


X B̄
X h i
 
Lz ({z[q]}q∈OT ; C z ) = −I B̄P ln π − I ln det C a[n] − I tr C̃ a[n] C −1
a[n]
n=1 n=1

X
= La[n] ({a[q, n]}q∈OT ; C a[n] ) (3.77)
n=1

exploiting the block circulant structure of C z , which yields ln det[C z ] =


PB̄
ln det[C a ] = n=1 ln det[C a[n] ] and

B̄ h i
    X −1
tr C̃ z C −1
z = tr C̃ a C −1
a = tr C̃ a[n] C a[n] (3.78)
n=1

with C̃ a[n] = J ′n C̃ a J ′T
n and J ′n = [0P ×P (n−1) , I P , 0P ×P (B̄−n) ] ∈
P ×P B̄
{0, 1} . Note that the log-likelihood function (3.77) of {z[q]}q∈OT
is equivalent to the log-likelihood function of the observations
{a[q, n]}q∈OT , n = 1, 2, . . . , B̄, of B̄ statistically independent random
vectors a[q, n].
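The diagonalization exploited in (3.77)/(3.78) is easy to verify numerically. The following NumPy sketch (our own check, not from the text; all parameter values are arbitrary) builds $C_z$ from $\bar{B}$ random positive semidefinite blocks via (3.72) and confirms its block circulant structure:

```python
import numpy as np

P, B_bar = 2, 5
rng = np.random.default_rng(0)

# Random positive semidefinite blocks C_a[n], n = 1..B_bar, on the diagonal
C_a = np.zeros((B_bar * P, B_bar * P), complex)
for n in range(B_bar):
    X = rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P))
    C_a[n*P:(n+1)*P, n*P:(n+1)*P] = X @ X.conj().T

# Unitary DFT matrix [V]_{k,l} = exp(j 2 pi (k-1)(l-1)/B_bar) / sqrt(B_bar)
k = np.arange(B_bar)
V = np.exp(2j * np.pi * np.outer(k, k) / B_bar) / np.sqrt(B_bar)

# C_z = (V (x) I_P) C_a (V^H (x) I_P) is block circulant, cf. (3.72)
VI = np.kron(V, np.eye(P))
C_z = VI @ C_a @ VI.conj().T

# Check: block (i, j) depends only on (i - j) mod B_bar
blk = lambda i, j: C_z[i*P:(i+1)*P, j*P:(j+1)*P]
print(np.allclose(blk(1, 1), blk(0, 0)))   # True
print(np.allclose(blk(1, 2), blk(0, 1)))   # True
```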

E- and M-Step

The CM estimate of the likelihood function (3.76) based on estimates from the previous iteration $i$ is

$$ \mathrm{E}_z\!\left[ L_z(\{z[q]\}_{q\in\mathcal{O}_T}; C_z) \,\middle|\, \{t_T[q]\}_{q\in\mathcal{O}_T}; C_z^{(i)}, C_{n,s}^{(i)} \right] = -I\bar{B}P\ln\pi - I\ln\det[C_z] - I\,\mathrm{tr}\!\left[ \mathrm{E}_z\!\left[ \tilde{C}_z \,\middle|\, \{\hat{h}_T[q]\}_{q\in\mathcal{O}_T}; C_z^{(i)}, C_{n,s}^{(i)} \right] C_z^{-1} \right]. \qquad (3.79) $$

With (A.2) and (A.3) the CM estimate of the sample covariance matrix $\tilde{C}_z$ of the HDS can be written as

$$ \mathrm{E}_z\!\left[ \tilde{C}_z \,\middle|\, \{\hat{h}_T[q]\}_{q\in\mathcal{O}_T}; C_z^{(i)}, C_{n,s}^{(i)} \right] = W_z^{(i)} \tilde{C}_{\hat{h}_T} W_z^{(i),\mathrm{H}} + C^{(i)}_{z|\hat{h}_T[q]} \qquad (3.80) $$

with sample covariance matrix of the observations $\tilde{C}_{\hat{h}_T} = \frac{1}{I}\sum_{q\in\mathcal{O}_T} \hat{h}_T[q]\hat{h}_T[q]^{\mathrm{H}}$, conditional mean

$$ \hat{z}^{(i)}[q] = \mathrm{E}_z\!\left[ z[q] \,\middle|\, t_T[q]; C_z^{(i)}, C_{n,s}^{(i)} \right] = \mathrm{E}_z\!\left[ z[q] \,\middle|\, \hat{h}_T[q]; C_z^{(i)}, C_{n,s}^{(i)} \right] \qquad (3.81) $$

$$ = \underbrace{C_z^{(i)} \begin{bmatrix} I_{BP} \\ 0_{P(\bar{B}-B)\times BP} \end{bmatrix} C^{(i),-1}_{\hat{h}_T}}_{W_z^{(i)}}\ \hat{h}_T[q], \qquad (3.82) $$

$C^{(i)}_{\hat{h}_T} = C^{(i)}_{h_T} + I_B \otimes (\bar{S}^{*}\bar{S}^{\mathrm{T}})^{-1} \otimes C^{(i)}_{n,s}$, and conditional covariance matrix

$$ C^{(i)}_{z|\hat{h}_T[q]} = C_z^{(i)} - W_z^{(i)} \big[ I_{BP},\, 0_{BP\times P(\bar{B}-B)} \big]\, C_z^{(i)}. \qquad (3.83) $$

It follows from the definition of $t_T[q]$ (3.27) that $\hat{h}_T[q]$ and $\bar{n}_T[q]$ are statistically independent and the conditioning on $t_T[q]$ in (3.79) reduces to $\hat{h}_T[q]$ in (3.80) and (3.81).
We can rewrite (3.79) similarly to (3.77), which corresponds to the estimation of $\bar{B}$ unstructured positive definite covariance matrices $C_{a[n]}$, $n = 1,2,\ldots,\bar{B}$. Therefore, maximization of (3.79) results in the estimate of the blocks

$$ C^{(i+1)}_{a[n]} = J'_n (V^{\mathrm{H}} \otimes I_P) \left( W_z^{(i)} \tilde{C}_{\hat{h}_T} W_z^{(i),\mathrm{H}} + C^{(i)}_{z|\hat{h}_T[q]} \right) (V \otimes I_P)\, J'^{,\mathrm{T}}_n \qquad (3.84) $$

on the diagonal of $C^{(i+1)}_a = (V^{\mathrm{H}} \otimes I_P)\, C^{(i+1)}_z\, (V \otimes I_P)$ in iteration $i+1$. The upper left block of $C^{(i+1)}_z$ is the estimate of interest to us:

$$ C^{(i+1)}_{h_T} = (V_B \otimes I_P)\, C^{(i+1)}_a\, (V_B^{\mathrm{H}} \otimes I_P) = \big[ I_{BP},\, 0_{BP\times P(\bar{B}-B)} \big]\, C^{(i+1)}_z\, \big[ I_{BP},\, 0_{BP\times P(\bar{B}-B)} \big]^{\mathrm{T}} \in \mathbb{T}^{P,B}_{\mathrm{ext}+,0}. \qquad (3.85) $$

After $i_{\max}$ iterations the algorithm is stopped and our final estimate is $\hat{C}_{h_T} = C^{(i_{\max})}_{h_T}$.

Initialization

The SAGE algorithm requires an initialization $C_z^{(0)}$ of $C_z$. We start with an estimate of $C_{h_T}$ (for the choice of $\mathcal{O}_T$ in Section 3.2)

$$ C^{(0)}_{h_T} = \frac{1}{IB}\sum_{q=1}^{I} \hat{H}[q]\hat{H}[q]^{\mathrm{H}} \in \mathbb{T}^{P,B}_{+,0}, \qquad (3.86) $$

where

$$ \hat{H}[q] = \begin{bmatrix} \hat{h}[q-1] & \cdots & \hat{h}[q-B] & & \\ & \ddots & & \ddots & \\ & & \hat{h}[q-1] & \cdots & \hat{h}[q-B] \end{bmatrix} \in \mathbb{C}^{BP\times(2B-1)} \qquad (3.87) $$

is block Toeplitz. $C^{(0)}_{h_T}$ is also block Toeplitz. Its blocks are given by the estimates of $C_h[\ell]$,

$$ C^{(0)}_{h}[\ell] = \frac{1}{IB}\sum_{q=1}^{I}\sum_{i=1}^{B-\ell} \hat{h}[q-i]\,\hat{h}[q-i-\ell]^{\mathrm{H}}, \quad \ell = 0,1,\ldots,B-1, \qquad (3.88) $$

which is the multivariate extension of the standard biased estimator [203, p. 24]. It is biased due to the normalization with $IB$ instead of $I(B-\ell)$ in (3.88) and the noise in $\hat{h}[q]$. But from the construction in (3.86) we observe that $C^{(0)}_{h_T}$ is positive semidefinite. For $I = 1$ and $P > 1$, it is always singular because $BP > 2B-1$. Therefore, we choose $I > 1$ to obtain a numerically stable algorithm.¹⁴
But (3.86) defines only the upper left block of $C_z^{(0)}$. We may choose $C_z^{(0)}$ to be block circulant with first block-row

$$ \big[ C^{(0)}_h[0],\, C^{(0)}_h[1],\, \ldots,\, C^{(0)}_h[B-1],\, 0_{P\times P(\bar{B}-2B+1)},\, C^{(0)}_h[B-1]^{\mathrm{H}},\, \ldots,\, C^{(0)}_h[1]^{\mathrm{H}} \big], \qquad (3.89) $$

which is not necessarily positive semidefinite. This is only a problem if it results in an indefinite estimate $C^{(i+1)}_z$ (3.84) due to an indefinite $C^{(i)}_{z|\hat{h}_T[q]}$ in (3.83). For $\bar{B} = 2B-1$, the block circulant extension is unique. We now show that it is positive semidefinite, which yields a positive semidefinite $C^{(i+1)}_z$ (3.84).

¹⁴ Assuming a full column rank of $\hat{H}[q]$, a necessary condition for a positive definite $C^{(0)}_{h_T}$ is $I \ge P/(2-1/B)$ (at least $I \ge P/2$). But non-singularity of $C^{(0)}_{h_T}$ is not required by the algorithm.
Define the block circulant $P(2B-1) \times (2B-1)$ matrix

$$ \hat{\bar{H}}[q] = \begin{bmatrix} \hat{H}[q] \\ \hat{H}_{\mathrm{ext}}[q] \end{bmatrix} \qquad (3.90) $$

with $P(B-1) \times (2B-1)$ block circulant matrix

$$ \hat{H}_{\mathrm{ext}}[q] = \begin{bmatrix} \hat{h}[q-B] & 0_{P\times1} & \cdots & 0_{P\times1} & \hat{h}[q-1] & \cdots & \hat{h}[q-B+1] \\ \vdots & \ddots & & & & \ddots & \vdots \\ \hat{h}[q-2] & \cdots & \hat{h}[q-B] & 0_{P\times1} & \cdots & 0_{P\times1} & \hat{h}[q-1] \end{bmatrix}. \qquad (3.91) $$

The unique block circulant extension of $C^{(0)}_h[\ell]$ (3.86) is

$$ C^{(0)}_{z} = \frac{1}{IB}\sum_{q=1}^{I} \hat{\bar{H}}[q]\,\hat{\bar{H}}[q]^{\mathrm{H}} \in \mathbb{S}^{(2B-1)P}_{+,0}, \qquad (3.92) $$

because $\bar{B} = 2B-1$ and $\hat{H}[q]\hat{H}[q]^{\mathrm{H}}$ is the upper left block of $\hat{\bar{H}}[q]\hat{\bar{H}}[q]^{\mathrm{H}}$.¹⁵ Due to its definition in (3.92) it is positive semidefinite. This results in the following theorem.

Theorem 3.5. For $\bar{B} = 2B-1$, $C^{(0)}_{h_T}$ (3.86) has a unique positive semidefinite block-circulant extension $C^{(0)}_z$ given by (3.90) and (3.92).

Therefore, choosing $\bar{B} = 2B-1$ we have a reliable initialization of our iterative algorithm. Its disadvantage is that the minimum $\bar{B}$ is chosen, which may be too small to be able to represent the true covariance matrix as a block Toeplitz matrix with block circulant extension (see [47] for Toeplitz structure). But the advantage is a reduced complexity and a simple initialization procedure, which is not as complicated as the constructive proofs in [47, 155] suggest.

¹⁵ If $\hat{\bar{H}}[q]$, $q\in\mathcal{O}_T$, have full column rank (see [214]), then $I \ge P$ is necessary for non-singularity of $C^{(0)}_z$.
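The construction behind Theorem 3.5 is easy to reproduce numerically. The NumPy sketch below (our own illustration, restricted to $P = 1$ for brevity) builds the banded Toeplitz factor $\hat{H}[q]$ of (3.87) and its circulant completion $\hat{\bar{H}}[q]$ of (3.90), and checks that $C^{(0)}_{h_T}$ is the upper left block of the positive semidefinite $C^{(0)}_z$ in (3.92):

```python
import numpy as np

B, I = 4, 3
Bbar = 2 * B - 1
rng = np.random.default_rng(1)

C_z0 = np.zeros((Bbar, Bbar), complex)
C_hT0 = np.zeros((B, B), complex)
for _ in range(I):
    h = rng.standard_normal(B) + 1j * rng.standard_normal(B)  # h^[q-1..q-B]
    r = np.concatenate([h, np.zeros(B - 1)])     # first row, zero padded
    # Circulant matrix: row i is r cyclically shifted right by i, cf. (3.90)
    Hbar = np.stack([np.roll(r, i) for i in range(Bbar)])
    H = Hbar[:B]                                 # banded Toeplitz factor (3.87)
    C_z0 += Hbar @ Hbar.conj().T / (I * B)       # (3.92)
    C_hT0 += H @ H.conj().T / (I * B)            # (3.86)

# C_hT^(0) is the upper left B x B block of C_z^(0) (Theorem 3.5) ...
print(np.allclose(C_z0[:B, :B], C_hT0))          # True
# ... and C_z^(0) is positive semidefinite by construction
print(np.linalg.eigvalsh(C_z0).min() >= -1e-12)  # True
```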

3.3.4 Estimation of Channel Correlations in the Delay and Space Dimensions

Estimation of $C_h$ for model 2 in Section 3.2 is a special case of the derivations in Section 3.3.3: It is obtained for $I = 1$ and $C_{h_T} = I_B \otimes C_h$ (temporally uncorrelated multivariate random sequence). Similar results are given by Dogandžić [60] based on the "expectation-conditional maximization either algorithm" [142] for non-zero mean of $h[q]$ and $C_{n,s} = c_n I_M$, which uses a different estimation procedure for $c_n$ than ours given in Section 3.3.2. The performance is compared numerically in Section 3.6.

The following results were presented first in [220]. Here, we derive them as a special case of Section 3.3.3. The conditional mean estimate of the log-likelihood function for the HDS $h[q]$ is

$$ \mathrm{E}_h\!\left[ L_h(\{h[q]\}_{q\in\mathcal{O}}; C_h) \,\middle|\, \{\hat{h}[q]\}_{q\in\mathcal{O}}; C_h^{(i)}, C_{n,s}^{(i)} \right]. \qquad (3.93) $$

Its maximization w.r.t. $C_h \in \mathbb{S}^P_{+,0}$ in iteration $i+1$ yields the conditional mean estimate of the sample covariance matrix for the HDS, $\tilde{C}_h = \frac{1}{B}\sum_{q=1}^{B} h[q]h[q]^{\mathrm{H}}$:

$$ C_h^{(i+1)} = \mathrm{E}_h\!\left[ \tilde{C}_h \,\middle|\, \{\hat{h}[q]\}_{q\in\mathcal{O}}; C_h^{(i)}, C_{n,s}^{(i)} \right] = W_h^{(i)} \tilde{C}_{\hat{h}} W_h^{(i),\mathrm{H}} + C^{(i)}_{h|\hat{h}[q]}. \qquad (3.94) $$

Due to the definition of $t[q]$ (3.25) it depends only on the ML estimates $\hat{h}[q]$ of $h[q]$ via their sample covariance matrix $\tilde{C}_{\hat{h}} = \frac{1}{B}\sum_{q=1}^{B} \hat{h}[q]\hat{h}[q]^{\mathrm{H}}$. The CM estimator of $h[q]$ and its conditional covariance matrix are defined as (see (A.3))

$$ W_h^{(i)} = C_h^{(i)} \left( C_h^{(i)} + (\bar{S}^{*}\bar{S}^{\mathrm{T}})^{-1} \otimes C^{(i)}_{n,s} \right)^{-1}, \qquad C^{(i)}_{h|\hat{h}[q]} = C_h^{(i)} - W_h^{(i)} C_h^{(i)} \qquad (3.95) $$

with $C_n^{(i)} = I_N \otimes C^{(i)}_{n,s}$ based on (3.61) or $C_n^{(i)} = c_n^{(i)} I_{MN}$ from (3.62). Because $C^{(i)}_{h|\hat{h}[q]}$ is positive semidefinite, we ensure a positive semidefinite estimate $C_h^{(i+1)}$ in every iteration (assuming a positive semidefinite initialization).

Initialization

The iterations are initialized with the sample mean estimate of the covariance matrix $C_h$,

$$ C_h^{(0)} = \tilde{C}_{\hat{h}} = \frac{1}{B}\sum_{q=1}^{B} \hat{h}[q]\hat{h}[q]^{\mathrm{H}}, \qquad (3.96) $$

which is biased and positive semidefinite (non-singular, if $B \ge P$). In Section 3.5.1 it is derived based on least-squares approximation. Its bias stems from the noise in $\hat{h}[q]$. Thus, the only task of SAGE is to remove this bias from $C_h^{(0)}$ with positive definite estimates $C_h^{(i+1)}$.

If the SAGE algorithm is initialized with $C_{n,s}^{(0)} = 0_{M\times M}$ and the first iteration is based on HDS 1, then we obtain our initialization (3.96) in the first iteration.

An alternative initialization is the estimate from the positive semidefinite Least-Squares approach in Section 3.5.3.
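To make the iteration concrete, here is a minimal NumPy sketch (our own wrapper with assumed names) of (3.94)–(3.96) for model 2, assuming spatially white noise with known variance $c_n$ and orthogonal training $\bar{S}\bar{S}^{\mathrm{H}} = N I$, so that the ML estimation error covariance is $(c_n/N) I_P$:

```python
import numpy as np

def sage_channel_cov(h_hat, c_n, N, n_iter=10):
    """SAGE estimate of C_h via (3.94)-(3.96), model 2.

    h_hat : (B, P) array of ML channel estimates h^[q]
    c_n   : noise variance (assumed known here)
    N     : training length; S_bar S_bar^H = N I is assumed
    """
    B, P = h_hat.shape
    C_err = (c_n / N) * np.eye(P)                 # ML estimation error covariance
    C_h_tilde = (h_hat.T @ h_hat.conj()) / B      # sample covariance (3.96)
    C_h = C_h_tilde.copy()                        # initialization C_h^(0)
    for _ in range(n_iter):
        W = C_h @ np.linalg.inv(C_h + C_err)      # CM estimator W_h^(i), (3.95)
        C_post = C_h - W @ C_h                    # conditional covariance, (3.95)
        C_h = W @ C_h_tilde @ W.conj().T + C_post # M-step, (3.94)
    return C_h
```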

3.3.5 Estimation of Temporal Channel Correlations

An interesting special case of Section 3.3.3 is $P = 1$. The ML problem (3.21) reduces to $P$ independent ML problems of this type ($P = 1$), if the elements of $h[q]$ are statistically independent, i.e., we assume a diagonal covariance matrix $C_h$. This is also of interest for prediction of a multivariate random sequence with the sub-optimum predictor (2.55), which requires only knowledge of $B+1$ ($Q = B$) samples of the autocovariance sequence for every parameter.

Thus, $C_{h_T}$ has Toeplitz structure with first row $[c_h[0], c_h[1], \ldots, c_h[B-1]]$ in this case, and the results from Section 3.3.3 specialize to the EM algorithm which is given in [149] for known noise variance $c_n$. The HDS is a full period of a $\bar{B}$-periodic random sequence in $z[q] \in \mathbb{C}^{\bar{B}}$ with circulant covariance matrix $C_z$, which leads to a diagonal $C_a = \mathrm{diag}[c_{a[1]}, c_{a[2]}, \ldots, c_{a[\bar{B}]}]$ with $a[q] = V^{\mathrm{H}} z[q]$, $a[q] = [a[q,1], a[q,2], \ldots, a[q,\bar{B}]]^{\mathrm{T}}$, and $c_{a[n]} = \mathrm{E}[|a[q,n]|^2]$. From (3.85) and (3.84) the E- and M-step for $P = 1$ read

$$ c^{(i+1)}_{a[n]} = \left[ V^{\mathrm{H}} \left( W_z^{(i)} \tilde{C}_{\hat{h}_T} W_z^{(i),\mathrm{H}} + C^{(i)}_{z|\hat{h}_T[q]} \right) V \right]_{n,n} $$
$$ C_a^{(i+1)} = \mathrm{diag}\big[ c^{(i+1)}_{a[1]}, c^{(i+1)}_{a[2]}, \ldots, c^{(i+1)}_{a[\bar{B}]} \big] \in \mathbb{D}^{\bar{B}}_{+,0} \qquad (3.97) $$
$$ C_z^{(i+1)} = V C_a^{(i+1)} V^{\mathrm{H}} \in \mathbb{T}^{\bar{B}}_{c+,0}, \qquad C^{(i+1)}_{h_T} = [I_B,\, 0_{B\times(\bar{B}-B)}]\, C_z^{(i+1)}\, [I_B,\, 0_{B\times(\bar{B}-B)}]^{\mathrm{T}} \in \mathbb{T}^{B}_{\mathrm{ext}+,0} $$

with $W_z^{(i)}$ and $C^{(i)}_{z|\hat{h}_T[q]}$ as in (3.82) and (3.83).
T [q]

Initialization

As initialization for $C^{(0)}_{h_T}$ we choose (3.86) for $P = 1$ and $I = 1$, which is a positive definite and biased estimate [169, p. 153]. For $\bar{B} = 2B-1$, we obtain the unique positive definite circulant extension $C_z^{(0)}$ based on Theorem 3.5.¹⁶
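The $P = 1$ recursion (3.97) is compact enough to state in a few lines of NumPy. The sketch below is our own illustration; it assumes a known white estimation error variance $c_e$ on $\hat{h}_T[q]$ and, for brevity, a crude uniform initialization instead of (3.86)/Theorem 3.5:

```python
import numpy as np

def em_temporal_cov(C_ht_tilde, c_e, B_bar, n_iter=20):
    """EM/SAGE recursion (3.97) for Toeplitz C_hT with circulant extension.

    C_ht_tilde : (B, B) sample covariance of the noisy estimates h^_T[q]
    c_e        : variance of the (white) estimation error, assumed known
    B_bar      : period of the circulant extension, B_bar >= 2B - 1
    """
    B = C_ht_tilde.shape[0]
    k = np.arange(B_bar)
    V = np.exp(2j * np.pi * np.outer(k, k) / B_bar) / np.sqrt(B_bar)
    E = np.zeros((B_bar, B), complex)
    E[:B, :B] = np.eye(B)                              # selection [I_B; 0]

    # Crude positive definite start (the text uses (3.86)/Theorem 3.5 instead)
    C_z = (np.trace(C_ht_tilde).real / B) * np.eye(B_bar)
    for _ in range(n_iter):
        C_hhat = C_z[:B, :B] + c_e * np.eye(B)         # C_hT^(i) + c_e I_B
        W = C_z @ E @ np.linalg.inv(C_hhat)            # W_z^(i), cf. (3.82)
        C_post = C_z - W @ E.T @ C_z                   # cf. (3.83)
        R = W @ C_ht_tilde @ W.conj().T + C_post       # E-step, cf. (3.80)
        c_a = np.real(np.diag(V.conj().T @ R @ V))     # M-step: circulant eigenvalues
        # Clip guards only against tiny negative values from round-off
        C_z = (V * np.clip(c_a, 0.0, None)) @ V.conj().T   # C_z^(i+1) = V C_a V^H
    return C_z[:B, :B]                                 # C_hT^(i_max)
```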

3.3.6 Extensions and Computational Complexity

In Section 2.1 we discussed different restrictions on the structure of $C_{h_T}$. As one example, we now choose

$$ C_{h_T} = \sum_{k=1}^{K} C_{T,k} \otimes e_k e_k^{\mathrm{T}} \otimes C_{h_k} \subseteq \mathbb{S}^{BMK}_{+,0} \qquad (3.98) $$

for $L = 0$ (frequency flat channel) and $Q = B$ in (2.11).¹⁷ The covariance matrix is a nonlinear function in $C_{T,k} \in \mathbb{T}^B_{+,0}$ with Toeplitz structure and $C_{h_k} \in \mathbb{S}^M_{+,0}$. We would like to estimate $C_{h_T}$ with this constraint on its structure.

The advantage of this structure for $C_{h_T}$ is a reduction of the number of unknown (real-valued) parameters from $(2B-1)M^2K^2$ to $K(2B-1+M^2)$ and a reduction of computational complexity.

Two assumptions yield this structure (compare Section 2.1):
• $h_k[q]$ in $h[q] = [h_1[q]^{\mathrm{T}}, h_2[q]^{\mathrm{T}}, \ldots, h_K[q]^{\mathrm{T}}]^{\mathrm{T}}$ are assumed to be statistically independent for different $k$, i.e., $\mathrm{E}[h_k[q]h_{k'}[q]^{\mathrm{H}}] = C_{h_k}\delta[k-k']$ ($C_h$ is block-diagonal).
• The covariance matrix of $h_{T,k}[q] = [h_k[q-1]^{\mathrm{T}}, h_k[q-2]^{\mathrm{T}}, \ldots, h_k[q-B]^{\mathrm{T}}]^{\mathrm{T}} \in \mathbb{C}^{BM}$ has Kronecker structure, i.e., $C_{h_{T,k}} = C_{T,k} \otimes C_{h_k}$.

We can extend the definition of the HDS 1 and introduce $K$ different HDSs: $z_k[q] = [h_{T,k}[q]^{\mathrm{T}}, h_{\mathrm{ext},k}[q]^{\mathrm{T}}]^{\mathrm{T}} \in \mathbb{C}^{\bar{B}M}$, $k \in \{1,2,\ldots,K\}$, with $h_{\mathrm{ext},k}[q] \in \mathbb{C}^{M(\bar{B}-B)}$ are the HDSs w.r.t. $C_{T,k}$ and $C_{h_k}$ (compare (3.71)). Restricting $z_k$ to be a full period of a $\bar{B}$-periodic $M$-variate random sequence, we get the following stochastic model of $z_k$: The covariance matrix of $z_k[q]$ is $C_{z_k} = C_{c,k} \otimes C_{h_k}$, where $C_{c,k} \in \mathbb{T}^{\bar{B}}_{c+,0}$ is circulant and positive semidefinite with upper left block $C_{T,k}$. Thus, we restrict $C_{T,k}$ to have a positive semidefinite circulant extension in analogy to Sections 3.3.3 and 3.3.5.

To include a Kronecker structure in the M-step is difficult due to the nonlinearity. A straightforward solution would be based on alternating variables:

¹⁶ In [47] it is proposed to choose $\bar{B}$ as an integer multiple of $B$, which ensures existence of a positive semidefinite circulant extension.
¹⁷ Estimation with other constraints on the structure of $C_h$ — as presented in Section 2.1 — can be treated in a similar fashion.

Choice of HDS   Choice of Model        Complexity (full)   Complexity (incl. structure)
HDS 1           Model 1                B³N³M³              B³P³
"               Model 1 (Kronecker)    KM³N³ + KB³         KP³ + KB²
"               Model 2                M³N³                P³
"               Model 1 (P = 1)        B³                  B²
HDS 2           Model 1                B³N³M³              B³P³
"               Model 2                M³N³                P³

Table 3.1 Order of computational complexity of SAGE per iteration for an update of $C_{h_T}$ (HDS 1) and $C_{n,s}$ (HDS 2).

First, optimize the estimated log-likelihood of $z_k[q]$ (from the E-step) w.r.t. $C_{h_k}$ for all $K$ receivers keeping $C_{c,k}$ fixed; in a second step, maximize the estimated log-likelihood of $z_k[q]$ w.r.t. $C_{c,k}$ for fixed $C_{h_k}$ from the previous iteration. Simulations show that alternating between both covariance matrices does not improve the estimate sufficiently to justify the effort. Therefore, we propose to maximize the estimated log-likelihood of $z_k[q]$ w.r.t. $C_{h_k}$ assuming $C_{c,k} = I_{\bar{B}}$ and w.r.t. $C_{c,k}$ assuming $C_{h_k} = I_M$. Before both estimates are combined as in (3.98), $\hat{C}_{T,k}$ is scaled to have ones on its diagonal.¹⁸ The details of the iterative maximization with SAGE for different $k$ and HDS 1 or 2 follow the argumentation in Sections 3.3.4 and 3.3.5.

To judge the computational complexity of the SAGE algorithm for different assumptions on $C_{h_T}$, we give the order of complexity per iteration in Table 3.1: The order is determined by the system of linear equations to compute $W_z^{(i)}$ (3.82), $W_h^{(i)}$ (3.95), and $W_{n_T}^{(i)}$ (3.48). When the structure of these systems of equations is not exploited, the (full) complexity is given by the third column in Table 3.1. The structure can be exploited based on the Matrix Inversion Lemma (A.19) or working with the sufficient statistic $t_T[q]$. Moreover, for $P = 1$, we have a Toeplitz matrix $C^{(i)}_{\hat{h}_T}$, which results in a reduction from $B^3$ to $B^2$ if exploited. Application of both strategies results in the order of complexity given in the last column of Table 3.1.

We conclude that the complexity of estimating $C_{h_T}$ assuming only block Toeplitz structure is large, but a significant reduction is possible for the model in (3.98). In Section 3.6, we observe that iterations with HDS 2 do not lead to improved performance; thus, the computational complexity associated with HDS 2 can be avoided if we use the initialization (3.64).

¹⁸ The parameterization in (3.98) is unique up to a real-valued scaling.

3.4 Completion of Partial Band-Limited Autocovariance Sequences

In Section 3.3.5 we considered estimation of $B$ consecutive samples of the autocovariance sequence $c_h[\ell] = \mathrm{E}[h[q]h[q-\ell]^{*}]$ with $\ell \in \{0,1,\ldots,B-1\}$ for a univariate random sequence $h[q]$. For known second order moment of the noise, it is based on noisy observations (estimates) of $h[q]$,

$$ \tilde{h}[q] = h[q] + e[q]. \qquad (3.99) $$

All $B$ observations were collected in $\tilde{h}_T[q] = [\tilde{h}[q-\ell_1], \tilde{h}[q-\ell_2], \ldots, \tilde{h}[q-\ell_B]]^{\mathrm{T}}$ (denoted as $\hat{h}_T[q]$ in Section 3.3), where we assumed that $\ell_i = i$, i.e., equidistant and consecutive samples were available.

Let us consider the case $\ell_i = 2i-1$ as an example, i.e., equidistant samples with period $N_P = 2$ are available. Moreover, for this choice of $\ell_i$, we are given $I$ statistically independent observations $\tilde{h}_T[q]$ with $q \in \mathcal{O}_T = \{1, 2B+1, \ldots, (I-1)2B+1\}$. The corresponding likelihood function depends on

$$ [c_h[0], c_h[2], \ldots, c_h[2(B-1)]]^{\mathrm{T}} \in \mathbb{C}^B \qquad (3.100) $$

and the variance $c_e$ of $e[q]$. We are interested in $c_h[\ell]$ for $\ell \in \{0,1,\ldots,2B-1\}$, but from the maximum likelihood problem we can only infer $c_h[\ell]$ for $\ell \in \{0,2,\ldots,2(B-1)\}$. Therefore, the problem is ill-posed.

Assuming that $c_h[\ell]$ is a band-limited sequence with maximum frequency $f_{\max} \le 0.25$, the condition of the sampling theorem [170] is satisfied. In principle, it would be possible to reconstruct $c_h[\ell]$ for odd values of $\ell$ from its samples for even $\ell$, if the decimated sequence of infinite length was available. We would only have to ensure that the interpolated sequence remains positive semidefinite.

This problem is referred to as "missing data", "gapped data", or "missing observations" in the literature, for example, in spectral analysis or ARMA parameter estimation [203, 180]. Different approaches are possible:
• Interpolate the observations $\tilde{h}[q]$ and estimate the autocovariance sequence based on the completed sequence. This method is applied in spectral analysis [203, p. 259]. If we apply a ML approach to the interpolated observations, we have to model the resulting error sequence, which now also depends on the autocovariance sequence $c_h[\ell]$. This complicates the problem, if we do not ignore this effect.
• Apply the SAGE algorithm from Section 3.3 with the assumption that only even samples of the first $2B$ samples of $\tilde{h}[q-\ell]$ are given. Modeling the random sequence as periodic with period $\bar{B} \ge 4B-1$ and a HDS given by the full period (including the samples for odd $\ell$), we can proceed iteratively. We only require a good initialization of $c_h[\ell]$, which is difficult to obtain because more than half of the samples in one period $\bar{B}$ are missing.

• Define the positive semidefinite $2B \times 2B$ Toeplitz matrix $C_T \in \mathbb{T}^{2B}_{+,0}$ with first row $[\hat{c}_h[0], \hat{c}_h[1], \ldots, \hat{c}_h[2B-1]]$. Given estimates $\hat{c}_h[\ell]$, $\ell \in \{0,2,\ldots,2(B-1)\}$, e.g., from the SAGE algorithm in Section 3.3.5, this matrix is not known completely. We would like to find a completion of this matrix. The existence of a completion for every given estimate $\hat{c}_h[\ell]$, $\ell \in \{0,2,\ldots,2(B-1)\}$, has to be ensured, which we discuss below. Generally, the choice of the unknown elements to obtain a positive semidefinite Toeplitz matrix $C_T$ is not unique.
In [131] the completion of this matrix is discussed which maximizes the entropy $\ln\det[C_T]$ (see also the references in [130]). The entropy is only one possible cost function and, depending on the application and context, other cost functions may be selected to choose a completion from the convex set of possible completions. No optimization has been proposed so far which includes the constraint on the total covariance sequence, i.e., the extension of $c_h[\ell]$ for $\ell \in \{-(2B-1),\ldots,2B-1\}$ to $\ell \in \mathbb{Z}$, to be band-limited with maximum frequency $f_{\max} \le 0.25$.

Completion of Perfectly Known Partial Band-Limited Autocovariance Sequences

Given $c_h[\ell]$ for the periodic pattern $\ell = 2i$ with $i \in \{0,1,\ldots,B-1\}$, find a completion of this sequence for $\ell \in \{1,3,\ldots,2B-1\}$ which satisfies the following two conditions on $c_h[\ell]$, $\ell = 0,1,\ldots,2B-1$:
1. It has a positive semidefinite extension on $\mathbb{Z}$, i.e., its Fourier transform is real and positive.
2. An extension on $\mathbb{Z}$ exists which is band-limited with maximum frequency $f_{\max} \le 0.25$.¹⁹

This corresponds to interpolation of the given decimated sequence $c_h[\ell]$ at $\ell \in \{1,3,\ldots,2B-1\}$.

¹⁹ The Fourier transform of the extension is non-zero only on the interval $[-2\pi f_{\max}, 2\pi f_{\max}]$.
The required background to solve this problem is given in Appendix B. The characterization of the corresponding generalized band-limited trigonometric moment problem in Theorem B.4 gives the necessary and sufficient conditions required for the existence of a solution: The Toeplitz matrices $\bar{C}_T$ with first row $[c_h[0], c_h[2], \ldots, c_h[2(B-1)]]$ and $\bar{C}'_T$ with first row $[c'_h[0], c'_h[2], \ldots, c'_h[2B-4]]$ for $c'_h[2i] = c_h[2(i-1)] - 2\cos(2\pi f'_{\max})\,c_h[2i] + c_h[2(i+1)]$ and $f'_{\max} = 2f_{\max}$ are required to be positive semidefinite. Generally, neither of both matrices is singular, and the band-limited positive semidefinite extension is not unique.

First, we assume that both conditions are satisfied, as $c_h[2i]$ is taken from an autocovariance sequence band-limited to $f_{\max}$. We are only interested in the completion on the interval $\{0,1,\ldots,2B-1\}$. This corresponds to

a matrix completion problem for the Toeplitz covariance matrix $C_T$ with first row $[c_h[0], c_h[1], \ldots, c_h[2B-1]]$, given $c_h[\ell]$ for the periodic pattern above. Admissible completions are $\{c_h[\ell]\}_{\ell=0}^{2B-1}$ which have a band-limited positive semidefinite extension on $\mathbb{Z}$. This convex set of admissible extensions is characterized by Theorem B.3, where $C'_T$ is Toeplitz with first row $[c'_h[0], c'_h[1], \ldots, c'_h[2B-2]]$ and $c'_h[\ell] = c_h[\ell-1] - 2\cos(2\pi f_{\max})\,c_h[\ell] + c_h[\ell+1]$.

As an application we choose univariate MMSE prediction (Section 2.3.1) of the random variable $h[q]$ based on $\tilde{h}_T[q] = [\tilde{h}[q-\ell_1], \tilde{h}[q-\ell_2], \ldots, \tilde{h}[q-\ell_B]]^{\mathrm{T}}$ for $\ell_i = 2i-1$, $i \in \{1,2,\ldots,B\}$,

$$ \hat{h}[q] = w^{\mathrm{T}} \tilde{h}_T[q] \qquad (3.101) $$

with $w^{\mathrm{T}} = c^{\mathrm{H}}_{h_T h} C^{-1}_{\tilde{h}_T}$, where $c_{h_T h} = [c_h[1], c_h[3], \ldots, c_h[2B-1]]^{\mathrm{T}}$, $C_{\tilde{h}_T} = C_{h_T} + c_e I_B$, and $C_{h_T}$ is Toeplitz with first row $[c_h[0], c_h[2], \ldots, c_h[2(B-1)]]$. Therefore, $C_{\tilde{h}_T}$ is given, and $c_{h_T h}$ contains the unknown odd samples of $c_h[\ell]$.
The minimax prediction problem (2.109) for the new uncertainty set reads

$$ \min_{w}\ \max_{c_{h_T h}}\ \mathrm{E}\big[ |h[q] - w^{\mathrm{T}}\tilde{h}_T[q]|^2 \big] \quad \text{s.t.} \quad C_T \succeq 0,\ C'_T \succeq 0. \qquad (3.102) $$

Due to the minimax Theorem 2.2, it is equivalent to the max-min problem

$$ \max_{c_{h_T h}\in\mathbb{C}^B}\ c_h[0] - c^{\mathrm{H}}_{h_T h} C^{-1}_{\tilde{h}_T} c_{h_T h} \quad \text{s.t.} \quad C_T \succeq 0,\ C'_T \succeq 0 \qquad (3.103) $$

with the expression for the MMSE parameterized in the unknown $c_{h_T h}$. For this uncertainty set, the minimax MSE solution is equivalent to the minimum (weighted) norm completion of the covariance matrix $C_T$:

$$ \min_{c_{h_T h}}\ \| c_{h_T h} \|_{C^{-1}_{\tilde{h}_T}} \quad \text{s.t.} \quad C_T \succeq 0,\ C'_T \succeq 0. \qquad (3.104) $$

According to Example B.2 in Section B.2, the trivial solution $c_{h_T h} = 0_{B\times1}$ is avoided if $f_{\max} < 0.25$.

Problem (3.104) can easily be transformed to an equivalent formulation involving a linear cost function and constraints describing symmetric cones,

$$ \min_{c_{h_T h},\,t}\ t \quad \text{s.t.} \quad C_T \succeq 0,\ C'_T \succeq 0,\ \| c_{h_T h} \|_{C^{-1}_{\tilde{h}_T}} \le t, \qquad (3.105) $$

which can be solved numerically using interior point methods, e.g., SeDuMi [207]. The Toeplitz structure of the matrices involved in (3.105) can be exploited to reduce the numerical complexity by one order (compare [3]).
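As an illustration, the minimum-norm completion (3.104) can be prototyped with a modern conic front-end. The sketch below uses CVXPY instead of SeDuMi (our choice, not the book's), assumes a real-valued autocovariance, and treats the weight $C^{-1}_{\tilde{h}_T}$ as a constant built from the known even lags; all numerical values are arbitrary:

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import toeplitz
from scipy.special import j0

B, f_max, c_e = 5, 0.2, 0.01
# Known even lags c_h[0], c_h[2], ..., c_h[2(B-1)] (Bessel/Clarke model)
c_even = j0(2 * np.pi * f_max * 2 * np.arange(B))

c_odd = cp.Variable(B)          # unknown c_h[1], c_h[3], ..., c_h[2B-1]
# Interleave known/unknown lags into c[0], c[1], ..., c[2B-1]
c_full = [c_even[i // 2] if i % 2 == 0 else c_odd[i // 2] for i in range(2 * B)]

n = 2 * B
basis = [toeplitz(np.eye(n)[k]) for k in range(n)]    # ones on the +-k diagonals
C_T = sum(c_full[k] * basis[k] for k in range(n))     # symmetric Toeplitz C_T

a = 2 * np.cos(2 * np.pi * f_max)
# c'_h[l] = c_h[l-1] - 2 cos(2 pi f_max) c_h[l] + c_h[l+1], with c_h[-1] = c_h[1]
c_prime = [2 * c_full[1] - a * c_full[0]]
c_prime += [c_full[l - 1] - a * c_full[l] + c_full[l + 1] for l in range(1, n - 1)]
basis_p = [toeplitz(np.eye(n - 1)[k]) for k in range(n - 1)]
C_Tp = sum(c_prime[k] * basis_p[k] for k in range(n - 1))

# Weighted norm of (3.104); minimizing its square is equivalent
W = np.linalg.inv(toeplitz(c_even) + c_e * np.eye(B))
W = 0.5 * (W + W.T)
problem = cp.Problem(cp.Minimize(cp.quad_form(c_odd, W)),
                     [C_T >> 0, C_Tp >> 0])
problem.solve()
print(np.round(c_odd.value, 4))   # completed odd lags
```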
Fig. 3.3 Relative root mean square (RMS) error (3.106) of completed partial band-limited autocovariance sequence: Minimum norm completion with fixed $f_{\max} = 0.25$ and true $f_{\max}$ (adaptive).

Example 3.4. Consider $c_h[\ell] = J_0(2\pi f_{\max}\ell)$, which is perfectly known for $\ell \in \{0,2,\ldots,2B-2\}$ with $B = 5$. We perform linear prediction as defined in (3.101). The completion of the covariance sequence for $\ell \in \{1,3,\ldots,2B-1\}$ is chosen according to the minimax MSE (3.102) or, equivalently, the minimum norm criterion (3.104). We consider two versions of the algorithm: an upper bound for $f_{\max}$ according to the sampling theorem, i.e., $f_{\max} = 0.25$, or the minimum true $f_{\max}$ is given. The knowledge of $f_{\max}$ is considered in (3.105) via $C'_T$. The relative root mean square (RMS) error

$$ \sqrt{ \frac{ \sum_{i=1}^{B} \big| c_h[2i-1] - \hat{c}_h[2i-1] \big|^2 }{ \sum_{i=1}^{B} \big| c_h[2i-1] \big|^2 } } \qquad (3.106) $$

for the completion $\hat{c}_h[\ell]$ is shown in Figure 3.3. For $f_{\max} \le 0.1$, the relative RMS error is below $10^{-3}$. As the performance of both versions is very similar, the completion based on a fixed $f_{\max} = 0.25$ should be preferred: It does not require estimation of $f_{\max}$, which is often also based on the (decimated) autocovariance sequence $c_h[2i]$ [211]. Obviously, constraining the partial autocovariance sequence to have a band-limited extension is a rather accurate characterization. Furthermore, its interpolation accuracy shows that most of the information on $c_h[\ell]$ for $\ell \in \{0,1,\ldots,2B-1\}$ is concentrated in this interval.

The MSE for prediction of $h[q]$ based on the completion with true $f_{\max}$ (adaptive) and fixed $f_{\max} = 0.25$ in Figure 3.4 proves the effectiveness of our approach. The upper bound given by the worst case MSE (3.103) is only shown for adaptive $f_{\max}$; for fixed $f_{\max} = 0.25$ it is almost identical to the actual MSE given in Figure 3.4. Both upper bounds are rather tight w.r.t. the MSE for perfect knowledge of $\{c_h[\ell]\}_{\ell=0}^{2B-1}$. ⊓⊔

Fig. 3.4 MSE $\mathrm{E}[|\varepsilon_Q[q]|^2]$ for minimax MSE prediction with filter order $Q = B = 5$ based on the completed partial band-limited autocovariance sequence.

Completion of Estimated Partial Band-Limited Autocovariance Sequences

If the true $c_h[\ell]$ for $\ell \in \{0,2,\ldots,2(B-1)\}$ is not available, we propose the following two-stage procedure:
• Estimate $c_h[\ell]$ for $\ell \in \{0,2,\ldots,2(B-1)\}$ based on the SAGE algorithm introduced in Section 3.3.5.
• Solve the optimization problem (3.102) or (3.104) for $\hat{c}_h[\ell]$, $\ell \in \{0,2,\ldots,2(B-1)\}$, to construct an estimate of $c_h[\ell]$ for $\ell \in \{0,1,\ldots,2B-1\}$, which results in a positive semidefinite Toeplitz matrix with first row $\hat{c}_h[\ell]$, $\ell \in \{0,1,\ldots,2B-1\}$. If the completion is based on the side information $f_{\max} \le 0.25$, i.e., we choose $f_{\max} = 0.25$, a solution to (3.104) always exists (Theorem B.4).
But if we assume an $f_{\max} < 0.25$, a completion of $\hat{c}_h[\ell]$ for $\ell \in \{0,2,\ldots,2(B-1)\}$ such that it has a positive definite and band-limited extension on $\mathbb{Z}$ may not exist. In this case we first project $\hat{c} = [\hat{c}_h[0], \hat{c}_h[2], \ldots, \hat{c}_h[2(B-1)]]^{\mathrm{T}}$ on the space of positive semidefinite sequences band-limited to $f_{\max}$, solving the optimization problem

$$ \min_{c}\ \| c - \hat{c} \|_2 \quad \text{s.t.} \quad \bar{C}_T \succeq 0,\ \bar{C}'_T \succeq 0 \qquad (3.107) $$

with minimizer $\hat{c}_{\mathrm{LS}}$. The constraints result from Theorem B.4, where $\bar{C}_T$ is Toeplitz with first row $[\hat{c}_h[0], \hat{c}_h[2], \ldots, \hat{c}_h[2(B-1)]]$ and $\bar{C}'_T$ is Toeplitz with first row $[\hat{c}'_h[0], \hat{c}'_h[2], \ldots, \hat{c}'_h[2B-4]]$ for $\hat{c}'_h[2i] = \hat{c}_h[2(i-1)] - 2\cos(2\pi f'_{\max})\,\hat{c}_h[2i] + \hat{c}_h[2(i+1)]$ and $f'_{\max} = 2f_{\max}$. This problem can be transformed into a cone program with second order and positive semidefinite cone constraints.

Problem (3.102) with $\hat{c}$ or $\hat{c}_{\mathrm{LS}}$ is no longer a robust optimization with guaranteed maximum MSE, because the $\hat{c}_h[\ell]$ are treated as error-free in this formulation.²⁰ As in (3.104), it has to be understood as a minimum norm completion, where the weighting is determined by the application.

Extensions to Other Patterns of Partial Autocovariance Sequences

An extension of this completion algorithm (3.104) to other patterns $\mathcal{P}$ of partial covariance sequences is only possible if $c_h[\ell]$ or $\hat{c}_h[\ell]$ are given for a periodic pattern $\ell = N_P i$ with $i \in \{0,1,\ldots,B-1\}$, as stated by Theorem B.4. For irregular (non-periodic) patterns of the partial covariance sequence, a completion may not exist for all admissible values of $c_h[\ell]$ or $\hat{c}_h[\ell]$ with $\ell \in \mathcal{P}$ (see the example in [131] for $f_{\max} = 1/2$).

Thus, for an irregular spacing of the (noisy) channel observations, prediction based on an estimated autocovariance sequence is problematic: Estimation of the autocovariance sequence from irregular samples is difficult, and a completion does not exist in general, i.e., for all estimated partial autocovariance sequences forming a partial positive semidefinite Toeplitz covariance matrix (Appendix B). In this case, we recommend prediction based on a worst case autocovariance sequence as introduced in Section 2.4.3.2.

For a periodic spacing, our results from above show that a band-limited completion is possible. The proposed algorithm is still rather complex, but illustrates the capability of this approach, which we explore further in Section 3.6.

3.5 Least-Squares Approaches

In Section 3.3 we propose the SAGE algorithm to solve the ML problem (3.37) for estimation of $C_h$ and $C_{n,s}$ (Model 2) iteratively. Of course, this ML problem inherently yields an estimate of $C_y$ taking into account its linear structure, i.e., $C_y$ is restricted to the set

$$ \mathcal{S} = \big\{ C_y \,\big|\, C_y = S C_h S^{\mathrm{H}} + I_N \otimes C_{n,s},\ C_h \succeq 0,\ C_{n,s} \succeq 0 \big\} \qquad (3.108) $$
$$ = \big\{ C_y \,\big|\, C_y = S\big( C_h + (S^{\mathrm{H}}S)^{-1}(I_N \otimes C_{n,s}) \big) S^{\mathrm{H}} + P^{\perp}(I_N \otimes C_{n,s}),\ C_h \succeq 0,\ C_{n,s} \succeq 0 \big\}. \qquad (3.109) $$

The second description of $\mathcal{S}$ is based on the decomposition in signal and noise subspace with the orthogonal projector on the $M\bar{N}$-dimensional noise subspace $P^{\perp} = I_{MN} - SS^{\dagger} = \bar{P}^{\perp,\mathrm{T}} \otimes I_M$, $\bar{P}^{\perp} = I_N - \bar{S}^{\dagger}\bar{S}$.

²⁰ Introducing a convex uncertainty set for $\hat{c}_h[\ell]$ would turn it into a robust minimax optimization problem again.

For the following argumentation, we write the ML problem (3.37) as a maximization w.r.t. $C_y$,

$$ \max_{C_y \in \mathcal{S}}\ L_y(\{y[q]\}_{q\in\mathcal{O}}; C_y), \qquad (3.110) $$

with log-likelihood function

$$ L_y(\{y[q]\}_{q\in\mathcal{O}}; C_y) = -BMN\ln\pi - B\ln\det[C_y] - B\,\mathrm{tr}\big[\tilde{C}_y C_y^{-1}\big]. \qquad (3.111) $$

The iterative solution of this problem is rather complex (Sections 3.3.4 and 3.3.6). To reduce the numerical complexity of ML estimation, Stoica and Söderström introduced the extended invariance principle (EXIP) [204] (a brief review is given in Section 3.1.1). It yields an algorithm with the following steps:
1. The set $\mathcal{S}$ for $C_y$ is extended to $\mathcal{S}_{\mathrm{ext}}$ such that the ML problem (3.110) has a simple solution.²¹ In general, the ML estimate based on the extended set is not an element of $\mathcal{S}$ with probability one.²²
2. An estimate in $\mathcal{S}$ is obtained from a weighted Least-Squares (LS) approximation of the ML estimate of the parameters for the extended set $\mathcal{S}_{\mathrm{ext}}$.²³

This estimate is asymptotically (in $B$) equivalent to the ML estimate from (3.110) if the weighting in the LS approximation is chosen as the Hessian matrix of the cost function (3.111). The Hessian matrix is evaluated based on a consistent estimate of the parameters for the extended set.

We choose two natural extensions of $\mathcal{S}$. The first is based on the decomposition of $C_y$ in the covariance matrix of the signal and noise subspace (3.109):

$$ \mathcal{S}_{\mathrm{ext1}} = \big\{ C_y \,\big|\, C_y = S C_{\hat{h}} S^{\mathrm{H}} + P^{\perp}(I_N \otimes C_{n,s}),\ C_{\hat{h}} \succeq 0,\ C_{n,s} \succeq 0 \big\}, \qquad (3.112) $$

where $C_{\hat{h}} = \mathrm{E}[\hat{h}[q]\hat{h}[q]^{\mathrm{H}}]$ is the covariance of the ML channel estimate (3.25) (see also (2.31)). The second choice does not assume any structure of $C_y$ and is the set of all positive semidefinite $MN \times MN$ matrices

$$ \mathcal{S}_{\mathrm{ext2}} = \{ C_y \,|\, C_y \succeq 0 \} = \mathbb{S}^{MN}_{+,0}. \qquad (3.113) $$

The relation of the extended sets to $\mathcal{S}$ is $\mathcal{S} \subseteq \mathcal{S}_{\mathrm{ext1}} \subseteq \mathcal{S}_{\mathrm{ext2}}$. Application of the EXIP yields the following optimization problems:

²¹ $\mathcal{S}$ is a subset of the extended set.
²² If it is an element of $\mathcal{S}$, it is also the ML estimate based on $\mathcal{S}$ due to the ML invariance principle [124].
²³ This corresponds to a weighted LS solution in $\mathcal{S}$ of the likelihood equation corresponding to the extended set. The likelihood equation is the gradient of the log-likelihood function (e.g., (3.2)) set to zero, which is a necessary condition for a ML solution (compare with (3.4)).

• The ML estimates of the parameters $C_{\hat{h}}$ and $C_{n,s}$ for $C_y$ constrained to $\mathcal{S}_{\mathrm{ext1}}$ are (Step 1)

$$ \tilde{C}_{\hat{h}} = \frac{1}{B}\sum_{q\in\mathcal{O}} \hat{h}[q]\hat{h}[q]^{\mathrm{H}} = S^{\dagger}\tilde{C}_y S^{\dagger,\mathrm{H}}, \qquad \tilde{C}_{n,s} = \frac{1}{B\bar{N}}\sum_{q\in\mathcal{O}} Y[q]\,\bar{P}^{\perp}\, Y[q]^{\mathrm{H}} \qquad (3.114) $$

with $\bar{N} = N - K(L+1)$ and $Y[q] = [y[q,1], y[q,2], \ldots, y[q,N]]$ ($y[q,n]$ defined equivalently to (3.18)). The corresponding weighted LS approximation of these parameters reads (Step 2)

$$ \min_{C_h\in\mathbb{C}^{P\times P},\, C_{n,s}\in\mathbb{C}^{M\times M}}\ \big\| \tilde{C}_{\hat{h}} - C_h - (S^{\mathrm{H}}S)^{-1}(I_N \otimes C_{n,s}) \big\|^2_{F,W_1} + \big\| \tilde{C}_{n,s} - C_{n,s} \big\|^2_{F,W_2} \qquad (3.115) $$

with weights $W_1 = \sqrt{B}\,\tilde{C}_{\hat{h}}^{-1}$ and $W_2 = \sqrt{B\bar{N}}\,\tilde{C}_{n,s}^{-1}$. The weights are related to the inverse error covariance matrix of the estimates (3.114) [133], which is based on consistent estimates of the involved covariance matrices. The weighted Frobenius norm is defined as $\|A\|^2_{F,W} = \mathrm{tr}[AWAW]$ with $A = A^{\mathrm{H}}$ and $W \succeq 0$. For white noise $C_{n,s} = c_n I_M$, we obtain

$$ \min_{C_h\in\mathbb{C}^{P\times P},\, c_n\in\mathbb{R}}\ \big\| \tilde{C}_{\hat{h}} - C_h - (S^{\mathrm{H}}S)^{-1} c_n \big\|^2_{F,W_1} + (\tilde{c}_n - c_n)^2 w_2^2, \qquad (3.116) $$

where $\tilde{c}_n$ is an estimate of $c_n$ based on the noise subspace (compare (3.29) and (3.67)),

$$ \tilde{c}_n = \frac{1}{B\bar{N}M}\sum_{q\in\mathcal{O}} \big\| P^{\perp} y[q] \big\|_2^2. \qquad (3.117) $$

The new weight $w_2 = \sqrt{BM\bar{N}}\,\tilde{c}_n^{-1}$ is the inverse of the square root of the error covariance of $\tilde{c}_n$, which depends on the unknown $c_n$; it is evaluated for its consistent estimate $\tilde{c}_n$, which is sufficient according to the EXIP.
• The ML estimate of $C_y$ for the extended set $\mathcal{S}_{\mathrm{ext2}}$ (3.113) is the sample covariance matrix (Step 1)

$$ \tilde{C}_y = \frac{1}{B}\sum_{q\in\mathcal{O}} y[q]y[q]^{\mathrm{H}}. \qquad (3.118) $$

Its weighted LS approximation reads (Step 2)

$$ \min_{C_h\in\mathbb{C}^{P\times P},\, C_{n,s}\in\mathbb{C}^{M\times M}}\ \big\| \tilde{C}_y - S C_h S^{\mathrm{H}} - I_N \otimes C_{n,s} \big\|^2_{F,W} \qquad (3.119) $$

and for spatially white noise

$$ \min_{C_h\in\mathbb{C}^{P\times P},\, c_n\in\mathbb{R}}\ \big\| \tilde{C}_y - S C_h S^{\mathrm{H}} - c_n I_{MN} \big\|^2_{F,W}. \qquad (3.120) $$

The weighting matrix is $W = \sqrt{B}\,\tilde{C}_y^{-1}$ [133], based on the consistent estimate $\tilde{C}_y$ of $C_y$.

All weighted LS problems resulting from the EXIP yield estimates of $C_h$ and $C_{n,s}$ (or $c_n$) which are linear in $\tilde{C}_{\hat{h}}$, $\tilde{C}_{n,s}$, or $\tilde{C}_y$. As this approach is only asymptotically equivalent to ML, the estimates may be indefinite for small $B$. Moreover, for small $B$ the weighting matrices are not defined, because the corresponding matrices are not invertible. For example, $\tilde{C}_y$ is invertible only if $B \ge MN$. But $MN$, e.g., $MN = 128$ for $M = 4$ and $N = 32$, can be larger than the maximum number of statistically independent observations $y[q]$ available within the stationarity time of the channel parameters $h[q]$.

Therefore, we consider unweighted LS approximation with $W_1 = I_P$, $W_2 = I_M$, $w_2 = 1$, and $W = I_{MN}$ in the sequel. First, we derive solutions for LS approximations assuming $C_{n,s} = c_n I_M$, which we generalize in Section 3.5.4.

3.5.1 A Heuristic Approach

Before we treat the LS problems from above, we introduce a simple heuristic estimator of $C_h$ which provides further insight into the improvements of the solutions to follow.

In [56] we proposed the sample covariance matrix $\tilde{C}_{\hat{h}}$ (3.114) of the ML channel estimates $\hat{h}[q] = S^{\dagger}y[q]$ (2.31) as an estimate of $C_h$:

$$ \hat{C}_h^{\mathrm{Heur}} = S^{\dagger}\tilde{C}_y S^{\dagger,\mathrm{H}} = \frac{1}{B}\sum_{q=1}^{B} \hat{h}[q]\hat{h}[q]^{\mathrm{H}}. \qquad (3.121) $$

It is the solution of the LS approximation problem

$$ \min_{C_h}\ \big\| \tilde{C}_y - S C_h S^{\mathrm{H}} \big\|^2_F. \qquad (3.122) $$

Implicitly, the noise variance is assumed small and is neglected. Therefore, $\hat{C}_h^{\mathrm{Heur}}$ is biased with bias given by the last term of

$$ \mathrm{E}\big[ \hat{C}_h^{\mathrm{Heur}} \big] = C_h + c_n (S^{\mathrm{H}}S)^{-1}. \qquad (3.123) $$

This estimate is always positive semidefinite.²⁴

3.5.2 Unconstrained Least-Squares

The LS approximation problem (3.116) derived from the EXIP based on $\mathcal{S}_{\mathrm{ext1}}$ (Step 2) is

$$ \big( \hat{C}_h^{\mathrm{LS}}, \hat{c}_n^{\mathrm{LS}} \big) = \operatorname*{argmin}_{C_h\in\mathbb{C}^{P\times P},\, c_n\in\mathbb{R}}\ \big\| \tilde{C}_{\hat{h}} - C_h - (S^{\mathrm{H}}S)^{-1} c_n \big\|^2_F + (\tilde{c}_n - c_n)^2. \qquad (3.124) $$

With the obvious solution

$$ \hat{c}_n^{\mathrm{LS}} = \frac{1}{M\bar{N}}\,\mathrm{tr}\big[ \tilde{C}_y - P\tilde{C}_y P \big] = \frac{1}{M\bar{N}}\,\mathrm{tr}\big[ P^{\perp}\tilde{C}_y P^{\perp} \big], \qquad \hat{C}_h^{\mathrm{LS}} = S^{\dagger}\tilde{C}_y S^{\dagger,\mathrm{H}} - (S^{\mathrm{H}}S)^{-1}\hat{c}_n^{\mathrm{LS}}, \qquad (3.125) $$

the cost function (3.124) is zero.
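For reference, the closed-form estimates (3.125) take only a few lines of NumPy; the sketch below (our own helper with assumed names) also returns the heuristic estimate (3.121) for comparison:

```python
import numpy as np

def ls_estimates(C_y_tilde, S):
    """Unbiased LS estimates (3.125) and heuristic estimate (3.121).

    C_y_tilde: (MN, MN) sample covariance of the observations (3.118)
    S        : (MN, P)  training matrix
    """
    MN, P = S.shape
    S_pinv = np.linalg.pinv(S)                 # S^+ = (S^H S)^{-1} S^H
    Pr = S @ S_pinv                            # projector on the signal subspace
    Pr_perp = np.eye(MN) - Pr                  # projector on the noise subspace
    MN_bar = MN - P                            # tr[P_perp] = M * N_bar
    c_n_ls = np.real(np.trace(Pr_perp @ C_y_tilde @ Pr_perp)) / MN_bar
    C_h_heur = S_pinv @ C_y_tilde @ S_pinv.conj().T             # (3.121), biased
    C_h_ls = C_h_heur - c_n_ls * np.linalg.inv(S.conj().T @ S)  # (3.125), unbiased
    return C_h_ls, c_n_ls, C_h_heur
```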


The second LS problem (3.120) which is based on the second parameteri-
zation Sext2 results in [103, 41, 54]

min kC̃ y − SC h S H − cn I M N k2F . (3.126)


C h ,cn

Introducing the projectors P = SS † on the signal subspace and P ⊥ =


(I M N − SS † ) on the noise subspace together with C̃ y = (P ⊥ + P )C̃ y (P ⊥ +
P ) and I M N = P + P ⊥ we can decompose this cost function in an approxi-
mation error in the signal and noise subspace25

kC̃ y − SC h S H − cn I M N k2F =
kP C̃ y P − SC h S H − cn P k2F + kP ⊥ C̃ y P ⊥ − cn P ⊥ k2F . (3.127)

Choosing C h = S † C̃ y S †,H −(S H S)−1 cn the approximation error in the signal


subspace is zero and minimization of the error in the noise subspace

min kP ⊥ C̃ y P ⊥ − cn P ⊥ k2F , (3.128)


cn

yields ĉLS
n .

24 Heur
Note that the SAGE algorithm (Section 3.3.4) provides Ĉ h in the first iteration
(0)
when initialized with cn = 0.
25
Note that kA + Bk2F = kAk2F + kBk2F for two matrices A and B with inner
product tr[AH B] = 0 (“Theorem of Pythagoras”).
Both LS problems (3.124) and (3.126) result in the same estimates $\hat{C}_h^{\mathrm{LS}}$ and $\hat{c}_n^{\mathrm{LS}}$. The bias of the heuristic estimator $\hat{C}_h^{\mathrm{Heur}}$ (3.123) is removed in $\hat{C}_h^{\mathrm{LS}}$ [41], which is now unbiased. The estimate $\hat{C}_h^{\mathrm{LS}}$ is indefinite with a non-zero probability, which is mainly determined by $C_h$, $B$, $N$, and $c_n$. Depending on the application, this can lead to a substantial performance degradation.

On the other hand, if $\hat{C}_h^{\mathrm{LS}}$ is positive semidefinite, $\hat{C}_h^{\mathrm{LS}}$ and $\hat{c}_n^{\mathrm{LS}}$ are the ML estimates: $\hat{C}_h^{\mathrm{LS}}$ and $\hat{c}_n^{\mathrm{LS}}$ are the unique positive semidefinite solutions to the likelihood equation (cf. (3.4)). Here, the likelihood equation is $\tilde{C}_y = S C_h S^{\mathrm{H}} + c_n I_{MN}$, and the approximation error in (3.126) is zero in this case.²⁶

3.5.3 Least-Squares with Positive Semidefinite Constraint

Because $\hat{C}_h^{\mathrm{LS}}$ (3.125) can be indefinite, we introduce positive semidefinite constraints in both LS problems (3.124) and (3.126). We start with the latter problem because the solution for the first follows from it.

LS Approximation for $\mathcal{S}_{\mathrm{ext2}}$

We extend problem (3.126) to

$$ \big( \hat{C}_h^{\mathrm{psd2}}, \hat{c}_n^{\mathrm{psd2}} \big) = \operatorname*{argmin}_{C_h,\, c_n}\ \big\| \tilde{C}_y - S C_h S^{\mathrm{H}} - c_n I_{MN} \big\|^2_F \quad \text{s.t.} \quad C_h \succeq 0. \qquad (3.129) $$

To solve it analytically we proceed based on (3.127) and define $S' = S(S^{\mathrm{H}}S)^{-1/2}$ with orthonormalized columns, i.e., $S'^{\mathrm{H}}S' = I_P$. Thus, we have $P = S'S'^{\mathrm{H}}$. We rewrite (3.127) using the EVD of

$$ S'^{\mathrm{H}}\tilde{C}_y S' = V\Sigma V^{\mathrm{H}} \qquad (3.130) $$

with $\Sigma = \mathrm{diag}[\sigma_1, \sigma_2, \ldots, \sigma_P]$, $\sigma_i \ge \sigma_{i+1} \ge 0$, and of

$$ X = (S^{\mathrm{H}}S)^{1/2}\, C_h\, (S^{\mathrm{H}}S)^{1/2} = U_x D_x U_x^{\mathrm{H}} \qquad (3.131) $$

with $D_x = \mathrm{diag}[d_1, d_2, \ldots, d_P]$:

$$ \big\| \tilde{C}_y - S C_h S^{\mathrm{H}} - c_n I_{MN} \big\|^2_F = \big\| S'\big( S'^{\mathrm{H}}\tilde{C}_y S' - X - c_n I_P \big)S'^{\mathrm{H}} \big\|^2_F + \big\| P^{\perp}\tilde{C}_y P^{\perp} - c_n P^{\perp} \big\|^2_F = \big\| S'^{\mathrm{H}}\tilde{C}_y S' - U_x D_x U_x^{\mathrm{H}} - c_n I_P \big\|^2_F + \big\| P^{\perp}\tilde{C}_y P^{\perp} - c_n P^{\perp} \big\|^2_F. \qquad (3.132) $$

²⁶ In [60] the solution (3.125) is also obtained as a special case of the iterative solution of (3.110) with the "expectation-conditional maximization either algorithm".

First, we minimize this expression w.r.t. $U_x$ subject to the constraint $U_x^{\mathrm{H}}U_x = I_P$. The corresponding Lagrangian function with Lagrange parameters $\{m_i\}_{i=1}^P \ge 0$ and $M \in \mathbb{C}^{P\times P}$ is

$$ F(U_x, \{d_i\}_{i=1}^P, c_n, \{m_i\}_{i=1}^P, M) = \big\| S'^{\mathrm{H}}\tilde{C}_y S' - U_x D_x U_x^{\mathrm{H}} - c_n I_P \big\|^2_F + \big\| P^{\perp}\tilde{C}_y P^{\perp} - c_n P^{\perp} \big\|^2_F + \mathrm{tr}\big[ M^{\mathrm{T}}(U_x^{\mathrm{H}}U_x - I_P) \big] + \mathrm{tr}\big[ M^{*}(U_x^{\mathrm{H}}U_x - I_P)^{\mathrm{H}} \big] - \sum_{i=1}^{P} d_i m_i. \qquad (3.133) $$

The Karush-Kuhn-Tucker (KKT) conditions lead to

$$ \frac{\partial F}{\partial U_x^{*}} = U_x M^{\mathrm{T}} + U_x M^{*} - 2\big( S'^{\mathrm{H}}\tilde{C}_y S' - c_n I_P \big)U_x D_x = 0_{P\times P}. \qquad (3.134) $$

The left hand side of the resulting equation

$$ \frac{1}{2}\big( M^{\mathrm{T}} + M^{*} \big) = U_x^{\mathrm{H}}\big( S'^{\mathrm{H}}\tilde{C}_y S' - c_n I_P \big)U_x D_x \qquad (3.135) $$

is Hermitian. If all $d_i > 0$ are distinct, the right hand side is only Hermitian if $U_x = V$. Later we see that the $d_i > 0$ are distinct if the eigenvalues $\sigma_i$ are distinct.

Due to the definition of the Frobenius norm and $U_x = V$, which is the optimum choice of eigenvectors, the cost function is now equivalent to

$$ \big\| \tilde{C}_y - S C_h S^{\mathrm{H}} - c_n I_{MN} \big\|^2_F = \sum_{i=1}^{P}(\sigma_i - d_i - c_n)^2 + \big\| P^{\perp}\tilde{C}_y P^{\perp} - c_n P^{\perp} \big\|^2_F. \qquad (3.136) $$

Completing the squares in (3.136) and omitting the constant, the optimization problem (3.129) becomes

$$ \min_{\{d_i\}_{i=1}^{P},\,c_n}\ \sum_{i=1}^{P}(\sigma_i - d_i - c_n)^2 + M\bar{N}\big( c_n - \hat{c}_n^{\mathrm{LS}} \big)^2 \quad \text{s.t.} \quad d_i \ge 0. \qquad (3.137) $$

Note that $\mathrm{tr}[P^{\perp}] = M\bar{N}$. Assuming distinct $\sigma_i$, this optimization problem can be solved interpreting its KKT conditions as follows:
1. If the unconstrained LS solution (3.125) is positive semidefinite, i.e., $d_i \ge 0$ for all $i$, it is the solution to (3.137).

2. The constraints on $d_i$ are either active or inactive, corresponding to $d_i$ being zero or positive, i.e., $2^P$ possibilities should be checked in general. These can be reduced exploiting the order of the $\sigma_i$, which is our focus in the sequel.
3. As a first step we could set $d_j = 0$ for all $j$ with $\sigma_j - \hat{c}_n^{\mathrm{LS}} < 0$. All indices $j$ with $\sigma_j - \hat{c}_n^{\mathrm{LS}} < 0$ are collected in the set $\mathcal{Z} \subseteq \{1,2,\ldots,P\}$ with cardinality $Z = |\mathcal{Z}|$. For the remaining indices, we choose $d_i = \sigma_i - c_n$. Thus, the cost function in (3.137) is

$$ \sum_{i\in\mathcal{Z}}(\sigma_i - c_n)^2 + M\bar{N}\big( c_n - \hat{c}_n^{\mathrm{LS}} \big)^2. \qquad (3.138) $$

Minimization w.r.t. $c_n$ yields

$$ \hat{c}_n^{\mathrm{psd2}} = \frac{1}{M\bar{N} + Z}\left( M\bar{N}\,\hat{c}_n^{\mathrm{LS}} + \sum_{i\in\mathcal{Z}}\sigma_i \right). \qquad (3.139) $$

Because $\sigma_i < \hat{c}_n^{\mathrm{LS}}$, $i \in \mathcal{Z}$, we have $\hat{c}_n^{\mathrm{psd}} \le \hat{c}_n^{\mathrm{LS}}$ with strict inequality in case $Z > 0$. Thus, also information about the noise in the signal subspace is considered, i.e., part of the signal subspace is treated as if it contained noise only.²⁷
4. But the cost function (3.138) could be reduced further if fewer $d_i$ are chosen zero. This is possible if we start with the smallest $\sigma_i$, i.e., $i = P$, and check whether $\sigma_i - \hat{c}_n^{\mathrm{LS}} < 0$. If negative, we set $d_P = 0$. This decreases $\hat{c}_n^{\mathrm{psd}}$ based on $\mathcal{Z} = \{P\}$. We continue with $i = P-1$ and check if $\sigma_{P-1} - \hat{c}_n^{\mathrm{psd}} < 0$. $\hat{c}_n^{\mathrm{psd}}$ is recomputed based on (3.139) in every step. As $\hat{c}_n^{\mathrm{psd}}$ decreases, generally fewer $\sigma_i - \hat{c}_n^{\mathrm{psd}}$ are negative than for $\sigma_i - \hat{c}_n^{\mathrm{LS}}$. We continue decreasing $i$ until $\sigma_i - \hat{c}_n^{\mathrm{psd}} \ge 0$, for which we choose $d_i = \sigma_i - \hat{c}_n^{\mathrm{psd}}$.
5. Thus, the number $Z = |\mathcal{Z}|$ of zero $d_i$ is minimized, i.e., fewer terms appear in the cost function (3.138). This decreases the cost function.

These observations yield Algorithm 3.1, which provides a computationally efficient solution to optimization problem (3.137). The estimates $\hat{c}_n^{\mathrm{psd}}$ and $\hat{C}_h^{\mathrm{psd}}$ are biased as long as the probability for any $\sigma_i - \hat{c}_n^{\mathrm{LS}} < 0$ is non-zero, which is more likely for small $B$, large noise variance $c_n$, and higher correlations in $C_h$.

LS Approximation for $\mathcal{S}_{\mathrm{ext1}}$

An alternative approach is the positive semidefinite LS approximation of $\tilde{C}_{\hat{h}}$; the problem is stated in (3.140) below, following Algorithm 3.1.

²⁷ The SAGE algorithm for solving the ML problem (3.110) on $\mathcal{S}$ also exploits the noise in the signal space to improve its estimate of $c_n$ (Section 3.3.2).

Algorithm 3.1 Positive semidefinite estimate of channel covariance matrix and noise variance related to the EXIP with $\mathcal{S}_{\mathrm{ext2}}$.

1: $Z = 0$, $\mathcal{Z} = \{\}$
2: $\hat{c}_n = \hat{c}_n^{\mathrm{LS}}$
3: $S' = S(S^{\mathrm{H}}S)^{-1/2}$
4: compute EVD $S'^{\mathrm{H}}\tilde{C}_y S' = V\Sigma V^{\mathrm{H}}$
5: $\Sigma = \mathrm{diag}[\sigma_1, \sigma_2, \ldots, \sigma_P]$, $\sigma_j \ge \sigma_{j+1}$, $\forall j$
6: for $i = P, P-1, \ldots, 1$ do
7:   if $\sigma_i - \hat{c}_n < 0$ then
8:     $d_i = 0$
9:     $Z \leftarrow Z + 1$
10:    $\mathcal{Z} \leftarrow \mathcal{Z} \cup \{i\}$
11:    $\hat{c}_n = \frac{1}{M\bar{N}+Z}\big( M\bar{N}\,\hat{c}_n^{\mathrm{LS}} + \sum_{i\in\mathcal{Z}}\sigma_i \big)$
12:  else
13:    $d_i = \sigma_i - \hat{c}_n$
14:  end if
15: end for
16: $D = \mathrm{diag}[d_1, d_2, \ldots, d_P]$
17: $\hat{c}_n^{\mathrm{psd2}} = \hat{c}_n$
18: $\hat{C}_h^{\mathrm{psd2}} = (S^{\mathrm{H}}S)^{-1/2}\, V D V^{\mathrm{H}}\, (S^{\mathrm{H}}S)^{-1/2}$
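A direct NumPy transcription of Algorithm 3.1 might look as follows (a sketch; the function name and argument layout are our own):

```python
import numpy as np

def psd_ls_estimate(C_y_tilde, S, c_n_ls):
    """Algorithm 3.1: positive semidefinite C_h and noise variance estimate.

    C_y_tilde: (MN, MN) sample covariance (3.118)
    S        : (MN, P)  training matrix
    c_n_ls   : unbiased LS noise variance estimate (3.125)
    """
    MN, P = S.shape
    MN_bar = MN - P                                   # tr[P_perp] = M * N_bar

    # (S^H S)^{-1/2} via the eigendecomposition of the Gram matrix
    g, Ug = np.linalg.eigh(S.conj().T @ S)
    G_isqrt = (Ug / np.sqrt(g)) @ Ug.conj().T
    Sp = S @ G_isqrt                                  # S' with orthonormal columns

    sig, V = np.linalg.eigh(Sp.conj().T @ C_y_tilde @ Sp)
    order = np.argsort(sig)[::-1]                     # sigma_1 >= ... >= sigma_P
    sig, V = sig[order], V[:, order]

    d = np.zeros(P)
    c_n, idx = c_n_ls, []
    for i in range(P - 1, -1, -1):                    # i = P, P-1, ..., 1
        if sig[i] - c_n < 0:
            idx.append(i)                             # Z <- Z + {i}, d_i = 0
            c_n = (MN_bar * c_n_ls + sig[idx].sum()) / (MN_bar + len(idx))
        else:
            d[i] = sig[i] - c_n
    # C_h^psd2 = (S^H S)^{-1/2} V D V^H (S^H S)^{-1/2}
    C_h = G_isqrt @ (V * d) @ V.conj().T @ G_isqrt
    return C_h, c_n
```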

$$ \big( \hat{C}_h^{\mathrm{psd1}}, \hat{c}_n^{\mathrm{psd1}} \big) = \operatorname*{argmin}_{C_h,\, c_n}\ \big\| \tilde{C}_{\hat{h}} - C_h - (S^{\mathrm{H}}S)^{-1} c_n \big\|^2_F + (\tilde{c}_n - c_n)^2 \quad \text{s.t.} \quad C_h \succeq 0. \qquad (3.140) $$

For $S^{\mathrm{H}}S = N I_P$, a solution can be derived with the same steps as before. We parameterize $C_h$ with its eigenvalue decomposition $C_h = U D U^{\mathrm{H}}$, where $U^{\mathrm{H}}U = I_P$, $D = \mathrm{diag}[d_1, d_2, \ldots, d_P]$, and $d_i \ge 0$.

The eigenvectors of $\tilde{C}_{\hat{h}}$ and $C_h$ can be shown to be identical. The optimization problem for the remaining parameters is

$$ \min_{\{d_i\}_{i=1}^{P},\,c_n}\ \sum_{i=1}^{P}(\sigma_i - d_i - c_n/N)^2 + \big( c_n - \hat{c}_n^{\mathrm{LS}} \big)^2 \quad \text{s.t.} \quad d_i \ge 0. \qquad (3.141) $$

It is solved by Algorithm 3.2.


The estimate of $c_n$ is

$$ \hat{c}_n^{\mathrm{psd1}} = \frac{1}{1 + Z/N^2}\left( \hat{c}_n^{\mathrm{LS}} + \frac{1}{N}\sum_{i\in\mathcal{Z}}\sigma_i \right) \le \hat{c}_n^{\mathrm{LS}} \qquad (3.142) $$

Algorithm 3.2 Positive semidefinite estimate of channel covariance matrix and noise variance related to the EXIP with $\mathcal{S}_{\mathrm{ext1}}$.

1: $Z = 0$, $\mathcal{Z} = \{\}$
2: $\hat{c}_n = \hat{c}_n^{\mathrm{LS}}$
3: compute EVD $S^{\dagger}\tilde{C}_y S^{\dagger,\mathrm{H}} = V\Sigma V^{\mathrm{H}}$
4: $\Sigma = \mathrm{diag}[\sigma_1, \sigma_2, \ldots, \sigma_P]$, $\sigma_j \ge \sigma_{j+1}$ $\forall j$
5: for $i = P, P-1, \ldots, 1$ do
6:   if $\sigma_i - \hat{c}_n/N < 0$ then
7:     $d_i = 0$
8:     $Z \leftarrow Z + 1$
9:     $\mathcal{Z} \leftarrow \mathcal{Z} \cup \{i\}$
10:    $\hat{c}_n = \frac{1}{1+Z/N^2}\big( \hat{c}_n^{\mathrm{LS}} + \frac{1}{N}\sum_{i\in\mathcal{Z}}\sigma_i \big)$
11:  else
12:    $d_i = \sigma_i - \hat{c}_n/N$
13:  end if
14: end for
15: $D = \mathrm{diag}[d_1, d_2, \ldots, d_P]$
16: $\hat{c}_n^{\mathrm{psd1}} = \hat{c}_n$
17: $\hat{C}_h^{\mathrm{psd1}} = V D V^{\mathrm{H}}$

with the set $\mathcal{Z}$ of indices $i$ for which $d_i = 0$. For the same set $\mathcal{Z}$, this estimate is closer to $\hat{c}_n^{\mathrm{LS}}$ than $\hat{c}_n^{\mathrm{psd2}}$ because $\hat{c}_n^{\mathrm{psd2}} \le \hat{c}_n^{\mathrm{psd1}} \le \hat{c}_n^{\mathrm{LS}}$. If $\hat{c}_n^{\mathrm{LS}}$ is already accurate, i.e., for sufficiently large $B$ and $\bar{N}$, the bias increases the MSE w.r.t. $c_n$; but for small $\bar{N}$ and $B$ the bias reduces the estimation error (this is not observed for the selected scenarios in Section 3.6).

A Heuristic Modification

Because $\hat{c}_n^{\mathrm{psd1}}$ and $\hat{c}_n^{\mathrm{psd2}}$ are biased and their estimation error is typically larger than for the unbiased estimate $\hat{c}_n^{\mathrm{LS}}$, the following heuristic achieves better results when applied to MMSE channel estimation:
1. Estimate $c_n$ using $\hat{c}_n^{\mathrm{LS}}$ (3.125), which is unbiased and based on the true noise subspace.
2. Solve optimization problem (3.141) with this fixed noise variance,

$$ \min_{\{d_i\}_{i=1}^{P}}\ \sum_{i=1}^{P}(\sigma_i - d_i - \hat{c}_n^{\mathrm{LS}})^2 \quad \text{s.t.} \quad d_i \ge 0, \qquad (3.143) $$

i.e., set $d_i = 0$ for all $i$ with $\sigma_i - \hat{c}_n^{\mathrm{LS}} < 0$. The solution reads

$$ \hat{C}_h^{\mathrm{psd3}} = V D_{+} V^{\mathrm{H}}, \qquad (3.144) $$

where $D_{+}$ performs $\max(0, d_i)$ for all elements in $D$ from the EVD of

$$ \hat{C}_h^{\mathrm{LS}} = V D V^{\mathrm{H}}. \qquad (3.145) $$

This is equivalent to discarding the negative definite part of the estimate $\hat{C}_h^{\mathrm{LS}}$ (3.125), similar to [92].

Computational Complexity

The additional complexity for computing the positive definite solution compared to (3.125) results from the EVD of $S'^{\mathrm{H}}\tilde{C}_y S'$ and the computation of $(S^{\mathrm{H}}S)^{1/2}$ and $(S^{\mathrm{H}}S)^{-1/2}$ (only for Algorithm 3.1). For the indefinite least-squares estimate (3.125), a tracking algorithm of very low complexity was presented in [141], whereas tracking of eigenvalues and eigenvectors is more difficult and complex.

3.5.4 Generalization to Spatially Correlated Noise

Next we generalize our previous results to a general noise covariance matrix $C_n = I_N \otimes C_{n,s}$ and solve (3.115) and (3.119) without weighting. We start with (3.119) and rewrite (3.127) for the more general noise covariance matrix $C_n = I_N \otimes C_{n,s}$:

$$ \big\| \tilde{C}_y - S C_h S^{\mathrm{H}} - I_N \otimes C_{n,s} \big\|^2_F = \big\| P\tilde{C}_y P - S C_h S^{\mathrm{H}} - P(I_N \otimes C_{n,s})P \big\|^2_F + \big\| P^{\perp}\tilde{C}_y P^{\perp} - P^{\perp}(I_N \otimes C_{n,s})P^{\perp} \big\|^2_F. \qquad (3.146) $$

As before, the first term is zero choosing $C_h$ from the space of Hermitian $P$-dimensional matrices as

$$ \hat{C}_h^{\mathrm{LS}} = S^{\dagger}\big( \tilde{C}_y - I_N \otimes \hat{C}_{n,s}^{\mathrm{LS}} \big)S^{\dagger,\mathrm{H}} \qquad (3.147) $$

given an estimate $\hat{C}_{n,s}^{\mathrm{LS}}$ as described below. The estimate $\hat{C}_h^{\mathrm{LS}}$ is unbiased, but indefinite in general. Now, we can minimize the second term in (3.146),

$$ \min_{C_{n,s}}\ \big\| P^{\perp}\tilde{C}_y P^{\perp} - P^{\perp}(I_N \otimes C_{n,s})P^{\perp} \big\|^2_F. \qquad (3.148) $$

This yields the estimate of the noise covariance matrix $C_{n,s}$

$$ \hat{C}_{n,s}^{\mathrm{LS}} = \frac{1}{N-K}\sum_{n=1}^{N} (e_n^{\mathrm{T}} \otimes I_M)\, P^{\perp}\tilde{C}_y P^{\perp}\, (e_n \otimes I_M), \qquad (3.149) $$

which is based on the noise subspace. It is equivalent to the sample covariance of the estimated noise

$$ \hat{n}[q,n] = (e_n^{\mathrm{T}} \otimes I_M)\, P^{\perp} y[q] = y[q,n] - \hat{H}[q]s[n] \qquad (3.150) $$

with the ML channel estimate (2.31)

$$ \hat{H}[q] = \mathrm{unvec}\big[ \hat{h}[q] \big] = \mathrm{unvec}\big[ S^{\dagger}y[q] \big]. \qquad (3.151) $$

This leads to

$$ \hat{C}_{n,s}^{\mathrm{LS}} = \frac{1}{B\bar{N}}\sum_{q\in\mathcal{O}}\sum_{n=1}^{N} \hat{n}[q,n]\,\hat{n}[q,n]^{\mathrm{H}} = \frac{1}{B\bar{N}}\sum_{q\in\mathcal{O}} Y[q]\,\bar{P}^{\perp}\, Y[q]^{\mathrm{H}} \qquad (3.152) $$

as in (3.64) and (3.114). This estimate of the noise covariance matrix is very similar to the well-known ML estimate [158] of the noise covariance matrix when estimated jointly with the channel $h[q]$ (cf. (2.32)). The difference is in the scaling by $\bar{N} = N - K$ instead of $N$, which yields an unbiased estimate with smaller estimation error in this case.

Optimization problem (3.115) yields the same solution. Again, both problems differ if we add a positive semidefinite constraint on $C_h$. Here a simple positive semidefinite solution for $C_h$ can be obtained similar to the heuristic introduced at the end of Section 3.5.3: A priori we choose $\hat{C}_{n,s}^{\mathrm{LS}}$ as the estimate of $C_{n,s}$. Then, for (3.115) with positive semidefinite constraint, $\hat{C}_h$ is given by the positive semidefinite part of $\hat{C}_h^{\mathrm{LS}}$ (3.147), i.e., setting the negative eigenvalues to zero.

3.6 Performance Comparison

The estimators for channel and noise covariance matrices which we present in Sections 3.3 and 3.5 differ significantly regarding their computational complexity. But how much do they differ in estimation quality?

Another important question is how many observations (measured by $B$ and $I$) are necessary to obtain sufficient MMSE channel estimation and prediction quality, and whether this number is available for a typical stationarity time of the wireless channel.

Estimation quality is often measured by the mean square error (MSE) of the estimates. The MSE does not reflect the requirements of a particular application on the estimation accuracy. Therefore, we apply the estimated covariance matrices to MMSE channel estimation (Section 2.2.1) and MMSE prediction (Section 2.3.1). We discuss the performance for estimation of channel correlations in space, time, and (jointly) in space and time domain separately.

An overview of the selected scenarios is given in Table 3.2. In scenarios with multiple antennas we assume a uniform linear array with half (carrier) wavelength spacing. Moreover, we only consider frequency flat channels ($L = 0$); training symbols are chosen from Hadamard sequences with length $N = 16$ ($S^{\mathrm{H}}S = N I_P$). More details on the models are given below and in Appendix E.

Scenario   I    B           M   K   Assumptions on C_hT or C_n,s      Results in Figures
A          1    [1, 1000]   8   8   C_n,s = c_n I_M                   3.5, 3.6
           1    100         8   8                                     3.7
B          1    [1, 1000]   8   1   C_n,s ≠ c_n I_M                   3.8, 3.10
           1    100         8   1                                     3.9, 3.11
C          1    [6, 50]     1   1   Clarke's psd (Figure 2.12)        3.14
           1    20          1   1                                     3.12, 3.13, 3.16
D          1    20          1   1   bandpass psd (Figure 2.12)        3.15, 3.17
E          20   20          4   4   block Toeplitz, Clarke's psd,     3.18
                                    C_n,s = c_n I_M

Table 3.2 Overview of simulation scenarios in Section 3.6.

Estimation of Spatial Channel Covariance Matrix

We compare the following algorithms for estimation of the spatial channel


covariance matrix C h and the noise variance cn or spatial noise covariance
matrix C n,s (the names employed in the legends of the figures are given in
quotation marks at the end):
• ML estimation of C h with SAGE is based on (3.94) (Section 3.3.4) with
imax = 10 iterations. As initialization we use the heuristic positive semidef-
inite LS estimate (3.144). The unbiased estimates (3.64) or (3.67) are em-
ployed for the noise covariance matrix C n,s and variance cn , respectively.
(“SAGE”)
• This version is identical to “SAGE” except for estimation of C n,s or cn ,
which are estimated iteratively based on HDS 2 for the noise (Section 3.3.2)
with initialization from (3.64) or (3.67). Of the imax = 10 iterations, we use
the first Niter = 5 iterations with HDS 1 and the next 5 iterations with HDS 2.
(“SAGE (include HDS for noise)”)
• The heuristic estimator of C h is (3.123); for estimation of C n,s or cn
we apply (3.125) and (3.152) (identical to (3.64) and (3.67)). (“Heuristic”)

• The unbiased LS estimate is (3.125) for C n,s = cn I M and (3.147)
with (3.152) for spatially correlated noise. (“Unbiased LS”)
• The positive semidefinite LS estimator based on Sext1 and C n,s = cn I M is
given in Algorithm 3.2. (“Pos. semidef. LS (ext. 1)”)
• The positive semidefinite LS estimator based on Sext2 and C n,s = cn I M is
given in Algorithm 3.1. (“Pos. semidef. LS (ext. 2)”)
• The heuristic positive semidefinite LS approach is (3.144) with (3.152) or
ĉn from (3.125). (“Pos. semidef. LS (heuristic)”)
• The solution of the ML problem for C n,s = cn I M based on the expectation-
conditional maximization either (ECME) algorithm is given by Equation
(3.8) in [60]. (“ECME”)
As standard performance measure we choose the average MSE of ĉn nor-
malized to cn and the average MSE of Ĉ h normalized to tr[C h ], where the
MSEs are averaged over 1000 estimates (approximation of the expectation
operator given in the ordinate of the figures). Choosing channel estimation as
application, we give the MSE of the channel estimate ĥ = Ŵ y, where Ŵ is
the MMSE channel estimator (Section 2.2.1) evaluated for Ĉ h and Ĉ n,s .28 It
is also averaged over 1000 independent estimates. Normalization with tr[C h ]
ensures that the maximum normalized MSE for perfect knowledge of the co-
variance matrices is one. For Ĉ h = Û Λ̂ Û^H, S^H S = N I_P, and Ĉ n,s = ĉn I M ,
the MMSE estimator (2.22) can be written as (compare Section 2.2.4)

$$\hat{W} = \hat{U}\hat{\Lambda}\left(\hat{\Lambda} + \frac{\hat{c}_n}{N} I_P\right)^{-1}\hat{U}^H S^H / N. \qquad (3.153)$$

Now, assume that Ĉ h is indefinite, i.e., some diagonal elements λ̂i of Λ̂ are
negative. If λ̂i ≈ −ĉn /N , the system of equations, whose solution is Ŵ , is
ill-conditioned.29 This leads to a very large norm of Ŵ and a very large MSE
of Ŵ and of ĥ: The regularization of the MMSE estimator, which leads to a
loading of Λ̂ with ĉn /N > 0 (3.153) and a decreased norm of Ŵ , is based on
the assumption Ĉ h ⪰ 0; if this is not true, it has the adverse effect.
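The effect can be reproduced with a few lines of NumPy; the eigenvalues below are hypothetical and only illustrate how an eigenvalue of Λ̂ near −ĉ_n/N inflates the filter norm in (3.153):

```python
import numpy as np

N, cn = 16, 1.0
lam_clipped = np.array([2.0, 0.5, 0.0, 0.0])              # PSD part: negatives set to 0
lam_indef = np.array([2.0, 0.5, -cn / N + 1e-4, 0.0])     # indefinite, lam ~ -cn/N

def max_gain(lam):
    # Largest |lam_i / (lam_i + cn/N)| in (3.153): the diagonal loading cn/N
    # acts as a regularizer only if all lam_i >= 0.
    return np.abs(lam / (lam + cn / N)).max()

print(max_gain(lam_clipped))   # moderate gain: regularization works
print(max_gain(lam_indef))     # huge gain: near-singular system, large norm of W
```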
As a performance reference, we employ the MSE in ĥ for the ML channel
estimator W = S † (2.31): It does not rely on C h , whereas the MMSE estimator
is employed aiming at a reduced MSE compared to the ML channel estimator
(“ML channel estimation”). Thus, this reference will give us the minimum
number of observations B required for acceptable performance. As a lower
bound we use the MSE of the MMSE estimator with perfect knowledge of
the covariance matrices (“perfect knowledge”).
For the simulations, we consider two scenarios with B statistically inde-
pendent observations:

28
Given ML estimates of the covariance matrices, it is the ML estimate of the MMSE
channel estimator according to the ML invariance principle.
29
This degree of ill-conditioning does not lead to numerical problems on a typical
floating point architecture.

Fig. 3.5 Normalized MSE of noise variance cn vs. B for Scenario A (cn = 1).

Scenario A: We choose the same parameters as in Section 2.2.4 and
Figure 2.6: M = K = 8, block-diagonal C h normalized to tr[C h ] =
M K, Laplace angular power spectrum with spread σ = 5◦ and ϕ̄ =
[−45◦ , −32.1◦ , −19.3◦ , −6.4◦ , 6.4◦ , 19.3◦ , 32.1◦ , 45◦ ], and white noise
C n,s = cn I M .
The unbiased estimator for cn (cf. (3.125) and (3.67)) has the smallest
normalized MSE (Figure 3.5). All other estimates attempt to improve ĉn
by exploiting the knowledge about the noise in the signal subspace. The
ECME algorithm performs worst. The estimate ĉn with positive semidefinite
LS based on Sext1 (ext. 1) is considerably better than for Sext2 . The per-
formance of SAGE degrades, when HDS 2 (of the noise) is included in the
(0)
iterations. Because initialization of SAGE with C h is not good enough, it
is not able to take advantage of the additional information about cn in the
signal subspace: The algorithm cannot distinguish sufficiently between noise
and signal correlations. Therefore, we consider only the unbiased estimator
for the noise covariance matrix in the sequel.
The MSE in Ĉ h for all estimators (except the heuristic estimator) is ap-
proximately identical, because the main difference between the estimators lies
in the estimation quality of the small eigenvalues. In [60] the MSE in Ĉ h of
the ECME algorithm was shown to be close to the Cramér-Rao lower bound
(sometimes better due to a bias); we do not investigate this relation further,
because the MSE in Ĉ h is similar for all considered estimators. Moreover, it
is not a good indicator for the applications considered here, which is shown
by the next results.
Fig. 3.6 Normalized MSE of channel estimates ĥ = W y vs. B for Scenario A
(cn = 1).

For B ≤ 200, the MSE of the MMSE estimator based on unbiased LS is
larger than for the ML channel estimator and increases very fast for small B
(Figure 3.6): Ĉ h is indefinite with smallest eigenvalues on the order of −ĉn /N ,
which leads to a large error in Ŵ (3.153). For small B, SAGE and the
heuristic estimator have a very similar MSE and outperform the ML channel
estimator for B ≥ 50. For large B, SAGE outperforms the heuristic due
to the reduced or removed bias. The MMSE performance of all other esti-
mators30 (positive semidefinite LS, SAGE with HDS for noise, and ECME)
is almost identical to SAGE. In Figure 3.6 and 3.7 the heuristic sometimes
outperforms SAGE, because its bias serves as loading in the inverse similar
to minimax robust channel estimators (Section 2.4.2). Figure 3.7 shows the
MSE vs. SNR = −10 log(cn ) for B = 100; the degradation in MSE of the
unbiased LS is large (Figure 3.7).
A performance of the MMSE estimators which is worse than for ML chan-
nel estimation can be avoided by a (robust) minimax MSE design as pre-
sented in Section 2.4.2: The set size described by α_h^{(2)} (2.90) has to be adapted
to B, C h , and the SNR, but a good adaptation rule is not straightforward.
The number of statistically uncorrelated observations B available depends
on the ratio of stationarity time to coherence time: Channel measurements
show that this ratio is about 100 in an urban/suburban environment [227].31
Consequently, the relevant number of observations B is certainly below 200,
where the heuristic estimator is close to optimum for uncorrelated noise. This
changes for correlated noise.
Scenario B: We choose M = 8, K = 1, and C h for a Laplace angular power
spectrum with spread σ = 5◦ and ϕ̄ = 0◦ . The spatial noise correlations are
C n,s = C i + cn I M , which models white noise with cn = 0.01 additionally to
an interferer with mean direction ϕ̄ = 30◦ , 30◦ uniform angular power spec-
trum, and tr[C i ] = M ci (ci is the variance of the interfering signal). Among
the LS approaches (besides the simple heuristic) only the unbiased and the
30
These results are not shown here.
31
The measure employed by [227] quantifies the time-invariance of the principal
eigenmode of C h . It is not yet clear whether these results can be generalized to the
whole covariance matrix and to channel estimation.

Fig. 3.7 Normalized MSE of channel estimates ĥ = W y vs. SNR = −10 log(cn ) for
Scenario A (B = 100).

heuristic positive semidefinite LS estimators were derived for correlated noise
in Section 3.5.4.
In Figures 3.8 and 3.9 the MSE in Ĉ h is given for different B and ci ,
respectively: For high B and small SIR = −10 log(ci ) (large ci ), the improve-
ment of SAGE, whose MSE cannot be distinguished from the unbiased LS and
positive semidefinite LS approaches, over the heuristic is largest: It removes
the bias of the heuristic LS estimator, which is E[Ĉ_h^{Heur}] − C h = C n,s /N for
K = 1.
The MSE in MMSE channel estimation shows a significant performance
improvement over the heuristic estimator for small to medium SIR and
B ≥ 20 (Figures 3.10 and 3.11). For B ≥ 5, all estimators which yield a pos-
itive semidefinite Ĉ h outperform the ML channel estimator. SAGE, which
is initialized with the heuristic positive semidefinite LS, only yields a tiny
improvement compared to its initialization.
From this observation we conclude that the most important task of the
estimators is to remove the bias corresponding to noise correlations from Ĉ h ,
while maintaining a valid (positive semidefinite) channel covariance matrix.
This is already ensured sufficiently by the heuristic positive semidefinite LS
estimator, which is much simpler than SAGE.

Estimation and Completion of Autocovariance Sequence

The algorithms for estimating the temporal channel correlations (autocorrela-
tion sequence) from Section 3.3.5 are implemented as follows assuming perfect
knowledge of the error variance ce (compare with models in Section 2.3.1).
The two main estimation approaches are:

Fig. 3.8 Normalized MSE of spatial covariance matrix vs. B for correlated noise
(Scenario B, ci = 1, cn = 0.01).

Fig. 3.9 Normalized MSE of spatial covariance matrix vs. SIR = −10 log(ci ) for
correlated noise (Scenario B, B = 100).

• The biased sample (heuristic) autocovariance estimator (3.86) (for P = 1),
(“MMSE (Heuristic - Initialization)”)
• the SAGE algorithm (3.97) with B̄ = 2B − 1, initialization by (3.86) (for
P = 1), and imax = 10 iterations as described in Section 3.3.5. (“MMSE
(SAGE)”)
For NP = 2, the even samples of the autocovariance sequence are estimated
with the biased heuristic or SAGE estimator before completing the neces-
sary odd values based on the minimax problem (3.102). We choose a fixed
maximum frequency fmax = 0.25 of the autocovariance sequences to be completed.
The minimax completion is solved via the second order cone formulation (3.105)
with SeDuMi [207].32

Fig. 3.10 Normalized MSE of channel estimates ĥ vs. B for correlated noise (Sce-
nario B, ci = 1).

Fig. 3.11 Normalized MSE of channel estimates ĥ vs. SIR = −10 log(ci ) for corre-
lated noise (Scenario B, B = 100).
The estimation accuracy is measured by the average MSE of univariate
MMSE prediction E[|ε_Q[q]|²] (ch [0] = 1) with estimation error ε_Q[q] = ĥ[q] −
h[q] and ĥ[q] = w^T y_T[q] (Section 2.3.1). It is averaged over 1000 estimates.
We choose a prediction order Q = 5, which requires a minimum of B = 6
observations to be able to estimate the necessary Q + 1 = 6 samples of the
autocovariance sequence (for NP = 1). Here, we do not consider the MSE in
32
Because we choose fmax = 0.25 the optimization problem (3.102) has a solution
for all estimates and the projection of the estimates on the set of partial sequences
which have a positive semidefinite band-limited (to fmax = 0.25) extension (3.107) is
not necessary.

Fig. 3.12 MSE of channel prediction (Q = 5) vs. maximum frequency fmax (Sce-
nario C, ch [0] = 1, ce = 0.01, B = 20).

Fig. 3.13 MSE of channel prediction (Q = 5) vs. iteration number (Scenario C,
ch [0] = 1, ce = 0.01).

the estimated autocovariance sequence, as it is not reduced significantly by
SAGE: The gain in MSE is small compared to the gains obtained in MMSE
prediction.
As lower bound on the MSE performance we employ the MSE of MMSE
prediction with perfect knowledge of the autocovariance sequence and Q = 5
(“MMSE (perfect knowledge)”). Simply using the last available observation
of the channel ĥ[q − 1], i.e., wT = [1, 0, . . . , 0], serves as an upper limit (a
trivial “predictor”), which should be outperformed by any more advanced
predictor (denoted as “Outdating” in the figures).
The error variance is chosen as ce = 0.01, which corresponds to N = 10
and cn = 0.1 (ce = cn /N as in (2.60)).
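For reference, a compact sketch of how the Q-th order MMSE (Wiener) predictor is obtained from autocovariance samples; the rectangular band-limited power spectral density with autocovariance c_h[ℓ] = sinc(2 f_max ℓ) is used here as a stand-in (Clarke's psd would use a Bessel function J_0 instead), with the scenario values Q = 5, f_max = 0.05, and c_e = 0.01:

```python
import numpy as np

Q, fmax, ce = 5, 0.05, 0.01
# Autocovariance samples c_h[0..Q] for a rectangular band-limited psd
c = np.sinc(2 * fmax * np.arange(Q + 1))     # c[0] = c_h[0] = 1

# Wiener predictor of h[q] from the Q past noisy observations y[q-1..q-Q]
idx = np.abs(np.subtract.outer(np.arange(Q), np.arange(Q)))
T = c[idx] + ce * np.eye(Q)        # Toeplitz covariance of the observations
r = c[1:Q + 1]                     # crosscovariance between h[q] and the past
w = np.linalg.solve(T, r)          # predictor coefficients w
mse = c[0] - r @ w                 # prediction MSE for perfect statistics
print(np.round(w, 3), round(mse, 4))
```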

Fig. 3.14 MSE of channel prediction (Q = 5) vs. B (Scenario C, ch [0] = 1, ce = 0.01,
fmax = 0.05).

Scenario C, NP = 1: Clarke’s model with ch [0] = 1 is selected for the
power spectral density of the autocovariance sequence (Figure 2.12). MMSE
prediction based on estimates from SAGE significantly improves the predic-
tion accuracy over the biased heuristic estimator (Figure 3.12). For B = 20,
it is already close to the lower bound. The SAGE algorithm converges in 7
iterations (Figure 3.13). Figure 3.14 shows the prediction MSE versus the
number of observations B for fmax = 0.05: More than B = 10 observations
should be available to ensure reliable prediction, which outperforms simple
“Outdating”. A stationarity time of the channel over B = 10 observations is
sufficient to obtain reliable estimates: The theory and measurements by Matz
[140] indicate a stationarity time between 0.7 s and 2 s, which suggests that a
sufficient number of (correlated) channel observations is available in wireless
communications.
Since Clarke’s power spectral density is very close to the rectangular power
spectral density, which is the worst-case power spectral density of the min-
imax predictor, the performance of minimax prediction and prediction with
perfect knowledge is very similar (Figure 2.15), but depends on the knowledge
of fmax .
Scenario D, NP = 1: Now, we consider the band-pass power spectral den-
sity with ch [0] = 1 (Figure 2.12). We also give the minimax robust predictor
based on the class of power spectral densities band-limited to fmax (fmax per-
fectly known) as a reference. This minimax predictor is too conservative for this
scenario (Section 2.4.3.4) in case of moderate to high Doppler frequencies.
Note that its performance depends on an (approximate) knowledge of fmax .
MMSE prediction based on the heuristic estimator performs worse than the
minimax predictor for low fmax (Figure 3.15). But with the SAGE estimator
the MSE performance of the predictor follows closely the MSE for prediction
with perfect knowledge of the autocovariance sequence.

Fig. 3.15 MSE of channel prediction (Q = 5) vs. maximum frequency fmax (Sce-
nario D, bandpass power spectral density, ch [0] = 1, ce = 0.01, B = 20).

Scenario C and D, NP = 2: Above, we have considered spacing of the ob-
servations with period NP = 1. For a spacing with NP = 2, i.e., ℓi = NP i − 1
in (3.20), the ML problem for estimation of the samples of the autocovari-
ance sequence, which are required for prediction by one sample, is ill-posed.
We propose an optimization problem for minimax MSE or minimum norm
completion of the missing (odd) samples in Section 3.4. From Figure 3.16
we conclude that prediction quality for fmax ≤ 0.1 is equivalent to NP = 1
(Figure 3.12). But the complexity of the completion step is significant. For
fmax ≥ 0.25, the MSE of prediction is very close to its maximum of one.
And for fmax ≥ 0.2, the gap between outdating and MMSE prediction with
perfect knowledge is small and the SAGE estimator with a completion based
on B = 20 observations is not accurate enough to provide a gain over “Out-
dating”.
This is different for the band-pass power spectral density (Figure 3.17):
Prediction accuracy based on SAGE is superior to all other methods. Only for
small fmax , minimax prediction is an alternative (given approximate knowl-
edge of fmax ).

Estimation of Spatial and Temporal Channel Correlations

Finally, we consider estimation of spatial and temporal channel correlations
C hT together with the noise variance (N = 16). The noise variance cn = 0.1
(C n,s = cn I M ) is estimated with (3.67). The estimators for C hT are:
• The heuristic biased estimator of C hT is given by (3.86) (“MMSE (Heuris-
tic — Initialization)”)

Fig. 3.16 MSE of channel prediction (Q = 5) vs. maximum frequency fmax with
completion of autocovariance sequence for period NP = 2 of observations (Scenario C,
ch [0] = 1, ce = 0.01, B = 20).

Fig. 3.17 MSE of channel prediction (Q = 5) vs. maximum frequency fmax with
completion of autocovariance sequence for period NP = 2 of observations (Scenario D,
bandpass power spectral density, ch [0] = 1, ce = 0.01, B = 20).

• ML estimation is based on the SAGE algorithm (3.85) with imax = 10
iterations, B̄ = 2B−1, and the assumption of block Toeplitz C hT . (“MMSE
(SAGE)”)
• The SAGE algorithm is also employed with the same parameters, but
assuming the Kronecker model for C hT from (3.98). It estimates tempo-
ral and spatial correlations for every transmitter separately and combines
them according to (3.98) as described in Section 3.3.6 (“MMSE (SAGE &
Kronecker)”).
The estimation quality is measured by the average MSE of multivariate
MMSE prediction (Q = 5 and NP = 1, Section 2.3.1), which is obtained from
200 independent estimates. As an upper limit, which should not be exceeded, we
choose the ML channel estimate based on the last observed training sequence
(“ML channel est. (Outdating)”). The lower bound is given by the MSE for
MMSE prediction with perfect knowledge of the covariance matrices.

Fig. 3.18 Normalized MSE of multivariate prediction (Q = 5) vs. maximum fre-
quency fmax (Scenario E, cn = 0.1).
Scenario E: Due to the high computational complexity for estimation of
C hT with block Toeplitz structure, we choose M = K = 4. The structure of
C hT is C hT = C T ⊗ C h (2.12), i.e., identical temporal correlations for all K
receivers. The block-diagonal C h is based on a Laplace angular power spec-
trum with spread σ = 5◦ and mean azimuth angles ϕ̄ = [−45◦ , −15◦ , 15◦ , 45◦ ]
(normalized to tr[C hT ] = M K). The temporal correlations are described by
Clarke’s power spectral density. We choose I = B = 20, i.e., 400 correlated
observations.
Although the number of observations is large, the performance degrada-
tion of SAGE with the block Toeplitz assumption is significant and the gain over
the heuristic biased estimator is too small to justify its high computational
complexity (Figure 3.18). But the assumption of C hT with Kronecker struc-
ture (3.98) improves the performance significantly and reduces the complexity
tremendously.
Chapter 4
Linear Precoding with Partial Channel
State Information

Introduction to Chapter

A wireless communication link from a transmitter with multiple antennas to
multiple receivers, which are decentralized and cannot cooperate, is a (vector)
broadcast channel (BC) from the point of view of information theory. Such
a point-to-multipoint scenario is also referred to as downlink in a cellular
wireless communication system (Figure 4.1).
If only one transmit antenna were available, the maximum sum capacity
would be achieved by transmitting to the receiver with the maximum signal-to-noise
ratio (SNR) in every time slot, i.e., by time division multiple access (TDMA)
with maximum throughput scheduling [83, p. 473].
of freedom resulting from multiple transmit antennas can be exploited to
improve the information rate: More than one receiver is served at the same
time and the data streams are (approximately) separated spatially (a.k.a.
space division multiple access (SDMA)). The sum capacity of the BC cannot
be achieved by linear precoding (see introduction to Chapter 5), but the
restriction to a linear structure Ttx (d[n]) = P d[n] (Figure 4.1) reduces the
computational complexity significantly [106].1
The design of a linear precoder (or beamformer) has attracted a lot of
attention during the previous ten years; it is impossible to give a complete
overview here. We list the main directions of research and refer to survey
articles with more detailed information. First we categorize them according
to their optimization criterion:
• Signal-to-interference-and-noise ratio (SINR) [19, 185]
• Mean square error (MSE) [116, 108]
• Information rate [83, 183]
• Various ad hoc criteria, e.g., least-squares (see references in [108]), appli-
cation of receive beamforming to precoding [19, 18].

1
Linear precoding is also referred to as beamforming, transmit processing, preequal-
ization (for complete CSI), or sometimes predistortion.


Fig. 4.1 Precoding for the wireless broadcast channel (BC) with multiple transmit
antennas and single antenna receivers.

They can also be distinguished by the channel state information (CSI), which
is assumed to be available at the transmitter (compare [19]): Complete CSI
(C-CSI), partial CSI (P-CSI), and statistical CSI (S-CSI) or channel distri-
bution information (see also Table 4.1 and Section 4.2 for our definitions).
For C-CSI, algorithms have been developed based on SINR, MSE, and infor-
mation rate. The case of P-CSI is mostly addressed for a quantized feedback
[107, 188]. Recently, the sum rate has been optimized in [31]. Robust optimization based
on MSE, which applies the Bayesian paradigm (Appendix C) for P-CSI, is
proposed in [53, 98, 58]. The standard approach related to SINR for S-CSI
is also surveyed in [19, 185]. A first step towards MSE-based optimization
for S-CSI which is based on receiver beamforming is introduced in [18]. A
similar idea is developed for sum MSE [112, 243] assuming a matched filter
receiver. A more systematic derivation for the sum MSE is given in [58], which
includes S-CSI as a special case.
A significant number of algorithms is based on the duality of the BC and
the (virtual) multiple access channel (MAC), which is proved for SINR in
[185] and for MSE (C-CSI) in [221, 146, 7].

Outline of Chapter

The system model for the broadcast channel and necessary (logical) training
channels is given in Section 4.1. After the definition of different degrees of
CSI at the transmitter and receiver in Section 4.2, we introduce the perfor-
mance measures for P-CSI in Section 4.3: The goal is to develop MSE-related
and mathematically tractable performance measures; we start from the point
of view of information theory to point out the relevance and deficiencies of
these practical optimization criteria. Our focus is the conceptual difference
between MSE and information rate; moreover, we develop approximations of
the BC which result in simple and suboptimum performance measures. We
also discuss adaptive linear precoding when only a common training chan-
nel is available, i.e., the receivers have incomplete CSI, which is important
in practice. Algorithms for minimizing the sum MSE are introduced in Sec-
tion 4.4. They can be generalized to nonlinear precoding, for example. Two
dualities between the BC and the (virtual) MAC for P-CSI are developed in
Section 4.5. In Section 4.6, we briefly introduce optimization of linear pre-
coding under quality of service constraints, in particular, for systems with a
common training channel.

Fig. 4.2 Time division duplex slot structure with alternating allocation of forward
link (FL) and reverse link (RL) time slots.

4.1 System Model for the Broadcast Channel

The communication links between a transmitter equipped with M antennas
and K single antenna receivers can be organized in a time division duplex
(TDD) mode. For notational simplicity, we assume that time slots (blocks)
with a duration of Tb alternate between the forward link (FL) and reverse
link (RL), i.e., are assigned with a period of 2Tb (NP = 2) in each direction
(Figure 4.2). The forward link from the transmitter to all receivers, which is
a.k.a. the downlink in a cellular communication system, consists of a (logical)
training channel with Nt training symbols followed by a (logical) data channel
with Nd data symbols.2 The physical channel is assumed constant during one
time slot (block) as in Section 2.1.3
Reciprocity of forward link and reverse link physical channels hk (Fig-
ure 4.3) is assumed, which allows the transmitter to obtain channel state
information (CSI) from observations of a training sequence in the reverse
link. The non-trivial aspects of hardware calibration to achieve (perfect)
reciprocity of the physical channel, which includes transmitter and receiver
hardware, are not considered here (e.g., see [87, 136]).

4.1.1 Forward Link Data Channel

In the forward link the modulated data dk [q tx , n], n ∈ {Nt + 1, . . . , Nt +
Nd }, for the kth receiver is transmitted in the q tx th time slot. The (linear)
precoder P is optimized for this time slot. The frequency flat channels (L = 0,
2
The slot duration is Tb = (Nd + Nt )Ts for a symbol period Ts .
3
The relative duration of the data to the training channel determines the throughput
penalty due to training.

Fig. 4.3 Forward link system model: Linear precoding with M transmit antennas to
K non-cooperative single-antenna receivers.

Fig. 4.4 System model for the forward link in matrix-vector notation.

Section 2.1) to all receivers are stacked in

$$H = \begin{bmatrix} h_1^T \\ h_2^T \\ \vdots \\ h_K^T \end{bmatrix} \triangleq H[q^{tx}] = \begin{bmatrix} h_1[q^{tx}]^T \\ h_2[q^{tx}]^T \\ \vdots \\ h_K[q^{tx}]^T \end{bmatrix} \in \mathbb{C}^{K\times M}, \qquad (4.1)$$

where hk ∈ CM contains the channel coefficients to the kth receiver (in the
complex baseband representation). The receive signal of the kth receiver for
the nth data symbol (n ∈ {Nt + 1, . . . , Nt + Nd }) is
$$r_k[q^{tx}, n] = h_k[q^{tx}]^T p_k\, d_k[q^{tx}, n] + \sum_{\substack{i=1 \\ i \neq k}}^{K} h_k[q^{tx}]^T p_i\, d_i[q^{tx}, n] + n_k[q^{tx}, n] \qquad (4.2)$$

including additive white Gaussian noise nk [q tx , n].
In the sequel, we omit the transmit slot index q tx and symbol index n and
write the receive signal based on (4.1) as (Figure 4.3)

$$r_k = h_k^T p_k d_k + \sum_{\substack{i=1 \\ i \neq k}}^{K} h_k^T p_i d_i + n_k = h_k^T P d + n_k. \qquad (4.3)$$
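A short simulation sketch of the data channel (4.3)/(4.5) (Python/NumPy; the dimensions, the random precoder, and the QPSK symbols are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, Ptx, cn = 4, 2, 1.0, 0.1     # antennas, users, power budget, noise variance

H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
P = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
P *= np.sqrt(Ptx / np.trace(P @ P.conj().T).real)   # enforce tr(P P^H) <= Ptx, cf. (4.6)

d = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), size=K)
n = np.sqrt(cn / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))

r = H @ P @ d + n                   # receive signals r_k = h_k^T P d + n_k, eq. (4.3)
print(r)
```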

Fig. 4.5 Forward link training channel in time slot q tx .

The linear precoders pk for all receivers are summarized in P =
[p1 , p2 , . . . , pK ] ∈ C^{M×K}. The modulated data symbols dk are modeled as un-
correlated, zero-mean random variables with variance one, i.e., E[dd^H ] = I_K
with d = [d1 , d2 , . . . , dK ]^T. They are chosen from a finite modulation al-
phabet D such as QPSK or 16QAM in practice, whereas for more theoretical
and fundamental considerations we choose them as complex Gaussian distributed
random variables representing the coded messages. The additive noise is com-
plex Gaussian distributed with variance cnk = E[|nk |2 ]. Data and noise are
assumed to be uncorrelated.
The receivers cannot cooperate. Linear receivers gk ∈ C yield an estimate
of the transmitted symbols

d̂k = gk rk . (4.4)

In matrix-vector notation we have (Figure 4.4)

d̂ = Gr = GHP d + Gn ∈ CK (4.5)

with receive signal r = [r1 , r2 , . . . , rK ]^T (equivalently for n) and the diagonal
receiver matrix and noise covariance matrix

$$G = \mathrm{diag}[g_1, g_2, \ldots, g_K] \quad \text{and} \quad C_n = E[nn^H] = \mathrm{diag}[c_{n_1}, c_{n_2}, \ldots, c_{n_K}],$$

respectively.
The precoder P is designed subject to a total power constraint

$$E[\|P d\|_2^2] = \mathrm{tr}\left[P P^H\right] \leq P_{tx}. \qquad (4.6)$$

4.1.2 Forward Link Training Channel

In the forward link, Nt training symbols sf [n] =
[sf,1 [n], sf,2 [n], . . . , sf,K [n]]^T ∈ B^K are transmitted at n ∈ {1, 2, . . . , Nt }

Fig. 4.6 Model of forward link training channel to receiver k for orthogonal training
sequences.

(Figure 4.5). For now, we assume one training sequence for each receiver.4
The training sequences are precoded with Q = [q 1 , . . . , q K ] ∈ CM ×K . For
example, Q can be chosen equal to P . The power constraint for the training
channel is

$$\|Q\|_F^2 \leq P_{tx}^{t}. \qquad (4.7)$$

The received training sequence

yf [q tx , n] = [yf,1 [q tx , n], yf,2 [q tx , n], . . . , yf,K [q tx , n]]T

at all receivers is

yf [q tx , n] = H[q tx ]Qsf [n] + n[q tx , n] ∈ CK . (4.8)

The noise random process is identical to the forward link data channel in (4.2)
and (4.5).
Omitting the current slot index q tx as above the signal at the kth receiver
reads

$$y_{f,k}[n] = h_k^T Q\, s_f[n] + n_k[n]. \qquad (4.9)$$

To introduce a model comprising all Nt training symbols we define the
kth row of S̄_f ∈ B^{K×Nt} as s_{f,k} = [s_{f,k}[1], s_{f,k}[2], . . . , s_{f,k}[Nt]]^T ∈ B^{Nt}. With
y_{f,k} = [y_{f,k}[1], y_{f,k}[2], . . . , y_{f,k}[Nt]]^T (equivalently for n_{f,k}) we obtain

$$y_{f,k}^T = h_k^T Q \bar{S}_f + n_{f,k}^T \in \mathbb{C}^{N_t}. \qquad (4.10)$$

Assuming orthogonal training sequences $\bar{S}_f \bar{S}_f^H = N_t I_K$, the receivers’ ML
channel estimate is (based on the corresponding model in Figure 4.6)

$$\widehat{h_k^T q_k} = s_{f,k}^H y_{f,k}/N_t = h_k^T q_k + e_k \qquad (4.11)$$

with variance $c_{e_k} = c_{n_k}/N_t$ of the estimation error $e_k$.
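A small sketch of the correlation estimator (4.11) (NumPy; the phase-rotation training sequences and parameter values are illustrative and merely satisfy the orthogonality condition S̄_f S̄_f^H = N_t I_K):

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, Nt, cn = 4, 2, 8, 0.1

h = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
Q = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(M)

# Orthogonal rows: S_f S_f^H = Nt I_K (phase rotations; K must divide Nt)
S_f = np.exp(2j * np.pi * np.outer(np.arange(K), np.arange(Nt)) / K)

k = 0
noise = np.sqrt(cn / 2) * (rng.standard_normal(Nt) + 1j * rng.standard_normal(Nt))
y_fk = h[k] @ Q @ S_f + noise            # received training sequence, eq. (4.10)
est = S_f[k].conj() @ y_fk / Nt          # correlate and scale, eq. (4.11)
print(est, h[k] @ Q[:, k])               # estimate vs. true effective channel
```

Correlating with the k-th sequence removes the other users' training signals exactly, leaving the effective channel h_k^T q_k plus a noise term of variance c_{n_k}/N_t, as stated above.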

4
Due to constraints in a standardized system, the number of training sequences may
be limited (compare Section 4.2.2).

Fig. 4.7 Observations of training sequences from Q previous reverse link slots.

4.1.3 Reverse Link Training Channel

From our assumption of reciprocity of the physical channel described by H
(modeling propagation conditions as well as hardware), it follows for the
reverse link channel that H_RL[q] = H_FL[q]^T = H[q]^T. The receive signal of the
nth training symbol s[n] ∈ BK from all K transmitters is

y [q, n] = H[q]T s[n] + v [q, n] ∈ CM (4.12)

with additive Gaussian noise v[q, n] ∼ N_c(0, C_v) (for example, C_v = cn I M ).
We have N training symbols per reverse link time slot. The total receive
signal y[q] = [y[q, 1]^T, y[q, 2]^T, . . . , y[q, N]^T]^T is

y [q] = Sh[q] + v [q] ∈ CM N (4.13)

with S = [s[1], s[2], . . . , s[N ]]^T ⊗ I_M (as in Section 2.1) and h[q] =
vec(H[q]^T), where the channels h_k[q] of all K terminals are stacked above
each other.
Due to our assumption of time slots which alternate between forward and
reverse link (Figure 4.2) at a period of NP = 2, observations y[q] from the
reverse link are available for q ∈ {q tx − 1, q tx − 3, . . . , q tx − (2Q − 1)} (for
some Q). All available observations about the forward link channel H[q tx ] at
time index q tx are summarized in

y T [q tx ] = S T hT [q tx ] + v T [q tx ] ∈ CQM N (4.14)

with y_T[q^{tx}] = [y[q^{tx} − 1]^T, y[q^{tx} − 3]^T, . . . , y[q^{tx} − (2Q − 1)]^T]^T. Omitting
the transmit slot index, we denote it

y T = S T hT + v T ∈ CQM N . (4.15)

This serves as our model to describe the transmitter’s CSI.



Category of CSI           Knowledge of                 Characterization by
Complete CSI (C-CSI)      channel realization          h
Partial CSI (P-CSI)       conditional channel PDF      p_{h|y_T}(h|y_T)
Statistical CSI (S-CSI)   channel PDF                  p_h(h)

Table 4.1 Categories of channel state information (CSI) at the transmitter and their
definitions.

4.2 Channel State Information at the Transmitter and Receivers

The degree of knowledge about the system’s parameters is different at the
transmitter and receivers. The most important system parameter is the chan-
nel state H, i.e., the channel state information (CSI); but also the variances
cnk of the additive noise at the receivers (Figure 4.3) and the assumed signal
processing regime at the other side of the communication link, P and {g_k}_{k=1}^{K},
respectively, are important for optimizing signal processing. We will see be-
low that this information is not identical at the transmitter and receivers in
the considered forward link time slot q tx . This asymmetry in system knowl-
edge, e.g., CSI, has a significant impact on the design methodology for the
transmitter and receivers: The constrained knowledge has to be incorporated
into an optimization problem for the precoding parameters which should be
mathematically tractable.

4.2.1 Transmitter

We optimize the transmitter, i.e., P , in time slot q tx . Due to the time-variant
nature of the channel H[q] and the TDD slot structure (Figure 4.2), only in-
formation about previous channel realizations is available.5 The observations
from Q previous reverse link training periods are collected in y T (4.15).
The CSI available at the transmitter is described by the probability density
function (PDF) ph|yT (h|y T ) of h = h[q tx ] given y T (Table 4.1). If we have full
access to h via y T , we refer to it as complete CSI (C-CSI) at the transmitter:
It is available if C v = 0M ×M and the channel is time-invariant, i.e., has a
constant autocovariance function. The situation in which y T contains only
information about h which is incomplete is referred to as partial CSI (P-

5
The assumption of a TDD system and reciprocity simplifies the derivations below.
In principle, the derivations are also possible for a frequency division duplex system
with a (quantized) feedback of the CSI available at the receivers to the transmitter.

CSI). If yT is statistically independent of h, i.e., only knowledge about the
channel’s PDF ph (h) = ph|yT (h|y T ) is available, this is termed statistical
CSI (S-CSI) or channel distribution information [83].
The common statistical assumptions about hT ∼ Nc (0, C hT ) and vT ∼
Nc (0, C vT ) yield a complex Gaussian conditional probability distribution
ph|yT (h|y T ).6
Additionally, we assume that the variances cnk of the noise processes at
the receivers are known to the transmitter.7
From its knowledge about the number of training symbols in the forward
link and the noise variances at the receivers the transmitter can infer the
receivers’ degree of CSI. Although the transmitter cannot control the re-
ceivers’ signal processing capabilities, it can be optimistic and assume, e.g.,
the Cramér-Rao lower bound for its estimation errors. When employing in-
formation theoretic measures to describe the system’s performance at the
transmitter, we make additional assumptions about the receivers’ decoding
capabilities.

4.2.2 Receivers

The kth receiver has only access to P and H via the received training
sequence in the forward link y f,k (4.10) and the observations of the data
rk (4.3). It is obvious that the variance crk of the received data signal can be
estimated directly, as well as the overall channel h_k^T q_k (4.11), which consists
of the physical channel to receiver k and the precoder for the kth training
sequence. With the methods from Chapter 3 also the sum of the noise and
interference variance is available.
Therefore, the receivers do not have access to P and H separately, but
only to a projection of the corresponding channel hk on a precoding vector
of the training channel q k . If the precoding for the data and training sig-
nals are identical, i.e., Q = P , we expect to have all relevant information
at a receiver for its signal processing, e.g., an MMSE receiver gk or ML de-
coding (depending on the conceptual framework in Section 4.3).8 But these
realizability constraints have to be checked when optimizing the receivers.
There are standardized systems, where the number of training sequences
S is smaller than the number of receivers. For example, in WCDMA of the
UMTS standard [1] different training (pilot) channels are available (see the
detailed discussion in [152, 167]):

6
The assumption about the noise is justified by the TDD model; in case of a low
rate quantized feedback it is not valid anymore.
7
In cellular systems with interference from adjacent cells which is not a stationary
process, this issue requires further care.
8
It may be inaccurate due to the finite number Nt of training symbols.

• A primary common training channel which is transmitted in the whole
cell sector (S = 1).
• A secondary common training channel which allows for multiple training
sequences (its number S is significantly less than the number of receivers).
• One dedicated training channel per receiver which is transmitted at a sig-
nificantly lower power than the primary common training channel.
If only one training sequence (S = 1) is available, it is shared by all receivers
(common training sequence). For example, it is precoded with a fixed precoder
q ∈ CM , which may be a “sector-beam” or omnidirectional via the first
antenna element (q = e1 ). Alternatively, a small number S < K of training
sequences, e.g., S = 4 as in Figure 4.13, is available; they are precoded with a
fixed (non-adaptive) precoder Q = [q 1 , q 2 , . . . , q S ]. For example, a fixed grid-
of-beams is chosen where every q i is an array steering vector (Section E) of
the antenna array corresponding to a spatial direction.
With a common training channel the kth receiver can only estimate h_k^T q_k
and not h_k^T p_k as required for a coherent demodulation of the data sym-
bols. This incomplete CSI presents a limitation for the design of precoding,
because in general the transmitter cannot signal further information about
pk to the receivers. Optimization of the precoder P has to incorporate this
restriction of the receivers’ CSI, which we address in Sections 4.3.3 and 4.6.

4.3 Performance Measures for Partial Channel State Information

When optimizing a precoder based on P-CSI at the transmitter, attention
has to be paid to the choice of the optimization criterion which measures the
transmission quality over the communication channel to every receiver. The
choice of a performance measure is influenced by several aspects, which are
partially in conflict:
• It should describe the transmission quality and the quality of service (QoS)
constraints of every receiver’s data.
• Its mathematical structure should be simple enough to be analytically
tractable.
• The available CSI should be fully exploited, i.e., uncertainties in CSI are
modeled and considered.
• It needs to take into account restrictions on the receivers regarding their
processing capabilities and channel state information:
– explicitly by modeling the receivers’ processing capabilities or
– implicitly by creating precoding solutions, which are in accordance with
these restrictions.

Fig. 4.8 Link model for kth receiver in the broadcast channel (BC) (equivalent to
Figure 4.3).

The following argumentation starts with the point of view of information
theory. The achievable error-free information rate is given by the mutual
information between the coded data signal and received signals. Implicitly, it
assumes an ML receiver [40] for the transmitted code word of infinite length.
The information theoretic performance measures for the transmission quality
are difficult to handle analytically for P-CSI. We approximate these measures
simplifying the broadcast channel model (Figure 4.8).
A lower bound on the mean information rate is determined by the expected
minimum MSE in the estimates d̂k . The expected MSE is a criterion of its
own, but this relation shows its relevance for a communication system. It
assumes an MMSE receiver gk , which is optimum from the perspective of
estimation theory. This less ambitious receiver model may be considered more
practical.9 Also, note that the information rate is independent of the scaling
gk for the receive signal.
In a last step, we include restrictions on the receivers’ CSI explicitly. All
MSE-based optimization criteria are summarized and compared qualitatively
at the end regarding their accuracy-complexity trade-off.

4.3.1 Mean Information Rate, Mean MMSE, and Mean SINR

Complete Channel State Information at the Transmitter

Figure 4.8 gives the link model to the kth receiver, which is equivalent to the
broadcast channel in (4.3) and Figure 4.3. The interference from the data
{d_i}_{i≠k} designated to the other receivers is summarized in

$$i_k = \sum_{\substack{i=1 \\ i \neq k}}^{K} h_k^T p_i\, d_i, \qquad (4.16)$$

which is a zero-mean random variable (di are zero-mean) with variance

9
For example, the MMSE determines also the bit error rate for QAM modulation
[160].
$$c_{i_k} = \sum_{\substack{i=1 \\ i \neq k}}^{K} |h_k^T p_i|^2. \qquad (4.17)$$

We assume that dk are zero-mean complex Gaussian distributed with vari-
ance one, for simplicity.10 Therefore, for known hk the interference ik is also
complex Gaussian.
Under these constraints and for C-CSI at the transmitter and receivers
the mutual information I(dk ; rk ), i.e., the achievable information rate, of the
kth receiver’s communication link, which is an additive white Gaussian noise
(AWGN) channel, is well known and reads

Rk (P ; hk ) = I(dk ; rk ) = log2[1 + SINRk (P ; hk )] (4.18)

with

$$\mathrm{SINR}_k(P; h_k) = \frac{|h_k^T p_k|^2}{\sum_{\substack{i=1 \\ i \neq k}}^{K} |h_k^T p_i|^2 + c_{n_k}}. \qquad (4.19)$$

The maximum sum rate (throughput) is a characteristic point of the
region of achievable rates, on which we focus in the sequel. The sum rate
does not include fairness or quality of service constraints, but measures the
maximum achievable error-free data throughput in the system with channel
coding. It can be written as

$$R_{\mathrm{sum}}(P; h) = \sum_{k=1}^{K} R_k(P; h_k) = \log_2\left[\prod_{k=1}^{K} (1 + \mathrm{SINR}_k(P; h_k))\right] \qquad (4.20)$$
$$= -\log_2\left[\prod_{k=1}^{K} \mathrm{MMSE}_k(P; h_k)\right]$$
$$\geq K \log_2[K] - K \log_2\left[\sum_{k=1}^{K} \mathrm{MMSE}_k(P; h_k)\right] \qquad (4.21)$$

in terms of

$$\mathrm{MMSE}_k(P; h_k) = 1/(1 + \mathrm{SINR}_k(P; h_k)) \qquad (4.22)$$

and h = vec(H^T). The lower bound in (4.21) based on the sum MMSE
follows from the geometric-arithmetic mean inequality

10
The Gaussian probability distribution is capacity achieving in the broadcast chan-
nel with C-CSI at the transmitter, which requires dirty paper coding [232]. Under the
restriction of P-CSI it is not known whether this distribution is capacity achieving.
$$\left(\prod_{k=1}^{K} \mathrm{MMSE}_k(P; h_k)\right)^{1/K} \leq \sum_{k=1}^{K} \mathrm{MMSE}_k(P; h_k) \Big/ K. \qquad (4.23)$$

The minimum of

$$\mathrm{MSE}_k(P, g_k; h_k) = E_{r_k,d_k}\left[|g_k r_k - d_k|^2\right] = 1 + |g_k|^2 c_{r_k} - 2\mathrm{Re}\left\{g_k h_k^T p_k\right\}$$
$$= 1 + |g_k|^2 \sum_{i=1}^{K} |h_k^T p_i|^2 + |g_k|^2 c_{n_k} - 2\mathrm{Re}\left\{g_k h_k^T p_k\right\} \qquad (4.24)$$

with variance of the receive signal $c_{r_k} = |h_k^T p_k|^2 + c_{i_k} + c_{n_k}$ is achieved by11

$$g_k = g_k^{\mathrm{mmse}}(P, h_k) = (h_k^T p_k)^*/c_{r_k}, \qquad (4.25)$$

which estimates the transmitted (coded) data dk. It reads

$$\mathrm{MMSE}_k(P; h_k) = \min_{g_k} \mathrm{MSE}_k(P, g_k; h_k) = 1 - \frac{|h_k^T p_k|^2}{c_{r_k}}. \qquad (4.26)$$
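For concreteness, a small helper evaluating (4.18), (4.19), (4.22), and (4.26) together with the bound (4.21) (NumPy; the random channel and the simple matched filter precoder are illustrative choices):

```python
import numpy as np

def c_csi_metrics(H, P, cn):
    """Per-user SINR (4.19), MMSE (4.22)/(4.26), and rate (4.18) for C-CSI."""
    G = np.abs(H @ P) ** 2               # G[k, i] = |h_k^T p_i|^2
    sig = np.diag(G)                     # useful signal power
    interf = G.sum(axis=1) - sig         # interference power
    sinr = sig / (interf + cn)
    return sinr, 1.0 / (1.0 + sinr), np.log2(1.0 + sinr)

rng = np.random.default_rng(3)
K, M = 2, 4
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
P = H.conj().T                           # matched filter precoder (illustrative)
P *= np.sqrt(1.0 / np.trace(P @ P.conj().T).real)

sinr, mmse, rate = c_csi_metrics(H, P, cn=0.1)
print(rate.sum())                                    # sum rate (4.20)
print(K * np.log2(K) - K * np.log2(mmse.sum()))      # lower bound (4.21)
```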

Example 4.1. An algorithm which maximizes (4.20) directly for C-CSI (see
also (4.28)) is given in [59].12 The comparison in [183] shows that other algo-
rithms perform similarly. In Figure 4.9 we compare the sum rate obtained by
[59] (“Max. sum rate”) with the solution for minimizing the sum MSE, i.e.,
maximizing the lower bound (4.21), given by [102]. As reference we give the
sum rate for time division multiple access (TDMA) with maximum through-
put (TP) scheduling where a matched filter p_k = h_k^*/‖h_k‖_2^2 is applied for the
selected receiver; for M = 1, this strategy maximizes the sum rate. Fairness is
not considered in any of the schemes.
The results show that the sum MSE yields a performance close to max-
imizing the sum rate if K/M < 1. For K = M = 8, the difference is sub-
stantial: Minimizing the MSE does not yield the full multiplexing gain [106]
$\lim_{P_{tx}\to\infty} R_{\mathrm{sum}}(P; h)/\log_2[P_{tx}] = \min(M, K)$ at high SNR.
The gain of both linear precoders compared to TDMA (with almost iden-
tical rate for K = 8 and K = 6), which schedules only one receiver at a time,
is substantial. ⊓⊔

Partial Channel State Information at the Transmitter

So far our argumentation has been based on the assumption of C-CSI at
the transmitter. With P-CSI at the transmitter only the probability density
11
The LMMSE receiver is realizable for the assumptions in Section 4.2.2.
12
Note that (4.20) is not concave (or convex) in P and has multiple local maxima.

Fig. 4.9 Mean sum rate for linear precoding with C-CSI: Comparison of direct max-
imization of sum rate [59], minimization of sum MSE (4.69), and TDMA with maxi-
mum throughput (TP) scheduling (Scenario 1 with M = 8, cf. Section 4.4.4).

ph|yT (h|y T ) (Table 4.1) is available for designing P . There are two standard
approaches for dealing with this uncertainty in CSI:
• The description of the link by an outage rate and its outage probability,
which aims at the transmission of delay sensitive data.
• The mean or ergodic information rate with the expectation of (4.18) w.r.t.
the probability distribution of the unknown channel h and the observations
yT . Feasibility of this information rate requires a code long enough to
include many statistically independent realizations of h and yT .
Here, we aim at the mean information rate subject to an instantaneous power
constraint on every realization of P(y T ). We maximize the sum of the mean
information rates to all receivers

$$\max_{\mathcal{P}} \; E_{h,y_T}[R_{\mathrm{sum}}(\mathcal{P}(y_T); h)] \quad \text{s.t.} \quad \|\mathcal{P}(y_T)\|_F^2 \leq P_{tx} \qquad (4.27)$$

over all functions

$$\mathcal{P}: \mathbb{C}^{MNQ} \to \mathbb{C}^{M\times K}, \quad P = \mathcal{P}(y_T).$$

This is a variational problem. Because of the instantaneous power constraint,
its argument is identical to an optimization w.r.t. P ∈ C^{M×K} given y_T

$$\max_{P} \; E_{h|y_T}[R_{\mathrm{sum}}(P; h)] \quad \text{s.t.} \quad \|P\|_F^2 \leq P_{tx}. \qquad (4.28)$$

The first difficulty lies in finding an explicit expression for
E_{h|y_T}[R_sum(P; h)], i.e., E_{h_k|y_T}[R_k(P; h_k)]. With (4.22) and Jensen’s
inequality the mean rate to every receiver can be lower bounded by the
mean MMSE and upper bounded by the mean SINR:

$$-\log_2\left[E_{h_k|y_T}[\mathrm{MMSE}_k(P; h_k)]\right] \leq E_{h_k|y_T}[R_k(P; h_k)] \leq \log_2\left[1 + E_{h_k|y_T}[\mathrm{SINR}_k(P; h_k)]\right]. \qquad (4.29)$$

Thus, Ehk |yT [MMSEk (P ; hk )] gives an inner bound on the mean rate re-
gion, which is certainly achievable, whereas the upper bound based on
Ehk |yT [SINRk (P ; hk )] does not guarantee achievability. With these bounds,
the mean sum rate is lower bounded by the mean sum of receivers’ MMSEs
and upper bounded by the sum rate corresponding to mean SINRs:
$$K \log_2[K] - K \log_2\left[\sum_{k=1}^{K} E_{h_k|y_T}[\mathrm{MMSE}_k(P; h_k)]\right] \leq E_{h|y_T}[R_{\mathrm{sum}}(P; h)]$$
$$\leq \sum_{k=1}^{K} \log_2\left[1 + E_{h_k|y_T}[\mathrm{SINR}_k(P; h_k)]\right]. \qquad (4.30)$$

But the lower as well as the upper bound cannot be computed analytically
either, because MMSEk and SINRk are ratios of quadratic forms in complex
Gaussian random vectors hk and no explicit expression for their first order
moment has been found yet [161].13
The lower bound, although it may be loose, yields an information theoretic
justification for minimizing the conditional mean estimate of the sum MMSE

$$\min_{P} \; \sum_{k=1}^{K} E_{h_k|y_T}[\mathrm{MMSE}_k(P; h_k)] \quad \text{s.t.} \quad \|P\|_F^2 \leq P_{tx}. \qquad (4.31)$$
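The objective in (4.31) has no closed form, but it can be approximated by Monte Carlo integration over the conditional channel distribution; a sketch under the simplifying assumption of a common Gaussian error covariance for all users (all parameter values are illustrative):

```python
import numpy as np

def mc_sum_mmse(P, h_mean, C_err, cn, n_samples=2000, seed=4):
    """Monte Carlo estimate of sum_k E_{h_k|yT}[MMSE_k(P; h_k)], cf. (4.31).
    h_mean: K x M conditional channel means; C_err: common M x M error covariance."""
    rng = np.random.default_rng(seed)
    K, M = h_mean.shape
    L = np.linalg.cholesky(C_err + 1e-12 * np.eye(M))
    total = 0.0
    for _ in range(n_samples):
        W = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
        H = h_mean + W @ L.T                    # draw h_k ~ Nc(mean_k, C_err)
        G = np.abs(H @ P) ** 2
        # MMSE_k = 1 - |h_k^T p_k|^2 / c_rk with c_rk = sum_i |h_k^T p_i|^2 + cn, cf. (4.26)
        total += (1.0 - np.diag(G) / (G.sum(axis=1) + cn)).sum()
    return total / n_samples

rng = np.random.default_rng(4)
K, M = 2, 4
h_mean = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
P = h_mean.conj().T                             # matched filter to the mean (illustrative)
P *= np.sqrt(1.0 / np.trace(P @ P.conj().T).real)
print(mc_sum_mmse(P, h_mean, 0.1 * np.eye(M), cn=0.1))
```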

Optimization based on a conditional mean estimate of the cost function can
also be regarded as a robust optimization using the Bayesian paradigm (Ap-
pendix C).
The receivers’ statistical signal model defining the MSE, which results in
MMSE_k(P; h_k), is

$$E_{d_k,r_k}\left[\begin{bmatrix} d_k \\ r_k \end{bmatrix}\begin{bmatrix} d_k^* & r_k^* \end{bmatrix}\right] = \begin{bmatrix} 1 & (h_k^T p_k)^* \\ h_k^T p_k & c_{r_k} \end{bmatrix}. \qquad (4.32)$$

The LMMSE receiver corresponding to the lower bound assumes perfect
knowledge of h_k^T p_k and c_{r_k}, which is available at the receiver for Q = P .
In summary, the mean (ergodic) information rate Ehk ,yT [Rk (P(yT ); hk )] as
performance measure is achievable under the following assumptions:

13
The probability distribution of the SINR [139] suggests that an analytical expres-
sion would not be tractable.
136 4 Linear Precoding with Partial Channel State Information

Fig. 4.10 Simplified AWGN fading BC model for the link to receiver k in Figure 4.8.

• Gaussian code books (coded data) of infinite length with rate equal to
Ehk ,yT [Rk (P(yT ); hk )],
• many statistically independent realizations of h and yT over the code
length,
• knowledge of h_k^T p_k and c_{i_k} + c_{n_k} at receiver k,
• receivers with ML decoding (or decoding based on typicality).
The conditional mean estimate Ehk |yT [MMSEk (P ; hk )] has a close relation to
the mean information rate. Thinking in the framework of estimation theory
(and not information theory), it assumes only LMMSE receivers.

4.3.2 Simplified Broadcast Channel Models: AWGN Fading BC,
AWGN BC, and BC with Scaled Matched Filter Receivers

Now, we could pursue and search for approximations to
E_{h_k|y_T}[MMSE_k(P; h_k)] or E_{h_k|y_T}[SINR_k(P; h_k)], e.g., as in [161] for
the first order moment of a ratio of quadratic forms. We take a different
approach and approximate the BC model in Figure 4.8 in order to obtain
more tractable performance measures describing the approximated and
simplified model. Thus, we expect to gain some more intuition regarding the
value of the approximation.
The approximations are made w.r.t.
• the model of the interference (⇒ AWGN fading BC model),
• the model of the channel and the interference (⇒ AWGN BC model),
• and the transmitter’s model of the receiver gk (in the MSE framework).

AWGN Fading Broadcast Channel

In a first step, we choose a complex Gaussian model for the interference
i_k (4.16) with the same mean and variance, i.e., it has zero mean and variance

$$\bar{c}_{i_k} = \sum_{\substack{i=1 \\ i \neq k}}^{K} E_{h_k|y_T}\left[|h_k^T p_i|^2\right] = \sum_{\substack{i=1 \\ i \neq k}}^{K} p_i^H R_{h_k|y_T}^* p_i. \qquad (4.33)$$

It is assumed to be statistically independent of the data signal h_k^T p_k d_k. With
P-CSI at the transmitter this yields a Rice fading channel with AWGN, where
h_k^T p_k and c̄_{i_k} + c_{n_k} are perfectly known to the kth receiver (Figure 4.10). The
mean information rate of this channel is bounded by (cf. (4.29))

$$-\log_2\left[E_{h_k|y_T}\left[\mathrm{MMSE}_k^{\mathrm{FAWGN}}(P; h_k)\right]\right] \leq E_{h_k|y_T}\left[R_k^{\mathrm{FAWGN}}(P; h_k)\right] \leq R_k^{\mathrm{AWGN}} = \log_2\left[1 + \mathrm{SINR}_k(P)\right] \qquad (4.34)$$

according to Jensen’s inequality and with the definitions

$$R_k^{\mathrm{FAWGN}}(P; h_k) = \log_2\left[1 + \frac{|h_k^T p_k|^2}{\bar{c}_{i_k} + c_{n_k}}\right]$$
$$\mathrm{SINR}_k^{\mathrm{FAWGN}}(P; h_k) = \frac{|h_k^T p_k|^2}{\bar{c}_{i_k} + c_{n_k}} \qquad (4.35)$$
$$\mathrm{MMSE}_k^{\mathrm{FAWGN}}(P; h_k) = 1 - \frac{|h_k^T p_k|^2}{|h_k^T p_k|^2 + \bar{c}_{i_k} + c_{n_k}}.$$

It is interesting that the upper bound depends on the widely employed SINR
definition14 (e.g. [19, 185])

$$\mathrm{SINR}_k(P) = \frac{E_{h_k|y_T}\left[|h_k^T p_k|^2\right]}{\sum_{\substack{i=1 \\ i \neq k}}^{K} E_{h_k|y_T}\left[|h_k^T p_i|^2\right] + c_{n_k}} = \frac{p_k^H R_{h_k|y_T}^* p_k}{\bar{c}_{i_k} + c_{n_k}}, \qquad (4.36)$$

where the expected value is taken in the numerator and denominator of (4.19)
independently.15
For S-CSI with µ_h = 0, we have Rayleigh fading channels to every receiver.
Their mean information rate can be computed explicitly16

$$E_{h_k|y_T}\left[R_k^{\mathrm{FAWGN}}(P; h_k)\right] = \mathrm{E}_1\left(\mathrm{SINR}_k(P)^{-1}\right)\exp\left(\mathrm{SINR}_k(P)^{-1}\right)\log_2[e].$$

Again the mean rate is determined by SINRk .17 The dependency on only
SINRk has its origin in our Gaussian model for the interference, which is un-

14
Typically, no reasoning is given in the literature about the relevance of this rather
intuitive performance measure for describing the broadcast channel in Figure 4.8.
15
This corresponds to the approximation of the first order moment for the SINR
based on [161], which is accurate for P-CSI close to C-CSI (small error covariance
matrix C h|yT ).
16
The exponential integral is defined as [2]
$$\mathrm{E}_1(x) = \int_x^{\infty} \frac{\exp(-t)}{t}\, dt. \qquad (4.37)$$

17
Similarly, under the same assumption, the mean MMSE can be computed explicitly:
$E_{h_k|y_T}[\mathrm{MMSE}_k^{\mathrm{FAWGN}}(P; h_k)] = \mathrm{SINR}_k(P)^{-1}\,\mathrm{E}_1\left(\mathrm{SINR}_k(P)^{-1}\right)\exp\left(\mathrm{SINR}_k(P)^{-1}\right)$
as a function of 1/SINR_k(P).

Fig. 4.11 Simplified AWGN BC model for the link to receiver k in Figure 4.8.

correlated with the received data signal. But due to the exponential integral
the explicit expression of the mean rate for S-CSI is not very appealing for
optimization.
The minimum MMSE_k^{FAWGN}(P; h_k) (4.35) relies on the statistical signal
model

$$E_{d_k,r_k}\left[\begin{bmatrix} d_k \\ r_k \end{bmatrix}\begin{bmatrix} d_k^* & r_k^* \end{bmatrix}\right] = \begin{bmatrix} 1 & (h_k^T p_k)^* \\ h_k^T p_k & |h_k^T p_k|^2 + \bar{c}_{i_k} + c_{n_k} \end{bmatrix} \qquad (4.38)$$

of the receiver. This is an approximation to the original LMMSE receiver’s
model (4.32) with a mean variance of the interference.

AWGN Broadcast Channel

As the performance measures for the AWGN fading BC model (Fig-
ure 4.10) can only be evaluated numerically, we also make the Gaussian
approximation for the data signal h_k^T p_k d_k: It is zero-mean with variance
E_{h_k|y_T}[|h_k^T p_k|^2] because of E[d_k] = 0. This approximation yields the AWGN
BC model in Figure 4.11 with channel gain given by the standard devia-
tion E_{h_k|y_T}[|h_k^T p_k|^2]^{1/2}. It is a system model based on equivalent Gaussian
distributed signals having identical first and second order moments as the
signals in Figure 4.8.18

The information rate for the AWGN BC model is

$$R_k^{\mathrm{AWGN}}(P) = \log_2\left[1 + \mathrm{SINR}_k(P)\right] \qquad (4.39)$$

with SINR_k(P) as defined in (4.36).


Switching again to the estimation theoretic perspective, the receiver gk
minimizing
 
MSEk (P , gk ) = Erk ,dk |gk rk − dk |2 (4.40)
h i “ ” “ ”
Ehk |yT MMSEFAWGN
k (P ; hk ) = SINRk (P )−1 E1 SINRk (P )−1 exp SINRk (P )−1

as a function of 1/SINRk (P ).
18
Similar approximations are made for system level considerations in a commu-
nication network. In [112, 243] a similar idea is pursued for optimization of linear
precoding for rank-one covariance matrices C hk .

based on the receivers’ statistical signal model

$$E_{d_k,r_k}\left[\begin{bmatrix} d_k \\ r_k \end{bmatrix}\begin{bmatrix} d_k^* & r_k^* \end{bmatrix}\right] = \begin{bmatrix} 1 & \sqrt{E_{h_k|y_T}\left[|h_k^T p_k|^2\right]} \\ \sqrt{E_{h_k|y_T}\left[|h_k^T p_k|^2\right]} & \bar{c}_{r_k} \end{bmatrix}, \qquad (4.41)$$

where $\bar{c}_{r_k} = E_{h_k|y_T}\left[|h_k^T p_k|^2\right] + \bar{c}_{i_k} + c_{n_k}$, is

$$g_k = \frac{\sqrt{E_{h_k|y_T}\left[|h_k^T p_k|^2\right]}}{\bar{c}_{r_k}}. \qquad (4.42)$$

It yields the minimum

$$\mathrm{MMSE}_k(P) = \min_{g_k} \mathrm{MSE}_k(P, g_k) = 1 - \frac{E_{h_k|y_T}\left[|h_k^T p_k|^2\right]}{\bar{c}_{r_k}} = \frac{1}{1 + \mathrm{SINR}_k(P)} \qquad (4.43)$$

with a relation between MMSEk and SINRk in analogy to (4.22) for C-CSI
at the transmitter.
Back at the information theoretic point of view, maximization of the sum
rate for the AWGN BC model
K
X AWGN
max Rk (P ) s.t. kP k2F ≤ Ptx (4.44)
P
k=1

can be lower bounded using (4.23)


" K
# K
X X AWGN
K log2[K] − K log2 MMSEk (P ) ≤ Rk (P ). (4.45)
k=1 k=1

Therefore, maximization of the lower bound is equivalent to minimization of


the sum MSE for this model:
K
X
min MMSEk (P ) s.t. kP k2F ≤ Ptx . (4.46)
P
k=1

Irrespective of its relation to information rate, precoder design based on MSE
is interesting on its own. The cost function is not convex,19 but given in closed
form. Moreover, it is closely related to the performance measure SINRk , which
is standard in the literature, via (4.43).

19
For example, it is not convex in pk keeping pi , i ≠ k, constant.

Fig. 4.12 Bounds on mean sum rate evaluated for the solution of (4.46) based on S-
CSI with the algorithm from Section 4.4.1: Upper bound based on mean SINR (4.30),
lower bound with mean sum MMSE (4.30), and the sum rate of the AWGN BC
model (4.44) (Scenario 1, M = K = 8 and σ = 5◦ ).

The relation of the information rate for the AWGN BC model to the
rate for the original model in Figure 4.8 is unclear. Numerical evaluation of
the sum rate shows that $\sum_{k=1}^{K} R_k^{\mathrm{AWGN}}(P)$ is typically an upper bound to
E_{h,y_T}[R_sum(P(y_T); h)] (4.27).
Example 4.2. In Figure 4.12 we evaluate the different bounds on the mean
sum rate (4.20) numerically for Scenario 1 defined in Section 4.4.4.20 As
precoder P we choose the solution given in Section 4.4.1 to problem (4.46)
for S-CSI. The sum rate for mean SINR (4.30) (upper bound) and the rate for
mean sum MMSE (4.30) (lower bound) are a good approximation for small
SNR. The sum rate of the AWGN BC model (4.44) is also close to the true
mean sum rate for small SNR, but the good approximation quality at high
SNR depends on the selected P . ⊓⊔

System Model with Scaled Matched Filter Receivers

The previous two models result from a Gaussian approximation of the inter-
ference and data signal. But there is also the possibility of assuming a more
limited CSI at the receiver or receivers with limited signal processing capabil-
ities. Following this concept, we construct an alternative approach to obtain
a solution of (4.46) (cf. Section 4.4). It can be viewed as a mere mathematical
trick, which is motivated by a relevant receiver. Let us start by presenting some
intuitive ideas.
20
The bounds for the AWGN fading BC are not shown, because they are of minor
importance in the sequel.

A common low-complexity receiver is a matched filter, which is matched
to the data channel. The scalar matched filter, which is matched to h_k^T p_k
and scaled by the mean variance of the received signal, is21

$$g_k = g_k^{\mathrm{cor}}(P, h_k) = \frac{(h_k^T p_k)^*}{\bar{c}_{r_k}}. \qquad (4.47)$$

It is the solution of the optimization problem

$$\max_{g_k} \; 2\mathrm{Re}\left\{g_k h_k^T p_k\right\} - |g_k|^2 \bar{c}_{r_k} \qquad (4.48)$$

maximizing the real part of the crosscorrelation between d̂_k and d_k with a
regularization of the norm of g_k. This is equivalent to minimizing

$$\mathrm{COR}_k(P, g_k; h_k) = 1 + |g_k|^2 \bar{c}_{r_k} - 2\mathrm{Re}\left\{g_k h_k^T p_k\right\}, \qquad (4.49)$$

which yields

$$\mathrm{MCOR}_k(P; h_k) = \min_{g_k} \mathrm{COR}_k(P, g_k; h_k) = 1 - \frac{|h_k^T p_k|^2}{\bar{c}_{r_k}}. \qquad (4.50)$$

Note that the similarity (convexity and structure) of COR_k(P, g_k; h_k) with
MSE_k(P, g_k; h_k) (4.24) is intended, but MCOR_k(P; h_k) is often negative.22
But on average its minimum is positive, because of

$$\mathrm{MMSE}_k(P) = E_{h_k|y_T}[\mathrm{MCOR}_k(P; h_k)] = 1 - \frac{E_{h_k|y_T}\left[|h_k^T p_k|^2\right]}{\bar{c}_{r_k}} \geq 0. \qquad (4.52)$$

This relation and the similarity of COR_k(P, g_k; h_k) with MSE_k(P, g_k; h_k)
can be exploited to find algorithms for solving (4.46) (Section 4.4) or to
prove a duality (Section 4.5).
Moreover, CORk (P , gk ; hk ) is a cost function, which yields a practically
reasonable receiver (4.47) at its minimum. Assuming a receiver based on
the scaled matched filter (MF), the transmitter can minimize the condi-
tional mean estimate Ehk |yT [MCORk (P ; hk )] to improve the resulting re-

21
It is realizable under the conditions of Section 4.2.2.
22
It is negative whenever

1 (hT ∗
» –
k pk ) (4.51)
T
hk pk c̄rk

is indefinite (if c̄rk < |hT 2


k pk | ), i.e., is not a valid covariance matrix of the signals dk
and rk . This is less likely for small SINRk at S-CSI or for P-CSI which is close to
C-CSI.
142 4 Linear Precoding with Partial Channel State Information

ceiver cost.23 But this argument is of limited interest, because the relevance
of CORk (P , gk ; hk ) for a communication link remains unclear: Only for large
Ptx and C hk with rank one, it yields the identical cost as the MMSE receiver.

4.3.3 Modeling of Receivers’ Incomplete Channel State


Information

Generally, the receivers’ CSI can be regarded as complete relative to the


uncertainty in CSI at the transmitter, because considerably more accurate
information about hT k pk , cnk + cik , and crk is available to the kth receiver
than to the transmitter about H. But this is only true where the precoding
for the training channel is Q = P and a sufficient number of training symbols
Nt is provided.
Knowing the number of training symbols Nt , the noise variances at the
receiver cnk , and the precoder Q for the training channel, the transmitter has
a rather accurate model of the receivers’ CSI (cf. Section 4.2). This can be
incorporated in the transmitter’s cost function for optimizing P .
Information theoretic results for dealing with incomplete CSI at the re-
ceiver provide lower bounds on the information rate [143, 90]. Instead of
working with these lower bounds we discuss MSE-based cost functions to
describe the receivers’ signal processing capabilities.
Assume that the CSI at the kth receiver is given by the observation
y f,k (4.10) of the forward link training sequence and the conditional dis-
tribution phk |yf,k (hk |y f,k ). The receiver’s signal processing gk can be based
on the conditional mean estimate of the MSE (4.24)

K
X
Ehk |yf,k [MSEk (P , gk ; hk )] = 1 + |gk |2 pH ∗ 2
i Rhk |y f,k pi + |gk | cnk
i=1
 T

− 2Re gk ĥf,k pk , (4.53)

H
where ĥf,k = Ehk |yf,k [hk ] and Rhk |yf,k = ĥf,k ĥf,k + C hk |yf,k . The minimum
cost which can be achieved by the receiver
T
(ĥf,k pk )∗
gk = P T
(4.54)
K
i=1 (|ĥf,k pi |
2 + pH ∗
i C hk |y f,k pi ) + cnk

is

23
Compared to the implicit assumption of an LMMSE receiver (4.25) in (4.31), less
information about rk is exploited by (4.47)
4.3 Performance Measures for Partial Channel State Information 143

Fig. 4.13 Spatial characteristics of different fixed precoders Q (Q = q for a sector-


beam and Q = [q 1 , q 2 , q 3 , q 4 ] for a fixed grid-of-beams) for the forward link training
channel in comparison with (adaptive) linear precoding P = [p1 , p2 ] in the data
channel. (The spatial structure of the wireless channel is symbolized by the shaded
area.)

MMSEicsi
k (P ; y f,k ) = min Ehk |y f,k [MSEk (P , gk ; hk )]
gk
1
= . (4.55)
1+ SINRicsi
k (P ; y f,k )

The SINR definition


T
|ĥf,k pk |2
SINRicsi
k (P ; y f,k ) = K  
P T
|ĥf,k pi |2 + pH
i C ∗
hk |y f,k pi + pH ∗
k C hk |y f,k pk + cnk
i=1
i6=k

(4.56)

now includes the error covariance matrix C hk |yf,k for the receiver’s CSI in
hk .24
Because the transmitter knows the distribution of yf,k (or ĥf,k ), it can
compute the mean of MMSEicsi k (P ; yf,k ) w.r.t. this distribution. As before,
an explicit expression for the conditional expectation does not exist and an
approximation similar to the AWGN BC (Figure 4.11) or based on a scaled
MF receiver can be found. The latter is equivalent to taking the conditional
expectation of the numerator and denominator in SINRicsi k separately.
144 4 Linear Precoding with Partial Channel State Information

Incomplete Channel State Information Due to Common Training


Channel

An example for incomplete CSI at the receiver is a system with a common


training channel (Section 4.2.2) if the precoding of the data and training
signals is different, i.e., P 6= Q. Typically, also a dedicated training channel
is available, but the transmit power in the common training channels is sig-
nificantly larger than in the dedicated training channel. Therefore, channel
estimates based on the former channels are more reliable. For this reason,
the transmitter has to assume that a receiver mainly uses the CSI from the
common training channel in practice (UMTS-FDD standard [1]).
At first glance, this observation requires transmission of the data over the
same precoder Q as the training channel (P = Q) to provide the receivers
with complete CSI. But this is not necessary, if the receivers’ incomplete
CSI is modeled appropriately and taken into account for optimization of an
adaptive precoder P . In [152] it is shown that such approaches still result in
a significantly increased cell capacity.
We restrict the derivations to S-CSI, which is the relevant assumption for
UMTS-FDD [1]. For this section, we assume Rice fading µh 6= 0 to elaborate
on the differences between two approaches.
The following argumentation serves as an example for the construction of
optimization criteria based on the concepts in the previous subsections.
As observation y f,k we choose the noise free channel estimate from (4.11)
based on the sth precoder q s

yf,k = hT
k qs . (4.57)

Given yf,k , the conditional mean estimate of hk and its error covariance ma-
trix required for the cost function (4.55) are

ĥf,k = µhk + C hk q ∗s (q T ∗ −1
s C hk q s ) hT T
k q s − µhk q s (4.58)
C hk q ∗s q T
s C hk
C hk |yf,k = C hk − . (4.59)
qT
s C ∗
hk q s

Because the expected value of (4.55) cannot be obtained explicitly, we


simplify the receiver model to a scaled MF (Section 4.3.2). It is the solution
of25

24
Alternatively, we could also consider an LMMSE receiver based on a ML channel
estimate (4.11) with zero-mean estimation error and given error variance.
25
Note that Eyf,k [Rhk |yf,k ] = Rhk .
4.3 Performance Measures for Partial Channel State Information 145

K
X
min CORicsi
k (P ; yf,k ) = min 1 + |gk |
2
pH ∗ 2
i Rhk pi + |gk | cnk
gk gk
i=1
 T

− 2Re gk ĥf,k pk . (4.60)

The mean of its minimum reads


" #
T
  |ĥf,k pk |2 1
Eyf,k MCORicsi
k (P ; yf,k ) = Eyf,k 1− = icsi
(4.61)
c̄rk 1 + SINRk (P )

with

icsi
pH ∗
k Rĥ pk
f,k
SINRk (P ) = PK , (4.62)
H ∗ H ∗
i=1 pi Rhk pi − pk Rĥ pk + cnk f,k

which is identical to taking the expectation in the numerator and denomina-


tor of (4.56) separately, and

C hk q ∗s q T
s C hk
Rĥf,k = µhk µH
hk + . (4.63)
qT
s C ∗
hk q s

For µh = 0, this SINR definition is equivalent to [36]. The general case yields
the rank-two correlation matrix Rĥf,k for the kth receiver’s channel.
Instead of a receiver model with a scaled MF and the conditional mean es-
timate (4.58), we suggest another approach: Choose a receiver model, whose
first part depends only on the available CSI (4.57) and the second component
is a scaling independent of the channel realization hk . We are free to choose
a suitable model for the CSI-dependent component. Motivated by standard-
ization (UMTS FDD mode [1]), the available CSI (4.57) from the common
training channel is only utilized to compensate the phase. This results in the
receiver model

(hkT q s )∗
gk (hk ) = βk . (4.64)
|hkT q s |

The scaling βk models a slow automatic gain control at the receiver, which
is optimized together with the precoder P based on S-CSI. The relevant cost
function at the transmitter is the mean of the MSE (4.24)26 resulting from
this receiver model

26
For rank-one channels and CDMA, a sum MSE criterion is developed in [244] based
on an average signal model. An ad-hoc extension to general channels is presented in
[243, 25].
146 4 Linear Precoding with Partial Channel State Information
   K
!
(hkT q s )∗ X
Ehk MSEk P , βk ; hk = 1 + |βk |2 pH ∗
i Rhk pi + cnk
|hkT q s | i=1
 T !
(hkT q s )∗
− 2Re βk Ehk hk pk . (4.65)
|hkT q s |

The expectation of the phase compensation and the channel hk can be com-
puted analytically as shown in Appendix D.2.
Minimizing (4.65) w.r.t. βk yields
  
(h T q s )∗ 1
min Ehk MSEk P , kT βk ; hk = phase
(4.66)
βk |hk q s | 1 + SINRk (P )

with
h iT 2
(hkT q s )∗
Ehk h
|hkT q s | k
pk
phase
SINRk (P ) = 2 . (4.67)
K h iT
P (hkT q s )∗
pH ∗
i Rhk pi − Ehk h
|hkT q s | k
pk + cnk
i=1

It differs from (4.62): The matrix substituting the correlation matrix of the
kth data signal’s channel in the numerator of (4.67) is of rank one (compare
to (4.62) and (4.63)). For µh = 0, we get
 T 2
(hkT q s )∗ π T 2
Ehk hk pk = (q C hk q ∗s )−1 q H ∗
s C hk pk
|hkT q s | 4 s

instead of (q T ∗ −1 H ∗
s C hk q s ) |q s C hk pk |2 in (4.62) (cf. Appendix D.2). The differ-
ence is the factor π/4 ≈ 0.79 which results from a larger error in CSI at the
receiver for only a phase correction.
This impact of the receiver model on the transmitter’s SINR-definition is
illustrated further in the following example.
Example 4.3. For a zero-mean channel with rank[C hk ] = 1, the error co-
variance matrix (4.59) for the conditional mean estimator is zero, because
the channel can be estimated perfectly by the receiver.27 Thus, (4.63) con-
verges to Rhk and the transmitter’s SINR definition (4.62) becomes iden-
tical to (4.36). On the other hand, for only a phase compensation at the
receiver, the data signal’s channel is always modeled as rank-one (cf. numer-
phase
ator of (4.67)) and SINRk (P ) is not identical to (4.36) in general. ⊓

The choice of an optimization criterion for precoding with a common train-
ing channel depends on the detailed requirements of the application, i.e., the
27
Note that the observation in (4.57) is noise-free. Moreover, we assume that q s is not
orthogonal to the eigenvector of C hk which corresponds to the non-zero eigenvalue.
4.3 Performance Measures for Partial Channel State Information 147

most adequate model for the receiver. Moreover, rank-one matrices describ-
ing the desired receiver’s channel as in the numerator of (4.67) reduce the
computational complexity (Section 4.6.2 and [19]).

4.3.4 Summary of Sum Mean Square Error


Performance Measures

Above, we show the relevance of MSE for the achievable data rate to every
receiver. In particular, the sum of MSEs for all receivers gives a lower bound
on the sum information rate (or throughput) to all receivers. Although it may
be loose, this establishes the relevance of MSE for a communication system as
an alternative performance measure to SINR. Moreover, defining simplified
models for the BC (Section 4.3.2) we illustrated the relevance and intuition
involved when using the suboptimum performance measure SINRk (4.36)
for P-CSI and S-CSI: On the one hand, it can be seen as a heuristic with
similar mathematical structure as SINRk , on the other hand it describes the
data rates for the fading AWGN and AWGN BC (Section 4.3.2). Numerical
evaluation of the mean sum rate shows that the resulting suboptimum but
analytically more tractable performance measures are good approximations
for small SNR and small K/M < 1 whereas they are more loose in the
interference-limited region (Figure 4.12).
The derived cost functions Ehk |yT [MMSEk (P ; hk )] and MMSEk (P )
(equivalently SINRk (P )) are related to the achievable data rate on the consid-
ered link. But an optimization of precoding based on a linear receiver model
and the MSE is interesting on its own right. Optimization based on the MSE
can be generalized to frequency selective channels with linear finite impulse
response precoding [109, 113], multiple antennas at the receivers [146], and
nonlinear precoding.
As an example, we now consider the sum MSEs (for these definitions).
We compare the different possibilities for formulating optimization problems
and, whenever existing, refer to the proposed algorithms in the literature.
Other important optimization problems are: Minimization of the transmit
power under quality of service (QoS) requirements, which are based on the
MSE expression [146]; Minimization of the maximum MSE (balancing of
MSEs) among all receivers [101, 189, 146]. They are addressed in Section 4.6.

Complete Channel State Information at the Transmitter

Minimization of the sum of MMSEk (P ; hk ) (4.26), i.e., maximization of a


lower bound to the sum rate (4.20), reads
148 4 Linear Precoding with Partial Channel State Information

K
X
min MMSEk (P ; hk ) s.t. kP k2F ≤ Ptx . (4.68)
P
k=1

This is equivalent to a joint minimization of the sum of


MSEk (P , gk ; hk ) (4.24) with respect to the precoder P ∈ CM ×K and
the linear receivers gk ∈ C
K
X
min MSEk (P , gk ; hk ) s.t. kP k2F ≤ Ptx . (4.69)
P ,{gk }K
k=1 k=1

When minimizing w.r.t. gk first, we obtain (4.68).


This problem was first addressed by Joham [110, 111, 116] for equal re-
ceivers gk = β, which yields an explicit solution derived in Appendix D.1. The
generalization is presented in [102]: There, it is shown that an optimization
following the principle of alternating variables (here: alternating between P
and {gk }Kk=1 ) converges to the global optimum. A solution exploiting SINR
duality between MAC and BC is proposed in [189]; a similar and compu-
tationally very efficient approach is followed in [7, 146] using a direct MSE
duality.
The balancing of MSEs is addressed by [189, 101, 146], which is equivalent
to a balancing of SINRk [19, 185] because of (4.22).

Partial Channel State Information at the Transmitter

An ad hoc approach to linear precoding with P-CSI at the transmitter is


based on (4.69) substituting hk by its channel estimate or the predicted
channel parameters ĥk (Chapter 2):
K
X
min MSEk (P , gk ; ĥk ) s.t. kP k2F ≤ Ptx . (4.70)
P ,{gk }K
k=1 k=1

To solve it the same algorithms as for C-CSI can be applied.


Implicitly, two assumptions are made: The CSI at the transmitter and
receivers is identical and the errors in ĥk are small (see Appendix C). Cer-
tainly, the first assumption is violated, because the receivers do not have to
deal with outdated CSI.28 Moreover, as the mobility and time-variance in the
system increases the size of the prediction error increases and asymptotically
only S-CSI is available at the transmitter.
The assumption of small errors is avoided performing a conditional mean
estimate of the MSE given the observations y T [53]

28
Assuming a block-wise constant channel.
4.3 Performance Measures for Partial Channel State Information 149

K
X
min Ehk |yT [MSEk (P , gk ; hk )] s.t. kP k2F ≤ Ptx . (4.71)
P ,{gk }K
k=1 k=1

Still identical CSI at transmitter and receivers is assumed. Asymptotically,


for S-CSI and E[hk ] = 0 it results in the trivial solution P = 0M ×K [25, 58]
(as for (4.70)), because the conditional mean estimate of h is ĥ = 0P .
The asymmetry in CSI between transmitter and receivers can be included
modeling the receivers’ CSI explicitly [58, 50]. As described in the previous
sections, this is achieved in the most natural way within the MSE framework
minimizing the MSE w.r.t. the receivers’ processing, which is a function of hk
(cf. (4.25)): MMSEk (P ; hk ) = mingk MSEk (P , gk ; hk ). Then, the transmitter
is optimized based on the conditional mean estimate of MMSEk (P ; hk ) (4.73)
K
X
min Ehk |yT [MMSEk (P ; hk )] s.t. kP k2F ≤ Ptx . (4.72)
P
k=1

The information theoretic implications are discussed in Section 4.3.1.


To reduce the complexity of this approach, we approximate the system
model as in Figure 4.11 and maximize a lower bound to the corresponding
mean sum rate (4.45)
K
X
min MMSEk (P ) s.t. kP k2F ≤ Ptx . (4.73)
P
k=1

The relation of MMSEk to CORk and the conditional mean of MCORk (4.52)
can be exploited to develop an algorithm for solving this problem, which we
address together with (4.72) in Section 4.4.

Incomplete Channel State Information at Receivers

For incomplete CSI at the receivers due to a common training channel, we


(h T q )∗
model the phase compensation at the receiver by gk (hk ) = |hkT qss | βk (4.64).
k
Minimization of the sum MSE with a common scalar βk = β
K   
X (h T q s )∗
min Ehk MSEk P , kT β; hk s.t. kP k2F ≤ Ptx (4.74)
P ,β
k=1
|hk q s |

is proposed in [58, 7]. A solution can be given directly with Appendix D.1
(see also [116]).
For the SINR definition (4.62), power minimization is addressed in [36].
Balancing of the MSEs (4.65) and power minimization which is based on MSE
duality and different receivers βk is treated in [7, 8] for phase compensation
150 4 Linear Precoding with Partial Channel State Information

at the receiver. We present a new proof for the MSE duality in Section 4.5.2
and briefly address the power minimization and the minimax MSE problems
(balancing) in Section 4.6.

4.4 Optimization Based on the Sum Mean Square Error

In Section 4.3 we state that the mean MMSE gives a lower bound on the
mean information rate (4.29) and the mean sum rate can be lower bounded
by the sum of mean MMSEs (4.30). Maximization of this lower bound with
a constraint on the total transmit power reads (compare (4.31) and (4.72))
K
X
min Ehk |yT [MMSEk (P ; hk )] s.t. kP k2F ≤ Ptx . (4.75)
P
k=1

An explicit expression of the conditional expectation is not available. There-


fore, we introduced the AWGN model for the broadcast channel (Section 4.3.2
and Figure 4.11). Maximization of the lower bound of its sum rate is equiv-
alent to (compare (4.46) and (4.73))
K
X
min MMSEk (P ) s.t. kP k2F ≤ Ptx , (4.76)
P
k=1

where MMSEk (P ) is given in (4.43).


An analytical solution of both optimization problems, e.g., using KKT
conditions, seems to be difficult. An alternative approach could be based on
the BC-MAC dualities in Section 4.5, which we do not address here.29 Instead
we propose iterative solutions, which give insights into the optimum solution,
but converge to stationarity points, i.e., not the global optimum in general.
The underlying principle for these iterative solutions can be generalized to
other systems, e.g., for nonlinear precoding (Section 5.3) and receivers with
multiple antennas.

4.4.1 Alternating Optimization of Receiver Models and


Transmitter

The algorithm exploits the observation that MMSEk (P ; hk ) and MMSEk (P )


result from models for the linear receivers gk and the corresponding
cost functions: The LMMSE receiver gk = gmmse k (P (m) ; hk ) (4.25) min-
29
Note that this relation has the potential for different and, maybe, computationally
more efficient solutions.
4.4 Optimization Based on the Sum Mean Square Error 151

imizes MSEk (P (m) , gk ; hk ) (4.24) and the scaled matched filter gk =


gcor
k (P
(m)
; hk ) (4.47) minimizes CORk (P (m) , gk ; hk ) (4.49) for given P (m) .
We would like to alternate between optimization of the linear precoder P
(m−1) K
for fixed receivers {gk }k=1 from the previous iteration m − 1 and vice
(0)
versa given an initialization {gk }K k=1 . To obtain an explicit solution for P
we introduce an extended receiver model gk = βgk (P ; hk ) [116, 102]: The
coherent part gk (P ; hk ) depends on the current channel realization hk ; the
new component is a scaling β which is common to all receivers and assumed
to depend on the same CSI available at the transmitter. Formally, β adds an
additional degree of freedom which enables an explicit solution for P , if we
optimize P jointly with β given gk (hk ). In the end, we show that β → 1 after
convergence of the iterative algorithm, i.e., the influence of β disappears in
the solution.
For C-CSI, it is proved in [102, 99] that this alternating procedure reaches
the global optimum (if not initialized at the boundary), although the conver-
gence rate may be slow.
To ensure convergence of this alternating optimization it is necessary to
optimize precoder and receivers consistently, i.e., based on the same type of
cost function. Although only convergence to a stationary point is ensured for
P-CSI, the advantages of an alternating optimization are as follows:
• Explicit solutions are obtained, which allow for further insights at the
optimum.
• This paradigm can also be applied to nonlinear precoding (Chapter 5).
• An improvement over any given sub-optimum solution, which is used as
initialization, is achieved.

Optimization Problem (4.75)

(m−1) (m−1)
Assuming {gk }K
k=1 with gk = gmmse
k (P (m−1) ; hk ) (4.25) for the re-
ceivers given by the previous iteration m − 1, the optimum precoder together
with the common scaling β at the receivers is obtained from
K
X h i
min Ehk |yT MSEk (P , β gmmse
k (P (m−1)
; hk ); h k ) s.t. kP k2F ≤ Ptx .
P ,β
k=1
(4.77)

This cost function can be written explicitly as


152 4 Linear Precoding with Partial Channel State Information

K
X h i
Ehk |yT MSEk (P , β gmmse
k (P (m−1)
; h k ); h k )
k=1
K
X  
2

mmse (m−1)
= Ehk |yT Erk ,dk β gk (P ; hk )rk − dk (4.78)
k=1

which yields

K
" K
!#
X X
2
K+ Ehk |yT |β| |gmmse
k (P (m−1) ; hk )|2 |hkT pi |2 + cnk
k=1 i=1
K
X h  i
− Ehk |yT 2Re β gmmse
k (P (m−1) ; hk )hkT pk
k=1
h h i i
2 (m−1)
= K + |β| Ḡ + |β|2 tr P H Eh|yT H H G(m−1)(h)H G(m−1)(h)H P
 h i
(m−1)
− 2Re β tr H G P . (4.79)

It depends on the conditional mean estimate of the effective channel


(m−1)  
HG = Eh|yT G(m−1)(h)H with receivers
h i
G(m−1)(h) = diag gmmse
1 (P (m−1)
; h 1 ), . . . , g mmse
K (P (m−1)
; h K )

and its Gram matrix


h i
R(m−1) = Eh|yT H H G(m−1)(h)H G(m−1)(h)H , (4.80)

which cannot be computed in closed form. A numerical approximation of the


multidimensional integrals can be performed in every step, which is another
advantage of this alternating procedure.
The mean sum of noise variances after the receive filters with mean (power)
(m−1)
gain Gk = Ehk |yT [|gmmse
k (P (m−1) ; hk )|2 ] is

K
X
(m−1) (m−1)
Ḡ = cnk Gk . (4.81)
k=1

In Appendix D.1 the solution of (4.77) is derived as (cf. (D.13))


!−1
Ḡ(m−1) h iH
P (m)
=β (m),−1
R (m−1)
+ IM Eh|yT G(m−1)(h)H (4.82)
Ptx

with the real-valued scaling β (m) in P (m)


4.4 Optimization Based on the Sum Mean Square Error 153
  −2 
(m−1),H Ḡ(m−1) (m−1)
tr H G R(m−1) + Ptx I M HG
(m),2
β = (4.83)
Ptx
to ensure the transmit power constraint with equality.
Given the precoder P (m) (4.82), the optimum receivers based on instan-
taneous CSI
(m)
(m) (hT
k pk )

gk = gmmse
k (P (m) ; hk ) = β (m),−1 PK (m) 2
(4.84)
i=1 |hT
k pi | + cnk

achieve the minimum MSE

MMSEk (P (m) ; hk ) = min MSEk (P (m) , β (m) gk ; hk ). (4.85)


gk

The scaling β (m) introduced artificially when optimizing P (m) is canceled in


this second step of every iteration.
In every iteration, the algorithm computes P (m) (4.82) and
(m) K
{gk }k=1 (4.84), which are solutions of the optimization prob-
lems (4.77) and (4.85), respectively. Therefore, the mean sum of MMSEs
PK  (m)

k=1 Ehk |y T MMSEk (P ; hk ) is decreased or stays constant in every
iteration
K
X h i XK h i
Ehk |yT MMSEk (P (m) ; hk ) ≤ Ehk |yT MMSEk (P (m−1) ; hk ) .
k=1 k=1
(4.86)

Because it is bounded by zero from below it converges. But convergence to


the global minimum is not ensured for P-CSI (for C-CSI it is proved in [99]).
From convergence follows that β (m) → 1 for m → ∞.30
The initialization is chosen as described in the next subsection for prob-
lem (4.76).
The numerical complexity for the solution of the system of linear equa-
tions (4.82) is of order O M 3 . The total computational load is dominated
by the multidimensional numerical integration to compute the conditional
mean with a complex Gaussian probability density: K + KM + KM 2 inte-
grals in 2M real (M complex) dimensions (unlimited domain) with closely
related integrands
 have to be evaluated. This involves a matrix product with
order O KM 2 and evaluations of G(m−1)(h) per integration point. It can
be performed by

30 (m−1)
If β (m) 6= 1, then gk necessarily changed in the previous iteration leading to
a new P (m) 6= P (m−1) : β (m) ensures that the squared norm of the updated P (m)
equals Ptx . On the other hand, if the algorithm has converged and P (m) = P (m−1) ,
then β (m) = 1.
154 4 Linear Precoding with Partial Channel State Information

• Monte-Carlo integration [62], which chooses the integration points pseudo-


randomly, or
• monomial cubature rules (a multi-dimensional extension of Gauss-Hermite
integration [172]) [38, 137].
Monomial cubature rules choose the integration points deterministically and
are recommended for medium dimensions (8−15) of the integral [37]. They are
designed to be exact for polynomials of a specified degree in the integration
variable with Gaussian PDF; methods of high accuracy are given in [202].

Optimization Problem (4.76)

The second optimization problem (4.76) is solved choosing scaled matched


filters as receiver models and the receivers’ corresponding cost functions
CORk (P , gk ; hk ) ((4.49) in Section 4.3.2). From (4.52) we know that the
conditional mean of the minimum is identical to MMSEk (P ).
In the first step, we minimize the conditional mean estimate of the sum of
receivers’ cost functions CORk given gcork (P
(m−1)
; hk ) (4.47) from the previ-
ous iteration m − 1:
K
X h  i
min Ehk |yT CORk P , β gcor
k (P
(m−1)
; hk ); hk s.t. kP k2F ≤ Ptx .
P ,β
k=1
(4.87)

The scaling β at the receivers is introduced for the same reason as above.
The cost function is
K
X h i
Ehk |yT CORk (P , β gcor
k (P
(m−1)
; hk ); hk )
k=1
K
" K
!#
X 2 X
2
=K+ Ehk |yT |β| gcor
k (P
(m−1)
; hk ) pH ∗
i Rhk |y T pi + cnk
k=1 i=1
K
X h  i
− Ehk |yT 2Re β gcor
k (P (m−1)
; h )h T
k k k p
k=1
" K
! #
X (m−1) ∗
2 (m−1) 2 H
= K + |β| Ḡ + |β| tr P Gk Rhk |yT P
k=1
 h h i i
− 2Re β tr Eh|yT G(m−1)(h)H P (4.88)

PK (m−1)
with Ḡ(m−1) = k=1 cnk Gk , [G(m−1)(h)]k,k = gcor
k (P
(m−1)
; hk ), and
4.4 Optimization Based on the Sum Mean Square Error 155

  (m−1),H ∗ (m−1)
(m−1)
2 pk Rhk |yT pk
Gk = Ehk |yT [G(m−1)(h)]k,k = (m−1),2
(4.89)
c̄rk
i R (m−1),∗
hk |y T pk
h
Ehk |yT [G(m−1)(h)]k,k hk = (m−1)
. (4.90)
c̄rk

All expectations can be computed analytically for this receiver model. More-
over, only the conditional first and second order moments are required and no
assumption about the pdf ph|yT (h|y T ) is made in contrast to problem (4.75).
The solution to (4.87) is derived in Appendix D.1 comparing (4.88)
with (D.2). It reads

K
!−1
X (m−1) ∗ Ḡ(m−1) h iH
P (m)
=β (m),−1
Gk Rhk |yT + IM Eh|yT G(m−1)(h)H
Ptx
k=1
(4.91)

with β (m) chosen to satisfy the power constraint with equality (cf. (D.13)).
In the second step of iteration m we determine the optimum receive filters
by

min CORk (P (m) , β (m) gk ; hk ) = MCORk (P (m) ; hk ), (4.92)


gk

which are the scaled matched filters (4.47)

(m)
(m) (hT
k pk )

gk = gcor
k (P
(m)
; hk ) = β (m),−1 (m)
(4.93)
c̄rk

(m)
based on the mean
variance of the receive signal c̄rk =
PK (m),H ∗ (m)
i=1 pi Rhk |yT pi + cnk . In fact, we determine the receivers as a
function of the random vector hk , which we require for evaluating (4.88) in
the next iteration.
In every iteration the cost function (4.76) decreases or stays constant:
K
X K
X
MMSEk (P (m) ) ≤ MMSEk (P (m−1) ). (4.94)
k=1 k=1

Asymptotically β (m) → 1 for m → ∞.31

31
Problem (4.76) with an equality
√ constraint can be written as an unconstrained
optimization problem in P = P̄ Ptx /kP̄ kF . For this equivalent problem, the iter-
ation above can be interpreted as a fixed point solution to the necessary condition
on the gradient w.r.t. P̄ to be zero. The gradient for the kth column p̄k of P̄ reads
(cf. (4.52), (4.43), and (4.36))
156 4 Linear Precoding with Partial Channel State Information

For P-CSI, the solution to problem (4.70) assuming equal32 gk = β can


serve as initialization, since we would like to obtain an improved performance
w.r.t. this standard approach for P-CSI: This approach is based on the as-
sumption of small errors (compare Section 4.3.4 and Appendix C) in the
prediction ĥ of h from y T .
The order of computational complexity  is determined by the system of
linear equations (4.91) and is O M 3 . It is significantly smaller than for
optimizing (4.75), because all conditional mean expressions can be evaluated
analytically.

4.4.2 From Complete to Statistical Channel State


Information

Because we characterize the CSI at the transmitter by ph|yT (h|y T ), a smooth


transition from C-CSI to S-CSI is possible. Accurate estimation of the nec-
essary parameters of ph|yT (h|y T ) is addressed in Chapter 3.
The asymptotic case of C-CSI is reached for C v = 0M (Sec-
tion 4.1.3)
 and a time-invariant
 channel with constant autocovariance se-
quence E h[q]h[q − i]H = C h . Both optimization problems (4.75) and (4.76)
coincide (cf. (4.69)) for this case. Their iterative solution is
!−1
(m) (m),−1 (m−1),H (m−1) Ḡ(m−1) (m−1),H
P =β HG HG + IM HG (4.95)
Ptx

with sum of noise variance after the receivers


K
X
Ḡ(m−1) = cnk |gmmse
k (P (m−1) ; hk )|2 , (4.96)
k=1

(m−1)
effective channel H G = G(m−1)(h)H, and scaling of P (m)

“ √ ”
PK Ptx
∂ MMSEi P̄ PK K
1 ∗
i=1 kP̄ kF i=1 cni Gi
X
= p̄k + Gi R∗
hi |y T p̄k − R p̄
∂ p̄∗
k Ptx i=1
c̄′rk hk |yT k

′,2
p̄H ∗
and c̄′ri = cni kP̄ k2F /Ptx + K H ∗
P
with Gi = i Rhi |y T p̄i /c̄ri j=1 p̄j Rhi |y T p̄j . The re-
lation to (4.91) is evident. We proved convergence to a fixed point which is a local
minimum of the cost function. In analogy, the same is true for problem (4.75) and
its iterative solution (4.82) (assuming that the derivative and the integration can be
interchanged, i.e., cnk > 0).
32
The explicit solution is given in Appendix D.1.
4.4 Optimization Based on the Sum Mean Square Error 157
  −2 
(m−1) (m−1),H (m−1) Ḡ(m−1) (m−1),H
tr H G HG HG + Ptx I M HG
(m),2
β = .
Ptx
(4.97)

In [102, 99] it is shown that the global optimum is reached if initialized with
non-zero gmmse
k (P (m−1) ; hk ). An algorithm with faster convergence is derived
in [146].
If yT is statistically independent of h, the solutions from the previous
section depend only on S-CSI. The iterative solution to problem (4.76) reads

K
!−1
X (m−1) ∗ Ḡ(m−1) h iH
P (m) = β (m),−1 Gk C hk + IM EH G(m−1)(h)H
Ptx
k=1
(4.98)

with second order moment of the receive filters in iteration m − 1


(m−1),H (m−1)
(m−1) pk C ∗hk pk
Gk = (m−1),2
, (4.99)
c̄rk
h i C p(m−1),∗
hk k
Ehk |yT [G(m−1)(h)]k,k hk = (m−1)
, (4.100)
c̄rk
PK (m−1)
and Ḡ(m−1) = k=1 cnk Gk . The solution only depends on the second
order moment C h of the channel.
For S-CSI, we choose the solution to problem (D.28) in Appendix D.3 as
initialization. It is given in closed form under the assumption of zero-mean
h.

4.4.3 Examples

For insights in the iterative solution to minimizing the sum MSE (4.75)
and (4.76), we consider two cases:
• P-CSI at the transmitter and channels with rank-one covariance matrices
C hk ,
• S-CSI and uncorrelated channels.

Rank-One Channels

If rank[C hk ] = 1, we can write (Karhunen-Loève expansion)


158 4 Linear Precoding with Partial Channel State Information

hk = xk v k (4.101)

defining E[|xk |2 ] = E[khk k22 ] and kvk22 = 1. This results in

H = XV T (4.102)

where X = diag [x1 , x2 , . . . , xK ] and v k is the kth column of V ∈ CM ×K .


To apply this channel model to the solution based on the MMSE receiver
model (4.82), we can simplify
h i h i
Ehk |yT gmmse
k (P (m−1) ; hk )hk = Exk |yT gmmse k (P (m−1) ; xk v k )xk v k
| {z }
ak
(4.103)

h i
Eh|yT H H G(m−1)(h)H G(m−1)(h)H =
XK 
2

= v ∗k v T E
k xk |y T g mmse
k (P (m−1)
; x v
k k ) |xk |2
= V ∗ BV T . (4.104)
k=1 | {z }
bk

The relevant parameters for the solution based on the scaled MF re-
ceiver (4.91) are

i v H p(m−1),∗ E  2

xk |y T |xk |
h
cor (m−1) k k
Ehk |yT gk (P ; hk )hk = vk (4.105)
c̄rk
| {z }
ak

and
K
X K
X
(m−1) (m−1)  
Gk R∗hk |yT = Gk Exk |yT |xk |2 v ∗k v T ∗
k = V BV
T
(4.106)
k=1 k=1
| {z }
bk
(m−1)
with Gk = Exk |yT [|xk |2 ]|v T 2 2
k pk | /c̄rk , A = diag [a1 , . . . , aK ], and B =
diag [b1 , . . . , bK ].
Therefore, both approaches result in

K
!−1
X Ḡ(m−1)
P (m) = β (m),−1 v ∗k v T
k bk + IM V ∗ A∗ (4.107)
Ptx
k=1
!−1
(m),−1 ∗ T Ḡ(m−1) −1

=β V V V + B B −1 A∗ , (4.108)
Ptx
4.4 Optimization Based on the Sum Mean Square Error 159

where we applied the matrix inversion lemma (A.20) in the second equation.
For large Ptx and K ≤ M , the overall channel is diagonal
−1
G(m) HP = G(m) XV T P (m) = β (m),−1 G(m) XV T V ∗ V T V ∗ B −1 A∗
(4.109)
= β (m),−1 G(m) XB −1 A∗ , (4.110)

because enough degrees of freedom are available to avoid any interference


(zero-forcing).
We assume convergence of the iterative precoder optimization, i.e., β (m) →
1 for m → ∞. For an MMSE receiver (and any zero-forcing precoder),
G(m) HP → I K for high Ptx , because gmmse k (P (m) ; hk ) = bk /(xk a∗k ).
For a scaled MF gcor
k (P (m)
; h k ) = x∗
k /(E 2 T
xk |y T [|xk | ](v k pk ) at the receiver,
the kth diagonal element of G HP converges to |xk | /Exk |yT [|xk |2 ].
(m) 2

The iterative solutions to both optimization problems (4.75) and (4.76)


for rank-one channels differ only by the regularization term B −1 Ḡ(m−1) /Ptx
in the inverse and the allocation of the power, which is determined by a∗k /bk
for large Ptx (cf. (4.108)). Note that performance criteria and solutions are
identical for Ptx → ∞ and rank-one C hk .

Uncorrelated Channels

The other extreme scenario is given by S-CSI with C hk = ch,k I M , i.e., spa-
tially uncorrelated channels with different mean attenuation. Defining a nor-
malized noise variance c′nk = cnk /ch,k and assuming kP k2F = Ptx , the MMSE
for the AWGN BC model (Figure 4.11) reads

kpk k22 kpk k22


MMSEk = 1 − = 1 − . (4.111)
kP k2F + c′nk Ptx + c′nk

The corresponding sum rate in (4.44) and (4.39) is


K
X K
X
AWGN    
Rk = log2 Ptx + c′nk − log2 Ptx + c′nk − kpk k22 . (4.112)
k=1 k=1

Both performance measures depend only on the norm kpk k22 , i.e., the power
allocation to every receiver.
It can be shown that the maximum sum rate is reached serving only the
receiver with smallest c′nk , which yields

K  
X AWGN Ptx
Rk = log2 1 + ′ . (4.113)
cnk
k=1
160 4 Linear Precoding with Partial Channel State Information

The choice of pk is only constrained by kpk k22 = Ptx and any vector is equiv-
alent on average, which is clear intuitively.
Minimizing the sum MSE
K K
X X kpk k22
MMSEk = K − (4.114)
Ptx + c′nk
k=1 k=1

yields the same solution as problem (4.112) for unequal c′nk with minimum

K
X Ptx
K −1≤ MMSEk = K − ≤ K. (4.115)
Ptx + c′n
k=1

Note that the sum MSE (4.114) depends only on kP k2F = Ptx for identical
c′nk= cn . Thus, every initialization of the iterative algorithm in Section 4.4.1
is a stationary point. This is not true for the sum rate (4.112), which is max-
imized for selecting only one arbitrary receiver. This observation illustrates
the deficiency of using this lower bound to the sum rate.33

4.4.4 Performance Evaluation

The proposed algorithms, which minimize the sum MSEs in (4.75) and (4.76),
are evaluated for the mean sum rate Eh,yT [Rsum (P(yT ); h)] (4.27) and differ-
ent degrees of CSI (Table 4.1). In Section 5.5 we present a comparison with
nonlinear precoding w.r.t. the mean uncoded bit error rate (BER).
The following linear precoders based on sum MSE are considered here and
in Section 5.5: 34
• For C-CSI, we solve (4.69) assuming MMSE receivers. The solution is given
in [99, 102] and is a special case of Section 4.4.1 assuming C-CSI (“MMSE-
Rx (C-CSI)”). As initialization we use the MMSE precoder from [116] (cf.
Appendix D.1 with G(m−1)(h) = I K ) for identical receivers gk = β and
C-CSI.
• The heuristic in (4.70), where a channel estimate is plugged into the cost
function for C-CSI as if it was error-free (Appendix C), is applied assuming
two different types of CSI: Outdated CSI ĥ = Eh[q′ ]|y[q′ ][h[q ′ ]] with q ′ =
q tx − 1 (O-CSI), i.e., the channel estimate from the last reverse link time
slot is used without prediction, and P-CSI with the predicted channel
ĥ = Eh|yT [h] (“Heur. MMSE-Rx (O-CSI)” and “Heur. MMSE-Rx (P-
CSI)”). It is initialized in analogy to C-CSI but with O-CSI and P-CSI,
respectively.
33
A possible compensation would be to select the active receivers before designing
the precoding P .
34
The names employed in the legends of the figures are given in quotation marks.
4.4 Optimization Based on the Sum Mean Square Error 161

• For P-CSI, the transmitter’s cost function is given by the CM estimate of


the receivers’ remaining cost (cf. (4.75) and (4.76)). The iterative algorithm
in (4.82) and (4.84) for problem (4.75) with MMSE receivers (“MMSE-Rx
(P-CSI)”) is implemented using a straightforward Monte Carlo integra-
tion with 1000 integration points. The steps in the iterative algorithm
in (4.91) and (4.93) assuming scaled matched filter (MF) receivers for
problem (4.76) (“MF-Rx (P-CSI)”) are already given in closed form. The
same initialization is used for both, which is described in Section 4.4.1.
• For S-CSI, the iterative algorithm based on the CM estimate of the cost
functions is a special case of P-CSI for both receiver models (“MMSE-Rx
(S-CSI)” and “MF-Rx (S-CSI)”); see also (4.98) for the scaled matched
filter receiver. It is initialized by the precoder from Appendix D.3.
For all MSE based algorithms, we choose a fixed number of 10 iterations to
obtain a comparable complexity. We refer to the latter two solutions as robust
linear precoding with P-CSI or S-CSI, respectively.
For comparison, we include algorithms which optimize sum rate directly:
• For C-CSI, we choose the precoder based on zero-forcing with a greedy
receiver selection [59] (“Max. sum rate”). The comparison in [183] shows
that it performs very close to the maximum sum rate with complex Gaus-
sian data d [n].
• For P-CSI and S-CSI, the fixed point iteration in [31] finds a stationary
point for maximization of the sum rate in the AWGN BC model (4.45)
(“Max. sum rate (P-CSI)” and “Max. sum rate (S-CSI)”).35 Its complexity
is of order O M 3 per iteration. Contrary to C-CSI (Figure 4.9) we do not
include receiver selection in order to allow for a direct comparison with
the sum MSE approaches, which have the same order of complexity.36
We choose the same initialization as for the sum MSE algorithms from
Section 4.4.1 and limit the number of iterations to 40.
The alternative to linear precoding, which serves a maximum of K receivers in
every time slot, is a time division multiple access (TDMA) strategy. In TDMA
only a single receiver is served in every time slot. We choose TDMA with
maximum throughput (TP) scheduling and a maximum SNR beamformer
based on P-CSI (“TDMA max. TP (P-CSI)”).
P-CSI is based on the observation of Q = 5 reverse link time slots with
N = 32 training symbols per receiver (orthogonal training sequences with
S H S = N I M K ). The variances of the uncorrelated noise at the transmitter
C v = cn I M (Section 4.1.3) and receivers are identical cnk = cn . The TDD
system has the slot structure in Figure 4.2, i.e., “outdated CSI” (O-CSI)
is outdated by one time slot. The receivers gk are implemented as MMSE
receivers (4.25) which have the necessary perfect CSI.

35
No proof of convergence is given in [31].
36
Including the receiver selection proposed in [31] the total algorithm has a numerical
complexity of O(K 2 M 3 ).
162 4 Linear Precoding with Partial Channel State Information

Fig. 4.14 Comparison of maximum sum rate and minimum sum MSE precoding
based on the AWGN BC model with fmax = 0.2 (Scenario 1).

The temporal channels correlations are described by Clarke’s model (Ap-


pendix E and Figure 2.12) with a maximum Doppler frequency fmax , which
is normalized to the slot period Tb . For simplicity, the temporal correlations
are identical for all elements in h (2.12). Two different stationary scenarios
for the spatial covariance matrix with M = 8 transmit antennas are com-
pared. For details and other implicit assumption see Appendix E. The SNR
is defined as Ptx /cn .
Scenario 1: We choose K = 6 receivers with the mean azimuth directions
ϕ̄ = [−45◦ , −27◦ , −9◦ , 9◦ , 27◦ , 45◦ ] and Laplace angular power spectrum with
spread σ = 5◦ .
Scenario 2: We have K = 8 receivers with mean azimuth directions
ϕ̄ = [−45◦ , −32.1◦ , −19.3◦ , −6.4◦ , 6.4◦ , 19.3◦ , 32.1◦ , 45◦ ] and Laplace angu-
lar power spectrum with spread σ = 0.5◦ , i.e., rank[C hk ] ≈ 1.

Discussion of Results

Because the lower bound to the sum rate, which depends on the sum MSE,
is generally not tight, we expect a performance loss when minimizing the
sum MSE compared to a direct maximization of the sum rate. For C-CSI,
the loss is larger for increasing K/M and Ptx /cn (Figure 4.12). This is also
true for P-CSI and S-CSI (Figure 4.14). But the gain compared to TDMA is
significant for all methods.
The algorithm for maximizing the sum rate of the AWGN BC model con-
verges slowly and requires 30-40 iterations, i.e., the overall complexity is sig-
nificantly larger than for the sum MSE approaches. But the latter are already
outperformed in the first iterations, because the sum MSE is a lower bound
to the sum rate (4.45) of the AWGN BC (compare with C-CSI in Figure 4.9).
4.4 Optimization Based on the Sum Mean Square Error 163

Fig. 4.15 Mean sum rate vs. Ptx /cn for P-CSI with fmax = 0.2 (Scenario 1).

Fig. 4.16 Mean sum rate vs. fmax at 10 log10 (Ptx /cn ) = 20 dB (Scenario 1).

Still the MSE is often considered as practically more relevant. For example,
when transmitting data with a fixed constellation size, the MSE achieves a
better performance.
With respect to the mean sum rate, the algorithms based on the scaled
matched filter receiver model and the MMSE receiver show almost identical
performance. Due to its simplicity, we only consider the solution based on
the scaled matched filter.
The heuristic optimization of P based on sum MSE and P-CSI only applies
a channel estimate to the cost function for C-CSI. For a maximum Doppler
frequency below fmax = 0.15, almost the whole gain in mean sum rate of
robust linear precoding with P-CSI compared to the heuristic with O-CSI
is due to channel prediction (Figures 4.15 and 4.16): Concerning the sum
MSE as the cost function, the errors in CSI are still small enough and can
be neglected (Appendix C); there is no performance advantage when we con-
164 4 Linear Precoding with Partial Channel State Information

Fig. 4.17 Convergence in mean sum rate of alternating optimization with matched
filter (MF) receiver model for C-CSI, P-CSI, and S-CSI (P-CSI with fmax = 0.2,
10 log10 (Ptx /cn ) = 30 dB, Scenario 1).

Fig. 4.18 Mean sum rate vs. Ptx /cn for P-CSI with fmax = 0.2 (Scenario 2).

sider the error covariance matrix in the optimization problem. Thus, accurate
estimation of the channel predictor’s parameters or its robust optimization
is crucial (see Chapters 2 and 3). On the other hand, a systematic robust
precoder design with P-CSI is important for fmax > 0.15, where the heuristic
optimization performs close to TDMA.
A meaningful precoder solution for S-CSI is obtained taking the expected
cost (robust precoder). About twice the sum rate of TDMA can be achieved
in this scenario. As fmax increases, a smooth transition from C-CSI to S-CSI
is achieved when minimizing the CM estimate of the receivers’ cost.
For this scenario, the robust precoder for P-CSI converges within three it-
erations (Figure 4.17). The initialization of the iterative algorithms for C-CSI
4.4 Optimization Based on the Sum Mean Square Error 165

and S-CSI is already close to the (local) optimum. For C-CSI, more iterations
are required as K/M increases, but typically 10 iterations are sufficient.
If C hk has full rank (as in scenario 1), the system is interference-limited
and the sum rate saturates for high Ptx /cn . Eventually, it is outperformed by
TDMA, which is not interference-limited; therefore, its performance does not
saturate. A proper selection of the receivers, which should be served in one
time slot, would allow for convergence to TDMA at high Ptx /cn . In [128] the
conjecture is made that the multiplexing gain of the broadcast sum capacity
for P-CSI is one,
PKi.e., only one receiver is served at the same time. On the
other hand, if k=1 rank[C hk ] ≤ M and the eigenvectors corresponding to
non-zero eigenvalues are linearly independent, the interference can be avoided
by linear precoding and the sum rate does not saturate.
In scenario 2, the covariance matrices C hk are close to rank one. Therefore,
the performance for P-CSI and S-CSI is very close to C-CSI (Figure 4.18)
for M = K. The heuristic degrades only at high Ptx /cn because the channel
estimates ĥk lie in the correct (almost) one-dimensional subspace.
In the next chapter (Section 5.5), we show the performance results for the
mean uncoded bit error rate (BER), which is averaged over the K receivers,
for comparison with nonlinear precoding. A fixed 16QAM (QAM: quadra-
ture amplitude modulation) modulation is chosen. Minimizing the sum MSE
corresponds to an optimization of the overall transmission quality without
fairness constraints. Maximization of the sum rate is not a useful criterion
to optimize precoding for a fixed modulation D. Figures 5.10 with 5.16 give
the uncoded BER for robust linear precoding with P-CSI and S-CSI in both
scenarios.

Performance for Estimated Covariance Matrices

All previous performance results for linear precoding and P-CSI have assumed
perfect knowledge of the channel covariance matrix C hT and the noise covari-
ance matrix C v = cn I M . Now we apply the algorithms for estimating channel
and noise covariance matrices which are derived in Chapter 3 to estimate the
parameters of ph|yT (h|y T ).
Only B = 20 correlated realizations h[q tx − ℓ] are observed via y[q tx − ℓ]
for ℓ ∈ {1, 3, . . . , 2B − 1} , i.e., with a period of NP = 2. They are generated
according to the model (2.12); the details are described above.
For estimation, C hT is assumed to have the structure in (3.98), i.e., the K
receivers’ channels are uncorrelated and have different temporal and spatial
correlations.37 We estimate the Toeplitz matrix C T,k , which describes the

37
The real model (2.12) with block diagonal C h is a special case of (3.98).
166 4 Linear Precoding with Partial Channel State Information

Fig. 4.19 Mean sum rate vs. Ptx /cn for P-CSI with fmax = 0.1 for estimated co-
variance matrices (Scenario 1, B = 20).

Fig. 4.20 Mean sum rate vs. fmax at 10 log10 (Ptx /cn ) = 20 dB for estimated covari-
ance matrices (Scenario 1, B = 20).

temporal correlations, and C hk separately as proposed in Section 3.3.6.38


Two algorithms are compared:
• The heuristic estimator (“P-CSI, heur. est.”) determines Ĉ hk according
to (3.96) and Ĉ T,k with (3.86) for I = 1 and P = 1. Ĉ T,k is scaled to have
ones on the diagonal as required by (3.98).
• The advanced estimator (“P-CSI, ML est.”) exploits the conclusions from
Chapter 3: Ĉ hk is obtained from (3.144) which is considerably less com-
plex than the ML approach (Section 3.3); Ĉ T,k is the ML estimate from
Section 3.3.5 with 10 iterations, which is also scaled to have ones on the
diagonal.
38
For example, when estimating C hk we assume implicitly that C T,k = I B and vice
versa.
4.5 Mean Square Error Dualities of BC and MAC 167

Finally, the samples of the autocovariance sequence39 in Ĉ T,k are interpolated


by solving the minimum norm completion problem (3.104) assuming fmax =
0.25.
The noise variance cn at the transmitter is estimated by (3.67) for I = 1.
“P-CSI, perfect” denotes the performance for perfectly known covariance
matrices.
At moderate to high SNR the advanced estimator for the channel corre-
lations yields a significantly improved rate compared to the heuristic (Fig-
ure 4.19). For a maximum Doppler frequency fmax close to zero, the mean
rate of all algorithms is comparable because this case is not very challenging
(Figure 4.20). But for fmax ≈ 0.1 the advanced estimator yields a 10% gain
in rate over the heuristic. For high fmax , good channel prediction is impos-
sible and the precoder relies only on the estimation quality of Ĉ hk , which is
sufficiently accurate also for the heuristic estimator (Figure 4.20).
With only B = 20 observations the channel covariance matrices can al-
ready be estimated with sufficient accuracy to ensure the performance gains
expected by prediction and a robust precoder design. For fmax < 0.15, predic-
tion is the key component of the precoder to enhance its performance which
requires a ML estimate of the temporal channel correlations. For high fmax ,
the robust design based on an estimate of C hk achieves a significantly larger
mean sum rate than the heuristic precoder, which is also simulated based on
the advanced estimator of the covariance matrices.
Comparing the maximum Doppler frequency fmax for a given sum rate
requirement, the allowable mobility in the system is approximately doubled
by the predictor and the robust design.

4.5 Mean Square Error Dualities of Broadcast and


Multiple Access Channel

One general approach to solve constrained optimization problems defines a


dual problem which has the same optimal value as the original problem.
For example, the Lagrange dual problem can be used to find a solution to
the original problem, if it is convex and feasible (strong duality and Slater’s
condition) [24].
To solve optimization problems for the broadcast channel (BC) it is ad-
vantageous to follow a similar idea. The construction of a dual problem is
based on the intuition that the situation in the BC is related to the multi-
ple access channel (MAC). Therefore, a MAC model is chosen which has the
same achievable region for a certain performance measure as the BC model
of interest. Examples for BC optimization problems whose solution can be
obtained from the corresponding dual MAC are introduced in Section 4.6. For

39
They are only given for a spacing of NP = 2.
168 4 Linear Precoding with Partial Channel State Information

Section 4.6, only a basic understanding of the BC-MAC duality is necessary


and the details of the proofs for duality in this section can be skipped.
Consider the case of linear precoding (Figure 4.21) and C-CSI. For C-CSI,
it is well known that the MAC model in Figure 4.22 has the same achievable
region forPthe following performance measures under a sum power constraint
K ,2
kP k2F = k=1 pmack ≤ Ptx : Achievable rate (4.18) [230], SINR (4.19) [185],
and MSE (4.24) [7, 146, 221]. The rate for the data (complex Gaussian
distributed) from the kth transmitter in the MAC is

Rmac
k (p
mac
, umac
k ;h
mac
) = I(dkmac ; d̂kmac )
= log2[1 + SINRmac
k (p
mac
, umac
k ;h
mac
)] , (4.116)

which depends on
,2 mac ,T mac 2
pmac |uk hk |
SINRmac
k (p
mac
, umac
k ;h
mac
)= K
k
. (4.117)
P ,2 mac ,T mac 2
pmac
i |uk hi | + kumac
k k2
2
i=1
i6=k

Similarly, MSEmac
k (p
mac
, umac
k , gk
mac
; hmac ) = Enkmac ,d mac [|d̂kmac − dkmac |2 ] can
be defined, which results in the relation
.
mac mac
min
mac
MSE k (p , u mac mac
k , gk ; h mac
) = 1 (1 + SINRmac k (p
mac
, umac
k ;h
mac
)).
gk
(4.118)

The BC-MAC duality is valid under the following assumptions:


PK ,2
• The total transmit power is constrained by k=1 pmac k ≤ Ptx .
mac −1/2
• The dual MAC channel is defined as hk = hk cnk with uncorrelated
noise n mac of variance one, where hmac
k is the kth column of H mac ∈ CM ×K
mac mac
and h = vec[H ].
• Each data stream is decoded separately at the receiver.
This shows that the necessary MAC model is only a mathematical tool, i.e.,
a virtual MAC. A true MAC does not have a total power constraint, but indi-
vidual power constraints, and the channel H mac would be H T if reciprocal.
To summarize we define MAC-BC duality as follows:
Definition 4.1. A BC and MAC model are dual regarding a set of BC and
MAC performance measures if all values for the BC performance measures
are achievable if and only if the same values are achievable for the MAC
performance measures in the dual MAC model using the same total transmit
power.
The important advantage in choosing a dual MAC model is that the K
performance measures for the K data streams are only coupled in pmac and
4.5 Mean Square Error Dualities of BC and MAC 169

Fig. 4.21 Broadcast channel (BC) model with linear receivers (as in Figure 4.3).

Fig. 4.22 Dual model for P-CSI with multiple access (MAC) structure and linear
receivers corresponding to the BC model in Figure 4.21.

decoupled in umac
k (cf. (4.117)). In the BC they are decoupled in gk , but
coupled in pk .
The following proofs of BC-MAC duality for P-CSI are carried out in
analogy to Utschick and Joham [221], who show the MSE duality for the
MIMO-BC and C-CSI in a more general setting.

4.5.1 Duality for AWGN Broadcast Channel Model

To prove duality for the BC (Figure 4.21) and the performance


measure Ehk |yT [CORk (P , gk (hk ); hk )] (4.49), we introduce the
MAC model in Figure 4.22: Only the scalar filters Gmac (hmac ) =
diag [gmac
1 (hmac
1 ), gmac
2 (hmac
2 ), . . . , gmac mac
K (hK )] at the receiver have knowl-
mac
edge about the realization of H , all other parameters are optimized based
on P-CSI, i.e., the probability distribution of H mac given y T . The dual MAC
performance measure is Ehkmac |yT [CORmac k (p
mac
, umac mac mac
k , gk (hk ); hkmac )]
170 4 Linear Precoding with Partial Channel State Information

where

CORmac
k (p
mac
, umac mac mac mac
k , gk (hk ); hk ) =
K
!
X
mac 2 mac ,T ,2 ,∗
=1+ |gmac
k (hk )| uk IM + pmac
i Rhimac |yT umac
k
i=1
mac mac ,T mac mac

− 2Re gk (hmac
k )uk hk pk . (4.119)

We define the linear receive filter U mac = [umac 1 , umac


2 , . . . , umac
K ] ∈
CM ×K based on P-CSI and the power allocation at the transmitters
pmac = [pmac
1 , pmac
2 , . . . , pmac T
K ] with a constraint on the total transmit power
PK mac ,2
k=1 pk ≤ Ptx . The noise process n mac at the receivers is zero-mean with
covariance matrix C n mac = I M .
The minimum

MCORmac
k (p
mac
, umac mac
k ; hk ) =
,2 mac ,T mac 2
pmac
k |uk hk |
=1−  PK mac ,2  (4.120)
,T ,∗
umac
k I M + i=1 pi Rhimac |yT umac
k

w.r.t. gkmac = gmac mac


k (hk ) is achieved for

,T mac mac ∗

umac
k hk pk
gkmac = gmac mac
k (hk ) =  PK  . (4.121)
,T ,2 mac ,∗
umac
k IM + i=1 pmac
i R hi
mac |y
T
u k

The conditional mean of the minimum


1
Ehkmac |yT [MCORmac
k (p
mac
, umac mac
k ; hk )] = mac
1+ SINRk (umac
k ,p
mac
)
(4.122)

depends on the common SINR definition40 for P-CSI

mac pmac ,2 umac ,T


Rhkmac |yT umac ,∗
SINRk = k k k
 . (4.123)
PK
,T  ,2 ,∗
umac
k IM + pmac
i Rhimac |yT  umac
k
i=1
i6=k

Therefore, a duality for the cost functions Ehk |yT [CORk (P , gk (hk ); hk )] and
Ehkmac |yT [CORmac
k (p
mac
, umac mac mac
k , gk (hk ); hkmac )] also proves a duality for the
40
This SINR definition is also assumed in the duality theory by Schubert et al.[185].
Moreover it is often considered when optimizing beamforming at a receiver based on
S-CSI, which is derived differently. Here we see that this criterion can be justified: It
is the optimization criterion for a beamformer based on P-CSI which is followed by
the scaled matched filter (4.121) with knowledge of hmac
k .
4.5 Mean Square Error Dualities of BC and MAC 171
AWGN
AWGN BC model (Figure 4.11) w.r.t. its information rate Rk (4.39) and
SINRk (4.36).41
For the proof of duality, we have to constrain the possible filters to be
consistent.
Definition 4.2. For the BC, a choice of gk (hk ) and pk is consistent, if pk =
0M and gk (hk ) ≡ 0 are only zero simultaneously. For the MAC, a choice of
umac
k and pmac
k is consistent, if pmac
k = 0 and umac
k = 0M (or gmac mac
k (hk ) ≡ 0)
only simultaneously.42
With this restriction we can summarize the BC-MAC duality in a theorem.
Theorem 4.1. Assume a consistent choice of gk (hk ) and pk for
all k and, alternatively, of umac k and pmack . Consider only the ac-
tive receivers and transmitters with pk 6= 0M or pmac k 6= 0.
Then all values of Ehk |yT [CORk (P , gk (hk ); hk )], ∀k ∈ {1, 2, . . . , K},
are achievable if and only if the same values are achievable for
Ehkmac |yT [CORmac mac mac
k (gk (hk ), umac
k ,p
mac
; hmac
k )], ∀k ∈ {1, 2, . . . , K}, with
kpmac k22 = kP k2F .
To prove this result we construct a transformation of the filters from MAC
to BC [221], which yields identical performance measures

Ehk |yT [CORk (P , gk (hk ); hk )] =


(4.127)
= Ehkmac |yT [CORmac
k (p
mac
, umac mac mac
k , gk (hk ); hkmac )]

for the same total power kpmac k22 = kP k2F .


As the choice of a transformation is not unique [221], we introduce several constraints on the transformation. Firstly, the desired symbols $d_k$ and $d_k^{mac}$ experience the same total channel in the mean
\[
\mathrm{E}_{h_k^{mac}|y_T}\bigl[\mathrm{Re}\bigl(g_k^{mac}(h_k^{mac})\, u_k^{mac,T} h_k^{mac}\, p_k^{mac}\bigr)\bigr] = \mathrm{E}_{h_k|y_T}\bigl[\mathrm{Re}\bigl(g_k(h_k)\, h_k^T p_k\bigr)\bigr]. \tag{4.128}
\]

^{41} The dual AWGN MAC model is defined by
\[
\hat{d}_k^{mac} = g_k^{mac}(h_k^{mac})\Bigl(\sqrt{\mathrm{E}_{h_k^{mac}|y_T}\bigl[|u_k^{mac,T} h_k^{mac}|^2\bigr]}\; p_k^{mac} d_k^{mac} + i_k^{mac} + n_k^{mac}\Bigr) \tag{4.124}
\]
and
\[
i_k^{mac} = \sum_{i=1,\, i \neq k}^{K} \sqrt{\mathrm{E}_{h_i^{mac}|y_T}\bigl[|u_i^{mac,T} h_i^{mac}|^2\bigr]}\; p_i^{mac} d_i^{mac}. \tag{4.125}
\]
Assuming complex Gaussian signaling this yields the dual MAC rate
\[
R_k^{mac}(p^{mac}, u_k^{mac}) = \log_2\bigl[1 + \mathrm{SINR}_k^{mac}(p^{mac}, u_k^{mac})\bigr], \tag{4.126}
\]
which is a function of $\mathrm{SINR}_k^{mac}(p^{mac}, u_k^{mac})$. Implicitly, we assume separate decoding of the $K$ data streams in the dual MAC.

^{42} If the transmit and receive filters are zero simultaneously, the receiver's performance measure is equal to one and duality is obvious. Therefore, we can exclude these data streams from the following duality and treat them separately.

Moreover, the power control in BC and MAC is generally different ($\|p_k\|_2^2 \neq p_k^{mac,2}$), although the same total transmit power is used. Therefore, we introduce additional degrees of freedom $\xi_k \in \mathbb{R}_{+,0}$ which satisfy $\xi_k^2 = p_k^{mac,2}/(\mathrm{E}_{h_k|y_T}[|g_k(h_k)|^2]\, c_{n_k}) = \|p_k\|_2^2/\|u_k^{mac}\|_2^2$. This results in the constraints
\[
\xi_k^2\, \mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr] c_{n_k} = p_k^{mac,2} \tag{4.129}
\]
\[
\xi_k^2\, \|u_k^{mac}\|_2^2 = \|p_k\|_2^2, \tag{4.130}
\]
which yield a simple system of linear equations in $\{\xi_k^2\}_{k=1}^{K}$ below.


A straightforward choice of the MAC parameters satisfying (4.129) and (4.130) is
\[
p_k^{mac} = c_{n_k}^{1/2}\, \xi_k \bigl(\mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr]\bigr)^{1/2}, \qquad u_k^{mac} = \xi_k^{-1} p_k. \tag{4.131}
\]
To ensure (4.128) we set
\[
h_k^{mac} = h_k\, c_{n_k}^{-1/2} \tag{4.132}
\]
\[
g_k^{mac}(h_k^{mac}) = g_k(h_k)\,\bigl(\mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr]\bigr)^{-1/2}, \tag{4.133}
\]
i.e., the dual MAC channel $H^{mac} = H^T C_n^{-1/2}$ is not reciprocal to the BC channel $H$.
Now, we have to choose $\{\xi_k\}_{k=1}^{K}$ in order to guarantee (4.127). First, we rewrite
\[
\begin{aligned}
&\mathrm{E}_{h_k^{mac}|y_T}\bigl[\mathrm{COR}_k^{mac}(p^{mac}, u_k^{mac}, g_k^{mac}(h_k^{mac}); h_k^{mac})\bigr] = \\
&\quad = 1 + \xi_k^{-2}\, p_k^T \Bigl(I_M + \sum_{i=1}^{K} \xi_i^2\, \mathrm{E}_{h_i|y_T}\bigl[|g_i(h_i)|^2\bigr] R_{h_i|y_T}\Bigr) p_k^{*} - 2\,\mathrm{E}_{h_k|y_T}\bigl[\mathrm{Re}\bigl(g_k(h_k)\, h_k^T p_k\bigr)\bigr]
\end{aligned} \tag{4.134}
\]
and apply it to (4.127), which yields (cf. (4.49))
\[
\mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr]\, \bar{c}_{r_k} = \xi_k^{-2}\, p_k^H \Bigl(I_M + \sum_{i=1}^{K} \xi_i^2\, \mathrm{E}_{h_i|y_T}\bigl[|g_i(h_i)|^2\bigr] R_{h_i|y_T}^{*}\Bigr) p_k \tag{4.135}
\]
for $k = 1, 2, \ldots, K$. With $\bar{c}_{r_k} = c_{n_k} + \sum_{i=1}^{K} p_i^H R_{h_k|y_T}^{*} p_i$ we obtain a system of linear equations in $\xi = [\xi_1^2, \xi_2^2, \ldots, \xi_K^2]^T$

\[
W \xi = l, \tag{4.136}
\]
where $l = \bigl[\|p_1\|_2^2, \|p_2\|_2^2, \ldots, \|p_K\|_2^2\bigr]^T$ and
\[
[W]_{u,v} = \begin{cases} -\mathrm{E}_{h_v|y_T}\bigl[|g_v(h_v)|^2\bigr]\, p_u^H R_{h_v|y_T}^{*}\, p_u, & u \neq v \\[4pt] \mathrm{E}_{h_v|y_T}\bigl[|g_v(h_v)|^2\bigr]\, \bar{c}_{r_v} - \mathrm{E}_{h_v|y_T}\bigl[|g_v(h_v)|^2\bigr]\, p_v^H R_{h_v|y_T}^{*}\, p_v, & u = v. \end{cases} \tag{4.137}
\]
Because of
\[
[W]_{v,v} > \sum_{u \neq v} \bigl|[W]_{u,v}\bigr| = \sum_{u \neq v} \mathrm{E}_{h_v|y_T}\bigl[|g_v(h_v)|^2\bigr]\, p_u^H R_{h_v|y_T}^{*}\, p_u, \tag{4.138}
\]

$W$ is (column) diagonally dominant if $c_{n_k} > 0$, i.e., it is always invertible [95, p. 349].^{43} Additionally, the diagonal of $W$ has only positive elements, whereas all other elements are negative. This yields an inverse $W^{-1}$ with non-negative entries [21, p. 137]. Because $l$ is positive and $W^{-1}$ is non-negative, $\xi$ is always positive. We conclude that we can always find a dual MAC with parameters given in (4.131) to (4.133) and by (4.136) which achieves the same performance (4.127) as the given BC.

The total transmit power in the BC is given by $\|l\|_1$.^{44} Summing over all equations in (4.136) yields $\|l\|_1 = \sum_{k=1}^{K} \xi_k^2\, \mathrm{E}_{h_k|y_T}[|g_k(h_k)|^2]\, c_{n_k}$, which equals $\sum_{k=1}^{K} p_k^{mac,2}$ due to (4.131). Therefore, a MAC performance identical to the BC is reached with the same total transmit power.

^{43} If $\mathrm{E}_{h_v|y_T}[|g_v(h_v)|^2] = 0$ for some $v$, this receiver has to be treated separately.

^{44} The 1-norm of a vector $l = [l_1, l_2, \ldots, l_K]^T$ is defined as $\|l\|_1 = \sum_{k=1}^{K} |l_k|$.
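As a numerical illustration of this construction, the following hedged numpy sketch assembles $W$ and $l$ from randomly generated, hypothetical BC quantities, solves (4.136), and verifies the total-power identity just derived (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 4, 3

# Hypothetical BC quantities: precoders p_k, conditional correlations
# R_{h_k|y_T}, noise variances c_{n_k}, mean receiver gains E[|g_k|^2].
P = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
R = []
for k in range(K):
    A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    R.append(A @ A.conj().T / M)
cn = np.full(K, 0.1)
g2 = rng.uniform(0.5, 1.5, K)

def quad(u, v):
    """p_u^H R_{h_v|y_T}^* p_u (real for Hermitian R)."""
    return np.real(P[:, u].conj() @ R[v].conj() @ P[:, u])

# Mean receive variances (cf. (4.135)) and the matrix W from (4.137).
cbar = np.array([cn[v] + sum(quad(i, v) for i in range(K)) for v in range(K)])
W = np.empty((K, K))
for u in range(K):
    for v in range(K):
        W[u, v] = g2[v] * (cbar[v] - quad(v, v)) if u == v else -g2[v] * quad(u, v)

l = np.array([np.linalg.norm(P[:, k]) ** 2 for k in range(K)])
xi2 = np.linalg.solve(W, l)          # (4.136); positive by diagonal dominance
# Total-power identity: ||l||_1 = sum_k xi_k^2 E[|g_k|^2] c_{n_k}
print(xi2, np.isclose(l.sum(), np.sum(xi2 * g2 * cn)))
```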
In Appendix D.4 we prove the reverse direction: If some values of $\mathrm{E}_{h_k^{mac}|y_T}[\mathrm{COR}_k^{mac}(p^{mac}, u_k^{mac}, g_k^{mac}(h_k^{mac}); h_k^{mac})]$ are achieved in the MAC, then the same values can be achieved for $\mathrm{E}_{h_k|y_T}[\mathrm{COR}_k(P, g_k(h_k); h_k)]$ in the BC with the same total power.
Example 4.4. For given power allocation $p^{mac}$ and a scaled MF (4.121), the optimum beamformers $u_k^{mac}$ are the generalized eigenvectors maximizing $\mathrm{SINR}_k^{mac}$ (4.123). Due to (4.131), the corresponding precoding vectors $p_k$ in the BC are the scaled generalized eigenvectors $u_k^{mac}$. ⊓⊔

4.5.2 Duality for Incomplete Channel State Information at Receivers

For incomplete CSI at the receivers due to a common training channel, we proposed a BC model in Section 4.3.3, which we summarize in Figure 4.23.
Fig. 4.23 BC model for incomplete CSI at receivers.

Fig. 4.24 Dual model with MAC structure corresponding to the BC model in Figure 4.23.

It is based on a cascaded receiver: $g_k(h_k)$ is independent of $P$ and relies on knowledge of $h_k^T q_k$, whereas $\beta_k$ depends on the transmitter's CSI and $P$, modeling a slow automatic gain control. The MSE for this model reads (4.65)^{45}
\[
\begin{aligned}
\mathrm{E}_{h_k|y_T}[\mathrm{MSE}_k(P, \beta_k g_k(h_k); h_k)] ={}& 1 + |\beta_k|^2\, \mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr] c_{n_k} \\
&+ |\beta_k|^2 \sum_{i=1}^{K} p_i^H\, \mathrm{E}_{h_k|y_T}\bigl[h_k^{*} h_k^T |g_k(h_k)|^2\bigr] p_i - 2\,\mathrm{Re}\bigl(\beta_k\, \mathrm{E}_{h_k|y_T}\bigl[g_k(h_k)\, h_k^T\bigr] p_k\bigr).
\end{aligned} \tag{4.139}
\]

^{45} Here we assume P-CSI at the transmitter instead of only S-CSI as in (4.65).

As dual MAC we choose the structure in Figure 4.24 with P-CSI at the receiver $G^{mac} = [g_1^{mac}, g_2^{mac}, \ldots, g_K^{mac}]^T$ and transmitter. The conditional mean estimate of $\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; H^{mac}) = \mathrm{E}_{n^{mac}, d^{mac}}\bigl[|\hat{d}_k^{mac} - d_k^{mac}|^2\bigr]$,
\[
\begin{aligned}
\mathrm{E}_{H^{mac}|y_T}\bigl[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; H^{mac})\bigr] ={}& 1 + \|g_k^{mac}\|_2^2 + g_k^{mac,H} \Bigl(\sum_{i=1}^{K} p_i^{mac,2}\, \mathrm{E}_{h_i^{mac}|y_T}\bigl[h_i^{mac,*} h_i^{mac,T}\bigr]\Bigr) g_k^{mac} \\
&- 2\,\mathrm{Re}\bigl(g_k^{mac,T}\, \mathrm{E}_{h_k^{mac}|y_T}[h_k^{mac}]\, p_k^{mac}\bigr),
\end{aligned} \tag{4.140}
\]
is chosen as dual performance measure ($C_{n^{mac}} = I_M$).


Again we have to restrict the transmit/receive filters to the following class.

Definition 4.3. For the BC, a choice of $\beta_k$, $g_k(h_k) \not\equiv 0$, and $p_k$ is consistent if $p_k = 0_M$ and $\beta_k = 0$ can only occur simultaneously. For the MAC, a choice of $g_k^{mac}$ and $p_k^{mac}$ is consistent if the same is true for $p_k^{mac} = 0$ and $g_k^{mac} = 0_M$.

Duality is obvious for those performance measures corresponding to $p_k = 0_M$ and $\beta_k = 0$ or $p_k^{mac} = 0$ and $g_k^{mac} = 0_M$, respectively.
With this definition we can prove the following theorem:

Theorem 4.2. Assume a consistent choice of $\beta_k$, $g_k(h_k)$, and $p_k$ for all $k$ and, alternatively, of $g_k^{mac}$ and $p_k^{mac}$. Consider only the active receivers and transmitters with $p_k \neq 0_M$ or $p_k^{mac} \neq 0$. Then all values of $\mathrm{E}_{h_k|y_T}[\mathrm{MSE}_k(P, \beta_k g_k(h_k); h_k)]$, $\forall k \in \{1, 2, \ldots, K\}$, are achievable if and only if the same values are achievable for $\mathrm{E}_{H^{mac}|y_T}[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; H^{mac})]$, $\forall k \in \{1, 2, \ldots, K\}$, with $\|p^{mac}\|_2^2 = \|P\|_F^2$.
For given BC parameters, we construct a dual MAC with equal values of its performance measures
\[
\mathrm{E}_{h_k|y_T}[\mathrm{MSE}_k(P, \beta_k g_k(h_k); h_k)] = \mathrm{E}_{H^{mac}|y_T}\bigl[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; H^{mac})\bigr] \tag{4.141}
\]
in analogy to Section 4.5.1.

First we require the conditional means of the total channel gain to be equal:
\[
\mathrm{Re}\bigl(\beta_k\, \mathrm{E}_{h_k|y_T}\bigl[g_k(h_k)\, h_k^T\bigr] p_k\bigr) = \mathrm{Re}\bigl(g_k^{mac,T}\, \mathrm{E}_{h_k^{mac}|y_T}[h_k^{mac}]\, p_k^{mac}\bigr). \tag{4.142}
\]

The power allocation is linked to the noise variance after the receive filters as
\[
\xi_k^2\, c_{n_k} |\beta_k|^2\, \mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr] = p_k^{mac,2} \tag{4.143}
\]
\[
\xi_k^2\, \|g_k^{mac}\|_2^2 = \|p_k\|_2^2. \tag{4.144}
\]
These three conditions are satisfied choosing
\[
\begin{aligned}
p_k^{mac} &= c_{n_k}^{1/2}\, \beta_k\, \xi_k \bigl(\mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr]\bigr)^{1/2} \\
g_k^{mac} &= \xi_k^{-1} p_k \\
h_k^{mac} &= g_k(h_k)\, h_k\, \bigl(\mathrm{E}_{h_k|y_T}\bigl[|g_k(h_k)|^2\bigr]\bigr)^{-1/2} c_{n_k}^{-1/2}.
\end{aligned} \tag{4.145}
\]
The detailed proof and the linear system of equations to compute $\{\xi_k^2\}_{k=1}^{K} \in \mathbb{R}_{+,0}$ are given in Appendix D.5.

4.6 Optimization with Quality of Service Constraints

Maximizing throughput (sum rate) or minimizing sum MSE does not pro-
vide any quality of service (QoS) guarantees to the receivers. For example,
receivers with small SINR are not served for small Ptx . But very often a QoS
requirement is associated with a data stream and it may be different for every
receiver. We consider QoS measures which can be formulated or bounded in
terms of the mean MSE (see Section 4.3) and give a concise overview of the
related problems. The most common algorithms for their solution rely on the
BC-MAC dualities from Section 4.5.

4.6.1 AWGN Broadcast Channel Model

In Section 4.3, the minimum MSE, i.e., MMSEk (P ) (4.43), for the AWGN
BC model turned out to be the most convenient measure among all presented
alternatives.
One standard problem is to minimize the required resources, i.e., the total transmit power, while providing the desired QoS $\gamma_k$ to every receiver. This power minimization problem reads
\[
\min_{P}\ \|P\|_F^2 \quad \text{s.t.} \quad \mathrm{MMSE}_k(P) \leq \gamma_k, \quad k = 1, 2, \ldots, K. \tag{4.146}
\]

Because of the one-to-one relation (4.43) with $\mathrm{SINR}_k(P)$ (4.36), we can transform the QoS requirements $\gamma_k$ on $\mathrm{MMSE}_k(P)$ into the equivalent SINR requirements $\gamma_k^{sinr} = 1/\gamma_k - 1$. There are two standard approaches for solving (4.146):
• The problem is relaxed to a semidefinite program [19], whose solution is
shown to coincide with the argument of (4.146).
• The dual MAC problem is solved iteratively alternating between power
allocation and beamforming at the receiver U mac . The algorithm by Schu-
bert and Boche [185] converges to the global optimum.
In the sequel, we focus on the second solution based on the MAC-BC duality.

Because of the MAC-BC duality in Theorem 4.1, we can formulate the dual (virtual) MAC problem
\[
\min_{p^{mac},\, U^{mac}}\ \|p^{mac}\|_2^2 \quad \text{s.t.} \quad \mathrm{SINR}_k^{mac}(u_k^{mac}, p^{mac}) \geq \gamma_k^{sinr}, \quad k = 1, 2, \ldots, K, \tag{4.147}
\]
which can now be solved with the algorithm in [185]. Note that (4.146) may be infeasible, which is detected by the algorithm.
Alternatively, we can maximize the margin to the QoS requirement $\gamma_k$ for a given total power constraint $P_{tx}$:
\[
\min_{P,\, \gamma_0}\ \gamma_0 \quad \text{s.t.} \quad \|P\|_F^2 \leq P_{tx}, \quad \mathrm{MMSE}_k(P) \leq \gamma_0 \gamma_k, \quad k = 1, 2, \ldots, K. \tag{4.148}
\]
If this problem is feasible, the optimum variable $\gamma_0$ describing the margin lies in $0 \leq \gamma_0 \leq 1$. Otherwise, the problem is infeasible. The solution is equivalent to minimizing the normalized performance of the worst receiver
\[
\min_{P}\ \max_{k \in \{1, \ldots, K\}}\ \frac{\mathrm{MMSE}_k(P)}{\gamma_k} \quad \text{s.t.} \quad \|P\|_F^2 \leq P_{tx}, \tag{4.149}
\]
which can be shown [185] to yield a balancing of the relative MSEs, $\mathrm{MMSE}_k(P)/\gamma_k = \gamma_0$. For unequal $\gamma_k$, its optimum differs from balancing $\mathrm{SINR}_k(P)$.
The corresponding problem in the dual MAC is
\[
\min_{p^{mac},\, U^{mac},\, \gamma_0}\ \gamma_0 \quad \text{s.t.} \quad \|p^{mac}\|_2^2 \leq P_{tx}, \quad \mathrm{MMSE}_k^{mac}(u_k^{mac}, p^{mac}) \leq \gamma_0 \gamma_k, \quad k = 1, 2, \ldots, K \tag{4.150}
\]
with $0 \leq \gamma_0 \leq 1$ if feasible and $\mathrm{MMSE}_k^{mac}(u_k^{mac}, p^{mac}) = 1/(1 + \mathrm{SINR}_k^{mac}(u_k^{mac}, p^{mac}))$ (4.122). If $\gamma_k = \gamma$, balancing of $\mathrm{MMSE}_k^{mac}$ is identical to balancing of $\mathrm{SINR}_k^{mac}$. Therefore, the algorithm by [185] gives a solution for equal $\gamma_k = \gamma$: It yields the optimum beamforming vectors for the BC based on the dual MAC problem and computes the optimum power allocation in the BC in a final step. Alternatively, we can use the BC-MAC transformation from Section 4.5.1.

4.6.2 Incomplete CSI at Receivers: Common Training Channel

For incomplete CSI at the receivers, which results from a common training channel in the forward link (Section 4.2.2), we proposed to model the receivers' signal processing capabilities by $\beta_k g_k(h_k)$ in Section 4.5.2.
For S-CSI, the corresponding power minimization problem is
\[
\min_{P,\, \{\beta_k\}_{k=1}^{K}}\ \|P\|_F^2 \quad \text{s.t.} \quad \mathrm{E}_{h_k}[\mathrm{MSE}_k(P, \beta_k g_k(h_k); h_k)] \leq \gamma_k, \quad k = 1, \ldots, K. \tag{4.151}
\]
The constraints in the BC are coupled in $P$, which complicates a direct solution.
The formulation in the dual MAC results in constraints which are decoupled in $g_k^{mac}$ ($g_k^{mac}$ is related to the $k$th column $p_k$ of $P$ via (4.145)). The problem in the dual MAC with $p^{mac} = [p_1^{mac}, \ldots, p_K^{mac}]^T$ is
\[
\min_{p^{mac},\, \{g_k^{mac}\}_{k=1}^{K}}\ \|p^{mac}\|_2^2 \quad \text{s.t.} \quad \mathrm{E}_{h_k^{mac}}\bigl[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; h_k^{mac})\bigr] \leq \gamma_k, \quad k = 1, \ldots, K. \tag{4.152}
\]

Because of the decoupled constraints in $g_k^{mac}$, the optimum receiver $g_k^{mac}$ is necessary to minimize $\|p^{mac}\|_2^2$, and we can substitute the constraints by the minimum^{46}
\[
\min_{g_k^{mac}}\ \mathrm{E}_{h_k^{mac}}\bigl[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; h_k^{mac})\bigr] = 1 - p_k^{mac,2}\, \mathrm{E}_{h_k^{mac}}[h_k^{mac}]^H \Bigl(I_M + \sum_{i=1}^{K} p_i^{mac,2} R_{h_i^{mac}}\Bigr)^{-1} \mathrm{E}_{h_k^{mac}}[h_k^{mac}], \tag{4.153}
\]
which is achieved for
\[
g_k^{mac,T} = \mathrm{E}_{h_k^{mac}}[h_k^{mac}]^H \Bigl(I_M + \sum_{i=1}^{K} p_i^{mac,2} R_{h_i^{mac}}\Bigr)^{-1} p_k^{mac}.
\]

With the argumentation from [236, App. II], we identify an interference function as defined by Yates [239]:
\[
J_k\bigl(\{p_i^{mac,2}\}_{i=1}^{K}; \gamma_k\bigr) = \frac{1 - \gamma_k}{\mathrm{E}_{h_k^{mac}}[h_k^{mac}]^H \Bigl(I_M + \sum_{i=1}^{K} p_i^{mac,2} R_{h_i^{mac}}\Bigr)^{-1} \mathrm{E}_{h_k^{mac}}[h_k^{mac}]}. \tag{4.154}
\]

^{46} The dual MAC channel $h_k^{mac}$ for the common training channel defined in (4.145) is not zero-mean.

The MAC problem (4.152) after optimization w.r.t. $\{g_k^{mac}\}_{k=1}^{K}$ reduces to the power allocation problem
\[
\min_{p^{mac}}\ \|p^{mac}\|_2^2 \quad \text{s.t.} \quad J_k\bigl(\{p_i^{mac,2}\}_{i=1}^{K}; \gamma_k\bigr) \leq p_k^{mac,2}, \quad k = 1, \ldots, K. \tag{4.155}
\]
This is a classical problem in the literature.

Because all constraints are satisfied with equality in the optimum, the optimum power allocation is the solution of the $K$ nonlinear equations
\[
J_k\bigl(\{p_i^{mac,2}\}_{i=1}^{K}; \gamma_k\bigr) = p_k^{mac,2}. \tag{4.156}
\]

Yates [239] showed that the fixed point iteration on this equation converges to a unique fixed point and minimizes the total power $\|p^{mac}\|_2^2$ if the problem is feasible: Based on $\{p_i^{mac,(m-1)}\}_{i=1}^{K}$, we compute $p_k^{mac,(m),2} = J_k\bigl(\{p_i^{mac,(m-1),2}\}_{i=1}^{K}; \gamma_k\bigr)$ in the $m$th iteration. But its convergence is slow; therefore, an approach similar to [185], as in [7], is recommended.
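For illustration, a minimal numpy sketch of this fixed-point iteration under hypothetical S-CSI statistics (random channel means and correlation matrices, loose MSE targets $\gamma_k$); feasibility of the targets is assumed, and all names are ours:

```python
import numpy as np

rng = np.random.default_rng(2)
M, K = 4, 3

# Hypothetical dual-MAC statistics: non-zero channel means (cf. footnote 46)
# and correlation matrices R_{h_k^mac}; MSE targets gamma_k.
mu = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
R = []
for k in range(K):
    A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    R.append(A @ A.conj().T / M + np.outer(mu[:, k], mu[:, k].conj()))
gamma = np.full(K, 0.6)

def J(k, p2):
    """Interference function (4.154) evaluated for squared powers p2."""
    Q = np.eye(M, dtype=complex)
    for i in range(K):
        Q += p2[i] * R[i]
    q = np.real(mu[:, k].conj() @ np.linalg.solve(Q, mu[:, k]))
    return (1.0 - gamma[k]) / q

# Fixed-point iteration p_k^{mac,2} <- J_k({p_i^{mac,2}}; gamma_k):
# converges to the minimum-power fixed point if the targets are feasible.
p2 = np.ones(K)
for _ in range(1000):
    p2_new = np.array([J(k, p2) for k in range(K)])
    if np.max(np.abs(p2_new - p2)) < 1e-10:
        break
    p2 = p2_new
print(p2, p2.sum())
```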
Similar to the AWGN BC model, we can also maximize the MSE margin in the BC under a total power constraint:
\[
\min_{P,\, \{\beta_k\}_{k=1}^{K},\, \gamma_0}\ \gamma_0 \quad \text{s.t.} \quad \|P\|_F^2 \leq P_{tx}, \quad \mathrm{E}_{h_k}[\mathrm{MSE}_k(P, \beta_k g_k(h_k); h_k)] \leq \gamma_0 \gamma_k, \quad k = 1, \ldots, K. \tag{4.157}
\]

The corresponding problem in the dual MAC is
\[
\min_{p^{mac},\, G^{mac},\, \gamma_0}\ \gamma_0 \quad \text{s.t.} \quad \|p^{mac}\|_2^2 \leq P_{tx}, \quad \mathrm{E}_{h_k^{mac}}\bigl[\mathrm{MSE}_k^{mac}(p^{mac}, g_k^{mac}; h_k^{mac})\bigr] \leq \gamma_0 \gamma_k, \quad k = 1, \ldots, K
\]
with $0 \leq \gamma_0 \leq 1$ for feasibility. Again, the algorithm for its solution is in analogy to [185] (see [7] for details).
Alternatively to QoS constraints in terms of MSE expressions, we can employ SINR expressions such as (4.62) and (4.67). In Example 4.3, we illustrate the difference between both criteria. But the optimum precoders turn out to be identical in an interesting case despite the different values of the underlying SINR expressions: For $\mu_h = 0$, the power minimization problem in analogy to (4.151) and the SINR balancing in analogy to (4.157) for identical $\gamma_k = \gamma$ yield the identical precoders $P$, no matter whether the constraints are based on $\mathrm{SINR}_k^{icsi}(P)$ (4.62) or $\mathrm{SINR}_k^{phase}(P)$ (4.67). This can be proved by investigating the eigenvectors of the matrix in the power balancing step in [185] and [36], which are identical for both SINR definitions.
Chapter 5
Nonlinear Precoding with Partial Channel State Information

Introduction to Chapter

Linear precoding with independent coding of the receivers’ data does not
achieve the capacity of the vector broadcast channel (BC) for C-CSI at the
transmitter. It is achieved by successive encoding with dirty paper coding
[232]. At high SNR, the achievable sum rate of linear precoding has the
same slope (multiplexing gain) as dirty paper coding (e.g., [32, 106]). But
there is an additional power offset, which increases with the ratio of the
number of receivers to the number of transmit antennas K/M [106]. Perhaps
surprisingly, the sum capacity of the broadcast channel with non-cooperative
receivers is asymptotically (high transmit power) identical to the capacity of
the same channel, but with cooperating receivers (point-to-point multiple-
input multiple-output (MIMO) channel).
For dirty paper coding, the data streams for all receivers are jointly and sequentially coded: The interference from previously encoded data streams is known non-causally at the transmitter; thus, this knowledge can be used to code the next data stream such that its achievable data rate is the same as if the interference from previously coded data streams was not present (Writing on dirty paper [39]). Some practical aspects and an algorithm with performance close to capacity, which relies on dirty paper coding, are discussed in [210].
An implementation of writing on dirty paper for a single-input single-output
(SISO) system with interference known at the transmitter is given by Erez
and ten Brink [65].
The nonlinear precoding schemes considered in this chapter attempt to
fill the performance gap between linear precoding and dirty paper coding,
but with a smaller computational complexity than dirty paper coding. They
are suboptimum, because they do not code the data sequence in time, i.e.,
they are “one-dimensional” [241].
The most prominent representative is Tomlinson-Harashima precoding
(THP), which is proposed for the broadcast channel by [82, 241, 71]; orig-
inally, it was invented for mitigating intersymbol interference [213, 89]. In


[241] it is considered as the one-dimensional implementation of dirty paper


coding. The first approach aims at a complete cancelation of the interference
(zero-forcing) [71], which does not consider the variance of the receivers’
noise. Joham et al.[109, 115, 108] propose a mean square error (MSE) based
optimization, which includes optimization of the precoding order, a formula-
tion for frequency selective channels based on finite impulse response filters,
and optimization of the latency time. It is extended to a more general receiver
model [100, 102] and to multiple antennas at the receivers [145]. An efficient
implementation can be achieved using the symmetrically permuted Cholesky
factorization, whose computational complexity is of the same (cubic) order as
for linear MSE-based precoding [126, 127]. Optimization with quality of service
constraints is introduced in [187] for SINR and in [145] based on MSE. Du-
ality between the multiple access channel (MAC) and the broadcast channel
is proved in [230] for SINR and in [145] for MSE.
In [93] a more general structure of the precoder is introduced: A pertur-
bation signal is optimized which is an element of a multidimensional (in space)
rectangular lattice. According to [184] we refer to this structure as vector
precoding. As for THP, zero-forcing precoders were discussed first and the
regularization which is typical for MSE-based schemes was introduced in
an ad-hoc fashion. Typically, these zero-forcing precoders are outperformed
in bit error rate (BER) by the less complex MMSE THP [184] for small to
medium SNR. Finally, Schmidt et al.[184] introduce minimum MSE (MMSE)
vector precoding. Contrary to THP, the general vector precoding achieves the
full diversity order M of the system [209]. The computational complexity is
determined by the algorithm for the search on the lattice; a suboptimum
implementation requires a quartic order [237]. In [174], THP is shown to be
a special case of MMSE vector precoding and a transition between both is
proposed which allows for a further reduction of complexity.
For more information about important related issues such as lattices, shap-
ing, lattice codes, and the capacity of a modulo channel see [69, 70] (they are
not a prerequisite in the sequel).
The effect of small errors in CSI on zero-forcing THP is analyzed in [10].
But a zero-forcing design is known to be more sensitive to parameter errors
than a regularized or MMSE optimization. Moreover, the uncertainty in CSI
at the transmitter may be large because of the mobility in the wireless system
which leads to a time-variant channel. It can be large enough that a robust
optimization (Appendix C) is advantageous.
A robust optimization of zero-forcing THP is introduced by the author
in [98], which includes MMSE prediction of the channel. It is generalized to
MMSE THP in [57]. Both approaches are based on the Bayesian paradigm
(Appendix C) and assume implicitly the same CSI at the transmitter as at
the receiver. But the CSI is highly asymmetric, which has to be modeled
appropriately. In [50, 49] the receivers’ CSI is taken into account explicitly
in the optimization which also yields a THP solution for S-CSI. For the
definition of different categories of CSI see Table 4.1.

Fig. 5.1 Broadcast channel with nonlinear receivers, which include a modulo oper-
ator M(b̂k [n]) (Figure 5.3).

Besides the development for the broadcast channel, THP based on P-CSI
for cooperative receivers is treated in [135] for a SISO system. A heuristic
solution is proposed for MIMO in [72]; for S-CSI another approach is given
in [190].

Outline of Chapter

First, we derive the structure of the nonlinear vector precoder in Section 5.1.
This includes the derivation of MMSE vector precoding and its relation to
MMSE THP for C-CSI as an example. Optimization of nonlinear precoding
with P-CSI requires new performance measures, which we introduce in Sec-
tion 5.2 using the receiver models in analogy to linear precoding with P-CSI
(Section 4.3). Conceptually, we also distinguish between vector precoding
and THP. For these performance measures, we derive iterative algorithms
minimizing the overall system efficiency, e.g., the sum MSE, and discuss the
results for different examples (Section 5.3). For S-CSI, nonlinear precoding
can be understood as nonlinear beamforming for communications. Precoding
for the training channel to provide the receivers with the necessary CSI is
addressed in Section 5.4. A detailed discussion of the performance for P-CSI
and S-CSI is given in Section 5.5, which is based on the uncoded bit error
rate (BER).

5.1 From Vector Precoding to Tomlinson-Harashima Precoding

Fig. 5.2 Broadcast channel with nonlinear receivers in matrix-vector notation and representation of modulo operation by an auxiliary vector $\hat{a}[n] \in \mathcal{L}^K$.

Fig. 5.3 Graph of the modulo operator $\mathrm{M}(x)$ for real-valued arguments $x \in \mathbb{R}$.

To enable nonlinear precoding we have to introduce nonlinear receivers. The following modified receiver architecture (Figure 5.1) generates the necessary additional degrees of freedom, which can be exploited for nonlinear precoding:^{1}
• A modulo operator $\mathrm{M}(\hat{b}_k[n])$ with parameter $\tau$ (Figure 5.3 plots the function for real-valued input) is applied to the output of the linear processing (scaling) $g_k \in \mathbb{C}$ at the receivers. This nonlinearity of the receivers, which creates a modulo channel to every receiver, offers additional degrees of freedom for precoder design.^{2}
• A nearest-neighbor quantizer $\mathrm{Q}(x) = \operatorname{argmin}_{y \in \mathbb{D}} |x - y|^2$ serves as a simple example^{3} for the decision process on the transmitted data. It delivers hard decisions $\hat{d}_k[n] \in \mathbb{D}$ on the transmitted symbols $d[n] = [d_1[n], d_2[n], \ldots, d_K[n]]^T \in \mathbb{D}^K$. It is the ML detector for uncoded data in $\mathbb{D}$ and a linear channel (without modulo operator) with $g_k = 1/(h_k^T p_k)$ as in Figure 4.3, but it is not the optimum detector for the nonlinear modulo channel anymore due to the non-Gaussianity of the noise after the modulo operator. To simplify the receiver architecture we live with its suboptimality. Contrary to the modulo operation, the quantizer will not influence our precoder design.^{4}

^{1} The presentation in this section follows closely that of Joham et al. [114, 113].
^{2} An information theoretic discussion of the general modulo channel compared to a linear AWGN channel is given in [70, 42].
^{3} The decoding of coded data is not considered here.

Fig. 5.4 Voronoi region $\mathbb{V}$ (shaded area), lattice $\mathcal{L}$ (filled circles), QPSK constellation $\mathbb{D}$ with variance one (4 different markers), and periodically repeated QPSK constellation for perturbations $a \in \mathcal{L}$.
The transmit signal $x_{tx}[n] \in \mathbb{C}^M$ is a nonlinear function of the modulated data $d[n] \in \mathbb{D}^K$. To restrict the functional structure of the nonlinear precoder and to introduce a parameterization, we have to analyze the properties of the modulo operators at the receivers generating a nonlinear modulo channel.

The modulo operation (Figure 5.3) on $x \in \mathbb{C}$ can also be expressed as
\[
\mathrm{M}(x) = x - \tau \left\lfloor \frac{\mathrm{Re}(x)}{\tau} + \frac{1}{2} \right\rfloor - \mathrm{j}\tau \left\lfloor \frac{\mathrm{Im}(x)}{\tau} + \frac{1}{2} \right\rfloor \in \mathbb{V}, \tag{5.1}
\]
where the floor operation $\lfloor \bullet \rfloor$ gives the largest integer smaller than or equal to the argument and $\mathbb{V} = \{x + \mathrm{j}y \mid x, y \in [-\tau/2, \tau/2)\}$ is the fundamental Voronoi region^{5} of the lattice $\mathcal{L} = \tau\mathbb{Z} + \mathrm{j}\tau\mathbb{Z}$ corresponding to the modulo operation (Figure 5.4).
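A direct implementation of (5.1) is straightforward; the following minimal numpy sketch (the helper name modulo is ours) also verifies numerically the lattice-invariance property discussed next:

```python
import numpy as np

def modulo(x, tau):
    """Modulo operator (5.1): maps Re and Im of x into the fundamental
    Voronoi region [-tau/2, tau/2) of the lattice tau*Z + j*tau*Z."""
    x = np.asarray(x, dtype=complex)
    return (x.real - tau * np.floor(x.real / tau + 0.5)
            + 1j * (x.imag - tau * np.floor(x.imag / tau + 0.5)))

tau = 2 * np.sqrt(2)                  # choice for QPSK (see below)
d = np.exp(1j * np.pi / 4)            # a QPSK symbol, lies inside V
a = tau * (2 - 1j)                    # an arbitrary point of the lattice L
print(modulo(d, tau), np.allclose(modulo(d + a, tau), d))  # M(x + a) = M(x)
```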
In the matrix-vector notation of the BC (Figure 5.2), the modulo operation on a vector $\mathrm{M}(\hat{b}[n])$ is defined element-wise, i.e., $\mathrm{M}(\hat{b}[n]) = [\mathrm{M}(\hat{b}_1[n]), \ldots, \mathrm{M}(\hat{b}_K[n])]^T \in \mathbb{V}^K$. As in Figure 5.2, the modulo operation can also be formulated with an auxiliary vector $\hat{a}[n] \in \mathcal{L}^K$ on the $2K$-dimensional lattice $\mathcal{L}^K = \tau\mathbb{Z}^K + \mathrm{j}\tau\mathbb{Z}^K$.

^{4} In Figure 4.3 the symbol estimates $\hat{d}_k[n] \in \mathbb{C}$ are complex numbers; the quantizer is omitted, because the linear precoder based on MSE is optimized for the linear model.
^{5} We introduce the common terminology for completeness, but do not require a deeper knowledge of lattice theory.

The representation with an auxiliary variable $a \in \mathcal{L}$ shows an important property of the modulo operator: For any $x \in \mathbb{C}$, we have $\mathrm{M}(x + a) = \mathrm{M}(x)$

if $a \in \mathcal{L}$. The precoder can generate any signal $\hat{b}_k[n] + a_k[n]$ at the receiver with $a_k[n] \in \mathcal{L}$, which yields the same signal at the modulo operator's output as $\hat{b}_k[n]$. For example, the estimates $\hat{d}_k[n]$ of the transmitted data are identical to $d_k[n]$ if the input of the modulo operator is $\hat{b}_k[n] = d_k[n] + a_k[n]$ for $a_k[n] \in \mathcal{L}$. Because of this property, it is sufficient to optimize the precoder such that $\hat{b}_k[n] \approx d_k[n] + a_k[n]$ (virtual desired signal [113]) is achieved. These additional degrees of freedom can be exploited by the precoder.
This observation allows for an optimization of the precoder based on the
linear part of the models in Figures 5.1 and 5.2. This is not necessary, but
simplifies the design process considerably, since the alternative is an opti-
mization including the nonlinearity of the modulo operator or, additionally,
the quantizer.
The corresponding structure of the transmitter is given in Figure 5.5; it is referred to as vector precoder [184]. Its degrees of freedom are a linear precoder $T \in \mathbb{C}^{M \times K}$ and a perturbation vector $a[n] \in \mathcal{L}^K$ on the lattice.

To define the receivers it remains to specify the parameter $\tau$. It is chosen sufficiently large such that $\mathbb{D} \subset \mathbb{V}$. Because of this restriction on $\tau$, any input from $\mathbb{D}$ is not changed by the modulo operation and the nearest-neighbor quantizer $\mathrm{Q}(\bullet)$, which we have introduced already, is applicable. We choose $\tau$ such that the symbol constellation seen by the receiver in $\hat{b}_k[n]$ without noise ($c_{n_k} = 0$) and distortions (e.g., $GHT = I_K$) is a periodic extension of $\mathbb{D}$ on $\mathbb{C}$ (Figure 5.4): For QPSK symbols ($\mathbb{D} = \{\exp(\mathrm{j}\mu\pi/4),\ \mu \in \{-3, -1, 1, 3\}\}$), it requires $\tau = 2\sqrt{2}$; for rectangular 16QAM symbols with variance $c_{d_k} = 1$, $\tau = 8/\sqrt{10}$.
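The 16QAM value can be checked directly: the unit-variance constellation has coordinates $\pm 1/\sqrt{10}$ and $\pm 3/\sqrt{10}$ with spacing $2/\sqrt{10}$, so $\tau/2 = 4/\sqrt{10}$ exceeds the largest coordinate $3/\sqrt{10}$ by exactly half the spacing; hence $\mathbb{D} \subset \mathbb{V}$ and the periodic replicas continue the constellation grid with the same spacing.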
In the remainder of this section, we introduce the system models for non-
linear precoding (vector precoding and Tomlinson-Harashima precoding) of
the data for the example of C-CSI at the transmitter and identical receivers
G = βI K , which allows for an explicit solution [184].

Vector Precoding

Optimization of the nonlinear precoder with structure depicted in Figure 5.5


differs from Chapter 4 in two aspects:
• If $d[n]$ are known at the transmitter for the whole data block, i.e., for $n \in \{1, 2, \ldots, N_d\}$, it is not necessary to use a stochastic model for $d[n]$: In Chapter 4 we assume $d[n]$ to be a zero-mean stationary random sequence with covariance matrix $C_d = I_K$; but the optimization problem can also be formulated for given $\{d[n]\}_{n=1}^{N_d}$.

• The nonlinearity of the receivers induced by the modulo operations is


not modeled explicitly. The additional degrees of freedom available are
exploited considering only the linear part of the model in Figure 5.2 and
choosing dk [n] + ak [n] as the desired signal for b̂k [n].

The MSE for receiver $k$ is defined between the input of the modulo operators $\hat{b}_k[n]$ and the desired signal $b_k[n] = d_k[n] + a_k[n]$:
\[
\mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta; h_k\bigr) = \frac{1}{N_d} \sum_{n=1}^{N_d} \mathrm{E}_{n_k}\Bigl[\bigl|\hat{b}_k[n] - b_k[n]\bigr|^2\Bigr], \tag{5.2}
\]
where the expectation is performed w.r.t. the noise $n_k[n]$ and the arithmetic mean considers the given data $d[n]$. The dependency on $\{d[n]\}_{n=1}^{N_d}$ is not denoted explicitly.

To optimize the performance of the total system we minimize the sum MSE (similar to Section 4.3.4). The optimization problem for the precoder with a constraint on the average transmit power for the given data block is
\[
\min_{T,\, \{a[n]\}_{n=1}^{N_d},\, \beta}\ \sum_{k=1}^{K} \mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta; h_k\bigr) \quad \text{s.t.} \quad \mathrm{tr}\bigl[T \tilde{R}_b T^H\bigr] \leq P_{tx}, \tag{5.3}
\]
where $\tilde{R}_b = \sum_{n=1}^{N_d} b[n] b[n]^H / N_d$ is the sample correlation matrix of $b[n]$.
With
\[
\sum_{k=1}^{K} \mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta; h_k\bigr) = \frac{1}{N_d} \sum_{n=1}^{N_d} \mathrm{E}_{n}\Bigl[\bigl\|\hat{b}[n] - b[n]\bigr\|_2^2\Bigr] \tag{5.4}
\]
and $\hat{b}[n] = \beta H T b[n] + \beta n[n]$ we obtain the explicit expression for the cost function
\[
\sum_{k=1}^{K} \mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta; h_k\bigr) = \mathrm{tr}\bigl[\tilde{R}_b\bigr] + |\beta|^2\, \mathrm{tr}[C_n] + |\beta|^2\, \mathrm{tr}\bigl[H^H H T \tilde{R}_b T^H\bigr] - 2\,\mathrm{Re}\bigl(\beta\, \mathrm{tr}\bigl[H T \tilde{R}_b\bigr]\bigr). \tag{5.5}
\]

Solving (5.3) for $T$ and $\beta$ first, we get (Appendix D.1)
\[
T = \beta^{-1} \Bigl(H^H H + \frac{\mathrm{tr}[C_n]}{P_{tx}} I_M\Bigr)^{-1} H^H = \beta^{-1} H^H \Bigl(H H^H + \frac{\mathrm{tr}[C_n]}{P_{tx}} I_K\Bigr)^{-1}. \tag{5.6}
\]
The scaling is required to be
\[
\beta^2 = \mathrm{tr}\biggl[\tilde{R}_b H \Bigl(H^H H + \frac{\mathrm{tr}[C_n]}{P_{tx}} I_M\Bigr)^{-2} H^H\biggr] \Big/ P_{tx} \tag{5.7}
\]
to satisfy the transmit power constraint.
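As a numerical illustration of (5.6) and (5.7), a minimal numpy sketch assuming a randomly drawn flat channel, $C_n = 0.1\, I_K$, and $\tilde{R}_b = I_K$ (i.e., all perturbations $a[n] = 0$; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, Ptx = 4, 3, 10.0

H = rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))
tr_Cn = K * 0.1                       # tr[C_n] for C_n = 0.1 * I_K
Rb = np.eye(K)                        # sample correlation of b[n] (here a[n] = 0)

# (5.6): regularized channel inverse, up to the scalar beta^{-1}.
G = H.conj().T @ np.linalg.inv(H @ H.conj().T + (tr_Cn / Ptx) * np.eye(K))

# (5.7): choose beta such that tr[T Rb T^H] = Ptx with T = beta^{-1} G.
beta = np.sqrt(np.real(np.trace(Rb @ G.conj().T @ G)) / Ptx)
T = G / beta
print(np.real(np.trace(T @ Rb @ T.conj().T)))   # equals Ptx
```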
Fig. 5.5 Nonlinear precoding with a perturbation vector $a[n] \in \mathcal{L}^K = \tau\mathbb{Z}^K + \mathrm{j}\tau\mathbb{Z}^K$ on a $2K$-dimensional lattice for every $d[n]$ before (linear) precoding with $T$.

Fig. 5.6 Nonlinear precoding with equivalent representation of $T$ as $T = P(I_K - F)^{-1}\Pi^{(O)}$.

Fig. 5.7 Tomlinson-Harashima precoding (THP) with ordering $\Pi^{(O)}$: Equivalent representation of Figure 5.6 with modulo operator $\mathrm{M}(\bullet)$ instead of a perturbation vector $a[n]$ for the restricted search on the lattice.

The minimum of (5.3) is a special case of (D.14) and reads (constraining $T$ to meet the transmit power constraint)
\[
\min_{T,\, \beta}\ \sum_{k=1}^{K} \mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta; h_k\bigr) = \frac{\mathrm{tr}[C_n]}{N_d P_{tx}} \sum_{n=1}^{N_d} \bigl\|a[n] + d[n]\bigr\|_{\Phi^{-1}}^2 \tag{5.8}
\]
with weighting matrix $\Phi = H H^H + \frac{\mathrm{tr}[C_n]}{P_{tx}} I_K$.
Optimization w.r.t. $\{a[n]\}_{n=1}^{N_d}$ is equivalent to a complete search on the $2K$-dimensional (real-valued) lattice generated by $\tau \Phi^{-1/2}$ [184], i.e., the search for the closest point to $-\Phi^{-1/2} d[n]$. The computational complexity of this search is exponential in the number of receivers $K$, but it can be approximated with a resulting complexity of $\mathcal{O}(K^4)$ [174].
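To make the complexity statement concrete, a hedged brute-force sketch for small $K$: it enumerates lattice points with bounded integer coordinates (the bound radius is our simplifying assumption) and minimizes the weighted norm of (5.8); all names are ours:

```python
import numpy as np
from itertools import product

def search_perturbation(d, Phi_inv, tau, radius=1):
    """Exhaustive solution of min_a ||a + d||^2_{Phi^{-1}} over the lattice
    (tau*Z + j*tau*Z)^K, restricting each integer coordinate to
    {-radius, ..., radius}; complexity is exponential in K."""
    K = d.shape[0]
    coords = [tau * (re + 1j * im)
              for re in range(-radius, radius + 1)
              for im in range(-radius, radius + 1)]
    best_a, best_cost = None, np.inf
    for a in product(coords, repeat=K):
        e = np.array(a) + d
        cost = np.real(e.conj() @ Phi_inv @ e)
        if cost < best_cost:
            best_a, best_cost = np.array(a), cost
    return best_a, best_cost

rng = np.random.default_rng(4)
K, tau = 2, 2 * np.sqrt(2)
B = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
Phi_inv = np.linalg.inv(B @ B.conj().T + 0.1 * np.eye(K))  # hypothetical weight
d = np.exp(1j * np.pi / 4 * rng.choice([-3, -1, 1, 3], K)) # QPSK block
print(search_perturbation(d, Phi_inv, tau))
```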

Restricted Search on Lattice

We define the Cholesky decomposition of the permuted matrix
\[
\Pi^{(O)} \Phi^{-1} \Pi^{(O),T} = L^H \Delta L \tag{5.9}
\]
with permutation matrix $\Pi^{(O)} = \sum_{i=1}^{K} e_i e_{p_i}^T$, $\Pi^{(O),T} = \Pi^{(O),-1}$, diagonal matrix $\Delta = \mathrm{diag}[\delta_1, \ldots, \delta_K]$, and lower triangular matrix $L$ with unit diagonal. It is referred to as symmetrically permuted Cholesky factorization [84, p. 148], which can be applied to find a good permutation $O$ efficiently [126, 127]. The permutation is defined by the tuple $O = [p_1, p_2, \ldots, p_K]$ with $p_i \in \{1, 2, \ldots, K\}$ (the data of the $p_i$th receiver is coded in the $i$th step).
The transformation (5.6) can be factorized as^{6}
\[
T = P (I_K - F)^{-1} \Pi^{(O)} \tag{5.10}
\]
with
\[
P = \beta^{-1} H^H \Pi^{(O),T} L^H \Delta \tag{5.11}
\]
\[
F = I_K - L^{-1}. \tag{5.12}
\]
This factorization yields an equivalent representation of the vector precoder depicted in Figure 5.6.

^{6} This factorization is a heuristic, which is inspired by the structure of the solution for THP (D.68) [108, 115].
With (5.9), minimization of (5.8) reads
\[
\min_{a[n] \in \mathcal{L}^K}\ \bigl\|\Delta^{1/2} L\, \Pi^{(O)} (d[n] + a[n])\bigr\|_2^2. \tag{5.13}
\]
An approach to reduce the computational complexity of solving (5.13) is a restriction of the search on the lattice. We use the heuristic presented in [175], which will lead us to THP. It exploits the lower triangular structure of $L$: The $k$th element of $L \Pi^{(O)} a[n]$ depends only on the first $k$ elements of $a^{(O)}[n] = \Pi^{(O)} a[n]$. We start with $i = 1$ and optimize for the $i$th element of $a^{(O)}[n]$, which corresponds to the $p_i$th element of $a[n]$ due to $e_{p_i} = \Pi^{(O),T} e_i$. It is obtained minimizing the $i$th component of the squared Euclidean norm in (5.13)
\[
a_{p_i}[n] = \operatorname*{argmin}_{\bar{a}_{p_i}[n] \in \mathcal{L}}\ \Bigl| e_i^T \Delta^{1/2} L \bigl(\Pi^{(O)} d[n] + a^{(O)}[n]\bigr) \Bigr|^2 \tag{5.14}
\]
given the $i - 1$ previous elements
\[
a^{(O)}[n] = [a_{p_1}[n], \ldots, a_{p_{i-1}}[n], \bar{a}_{p_i}[n], *, \ldots, *]^T. \tag{5.15}
\]
The last $K - i$ elements are arbitrary due to the structure of $L$.


We would like to express (5.14) by a modulo operation. Using (5.12), we can rewrite
\[
e_i^T \Delta^{1/2} L = \delta_i^{1/2} e_i^T (I_K - F)^{-1} = \delta_i^{1/2} e_i^T (I_K - F + F)(I_K - F)^{-1} = \delta_i^{1/2} e_i^T \bigl(I_K + F (I_K - F)^{-1}\bigr). \tag{5.16}
\]
Defining the signal $w[n] = (I_K - F)^{-1} \Pi^{(O)} b[n]$, problem (5.14) can be expressed in terms of a quantizer $\mathrm{Q}_{\mathcal{L}}(x)$ to the point on the 2-dimensional lattice $\mathcal{L}$ closest to $x \in \mathbb{C}$:
\[
a_{p_i}[n] = \operatorname*{argmin}_{\bar{a}_{p_i}[n] \in \mathcal{L}}\ \delta_i \bigl| d_{p_i}[n] + \bar{a}_{p_i}[n] + e_i^T F w[n] \bigr|^2 = -\mathrm{Q}_{\mathcal{L}}\bigl(d_{p_i}[n] + e_i^T F w[n]\bigr). \tag{5.17}
\]

Note that $w_i[n] = d_{p_i}[n] + a_{p_i}[n] + e_i^T F w[n]$, which is the output of the modulo operator.

Since the first row of $F$ is zero, we have $b_{p_1}[n] = d_{p_1}[n]$ for $i = 1$, i.e., $a_{p_1}[n] = 0$. Using the relation $\mathrm{Q}_{\mathcal{L}}(x) = x - \mathrm{M}(x)$, the perturbation signal is
\[
\Pi^{(O)} a[n] = \mathrm{M}\bigl(\Pi^{(O)} d[n] + F w[n]\bigr) - \Pi^{(O)} d[n] - F w[n] \tag{5.18}
\]
and the output of the feedback loop in Figure 5.6 can be expressed by the modulo operation
\[
w[n] = \Pi^{(O)} b[n] + F w[n] = \mathrm{M}\bigl(\Pi^{(O)} d[n] + F w[n]\bigr). \tag{5.19}
\]
This relation yields the representation of the vector precoder in Figure 5.7, which is the well-known structure of THP for the BC. It is equivalent to Figures 5.5 and 5.6 if the lattice search for $a[n]$ is restricted as in (5.14).
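A minimal numpy sketch of the loop (5.19) for the QPSK choice of $\tau$ ($d$ is assumed to be already permuted by $\Pi^{(O)}$, and $F$ is a hypothetical strictly lower triangular feedback filter); it also checks that the implied perturbation (5.18) lies on the lattice:

```python
import numpy as np

def modulo(x, tau):
    # Elementwise modulo operator (5.1).
    x = np.asarray(x, dtype=complex)
    return (x.real - tau * np.floor(x.real / tau + 0.5)
            + 1j * (x.imag - tau * np.floor(x.imag / tau + 0.5)))

def thp_loop(d, F, tau):
    """Successive evaluation of w = M(d + F w) (5.19); feasible because F
    is lower triangular with zero diagonal (spatial causality)."""
    K = d.shape[0]
    w = np.zeros(K, dtype=complex)
    for i in range(K):
        w[i] = modulo(d[i] + F[i, :i] @ w[:i], tau)
    return w

rng = np.random.default_rng(5)
K, tau = 4, 2 * np.sqrt(2)
F = np.tril(rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K)), -1)
d = np.exp(1j * np.pi / 4 * rng.choice([-3, -1, 1, 3], K))   # QPSK symbols
w = thp_loop(d, F, tau)
a = w - F @ w - d            # perturbation (5.18); must lie on the lattice
print(np.allclose(a.real / tau, np.round(a.real / tau)),
      np.allclose(a.imag / tau, np.round(a.imag / tau)))
```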
Note that the sample correlation matrix $\tilde{R}_w = \sum_{n=1}^{N_d} w[n] w[n]^H / N_d$ of $w[n]$ is not diagonal in general because of
\[
\tilde{R}_w = (I_K - F)^{-1} \tilde{R}_b (I_K - F)^{-H}. \tag{5.20}
\]
Traditionally, a diagonal $\tilde{R}_w$ is assumed to optimize THP [69, 108]: It is obtained modeling the outputs a priori as statistically independent random variables which are uniformly distributed on $\mathbb{V}$ [69]. With this assumption, THP based on sum MSE and $G = \beta I_K$ yields a solution for the filters $P$ and $F$ [115, 108] (see also Appendix D.6) which is identical to (5.11) and (5.12) except for a different scaling of $P$.^{7} In the original derivation of THP the scaling $\beta$ is obtained from the stochastic model for $w[n]$. In the alternative derivation of THP above this assumption is not necessary, and we have a solution for which the transmit power constraint holds exactly for the given data.

^{7} This can be proved by comparing (5.11) and (5.12) with the solutions given in [127].

Example 5.1. For large $P_{tx}$, the linear filter converges to $T = \beta^{-1} H^H (H H^H)^{-1}$, i.e., it inverts the channel (zero-forcing). The lattice search is performed for the weight $\Phi = H H^H$, i.e., the skewness of the resulting lattice is determined by the degree of orthogonality between the rows of $H$.

This high-SNR solution can also be determined from (5.3) including the additional zero-forcing constraint $\beta H T = I_K$. Applying the constraint to the cost function (5.5) directly, the problem is equivalent to minimizing $\beta^2$ under the same constraints (transmit power and zero-forcing): The necessary scaling $\beta$ of $T$ to satisfy the transmit power constraint is reduced; therefore the noise variance in $\hat{b}[n]$ (Figure 5.1) at the receivers is minimized. The minimum scaling is $\beta^2 = \mathrm{tr}[\tilde{R}_b (H H^H)^{-1}]/P_{tx}$. Note that without a transmit power constraint, this solution is equivalent to minimizing the required transmit power (see also [93]).

For small $P_{tx}$, the diagonal matrix in $\Phi$ dominates and we have approximately $L = I_K$ and $F = 0$. Thus, for diagonal $\Phi$ the lattice search yields $a[n] = 0$, i.e., linear precoding of the data. ⊓⊔

Optimization of Precoding Order

Optimization of the precoding order would require checking all $K!$ possibilities for $O$ to minimize (5.13) for the restricted search on the lattice (5.14), which is rather complex. The standard suboptimum solution is based on a greedy algorithm, which can be formulated in terms of the symmetrically permuted Cholesky factorization (5.9). The first step of this factorization (cf. [84, p. 148] and [127]) with an upper triangular $L^H$ reads
\[
\Pi^{(O)} \Phi_K^{-1} \Pi^{(O),T} = \begin{bmatrix} \Psi_K & l_K \\ l_K^H & \delta_K \end{bmatrix} = \begin{bmatrix} I_{K-1} & l_K/\delta_K \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \Phi_{K-1}^{-1} & 0 \\ 0 & \delta_K \end{bmatrix} \begin{bmatrix} I_{K-1} & 0 \\ l_K^H/\delta_K & 1 \end{bmatrix} \tag{5.21}
\]
with $\Phi_K^{-1} = \Phi^{-1}$ and $\Phi_{K-1}^{-1} = \Psi_K - l_K l_K^H/\delta_K$.

In the first iteration we are free to choose $p_K$ in the permutation matrix $\Pi^{(O)}$, i.e., $\delta_K$ corresponding to the maximum or minimum element on the diagonal of $\Phi^{-1}$. Thus, in the first step we decide for the data stream to be precoded last. We choose $p_K$ as the index of the smallest diagonal element of $\Phi^{-1}$, which has the smallest contribution (5.14) in (5.13): The best data stream is precoded last. Similarly, we proceed with $\Phi_{K-1}^{-1}$.
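A hedged numpy sketch of this greedy pivoting (the function name and return conventions are ours): it computes the factorization (5.9) while selecting, in each step, the smallest remaining diagonal element of $\Phi^{-1}$ as in (5.21):

```python
import numpy as np

def greedy_permuted_cholesky(Phi_inv):
    """Sketch of the symmetrically permuted factorization
    Pi Phi^{-1} Pi^T = L^H Delta L (5.9) with the greedy pivoting of (5.21):
    in each step the smallest remaining diagonal element is moved to the
    last free position, i.e., the best data stream is precoded last.
    Returns the order p (0-based; p[i] is coded in step i+1), Delta, and L."""
    A = Phi_inv.astype(complex).copy()
    K = A.shape[0]
    p = list(range(K))
    for j in range(K - 1, -1, -1):
        q = int(np.argmin(np.real(np.diag(A))[: j + 1]))
        A[[q, j], :] = A[[j, q], :]          # symmetric pivot to position j
        A[:, [q, j]] = A[:, [j, q]]
        p[q], p[j] = p[j], p[q]
        delta = np.real(A[j, j])
        l = A[:j, j].copy()
        A[:j, :j] -= np.outer(l, l.conj()) / delta   # Schur complement (5.21)
        A[:j, j] = l / delta                 # column l_j / delta_j of L^H
    Delta = np.real(np.diag(A)).copy()
    LH = np.triu(A, 1) + np.eye(K)           # unit upper triangular L^H
    return p, Delta, LH.conj().T

rng = np.random.default_rng(6)
K = 4
B = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
Phi_inv = np.linalg.inv(B @ B.conj().T + np.eye(K))
p, Delta, L = greedy_permuted_cholesky(Phi_inv)
err = Phi_inv[np.ix_(p, p)] - L.conj().T @ np.diag(Delta) @ L
print(p, np.max(np.abs(err)))                # factorization error ~ 1e-16
```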
This choice of the best data stream becomes clear in the following example:
Example 5.2. The first column of $P$ (5.11) is $p_1 = \beta^{-1} h_{p_1}^{*} \delta_1$, which is a matched filter w.r.t. the channel of the $p_1$th receiver precoded first. On the other hand, the last column of $P$ is $p_K = \beta^{-1} H^H \Pi^{(O),T} L^H e_K \delta_K$. At high $P_{tx}$ we have $H p_K = \beta^{-1} \Pi^{(O),T} L^{-1} e_K = \beta^{-1} e_{p_K}$, because
\[
p_K = \beta^{-1} H^H \Pi^{(O),T} L^H \Delta\, e_K = \beta^{-1} H^H \Phi^{-1} \Pi^{(O),T} L^{-1} e_K = \beta^{-1} H^H (H H^H)^{-1} \Pi^{(O),T} L^{-1} e_K
\]
from (5.9) for $K \leq M$: Interference from the $p_K$th data stream to all other receivers is avoided; the performance gain in MSE is largest if we choose the receiver with the best channel to be precoded last, i.e., it creates the smallest contribution to the MSE. ⊓⊔

Because the precoding order is determined while computing the symmetrically permuted Cholesky factorization (5.9), the computational complexity, which results from computing the inverse $\Phi^{-1}$ and the Cholesky factorization (5.9), is of order $\mathcal{O}(K^3)$. Note that this is the same order of complexity as for linear precoding in Section 4.4 (see [127] for details).

5.2 Performance Measures for Partial Channel State Information

In the previous section we introduce the system models for nonlinear precod-
ing (vector precoding in Figure 5.5 and THP in Figure 5.7) assuming C-CSI
at the transmitter and identical receivers G = βI K . In the sequel we consider
the general case:
• We assume P-CSI at the transmitter (Table 4.1) based on the model of
the reverse link training channel in a TDD system (Section 4.1) and
• different scaling at the receivers G = diag [g1 , . . . , gK ].
The MSE criterion for optimizing the scaling gk prior to the modulo operation
at the receivers is shown to be necessary to reach the Shannon capacity for
the AWGN channel on a single link with a modulo receiver (modulo channel)
as in Figure 5.1 [42]. Of course, although the MSE criterion is necessary for
optimizing the receivers, a capacity achieving precoding (coding) is not based
on MSE.
Because only few information theoretic results about the broadcast channel
with modulo receivers or P-CSI at the transmitter are known, the following
argumentation is pursued from the point of view of estimation theory based
on MSE. The steps are motivated by linear precoding in Section 4.3.
We first discuss the cost functions for optimization of nonlinear precod-
ing with P-CSI assuming the optimum MMSE scaling at the receivers (Sec-
tion 5.2.1). To obtain a performance criterion which can be given explicitly,
we assume a scaled matched filter at the receivers in Section 5.2.2 for op-
timization of precoding. Optimization of vector precoding (Figure 5.5) and
THP (Figure 5.7) are treated separately.

Note that a relation between vector precoding and THP, as presented in Section 5.1, has not yet been found for the general cost functions below.

5.2.1 MMSE Receivers

Vector Precoding

Given $h_k$, the MSE between the signal at the input of the modulo operator $\hat{b}_k[n] = g_k h_k^T T b[n] + g_k n_k[n]$ (Figure 5.1) and the (virtual) desired signal $b_k[n] = d_k[n] + a_k[n]$ (Figure 5.5) is
\[
\begin{aligned}
\mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, g_k; h_k\bigr) &= \frac{1}{N_d} \sum_{n=1}^{N_d} \mathrm{E}_{n_k}\Bigl[\bigl|\hat{b}_k[n] - b_k[n]\bigr|^2\Bigr] \\
&= \tilde{r}_{b_k} + |g_k|^2 c_{n_k} + |g_k|^2\, h_k^T T \tilde{R}_b T^H h_k^{*} - 2\,\mathrm{Re}\bigl(g_k h_k^T T \tilde{r}_{b b_k}\bigr)
\end{aligned} \tag{5.22}
\]
with sample crosscorrelation $\tilde{r}_{b b_k} = \tilde{R}_b e_k$ and second order moment $\tilde{r}_{b_k} = [\tilde{R}_b]_{k,k}$ given the data $\{d[n]\}_{n=1}^{N_d}$ to be transmitted ($\tilde{R}_b = \sum_{n=1}^{N_d} b[n] b[n]^H / N_d$).
Because the receivers are assumed to have perfect knowledge of the necessary system parameters, the minimum of (5.22) w.r.t. $g_k$ given this knowledge is
\[
\mathrm{MMSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}; h_k\bigr) = \min_{g_k}\ \mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, g_k; h_k\bigr) = \tilde{r}_{b_k} - \frac{\bigl|h_k^T T \tilde{r}_{b b_k}\bigr|^2}{h_k^T T \tilde{R}_b T^H h_k^{*} + c_{n_k}}. \tag{5.23}
\]
The MMSE receiver
\[
g_k = g_k^{mmse}(h_k) = \frac{\bigl(h_k^T T \tilde{r}_{b b_k}\bigr)^{*}}{h_k^T T \tilde{R}_b T^H h_k^{*} + c_{n_k}} \tag{5.24}
\]
is realizable if $h_k^T T \tilde{r}_{b b_k}$ can be estimated at the receivers, which is discussed in Section 5.4. The denominator in (5.24) is the variance $c_{r_k}$ of the receive signal.

Optimization of vector precoding is based on the conditional mean estimate of (5.23) (compare Section 4.3.1 and Appendix C)
\[
\mathrm{E}_{h_k|y_T}\Bigl[\mathrm{MMSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}; h_k\bigr)\Bigr]. \tag{5.25}
\]

An explicit expression is not known due to the ratio of quadratic forms with
correlated non-zero mean random vectors hk in (5.23).8

Tomlinson-Harashima Precoding

The MSE for THP (Figure 5.7) is derived from its equivalent representation in Figure 5.6. The nonlinearity of the precoder, i.e., the modulo operators or, equivalently, the restricted lattice search, is described by a stochastic model for the output $w[n]$ of the modulo operators at the transmitter. In [69] it is shown that the outputs can be approximated as statistically independent, zero-mean, and uniformly distributed on $\mathbb{V}$, which results in the covariance matrix
\[
C_w = \mathrm{diag}[c_{w_1}, \ldots, c_{w_K}] \tag{5.26}
\]
with $c_{w_1} = c_{d_1} = 1$ and $c_{w_k} = \tau^2/6$, $k > 1$ ($w_k[n]$ is the $k$th element of $w[n]$). It is a good model for C-CSI and zero-forcing THP (MMSE THP at high $P_{tx}$) [108, 69], where the approximation becomes more exact as the constellation size of the modulation alphabet increases. We make this assumption also for the MSE criterion and P-CSI, where it is ad hoc. Its validity is discussed in Section 5.5.
The feedback filter F is constrained to be lower triangular with zero diag-
onal to be realizable; this requirement is often referred to as spatial causality.

For the $k$th receiver, the MSE is defined as
\[
\mathrm{MSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}, g_k; h_k\bigr) = \mathrm{E}_{w, n_k}\Bigl[\bigl|\hat{b}_k[n] - b_k[n]\bigr|^2\Bigr], \tag{5.27}
\]
where the expectation is now also taken w.r.t. $w[n]$. We define the inverse function of $k = p_i$ as $i = o_k$, i.e., the $k$th receiver's data stream is the $o_k$th data stream to be precoded, and the response of the permutation matrix is $e_{o_k} = \Pi^{(O)} e_k$ and $e_{p_i} = \Pi^{(O),T} e_i$. Moreover, the $i$th row of $F$ is denoted $\tilde{f}_i^T$.

With the expressions for the signals related to the $k$th receiver's data stream
\[
b[n] = \Pi^{(O),T} (I_K - F)\, w[n] \tag{5.28}
\]
\[
b_k[n] = \bigl(e_{o_k}^T - \tilde{f}_{o_k}^T\bigr) w[n] \tag{5.29}
\]
\[
\hat{b}_k[n] = g_k h_k^T P\, w[n] + g_k n_k[n], \tag{5.30}
\]
the MSE (5.27) reads
\[
\mathrm{MSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}, g_k; h_k\bigr) = |g_k|^2 c_{n_k} + \mathrm{E}_{w}\Bigl[\bigl|\bigl(g_k h_k^T P - e_{o_k}^T + \tilde{f}_{o_k}^T\bigr) w[n]\bigr|^2\Bigr] = |g_k|^2 c_{n_k} + \sum_{i=1}^{K} c_{w_i} \bigl|g_k h_k^T p_i - e_{o_k}^T e_i + f_{o_k,i}\bigr|^2 \tag{5.31}
\]
with $f_{o_k,i} = [F]_{o_k,i}$ ($n[n]$ and $w[n]$ are uncorrelated). Before treating the case of P-CSI, we discuss the MSE for C-CSI as an example.
Example 5.3. Because transmitter and receiver have the same CSI (C-CSI), we can start optimizing for $\tilde{f}_{o_k}$, keeping $O$, $P$, and $g_k$ fixed. Minimizing (5.31) w.r.t. $\tilde{f}_{o_k}$ under the constraint that $F$ be lower triangular with zero diagonal, we obtain $f_{o_k,i} = -g_k h_k^T p_i$, $i < o_k$, and $f_{o_k,i} = 0$, $i \geq o_k$. The feedback filter for the $k$th data stream removes all interference from previously encoded data streams in the cost function, because
\[
\tilde{f}_{o_k}^T = \bigl[-g_k h_k^T [p_1, \ldots, p_{o_k - 1}],\ 0_{1 \times (K - o_k + 1)}\bigr]. \tag{5.32}
\]
The minimum is
\[
\min_{\tilde{f}_{o_k}}\ \mathrm{MSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}, g_k; h_k\bigr) = c_{w_{o_k}} + |g_k|^2 c_{n_k} + \sum_{i=o_k}^{K} c_{w_i} |g_k|^2 \bigl|h_k^T p_i\bigr|^2 - 2 c_{w_{o_k}}\,\mathrm{Re}\bigl(g_k h_k^T p_{o_k}\bigr). \tag{5.33}
\]

Minimization w.r.t. the receiver $g_k$ yields
\[
g_k = c_{w_{o_k}} \bigl(h_k^T p_{o_k}\bigr)^{*} \Bigl(\sum_{i=o_k}^{K} c_{w_i} \bigl|h_k^T p_i\bigr|^2 + c_{n_k}\Bigr)^{-1}. \tag{5.34}
\]
The minimum can be expressed in terms of
\[
\mathrm{SINR}_k^{THP}(P; h_k) = \frac{c_{w_{o_k}} \bigl|h_k^T p_{o_k}\bigr|^2}{\sum_{i=o_k+1}^{K} c_{w_i} \bigl|h_k^T p_i\bigr|^2 + c_{n_k}} \tag{5.35}
\]
as
\[
\min_{\tilde{f}_{o_k},\, g_k}\ \mathrm{MSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}, g_k; h_k\bigr) = c_{w_{o_k}} \frac{1}{1 + \mathrm{SINR}_k^{THP}(P; h_k)}. \tag{5.36}
\]

Therefore, the interference from other data streams, which has already been removed by $\tilde{f}_{o_k}^T$, is not considered anymore for optimization of the forward filter $P$.

Fig. 5.8 Representation of THP with identical estimation error $\hat{w}[n] - w[n] = \hat{b}[n] - b[n]$ as in Figure 5.6.

Because of the structure of $\mathrm{SINR}_k^{THP}(P; h_k)$, any value of $\mathrm{SINR}_k^{THP}$ is achievable if $P_{tx}$ is arbitrary, i.e., $P_{tx} \to \infty$ results in arbitrarily large $\mathrm{SINR}_k^{THP}$, even for $K > M$ (e.g., [186]). This is impossible for linear precoding. To verify this, consider $\mathrm{SINR}_k^{THP}(P; h_k)$ for $o_k = K$, i.e., the receiver to be precoded last: Any $\mathrm{SINR}_k^{THP}(P; h_k) = c_{w_K} |h_k^T p_K|^2 / c_{n_k}$ can be ensured simply by choosing the necessary power allocation $\|p_K\|_2$; for the receiver with $o_i = K - 1$, we have $\mathrm{SINR}_i^{THP}(P; h_i) = c_{w_{K-1}} |h_i^T p_{K-1}|^2 / (c_{w_K} |h_i^T p_K|^2 + c_{n_i})$, and any value can be achieved choosing $\|p_{K-1}\|_2$.
The SINR definition (5.35) also parameterizes the sum capacity of the BC with C-CSI, which can be achieved by dirty paper precoding [230] (see also [186]).^{9} We will see that it is generally not possible to derive a relation similar to (5.36) for P-CSI from MSE.

The representation of THP given in Figure 5.8 is equivalent to Figure 5.6 regarding the error signal used for the MSE definition: the error $\hat{w}_k[n] - w_k[n]$ in Figure 5.8 is identical to $\hat{b}_k[n] - b_k[n]$ based on Figure 5.6. Figure 5.8 gives the intuition for interpreting the solution for THP with C-CSI: The precoder is optimized as if $w[n]$ were partially known to the receiver via the lower triangular filter $F$. ⊓⊔

^{9} $\{c_{w_i} \|p_i\|_2^2\}_{i=1}^{K}$ are the variances of independent data streams, which have to be optimized to achieve sum capacity.

For P-CSI at the transmitter, we have to optimize $g_k$ first because of the asymmetry of CSI in the system. Intuitively, we expect that a complete cancelation of the interference from already precoded data streams is impossible for P-CSI: From Figure 5.8 we note that $F$ cannot cancel the interference completely if it does not know the channel $H$ perfectly.

The MMSE receiver is given by
\[
g_k = g_k^{mmse}(h_k) = \frac{\Bigl(h_k^T P C_w \bigl(e_{o_k} - \tilde{f}_{o_k}\bigr)^{*}\Bigr)^{*}}{c_{r_k}} \tag{5.37}
\]
with variance of the receive signal $c_{r_k} = \sum_{i=1}^{K} c_{w_i} |h_k^T p_i|^2 + c_{n_k}$. The THP parameters are optimized based on the conditional mean estimate
\[
\mathrm{E}_{h_k|y_T}\Bigl[\mathrm{MMSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}; h_k\bigr)\Bigr] \tag{5.38}
\]

of the minimum MSE achieved by the $k$th receiver's processing (5.37)
\[
\mathrm{MMSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}; h_k\bigr) = \min_{g_k}\ \mathrm{MSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}, g_k; h_k\bigr) = \bigl(e_{o_k}^T - \tilde{f}_{o_k}^T\bigr) \Bigl(C_w - \frac{C_w P^H h_k^{*} h_k^T P C_w}{c_{r_k}}\Bigr) \bigl(e_{o_k} - \tilde{f}_{o_k}\bigr)^{*}. \tag{5.39}
\]
Although the conditional expectation cannot be computed in closed form, an explicit solution for $\tilde{f}_{o_k}$ is possible (constraining its last $K - o_k + 1$ elements to be zero), because the matrix in the quadratic form (5.39) is independent of $\tilde{f}_{o_k}$. But optimization w.r.t. $P$ under the total power constraint $\mathrm{tr}[P C_w P^H] \leq P_{tx}$ remains difficult.
Example 5.4. For $\mathrm{rank}[C_{h_k}] = 1$, define $h_k = x_k v_k$ (compare Section 4.4.3). If $c_{n_k} = 0$, then $c_{r_k} = |x_k|^2 \sum_{i=1}^{K} c_{w_i} |v_k^T p_i|^2$ and the minimum MSE (5.39) becomes independent of $x_k$: The MMSE receiver (5.37) is inversely proportional to $x_k$, i.e., $g_k^{mmse}(h_k) \propto 1/x_k$, and removes the uncertainty in CSI of the effective channel $g_k^{mmse}(h_k)\, h_k$, which is now completely known to the transmitter. The minimum of the cost function (5.38) has the same form as for C-CSI,
\[
\min_{\tilde{f}_{o_k}}\ \mathrm{MMSE}_k^{THP}\bigl(P, \tilde{f}_{o_k}; h_k\bigr) = c_{w_{o_k}} \frac{1}{1 + \mathrm{SIR}_k^{THP}(P)} \tag{5.40}
\]
for $c_{n_k} = 0$ and the SIR (signal-to-interference ratio) defined by
\[
\mathrm{SIR}_k^{THP}(P) = \frac{c_{w_{o_k}} \bigl|v_k^T p_{o_k}\bigr|^2}{\sum_{i=o_k+1}^{K} c_{w_i} \bigl|v_k^T p_i\bigr|^2}. \tag{5.41}
\]
As for C-CSI, any $\mathrm{SIR}_k^{THP}(P)$ is feasible and the resulting MSE can be made arbitrarily small at the expense of an increasing $P_{tx}$. This is true even for $K > M$. ⊓⊔

5.2.2 Scaled Matched Filter Receivers

Because of the difficulties in obtaining a tractable cost function based on the assumption of MMSE receivers, which are also motivated by results in information theory, we follow a similar strategy as for linear precoding in Section 4.3.2: The receivers $g_k$ are modeled by scaled matched filters. This approximation is made to reduce the computational complexity. In case the true $g_k$ applied at the receivers minimize the MSE (5.22) or (5.27), the scaled matched filters are only an approximation; this model mismatch results in a performance loss (Section 5.5).

Vector Precoding

For vector precoding (Figure 5.5), the scaled matched filter (compare to (4.47))
\[
g_k^{cor}(h_k) = \frac{\bigl(h_k^T T \tilde{r}_{b b_k}\bigr)^{*}}{\bar{c}_{r_k}} \tag{5.42}
\]
with mean variance of the receive signal $\bar{c}_{r_k} = \mathrm{tr}\bigl[T \tilde{R}_b T^H R_{h_k|y_T}^{*}\bigr] + c_{n_k}$ minimizes
\[
\mathrm{COR}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, g_k; h_k\bigr) = \tilde{r}_{b_k} + |g_k|^2 \bar{c}_{r_k} - 2\,\mathrm{Re}\bigl(g_k h_k^T T \tilde{r}_{b b_k}\bigr). \tag{5.43}
\]
Maximizing $-\mathrm{COR}_k^{VP}$ can be interpreted as maximization of the real part of the (sample) crosscorrelation $g_k h_k^T T \tilde{r}_{b b_k}$ between $\hat{b}_k[n]$ and $b_k[n]$ with regularization of the norm $|g_k|^2$ and regularization parameter $\bar{c}_{r_k}$.

The minimum of (5.43) is
\[
\mathrm{MCOR}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}; h_k\bigr) = \tilde{r}_{b_k} - \frac{\bigl|h_k^T T \tilde{r}_{b b_k}\bigr|^2}{\bar{c}_{r_k}}, \tag{5.44}
\]

which may be negative. But its conditional mean estimate
\[
\mathrm{E}_{h_k|y_T}\Bigl[\mathrm{MCOR}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}; h_k\bigr)\Bigr] = \tilde{r}_{b_k} - \frac{\tilde{r}_{b b_k}^H T^H R_{h_k|y_T}^{*} T \tilde{r}_{b b_k}}{\bar{c}_{r_k}} \geq 0 \tag{5.45}
\]
is always non-negative. It is the transmitter's estimate of the receiver's cost resulting from the scaled matched filter (5.42).

Tomlinson-Harashima Precoding

With the same model for $w[n]$ as in Section 5.2.1, we define the cost function (Figure 5.6)
\[
\mathrm{COR}_k^{THP}\bigl(P, \tilde{f}_{o_k}, g_k; h_k\bigr) = |g_k|^2 \bar{c}_{r_k} + \bigl(e_{o_k}^T - \tilde{f}_{o_k}^T\bigr) C_w \bigl(e_{o_k} - \tilde{f}_{o_k}\bigr)^{*} - 2\,\mathrm{Re}\Bigl(g_k h_k^T P C_w \bigl(e_{o_k} - \tilde{f}_{o_k}\bigr)^{*}\Bigr), \tag{5.46}
\]
which yields the matched filter receiver (compare to the MMSE receiver in (5.37))
\[
g_k^{cor}(h_k) = \frac{\Bigl(h_k^T P C_w \bigl(e_{o_k} - \tilde{f}_{o_k}\bigr)^{*}\Bigr)^{*}}{\bar{c}_{r_k}} \tag{5.47}
\]
scaled by the mean variance of the receive signal
\[
\bar{c}_{r_k} = \sum_{i=1}^{K} c_{w_i}\, p_i^H R_{h_k|y_T}^{*}\, p_i + c_{n_k} = \mathrm{tr}\bigl[P C_w P^H R_{h_k|y_T}^{*}\bigr] + c_{n_k}. \tag{5.48}
\]

The conditional mean estimate of the minimum $\mathrm{MCOR}_k^{THP}\bigl(P, \tilde{f}_{o_k}; h_k\bigr) = \min_{g_k} \mathrm{COR}_k^{THP}\bigl(P, \tilde{f}_{o_k}, g_k; h_k\bigr)$ is
\[
\mathrm{E}_{h_k|y_T}\Bigl[\mathrm{MCOR}_k^{THP}\bigl(P, \tilde{f}_{o_k}; h_k\bigr)\Bigr] = \bigl(e_{o_k}^T - \tilde{f}_{o_k}^T\bigr) \Bigl(C_w - \frac{C_w P^H R_{h_k|y_T}^{*} P C_w}{\bar{c}_{r_k}}\Bigr) \bigl(e_{o_k} - \tilde{f}_{o_k}\bigr)^{*}. \tag{5.49}
\]

Minimization w.r.t. the feedback filter parameters $\tilde{f}_{o_k}$ can be performed explicitly, which results in an explicit cost function for optimizing $P$ and $O$. To obtain $\tilde{f}_{o_k}$ we minimize $\mathrm{E}_{h_k|y_T}[\mathrm{COR}_k^{THP}(P, \tilde{f}_{o_k}, g_k(h_k); h_k)]$ for an arbitrary $g_k(h_k)$, which yields
\[
\tilde{f}_{o_k}^T = -\mathrm{E}_{h_k|y_T}\bigl[g_k(h_k)\, h_k^T\bigr] P_{o_k} \tag{5.50}
\]
with $P_{o_k} = [p_1, \ldots, p_{o_k - 1}, 0_{M \times (K - o_k + 1)}]$. We apply the optimum receiver (5.47), parameterized in terms of $\tilde{f}_{o_k}$, to (5.50) and solve for the feedback filter
\[
\tilde{f}_{o_k}^T = -c_{w_{o_k}}\, p_{o_k}^H R_{h_k|y_T}^{*} P_{o_k} \Bigl(\bar{c}_{r_k} I_K - C_w P^H R_{h_k|y_T}^{*} P_{o_k}\Bigr)^{-1}.
\]
By means of the matrix inversion lemma (Appendix A.2.2)^{10}, it can be written as^{11}
\[
\tilde{f}_{o_k}^T = -c_{w_{o_k}}\, p_{o_k}^H \Bigl(\bar{c}_{r_k} I_M - R_{h_k|y_T}^{*} \sum_{i=1}^{o_k - 1} c_{w_i}\, p_i p_i^H\Bigr)^{-1} R_{h_k|y_T}^{*} P_{o_k}. \tag{5.51}
\]

Contrary to linear precoding with P-CSI (4.43) and THP with C-CSI (5.36), a parameterization of (5.49) in terms of an SINR-like parameter, i.e., a ratio of quadratic forms, is not possible without further assumptions.

^{10} It yields $B(\alpha I + A B)^{-1} = (\alpha I + B A)^{-1} B$ for some matrices $A$, $B$. Here, we set $A = C_w P^H$ and $B = R_{h_k|y_T}^{*} P_{o_k}$.

^{11} Note that $P_{o_k} C_w P^H = \sum_{i=1}^{o_k - 1} c_{w_i}\, p_i p_i^H$ because the last columns of $P_{o_k}$ are zero.

Example 5.5. If rank[C hk ] = 1, we have Rhk |yT = v̄ k v̄ H k where v̄ k is a scaled


version of the eigenvector v k in C hk = E[|xk |2 ]v k v H
k . Thus, (5.51) simplifies
to
T
f̃ ok = −gk′ v̄ T
k P ok (5.52)
PK
where we identify gk′ = (v̄ T pok )∗ cwok /( i=ok cwi |v̄ H 2
k pi | + cnk ) similar to
C-CSI (5.34).12 The minimum
  1
min Ehk |yT MCORTHP
k P , f̃ ok ; hk = cwok THP
(5.53)
f̃ o
k 1 + SINRk (P )

can be formulated in terms of

THP cwok |v̄ T


k pok |
2
SINRk (P ) = PK T
. (5.54)
2
i=ok +1 cwi |v̄ k pi | + cnk

The solution of the feedback filter as well as the minimum of the cost function
THP
show a structure identical to C-CSI (cf. (5.32) and (5.36)). Any SINRk (P )
is achievable for an arbitrary Ptx . This is not the case for rank[C hk ] > 1.
For an optimum MMSE receiver, this behavior is only achieved for cnk = 0.
Therefore, for large Ptx and rank-one C hk , the achievable cost (5.53) with a
scaled matched filter receiver is identical to (5.40) for the optimum MMSE
receiver. But if Ptx is finite and the true receiver is an MMSE receiver, the
performance criterion (5.38) is more accurate than (5.53) and gives an ap-
propriate model for the uncertainty in CSI: This example shows that the
uncertainty in CSI is not described completely by the cost function (5.49).

5.3 Optimization Based on Sum Performance Measures

The proposed performance measures for P-CSI are either based on the optimum MMSE receiver or a suboptimum scaled matched filter receiver. To optimize the performance of the overall system (similar to maximizing the mean sum rate for linear precoding in Section 4.4), we minimize the conditional mean estimate of the sum MSE resulting from an MMSE receiver, or of the sum performance measure for a scaled matched filter receiver, under a total transmit power constraint.

For vector precoding, the optimization problem is

\[
\min_{T,\, \{a[n]\}_{n=1}^{N_d}}\ \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\Bigl[\mathrm{MMSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}; h_k\bigr)\Bigr] \quad \text{s.t.} \quad \mathrm{tr}\bigl[T \tilde{R}_b T^H\bigr] \leq P_{tx} \tag{5.55}
\]
based on (5.25), and
\[
\min_{T,\, \{a[n]\}_{n=1}^{N_d}}\ \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\Bigl[\mathrm{MCOR}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}; h_k\bigr)\Bigr] \quad \text{s.t.} \quad \mathrm{tr}\bigl[T \tilde{R}_b T^H\bigr] \leq P_{tx} \tag{5.56}
\]
for (5.45). For THP, we have (cf. (5.38))
for (5.45). For THP, we have (cf. (5.38))


K
X    
min Ehk |yT MMSETHP
k P , f̃ ok ; hk s.t. tr P C w P H ≤ Ptx
P ,F ,O
k=1
(5.57)

and
K
X    
min Ehk |yT MCORTHP
k P , f̃ ok ; hk s.t. tr P C w P H ≤ Ptx
P ,F ,O
k=1
(5.58)

from (5.49).
Both performance measures proposed for THP in Section 5.2 yield an
explicit solution for the feedback filter F in terms of P and O. But the
resulting problem (5.57) cannot be given in explicit form and an explicit
solution of (5.58) seems impossible. A standard numerical solution would
not allow for further insights. For vector precoding, the situation is more
severe, since the lattice search is very complex and it is not known whether
a heuristic as in Section 5.1 can be found.
Therefore, we resort to the same principle as for linear precoding (Sec-
tion 4.4): Given an initialization for the precoder parameters, we alternate
between optimizing the nonlinear precoder and the receiver models. Thus,
we improve performance over the initialization, obtain explicit expressions
for every iteration, and reach a stationary point of the corresponding cost
function. In contrast to linear precoding the global minimum is not reached
for C-CSI and an arbitrary initialization. This procedure also allows for a
solution of (5.55) and (5.57) by means of numerical integration.

5.3.1 Alternating Optimization of Receiver Models and Transmitter

Vector Precoding

For solving (5.55) as well as (5.56) iteratively, it is sufficient to consider one cost function
\[
F^{VP,(m)}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta\bigr) = \mathrm{tr}\bigl[\tilde{R}_b\bigr] + |\beta|^2 \bar{G}^{(m-1)} + |\beta|^2\, \mathrm{tr}\bigl[R^{(m-1)} T \tilde{R}_b T^H\bigr] - 2\,\mathrm{Re}\Bigl(\beta\, \mathrm{tr}\bigl[H_G^{(m-1)} T \tilde{R}_b\bigr]\Bigr). \tag{5.59}
\]

It is equivalent to
\[
\sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\Bigl[\mathrm{MSE}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta g_k^{mmse,(m-1)}(h_k); h_k\bigr)\Bigr] \tag{5.60}
\]
for
\[
\begin{aligned}
H_G^{(m-1)} &= \mathrm{E}_{h|y_T}\bigl[G^{(m-1)}(h)\, H\bigr] \\
R^{(m-1)} &= \mathrm{E}_{h|y_T}\bigl[H^H G^{(m-1)}(h)^H G^{(m-1)}(h)\, H\bigr] \\
\bar{G}^{(m-1)} &= \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\Bigl[\bigl|g_k^{mmse,(m-1)}(h_k)\bigr|^2\Bigr] c_{n_k} \\
G^{(m-1)}(h) &= \mathrm{diag}\bigl[g_1^{mmse,(m-1)}(h_1), \ldots, g_K^{mmse,(m-1)}(h_K)\bigr].
\end{aligned} \tag{5.61}
\]
Minimization of (5.60) w.r.t. the function $g_k^{mmse,(m-1)}(h_k)$, i.e., before taking the conditional expectation, yields (5.55).
The cost function for scaled matched filter receivers
\[
\sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\Bigl[\mathrm{COR}_k^{VP}\bigl(T, \{a[n]\}_{n=1}^{N_d}, \beta g_k^{cor,(m-1)}(h_k); h_k\bigr)\Bigr] \tag{5.62}
\]
can also be written as (5.59), defining
\[
\begin{aligned}
H_G^{(m-1)} &= \mathrm{E}_{h|y_T}\bigl[G^{(m-1)}(h)\, H\bigr] \\
R^{(m-1)} &= \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\Bigl[\bigl|g_k^{cor,(m-1)}(h_k)\bigr|^2\Bigr] R_{h_k|y_T}^{*} \\
\bar{G}^{(m-1)} &= \sum_{k=1}^{K} \mathrm{E}_{h_k|y_T}\Bigl[\bigl|g_k^{cor,(m-1)}(h_k)\bigr|^2\Bigr] c_{n_k} \\
G^{(m-1)}(h) &= \mathrm{diag}\bigl[g_1^{cor,(m-1)}(h_1), \ldots, g_K^{cor,(m-1)}(h_K)\bigr].
\end{aligned} \tag{5.63}
\]
For this performance measure, all parameters can be given in closed form. For the optimum function $g_k^{cor,(m-1)}(h_k)$ from below (cf. (5.71)), the cost function (5.62) is identical to (5.56).
In (5.60) and (5.62) we introduced the additional degree of freedom β
to obtain an explicit solution for T and enable an optimization alternating
between transmitter and receivers.
First we set the receiver model $G^{(m-1)}(h)$ equal to the optimum receiver from the previous iteration. Minimization of (5.59) w.r.t. $T$ and $\beta$ subject to the total power constraint $\mathrm{tr}[T\tilde{R}_bT^H] \le P_{tx}$ yields

$$T^{(m)} = \beta^{(m),-1}\left(R^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}H_G^{(m-1),H} = \beta^{(m),-1}\left(X^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}H_G^{(m-1),H}\Phi^{(m-1),-1}$$

$$\beta^{(m),2} = \mathrm{tr}\left[\tilde{R}_bH_G^{(m-1)}\left(R^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-2}H_G^{(m-1),H}\right]\Big/P_{tx} \tag{5.64}$$

with

$$\Phi^{(m-1)} = H_G^{(m-1)}\left(X^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}H_G^{(m-1),H} + I_K \tag{5.65}$$

and¹³

$$X^{(m-1)} = R^{(m-1)} - H_G^{(m-1),H}H_G^{(m-1)}. \tag{5.66}$$

This result is obtained directly from Appendix D.1 by comparing (5.59) with (D.2).

¹³ $X^{(m-1)}$ can be considered as the covariance matrix of the additional correlated noise due to errors in the CSI for P-CSI.

The minimum of (5.59),

$$F^{\mathrm{VP},(m)}\!\left(\{a[n]\}_{n=1}^{N_d}\right) = \min_{T,\beta}\;F^{\mathrm{VP},(m)}\!\left(T,\{a[n]\}_{n=1}^{N_d},\beta\right) \quad\text{s.t.}\quad \mathrm{tr}\!\left[T\tilde{R}_bT^H\right] \le P_{tx}, \tag{5.67}$$

is given by (D.14):

$$F^{\mathrm{VP},(m)}\!\left(\{a[n]\}_{n=1}^{N_d}\right) = \frac{1}{N_d}\sum_{n=1}^{N_d}\left\|a[n]+d[n]\right\|^2_{\Phi^{(m-1),-1}} = \frac{1}{N_d}\sum_{n=1}^{N_d}\left\|\Delta^{(m-1),\frac{1}{2}}L^{(m-1)}\Pi^{(O)}(a[n]+d[n])\right\|_2^2. \tag{5.68}$$

Thus, the optimization w.r.t. the perturbation vectors $\{a[n]\}_{n=1}^{N_d}$ on the lattice $L^K$ is equivalent to the case of C-CSI in (5.13), based on the symmetrically permuted Cholesky factorization

$$\Pi^{(O),T}\Phi^{(m-1),-1}\Pi^{(O)} = L^{(m-1),H}\Delta^{(m-1)}L^{(m-1)}, \tag{5.69}$$

where $L^{(m-1)}$ is lower triangular with unit diagonal.
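The factorization (5.69) is an "upper-times-diagonal-times-lower" variant of the symmetric LDL decomposition. The following minimal sketch (our illustrative code, not from the book) obtains it from a standard Cholesky factorization by flipping the matrix with the exchange matrix $J$; it would be applied to $\Pi^{(O),T}\Phi^{(m-1),-1}\Pi^{(O)}$:

```python
import numpy as np

def udl_factorization(M):
    """Factor a Hermitian positive definite M as M = L^H @ diag(delta) @ L
    with L lower triangular and unit diagonal, cf. (5.69)."""
    n = M.shape[0]
    J = np.eye(n)[::-1]                    # exchange (flip) matrix
    C = np.linalg.cholesky(J @ M @ J)      # J M J = C C^H, C lower triangular
    d = np.real(np.diag(C))
    L1 = C / d                             # unit lower triangular: C = L1 diag(d)
    L = J @ L1.conj().T @ J                # lower triangular, unit diagonal
    delta = (d ** 2)[::-1]                 # diagonal entries of Delta
    return L, delta

# sanity check: M ≈ L^H diag(delta) L
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = G @ G.conj().T + np.eye(4)
L, delta = udl_factorization(M)
assert np.allclose(M, L.conj().T @ np.diag(delta) @ L)
```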


For optimization of $\{a[n]\}_{n=1}^{N_d}$, we propose two algorithms:
• If the lattice search is too complex to be executed in every iteration, we can perform it only in the last iteration.
• The same heuristic as proposed in Equations (5.11) to (5.14) is applied in every iteration, which yields THP with the structure in Figure 5.7. The argumentation is identical to Equations (5.9) up to (5.21).

We follow the second approach in the sequel.
Given $T^{(m)}$ from (5.64) and, possibly, an update of the perturbation vectors $\{a^{(m)}[n]\}_{n=1}^{N_d}$, which is included in $\tilde{R}_b^{(m)} = \sum_{n=1}^{N_d}b^{(m)}[n]b^{(m)}[n]^H/N_d$ (with $b^{(m)}[n] = d[n] + a^{(m)}[n]$), we optimize the receivers in the next step. Note that we are actually looking for the optimum receiver as a function of the unknown channel $h_k$, which is a random vector for P-CSI at the transmitter. For cost function (5.60), it is given by (5.24):

$$g_k^{\mathrm{mmse},(m)}(h_k) = \beta^{(m),-1}\frac{\left(h_k^TT^{(m)}\tilde{r}_{bb_k}^{(m)}\right)^*}{h_k^TT^{(m)}\tilde{R}_bT^{(m),H}h_k^* + c_{n_k}} \tag{5.70}$$

and for (5.62) by (5.42):

$$g_k^{\mathrm{cor},(m)}(h_k) = \beta^{(m),-1}\frac{\left(h_k^TT^{(m)}\tilde{r}_{bb_k}^{(m)}\right)^*}{\mathrm{tr}\left[T^{(m)}\tilde{R}_bT^{(m),H}R_{h_k|y_T}^*\right] + c_{n_k}}. \tag{5.71}$$

As for linear precoding in Section 4.4, the scaling $\beta^{(m)}$, which was introduced artificially in the first step of every iteration (5.64), is removed by the optimum receivers. Based on the improved receivers we compute the parameters of the general cost function (5.59) and continue with (5.64), followed by the optimization of $a[n]$.

The computational complexity per iteration is similar to the situation of C-CSI (Section 5.1): With a lattice search based on the permuted Cholesky factorization (5.69) according to Equations (5.11) up to (5.14) (in analogy to [127]), the total number of operations is in the order of $O(K^3 + M^3)$. The cubic order in $M$ results from the inverse in $\Phi^{(m-1)}$ (5.65).

As for linear precoding (Section 4.4.1), the expressions for MMSE receivers require a multidimensional numerical integration, which increases the complexity considerably. This is not necessary for (5.62), which can be given in closed form as a function of the second order moments of $p_{h|y_T}(h|y_T)$.
In summary, our iterative algorithm proceeds as follows:
• We require an initialization for the receivers ((5.70) or (5.71)) or, equivalently, for $T^{(0)}$ and $\{a^{(0)}[n]\}_{n=1}^{N_d}$.
• In every iteration, we first compute the precoder parameters $T^{(m)}$ (5.64) and $\{a^{(m)}[n]\}_{n=1}^{N_d}$ (5.68). The suboptimum (THP-like) computation of $\{a^{(m)}[n]\}_{n=1}^{N_d}$ involves an optimization of the ordering.
• Next, the receivers (5.70) or (5.71) are updated, which yields the new cost function (5.59) for the next iteration.
• Convergence can be controlled by the distance of $\beta^{(m)}$ to 1. Alternatively, we stop after a fixed number of iterations, i.e., for a fixed amount of computational complexity.

A schematic code sketch of this loop is given below.
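The following is a minimal, illustrative sketch of the loop (our code, not the book's implementation), for the scaled matched filter receiver model (5.62). Its statistics are available in closed form, but for transparency they are approximated here by Monte-Carlo averages over $p(h_k|y_T)$, in the spirit of the Monte-Carlo integration mentioned in Section 5.5. The lattice/perturbation step (5.68) is omitted, i.e., $\tilde{R}_b$ is held fixed; $p(h_k|y_T)$ is assumed complex Gaussian with given mean and covariance, and all names are illustrative:

```python
import numpy as np

def alternating_vp_mf(h_mean, C_cond, c_n, R_b, Ptx, n_iter=10, n_mc=1000, seed=0):
    """Alternating optimization (Section 5.3.1), scaled-MF receiver model.
    h_mean[k]: conditional mean of h_k; C_cond[k]: conditional covariance;
    c_n[k]: noise variance; R_b: fixed correlation matrix of b[n]."""
    rng = np.random.default_rng(seed)
    K, M = h_mean.shape
    # conditional correlation matrices R_{h_k|y_T} = C_{h_k|y_T} + mu mu^H
    R_corr = np.stack([C_cond[k] + np.outer(h_mean[k], h_mean[k].conj())
                       for k in range(K)])
    def draw(k):                            # samples of h_k ~ p(h_k | y_T)
        L = np.linalg.cholesky(C_cond[k] + 1e-12 * np.eye(M))
        w = (rng.standard_normal((n_mc, M))
             + 1j * rng.standard_normal((n_mc, M))) / np.sqrt(2)
        return h_mean[k] + w @ L.T
    H_s = np.stack([draw(k) for k in range(K)])     # (K, n_mc, M)
    g = np.ones((K, n_mc), dtype=complex)           # initial receiver model
    for _ in range(n_iter):
        # statistics (5.63), sample means over the channel samples
        H_G = np.stack([np.mean(g[k, :, None] * H_s[k], axis=0) for k in range(K)])
        Eg2 = np.mean(np.abs(g) ** 2, axis=1)       # E[|g_k|^2]
        R = sum(Eg2[k] * R_corr[k].conj() for k in range(K))
        G_bar = np.sum(Eg2 * c_n)
        # precoder update (5.64): unscaled filter, then scale to power budget
        T = np.linalg.solve(R + (G_bar / Ptx) * np.eye(M), H_G.conj().T)
        beta = np.sqrt(np.real(np.trace(T @ R_b @ T.conj().T)) / Ptx)
        T = T / beta
        # receiver model update (5.71), evaluated per channel sample
        TRbTH = T @ R_b @ T.conj().T
        for k in range(K):
            denom = np.real(np.trace(TRbTH @ R_corr[k].conj())) + c_n[k]
            g[k] = (H_s[k] @ T @ R_b[:, k]).conj() / (beta * denom)
    return T, beta, g
```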
Because the cost functions are bounded by zero, the algorithm converges
to a stationary point. An improved or equal performance is guaranteed in
every step, i.e.,

$$\sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MMSE}_k^{\mathrm{VP}}\!\left(T^{(m)},\{a^{(m)}[n]\}_{n=1}^{N_d};h_k\right)\right] \le \sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MMSE}_k^{\mathrm{VP}}\!\left(T^{(m-1)},\{a^{(m-1)}[n]\}_{n=1}^{N_d};h_k\right)\right] \tag{5.72}$$

for (5.60) and, for the suboptimum matched filter receiver (5.62),

$$\sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MCOR}_k^{\mathrm{VP}}\!\left(T^{(m)},\{a^{(m)}[n]\}_{n=1}^{N_d};h_k\right)\right] \le \sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MCOR}_k^{\mathrm{VP}}\!\left(T^{(m-1)},\{a^{(m-1)}[n]\}_{n=1}^{N_d};h_k\right)\right]. \tag{5.73}$$

In both cases we have β (m) → 1 as m → ∞.


For the proposed algorithm, we have to choose an initialization of the receivers (5.70) and (5.71) for P-CSI. We choose them based on the vector precoder derived in Section 5.1 ($G = \beta I_K$), where we simply plug in the conditional mean estimate $\hat{H}$ from $y_T$ for $H$ as if it were perfectly known (Appendix C). For S-CSI and zero mean $h$, we take the linear precoder in Appendix D.3 for $T^{(0)}$ and set $\tilde{R}_b^{(0)} = I_K$.

Tomlinson-Harashima Precoding

For THP, the iterative algorithm is very similar. The general cost function

$$F^{\mathrm{THP},(m)}(P,F,\beta,O) = |\beta|^2\bar{G}^{(m-1)} + |\beta|^2\,\mathrm{tr}\!\left[PC_wP^HR^{(m-1)}\right] - 2\mathrm{Re}\!\left(\beta\,\mathrm{tr}\!\left[H_G^{(m-1)}PC_w(I_K-F)^H\Pi^{(O)}\right]\right) + \mathrm{tr}\!\left[(I_K-F)^HC_w(I_K-F)\right] \tag{5.74}$$

can be written as

$$F^{\mathrm{THP},(m)}(P,F,\beta,O) = |\beta|^2\bar{G}^{(m-1)} + |\beta|^2\,\mathrm{tr}\!\left[P^HX^{(m-1)}PC_w\right] + \mathrm{E}_w\!\left[\left\|\left(\beta\Pi^{(O)}H_G^{(m-1)}P - (I_K-F)\right)w[n]\right\|_2^2\right] \tag{5.75}$$

with the definitions

$$H_G^{(m-1)} = \mathrm{E}_{h|y_T}\!\left[G^{(m-1)}(h)H\right] \tag{5.76}$$
$$X^{(m-1)} = R^{(m-1)} - H_G^{(m-1),H}H_G^{(m-1)}, \tag{5.77}$$

which are valid for both cost functions below. With appropriate definitions of $R^{(m-1)}$, $\bar{G}^{(m-1)}$, and $G^{(m-1)}(h)$, it includes the cost functions

$$\sum_{k=1}^{K}\mathrm{E}_{h|y_T}\!\left[\mathrm{MSE}_k^{\mathrm{THP}}\!\left(P,\tilde{f}_{o_k},\beta g_k^{\mathrm{mmse},(m-1)}(h_k);h_k\right)\right] \quad\text{and} \tag{5.78}$$
$$\sum_{k=1}^{K}\mathrm{E}_{h|y_T}\!\left[\mathrm{COR}_k^{\mathrm{THP}}\!\left(P,\tilde{f}_{o_k},\beta g_k^{\mathrm{cor},(m-1)}(h_k);h_k\right)\right], \tag{5.79}$$

which result in (5.57) and (5.58), respectively, when minimizing w.r.t. the receiver functions $g_k(h_k)$ (before taking the conditional expectation).

For (5.78), the parameters in (5.75) are defined as

$$R^{(m-1)} = \mathrm{E}_{h|y_T}\!\left[H^HG^{(m-1)}(h)^HG^{(m-1)}(h)H\right] \tag{5.80}$$
$$\bar{G}^{(m-1)} = \sum_{k=1}^{K}c_{n_k}\mathrm{E}_{h_k|y_T}\!\left[\left|g_k^{\mathrm{mmse},(m-1)}(h_k)\right|^2\right] \tag{5.81}$$
$$G^{(m-1)}(h) = \mathrm{diag}\!\left[g_1^{\mathrm{mmse},(m-1)}(h_1),\ldots,g_K^{\mathrm{mmse},(m-1)}(h_K)\right]. \tag{5.82}$$

In this case $X^{(m-1)}$ (5.77) is the sum of the (complex conjugate) error covariance matrices for the conditional mean estimate of the effective channel $g_k^{\mathrm{mmse},(m-1)}(h_k)h_k$.

For (5.79), we obtain the explicit expressions for the parameters in (5.74)

$$R^{(m-1)} = \sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\left|g_k^{\mathrm{cor},(m-1)}(h_k)\right|^2\right]R_{h_k|y_T}^* \tag{5.83}$$
$$\bar{G}^{(m-1)} = \sum_{k=1}^{K}c_{n_k}\mathrm{E}_{h_k|y_T}\!\left[\left|g_k^{\mathrm{cor},(m-1)}(h_k)\right|^2\right] \tag{5.84}$$
$$G^{(m-1)}(h) = \mathrm{diag}\!\left[g_1^{\mathrm{cor},(m-1)}(h_1),\ldots,g_K^{\mathrm{cor},(m-1)}(h_K)\right]. \tag{5.85}$$

Comparing the new cost function (5.75) with the cost function derived from C-CSI, where we plug in $\mathrm{E}_{h|y_T}[G^{(m-1)}(h)H]$ for $H$, we have an additional regularization term for $P$. It regularizes the weighted norm of $P$ with weight given by $X^{(m-1)}$.
The solution for $F^{(m)}$, $P^{(m)}$, and $\beta^{(m)}$ for a fixed ordering $O$ and for given receivers $G^{(m-1)}(h)$ is derived in Appendix D.6 with assumption (5.26). The $k$th columns of $F^{(m)}$ and $P^{(m)}$ are

$$f_k^{(m)} = -\beta^{(m)}\begin{bmatrix}0_{k\times M}\\B_k^{(O),(m-1)}\end{bmatrix}p_k^{(m)} \tag{5.86}$$

and

$$p_k^{(m)} = \beta^{(m),-1}\left(R_k^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}A_k^{(O),(m-1),H}e_k \tag{5.87}$$

with $R_k^{(m-1)} = X^{(m-1)} + A_k^{(O),(m-1),H}A_k^{(O),(m-1)}$. For every receiver, the permuted channel is partitioned as

$$\Pi^{(O)}H_G^{(m-1)} = \begin{bmatrix}A_k^{(O),(m-1)}\\B_k^{(O),(m-1)}\end{bmatrix}, \tag{5.88}$$

where $A_k^{(O),(m-1)} \in \mathbb{C}^{k\times M}$ contains the conditional mean estimates of the effective channels to the already precoded receivers and $B_k^{(O),(m-1)} \in \mathbb{C}^{(K-k)\times M}$ those to the receivers which are precoded in the $(k+1)$th to $K$th precoding step. The regularization term with $X^{(m-1)}$ in (5.75) yields a non-diagonal loading in the inverse defining $P^{(m)}$ (5.87).
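A compact sketch of (5.86) and (5.87) for a given precoding order follows (our illustrative code; we set $\beta = 1$, since the final scaling of $P$ to meet $\mathrm{tr}[PC_wP^H] = P_{tx}$ then determines $\beta$, cf. Section 5.3.2; for C-CSI, $X = 0$):

```python
import numpy as np

def thp_filters(H_G, X, G_bar, Ptx, order):
    """Forward/feedback filter columns per (5.86), (5.87), beta = 1.
    H_G: (K, M) conditional mean effective channel; X: (M, M) regularization;
    order: permutation p_1..p_K as a list of 0-based user indices."""
    K, M = H_G.shape
    Hp = H_G[order, :]                       # permuted channel Pi^(O) H_G
    P = np.zeros((M, K), dtype=complex)
    F = np.zeros((K, K), dtype=complex)
    for k in range(K):
        A = Hp[:k + 1, :]                    # already precoded users (incl. k)
        B = Hp[k + 1:, :]                    # users precoded later
        Rk = X + A.conj().T @ A
        p = np.linalg.solve(Rk + (G_bar / Ptx) * np.eye(M), A[k].conj())
        P[:, k] = p
        F[k + 1:, k] = -(B @ p)              # feedback column (5.86)
    return P, F
```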
The precoding order $O^{(m)} = [p_1^{(m)},\ldots,p_K^{(m)}]$ for this iteration is determined from the minimum $F^{\mathrm{THP},(m)}(O)$ of (5.75), which is given by (D.72). We observe that the $K$th term in the sum ($i = K$) depends only on $A_K^{(O),(m-1)} = H_G^{(m-1)}$ and $p_K$ and is independent of $p_i$, $i < K$. Therefore, we start with $i = K$ and choose the best receiver to be precoded last: $p_K^{(m)}$ is the argument of

$$\max_k\; e_k^TH_G^{(m-1)}\left(X^{(m-1)} + H_G^{(m-1),H}H_G^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}H_G^{(m-1),H}e_k, \tag{5.89}$$

which minimizes the contribution of the $i = K$th term to the total remaining MSE (D.72). Similarly, we continue with the $(i-1)$th term, which depends only on $p_i$, $i \in \{K-1, K\}$, but not on $p_i$ for $i < K-1$.

In summary, $O^{(m)}$ is found by the arguments of

$$\max_k\; e_k^TA_i^{(O),(m-1)}\left(R_i^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}A_i^{(O),(m-1),H}e_k \tag{5.90}$$

for $i = K, K-1, \ldots, 1$, where the maximization is over $k \in \{1,\ldots,K\}\setminus\{p_{i+1}^{(m)},\ldots,p_K^{(m)}\}$ and yields $p_i^{(m)}$. This requires computing an inverse in every step, which yields a computational complexity of order $O(K^4)$.
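In code, the greedy rule (5.90) can be sketched as follows (our illustrative implementation of the $O(K^4)$ variant; inputs are the quantities defined above as numpy arrays):

```python
import numpy as np

def thp_order(H_G, X, G_bar, Ptx):
    """Greedy precoding order per (5.90): the best user is chosen to be
    precoded last, then the rule is repeated on the remaining users."""
    K, M = H_G.shape
    cand = list(range(K))            # users without an assigned precoding step
    order = []
    for _ in range(K):
        A = H_G[cand, :]             # A_i: channels of the remaining users
        Rinv = np.linalg.inv(X + A.conj().T @ A + (G_bar / Ptx) * np.eye(M))
        metric = [np.real(H_G[k] @ Rinv @ H_G[k].conj()) for k in cand]
        best = cand[int(np.argmax(metric))]
        order.insert(0, best)        # chosen user is precoded as late as possible
        cand.remove(best)
    return order                     # order[0] is precoded first
```

Together with the filter sketch above, a full update would read, e.g., `P, F = thp_filters(H_G, X, G_bar, Ptx, thp_order(H_G, X, G_bar, Ptx))`.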
But the minimum $F^{\mathrm{THP},(m)}(O)$ can also be expressed in terms of the symmetrically permuted Cholesky factorization (5.69):

$$F^{\mathrm{THP},(m)}(O) = \mathrm{tr}\!\left[\Delta^{(m-1)}C_w\right] = \sum_{k=1}^{K}\delta_k^{(m-1)}c_{w_k}. \tag{5.91}$$

The derivation of this expression requires a different notation of the optimization problem and is omitted here, because it can be carried out in complete analogy to [127], where it is given for C-CSI. The resulting efficient algorithm yields the same ordering as (5.90). Its computational complexity for computing $F^{(m)}$, $P^{(m)}$, and $O^{(m)}$ is $O(K^3 + M^3)$. With $p_k^{(m)}$ we also obtain its inverse function $o_k^{(m)}$ implicitly.
In the last step of iteration $m$ we compute the optimum receivers as a function of $h_k$. They are given by (5.37):

$$g_k^{\mathrm{mmse},(m)}(h_k) = \beta^{(m),-1}\frac{\left(h_k^TP^{(m)}C_w\left(e_{o_k^{(m)}} - \tilde{f}_{o_k^{(m)}}^{(m)}\right)\right)^*}{\sum_{i=1}^{K}c_{w_i}\left|h_k^Tp_i^{(m)}\right|^2 + c_{n_k}} \tag{5.92}$$

and by (5.47):

$$g_k^{\mathrm{cor},(m)}(h_k) = \beta^{(m),-1}\frac{\left(h_k^TP^{(m)}C_w\left(e_{o_k^{(m)}} - \tilde{f}_{o_k^{(m)}}^{(m)}\right)\right)^*}{\mathrm{tr}\left[P^{(m)}C_wP^{(m),H}R_{h_k|y_T}^*\right] + c_{n_k}}. \tag{5.93}$$

Again, convergence to a stationary point is guaranteed and the total cost decreases or stays constant in every iteration:

$$\sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MMSE}_k^{\mathrm{THP}}\!\left(P^{(m)},\tilde{f}_{o_k}^{(m)};h_k\right)\right] \le \sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MMSE}_k^{\mathrm{THP}}\!\left(P^{(m-1)},\tilde{f}_{o_k}^{(m-1)};h_k\right)\right] \tag{5.94}$$

$$\sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MCOR}_k^{\mathrm{THP}}\!\left(P^{(m)},\tilde{f}_{o_k}^{(m)};h_k\right)\right] \le \sum_{k=1}^{K}\mathrm{E}_{h_k|y_T}\!\left[\mathrm{MCOR}_k^{\mathrm{THP}}\!\left(P^{(m-1)},\tilde{f}_{o_k}^{(m-1)};h_k\right)\right]. \tag{5.95}$$

Asymptotically, $\beta^{(m)} \to 1$ as $m \to \infty$. The initialization is chosen similarly to vector precoding above: For P-CSI, we first compute $F^{(0)}$, $P^{(0)}$, and $O^{(0)}$ by applying the conditional mean estimate $\hat{H}$ from $y_T$ to the THP for C-CSI with $G = \beta I_K$, which is a special case of Appendix D.6; for S-CSI we set $F^{(0)} = 0_K$, $\Pi^{(O)} = I_K$, and take $P^{(0)}$ equal to the linear precoder from Appendix D.3.

5.3.2 From Complete to Statistical Channel State Information

The algorithms for optimization of vector precoding with the restricted lattice search and THP which we derived in the previous section are closely related. We summarize their differences and similarities:
• The optimized parameters of vector precoding depend on $\{d[n]\}_{n=1}^{N_d}$, whereas THP is independent of the data and relies on the stochastic model (5.26) for $w[n]$.
• The cost functions (5.59) and (5.75) are only equivalent for non-diagonal $C_w = (I_K - F)^{-1}\tilde{R}_b(I_K - F)^{-H}$ as in (5.20) and identical receiver functions $G^{(m-1)}(h)$. For the assumption of a diagonal $C_w$ and identical $G^{(m-1)}(h)$, (5.75) is an approximation of (5.59) for large $P_{tx}$ and large constellation size $D$ if the suboptimum THP-like lattice search is performed for vector precoding.
• The formulation of vector precoding is more general: The lattice search can be improved as described in [174].
• Comparing the solutions for vector precoding with the THP-like lattice search and THP for identical $G^{(m-1)}(h)$, i.e., the same initialization or solution from the previous iteration, we conclude:
  – For the given data, vector precoding satisfies the transmit power constraint exactly.
  – If the solution for $P^{(m)}$ in THP is scaled to satisfy the transmit power constraint, THP is identical to vector precoding, i.e., yields the same transmit signal $x_{tx}[n]$. Note that also the same precoding order is chosen for both precoders.

Although they are closely related and the vector precoding formulation is more general, we presented the THP formulation for the following reasons: On the one hand, it is more widely used; on the other hand, its solution allows for additional insights into the effect of the modulo operation, i.e., the suboptimum THP-like lattice search. In the sequel, we focus on the interpretation of both nonlinear precoders.

Complete Channel State Information

For C-CSI, the MMSE receiver and scaled matched filter are identical. Therefore, the corresponding optimization problems for vector precoding or THP are also identical. The necessary parameters depend on the channel realization $H$:

$$H_G^{(m-1)} = G^{(m-1)}(h)H \tag{5.96}$$
$$R^{(m-1)} = H^HG^{(m-1)}(h)^HG^{(m-1)}(h)H = H_G^{(m-1),H}H_G^{(m-1)} \tag{5.97}$$
$$\bar{G}^{(m-1)} = \sum_{k=1}^{K}\left|g_k^{\mathrm{mmse},(m-1)}(h_k)\right|^2c_{n_k} \tag{5.98}$$
$$\Phi^{(m-1)} = \frac{P_{tx}}{\bar{G}^{(m-1)}}\left(H_G^{(m-1)}H_G^{(m-1),H} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_K\right). \tag{5.99}$$

The matrix $X^{(m-1)} = 0$ determining the regularization of the solution vanishes. For MMSE receivers, $X^{(m-1)}$ is the sum of the error covariance matrices for the effective channels, which is zero for C-CSI by definition.

For vector precoding, the linear filter reads

$$T^{(m)} = \beta^{(m),-1}\left(H_G^{(m-1),H}H_G^{(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}H_G^{(m-1),H} \tag{5.100}$$
$$= \beta^{(m),-1}H_G^{(m-1),H}\left(H_G^{(m-1)}H_G^{(m-1),H} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_K\right)^{-1} \tag{5.101}$$

with $\beta^{(m)}$ chosen to satisfy $\mathrm{tr}[T^{(m)}\tilde{R}_bT^{(m),H}] = P_{tx}$.¹⁴ The nonlinearity of the precoder results from the optimization of $a[n]$.

For THP, the effect of the modulo operators is described by $C_w$. If assumption (5.26) holds, e.g., for large $P_{tx}$ and size of $D$, THP allows for an interpretation of the solutions and the effect of the perturbation vector.
For C-CSI, the THP solution (5.86) and (5.87) for an optimized ordering $O = O^{(m)}$ simplifies to

$$f_k^{(m)} = -\beta^{(m)}\begin{bmatrix}0_{k\times M}\\B_k^{(O),(m-1)}\end{bmatrix}p_k^{(m)} \tag{5.102}$$

$$p_k^{(m)} = \beta^{(m),-1}\left(A_k^{(O),(m-1),H}A_k^{(O),(m-1)} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_M\right)^{-1}A_k^{(O),(m-1),H}e_k, \tag{5.103}$$

where $\beta^{(m)}$ is determined by $\mathrm{tr}[P^{(m)}C_wP^{(m),H}] = P_{tx}$, $A_k^{(O),(m-1)} \in \mathbb{C}^{k\times M}$ contains the channels to the already precoded receivers, and $B_k^{(O),(m-1)} \in \mathbb{C}^{(K-k)\times M}$ those to the receivers which are precoded in the $(k+1)$th to $K$th step. Therefore, our choice of $a[n]$, i.e., the modulo operations in Figure 5.7, can be interpreted as follows:
• $f_k^{(m)}$ cancels the interference from the $k$th precoded receiver to all subsequent receivers with channel $B_k^{(O),(m-1)}$.
• Therefore, the forward filter $p_k^{(m)}$ for the $k$th precoded data stream has to avoid interference only to the already precoded receivers with channel $A_k^{(O),(m-1)}$, because the interference from the $k$th receiver's data stream has not been known in precoding steps 1 to $k-1$ due to causality. Numerically, this reduces the condition number of the matrix to be "inverted" in $p_k^{(m)}$.¹⁵ For $k = 1$, the full diversity and antenna gain is achieved because $p_1^{(m)} \propto h_{p_1}^*$.

¹⁴ For $G^{(m-1)}(h) = I_K$ we obtain our result (5.6) in the introduction.
For small Ptx , this interpretation is not valid, because a[n] = 0, i.e., vector
precoding and THP are equivalent to linear precoding. If the modulo op-
erators are not removed from the receivers, which would require additional
signaling to the receivers, performance is worse than for linear precoding due
to the modulo loss [70, 108].

Partial and Statistical Channel State Information

For P-CSI and in the limit for S-CSI, the channel $G^{(m-1)}(h)H$ is not known perfectly.¹⁶ Therefore, the solutions are given in terms of the conditional mean estimate of the total channel $H_G^{(m-1)} = \mathrm{E}_{h|y_T}[G^{(m-1)}(h)H]$ and the regularization matrix $X^{(m-1)}$. For MMSE receivers, $X^{(m-1)}$ is the sum of the error covariance matrices for the estimates $\mathrm{E}_{h_k|y_T}[g_k^{\mathrm{mmse},(m-1)}(h_k)h_k]$.

To judge the effect on nonlinear precoding, we discuss the THP solution (5.86) and (5.87). Contrary to C-CSI, the feedback filter $F^{(m)}$ cannot cancel the interference, and the forward filter also has to compensate the interference to subsequently precoded data streams, which is described by $X^{(m-1)}$ for $O = O^{(m)}$. As long as (5.26) is valid, the effect of $a[n]$ can be understood as a partial cancelation of the interference described by $B_k^{(O),(m-1)}$. It results in (cf. (5.87))

$$X^{(m-1)} + A_k^{(O),(m-1),H}A_k^{(O),(m-1)} = R^{(m-1)} - B_k^{(O),(m-1),H}B_k^{(O),(m-1)}, \tag{5.104}$$

which differs in its last term from the corresponding term in (4.82) for linear precoding.

For $K \le M$ and good channels, i.e., spatially well separated receivers, it is likely that the precoder chooses $a[n] = 0$ for S-CSI, even for large $P_{tx}$. If this is the case, our interpretation of THP is not valid and nonlinear precoding is identical to linear precoding.

¹⁵ Note that $c_{w_k}$ is close to one for increasing size of the constellation $D$, i.e., the input of $P^{(m)}$ has the same variance as for a linear precoder $P$ (Section 4.1).
¹⁶ Note that the precoder always operates with a receiver model $G^{(m-1)}(h)$, which can deviate from the implemented MMSE receiver, for example, if the iterative procedure has not converged yet or if the scaled matched filter model is assumed.
Example 5.6. Let us consider the example of channels with rank-one covariance matrices $C_{h_k}$. As in Section 4.4.3, the corresponding channel is $H = XV^T$ with $X = \mathrm{diag}[x_1,\ldots,x_K]$ and $V \in \mathbb{C}^{M\times K}$, whose columns are the array steering vectors (Appendix E).

For this channel, $X^{(m-1)}$ (5.77) simplifies to

$$X^{(m-1)} = V^*\Xi^{(m-1)}V^T$$

with $\Xi^{(m-1)} = \mathrm{diag}[\xi_1^{(m-1)},\ldots,\xi_K^{(m-1)}]$. For MMSE receivers and vector precoding, we have

$$\xi_k^{(m-1)} = \mathrm{E}_{x_k,h_k|y_T}\!\left[|x_k|^2\left|g_k^{\mathrm{mmse},(m-1)}(h_k)\right|^2\right] - \left|\mathrm{E}_{x_k,h_k|y_T}\!\left[x_kg_k^{\mathrm{mmse},(m-1)}(h_k)\right]\right|^2.$$

From (5.24),

$$g_k^{\mathrm{mmse},(m-1)}(h_k) = \beta^{(m-1),-1}\frac{\left(x_kv_k^TT^{(m-1)}\tilde{r}_{bb_k}^{(m-1)}\right)^*}{|x_k|^2v_k^TT^{(m-1)}\tilde{R}_bT^{(m-1),H}v_k^* + c_{n_k}},$$

we observe that the effective channel $G^{(m-1)}(h)H$ is known to the transmitter for large $P_{tx}$ (small $c_{n_k}$), because the receivers remove $x_k$ and the rank-one channel becomes deterministic (cf. Example 5.4). From $\xi_k^{(m-1)} = 0$ it follows that the error covariance matrix $X^{(m-1)} = 0$ vanishes.

For scaled matched filter receivers (5.42), $X^{(m-1)}$ is not equal to the sum of error covariance matrices of the estimated channels $\mathrm{E}_{h_k|y_T}[g_k^{\mathrm{cor},(m-1)}(h_k)h_k]$, because $R^{(m-1)}$ in (5.63) is not the sum of correlation matrices of $g_k^{\mathrm{cor},(m-1)}(h_k)h_k$.¹⁷ In this case, we have $\xi_k^{(m-1)} = 0$ for all $P_{tx}$, which results in $X^{(m-1)} = 0$ and $R^{(m-1)} = H_G^{(m-1),H}H_G^{(m-1)}$. But note that the total channel is not known perfectly, since the error covariance matrix of its estimate is not zero.
If $X^{(m-1)} = 0_{M\times M}$, the search on the lattice (5.68) is determined by

$$\Phi^{(m-1)} = \frac{P_{tx}}{\bar{G}^{(m-1)}}\left(X_G^{(m-1)}V^TV^*X_G^{(m-1),*} + \frac{\bar{G}^{(m-1)}}{P_{tx}}I_K\right)$$

with $X_G^{(m-1)} = \mathrm{E}_{h|y_T}[G^{(m-1)}(h)X]$, and is mainly influenced by the spatial separation of the receivers.

For rank-one channels, we conclude:
• The channel estimation errors are modeled correctly when considering MMSE receiver models. This yields an improved interference cancelation at medium to high $P_{tx}$ compared to the optimization based on a scaled MF receiver.
• For $P_{tx} \to \infty$, the solutions for MF receivers and MMSE receivers converge.
• As for C-CSI (5.103), the advantage of nonlinear precoding for scaled matched filter receivers compared to linear precoding becomes more evident considering THP: $p_k^{(m)}$ only considers interference to already precoded data streams, which results in an improved condition number of the matrix to be "inverted". But in contrast to C-CSI, the interference cannot be canceled completely by $F^{(m)}$ for finite $P_{tx}$. Still, this leads to the full antenna gain for the $p_1$th receiver because $p_1 \propto v_{p_1}^*$. Note that this interpretation is not valid anymore if $a[n] = 0$ due to the "good" properties of $H$, i.e., $V$. □

¹⁷ With [148], $X^{(m-1)}$ can be shown to be equal to one of the terms in the expression for the sum of error covariance matrices. On the other hand, $R^{(m-1)}$ in (5.61) for MMSE receivers is the sum of correlation matrices of $g_k^{\mathrm{mmse},(m-1)}(h_k)h_k$; with its definition in (5.66), $X^{(m-1)}$ is identical to the sum of error covariance matrices.

5.4 Precoding for the Training Channel

Optimization of vector precoding is based on the assumption of the receivers (5.24) and (5.42). Their realization requires the knowledge of $h_k^TT\tilde{r}_{bb_k}$ as well as of $c_{r_k}$ and $\bar{c}_{r_k}$, respectively. The variance of the receive signal can be estimated easily. The crosscorrelation $h_k^TT\tilde{r}_{bb_k}$ can be signaled to the receivers by precoding of the training sequence (Section 4.1.2) with

$$Q = T\tilde{R}_b. \tag{5.105}$$

Now the necessary crosscorrelation terms can be estimated at the receivers (cf. (4.11)). Contrary to linear precoding, where $Q = P$, the required transmit power is not equal to the transmit power in the data channel, i.e., $\|Q\|_F^2 \neq P_{tx} = \mathrm{tr}[T\tilde{R}_bT^H]$. Moreover, $\|Q\|_F^2$ is not constant but depends on the channel and the data. A cumulative distribution of $\|Q\|_F^2$ is shown in Figure 5.9.

The same is true for THP, where precoding of the training sequence with

$$Q = PC_w(I_K - F^H)\Pi^{(O)} \tag{5.106}$$

ensures the realizability of the assumed receivers (5.37) and (5.47).


We can scale $Q$ by a constant to satisfy the transmit power constraint $P_{tx}^t$ in the "worst case". If this scaling is known to the receivers, it can be compensated, but the channel estimation quality at the receivers degrades.

Fig. 5.9 Cumulative distribution function of kQk2F (5.105) at 10 log10 (Ptx /cn ) =
20 dB and fmax = 0.2 for P-CSI (Scenario 1).

As shown in Figure 5.9, the worst case $\|Q\|_F^2$ can exceed $P_{tx}^t = P_{tx} = 1$ (cf. (4.7)) significantly. Note also that the distribution of $\|Q\|_F^2$ depends critically on the scenario and the SNR: For small and large SNR, the distribution is concentrated around $P_{tx}^t = 1$; it is more widespread for larger $K/M$ or increased angular spread.

5.5 Performance Evaluation

For a complexity comparable to linear precoding, we implement vector pre-


coding with the THP-like lattice search, which is equivalent to THP with
the appropriate scaling of P . Therefore, we refer to the nonlinear precoders
as THP and the solution from Section 5.3 as robust THP. The following
variations are considered:
• For C-CSI, the iterative solution in Section 5.3.2 with MMSE receivers is chosen ("THP: MMSE-Rx (C-CSI)"). It is initialized with the THP solution for identical receivers $g_k = \beta$ from [115], which is obtained for $G^{(m-1)}(h) = I_K$. Contrary to linear precoding, it does not converge to the global minimum in general.
• Heuristic nonlinear precoders are obtained from the solution for C-CSI with outdated CSI (O-CSI) $\hat{h} = \mathrm{E}_{h[q']|y[q']}[h[q']]$ with $q' = q_{tx} - 1$ ("THP: Heur. MMSE-Rx (O-CSI)") or the predicted CSI $\hat{h} = \mathrm{E}_{h|y_T}[h]$ ("THP: Heur. MMSE-Rx (P-CSI)") instead of $h$, i.e., the CSI is treated as if it were error-free. The initialization is also chosen in analogy to C-CSI but with O-CSI and P-CSI, respectively.
• As an example for the systematic approach to robust nonlinear precoding with P-CSI, we choose the iterative solution to (5.55) assuming MMSE receivers ("THP: MMSE-Rx (P-CSI)") and to (5.56) assuming scaled matched filter (MF) receivers ("THP: MF-Rx (P-CSI)"). More details and the initialization are described in Section 5.3.1.
• For both receiver models, the algorithms for S-CSI are a special case of
P-CSI (“THP: MMSE-Rx (S-CSI)” and “THP: MF-Rx (S-CSI)”) with
initialization from Appendix D.3.
A fixed number of 10 iterations is chosen for all precoders, i.e., the com-
putational complexity of all nonlinear and linear MSE-related precoders is
comparable. The necessary numerical integration for precoding with P-CSI
or S-CSI assuming MMSE receivers is implemented as a Monte-Carlo inte-
gration with 1000 integration points.
The nonlinear precoders are compared with the linear MSE precoders from
Section 4.4. The system parameters and the first two simulation scenarios are
identical to Section 4.4.4. We repeat the definitions for convenience.
The temporal channel correlations are described by Clarke's model (Appendix E and Figure 2.12) with a maximum Doppler frequency $f_{max}$, which is normalized to the slot period $T_b$. For simplicity, the temporal correlations are identical for all elements in $h$ (2.12). Four different stationary scenarios
for the spatial covariance matrix are compared. For details and other implicit
assumptions see Appendix E. The SNR is defined as Ptx /cn .
Scenario 1: We choose M = 8 transmit antennas and K = 6 receivers with
the mean azimuth directions ϕ̄ = [−45◦ , −27◦ , −9◦ , 9◦ , 27◦ , 45◦ ] and Laplace
angular power spectrum with spread σ = 5◦ .
Scenario 2: We have M = 8 transmit antennas and
K = 8 receivers with mean azimuth directions ϕ̄ =
[−45◦ , −32.1◦ , −19.3◦ , −6.4◦ , 6.4◦ , 19.3◦ , 32.1◦ , 45◦ ] and Laplace angular
power spectrum with spread σ = 0.5◦ , i.e., rank[C hk ] ≈ 1.
Scenario 3: We restrict $\mathrm{rank}[C_{h_k}] = 2$ with $M = 8$, $K = 4$, i.e., $\sum_{k=1}^{K}\mathrm{rank}[C_{h_k}] = M$. The two azimuth directions $(\varphi_{k,1}, \varphi_{k,2})$ per receiver from (E.3) with $W = 2$ are

{(−10.7°, −21.4°), (−3.6°, −7.2°), (3.6°, 7.2°), (10.7°, 21.4°)}.

Scenario 4 simulates an overloaded system with M = 2 and K = 3 for


ϕ̄ = [−19.3◦ , −6.4◦ , 6.4◦ ] and different angular spread σ (Appendix E).18
The optimum MMSE receivers (5.24) are implemented with perfect chan-
nel knowledge. As modulation alphabet D for the data signals dk we choose
rectangular 16QAM for all receivers.

¹⁸ The mean angles correspond to the central three receivers of Scenario 2.

Fig. 5.10 Mean uncoded BER vs. Ptx /cn with fmax = 0.2 for P-CSI (Scenario 1).

Fig. 5.11 Mean uncoded BER vs. fmax for 10 log10 (Ptx /cn ) = 20 dB (Scenario 1).

Discussion of Results

The mean uncoded bit error rate (BER), which is an average w.r.t. the K
receivers, serves as a simple performance measure.
In scenario 1, all C hk have full rank. In a typical example the largest
eigenvalue is about 5 times larger than the second and 30 times larger than
the third largest eigenvalue. Therefore, the transmitter does not know the
effective channel (including the receivers' processing $g_k$) perfectly. Because $\sum_{k=1}^{K}\mathrm{rank}[C_{h_k}] = KM > M$, the channels do not have enough structure to allow for a full cancelation of the interference as for C-CSI. The BER saturates for high $P_{tx}/c_n$ (Figure 5.10).

Fig. 5.12 Convergence in mean BER of the alternating optimization at $10\log_{10}(P_{tx}/c_n) = 30$ dB (Scenario 1, 16QAM).

Fig. 5.13 Mean uncoded BER vs. $P_{tx}/c_n$ for P-CSI with $f_{max} = 0.2$ (Scenario 2).

The heuristic optimization for P-CSI
does not consider the interference sufficiently and the performance can be
improved by the robust optimization of THP based on the CM estimate of
the receivers’ cost, which is based on scaled matched filter receivers. Due
to the insufficient channel structure, robust THP can only achieve a small
additional reduction of the interference, and the BER saturates at a lower level.
For increasing Doppler frequency, the heuristic design performs worse than
linear precoding (Figure 5.11). The robust solution for P-CSI ensures a per-
formance which is always better than linear precoding at high $P_{tx}/c_n$.

Fig. 5.14 Mean uncoded BER vs. $P_{tx}/c_n$ for S-CSI (Scenario 2).

As for linear precoding (Section 4.4.4), the prediction errors are small enough as long as $f_{max} \le 0.15$ to yield a performance of the heuristic which is very close to our systematic approach.
For this scenario, approximately 10 iterations are necessary for convergence
(Figure 5.12). For S-CSI and scenario 2, the algorithms converge in the first
iteration.
The interference can already be canceled sufficiently by linear precoding
in scenario 2, for which the ratio of the first to the second eigenvalue of
C hk is about 300. Because the receivers’ channels are close to rank one,
the probability is very high that the channel realization for a receiver and
its predicted channel lie in the same one-dimensional subspace. Thus, the
performance degradation by the heuristic THP optimization is rather small
(Figure 5.13). But for S-CSI, the THP optimization based on the mean per-
formance measures gains 5 dB at high Ptx /cn compared to linear precoding
from Section 4.4 (Figure 5.14). Figure 5.15 shows the fast transition between
a performance close to C-CSI and S-CSI which is controlled by the knowledge
of the autocovariance sequence.
Optimization based on an MMSE receiver model results in a small per-
formance improvement at high Ptx /cn : For example, the discussion in Sec-
tion 5.3.2 shows that for S-CSI and rank[C hk ] = 1 the estimation error of the
effective channel is not zero, but the regularization matrix X (m−1) is zero for
optimization based on the scaled matched filter receiver model. Therefore,
the small degradation results from the mismatch to the implemented MMSE
receiver and the incomplete description of the error in CSI by X (m−1) .
Fig. 5.15 Mean uncoded BER vs. $f_{max}$ for $10\log_{10}(P_{tx}/c_n) = 20$ dB (Scenario 2).

The performance in scenario 3 (Figure 5.16) confirms that the gains of


THP over linear precoding for P-CSI and S-CSI are largest, when the channel
has significant structure. Robust THP is not limited by interference, which is
the case for its heuristic design. Note the cross-over point between heuristic
THP and linear precoding for P-CSI at 10 dB.
For rank[C hk ] = 1 (σ = 0◦ ), we show in Section 5.2.1 that any MSE
is realizable even for S-CSI at the expense of an increased transmit power.
Therefore, the system is not interference-limited. This is confirmed by Fig-
ure 5.17 for scenario 4, i.e., K = 3 > M = 2: Robust THP with scaled MF
receivers does not saturate whereas linear precoding is interference-limited
with a BER larger than 10−1 . With increasing angular spread σ the per-
formance of robust THP degrades and the BER saturates at a significantly
smaller BER than linear precoding. Robust THP requires about 30 iterations
for an overloaded system; this considerably slower convergence rate is caused
by our alternating optimization method. In principle, it can be improved by optimizing the explicit cost function in Section 5.2.2 for $P$ directly.
Whether the optimization of nonlinear precoding chooses an element ak [n]
of the perturbation vector a[n] to be non-zero depends on the system param-
eters and channel properties. The gain of THP over linear precoding is influ-
enced by the probability of $a_k[n]$ being zero, which is shown in Figure 5.18 for different parameters (Scenario 2).

Fig. 5.16 Mean uncoded BER vs. $P_{tx}/c_n$ with $f_{max} = 0.2$ for P-CSI (Scenario 3, M = 8, K = 4).

Fig. 5.17 Mean uncoded BER vs. $P_{tx}/c_n$ with $f_{max} = 0.2$ for P-CSI (Scenario 4, M = 2, K = 3).

Fig. 5.18 Probability that a modulo operation at the transmitter is inactive (linear), i.e., $a_k[n] = 0$ (Scenario 2, M = 8).

Inherently, the THP-like lattice search always chooses the first element of $\Pi^{(O)}a[n]$ to be zero, i.e., the probability of being zero is always larger than $1/K$. For small $P_{tx}/c_n$, the probability converges to one and the nonlinear precoder is linear: In BER we see a performance degradation compared to linear precoding due to the modulo operators at the receivers. Due to the larger amplitudes of the symbols $d_k$, the probability of $a_k[n]$ being zero is smaller for 16QAM than for QPSK. If we also reduce the
number of receivers to K = 4, which are equally spaced in azimuth between
±45◦ , the optimum precoder becomes linear with a high probability.
To summarize for K/M ≤ 1, gains are only large if
• it is difficult to separate the receivers, e.g., for small differences in the
receivers’ azimuth directions and K close to M ,
• the channels show sufficient structure in the sense that rank[C hk ] ≤ M ,
and
• the size of the QAM modulation alphabet D is large, e.g., |D| ≥ 16.
This is in accordance with the model of $C_w$ (5.26), which is more accurate for a larger modulation alphabet and a smaller probability of zero elements in $a[n]$. Moreover, information-theoretic results confirm that the gap between linear precoding and dirty paper precoding increases for larger $K/M$ (cf. [106] for $K \le M$).
Fig. 5.19 Mean uncoded BER vs. $P_{tx}/c_n$ of THP with $f_{max} = 0.1$ for P-CSI with estimated covariance matrices (Scenario 1, B = 20).

Fig. 5.20 Mean uncoded BER vs. $f_{max}$ of THP at $10\log_{10}(P_{tx}/c_n) = 20$ dB with estimated covariance matrices (Scenario 1, B = 20).

Performance for Estimated Covariance Matrices

All previous performance results for nonlinear precoding and P-CSI have
assumed perfect knowledge of the channel covariance matrix C hT and the
noise covariance matrix C v = cn I M . As for linear precoding, we apply the
algorithms for estimating channel and noise covariance matrices which are
derived in Chapter 3 to estimate the parameters of ph|yT (h|y T ).
The model and the estimators of the covariance matrices are described
at the end of Section 4.4.4 (page 165): Only $B = 20$ correlated realizations $h[q_{tx}-\ell]$ are observed via $y[q_{tx}-\ell]$ for $\ell \in \{1, 3, \ldots, 2B-1\}$, i.e., with a period of $N_P = 2$.

The heuristic and an advanced estimator (“ML est.”) for the channel co-
variance matrix are compared; the details can be found on page 165. Only
the mean uncoded BER for THP is shown in Figures 5.19 and 5.20.
The loss in BER of THP based on the heuristic covariance estimator com-
pared to the advanced method from Chapter 3 is significant for fmax = 0.1
(Figure 5.19): The saturation level is at about 2 · 10−3 compared to (be-
low) 10−4 . THP based on the advanced estimator performs close to a perfect
knowledge of the channel correlations.
With the advanced estimator a performance close to C-CSI is achieved up
to fmax ≈ 0.1 (Figure 5.20). For fmax ≥ 0.15, both estimators yield a similar
BER of THP, because only the knowledge of the spatial channel covariance
matrix is relevant whose estimation accuracy in terms of BER is comparable
for both approaches.
Appendix A
Mathematical Background

A.1 Complex Gaussian Random Vectors

A complex Gaussian random vector $x \in \mathbb{C}^N$ is defined as a random vector with statistically independent Gaussian distributed real and imaginary parts, whose probability distributions are identical [164, p. 198], [124]. We denote it as $x \sim \mathcal{N}_c(\mu_x, C_x)$ with mean $\mu_x = \mathrm{E}[x]$ and covariance matrix $C_x = C_{xx} = \mathrm{E}\left[(x-\mu_x)(x-\mu_x)^H\right]$. Its probability density reads

$$p_x(x) = \frac{1}{\pi^N\det[C_x]}\exp\left(-(x-\mu_x)^HC_x^{-1}(x-\mu_x)\right). \tag{A.1}$$

Consider the jointly complex Gaussian random vector $z = [x^T, y^T]^T$. In addition to the parameters of $p_x(x)$, its probability density is characterized by $C_y$, $\mu_y$, and the crosscovariance matrix $C_{xy} = \mathrm{E}\left[(x-\mu_x)(y-\mu_y)^H\right]$ with $C_{xy} = C_{yx}^H$. The parameters of the corresponding conditional probability density $p_{x|y}(x|y)$ are (e.g., [124])

$$\mu_{x|y} = \mathrm{E}_{x|y}[x] = \mu_x + C_{xy}C_y^{-1}(y-\mu_y) \tag{A.2}$$
$$C_{x|y} = C_x - C_{xy}C_y^{-1}C_{yx} = K_{C_y}(C_z), \tag{A.3}$$

where $K_\bullet(\bullet)$ denotes the Schur complement (Section A.2.2).


Useful expressions for higher order moments and the moments of some
nonlinear functions of complex Gaussian random vectors are derived in [148]
and [208].
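For a real-valued special case, (A.2) and (A.3) can be checked numerically as follows (our illustrative code with an assumed joint covariance; for complex Gaussian vectors the same formulas apply with Hermitian transposes):

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((4, 4))
C_z = G @ G.T + np.eye(4)                 # positive definite joint covariance
C_x, C_xy = C_z[:2, :2], C_z[:2, 2:]      # z = [x; y], dim(x) = dim(y) = 2
C_yx, C_y = C_z[2:, :2], C_z[2:, 2:]
mu_x, mu_y = np.zeros(2), np.zeros(2)
y_obs = rng.standard_normal(2)            # an arbitrary observation of y
mu_x_given_y = mu_x + C_xy @ np.linalg.solve(C_y, y_obs - mu_y)     # (A.2)
C_x_given_y = C_x - C_xy @ np.linalg.solve(C_y, C_yx)               # (A.3)
```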


A.2 Matrix Calculus

A.2.1 Properties of Trace and Kronecker Product

The trace of the square matrix $A \in \mathbb{C}^{N\times N}$ with $ij$th element $[A]_{i,j} = a_{i,j}$ is

$$\mathrm{tr}[A] = \sum_{i=1}^{N}a_{i,i}. \tag{A.4}$$

It is invariant w.r.t. a cyclic permutation of its argument:

$$\mathrm{tr}[BC] = \mathrm{tr}[CB], \tag{A.5}$$

where $B \in \mathbb{C}^{M\times N}$ and $C \in \mathbb{C}^{N\times M}$. Thus, we have $\mathrm{tr}[A] = \sum_{i=1}^{N}\lambda_i$ for the eigenvalues $\lambda_i$ of $A$.

The Kronecker product of two matrices $A \in \mathbb{C}^{M\times N}$ and $B \in \mathbb{C}^{P\times Q}$ is defined as

$$A \otimes B \triangleq \begin{bmatrix}a_{1,1}B & \cdots & a_{1,N}B\\ \vdots & \ddots & \vdots\\ a_{M,1}B & \cdots & a_{M,N}B\end{bmatrix} \in \mathbb{C}^{MP\times NQ}. \tag{A.6}$$

We summarize some of its properties from [26] ($M \in \mathbb{C}^{M\times M}$, $N \in \mathbb{C}^{N\times N}$, $C \in \mathbb{C}^{N\times R}$, and $D \in \mathbb{C}^{Q\times S}$):

$$(A\otimes B)^T = A^T\otimes B^T, \tag{A.7}$$
$$A\otimes\alpha = \alpha\otimes A = \alpha A, \quad \alpha\in\mathbb{C}, \tag{A.8}$$
$$a^T\otimes b = b\otimes a^T \in \mathbb{C}^{N\times M}, \quad a\in\mathbb{C}^M,\ b\in\mathbb{C}^N, \tag{A.9}$$
$$(A\otimes B)(C\otimes D) = (AC)\otimes(BD), \tag{A.10}$$
$$(M\otimes N)^{-1} = M^{-1}\otimes N^{-1}, \quad M, N\ \text{regular}, \tag{A.11}$$
$$\mathrm{tr}[M\otimes N] = \mathrm{tr}[M]\,\mathrm{tr}[N], \tag{A.12}$$
$$\det[M\otimes N] = \det[M]^N\det[N]^M. \tag{A.13}$$

The operation

$$\mathrm{vec}[A] \triangleq \begin{bmatrix}a_1\\ \vdots\\ a_N\end{bmatrix} \in \mathbb{C}^{MN} \tag{A.14}$$

stacks the columns of the matrix $A = [a_1,\ldots,a_N] \in \mathbb{C}^{M\times N}$ in one vector. It is related to the Kronecker product by

$$\mathrm{vec}[ABC] = (C^T\otimes A)\,\mathrm{vec}[B] \in \mathbb{C}^{MQ} \tag{A.15}$$

with $B \in \mathbb{C}^{N\times P}$ and $C \in \mathbb{C}^{P\times Q}$.
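The column-stacking convention in (A.14) corresponds to Fortran ('F') ordering in numpy; a quick numerical check of (A.15) (our illustrative code):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
C = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))

vec = lambda X: X.flatten(order='F')   # stack columns, cf. (A.14)

assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))   # (A.15)
```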

A.2.2 Schur Complement and Matrix Inversion Lemma

Define the partitioning of

$$A = \begin{bmatrix}A_{1,1} & A_{1,2}\\ A_{2,1} & A_{2,2}\end{bmatrix} \in \mathbb{C}^{(M+N)\times(M+N)} \tag{A.16}$$

with submatrices $A_{1,1}\in\mathbb{C}^{M\times M}$, $A_{1,2}\in\mathbb{C}^{M\times N}$, $A_{2,1}\in\mathbb{C}^{N\times M}$, and $A_{2,2}\in\mathbb{C}^{N\times N}$.

The Schur complement of $A_{1,1}$ in $A$ (e.g., [121]) is defined as

$$K_{A_{1,1}}(A) = A_{2,2} - A_{2,1}A_{1,1}^{-1}A_{1,2} \in \mathbb{C}^{N\times N}; \tag{A.17}$$

similarly, the Schur complement of $A_{2,2}$ in $A$ is

$$K_{A_{2,2}}(A) = A_{1,1} - A_{1,2}A_{2,2}^{-1}A_{2,1} \in \mathbb{C}^{M\times M}. \tag{A.18}$$

Two important properties regarding the positive (semi-)definiteness of the Schur complement are stated by the following theorems, where $B$ is partitioned as $A$.

Theorem A.1 (Theorem 3.1 in [132]). If $A \succeq B$, then $K_{A_{1,1}}(A) \succeq K_{B_{1,1}}(B)$.

In [24, p. 651] equivalent characterizations of a positive (semi-)definite Schur complement in terms of the positive (semi-)definiteness of the block matrix $A$ are given:

Theorem A.2.
1. $A \succ 0$ if and only if $K_{A_{1,1}}(A) \succ 0$ and $A_{1,1} \succ 0$.
2. If $A_{1,1} \succ 0$, then $A \succeq 0$ if and only if $K_{A_{1,1}}(A) \succeq 0$.
Moreover, throughout the book we require the matrix inversion lemma for the matrices $A\in\mathbb{C}^{N\times N}$, $B\in\mathbb{C}^{N\times M}$, $C\in\mathbb{C}^{M\times M}$, and $D\in\mathbb{C}^{M\times N}$:

$$(A+BCD)^{-1} = A^{-1} - A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}. \tag{A.19}$$

It can be used to prove [124]

$$(C^{-1} + B^HR^{-1}B)^{-1}B^HR^{-1} = CB^H(BCB^H + R)^{-1} \tag{A.20}$$

with $B\in\mathbb{C}^{M\times N}$ and invertible $C\in\mathbb{C}^{N\times N}$ and $R\in\mathbb{C}^{M\times M}$.
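A quick numerical sanity check of (A.19) (our illustrative code; the random matrices are assumed invertible, which holds almost surely for this construction):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 5, 3
A = rng.standard_normal((N, N)) + N * np.eye(N)   # well-conditioned A
B = rng.standard_normal((N, M))
C = rng.standard_normal((M, M)) + M * np.eye(M)
D = rng.standard_normal((M, N))

lhs = np.linalg.inv(A + B @ C @ D)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ B @ np.linalg.inv(D @ Ai @ B + np.linalg.inv(C)) @ D @ Ai
assert np.allclose(lhs, rhs)                      # verifies (A.19)
```

Identity (A.20) is, for example, the step behind the two equivalent precoder forms (5.100) and (5.101).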

A.2.3 Wirtinger Calculus and Matrix Gradients

To determine the gradient of complex-valued functions $f(z): \mathbb{C}^M \to \mathbb{C}$ with a complex argument $z = x + \mathrm{j}y \in \mathbb{C}^M$ and $x, y \in \mathbb{R}^M$, we could proceed by considering it as a function from $\mathbb{R}^{2M} \to \mathbb{R}^2$. But very often it is more efficient to define a calculus which works with $z$ directly. In the Wirtinger calculus [177] the partial derivatives w.r.t. $z$ and its complex conjugate $z^*$ are defined in terms of the partial derivatives w.r.t. the real and imaginary parts $x$ and $y$, respectively (see also [108] and references):

$$\frac{\partial}{\partial z} \triangleq \frac{1}{2}\left(\frac{\partial}{\partial x} - \mathrm{j}\frac{\partial}{\partial y}\right) \tag{A.21}$$
$$\frac{\partial}{\partial z^*} \triangleq \frac{1}{2}\left(\frac{\partial}{\partial x} + \mathrm{j}\frac{\partial}{\partial y}\right). \tag{A.22}$$

Formally, we treat the function $f(z)$ as a function $f(z, z^*)$ in the two variables $z$ and $z^*$. With this definition most of the basic properties of the (real) partial derivative still hold: linearity, the product rule, and the rule for the derivative of a ratio of two functions. Additionally, we have $\partial f(z)^*/\partial z = (\partial f(z)/\partial z^*)^*$ and $\partial f(z)^*/\partial z^* = (\partial f(z)/\partial z)^*$. Note that the chain rule is slightly different but still consistent with the interpretation of $f(z)$ as a function in $z$ and $z^*$ [177]. If $f(z)$ does not depend on $z^*$, then $\partial f(z)/\partial z^* = 0$.
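As a simple illustration (our example, not from the original text), take the scalar function $f(z) = |z|^2 = zz^*$:

$$\frac{\partial f}{\partial z} = z^*, \qquad \frac{\partial f}{\partial z^*} = z,$$

which follows from the product rule when $z$ and $z^*$ are treated as formally independent variables. Since this real-valued $f$ depends on $z^*$, it is not holomorphic and only the Wirtinger derivatives exist, not a complex derivative in the classical sense.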
For functions $f: \mathbb{C}^{M\times N} \to \mathbb{C}$ whose argument is a matrix $X \in \mathbb{C}^{M\times N}$ with elements $x_{m,n} = [X]_{m,n}$, it is convenient to work directly with the following matrix calculus, which follows from the definition of the Wirtinger derivative above:

$$\frac{\partial f(X)}{\partial X} \triangleq \begin{bmatrix}\frac{\partial f(X)}{\partial x_{1,1}} & \cdots & \frac{\partial f(X)}{\partial x_{1,N}}\\ \vdots & & \vdots\\ \frac{\partial f(X)}{\partial x_{M,1}} & \cdots & \frac{\partial f(X)}{\partial x_{M,N}}\end{bmatrix} \in \mathbb{C}^{M\times N}. \tag{A.23}$$

Defining $A\in\mathbb{C}^{M\times N}$, $B\in\mathbb{C}^{N\times M}$, $C\in\mathbb{C}^{M\times M}$, and $R\in\mathbb{C}^{N\times N}$, the partial derivatives of a number of useful functions are [124, 182]:

$$\frac{\partial\,\mathrm{tr}[AB]}{\partial A} = B^T \tag{A.24}$$
$$\frac{\partial\,\mathrm{tr}[ARA^H]}{\partial A^*} = AR \tag{A.25}$$
$$\frac{\partial\,\mathrm{tr}[AB]}{\partial A^*} = 0_{M\times N} \tag{A.26}$$
$$\frac{\partial\,\mathrm{tr}[ABAC]}{\partial A} = C^TA^TB^T + B^TA^TC^T. \tag{A.27}$$

For $M = N$, we get

$$\frac{\partial\,\mathrm{tr}[A^{-1}B]}{\partial A} = -\left(A^{-1}BA^{-1}\right)^T. \tag{A.28}$$

If $A$ is invertible and $A = A^H$,

$$\frac{\partial\det[A]}{\partial A} = \det[A]\,A^{-T} \tag{A.29}$$
$$\frac{\partial\ln[\det[A]]}{\partial A} = A^{-T}. \tag{A.30}$$

For Hermitian and regular $C(\theta)$ with $\theta \in \mathbb{C}$, it can be shown that [124]

$$\frac{\partial\ln[\det[C(\theta)]]}{\partial\theta} = \mathrm{tr}\left[C^{-1}(\theta)\frac{\partial C(\theta)}{\partial\theta}\right] \tag{A.31}$$
$$\frac{\partial C^{-1}(\theta)}{\partial\theta} = -C^{-1}(\theta)\frac{\partial C(\theta)}{\partial\theta}C^{-1}(\theta). \tag{A.32}$$

A.3 Optimization and Karush-Kuhn-Tucker Conditions

Consider the general nonlinear optimization problem

$$\min_x\;F(x) \quad\text{s.t.}\quad g(x) \le 0_S,\quad h(x) = 0_P \tag{A.33}$$

with optimization variables $x\in\mathbb{C}^M$, cost function $F(x): \mathbb{C}^M\to\mathbb{R}$, inequality constraints $g(x): \mathbb{C}^M\to\mathbb{R}^S$, and equality constraints $h(x): \mathbb{C}^M\to\mathbb{C}^P$. The inequality is defined elementwise for a vector, i.e., $g_i(x)\le 0$ for $g(x) = [g_1(x),\ldots,g_S(x)]^T$.

The corresponding Lagrange function $L: \mathbb{C}^M\times\mathbb{R}_{+,0}^S\times\mathbb{C}^P\to\mathbb{R}$ reads [24]

$$L(x,\lambda,\nu) = F(x) + \lambda^Tg(x) + 2\mathrm{Re}\left(\nu^Th(x)\right) \tag{A.34}$$

with Lagrange variables $\lambda\in\mathbb{R}_{+,0}^S$ and $\nu\in\mathbb{C}^P$. If $L$ is differentiable, the Karush-Kuhn-Tucker (KKT) conditions

$$g(x) \le 0_S$$
$$h(x) = 0_P$$
$$\lambda \ge 0_S \tag{A.35}$$
$$\lambda_ig_i(x) = 0, \quad i = 1, 2, \ldots, S$$
$$\frac{\partial}{\partial x}L(x,\lambda,\nu) = 0_M$$

are necessary for a minimal point of problem (A.33). Alternatively, the partial derivative of the Lagrange function can be evaluated as $\frac{\partial}{\partial x^*}L(x,\lambda,\nu) = 0_M$ according to the Wirtinger calculus (Appendix A.2.3) for complex variables $x$.¹
If the optimization problem is given in terms of a matrix $X\in\mathbb{C}^{M\times N}$ with equality constraint $H(X): \mathbb{C}^{M\times N}\to\mathbb{C}^{P\times Q}$, it is often more convenient to work with the equivalent Lagrange function

$$L(X,\lambda,N) = F(X) + \lambda^Tg(X) + 2\mathrm{Re}\left(\mathrm{tr}[NH(X)]\right) \tag{A.36}$$

with Lagrange variables $N\in\mathbb{C}^{Q\times P}$ and the corresponding KKT conditions.

¹ $\left(\frac{\partial}{\partial x}L(x,\lambda,\nu)\right)^* = \frac{\partial}{\partial x^*}L(x,\lambda,\nu)$ because $L$ is real-valued.
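As a brief worked example (ours, not from the book), consider the minimum-norm problem with one affine equality constraint, $\min_{x\in\mathbb{C}^M}\|x\|_2^2$ s.t. $a^Hx = b$. With (A.34), $L(x,\nu) = x^Hx + 2\mathrm{Re}(\nu(a^Hx - b))$, and the Wirtinger KKT condition gives

$$\frac{\partial L}{\partial x^*} = x + \nu^*a = 0_M \;\Rightarrow\; x = -\nu^*a, \qquad a^Hx = b \;\Rightarrow\; \nu^* = -\frac{b}{\|a\|_2^2},$$

so $x_{\mathrm{opt}} = a\,b/\|a\|_2^2$, the familiar minimum-norm solution.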
Appendix B
Completion of Covariance Matrices and Extension of Sequences

B.1 Completion of Toeplitz Covariance Matrices

Matrix completion problems deal with partial matrices, whose entries are
only specified for a subset of the elements. The unknown elements of these
matrices have to be completed subject to additional constraints [130]. An
important issue is the existence of a completion. If it exists and is not unique,
the unknown elements can be chosen according to a suitable criterion, e.g.,
maximum entropy. Here, we are interested in partial covariance matrices and
their completions which have to satisfy the positive semidefinite constraint.
A fully specified matrix is positive semidefinite only if all its principal
submatrices are positive semidefinite. A partial positive semidefinite matrix
is Hermitian regarding the specified entries and all its fully specified principal
submatrices1 are positive semidefinite [117]. From this follows that a partial
matrix has a positive semidefinite completion only if it is partially positive
semidefinite.
The covariance matrix

$$C_T = \begin{bmatrix}c[0] & c[1] & \cdots & c[P(B-1)]\\ c[1]^* & c[0] & \ddots & \vdots\\ \vdots & \ddots & \ddots & c[1]\\ c[P(B-1)]^* & \cdots & c[1]^* & c[0]\end{bmatrix} \tag{B.1}$$

of interest in Section 3.4 is Hermitian Toeplitz with first row $[c[0], c[1], \ldots, c[P(B-1)]]$. Only every $P$th sub-diagonal including the main diagonal is given, and the corresponding partial matrix is denoted $C_T^{(P)}$. For $P = 2$ and $B = 3$, the partial matrix is (question marks symbolize the unknown elements)

$$C_T^{(2)} = \begin{bmatrix}c[0] & ? & c[2] & ? & c[4]\\ ? & c[0] & ? & c[2] & ?\\ c[2]^* & ? & c[0] & ? & c[2]\\ ? & c[2]^* & ? & c[0] & ?\\ c[4]^* & ? & c[2]^* & ? & c[0]\end{bmatrix}. \tag{B.2}$$

¹ A principal submatrix is obtained by deleting a subset of the columns and the corresponding rows of a matrix. The main diagonal of a principal submatrix is on the diagonal of the original matrix.
All fully specified principal submatrices of $C_T^{(P)}$ are the principal submatrices of

$$\bar{C}_T = \begin{bmatrix}c[0] & c[P] & \cdots & c[P(B-1)]\\ c[P]^* & c[0] & \ddots & \vdots\\ \vdots & \ddots & \ddots & c[P]\\ c[P(B-1)]^* & \cdots & c[P]^* & c[0]\end{bmatrix}, \tag{B.3}$$

which itself is a principal submatrix of $C_T^{(P)}$. Thus, the known elements of $C_T$ have to form a positive semidefinite Toeplitz matrix $\bar{C}_T$. This necessary condition makes sense intuitively, because its elements $\{c[Pi]\}_{i=0}^{B-1}$ form the decimated sequence w.r.t. $c[\ell]$.
A set of specified positions in a partial matrix is called a pattern. For the Toeplitz matrix $C_T$, it is defined by

$$\mathcal{P} = \{\ell\,|\,c[\ell]\ \text{is specified for}\ \ell > 0\} \cup \{0\} \tag{B.4}$$

and includes the main diagonal to avoid the trivial case.² A pattern $\mathcal{P}$ is said to be positive semidefinite completable if for every partial positive semidefinite Toeplitz matrix with pattern $\mathcal{P}$ a completion to a positive semidefinite Toeplitz matrix exists.

Considering all possible patterns $\mathcal{P}$ for $C_T$, the following result is given by Johnson et al. [117]:

Theorem B.1 (Johnson et al. [117]). A pattern $\mathcal{P} \subseteq \{0, 1, \ldots, P(B-1)\}$ is positive semidefinite completable if and only if $\mathcal{P} = \{0, P, 2P, \ldots, P(B-1)\}$.

Alternatively to the proof given in [117], graph-theoretic arguments (e.g., [131]) lead to the same result. In general, the completion is not unique. For example, the completion with zeros, i.e., $c[Pi-\ell] = 0$ for $\ell \in \{1, 2, \ldots, P-1\}$ and $i \in \{1, 2, \ldots, B-1\}$, yields a positive definite Toeplitz covariance matrix [117].

² If the main diagonal were not specified, a positive definite completion could always be found.
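Numerically, the zero completion can be checked as follows (our illustrative sketch with assumed values for the specified lags):

```python
import numpy as np
from scipy.linalg import toeplitz

# Zero completion of a partial Toeplitz covariance with periodic pattern
# P = {0, 2, 4} (P = 2, B = 3), cf. (B.2) and Theorem B.1. The decimated
# sequence c_bar = [c[0], c[2], c[4]] must itself form a PSD Toeplitz matrix.
c_bar = np.array([1.0, 0.6, 0.2])
assert np.linalg.eigvalsh(toeplitz(c_bar)).min() >= 0
c_full = np.array([1.0, 0.0, 0.6, 0.0, 0.2])     # unknown lags set to zero
print(np.linalg.eigvalsh(toeplitz(c_full)))      # all eigenvalues >= 0
```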
B.2 Band-Limited Positive Semidefinite Extension of Sequences

Given a partial sequence $c[\ell]$ on the interval $I = \{0, 1, \ldots, N\}$, its extension to $\mathbb{Z}$ is a sequence $\hat{c}[\ell]$ with $\hat{c}[\ell] = c[\ell]$ on $I$. This is also called an extrapolation.

We are interested in extensions to positive semidefinite sequences, which have a positive real Fourier transform, i.e., a (power) spectral density

$$S(\omega) \ge 0, \tag{B.5}$$

or in general a spectral distribution $s(\omega)$ with $S(\omega) = \mathrm{d}s(\omega)/\mathrm{d}\omega$. Necessarily, we assume that the given partial sequence $c[\ell] = c[-\ell]^*$ is conjugate symmetric, as is its positive semidefinite extension.

Conditions for the existence and uniqueness of a positive semidefinite extension are summarized in the following theorem (e.g., [6]).

Theorem B.2. A positive semidefinite extension of $\{c[\ell]\}_{\ell=0}^{N}$ exists if and only if the Toeplitz matrix $C_T$ with first row $[c[0], c[1], \ldots, c[N]]$ is positive semidefinite. The extension is unique if $C_T$ is singular. For singular $C_T$ with rank $R \le N$, it is composed of $R$ zero-phase complex sinusoids.

A band-limited extension with maximum frequency $\omega_{max} = 2\pi f_{max}$ has a Fourier transform which is only non-zero on $\Omega = [-\omega_{max}, \omega_{max}] \subseteq [-\pi, \pi]$. It is easy to construct an extension which is band-limited and not positive semidefinite: Given any band-limited sequence, a linear FIR filter can be constructed which creates a (necessarily) band-limited sequence that is identical to $c[\ell]$ on $I$ [181]. This sequence is not necessarily positive semidefinite even if the filter input is a band-limited positive semidefinite sequence.

Necessary and sufficient conditions for the existence and uniqueness of a band-limited positive semidefinite extension are presented by Arun and Potter [6]. They propose a linear filter to test a sequence $\{c[\ell]\}_{\ell=-\infty}^{\infty}$ for being positive semidefinite and band-limited:

$$c'[\ell] = c[\ell-1] - 2\cos(\omega_{max})c[\ell] + c[\ell+1]. \tag{B.6}$$

The frequency response of this filter is only positive on $[-\omega_{max}, \omega_{max}]$ (Figure B.1), i.e., if the input sequence is not band-limited to $\omega_{max}$ or is indefinite on $\Omega$, the output is an indefinite sequence. A test for band-limited positive semidefinite extendibility can be derived based on the in- and output of this filter. It results in the following characterization.

Fig. B.1 Frequency response $H(2\pi f) = 2\cos(2\pi f) - 2\cos(2\pi f_{max})$ of (B.6) for $f_{max} = 0.2$ ($\omega_{max} = 0.4\pi$).

Theorem B.3 (Arun and Potter [6]).
1. The sequence $\{c[\ell]\}_{\ell=0}^{N}$ has a positive semidefinite extension band-limited to $\omega_{max}$ if and only if the $(N+1)\times(N+1)$ Toeplitz matrix $C_T$ with first row $[c[0], c[1], \ldots, c[N]]$ and the $N\times N$ Toeplitz matrix $C_T'$ with first row $[c'[0], c'[1], \ldots, c'[N-1]]$ are both positive semidefinite, where $c'[\ell]$ is given by (B.6).
2. If this partial sequence is extendible in this sense, the extrapolation is unique if and only if $C_T$ or $C_T'$, or both, is singular.
We illustrate this theorem with two examples: The first is required for our
argumentation in Section 2.4.3.2, the second example for Section 3.4.
Example B.1. Given the sequence $c[0] = 1$ and $c[\ell] = 0$, $\ell \in \{1, 2, \ldots, B\}$: for which maximum frequencies $f_{max}$ does a positive semidefinite band-limited extension exist?

The output in (B.6) is

$$c'[0] = -2\cos(\omega_{max}) \tag{B.7}$$
$$c'[1] = 1 \tag{B.8}$$
$$c'[\ell] = 0, \quad \ell \ge 2, \tag{B.9}$$

which yields a tridiagonal matrix $C_T'$. $C_T$ from Theorem B.3 is always positive semidefinite. Necessary conditions for a positive semidefinite $C_T'$ are
• $c'[0] = -2\cos(\omega_{max}) \ge 0$, which requires $\omega_{max} \ge \pi/2$ ($f_{max} \ge 0.25$), and
• $c'[0] = -2\cos(\omega_{max}) \ge |c'[1]| = 1$, which is given for $\omega_{max} \ge 2\pi/3$ ($f_{max} \ge 1/3$).

The exact value of $f_{max}$ for sufficiency depends on $B$: For $B = 1$, we have

$$C_T' = -2\cos(\omega_{max}) \ge 0 \tag{B.10}$$

and $f_{max} \ge 0.25$ is also sufficient. For $B = 2$, it yields

$$C_T' = \begin{bmatrix}-2\cos(\omega_{max}) & 1\\ 1 & -2\cos(\omega_{max})\end{bmatrix}, \tag{B.11}$$

which is positive semidefinite for $f_{max} \ge 1/3$. The necessary and sufficient $f_{max}$ increases further for $B = 3$: $f_{max} \ge 0.375$. For $B \to \infty$, we have $f_{max} \to 0.5$, i.e., no band-limited positive semidefinite extension exists in the limit. □

Example B.2. The partial decimated sequence $c[0], c[P], \ldots, c[P(B-1)]$ is given. Does the sequence $c[0], 0, \ldots, 0, c[P], 0, \ldots, 0, c[P(B-1)]$, which is the given sequence interpolated with $P-1$ zeros, have a positive semidefinite and band-limited extension?

For $P = 2$, the output in (B.6) is

$$c'[0] = -2\cos(\omega_{max})c[0] \tag{B.12}$$
$$c'[1] = c[0] + c[2] \tag{B.13}$$
$$c'[2] = -2\cos(\omega_{max})c[2],\ \text{etc.} \tag{B.14}$$

Due to $c'[0] \ge 0$, a necessary condition is $f_{max} \ge 0.25$, which coincides with the condition of the sampling theorem for the reconstruction of $c[\ell]$ from the sequence decimated by $P = 2$.

For $P = 3$, we obtain a second necessary condition $c'[1] = c[0] \le c'[0]$, which is always true for $f_{max} \ge 1/3$. □
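The conditions of Theorem B.3 are easy to check numerically. The following sketch (our code, assuming numpy/scipy) builds both Toeplitz test matrices and reproduces the threshold $f_{max} \ge 1/3$ of Example B.1 for $B = 2$:

```python
import numpy as np
from scipy.linalg import toeplitz

def extendible(c, fmax):
    """Test whether {c[0..N]} has a positive semidefinite extension
    band-limited to fmax (Theorem B.3). Assumes conjugate symmetry,
    so c[-1] = c[1]^* is used for the filtered value c'[0]."""
    c = np.asarray(c, dtype=complex)
    C_T = toeplitz(c.conj(), c)                       # (N+1) x (N+1)
    ext = np.concatenate(([c[1].conj()], c))          # prepend c[-1]
    cp = ext[:-2] - 2 * np.cos(2 * np.pi * fmax) * ext[1:-1] + ext[2:]  # (B.6)
    C_Tp = toeplitz(cp.conj(), cp)                    # first row c'[0..N-1]
    psd = lambda A: np.linalg.eigvalsh(A).min() >= -1e-12
    return psd(C_T) and psd(C_Tp)

# Example B.1 with B = 2 (c = [1, 0, 0]): extendible iff fmax >= 1/3
print(extendible([1, 0, 0], 0.30))   # False
print(extendible([1, 0, 0], 0.35))   # True
```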

B.3 Generalized Band-Limited Trigonometric Moment Problem

Combining the results on the completion of partial positive semidefinite Toeplitz matrices (Section B.1) and the band-limited extension of conjugate symmetric sequences (Section B.2), we can solve the following generalized trigonometric moment problem.

Problem Statement: What are the necessary and sufficient conditions for the existence of a (band-limited) spectral distribution $s(\omega)$ which satisfies the interpolation conditions

$$c[\ell] = \frac{1}{2\pi}\int_\Omega \exp(\mathrm{j}\omega\ell)\,\mathrm{d}s(\omega), \quad \ell \in \mathcal{P} \subseteq \{0, 1, \ldots, N\}, \quad \{0, N\} \subseteq \mathcal{P} \tag{B.15}$$

for $\Omega = [-\omega_{max}, \omega_{max}]$ with $\omega_{max} = 2\pi f_{max}$? And is it unique?


The constraint $N \in \mathcal{P}$ is imposed without loss of generality to simplify the discussion: Otherwise, the existence has to be proved for an effectively shorter partial sequence. Moreover, we define $C_T^{(\mathcal{P})}$ as the partial and Hermitian Toeplitz matrix for $c[\ell]$, $\ell \in \mathcal{P}$.

The band-limited spectral distribution $s(\omega)$ is constant outside $\Omega$ with $s(-\omega_{max}) = 0$ and $s(\omega_{max}) = 2\pi c[0]$. If $s(\omega)$ is differentiable, this corresponds to the existence of a band-limited positive real (power) spectral density $S(\omega) = \frac{\mathrm{d}s(\omega)}{\mathrm{d}\omega} \ge 0$, which is non-zero only on $\Omega$, with

$$c[\ell] = \frac{1}{2\pi}\int_{-\omega_{max}}^{\omega_{max}}\exp(\mathrm{j}\omega\ell)S(\omega)\,\mathrm{d}\omega, \quad \ell \in \mathcal{P} \subseteq \{0, 1, \ldots, N\}, \quad \{0, N\} \subseteq \mathcal{P}. \tag{B.16}$$

Given a suitable spectral distribution, the extension of the sequence $c[\ell]$, $\ell \in \mathbb{Z}$, can be computed and vice versa. We summarize the conditions in a theorem.

Theorem B.4. Given $c[\ell]$, $\ell \in \mathcal{P}$, as in (B.15) and a maximum frequency $f_{max}$. For every partially positive semidefinite $C_T^{(\mathcal{P})}$, a spectral distribution $s(\omega)$ satisfying the interpolation conditions (B.15) for $\Omega = [-\omega_{max}, \omega_{max}]$ with $\omega_{max} = 2\pi f_{max} \le \pi$ exists if and only if

1. $\mathcal{P}$ is a periodic pattern

$$\mathcal{P} = \{\ell\,|\,\ell = Pi,\ i \in \{0, 1, \ldots, B-1\},\ N = P(B-1)\}, \tag{B.17}$$

2. and the following additional condition holds for the case $f_{max} < \frac{1}{2P}$: $\bar{C}_T'$ with first row $[c'[0], c'[P], \ldots, c'[P(B-2)]]$ is positive semidefinite, where $c'[Pi]$ is defined as $c'[Pi] = c[P(i-1)] - 2\cos(2\pi f_{max}')c[Pi] + c[P(i+1)]$ for $f_{max}' = f_{max}P$.

Under these conditions, the spectral distribution is unique if and only if the Toeplitz matrix $\bar{C}_T$ or $\bar{C}_T'$, or both, is singular and $f_{max} \le \frac{1}{2P}$.
Proof. The proof follows directly from Theorems B.1 and B.3 and a reasoning similar to the sampling theorem. We distinguish four cases of $f_{max}$:

1) $f_{max} = 1/2$: This corresponds to no band-limitation and is related to Theorem B.1, which shows that a positive definite matrix completion for all partially positive semidefinite $C_T^{(\mathcal{P})}$ exists if and only if the pattern $\mathcal{P}$ is periodic. Then partial positive semidefiniteness is equivalent to $\bar{C}_T \succeq 0$ (B.3). Given a positive semidefinite completion, a positive semidefinite extension of this sequence can be found [6]. The extension is unique given a singular completion $C_T$, but the matrix completion itself is not unique.

A periodic pattern is also necessary for $f_{max} < 1/2$, because a positive semidefinite $C_T$ is necessary for the existence of a (band-limited) positive semidefinite sequence extension. Therefore, we restrict the following argumentation to $\mathcal{P} = \{\ell\,|\,\ell = Pi,\ i \in \{0, 1, \ldots, B-1\},\ N = P(B-1)\}$. (Note that Theorem B.4 requires existence for every partially positive semidefinite $C_T^{(\mathcal{P})}$.) For the remaining cases with $f_{max} < 1/2$, we first prove (band-limited) extendibility of the decimated sequence, followed by arguments based on interpolation because of the periodicity of $\mathcal{P}$.

2) $1/(2P) < f_{max} < 1/2$: If and only if $\bar{C}_T$ is positive semidefinite, a positive semidefinite extension of $c[Pi]$, $i \in \{0, 1, \ldots, B-1\}$, exists for $i \in \mathbb{Z}$. We can generate a band-limited interpolation $c[\ell]$, $\ell \in \mathbb{Z}$, which is not unique: For example, the interpolated sequence $c[\ell]$ with maximum frequency $1/(2P)$, which is trivially band-limited to $f_{max} > 1/(2P)$.

3) $f_{max} = 1/(2P)$: The condition of the sampling theorem is satisfied. A positive semidefinite extension $c[Pi]$, $i \in \mathbb{Z}$, exists for positive semidefinite $\bar{C}_T$. It is unique if and only if $\bar{C}_T$ is singular. The interpolated sequence $c[\ell]$, $\ell \in \mathbb{Z}$, can be reconstructed from the band-limited spectral distribution of the extended decimated sequence $c[Pi]$, $i \in \mathbb{Z}$, and is unique.

4) $f_{max} < 1/(2P)$: If and only if $\bar{C}_T$ and $\bar{C}_T'$, based on $c[Pi]$, $i \in \{0, 1, \ldots, B-1\}$, and $c'[Pi]$ for maximum frequency $f_{max}' = f_{max}P < 1/2$ (as defined in the theorem)³, are positive semidefinite, a band-limited extension $c[Pi]$, $i \in \mathbb{Z}$, exists. It is unique if and only if one of the matrices, or both, is singular. From its spectral distribution follows a unique interpolation $c[\ell]$, $\ell \in \mathbb{Z}$. □

³ To test the decimated sequence for positive semidefinite extendibility, we have to consider the maximum frequency $f_{max}'$ of the $P$-fold decimated sequence $c[Pi]$.
Appendix C
Robust Optimization from the Perspective of Estimation Theory

Consider the optimization problem

$$x_{\mathrm{opt}}(h) = \underset{x}{\mathrm{argmin}}\;F(x;h) \quad\text{s.t.}\quad x \in X, \tag{C.1}$$

where the cost function depends on an unknown parameter $h$ and the constraint set $X$ is independent of $h$. Furthermore, an observation $y$ with the probability density $p_y(y;h)$ is available.
The question arises whether we should estimate the unknown parameter $h$ and plug this estimate into the cost function, or estimate the cost function itself. In the first case an additional question arises: What is the optimum estimator for $h$ in the context of this optimization problem?

The Maximum Likelihood Invariance Principle

Based on a stochastic model of $y$ we can obtain the maximum likelihood (ML) estimate for $h$:

$$\hat{h}_{ML} = \underset{h}{\mathrm{argmax}}\;p_y(y;h). \tag{C.2}$$

But we are only interested in $F(x;h)$ and not explicitly in $h$. According to the ML invariance principle, the ML estimate of a function of $h$ is simply obtained by plugging $\hat{h}_{ML}$ into this function [124], because the probability distribution of the observation is invariant w.r.t. our actual parameter of interest, i.e., whether we want to estimate $h$ or $F(x;h)$. Therefore, the ML estimate of the cost function is

$$\hat{F}_{ML}(x;y) = F(x;\hat{h}_{ML}) \tag{C.3}$$

and yields the optimization problem

    x_opt^ML(ĥ_ML) = argmin_{x ∈ X} F(x; ĥ_ML).    (C.4)

Fig. C.1 Optimization for an application with cost function F(x; h) and unknown parameters h which can be estimated from the observations y: (a) Heuristic approach: plug in the estimate ĥ as if it were error-free. (b) Robust optimization (C.6): the interface between parameter estimation and the application is enhanced and also provides a probability distribution p_ε(ε) or covariance matrix C_ε of the estimation error ε. (c) Combined estimation: parameters h are not estimated explicitly and the observation y is directly applied to the cost function following the Bayesian principle (C.5).

Many results in the literature follow this approach; some follow this principle even though their estimate ĥ does not maximize the likelihood (see the first heuristic approach below). What are alternative methods?

The Bayesian Principle

Given the a posteriori density ph|y (h|y), the estimate minimizing the MSE
w.r.t. h is the conditional mean (CM) ĥCM = Eh|y [h] (Section 2.2.1) [124].
Because we are not interested in h directly, we optimize for the cost function
with minimum MSE. This yields the CM estimate of the cost function

F̂CM (x; y) = Eh|y [F(x; h)] . (C.5)


It is only identical to plugging the CM estimate of h into the cost function (see the first heuristic below), i.e., F(x; ĥ_CM), if F(x; h) is an affine function in h.
Obviously, the CM estimator of h is a necessary step in this approach, but it is not sufficient: for example, the error covariance matrix C_{h|y} is also taken into account in the generally nonlinear estimator of F (C.5).
We can view (C.5) as a combined optimization of parameter estimation and the corresponding application described by F(x; h) and X (Figure C.1(c)). If p_{h|y}(h|y) is complex Gaussian, the error covariance matrix has to be passed from the parameter estimator to the application represented by F(x; h) in addition to the estimate ĥ_CM (Figure C.1(b)).
There are numerous applications for this principle in wireless communications: examples are receiver design under channel uncertainties [217, 43, 51, 55] and the optimization of the transmitter (Chapters 4 and 5; see also [179, 119, 245]).

Robust Optimization

Another approach is known from static stochastic programming [171, 24]: Given an estimate ĥ and a stochastic model for its estimation error ε = h − ĥ, the expected cost is minimized:

    min_{x ∈ X} E_ε[F(x; ĥ + ε)].    (C.6)

This is identical to (C.5) if ε is statistically independent of y,

    E_{ε|y}[F(x; ĥ + ε)] = E_ε[F(x; ĥ + ε)],    (C.7)

i.e., all a priori information included in p_{ε|y}(ε|y) has already been exploited in ĥ.
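
The difference between (C.6) and the plug-in heuristic can be made concrete by approximating the expectation over ε with Monte Carlo samples. The following minimal Python sketch uses a hypothetical scalar cost F(x; h) = (x − h²)², chosen only because it is visibly non-affine in h; it is not a cost function from this book:

    import numpy as np

    rng = np.random.default_rng(0)

    def F(x, h):
        # Hypothetical scalar cost, deliberately non-affine in the parameter h.
        return (x - h**2)**2

    h_hat, c_eps = 1.0, 0.5                        # estimate and error variance
    eps = rng.normal(0.0, np.sqrt(c_eps), 10_000)  # samples of the error

    xs = np.linspace(-1.0, 5.0, 601)
    x_plugin = xs[np.argmin(F(xs, h_hat))]         # heuristic: minimize F(x; h_hat)
    avg = np.array([F(x, h_hat + eps).mean() for x in xs])
    x_robust = xs[np.argmin(avg)]                  # robust optimization (C.6)

    print(x_plugin, x_robust)   # 1.0 vs. approx. 1.5: the minimizers differ,
                                # since E[h^2] = h_hat^2 + c_eps for this cost

For an affine cost the two minimizers would coincide, which is exactly the observation made for the CM estimate after (C.5).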

Heuristic Approach: Plug Estimate into Cost Function

As mentioned above, the standard method for dealing with an optimization problem such as (C.1) applies the estimated parameters ĥ as if they were the true parameters (Figure C.1(a)). Thus, if knowledge about the size and structure of the error in ĥ is available, it is neglected.
Consider the Taylor expansion of F(x; ĥ + ε) at ε = E[ε]. Assuming an error ε small enough such that the expansion up to the linear terms

    F(x; ĥ + ε) ≈ F(x; ĥ + E[ε]) + ∂F(x; ĥ + ε)/∂ε^T |_{ε=E[ε]} (ε − E[ε])
                                 + ∂F(x; ĥ + ε)/∂ε^H |_{ε=E[ε]} (ε* − E[ε]*)    (C.8)

is sufficient, the following approximation is valid:

    E_ε[F(x; ĥ + ε)] ≈ F(x; ĥ + E_ε[ε]).    (C.9)

Similarly, for an error covariance matrix C_{h|y} of sufficiently small norm, a Taylor expansion of F(x; h) at h = ĥ_CM can be terminated after the linear term and yields

    E_{h|y}[F(x; h)] ≈ F(x; ĥ_CM).    (C.10)

We conclude that the naive heuristic which uses a parameter estimate as if it were error-free remains valid for small errors. What is considered “small” is determined by the properties of F(x; h) in h, i.e., by the accuracy of the linear approximation (C.8).
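
The range of validity of (C.9) and (C.10) can be checked in the same setting as above: for the hypothetical cost F(x; h) = (x − h²)², the gap between E_ε[F(x; ĥ + ε)] and the plug-in value F(x; ĥ) vanishes as the error variance decreases.

    import numpy as np

    rng = np.random.default_rng(1)

    def F(x, h):
        return (x - h**2)**2        # same hypothetical cost as above

    x, h_hat = 1.0, 1.0
    for c_eps in (1.0, 0.1, 0.01, 0.001):
        eps = rng.normal(0.0, np.sqrt(c_eps), 100_000)
        gap = abs(F(x, h_hat + eps).mean() - F(x, h_hat))
        print(c_eps, gap)           # the gap shrinks with the error variance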

Heuristic Approach: Estimate Functions of Parameters in Solution

Finding a closed-form expression for the expectation of the cost function as in (C.5) and (C.6) can be difficult. Alternatively, we could also estimate the argument of the original optimization problem (C.1),

    E_{h|y}[x_opt(h)],    (C.11)

which is usually not less complex (e.g., [104]). If a numerical approximation is not an option for practical reasons and the cost function or the solution x_opt(h) are cascaded functions in h, we can also estimate a partial function of the cascade: For example, if F(x; h) or x_opt(h) depends on h via a quadratic form, we may estimate the quadratic form directly and plug the result into the cost function or solution, respectively [52]. This will also reduce the estimation error of the cost function.
Appendix D
Detailed Derivations for Precoding with Partial CSI

D.1 Linear Precoding Based on Sum Mean Square Error

For the solution of the optimization problem

    min_{P,β} F(P, β)   s.t.   tr[P C_d P^H] ≤ Ptx    (D.1)

with cost function¹ (non-singular C_d)

    F(P, β) = tr[C_d] + |β|² Ḡ + |β|² tr[C_d P^H R P] − 2 Re(tr[β H_G P C_d])    (D.2)

and effective channel H_G = E_{h|yT}[G(h)H], we form the Lagrange function

    L(P, β, µ) = F(P, β) + µ (tr[P^H P C_d] − Ptx).    (D.3)

¹ For example, it is equivalent to the sum MSE E_{d,n,h|yT}[‖d̂ − d‖₂²] with d̂ = β G(h) H P d + β G(h) n, based on the model in Figure 4.4.

The definitions of the parameters Ḡ and R vary depending on the specific problem and are defined in Chapter 4. The necessary KKT conditions (Appendix A.3) for this non-convex problem are

    ∂L/∂P* = |β|² R P C_d − β* H_G^H C_d + µ P C_d = 0_{M×K}    (D.4)
    ∂L/∂β = β Ḡ + β tr[P^H R P C_d] − tr[P^H H_G^H C_d] = 0    (D.5)
    tr[P^H P C_d] ≤ Ptx    (D.6)
    µ (tr[P^H P C_d] − Ptx) = 0.    (D.7)

As in [116, 108] we first solve (D.5) for β,

    β = tr[P^H H_G^H C_d] / (Ḡ + tr[P^H R P C_d]),    (D.8)

and substitute it in (D.4). The resulting expression is multiplied by P^H from the left. Applying the trace operation we obtain

    (tr[P^H H_G^H C_d])² / (Ḡ + tr[P^H R P C_d])² · tr[P^H R P C_d]
        − (tr[P^H H_G^H C_d])² / (Ḡ + tr[P^H R P C_d]) + µ tr[P^H P C_d] = 0.    (D.9)

Identifying expression (D.8) for β it simplifies to

    |β|² Ḡ = µ tr[P^H P C_d].    (D.10)

Excluding the meaningless solution P = 0_{M×K} and β = 0 for the KKT conditions, we have µ > 0. With tr[P C_d P^H] = Ptx the Lagrange multiplier is given by

    µ = |β|² Ḡ / Ptx.    (D.11)

Application to the first condition (D.4) gives

    P = (β*/|β|²) (R + Ḡ/Ptx · I_M)^{-1} H_G^H.    (D.12)

Assuming β ∈ R the solution to (D.1) reads

    P = β^{-1} (R + Ḡ/Ptx · I_M)^{-1} H_G^H
    β = (tr[H_G (R + Ḡ/Ptx · I_M)^{-2} H_G^H C_d] / Ptx)^{1/2}.    (D.13)

The expression for the minimum of (D.1) can be simplified by factoring out H_G (R + Ḡ/Ptx · I_M)^{-1} to the left and (R + Ḡ/Ptx · I_M)^{-1} H_G^H C_d to the right in the trace of the second, third, and fourth term in (D.2). Finally, min_{P,β} F(P, β) can be written as

    tr[C_d] − tr[H_G (R + Ḡ/Ptx · I_M)^{-1} H_G^H C_d]
        = tr[C_d (H_G (X + Ḡ/Ptx · I_M)^{-1} H_G^H + I_K)^{-1}]    (D.14)

with X = R − H_G^H H_G, where we applied the matrix inversion lemma (A.20) with C^{-1} = X + Ḡ/Ptx · I_M and B = H_G in the second step.
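
The closed-form solution (D.13) is straightforward to evaluate numerically. The following minimal Python sketch is not from the book: it uses randomly generated placeholder values for H_G, R, Ḡ, and C_d (all hypothetical) and only verifies that the precoder (D.13) meets the transmit power constraint with equality.

    import numpy as np

    rng = np.random.default_rng(0)
    M, K, Ptx = 4, 2, 1.0

    # Hypothetical placeholder parameters (not from the book's scenarios):
    H_G = (rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))) / np.sqrt(2)
    A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
    R = H_G.conj().T @ H_G + A @ A.conj().T / M   # ensures X = R - H_G^H H_G >= 0
    G_bar, C_d = float(K), np.eye(K)              # scalar G-bar, unit-variance data

    Minv = np.linalg.inv(R + G_bar / Ptx * np.eye(M))
    # Scaling beta and precoder P according to (D.13):
    beta = np.sqrt(np.trace(H_G @ Minv @ Minv @ H_G.conj().T @ C_d).real / Ptx)
    P = Minv @ H_G.conj().T / beta

    # The transmit power constraint tr[P C_d P^H] = Ptx is active:
    print(np.trace(P @ C_d @ P.conj().T).real)    # -> 1.0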

D.2 Conditional Mean for Phase Compensation at the Receiver

For the receiver model from Section 4.3.3,

    g_k(h_k) = (h_k^T q_s)* / |h_k^T q_s| · β,    (D.15)

we derive an explicit expression of

    E_{hk|yT}[g_k(h_k) h_k] = β E_{hk|yT}[ (h_k^T q_s)*/|h_k^T q_s| · h_k ].    (D.16)

Defining the random variable z_k = h_k^T q_s we can rewrite it using the properties of the conditional expectation:

    E_{hk|yT}[ (h_k^T q_s)*/|h_k^T q_s| · h_k ] = E_{hk,zk|yT}[ (z_k*/|z_k|) h_k ] = E_{zk|yT}[ (z_k*/|z_k|) E_{hk|zk,yT}[h_k] ].    (D.17)

Because p_{hk|zk,yT}(h_k|z_k, y_T) is complex Gaussian, the conditional mean E_{hk|zk,yT}[h_k] is equivalent to the (affine) LMMSE estimator

    E_{hk|zk,yT}[h_k] = E_{hk|yT}[h_k] + c_{hk zk|yT} c_{zk|yT}^{-1} (z_k − E_{zk|yT}[z_k])    (D.18)

with covariances

    c_{hk zk|yT} = E_{hk,zk|yT}[ (h_k − E_{hk|yT}[h_k]) (z_k − E_{zk|yT}[z_k])* ] = C_{hk|yT} q_s*    (D.19)
    c_{zk|yT} = E_{zk|yT}[ |z_k − E_{zk|yT}[z_k]|² ] = q_s^H C_{hk|yT}* q_s.    (D.20)

The first order moment in (D.18) is

    µ_{zk|yT} = E_{zk|yT}[z_k] = ĥ_k^T q_s    (D.21)

with ĥ_k = E_{hk|yT}[h_k]. The estimator (D.18) has to be understood as the estimator of h_k given z_k under the “a priori distribution” p_{hk|yT}(h_k|y_T).
Defining g_k(z_k) = z_k*/|z_k| and applying (D.18) to (D.17) we obtain

    E_{hk,zk|yT}[g_k(z_k) h_k] = E_{zk|yT}[g_k(z_k)] E_{hk|yT}[h_k]
        + c_{hk zk|yT} c_{zk|yT}^{-1} (E_{zk|yT}[|z_k|] − E_{zk|yT}[g_k(z_k)] E_{zk|yT}[z_k]).    (D.22)

With [148] the remaining terms, i.e., the CM of the receivers’ model g_k(z_k) and the magnitude |z_k|, are given as

    ĝ_k = E_{zk|yT}[g_k(z_k)]
        = (√π/2) (|µ_{zk|yT}| / c_{zk|yT}^{1/2}) (µ_{zk|yT}* / |µ_{zk|yT}|) 1F1(1/2, 2, −|µ_{zk|yT}|²/c_{zk|yT}),    (D.23)

where 1F1(α, β, z) is the confluent hypergeometric function [2], and

    E_{zk|yT}[|z_k|] = (√π/2) c_{zk|yT}^{1/2} 1F1(−1/2, 1, −|µ_{zk|yT}|²/c_{zk|yT}).    (D.24)

In special cases, the confluent hypergeometric functions can be written in terms of the modified Bessel functions of the first kind of order zero, I_0(z), and of first order, I_1(z) [147]:

    1F1(1/2, 2, −z) = exp(−z/2) (I_0(z/2) + I_1(z/2))    (D.25)
    1F1(−1/2, 1, −z) = exp(−z/2) ((1 + z) I_0(z/2) + z I_1(z/2)).    (D.26)

Asymptotically, the modified Bessel functions are approximated by [2]

    I_n(z) = (1/√(2πz)) exp(z) (1 + O(1/z)),   z → ∞.    (D.27)
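
The expressions (D.23)–(D.26) are directly computable with standard special-function routines. Below is a minimal Python sketch with hypothetical values for µ_{zk|yT} and c_{zk|yT}; the last lines verify the closed forms (D.25) and (D.26) numerically.

    import numpy as np
    from scipy.special import hyp1f1, i0, i1

    # Hypothetical values for the conditional mean/variance of z_k:
    mu, c = 0.8 + 0.6j, 0.5
    z = abs(mu)**2 / c

    # CM of the receiver model and of |z_k|, eqs. (D.23) and (D.24):
    g_hat = (np.sqrt(np.pi)/2 * abs(mu)/np.sqrt(c)
             * mu.conjugate()/abs(mu) * hyp1f1(0.5, 2, -z))
    E_abs_z = np.sqrt(np.pi)/2 * np.sqrt(c) * hyp1f1(-0.5, 1, -z)
    print(g_hat, E_abs_z)

    # Numerical check of the Bessel-function forms (D.25) and (D.26):
    print(np.isclose(hyp1f1(0.5, 2, -z),
                     np.exp(-z/2) * (i0(z/2) + i1(z/2))),
          np.isclose(hyp1f1(-0.5, 1, -z),
                     np.exp(-z/2) * ((1 + z)*i0(z/2) + z*i1(z/2))))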

D.3 An Explicit Solution for Linear Precoding with Statistical Channel State Information

We assume S-CSI at the transmitter with E[h] = 0 and cascaded receivers with a perfect phase compensation (knowing h_k^T p_k) in the first stage and a common scaling β (gain control), which is based on the same CSI as available at the transmitter.

Minimization of the CM estimate of the corresponding average sum MSE (4.24) reads

    min_{P,β} Σ_{k=1}^K E_{hk}[ MSE_k(P, β (h_k^T p_k)*/|h_k^T p_k|; h_k) ]   s.t.   ‖P‖_F² ≤ Ptx.    (D.28)

Its explicit solution is derived in the sequel [25]. We write the cost function explicitly:

    E_{hk}[ MSE_k(P, β (h_k^T p_k)*/|h_k^T p_k|; h_k) ] = 1 + |β|² Σ_{i=1}^K p_i^H C_{hk}* p_i + |β|² c_{nk}
        − 2 Re( β E_{hk}[ (h_k^T p_k)*/|h_k^T p_k| · h_k^T p_k ] ).    (D.29)

With (D.24) the cross-correlation term is (with I_0(0) = 1 and I_1(0) = 0)

    E_{hk}[ (h_k^T p_k)*/|h_k^T p_k| · h_k^T p_k ] = E_{hk}[ |h_k^T p_k| ] = (√π/2) (p_k^H C_{hk}* p_k)^{1/2}.    (D.30)

The Lagrange function for problem (D.28) is

    L(P, β, µ) = K + |β|² Σ_{k=1}^K p_k^H E_h[H^H H] p_k + |β|² Σ_{k=1}^K c_{nk}
        − √π Re(β) Σ_{k=1}^K (p_k^H C_{hk}* p_k)^{1/2} + µ (Σ_{k=1}^K ‖p_k‖₂² − Ptx).    (D.31)

The necessary KKT conditions are

    ∂L(P, β, µ)/∂p_k* = |β|² E_h[H^H H] p_k − (√π/2) Re(β) (p_k^H C_{hk}* p_k)^{-1/2} C_{hk}* p_k + µ p_k = 0_M    (D.32)
    ∂L(P, β, µ)/∂β = β Σ_{i=1}^K p_i^H E_h[H^H H] p_i + β Σ_{k=1}^K c_{nk} − (√π/2) Σ_{k=1}^K (p_k^H C_{hk}* p_k)^{1/2} = 0    (D.33)
    tr[P^H P] ≤ Ptx    (D.34)
    µ (tr[P^H P] − Ptx) = 0.    (D.35)

Proceeding as in Appendix D.1 we obtain

    ξ = |β|^{-2} µ = (Σ_{k=1}^K c_{nk}) / Ptx.    (D.36)

The first condition (D.32) yields

    (E_h[H^H H] + ξ I_M) p_k = β^{-1} (√π/2) (p_k^H C_{hk}* p_k)^{-1/2} C_{hk}* p_k,    (D.37)

which is a generalized eigenvalue problem. We normalize the precoding vector

    p_k = α_k p̃_k    (D.38)

such that ‖p̃_k‖₂ = 1 and α_k ∈ R_{+,0}. For convenience, we transform the generalized eigenvalue problem to a standard eigenvalue problem²

    (E_h[H^H H] + ξ I_M)^{-1} C_{hk}* p̃_k = α_k β (2/√π) (p̃_k^H C_{hk}* p̃_k)^{1/2} p̃_k.    (D.39)

The eigenvalue corresponding to the eigenvector p̃_k is defined by

    σ_k = α_k β (2/√π) (p̃_k^H C_{hk}* p̃_k)^{1/2},    (D.40)

which allows for the computation of the power allocation

    α_k = σ_k β^{-1} (√π/2) (p̃_k^H C_{hk}* p̃_k)^{-1/2}.    (D.41)
The scaling in P by

    β = (√π/2) ( Σ_{k=1}^K σ_k² (p̃_k^H C_{hk}* p̃_k)^{-1} / Ptx )^{1/2}    (D.42)

results from the total transmit power Ptx = Σ_{k=1}^K α_k².
To derive the minimum MSE we apply p_k = α_k p̃_k, (D.40), and the solution for β to the cost function (D.28). Additionally, we use the definition C_{hk}* p̃_k = σ_k (E_h[H^H H] + ξ I_M) p̃_k of σ_k (D.39), which yields σ_k = p̃_k^H C_{hk}* p̃_k / (p̃_k^H (E_h[H^H H] + ξ I_M) p̃_k), and can express the minimum in terms of σ_k:

    Σ_{k=1}^K E_{hk}[ MSE_k(P, β (h_k^T p_k)*/|h_k^T p_k|; h_k) ] = K − (π/4) Σ_{k=1}^K σ_k.    (D.43)

² Although the direct solution of the generalized eigenvalue problem is numerically more efficient.

Therefore, choosing p̃_k as the eigenvector corresponding to the largest eigenvalue σ_k yields the smallest MSE.
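
A minimal numerical sketch of this solution (Python; the covariances C_{hk} are random placeholders, not the scenarios of Appendix E): each p̃_k is taken as the principal eigenvector in (D.39), σ_k is the corresponding eigenvalue obtained via the Rayleigh quotient above, and the minimum sum MSE follows from (D.43).

    import numpy as np

    rng = np.random.default_rng(1)
    M, K, Ptx = 4, 2, 1.0
    cn = np.full(K, 0.1)                      # noise variances (placeholder)

    # Hypothetical spatial covariances C_hk with tr[C_hk] = M:
    C_h = []
    for _ in range(K):
        A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
        C = A @ A.conj().T
        C_h.append(M * C / np.trace(C).real)
    # E_h[H^H H] = sum_k C_hk^* (rows of H are h_k^T; users assumed uncorrelated):
    EHH = sum(C.conj() for C in C_h)
    xi = cn.sum() / Ptx                       # (D.36)

    sigma = []
    for k in range(K):
        B = np.linalg.solve(EHH + xi * np.eye(M), C_h[k].conj())
        w, V = np.linalg.eig(B)
        v = V[:, np.argmax(w.real)]           # principal eigenvector -> p_tilde_k
        v /= np.linalg.norm(v)
        num = (v.conj() @ C_h[k].conj() @ v).real
        den = (v.conj() @ (EHH + xi * np.eye(M)) @ v).real
        sigma.append(num / den)               # sigma_k via the Rayleigh quotient

    print(K - np.pi / 4 * sum(sigma))         # minimum sum MSE (D.43)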

D.4 Proof of BC-MAC Duality for AWGN BC Model

We complete the proof in Section 4.5.1 and show the converse: For a given dual MAC model, the same performance can be achieved in the BC with the same total transmit power. Using the transformations (4.131) to (4.133), the left hand side of (4.127) reads (cf. (4.49))

    E_{hk|yT}[COR_k(P, g_k(h_k); h_k)] = 1 + E_{hk^mac|yT}[|g_k^mac(h_k^mac)|²] (c_{nk} + c_{nk} Σ_{i=1}^K ξ_i² u_i^{mac,T} R_{hk^mac|yT} u_i^{mac,*})
        − 2 E_{hk^mac|yT}[ Re( g_k^mac(h_k^mac) h_k^{mac,T} u_k^mac p_k^mac ) ].    (D.44)

With (4.129), (4.127) results in

    E_{hk^mac|yT}[|g_k^mac(h_k^mac)|²] ξ_k^{-2} p_k^{mac,2} (1 + Σ_{i=1}^K ξ_i² u_i^{mac,T} R_{hk^mac|yT} u_i^{mac,*})
        = E_{hk^mac|yT}[|g_k^mac(h_k^mac)|²] (‖u_k^mac‖₂² + Σ_{i=1}^K p_i^{mac,2} u_k^{mac,T} R_{hi^mac|yT} u_k^{mac,*}),    (D.45)

which simplifies to

    p_k^{mac,2} = ξ_k² (‖u_k^mac‖₂² + Σ_{i=1}^K p_i^{mac,2} u_k^{mac,H} R_{hi^mac|yT}* u_k^mac)
        − p_k^{mac,2} Σ_{i=1}^K ξ_i² u_i^{mac,H} R_{hk^mac|yT}* u_i^mac.    (D.46)
Defining l^mac = [p_1^{mac,2}, p_2^{mac,2}, . . . , p_K^{mac,2}]^T and

    [W^mac]_{u,v} = −p_u^{mac,2} u_v^{mac,H} R_{hu^mac|yT}* u_v^mac   for v ≠ u,
    [W^mac]_{v,v} = ‖u_v^mac‖₂² + Σ_{i=1, i≠v}^K p_i^{mac,2} u_v^{mac,H} R_{hi^mac|yT}* u_v^mac,    (D.47)
we obtain the system of equations

    W^mac ξ^mac = l^mac    (D.48)

in ξ^mac = [ξ_1², ξ_2², . . . , ξ_K²]^T.
If ‖u_v^mac‖₂² > 0 for all v, W^mac is (column) diagonally dominant because of

    [W^mac]_{v,v} > Σ_{u≠v} |[W^mac]_{u,v}| = Σ_{u≠v} p_u^{mac,2} u_v^{mac,H} R_{hu^mac|yT}* u_v^mac.    (D.49)

It has the same properties as W in (4.136).
Identical performance is achieved in the BC with the same total transmit power as in the dual MAC:

    Σ_{k=1}^K p_k^{mac,2} = ‖l^mac‖₁ = Σ_{k=1}^K ξ_k² ‖u_k^mac‖₂² = Σ_{k=1}^K ‖p_k‖₂².    (D.50)

This concludes the “only-if” part of the proof for Theorem 4.1.
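
The role of diagonal dominance in (D.47)–(D.49) can be illustrated numerically. The Python sketch below uses hypothetical positive placeholder values for the dual MAC powers and the coupling terms u_v^{mac,H} R* u_v^mac, builds W^mac as in (D.47), solves (D.48), and confirms that the solution ξ^mac is entrywise positive; the same construction applies to the system W ξ = l in Appendix D.5.

    import numpy as np

    rng = np.random.default_rng(2)
    K = 3

    # Hypothetical placeholders for the correlation terms in (D.47):
    c = rng.uniform(0.05, 0.2, size=(K, K))   # coupling u_v^H R_{h_u} u_v > 0
    p2 = rng.uniform(0.5, 1.5, size=K)        # dual MAC powers p_k^{mac,2}
    u2 = rng.uniform(0.8, 1.2, size=K)        # squared norms ||u_k^mac||^2 > 0

    W = -p2[:, None] * c                      # off-diagonal entries of (D.47)
    for v in range(K):
        W[v, v] = u2[v] + sum(p2[i] * c[i, v] for i in range(K) if i != v)

    xi2 = np.linalg.solve(W, p2)              # system (D.48) with l^mac = p2

    # Column diagonal dominance (D.49) and positivity of xi^mac:
    print(all(W[v, v] > sum(abs(W[u, v]) for u in range(K) if u != v)
              for v in range(K)))
    print(xi2)                                # all entries positive

By construction the column sums of the off-diagonal magnitudes fall short of the diagonal by exactly ‖u_v^mac‖₂² > 0, which is what the first check prints.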

D.5 Proof of BC-MAC Duality for Incomplete CSI at Receivers

To prove Theorem 4.2 we rewrite (4.141) using the proposed transformations (4.145):

    |β_k|² E_{hk|yT}[|g_k(h_k)|²] c_{nk} + |β_k|² Σ_{i=1}^K p_i^H E_{hk|yT}[h_k* h_k^T |g_k(h_k)|²] p_i
        = ‖p_k‖₂² ξ_k^{-2} + Σ_{i=1}^K |β_i|² ξ_i² ξ_k^{-2} p_k^H E_{hi|yT}[h_i* h_i^T |g_i(h_i)|²] p_k.    (D.51)

It can be simplified to

    ‖p_k‖₂² = ξ_k² |β_k|² (E_{hk|yT}[|g_k(h_k)|²] c_{nk} + Σ_{i=1}^K p_i^H E_{hk|yT}[h_k* h_k^T |g_k(h_k)|²] p_i)
        − Σ_{i=1}^K |β_i|² ξ_i² p_k^H E_{hi|yT}[h_i* h_i^T |g_i(h_i)|²] p_k,    (D.52)

which yields the system of linear equations

W ξ = l. (D.53)

We made the following definitions:

    l = [‖p_1‖₂², ‖p_2‖₂², . . . , ‖p_K‖₂²]^T    (D.54)

    [W]_{u,v} = −|β_v|² p_u^H E_{hv|yT}[h_v* h_v^T |g_v(h_v)|²] p_u   for u ≠ v,
    [W]_{v,v} = |β_v|² E_{hv|yT}[c_{rv}] − |β_v|² p_v^H E_{hv|yT}[h_v* h_v^T |g_v(h_v)|²] p_v,    (D.55)

    E_{hv|yT}[c_{rv}] = E_{hv|yT}[|g_v(h_v)|²] c_{nv} + Σ_{i=1}^K p_i^H E_{hv|yT}[h_v* h_v^T |g_v(h_v)|²] p_i.    (D.56)
If c_{nk} > 0 and E_{hk|yT}[|g_k(h_k)|²] > 0 for all k, W is (column) diagonally dominant:

    [W]_{v,v} > Σ_{u≠v} |[W]_{u,v}| = Σ_{u≠v} |β_v|² p_u^H E_{hv|yT}[h_v* h_v^T |g_v(h_v)|²] p_u.    (D.57)

Hence, it is always invertible. Because W is positive on its diagonal and non-positive elsewhere, its inverse has only non-negative entries [21], and ξ is always positive.
The sum of all equations in (D.53) can be simplified using (4.145), and the total transmit power in the BC turns out to be the same as in the MAC:

    Σ_{k=1}^K ‖p_k‖₂² = Σ_{k=1}^K (1 + |β_k|² c_{nk}) ξ_k² E_{hk|yT}[|g_k(h_k)|²] = Σ_{k=1}^K p_k^{mac,2}.    (D.58)

To prove the “only if” part of Theorem 4.2, we rewrite the left hand side of (4.141) with the transformations (4.145):

    E_{hk|yT}[MSE_k(P, β_k g_k(h_k); h_k)] =
        p_k^{mac,2} ξ_k^{-2} + Σ_{i=1}^K ξ_i² ξ_k^{-2} p_k^{mac,2} g_i^{mac,H} E_{hk^mac|yT}[h_k^{mac,*} h_k^{mac,T}] g_i^mac.    (D.59)

Together with (4.141) this yields

    p_k^{mac,2} = ξ_k² (‖g_k^mac‖₂² + Σ_{i=1}^K p_i^{mac,2} g_k^{mac,H} E_{hi^mac|yT}[h_i^{mac,*} h_i^{mac,T}] g_k^mac)
        − p_k^{mac,2} Σ_{i=1}^K ξ_i² g_i^{mac,H} E_{hk^mac|yT}[h_k^{mac,*} h_k^{mac,T}] g_i^mac.    (D.60)

This system of linear equations can be written in matrix notation W^mac ξ^mac = l^mac with

    [W^mac]_{u,v} = −p_u^{mac,2} g_v^{mac,H} E_{hu^mac|yT}[h_u^{mac,*} h_u^{mac,T}] g_v^mac   for u ≠ v,
    [W^mac]_{v,v} = c_v − p_v^{mac,2} g_v^{mac,H} E_{hv^mac|yT}[h_v^{mac,*} h_v^{mac,T}] g_v^mac,    (D.61)

the variance

    c_v = ‖g_v^mac‖₂² + Σ_{i=1}^K p_i^{mac,2} g_v^{mac,H} E_{hi^mac|yT}[h_i^{mac,*} h_i^{mac,T}] g_v^mac

of the output of g_v^mac, and l^mac = [p_1^{mac,2}, p_2^{mac,2}, . . . , p_K^{mac,2}]^T. If ‖g_k^mac‖₂² > 0 for all k, W^mac is (column) diagonally dominant due to

    [W^mac]_{v,v} > Σ_{u≠v} |[W^mac]_{u,v}| = Σ_{u≠v} p_u^{mac,2} g_v^{mac,H} E_{hu^mac|yT}[h_u^{mac,*} h_u^{mac,T}] g_v^mac.    (D.62)

With the same reasoning as for the transformation from BC to MAC, we conclude that identical performance can be achieved in the BC as in the MAC with the same total transmit power.

D.6 Tomlinson-Harashima Precoding with Sum Performance Measures

For optimization of THP in Section 5.3 by alternating between transmitter and receiver models, we consider the cost function

    F_THP(P, F, β, O) = |β|² Ḡ + |β|² tr[P^H X P C_w] + E_w[ ‖(β Π^(O) H_G P − (I_K − F)) w[n]‖₂² ]    (D.63)

with X = R − H_G^H H_G. The definitions of Ḡ, R, and H_G depend on the specific problem addressed in Section 5.3. We would like to minimize (D.63) subject to a total transmit power constraint tr[P C_w P^H] ≤ Ptx and the constraint on F to be lower triangular with zero diagonal.
First, we minimize (D.63) w.r.t. F. Only the third term of (D.63) depends on F and can be simplified to

    Σ_{k=1}^K ‖β Π^(O) H_G p_k − e_k + f_k‖₂² c_{wk}    (D.64)
because of (5.26). Taking into account the constraint on its structure, the solution F = [f_1, . . . , f_K] reads

    f_k = −β [ 0_{k×M} ; B_k^(O) ] p_k,    (D.65)
where we require the partitioning of

    Π^(O) H_G = [ A_k^(O) ; B_k^(O) ]    (D.66)

with A_k^(O) ∈ C^{k×M} and B_k^(O) ∈ C^{(K−k)×M}.
For minimization of (D.63) w.r.t. P, we construct the Lagrange function

    L(P, F, β, O, µ) = F_THP(P, F, β, O) + µ (tr[P C_w P^H] − Ptx)    (D.67)

to include the transmit power constraint. Identifying C_d = C_w and E_{h|yT}[G(h)H] = (I_K − F)^H Π^(O) H_G in (D.2), its solution is given by (D.4):

    P = β^{-1} (R + Ḡ/Ptx · I_M)^{-1} H_G^H Π^{(O),T} (I_K − F).    (D.68)

Plugging (D.65) into the kth column of (D.68) we get

    p_k = β^{-1} (R + Ḡ/Ptx · I_M)^{-1} H_G^H Π^{(O),T} ( e_k + β [ 0_{k×M} ; B_k^(O) ] p_k ),    (D.69)

which can be solved for the kth column of P:

    p_k = β^{-1} (X + A_k^{(O),H} A_k^(O) + Ḡ/Ptx · I_M)^{-1} A_k^{(O),H} e_k.    (D.70)

The scaling β is chosen to satisfy the transmit power constraint tr[P C_w P^H] = Ptx with equality:

    β² = ( Σ_{k=1}^K c_{wk} e_k^T A_k^(O) (X + A_k^{(O),H} A_k^(O) + Ḡ/Ptx · I_M)^{-2} A_k^{(O),H} e_k ) / Ptx.    (D.71)

For X = 0_M and high SNR, the matrix inversion lemma A.2.2 has to be applied to (D.70) and (D.71) to ensure existence of the solution.
It remains to find the precoding order O. Applying (D.65), (D.70), and (D.71) to (D.63), we can simplify the expression for its minimum to

    F_THP(O) = tr[C_w] − Σ_{k=1}^K c_{wk} e_k^T A_k^(O) (X + A_k^{(O),H} A_k^(O) + Ḡ/Ptx · I_M)^{-1} A_k^{(O),H} e_k.    (D.72)

In Section 5.3.1 a suboptimum precoding order is derived.
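
For a fixed precoding order, (D.65), (D.70), and (D.71) are direct to evaluate. This minimal Python sketch uses only hypothetical placeholders (identity ordering, X = 0.1 I_M as a stand-in for R − H_G^H H_G, C_w = I_K, and a random effective channel); it computes β, P, and the strictly lower triangular F, and checks that tr[P C_w P^H] = Ptx.

    import numpy as np

    rng = np.random.default_rng(3)
    M, K, Ptx = 4, 3, 1.0

    # Hypothetical placeholders (identity precoding order O assumed):
    HG = (rng.normal(size=(K, M)) + 1j * rng.normal(size=(K, M))) / np.sqrt(2)
    X = 0.1 * np.eye(M)                       # stand-in for R - H_G^H H_G (psd)
    G_bar, cw = 1.0, np.ones(K)               # scalar G-bar, C_w = I_K

    def inv_k(k):
        A_k = HG[:k+1, :]                     # A_k^(O): first rows of Pi^(O) H_G
        return A_k, np.linalg.inv(X + A_k.conj().T @ A_k + G_bar/Ptx*np.eye(M))

    # Scaling beta from the power constraint (D.71):
    s = 0.0
    for k in range(K):
        A_k, Mi = inv_k(k)
        a = Mi @ A_k.conj().T[:, -1]          # (...)^{-1} A_k^H e_k, Mi Hermitian
        s += cw[k] * (a.conj() @ a).real      # e_k^T A_k (...)^{-2} A_k^H e_k
    beta = np.sqrt(s / Ptx)

    P = np.zeros((M, K), dtype=complex)
    F = np.zeros((K, K), dtype=complex)
    for k in range(K):
        A_k, Mi = inv_k(k)
        P[:, k] = Mi @ A_k.conj().T[:, -1] / beta       # columns of P, (D.70)
        F[k+1:, k] = -beta * (HG[k+1:, :] @ P[:, k])    # (D.65): strictly lower

    print(np.trace(P @ P.conj().T).real)                # tr[P C_w P^H] = Ptx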


Appendix E
Channel Scenarios for Performance Evaluation

Fig. E.1 Uniform linear array with spacing η and mean azimuth direction ϕ̄k of
propagation channel to terminal k (with angular spread).

For the evaluation of the algorithms presented in this book, the following model for the wireless channel is employed: The Q samples of the P = MK(L+1) channel parameters h_T[q] ∈ C^P (2.8) are assumed to form a zero-mean, stationary, complex Gaussian random sequence with h_T ∼ N_c(0_{PQ}, C_{hT}). They are generated based on the Karhunen-Loève expansion, i.e., h_T = C_{hT}^{1/2} ζ with ζ ∼ N_c(0, I_{PQ}). The structure of C_{hT} is exploited to reduce the computational complexity.
We assume identical temporal correlations for every element in h[q] (see Section 2.1 for details), which results in C_{hT} = C_T ⊗ C_h (2.12) with C_T ∈ T^Q_{+,0} and C_h ∈ S^P_{+,0}.
If not defined otherwise, Clarke's model [206, p. 40] is chosen for the temporal correlations c_h[ℓ] = c_h[0] J_0(2π f_max ℓ).¹ Its power spectral density is shown in Figure 2.12. To be consistent with the definition of C_h = E[h[q] h[q]^H] and to obtain a unique Kronecker model, the first row of the Toeplitz covariance matrix C_T in the Kronecker model is [c_h[0], . . . , c_h[Q−1]]^T / c_h[0].

¹ J_0(·) is the Bessel function of the first kind and of order zero.
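
A sketch of this generator in Python (the spatial covariance here is a random placeholder; the covariance actually used in the simulations is constructed below): the Kronecker structure is exploited via C_{hT}^{1/2} = C_T^{1/2} ⊗ C_h^{1/2}, so a Q × P sample matrix can be drawn without ever forming the PQ × PQ covariance.

    import numpy as np
    from scipy.special import j0
    from scipy.linalg import sqrtm, toeplitz

    rng = np.random.default_rng(4)
    Q, P_dim, f_max = 50, 8, 0.05    # samples, channel dimension, max. Doppler

    # Clarke's model: c_h[l] = J0(2*pi*f_max*l), normalized so that c_h[0] = 1.
    C_T = toeplitz(j0(2 * np.pi * f_max * np.arange(Q)))

    # Placeholder spatial covariance with tr[C_h] = P (hypothetical):
    A = rng.normal(size=(P_dim, P_dim)) + 1j * rng.normal(size=(P_dim, P_dim))
    C_h = A @ A.conj().T
    C_h *= P_dim / np.trace(C_h).real

    # Karhunen-Loeve expansion exploiting the Kronecker structure: each row of
    # h is one channel sample with spatial covariance C_h, and the rows are
    # temporally correlated according to C_T.
    zeta = (rng.normal(size=(Q, P_dim)) + 1j * rng.normal(size=(Q, P_dim))) / np.sqrt(2)
    h = sqrtm(C_T) @ zeta @ sqrtm(C_h).T
    print(h.shape)                   # (Q, P): Q correlated channel realizations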


For frequency flat channels (L = 0), we consider spatially well separated terminals with uncorrelated channels, i.e., C_h is block diagonal with C_{hk} ∈ S^M_{+,0}², k ∈ {1, 2, . . . , K}, on the diagonal. To model the spatial correlations C_{hk} we choose a uniform linear array with M elements which have an omnidirectional characteristic in azimuth and equal spacing η (Figure E.1).³ If the narrowband assumption [125] holds, i.e., the propagation time over the array aperture is significantly smaller than the inverse of the signal bandwidth, the spatial characteristic depends only on the carrier wavelength λ_c. Mathematically, it is described by the array steering vector

    a(ϕ) = [1, exp(−j2πη sin(ϕ)/λ_c), . . . , exp(−j2πη(M−1) sin(ϕ)/λ_c)]^T ∈ C^M,

which contains the phase shifts resulting from the array geometry. A common value of the interelement spacing is η = λ_c/2. The azimuth angle ϕ is defined w.r.t. the perpendicular on the array's orientation (Figure E.1).
Thus, the spatial covariance matrix is modeled as

    C_{hk} = c_k ∫_{−π/2}^{π/2} a(ϕ) a(ϕ)^H p_ϕ(ϕ + ϕ̄_k) dϕ    (E.1)

with mean azimuth direction ϕ̄_k and tr[C_{hk}] = M. A widely used choice for the probability density p_ϕ(ϕ + ϕ̄_k) describing the azimuth spread is a truncated Laplace distribution

    p_ϕ(ϕ) ∝ exp(−√2 |ϕ|/σ),   |ϕ| ≤ π/2,    (E.2)

where σ is the standard deviation, e.g., σ = 5°, of the Laplace distribution without truncation [166].⁴ The mean directions for all terminals are summarized in ϕ̄ = [ϕ̄_1, . . . , ϕ̄_K]^T, whereas σ and the mean channel attenuation c_k = 1 (normalized) are equal for all terminals for simplicity. We evaluate the integral (E.1) numerically by Monte-Carlo integration

    C_{hk} ≈ (c_k/W) Σ_{w=1}^W a(ϕ_w + ϕ̄_k) a(ϕ_w + ϕ̄_k)^H    (E.3)

with W = 100 pseudo-randomly selected ϕ_w according to p_ϕ(ϕ) and ϕ_{k,w} = ϕ_w + ϕ̄_k.
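
In code, the Monte-Carlo evaluation (E.3) is only a few lines. The sketch below (Python) draws the ϕ_w from the truncated Laplace density (E.2) by simple rejection sampling (one possible choice; the text does not specify how the ϕ_w are generated) and confirms tr[C_hk] = M.

    import numpy as np

    rng = np.random.default_rng(5)
    M, W = 8, 100
    eta_over_lc = 0.5                        # interelement spacing eta = lambda_c/2
    phi_bar = np.deg2rad(20.0)               # mean azimuth (hypothetical value)
    sigma = np.deg2rad(5.0)                  # Laplace standard deviation

    def a(phi):
        # Array steering vector of the uniform linear array.
        return np.exp(-2j * np.pi * eta_over_lc * np.arange(M) * np.sin(phi))

    # Rejection sampling from the truncated Laplace density (E.2):
    phi_w = []
    while len(phi_w) < W:
        p = rng.laplace(0.0, sigma / np.sqrt(2))  # scale b = sigma/sqrt(2)
        if abs(p) <= np.pi / 2:
            phi_w.append(p)

    # Monte-Carlo integration (E.3) with c_k = 1:
    C_hk = sum(np.outer(a(p + phi_bar), a(p + phi_bar).conj()) for p in phi_w) / W
    print(np.trace(C_hk).real)               # = M, since ||a(phi)||_2^2 = M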
Only in Section 2.2.4 we consider a frequency selective channel of order L = 4 for K = 1: C_h is block diagonal (2.7) with C_h(ℓ) ∈ S^M_{+,0}, ℓ ∈ {0, 1, . . . , L}, on its diagonal due to uncorrelated scattering [14]. The constant diagonal of C_h(ℓ) is the variance c(ℓ) (power delay spectrum), for which the standard model is c(ℓ) ∝ exp(−ℓ/τ) with Σ_{ℓ=0}^L c(ℓ) = 1. For a symbol period T_s = 1 µs, τ = 1 is a typical value [166]. Otherwise, C_h(ℓ) is determined as in (E.1).

² The model is tailored to the broadcast channel considered in Chapters 4 and 5.
³ We restrict the model to be planar.
⁴ See [66, 240, 73] for an overview of other models and [20] for an interesting discussion regarding the relevance of the Laplace distribution.
The noise variances at the terminals are equal for simplicity: cnk = cn .
Appendix F
List of Abbreviations

AWGN Additive White Gaussian Noise
BC Broadcast Channel
BER Bit-Error Rate
C-CSI Complete Channel State Information
CM Conditional Mean
CSI Channel State Information
DPSS Discrete Prolate Spheroidal Sequences
ECME Expectation-Conditional Maximization Either
EM Expectation Maximization
EVD EigenValue Decomposition
EXIP EXtended Invariance Principle
FDD Frequency Division Duplex
FIR Finite Impulse Response
HDS Hidden Data Space
KKT Karush-Kuhn-Tucker
LMMSE Linear Minimum Mean Square Error
LS Least-Squares
MAC Multiple Access Channel
MF Matched Filter
MIMO Multiple-Input Multiple-Output
ML Maximum Likelihood
MSE Mean Square Error
MMSE Minimum Mean Square Error
O-CSI Outdated Channel State Information
pdf probability density function
P-CSI Partial Channel State Information
psd power spectral density
RML Reduced-Rank Maximum Likelihood
Rx Receiver


SAGE Space-Alternating Generalized Expectation Maximization
S-CSI Statistical Channel State Information
SDMA Space Division Multiple Access
SISO Single-Input Single-Output
SINR Signal-to-Interference-plus-Noise Ratio
SIR Signal-to-Interference Ratio
SNR Signal-to-Noise Ratio
s.t. subject to
TDD Time Division Duplex
TDMA Time Division Multiple Access
TP ThroughPut
THP Tomlinson-Harashima Precoding
VP Vector Precoding
UMTS Universal Mobile Telecommunications System
w.r.t. with respect to
References

1. 3GPP: Universal Mobile Telecommunications System (UMTS); Physical Channels and Mapping of Transport Channels onto Physical Channels (FDD). 3GPP TS 25.211, Version 6.6.0, Release 6 (2005). (Online: http://www.etsi.org/)
2. Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with For-
mulas, Graphs, and Mathematical Tables, 9th edn. Dover, New York (1964)
3. Alkire, B., Vandenberghe, L.: Convex Optimization Problems Involving Finite
Autocorrelation Sequences. Mathematical Programming, Series A 93(3), 331–
359 (2002)
4. Anderson, T.W.: Statistical Inference for Covariance Matrices with Linear Struc-
ture. In: Proc. of the Second International Symposium on Multivariate Analysis,
vol. 2, pp. 55–66 (1968)
5. Anderson, T.W.: Estimation of Covariance Matrices which are Linear Combina-
tions or whose Inverses are Linear Combinations of Given Matrices. In: Essays
in in Probability and Statistics, pp. 1–24. The University of North Carolina
Press (1970)
6. Arun, K.S., Potter, L.C.: Existence and Uniqueness of Band-Limited, Positive
Semidefinite Extrapolations. IEEE Trans. Acoust., Speech, Signal Processing
48(3), 547–549 (1990)
7. Assir, R.E., Dietrich, F.A., Joham, M.: MSE Precoding for Partial Channel
State Information Based on MAC-BC Duality. Tech. Rep. TUM-LNS-TR-05-
08, Technische Universität München (2005)
8. Assir, R.E., Dietrich, F.A., Joham, M., Utschick, W.: Min-Max MSE Precoding
for Broadcast Channels based on Statistical Channel State Information. In:
Proc. of the IEEE 7th Workshop on Signal Processing Advances in Wireless
Communications. Cannes, France (2006)
9. Baddour, K.E., Beaulieu, N.C.: Improved Pilot Symbol Aided Estimation of
Rayleigh Fading Channels with Unknown Autocorrelation Statistics. In: Proc.
IEEE 60th Vehicular Technology Conf., vol. 3, pp. 2101–2107 (2004)
10. Bahng, S., Liu, J., Host-Madsen, A., Wang, X.: The Effects of Channel Estima-
tion on Tomlinson-Harashima Precoding in TDD MIMO Systems. In: Proc. of
the IEEE 6th Workshop on Signal Processing Advances in Wireless Communi-
cations, pp. 455–459 (2005)
11. Barton, T.A., Fuhrmann, D.R.: Covariance Structures for Multidimensional
Data. In: Proc. of the 27th Asilomar Conference on Signals, Systems and Com-
puters, pp. 297–301 (1991)
12. Barton, T.A., Fuhrmann, D.R.: Covariance Estimation for Multidimensional
Data Using the EM Algorithm. In: Proc. of the 27th Asilomar Conference on
Signals, Systems and Computers, pp. 203–207 (1993)


13. Barton, T.A., Smith, S.T.: Structured Covariance Estimation for Space-Time
Adaptive Processing. In: IEEE International Conference on Acoustics, Speech,
and Signal Processing, pp. 3493–3496 (1997)
14. Bello, P.A.: Characterization of Randomly Time-Variant Linear Channels. IEEE
Trans. on Comm. Systems 11(12), 360–393 (1963)
15. Ben-Haim, Z., Eldar, Y.C.: Minimax Estimators Dominating the Least-Squares
Estimator. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal
Processing, vol. 4, pp. 53–56 (2005)
16. Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Mathematics of Op-
erations Research 23 (1998)
17. Ben-Tal, A., Nemirovski, A.: Robust Optimization - Methodology and Applica-
tions. Mathematical Programming Series B 92, 453–480 (2002)
18. Bengtsson, M., Ottersten, B.: Uplink and Downlink Beamforming for Fading
Channels. In: Proc. of the 2nd IEEE Workshop on Signal Processing Advances
in Wireless Communications, pp. 350–353 (1999)
19. Bengtsson, M., Ottersten, B.: Optimum and Suboptimum Transmit Beamform-
ing. In: Handbook of Antennas in Wireless Communications, chap. 18. CRC
Press (2001)
20. Bengtsson, M., Volcker, B.: On the Estimation of Azimuth Distributions and
Azimuth Spectra. In: Proc. Vehicular Technology Conference Fall, vol. 3, pp.
1612–1615 (2001)
21. Berman, A.: Nonnegative Matrices in the Mathematical Sciences. Academic
Press (1979)
22. Beutler, F.: Error Free Recovery of Signals from Irregular Sampling. SIAM Rev.
8, 322–335 (1966)
23. Bienvenu, G., Kopp, L.: Optimality of High Resolution Array Processing Using
the Eigensystem Approach. IEEE Trans. Acoust., Speech, Signal Processing
31(5), 1235–1248 (1983)
24. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press
(2004)
25. Breun, P., Dietrich, F.A.: Precoding for the Wireless Broadcast Channel based
on Partial Channel Knowledge. Tech. Rep. TUM-LNS-TR-05-03, Technische
Universität München (2005)
26. Brewer, J.W.: Kronecker Products and Matrix Calculus in System Theory. IEEE
Trans. Circuits Syst. 25, 772–781 (1978)
27. Brown, J.L.: On the Prediction of a Band-Limited Signal from Past Samples.
Proc. IEEE 74(11), 1596–1598 (1986)
28. Brown, J.L., Morean, O.: Robust Prediction of Band-Limited Signals from Past
Samples. IEEE Trans. Inform. Theory 32(3), 410–412 (1986)
29. Burg, J.P., Luenberger, D.G., Wenger, D.L.: Estimation of Structured Covari-
ance Matrices. Proc. IEEE 70(9), 500–514 (1982)
30. Byers, G.J., Takawira, F.: Spatially and Temporally Correlated MIMO Chan-
nels: Modeling and Capacity Analysis. IEEE Trans. Veh. Technol. 53(3), 634–
643 (2004)
31. Caire, G.: MIMO Downlink Joint Processing and Scheduling: A Survey of Clas-
sical and Recent Results. In: Proc. of Workshop on Information Theory and its
Applications. San Diego, CA (2006)
32. Caire, G., Shamai, S.: On the Achievable Throughput of a Multiantenna Gaus-
sian Broadcast Channel. IEEE Trans. Inform. Theory 49(7), 1691–1706 (2003)
33. Chaufray, J.M., Loubaton, P., Chevalier, P.: Consistent Estimation of Rayleigh
Fading Channel Second-Order Statistics in the Context of the Wideband CDMA
Mode of the UMTS. IEEE Trans. Signal Processing 49(12), 3055–3064 (2001)
34. Choi, H., Munson, Jr., D.C.: Analysis and Design of Minimax-Optimal Interpo-
lators. IEEE Trans. Signal Processing 46(6), 1571–1579 (1998)

35. Choi, H., Munson, Jr., D.C.: Stochastic Formulation of Bandlimited Signal In-
terpolation. IEEE Trans. Circuits Syst. II 47(1), 82–85 (2000)
36. Chun, B.: A Downlink Beamforming in Consideration of Common Pilot and
Phase Mismatch. In: Proc. of the European Conference on Circuit Theory and
Design, vol. 3, pp. 45–48 (2005)
37. Cools, R.: Advances in Multidimensional Integration. Journal of Computational
and Applied Mathematics 149, 1–12 (2002)
38. Cools, R., Rabinowitz, P.: Monomial Cubature Rules since “Stroud”: A Compi-
lation. Journal of Computational and Applied Mathematics 48, 309–326 (1993)
39. Costa, M.: Writing on Dirty Paper. IEEE Trans. Inform. Theory 29(3), 439–441
(1983)
40. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley &
Sons (1991)
41. Czink, N., Matz, G., Seethaler, D., Hlawatsch, F.: Improved MMSE Estimation
of Correlated MIMO Channels using a Structured Correlation Estimator. In:
Proc. of the 6th IEEE Workshop on Signal Processing Advances in Wireless
Communications, pp. 595–599. New York City (2005)
42. Forney, Jr., G.D.: On the Role of MMSE Estimation in Approaching the Information-Theoretic Limits of Linear Gaussian Channels: Shannon Meets Wiener. In: Proc. 2003 Allerton Conf., pp. 430–439 (2003)
43. Dahl, J., Vandenberghe, L., Fleury, B.H.: Robust Least-Squares Estimators
Based on Semidefinite Programming. In: Conf. Record of the 36th Asilomar
Conf. on Signals, Systems, and Computers, vol. 2, pp. 1787–1791. Pacific Grove,
CA (2002)
44. Dante, H.M.: Robust Linear Prediction of Band-Limited Signals. In: Proc. of
the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 4, pp.
2328–2331 (1988)
45. Degerine, S.: On Local Maxima of the Likelihood Function for Toeplitz Matrix
Estimation. IEEE Trans. Signal Processing 40(6), 1563–1565 (1992)
46. Dembo, A.: The Relation Between Maximum Likelihood Estimation of Struc-
tured Covariance Matrices and Periodograms. IEEE Trans. Acoust., Speech,
Signal Processing 34(6), 1661–1662 (1986)
47. Dembo, A., Mallows, C.L., Shepp, L.A.: Embedding Nonnegative Definite
Toeplitz Matrices in Nonnegative Definite Circulant Matrices, with Applica-
tion to Covariance Estimation. IEEE Trans. Inform. Theory 35(6), 1206–1212
(1989)
48. Dharanipragada, S., Arun, K.S.: Bandlimited Extrapolation Using Time-
Bandwidth Dimension. IEEE Trans. Signal Processing 45(12), 2951–2966 (1997)
49. Dietrich, F.A., Breun, P., Utschick, W.: Tomlinson-Harashima Precoding: A
Continuous Transition from Complete to Statistical Channel Knowledge. In:
Proc. of the IEEE Global Telecommunications Conference, vol. 4, pp. 2379–
2384. Saint Louis, U.S.A. (2005)
50. Dietrich, F.A., Breun, P., Utschick, W.: Robust Tomlinson-Harashima Precod-
ing for the Wireless Broadcast Channel. IEEE Trans. Signal Processing 55(2),
631–644 (2007)
51. Dietrich, F.A., Dietl, G., Joham, M., Utschick, W.: Robust and reduced rank
space-time decision feedback equalization. In: T. Kaiser, A. Bourdoux, H. Boche,
J.R. Fonollosa, J.B. Andersen, W. Utschick (eds.) Smart Antennas—State-of-
the-Art, EURASIP Book Series on Signal Processing and Communications,
chap. Part 1: Receiver, pp. 189–205. Hindawi Publishing Corporation (2005)
52. Dietrich, F.A., Hoffmann, F., Utschick, W.: Conditional Mean Estimator for the
Gramian Matrix of Complex Gaussian Random Variables. In: Proc. of the IEEE
Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1137–1140.
Philadelphia, U.S.A. (2005)

53. Dietrich, F.A., Hunger, R., Joham, M., Utschick, W.: Robust Transmit Wiener
Filter for Time Division Duplex Systems. In: Proc. of the 3rd IEEE Int. Symp.
on Signal Processing and Information Technology, pp. 415–418. Darmstadt, Ger-
many (2003)
54. Dietrich, F.A., Ivanov, T., Utschick, W.: Estimation of Channel and Noise Cor-
relations for MIMO Channel Estimation. In: Proc. of the Int. ITG/IEEE Work-
shop on Smart Antennas. Reisensburg, Germany (2006)
55. Dietrich, F.A., Joham, M., Utschick, W.: Joint Optimization of Pilot Assisted
Channel Estimation and Equalization applied to Space-Time Decision Feedback
Equalization. In: Proc. of the IEEE Int. Conf. on Communications, vol. 4, pp.
2162–2167. Seoul, South Korea (2005)
56. Dietrich, F.A., Utschick, W.: Pilot-Assisted Channel Estimation Based on
Second-Order Statistics. IEEE Trans. Signal Processing 53(3), 1178–1193 (2005)
57. Dietrich, F.A., Utschick, W.: Robust Tomlinson-Harashima Precoding. In: Proc.
of the IEEE 16th Int. Symp. on Personal, Indoor and Mobile Radio Communi-
cations, vol. 1, pp. 136–140. Berlin, Germany (2005)
58. Dietrich, F.A., Utschick, W., Breun, P.: Linear Precoding Based on a Stochastic
MSE Criterion. In: Proc. of the 13th European Signal Processing Conference.
Antalya, Turkey (2005)
59. Dimic, G., Sidiropoulos, N.D.: On Downlink Beamforming with Greedy User
Selection: Performance Analysis and a Simple New Algorithm. IEEE Trans.
Signal Processing 53(10), 3857–3868 (2005)
60. Dogandzic, A., Jin, J.: Estimating Statistical Properties of MIMO Fading Chan-
nels. IEEE Trans. Signal Processing 53(8), 3065–3080 (2005)
61. Dorato, P.: A Historical Review of Robust Control. IEEE Control Syst. Mag.
7(2), 44–56 (1987)
62. Doucet, A., Wang, X.: Monte Carlo Methods for Signal Processing: A Review in
the Statistical Signal Processing Context. IEEE Signal Processing Mag. 22(6),
152–170 (2005)
63. Duel-Hallen, A., Hu, S., Hallen, H.: Long-Range Prediction of Fading Signals.
IEEE Signal Processing Mag. 17(3), 62–75 (2000)
64. Ekman, T.: Prediction of Mobile Radio Channels Modeling and Design. Disser-
tation, Uppsala University (2002). ISBN 91-506-1625-0
65. Erez, U., Brink, S.: A Close-to-Capacity Dirty Paper Coding Scheme. IEEE
Trans. Inform. Theory 51(10), 3417–3432 (2005)
66. Ertel, R.B., Cardieri, P., Sowerby, K.W., Rappaport, T.S., Reed, J.H.: Overview
of Spatial Channel Models for Antenna Array Communication Systems. IEEE
Personal Commun. Mag. 5(1), 10–22 (1998)
67. Eyceoz, T., Duel-Hallen, A., Hallen, H.: Deterministic Channel Modeling and
Long Range Prediction of Fast Fading Mobile Radio Channels. IEEE Commun.
Lett. 2(9), 254–256 (1998)
68. Fessler, J.A., Hero, A.O.: Space-Alternating Generalized Expectation-
Maximization Algorithm. IEEE Trans. Signal Processing 42(10), 2664–2677
(1994)
69. Fischer, R.: Precoding and Signal Shaping for Digital Transmission. John Wiley
& Sons, Inc. (2002)
70. Fischer, R.F.H.: The Modulo-Lattice Channel: The Key Feature in Precoding
Schemes. International Journal of Electronics and Communications pp. 244–253
(2005)
71. Fischer, R.F.H., Windpassinger, C., Lampe, A., Huber, J.B.: MIMO Precoding
for Decentralized Receivers. In: Proc. IEEE Int. Symp. on Information Theory,
p. 496 (2002)
72. Fischer, R.F.H., Windpassinger, C., Lampe, A., Huber, J.B.: Tomlinson-Harashima Precoding in Space-Time Transmission for Low-Rate Backward Channel. In: Proc. Int. Zurich Seminar on Broadband Communications, pp. 7-1–7-6 (2002)
73. Fleury, B.H.: First- and Second-Order Characterization of Direction Dispersion
and Space Selectivity in the Radio Channel. IEEE Trans. Inform. Theory 46(6),
2027–2044 (2000)
74. Forster, P., Asté, T.: Maximum Likelihood Multichannel Estimation under Reduced Rank Constraint. In: Proc. IEEE Conf. Acoustics Speech Signal Processing, vol. 6, pp. 3317–3320 (1998)
75. Franke, J.: Minimax-Robust Prediction of Discrete Time Series. Z. Wahrschein-
lichkeitstheorie verw. Gebiete 68, 337–364 (1985)
76. Franke, J., Poor, H.V.: Minimax-Robust Filtering and Finite-Length Robust
Predictors. Lecture Notes in Statistics: Robust and Nonlinear Time Series Anal-
ysis 26, 87–126 (1984)
77. Fuhrmann, D.: Progress in Structured Covariance Estimation. In: Proc. of the
4th Annual ASSP Workshop on Spectrum Estimation and Modeling, pp. 158–
161 (1998)
78. Fuhrmann, D.R.: Correction to “On the Existence of Positive-Definite
Maximum-Likelihood Estimates of Structured Covariance Matrices”. IEEE
Trans. Inform. Theory 43(3), 1094–1096 (1997)
79. Fuhrmann, D.R., Barton, T.: New Results in the Existence of Complex Covari-
ance Estimates. In: Conf. Record of the 26th Asilomar Conference on Signals,
Systems and Computers, vol. 1, pp. 187–191 (1992)
80. Fuhrmann, D.R., Barton, T.A.: Estimation of Block-Toeplitz Covariance Ma-
trices. In: Conf. Record of 24th Asilomar Conference on Signals, Systems and
Computers, vol. 2, pp. 779–783 (1990)
81. Fuhrmann, D.R., Miller, M.I.: On the Existence of Positive-Definite Maximum-
Likelihood Estimates of Structured Covariance Matrices. IEEE Trans. Inform.
Theory 34(4), 722–729 (1988)
82. Ginis, G., Cioffi, J.M.: A Multi-User Precoding Scheme Achieving Crosstalk
Cancellation with Application to DSL Systems. In: Conf. Record of 34th Asilo-
mar Conf. on Signals, Systems, and Computers, vol. 2, pp. 1627–1631 (2000)
83. Goldsmith, A.: Wireless Communications. Cambridge University Press (2005)
84. Golub, G.H., Loan, C.F.V.: Matrix Computations, 2nd edn. The John Hopkins
University Press (1989)
85. Gray, R.M.: Toeplitz and Circulant Matrices: A review. Foundations and Trends
in Communications and Information Theory 2(3), 155–239 (2006)
86. Grenander, U., Szegö, G.: Toeplitz Forms and Their Applications. University of California Press (1958)
87. Guillaud, M., Slock, D.T.M., Knopp, R.: A Practical Method for Wireless Chan-
nel Reciprocity Exploitation Through Relative Calibration. In: Proc. of the 8th
Int. Symp. on Signal Processing and Its Applications, vol. 1, pp. 403–406 (2005)
88. Haardt, M.: Efficient One-, Two, and Multidimensional High-Resolution Array
Signal Processing. Dissertation, Technische Universität München (1997). ISBN
3-8265-2220-6, Shaker Verlag
89. Harashima, H., Miyakawa, H.: Matched-Transmission Technique for Channels
With Intersymbol Interference. IEEE Trans. Commun. 20(4), 774–780 (1972)
90. Hassibi, B., Hochwald, B.M.: How Much Training is Needed in Multiple-Antenna
Wireless Links? IEEE Trans. Inform. Theory 49(4), 951–963 (2003)
91. Herdin, M.: Non-Stationary Indoor MIMO Radio Channels. Dissertation, Tech-
nische Universität Wien (2004)
92. Higham, N.J.: Matrix Nearness Problems and Applications. In: M.J.C. Gover,
S. Barnett (eds.) Applications of Matrix Theory, pp. 1–27. Oxford University
Press (1989)
93. Hochwald, B.M., Peel, C.B., Swindlehurst, A.L.: A Vector-Perturbation Technique for Near-Capacity Multiantenna Multiuser Communications—Part II: Perturbation. IEEE Trans. Commun. 53(3), 537–544 (2005)
94. Hofstetter, H., Viering, I., Lehne, P.H.: Spatial and Temporal Long Term Prop-
erties of Typical Urban Base Stations at Different Heights. In: COST273 6th
Meeting TD(03)61. Barcelona, Spain (2003)
95. Horn, R.A., Johnson, C.R.: Matrix Analysis, 1st edn. Cambridge University
Press (1985)
96. Horn, R.A., Johnson, C.R.: Topics in Matrix Analysis, 1st edn. Cambridge
University Press (1991)
97. Huber, P.J.: Robust Statistics. John Wiley & Sons (1981)
98. Hunger, R., Dietrich, F.A., Joham, M., Utschick, W.: Robust Transmit Zero-
Forcing Filters. In: Proc. of the ITG Workshop on Smart Antennas, pp. 130–237.
Munich, Germany (2004)
99. Hunger, R., Joham, M., Schmidt, D.A., Utschick, W.: On Linear and Nonlinear
Precoders for Decentralized Receivers (2005). Unpublished Technical Report
100. Hunger, R., Joham, M., Utschick, W.: Extension of Linear and Nonlinear Trans-
mit Filters for Decentralized Receivers. In: Proc. European Wireless 2005, vol. 1,
pp. 40–46 (2005)
101. Hunger, R., Joham, M., Utschick, W.: Minimax Mean-Square-Error Transmit
Wiener Filter. In: Proc. VTC 2005 Spring, vol. 2, pp. 1153–1157 (2005)
102. Hunger, R., Utschick, W., Schmidt, D., Joham, M.: Alternating Optimization
for MMSE Broadcast Precoding. In: Proc. of the IEEE Int. Conf. on Acoustics,
Speech, and Signal Processing, vol. 4, pp. 757–760 (2006)
103. Ivanov, T., Dietrich, F.A.: Estimation of Second Order Channel and Noise
Statistics in Wireless Communications. Tech. Rep. TUM-LNS-TR-05-09, Tech-
nische Universität München (2005)
104. Ivrlac, M.T., Choi, R.L.U., Murch, R., Nossek, J.A.: Effective Use of Long-Term
Transmit Channel State Information in Multi-User MIMO Communication Sys-
tems. In: Proc. of the 57th IEEE Vehicular Technology Conference, pp. 409–413.
Orlando, Florida (2003)
105. Jansson, M., Ottersten, B.: Structured Covariance Matrix Estimation: A Para-
metric Approach. In: Proc. of the IEEE International Conference on Acoustics,
Speech, and Signal Processing, vol. 5, pp. 3172–3175 (2000)
106. Jindal, N.: High SNR Analysis of MIMO Broadcast Channels. In: Proc. of the
Int. Symp. on Information Theory, pp. 2310–2314 (2005)
107. Jindal, N.: MIMO Broadcast Channels with Finite Rate Feedback. IEEE Trans.
Inform. Theory 52(11), 5045–5059 (2006)
108. Joham, M.: Optimization of Linear and Nonlinear Transmit Signal Process-
ing. Dissertation, Technische Universität München (2004). ISBN 3-8322-2913-2,
Shaker Verlag
109. Joham, M., Brehmer, J., Utschick, W.: MMSE Approaches to Multiuser Spatio-
Temporal Tomlinson-Harashima Precoding. In: Proc. of the ITG Conf. on
Source and Channel Coding, pp. 387–394 (2004)
110. Joham, M., Kusume, K., Gzara, M.H., Utschick, W., Nossek, J.A.: Transmit
Wiener Filter. Tech. Rep. TUM-LNS-TR-02-1, Technische Universität München
(2002)
111. Joham, M., Kusume, K., Gzara, M.H., Utschick, W., Nossek, J.A.: Transmit
Wiener Filter for the Downlink of TDD DS-CDMA Systems. In: Proc. 7th
IEEE Int. Symp. on Spread Spectrum Techniques and Applications, vol. 1, pp.
9–13 (2002)
112. Joham, M., Kusume, K., Utschick, W., Nossek, J.A.: Transmit Matched Filter
and Transmit Wiener Filter for the Downlink of FDD DS-CDMA Systems. In:
Proc. PIMRC 2002, vol. 5, pp. 2312–2316 (2002)
113. Joham, M., Schmidt, D.A., Brehmer, J., Utschick, W.: Finite-Length MMSE
Tomlinson-Harashima Precoding for Frequency Selective Vector Channels.
IEEE Trans. Signal Processing 55(6), 3073–3088 (2007)
114. Joham, M., Schmidt, D.A., Brunner, H., Utschick, W.: Symbol-Wise Order Op-
timization for THP. In: Proc. ITG/IEEE Workshop on Smart Antennas (2007)
115. Joham, M., Utschick, W.: Ordered Spatial Tomlinson Harashima Precod-
ing. In: T. Kaiser, A. Bourdoux, H. Boche, J.R. Fonollosa, J.B. Andersen,
W. Utschick (eds.) Smart Antennas—State-of-the-Art, EURASIP Book Series
on Signal Processing and Communications, chap. Part 3: Transmitter, pp. 401–
422. EURASIP, Hindawi Publishing Corporation (2006)
116. Joham, M., Utschick, W., Nossek, J.A.: Linear Transmit Processing in MIMO
Communications Systems. IEEE Trans. Signal Processing 53(8), 2700–2712
(2005)
117. Johnson, C.R., Lundquist, M., Nævdal, G.: Positive Definite Toeplitz Comple-
tions. Journal of London Mathematical Society 59(2), 507–520 (1999)
118. Johnson, D.B.W.D.H.: Robust Estimation of Structured Covariance Matrices.
IEEE Trans. Signal Processing 41(9), 2891–2906 (1993)
119. Jongren, G., Skoglund, M., Ottersten, B.: Design of Channel-Estimate-
Dependent Space-Time Block Codes. IEEE Trans. Commun. 52(7), 1191–1203
(2004)
120. Jung, P.: Analyse und Entwurf digitaler Mobilfunksysteme, 1st edn. B.G. Teub-
ner (1997)
121. Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall (2000)
122. Kassam, A.A., Poor, H.V.: Robust Techniques for Signal Processing: A Survey.
Proc. IEEE 73(3), 433–481 (1985)
123. Kassam, S.A., Poor, H.V.: Robust Techniques for Signal Processing: A Survey.
Proc. IEEE 73(3), 433–481 (1985)
124. Kay, S.M.: Fundamentals of Statistical Signal Processing - Estimation Theory,
1st edn. PTR Prentice Hall (1993)
125. Krim, H., Viberg, M.: Two Decades of Array Signal Processing Research. IEEE
Signal Processing Mag. 13(4), 67–94 (1996)
126. Kusume, K., Joham, M., Utschick, W., Bauch, G.: Efficient Tomlinson-
Harashima Precoding for Spatial Multiplexing on Flat MIMO Channel. In:
Proc. of the IEEE Int. Conf. on Communications, vol. 3, pp. 2021–2025 (2005)
127. Kusume, K., Joham, M., Utschick, W., Bauch, G.: Cholesky Factorization with
Symmetric Permutation Applied to Detecting and Precoding Spatially Multi-
plexed Data Streams. IEEE Trans. Signal Processing 55(6), 3089–3103 (2007)
128. Lapidoth, A., Shamai, S., Wigger, M.: On the Capacity of Fading MIMO Broad-
cast Channels with Imperfect Transmitter Side-Information. In: Proc. of the
43rd Annual Allerton Conference on Communication, Control, and Computing,
pp. 28–30 (2005). Extended version online: http://arxiv.org/abs/cs.IT/0605079
129. Larsson, E.G., Stoica, P.: Space-Time Block Coding for Wireless Communica-
tions. Cambridge University Press (2003)
130. Laurent, M.: Matrix Completion Problems. In: The Encyclopedia of Optimiza-
tion, vol. III, pp. 221–229. Kluwer (2001)
131. Lev-Ari, H., Parker, S.R., Kailath, T.: Multidimensional Maximum-Entropy Co-
variance Extension. IEEE Trans. Inform. Theory 35(3), 497–508 (1989)
132. Li, C.K., Mathias, R.: Extremal Characterizations of the Schur Complement
and Resulting Inequalities. SIAM Review 42(2), 233–246 (2000)
133. Li, H., Stoica, P., Li, J.: Computationally Efficient Maximum Likelihood Es-
timation of Structured Covariance Matrices. IEEE Trans. Signal Processing
47(5), 1314–1323 (1999)
134. Li, Y., Cimini, L.J., Sollenberger, N.R.: Robust Channel Estimation for OFDM
Systems with Rapid Dispersive Fading Channels. IEEE Trans. Commun. 46(7),
902–915 (1998)
135. Liavas, A.P.: Tomlinson-Harashima Precoding with Partial Channel Knowledge. IEEE Trans. Commun. 53(1), 5–9 (2005)
136. Liu, J., Khaled, N., Petré, F., Bourdoux, A., Barel, A.: Impact and Mitiga-
tion of Multiantenna Analog Front-End Mismatch in Transmit Maximum Ratio
Combining. EURASIP Journal on Applied Signal Processing 2006, Article ID
86,931, 14 pages (2006)
137. Lu, J., Darmofal, D.L.: Higher-Dimensional Integration with Gaussian Weight
for Applications in Probabilistic Design. SIAM Journal on Scientific Computing
26(2), 613–624 (2004)
138. Marvasti, F.: Comments on “A Note on the Predictability of Band-Limited
Processes”. Proc. IEEE 74(11), 1596 (1986)
139. Mathai, M., Provost, S.B.: Quadratic Forms in Random Variables - Theory and
Applications. Marcel Dekker (1992)
140. Matz, G.: On Non-WSSUS Wireless Fading Channels. IEEE Trans. Wireless
Commun. 4(5), 2465– 2478 (2005)
141. Matz, G.: Recursive MMSE Estimation of Wireless Channels based on Training
Data and Structured Correlation Learning. In: Proc. of the IEEE/SP 13th
Workshop on Statistical Signal Processing, pp. 1342–1347. Bordeaux, France
(2005)
142. McLachlan, G., Krishnan, T.: The EM Algorithm and Extensions. Wiley Series
in Probability and Statistics. John Wiley & Sons (1997)
143. Medard, M.: The Effect Upon Channel Capacity in Wireless Communications of
Perfect and Imperfect Knowledge of the Channel. IEEE Trans. Inform. Theory
46(3), 933–946 (2000)
144. Meyr, H., Moeneclaey, M., Fechtel, S.A.: Digital Communication Receivers. John
Wiley (1998)
145. Mezghani, A., Hunger, R., Joham, M., Utschick, W.: Iterative THP Transceiver
Optimization for Multi-User MIMO Systems Based on Weighted Sum-MSE
Minimization. In: Proc. of the IEEE 7th Workshop on Signal Processing Ad-
vances in Wireless Communications. France (2006)
146. Mezghani, A., Joham, M., Hunger, R., Utschick, W.: Transceiver Design for
Multi-User MIMO Systems. In: Proc. of the Int. ITG/IEEE Workshop on Smart
Antennas. Reisensburg, Germany (2006)
147. Middleton, D.: Introduction to Statistical Communication Theory. McGraw Hill
(1960)
148. Miller, K.S.: Complex Stochastic Processes, 1st edn. Addison-Wesley (1974)
149. Miller, M.I., Fuhrmann, D.R., O’Sullivan, J.A., Snyder, D.L.: Maximum-
Likelihood Methods for Toeplitz Covariance Estimation and Radar Imaging.
In: Advances in Spectrum Analysis and Array Processing, pp. 145–172. Pren-
tice Hall (1991)
150. Miller, M.I., Snyder, D.L.: The Role of Likelihood and Entropy in Incomplete-
Data Problems: Applications to Estimating Point-Process Intensities and
Toeplitz Constrained Covariances. Proc. IEEE 75(7), 892–907 (1987)
151. Miller, M.I., Turmon, M.J.: Maximum-Likelihood Estimation of Complex Sinu-
soids and Toeplitz Covariances. IEEE Trans. Signal Processing 42(5), 1074–1086
(1994)
152. Monogioudis, P., Conner, K., Das, D., Gollamudi, S., Lee, J.A.C., Moustakas,
A.L., Nagaraj, S., Rao, A.M., Soni, R.A., Yuan, Y.: Intelligent Antenna So-
lutions for UMTS: Algorithms and Simulation Results. IEEE Commun. Mag.
42(10), 28–39 (2004)
153. Mugler, D.H.: Computationally Efficient Linear Prediction from Past Samples
of a Band-Limited Signal and its Derivative. IEEE Trans. Inform. Theory 36(3),
589–596 (1990)
154. Neumaier, A.: Solving Ill-Conditioned and Singular Linear Systems: A Tutorial
on Regularization. SIAM Rev. 40(3), 636–666 (1998)
155. Newsam, G., Dietrich, C.: Bounds on the Size of Nonnegative Definite Circulant
Embeddings of Positive Definite Toeplitz Matrices. IEEE Trans. Inform. Theory
40(4), 1218–1220 (1994)
156. Nicoli, M., Simeone, O., Spagnolini, U.: Multislot Estimation of Fast-Varying
Space-Time Channels in TD-CDMA Systems. IEEE Commun. Lett. 6(9), 376–
378 (2002)
157. Nicoli, M., Simeone, O., Spagnolini, U.: Multi-Slot Estimation of Fast-Varying
Space-Time Communication Channels. IEEE Trans. Signal Processing 51(5),
1184–1195 (2003)
158. Nicoli, M., Sternad, M., Spagnolini, U., Ahlén, A.: Reduced-Rank Channel Es-
timation and Tracking in Time-Slotted CDMA Systems. In: Proc. IEEE Int.
Conference on Communications, pp. 533–537. New York City (2002)
159. Otnes, R., Tüchler, M.: On Iterative Equalization, Estimation, and Decoding.
In: Proc. of the IEEE Int. Conf. on Communications, vol. 4, pp. 2958–2962
(2003)
160. Palomar, D.P., Bengtsson, M., Ottersten, B.: Minimum BER Linear
Transceivers for MIMO Channels via Primal Decomposition. IEEE Trans. Signal
Processing 53(8), 2866–2882 (2005)
161. Paolella, M.S.: Computing Moments of Ratios of Quadratic Forms in Normal
Variables. Computational Statistics and Data Analysis 42(3), 313 – 331 (2003)
162. Papoulis, A.: A Note on the Predictability of Band-Limited Processes. Proc.
IEEE 73(8), 1332 (1985)
163. Papoulis, A.: Levinson’s Algorithm, Wold’s Decomposition, and Spectral Esti-
mation. SIAM Review 27(3), 405–441 (1985)
164. Papoulis, A.: Probability, Random Variables, and Stochastic Processes, 3rd edn.
WCB/McGraw-Hill (1991)
165. Paulraj, A., Papadias, C.B.: Space-Time Processing for Wireless Communica-
tions. IEEE Signal Processing Mag. 14(6), 49–83 (1997)
166. Pedersen, K.I., Mogensen, P.E., Fleury, B.H.: A Stochastic Model of the Tempo-
ral and Azimuthal Dispersion Seen at the Base Station in Outdoor Propagation
Environments. IEEE Trans. Veh. Technol. 49(2), 437–447 (2000)
167. Pedersen, K.I., Mogensen, P.E., Ramiro-Moreno, J.: Application and Perfor-
mance of Downlink Beamforming Techniques in UMTS. IEEE Commun. Mag.
42(10), 134–143 (2003)
168. Poor, H.V.: An Intoduction to Signal Detection and Estimation, 2nd edn.
Springer-Verlag (1994)
169. Porat, B.: Digital Processing of Random Signals. Prentice-Hall (1994)
170. Porat, B.: A Course in Digital Signal Processing. John Wiley & Sons (1997)
171. Prékopa, A.: Stochastic Programming. Kluwer (1995)
172. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical
Recipes in C, 2nd edn. Cambridge University Press (1999)
173. Proakis, J.G.: Digital Communications, 3rd edn. McGraw-Hill (1995)
174. Psaltopoulos, G., Joham, M., Utschick, W.: Comparison of Lattice Search Tech-
niques for Nonlinear Precoding. In: Proc. of the Int. ITG/IEEE Workshop on
Smart Antennas. Reisensburg, Germany (2006)
175. Psaltopoulos, G., Joham, M., Utschick, W.: Generalized MMSE Detection Tech-
niques for Multipoint-to-Point Systems. In: Proc. of the IEEE Global Telecom-
munications Conference. San Francisco, CA (2006)
176. Quang, A.N.: On the Uniqueness of the Maximum-Likelihood Estimate of Structured Covariance Matrices. IEEE Trans. Acoust., Speech, Signal Processing 32(6), 1249–1251 (1984)
177. Remmert, R.: Theory of Complex Functions. Springer-Verlag (1991)
178. Requicha, A.A.G.: The Zeros of Entire Functions: Theory and Engineering Ap-
plications. Proc. IEEE 68(3), 308–328 (1980)
179. Rey, F., Lamarca, M., Vazquez, G.: A Robust Transmitter Design for MIMO
Multicarrier Systems with Imperfect Channel Estimates. In: Proc. of the 4th
IEEE Workshop on Signal Processing Advances in Wireless Communications,
pp. 546 – 550 (2003)
180. Rosen, Y., Porat, B.: Optimal ARMA Parameter Estimation Based on the Sam-
ple Covariances for Data with Missing Observations. IEEE Trans. Inform. The-
ory 35(2), 342–349 (1989)
181. Schafer, R.W., Mersereau, R.M., Richards, M.A.: Constrained Iterative Restora-
tion Algorithms. Proc. IEEE 69(4), 432–450 (1981)
182. Scharf, L.L.: Statistical Signal Processing: Detection, Estimation, and Time Se-
ries Analysis. Addison-Wesley (1991)
183. Schmidt, D., Joham, M., Hunger, R., Utschick, W.: Near Maximum Sum-Rate
Non-Zero-Forcing Linear Precoding with Successive User Selection. In: Proc. of
the 40th Asilomar Conference on Signals, Systems and Computers (2006)
184. Schmidt, D., Joham, M., Utschick, W.: Minimum Mean Square Error Vector
Precoding. In: Proc. of the EEE 16th Int. Symp. on Personal, Indoor and
Mobile Radio Communications, vol. 1, pp. 107–111 (2005)
185. Schubert, M., Boche, H.: Solution of the Multiuser Downlink Beamforming
Problem With Individual SINR Constraints. IEEE Trans. Veh. Technol. 53(1),
18–28 (2004)
186. Schubert, M., Boche, H.: Iterative Multiuser Uplink and Downlink Beamform-
ing under SINR Constraints. IEEE Trans. Signal Processing 53(7), 2324–2334
(2005)
187. Schubert, M., Shi, S.: MMSE Transmit Optimization with Interference Pre-
Compensation. In: Proc. of the IEEE 61st Vehicular Technology Conference,
vol. 2, pp. 845–849 (2005)
188. Sharif, M., Hassibi, B.: On the Capacity of MIMO Broadcast Channels with
Partial Side Information. IEEE Trans. Inform. Theory 51(2), 506–522 (2005)
189. Shi, S., Schubert, M.: MMSE Transmit Optimization for Multi-User Multi-
Antenna Systems. In: Proc. of the IEEE Int. Conf. on Acoustics, Speech, and
Signal Processing, vol. 3, pp. 409–412 (2005)
190. Simeone, O., Bar-Ness, Y., Spagnolini, U.: Linear and Non-Linear Precod-
ing/Decoding for MIMO Systems with Long-Term Channel State Information
at the Transmitter. IEEE Trans. Wireless Commun. 3(2), 373–378 (2004)
191. Simeone, O., Spagnolini, U.: Multi-Slot Estimation of Space-Time Channels. In:
Proc. of the IEEE Int. Conf. on Communications, vol. 2, pp. 802–806 (2002)
192. Simeone, O., Spagnolini, U.: Lower Bound on Training-Based Channel Esti-
mation Error for Frequency-Selective Block-Fading Rayleigh MIMO Channels.
IEEE Trans. Signal Processing 32(11), 3265–3277 (2004)
193. Sion, M.: On General Minimax Theorems. IEEE Trans. Inform. Theory 8, 171–
176 (1958)
194. Slepian, D.: On Bandwidth. Proc. IEEE 64(3), 292–300 (1976)
195. Slepian, D.: Prolate Spheroidal Wave Functions, Fourier Analysis, and
Uncertainty—V: The Discrete Case. The Bell System Technical Journal 57(5),
1371–1430 (1978)
196. Snyders, J.: Error Formulae for Optimal Linear Filtering, Prediction and In-
terpolation of Stationary Time Series. The Annals of Mathematical Statistics
43(6), 1935–1943 (1972)
197. Solov’ev, V.N.: A Minimax-Bayes Estimate on Classes of Distributions with
Bounded Second Moments. Russian Mathematical Surveys 50(4), 832–834
(1995)
198. Solov’ev, V.N.: On the Problem of Minimax-Bayesian Estimation. Russian
Mathematical Surveys 53(5), 1104–1105 (1998)
199. Soloviov, V.N.: Towards the Theory of Minimax-Bayesian Estimation. Theory
Probab. Appl. 44(4), 739–754 (2000)
References 271

200. Stark, H., Woods, J.W.: Probability, Random Processes, and Estimation Theory
for Engineers, 2nd edn. Prentice Hall (1994)
201. Steiner, B., Baier, P.W.: Low Cost Channel Estimation in the Uplink Receiver
of CDMA Mobile Radio Systems. Frequenz 47(11-12), 292–298 (1993)
202. Stenger, F.: Tabulation of Certain Fully Symmetric Numerical Integration For-
mulas of Degree 7, 9 and 11. Mathematics of Computation 25(116), 935 (1971).
Microfiche
203. Stoica, P., Moses, R.: Spectral Analysis of Signals. Pearson Prentice Hall (2005)
204. Stoica, P., Söderström, T.: On Reparametrization of Loss Functions used in Es-
timation and the Invariance Principle. Signal Processing 17(4), 383–387 (1989)
205. Stoica, P., Viberg, M.: Maximum Likelihood Parameter and Rank Estimation in
Reduced-Rank Multivariate Linear Regressions. IEEE Trans. Signal Processing
44(12), 3069–3078 (1996)
206. Stüber, G.L.: Principles of Mobile Communication. Kluwer Academic Publishers
(1996)
207. Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over
symmetric cones. Optimization Methods and Software 11–12, 625–653 (1999)
208. Sultan, S.A., Tracy, D.S.: Moments of the Complex Multivariate Normal Dis-
tribution. Linear Algebra and its Applications 237-238, 191–204 (1996)
209. Taherzadeh, M., Mobasher, A., Khandani, A.K.: LLL Lattice-Basis Reduction
Achieves the Maximum Diversity in MIMO Systems. In: Proc. Int. Symp. on
Information Theory, pp. 1300–1304 (2005)
210. Tejera, P., Utschick, W., Bauch, G., Nossek, J.A.: Subchannel Allocation in Mul-
tiuser Multiple-Input Multiple-Output Systems. IEEE Trans. Inform. Theory
52(10), 4721–4733 (2006)
211. Tepedelenlioğlu, C., Abdi, A., Giannakis, G.B., Kaveh, M.: Estimation of
Doppler Spread and Signal Strength in Mobile Communications with Appli-
cations to Handoff and Adaptive Transmission. Wireless Communications and
Mobile Computing 1(1), 221–242 (2001)
212. Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. V. H. Winston
& Sons (1977)
213. Tomlinson, M.: New Automatic Equaliser Employing Modulo Arithmetic. Elec-
tronics Letters 7(5/6), 138–139 (1971)
214. Tong, L., Perreau, S.: Multichannel Blind Identification: From Subspace to Max-
imum Likelihood Methods. Proc. IEEE 86(10), 1951–1968 (1998)
215. Tong, L., Sadler, B.M., Dong, M.: Pilot-Assisted Wireless Transmissions. IEEE
Signal Processing Mag. 21(6), 12–25 (2004)
216. Trefethen, L.N., Bau, D.: Numerical Linear Algebra. SIAM - Society for Indus-
trial and Applied Mathematics (1997)
217. Tüchler, M., Mecking, M.: Equalization for Non-ideal Channel Knowledge. In:
Proc. of the Conf. on Information Sciences and Systems. Baltimore, U.S.A.
(2003)
218. Tugnait, J., Tong, L., Ding, Z.: Single-User Channel Estimation and Equaliza-
tion. IEEE Signal Processing Mag. 17(3), 17–28 (2000)
219. Utschick, W.: Tracking of signal subspace projectors. IEEE Trans. Signal Pro-
cessing 50(4), 769–778 (2002)
220. Utschick, W., Dietrich, F.A.: On Estimation of Structured Covariance Matrices.
In: J. Beyerer, F.P. León, K.D. Sommer (eds.) Informationsfusion in der Mess-
und Sensortechnik, pp. 51–62. Universitätsverlag Karlsruhe (2006). ISBN 978-
3-86644-053-1
221. Utschick, W., Joham, M.: On the Duality of MIMO Transmission Techniques for
Multiuser Communications. In: Proc. of the 14th European Signal Processing
Conference. Florence, Italy (2006)
222. Vaidyanathan, P.P.: On Predicting a Band-Limited Signal Based on Past Sample
Values. Proc. IEEE 75(8), 1125 (1987)
272 References

223. Vandelinde, V.D.: Robust Properties of Solutions to Linear-Quadratic Estima-


tion and Control Problems. IEEE Trans. Automat. Contr. 22(1), 138–139 (1977)
224. Vanderveen, M.C., der Veen, A.J.V., Paulraj, A.: Estimation of Multipath Pa-
rameters in Wireless Communications. IEEE Trans. Signal Processing 46(3),
682 – 690 (1998)
225. Vastola, K.S., Poor, H.V.: Robust Wiener-Kolmogorov Theory. IEEE Trans.
Inform. Theory 30(2), 316–327 (1984)
226. Verdú, S., Poor, H.V.: On Minimax Robustness: A General Approach and Ap-
plications. IEEE Trans. Inform. Theory 30(2), 328–340 (1984)
227. Viering, I.: Analysis of Second Order Statistics for Improved Channel Estimation
in Wireless Communications. Dissertation, Universität Ulm (2003). ISBN 3-18-
373310-2, VDI Verlag
228. Viering, I., Grundler, T., Seeger, A.: Improving Uplink Adaptive Antenna Al-
gorithms for WCDMA by Covariance Matrix Compensation. In: Proc. of the
IEEE 56th Vehicular Technology Conference, vol. 4, pp. 2149–2153. Vancouver
(2002)
229. Viering, I., Hofstetter, H., Utschick, W.: Validity of Spatial Covariance Matrices
over Time and Frequency. In: Proc. of the IEEE Global Telecommunications
Conference, vol. 1, pp. 851–855. Taipeh, Taiwan (2002)
230. Viswanath, P., Tse, D.N.C.: Sum Capacity of the Vector Gaussian Broadcast
Channel and Uplink-Downlink Duality. IEEE Trans. Inform. Theory 49(8),
1912–1921 (2003)
231. Weichselberger, W., Herdin, M., Özcelik, H., Bonek, E.: A Stochastic MIMO
Channel Model with Joint Correlation of Both Link Ends. IEEE Trans. Wireless
Commun. 5(1), 90–100 (2006)
232. Weingarten, H., Steinberg, Y., Shamai, S.S.: The Capacity Region of the Gaus-
sian Multiple-Input Multiple-Output Broadcast Channel. IEEE Trans. Inform.
Theory 52(9), 3936–3964 (2006)
233. Whittle, P.: Some Recent Contributions to the Theory of Stationary Processes.
In: A Study in the Analysis of Stationary Time Series, Appendix 2. Almquist
and Wiksell (1954)
234. Wiener, N., Masani, P.: The Prediction Theory of Multivariate Stochastic Pro-
cesses I. Acta Math. 98, 111–150 (1957)
235. Wiener, N., Masani, P.: The Prediction Theory of Multivariate Stochastic Pro-
cesses II. Acta Math. 99, 93–137 (1958)
236. Wiesel, A., Eldar, Y.C., Shamai, S.: Linear Precoding via Conic Optimization for
Fixed MIMO Receivers. IEEE Trans. Signal Processing 54(1), 161–176 (2006)
237. Windpassinger, C., Fischer, R.F.H., Huber, J.B.: Lattice-Reduction-Aided
Broadcast Precoding. IEEE Trans. Commun. 52(12), 2057–2060 (2004)
238. Xu, Z.: On the Second-Order Statistics of the Weighted Sample Covariance
Matrix. IEEE Trans. Signal Processing 51(2), 527–534 (2003)
239. Yates, R.D.: A Framework for Uplink Power Control in Cellular Radio Systems.
IEEE J. Select. Areas Commun. 13(7), 1341–1347 (1995)
240. Yu, K., Bengtsson, M., Ottersten, B.: MIMO Channel Models. In: T. Kaiser,
A. Bourdoux, H. Boche, J.R. Fonollosa, J.B. Andersen, W. Utschick (eds.) Smart
Antennas —State-of-the-Art, EURASIP Book Series on Signal Processing and
Communications, chap. Part 2: Channel, pp. 271–292. Hindawi Publishing Cor-
poration (2005)
241. Yu, W., Cioffi, J.M.: Trellis Precoding for the Broadcast Channel. In: Proc. of
the IEEE Global Telecommunications Conf., vol. 2, pp. 1344–1348 (2001)
242. Zemen, T., Mecklenbräuker, C.F.: Time-Variant Channel Estimation Using Dis-
crete Prolate Spheroidal Sequences. IEEE Trans. Signal Processing 53(9), 3597–
3607 (2005)
243. Zerlin, B., Joham, M., Utschick, W., Nossek, J.A.: Covariance Based Linear
Precoding. IEEE J. Select. Areas Commun. 24(1), 190–199 (2006)
References 273

244. Zerlin, B., Joham, M., Utschick, W., Seeger, A., Viering, I.: Linear Precoding
in W-CDMA Systems based on S-CPICH Channel Estimation. In: Proc. of
the 3rd IEEE Int. Symp. on Signal Processing and Information Technology, pp.
560–563. Darmstadt, Germany (2003)
245. Zhang, X., Palomar, D.P., Ottersten, B.: Robust Design of Linear MIMO
Transceivers Under Channel Uncertainty. In: Proc. of the IEEE Int. Conf. on
Acoustics, Speech, and Signal Processing, vol. 4, pp. 77–80 (2006)
Index

Alternating optimization, 152, 204
Antenna gain, 214, 216
Approximation broadcast channel, see Broadcast channel
Approximation of interference, 138
Approximation of receiver
  phase compensation, see Phase compensation at receiver
  scaled matched filter receiver, see Scaled matched filter receiver
Array steering vector, 258
Asymmetry in CSI, see Channel state information
Autocorrelation sequence, see Autocovariance sequence
Autocovariance sequence, 20
  band-limited, 49
  completion, 93
  estimation, 90
  interpolation, 93
  least-favorable, see Least-favorable
AWGN broadcast channel, 140
AWGN channel, 134
AWGN fading broadcast channel, 138
Balancing
  mean MSE, 179
  mean SINR, 179
  MSE, 149
  SINR, 150
Band-limited
  extension, 235
  power spectral density, 37
  random process, 37
  random sequence, 37
  prediction, 37, 47
  sequence, 93
  extrapolation, 37
Bayesian principle, 124, 184, 242
Beamforming, 6, 123, 172
  nonlinear, 7, 185
Bessel functions, 248
Bias
  channel covariance matrix, 102
  channel estimation, see Channel estimation
  noise covariance matrix, 25
Broadcast channel, 123
  approximation, 133
C-CSI, see Channel state information
Capacity, 123, 183
  Modulo channel, 195
Caratheodory representation theorem, 67
Channel covariance matrix, 19
  estimation with positive semidefinite constraint, 103
  Least-Squares estimation, 99
  Maximum Likelihood estimation, 74, 84, 99
Channel distribution information, 131
Channel estimation, 22
  Bayesian, 22
  bias, 24, 29
  Correlator, 27
  Least-Squares, 25
  LMMSE, 23
  Matched filter, 28
  Maximum Likelihood, 24
  minimax MSE, 45
  MMSE, 22
  regularization, 24, 27
  variance, 24, 29
Channel prediction, 33
  band-limited channel, 47–54
  lower bound on MSE, 36
  minimax MSE, 47–54
  MMSE, 33
  multivariate, 33
  perfect, 37
  univariate, 35
Channel scenarios, 110, 164, 218, 257
Channel state information (CSI), 124, 130
  asymmetry, 130, 150
  at receiver, 132, 144
  Complete (C-CSI), 130
  Incomplete, 132, 144, 180
  Partial (P-CSI), 131, 132, 136
  Statistical (S-CSI), 131, 158
Cholesky factorization, 184, 191
Circulant covariance matrix, see Covariance matrix
Clarke's power spectral density, see Power spectral density
Common training channel, see Training channel
Complete channel state information, see Channel state information
Complete data space, 75
Completion of band-limited sequence, see Autocovariance sequence
Completion of matrices, see Matrix completion
Completion of Toeplitz matrix, 234
Complexity, see Computational complexity
Computational complexity
  channel estimation, 31
  completion of autocovariance sequence, 96
  lattice search, 190
  Least-Squares, 108
  linear precoding, 155, 158
  Maximum Likelihood, 92, 99
  minimax prediction, 52, 59
  Tomlinson-Harashima precoding, 194, 211
  univariate prediction, 35
  vector precoding, 207
Conditional covariance matrix, 227
Conditional mean, 227
Conditional mean estimate
  log-likelihood function, 75
  maximally robust, 44
  of channel, 23
  of cost (linear precoding), 137, 144, 150
  of cost (THP), 199
  of cost (vector precoding), 196, 203
  of cost function, 242
Confluent hypergeometric function, 248
Consistent estimator, 64
Consistent filters, 173
Conventional optimization, 2
Correlations, separable, 20
Correlator, 27
Covariance matrix
  circulant, 66, 68
  doubly symmetric, 66
  Kronecker structure, 20, 91, 257
  of channel, see Channel covariance matrix
  of noise, see Noise covariance matrix
  sample, 64, 65, 72, 101
  structured, 64
  Toeplitz, 66, 67
  with circulant extension, 68
Crosscorrelation, 143
CSI, see Channel state information
Dedicated training channel, see Training channel
Degrees of freedom, 123, 186, 188
Deterministic random process, see Random process
Diagonally dominant matrix, 175, 252, 253
Discrete prolate spheroidal sequences, 55
Diversity order, 184, 214
Doppler frequency
  maximum, 37
Doubly symmetric covariance matrix, see Covariance matrix
Downlink, 123
Duality, 124, 169, 184
  definition, 170
  for incomplete CSI at receivers, 176
  mean SINR (SINRk), 173
  MSE, 170
  partial CSI, 171
  SINR, 170
  theorem, 173, 177
E-step, 75, 77, 86
Efficient estimator, 64
Ergodic information rate, 136
Error covariance matrix, 145, 148, 209
Estimation
  of noise covariance matrix, see Noise covariance matrix
  of channel, see Channel estimation
  of channel covariance matrix, see Channel covariance matrix
  of temporal correlations, 90
Expectation maximization (EM) algorithm, 75
Expectation-conditional maximization either algorithm, 89, 111
Extended invariance principle (EXIP), 65, 99
Extension of sequences, 67, 235
Extension of Toeplitz matrix, 68
  existence and uniqueness, 70
Extrapolation of band-limited sequences, see Band-limited
Extrapolation of sequences, see Extension of sequences
FDD, see Frequency division duplex
Feedback filter, 192, 196, 201, 209
Forward link, 125
Frequency division duplex, 130, 145
Gaussian approximation, 138, 140
Gaussian probability density, 227
  least-favorable, 43
Geometric-arithmetic mean inequality, 134
Gradient, complex, see Wirtinger calculus
Grid-of-beams, 132, 146
Hidden-data space, 75
Ill-posed problem, 15, 38, 68, 84, 93
Incomplete channel state information, see Channel state information
Information rate, see Rate
Initialization
  channel covariance matrix, 87, 90, 91
  noise covariance matrix, 81
  optimization linear precoding, 158
  optimization nonlinear precoding, 208, 211
Interference function, 180
Interpolation of band-limited sequence, see Autocovariance sequence
Jakes' power spectral density, see Power spectral density
Jensen's inequality, 136, 139
Karush-Kuhn-Tucker (KKT) conditions, 231
Kronecker product, 228
Lagrange function, 104, 231, 245, 249, 255
Laplace distribution, 258
Lattice, 184, 188
  search, 190, 192, 196, 206, 212
Least-favorable
  autocovariance sequence, 49, 53
  probability density, 43
  solution, 42, 43
Least-Squares
  channel estimation, see Channel estimation
  unconstrained, 102
  weighted, 65, 99
Least-Squares estimation
  of channel covariance matrix, see Channel covariance matrix
  of noise covariance matrix, see Noise covariance matrix
Likelihood equation, 65, 99
Likelihood function, see Log-likelihood function
Linear MMSE (LMMSE) estimation, see Channel estimation
  maximally robust, 44
Linear precoding, 123, 126
  ad-hoc criteria, 123
  alternating optimization, 152
  AWGN broadcast channel, 140
  complete CSI, 133, 149
  derivation, 245
  duality, 251, 252
  examples, 159
  identical receivers, 150
  incomplete CSI, 144, 151
  mean MMSE, 137, 139, 150
  mean MMSE (MMSEk), 143, 151
  mean SINR, 137
  mean SINR (SINRk), 139, 140
  mean sum rate, 136
  MMSE, 123, 134
  MMSE receivers, 137, 153
  partial CSI, 136, 150
  performance evaluation, 162
  performance measures, 132
  power constraint, 127
  rank-one channel, 159
  rate, 123, 134
  receivers with phase compensation, 248
  scaled matched filter receivers, 142, 156
  SINR, 123, 134
  statistical CSI, 158, 248
  sum MSE, 149, 152
  uncorrelated channel, 161
  with common training channel, 145
Log-likelihood function, 64, 71, 73, 75
  conditional mean estimate, 75
M-step, 75, 77, 86, 92
MAC-BC duality, see Duality
Matched filter, 28
  relation to MMSE, 28
Matched filter receiver, see Scaled matched filter receiver
Matrix completion, 95, 233
Matrix inversion lemma, 229
Maximum Doppler frequency, see Doppler frequency
Maximum Likelihood estimation
  of channel, see Channel estimation
    relation to MMSE, 26
  of channel covariance matrix, see Channel covariance matrix
  of noise covariance matrix, see Noise covariance matrix
  of structured covariance matrix, 64
  of structured covariance matrix (existence), 64
  reduced-rank, 26
  regularization, 27
Maximum Likelihood invariance principle, 111, 241
Maximum throughput scheduling, 123, 135
Maxmin MSE completion, 95
Mean MMSE, see Linear precoding, Vector precoding, and Tomlinson-Harashima precoding
Mean square error
  linear precoding, see Linear precoding
  minimax, see Minimax
  nonlinear precoding, see Vector precoding
Minimax
  channel estimation, see Channel estimation
  channel prediction, see Channel prediction
  equality, 42, 43
  mean square error estimation, 39, 42
Minimum norm completion, 95
Missing data, 93
MMSE estimation, see Channel estimation
MMSE precoding, see Linear precoding, Vector precoding, and Tomlinson-Harashima precoding
MMSE prediction, see Channel prediction
Model uncertainties, 2
Modulo channel, 187
Modulo operator, 186
Monomial cubature rules, 156
Monte-Carlo integration, 156, 218, 258
Multiple access channel (MAC), 170, 184
Multiple-Input Multiple-Output (MIMO), 1, 14, 33, 183
Multiplexing gain, 135, 167, 183
Multivariate channel prediction, see Channel prediction
Mutual information, 133, 134
Nearest-neighbor quantizer, 187
Noise covariance matrix, 19
  Least-Squares estimation, 99, 108
  Maximum Likelihood estimation, 74, 77, 99
Non-negative matrix, 175, 253
Nonlinear precoding, 183, see Vector precoding
Notation, 9
Numerical complexity, see Computational complexity
Overloaded system, 218
P-CSI, see Channel state information
Partial channel state information, see Channel state information
Partial matrix, 233
Partial sequence, 49, 235
Periodic pattern, 94, 98, 238
Permutation matrix, 191
Perturbation vector, 188
Phase compensation at receiver, 147, 151, 247, 248
Point-to-multipoint scenario, 123
Positive semidefinite circulant extension, 68, 84
Positive semidefinite extension, 235
Power constraint, 127, 128, 189
Power minimization problem, 178, 180
Power spectral density, 235
  band-limited, see Band-limited
  band-pass, 53, 59
  Clarke, 55, 257
  Jakes, 55
  least-favorable, 49, 53
  piece-wise constant, 54
  rectangular, 39, 49, 53
Precoding
  for training channel, 127, 144, 145, 216
  linear, see Linear precoding
  nonlinear, see Vector precoding
  Tomlinson-Harashima, see Tomlinson-Harashima precoding
Precoding order, 184, 193, 210, 212, 255
Prediction
  minimax MSE, see Channel prediction
  MMSE, see Channel prediction
  perfect, see Channel prediction
Predistortion, 123
Preequalization, 6, 123
Principal submatrix, 233
Quality of service constraints, 132, 134, 178, 184
Random process
  band-limited, 37
  deterministic, 37
Random sequence
  band-limited, see Band-limited
  periodic, 66, 69, 84, 90, 92, 94
Rank-one channel, 159, 215
Rate, 123, 134, 140
  maximization, 163
  mean, 136
Rayleigh distribution, 19
Rayleigh fading, 139
Receiver model, 133
Reciprocity, 125, 130
Rectangular power spectral density, see Power spectral density
Reduced-rank Maximum Likelihood, see Maximum Likelihood estimation
Regularization of Maximum Likelihood, 24, 27, 68, 84
Reverse link, 125
Rice fading, 19, 139, 146
Robust algorithms, 2
Robust design, 3
Robust estimator, 41
Robust optimization, 3, 241
  maximally robust solution, 41, 44
  minimax, see Minimax
S-CSI, see Channel state information
SAGE, see Space-alternating generalized expectation maximization algorithm
Sample correlation matrix, 189, 193
Sample covariance matrix, see Covariance matrix
Scaled matched filter receiver, 142, 200
Schur complement, 229
SDMA, see Space division multiple access
Second order cone program, 98
Sector-beam, 132
Semidefinite program, 44, 50, 98, 178
Separable crosscovariance, 20
Signal-to-interference-and-noise ratio, 123, 134
SINR, see Signal-to-interference-and-noise ratio
Slepian sequences, see Discrete prolate spheroidal sequences
Space division multiple access, 123
Space-alternating expectation maximization (SAGE) algorithm, 62
Space-alternating generalized expectation maximization (SAGE) algorithm, 75
Spatial causality, 196
Spatial covariance matrix
  estimation, 89
  simulation model, 258
Spectral distribution, see Power spectral density
Stationary point, 153, 162, 163, 204, 207, 211
Statistical channel state information, see Channel state information
Structured covariance matrix, see Covariance matrix
Successive encoding, 183
Sufficient statistic, 72
Sum capacity, see Capacity
Sum Mean Square Error, see Linear precoding and Vector precoding
Sum rate, see Rate
TDD, see Time division duplex
Throughput, 134
Time division duplex, 125, 130, 194
Time division multiple access, 123, 135
Toeplitz covariance matrix, see Covariance matrix
Toeplitz matrix
  block, 71
  positive semidefinite, see Covariance matrix
Tomlinson-Harashima precoding, 183, 191
  alternating optimization, 208
  complete CSI, 191, 197, 213
  derivation, 254
  example, 214
  feedback filter, 197, 201, 209
  mean MMSE, 199
  mean SINR (SINRk), 199, 202
  MMSE, 184
  MMSE receivers, 197
  partial CSI, 194, 208, 214
  performance evaluation, 217
  precoding for training channel, 216
  precoding order, see Precoding order
  scaled matched filter receivers, 201
  SINR, 197
  statistical CSI, 214
  sum performance measures, 203
  zero-forcing, 184, 193
Trace, 228
Training channel, 125
  common, 132, 145
  dedicated, 132
  model, 17
  precoding, see Precoding
Training sequence, 17
Trigonometric moment problem, 237
UMTS, 131, 145
Uncertainty
  class, 41, 42
  in the model, 2
  set, 45, 49, 51
    band-limited autocovariance sequence, 51
    of band-limited autocovariance sequence, 49
    with 2-norm, 46
Unconstrained Least-Squares, see Least-Squares
Uncorrelated channel, 161
Uniform distribution, 193, 196
Uniform linear array, 19, 110, 258
Variance, see Channel estimation
Variance of interference, 133
  mean, 139
Vec-operator, 228
Vector broadcast channel, 183
Vector precoding, 184, 188
  alternating optimization, 204
  complete CSI, 188, 213
  example, 214
  identical receivers, 188
  lattice search, 191
  mean MMSE, 196
  MMSE, 184
  MMSE receivers, 195
  partial CSI, 194, 204
  power constraint, 189
  precoding for training channel, 216
  scaled matched filter receivers, 200
  statistical CSI, 214
  sum MMSE, 203
  sum performance measures, 203
  zero-forcing, 184
Voronoi region, 188
WCDMA, see UMTS
Weighted Least-Squares, see Least-Squares
Wirtinger calculus, 230
Worst-case, see Minimax
Writing on dirty paper, 183
Zero-forcing, 161, 184, 193, 196