Tree-Search Multiple-Symbol Differential Decoding For Unitary Space-Time Modulation
Tree-Search Multiple-Symbol Differential Decoding For Unitary Space-Time Modulation
Tree-Search Multiple-Symbol Differential Decoding For Unitary Space-Time Modulation
N
T
j=1
[s
i,j
[k][
2
= E
b
R/T, 1 i N
T
, where E
b
is the en-
ergy per information bit and T is the modulation interval.
We assume a frequency-nonselective MIMO Rayleigh fad-
ing channel. The equivalent complex baseband (ECB) received
signal r
i,j
[k] at time = kN
T
+i and antenna j is given by
r
i,j
[k] =
N
T
=1
s
i,
[k]h
,j
[] +n
j
[],
1 i N
T
, 1 j N
R
. (2)
h
,j
[] and n
j
[] denote, respectively, the complex fading gain
between transmit antenna and receive antenna j and the zero-
mean complex additive spatially and temporally white Gaussian
noise (AWGN) with variance
2
n
= ^
0
/T effective at the jth
receive antenna at time , while ^
0
is the two-sided noise-power
spectral density in the ECB. The fading channels are assumed
to be spatially uncorrelated with identical temporal correlation
according to Clarkes fading model [16]. In particular, h
,j
[] is
zero-mean complex Gaussian distributed with autocorrelation
hh
[]
= ch
,j
[ + ]h
,j
[] = J
0
(2B
h
T), where J
0
()
and B
h
T denote the zeroth-order Bessel function of the rst kind
and the maximum normalized fading bandwidth, respectively.
While the fading channel model (2) will be used for analytical
purposes in Section IVand performance evaluation in Section V,
the derivation of low-complexity DSTM detectors requires the
assumption of a quasi-static fading channel (QSFC), i.e., the
channel is assumed constant during N
T
consecutive modulation
intervals (e.g., [6], [10]). Dening the QSFC matrix H[k] with
h
i,j
[kN
T
] in rowi and column j, 1 i N
T
, 1 j N
R
, the
QSFC is described by
R[k] = S[k]H[k] +N[k]. (3)
The NN
T
N
R
matrix
R[k] of N received matrix symbols
can be expressed as
R[k] =
S
D
[k]
H[k] +
N[k] (4)
with the NN
T
NN
T
block-diagonal matrix
S
D
[k]
= diag
S[k N + 1], . . . , S[k] and NN
T
N
R
matrices
H[k]
=
_
H
T
[k N + 1], . . . , H
T
[k]
T
and
N[k]
=
_
N
T
[k N +
1], . . . , N
T
[k]
T
.
We would like to point out that the QSFCmodel considered in
(3) and (4) is different fromthe often used block fading model
[1][4], [17], which stipulates that the channel remains constant
during the transmission of N consecutive DSTM symbols.
III. TREE-SEARCH MULTIPLE-SYMBOL DIFFERENTIAL
DECODING
In this section, we propose and design tree-search algorithms
for efcient MSDD. To this end, we provide a suitable represen-
tation of the ML MSDD decision rule in Section III-A, based
on which the tree-search decoding algorithms are developed in
Section III-B. We then present inner decoders, which are nested
into the outer tree-search decoder and optimized for different
DSTM constellations in Section III-C.
A. ML MSDD
ML MSDD processes blocks
R[k] of N consecutively re-
ceived matrix symbols to nd estimates for the N 1 data sym-
bols
V [k]
=
_
V
T
[k N + 2], . . . , V
T
[k]
T
corresponding to
the N transmit symbols
S[k]
=
_
S
T
[k N + 1], . . . , S
T
[k]
T
.
In analogy to the single-antenna case (e.g., [7]) consecutively
processed blocks
R[k] overlap by at least one matrix symbol,
PAULI AND LAMPE: TREE-SEARCH MULTIPLE-SYMBOL DIFFERENTIAL DECODING FOR UNITARY SPACE-TIME MODULATION 1569
i.e., the observation window of length N moves forward by at
most N 1 symbols at a time. For the sake of readability, in the
following we omit the reference [k] to time and address subma-
trices of the earlier block-matrices via subscripts with X
i
being
the th submatrix of
X and 1 i (N or N 1).
Using the QSFC model (4) and the fact that
S
D
is a unitary
matrix, the ML MSDD decision rule can be derived as [13,
Section V]
V = argmin
V V
N 1
tr
R
H
S
D
_
C
1
I
N
T
_
S
H
D
R, (5)
where C
=
_
hh
+
2
n
I
N
_
and
hh
= toeplitz
hh
[0],
hh
[N
T
], . . . ,
hh
[(N 1)N
T
]. Using the Cholesky
factorization C
1
= U
H
U with upper triangular U and
trXX
H
= |X|
2
, we obtain
V = argmin
V V
N 1
_
N1
n=1
|V
n
R
n,n
+X
n
|
2
_
(6)
with
X
n
= S
n+1
N
j=n+1
S
H
j
R
n,j
(7)
where
R
n,j
= u
n,j
R
j
, 1 n N 1, n j N, and u
n,j
is the element of U in row n and column j. It is worth point-
ing out that u
n,j
= p
(Nn)
jn
/
(Nn)
e
, where p
(n)
j
and (
(n)
e
)
2
denote the jth coefcient of the nth order linear backward min-
imum mean-squared error (MMSE) predictor for the discrete
time random process H[k] +N[k] and the corresponding error
variance, respectively, and p
(n)
0
= 1, n, e.g., [18].
We note that the brute-force search would evaluate (6) for
all 2
(N1)N
T
R
V (S
N
= I
N
T
can be chosen without loss of
optimality), i.e., decoding complexity would be exponential in
N, N
T
, and R.
B. Efcient Tree-Search Decoding AlgorithmsOuter Decoder
It can be observed that the ML metric in (6) is a sumof N 1
nonnegative scalar terms
2
n
= |V
n
R
n,n
+X
n
|
2
, 1 n N 1 (8)
which through X
n
and (1) depend on symbols V
j
, n + 1
j N 1. Thus, ML MSDD constitutes a tree-search prob-
lem, and
=
N1
i=n
|V
i
R
i,i
+X
i
|
2
= d
2
n+1
+
2
n
, 1 n N 1
(9)
where d
2
N
= 0 and d
2
1
is the ML metric in (6). Starting at n =
N 1, the SD algorithm selects candidates for symbols V
n
based on tentative decisions V
j
=
V
j
, n + 1 j N 1,
and continues to decrement n as long as the current metric
d
n
does not exceed a given maximum metric (radius) , such
that
d
n
. (10)
If the decoder reaches n = 1, the metric of the currently best
candidate
d
2
n
= d
2
n
+ (n 1)b (11)
with d
2
n
from (9). We propose to use the expected value of
2
n
,
given
V
j
= V
j
, n j N 1, as bias
b
= c
2
n
V
j
= V
j
, n j N 1 = N
T
N
R
, (12)
which leads to a simple implementation. The result in (12) is due
to the fact that X
n
is the MMSE estimate for V
n
R
n,n
, and
that the lter coefcients are normalized by the corresponding
error variance [see (6)]. It should be noted that different from
MSDSD, MSDSD-FM is suboptimal in that the ML MSDD
solution is not necessarily found. Furthermore, it is worth men-
tioning that an SD with Fano-type metric was presented for
coherent detection of space-time codes in [23].
3) Initial Search Radius: When using the SE enumeration,
no initial radius is required, i.e.,
init
can be assumed.
Accordingly, our rst strategy is to initialize the MSDSD al-
gorithm with a very large value of
init
. The disadvantage of
this strategy is a potentially slow convergence or equivalently
high complexity, in cases where the decoder makes false ten-
tative decisions for symbols V
n
with n close to N, before
1570 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 8, AUGUST 2007
reaching n = 1 for the rst time. Therefore, our second strategy
is to use a relatively small
init
for initialization, and in case no
solution
V is found for this radius, to increase it gradually by
steps of
init
[21]. Since for MSDSD(-FM) the expected metric
of the correct solution equals cd
2
1
[
V =
V = (N 1)N
T
N
R
(see Section III-B.2), we choose the squared start radius
2
init
proportional to that value expressed as
2
init
= (N 1)N
T
N
R
c , c > 0 . (13)
The value of the constant c inuences the overall search time.
Simulation results presented in Section V conrm that the use
of a nite initial radius is benecial especially for DSTM with
large constellations.
4) Fano Decoding AlgorithmMSDD-Fano: The Fano de-
coding algorithm is a BFS type algorithm. It has recently been
applied to MSDD of single-antenna DPSK [15] and to coherent
MIMO detection [19]. In this paper, we propose its application
to MSDD for DSTM using the metric of (11), and refer to the
resulting algorithm as MSDD-Fano. We do not repeat the basic
description of the basic Fano algorithm, whose details can be
found in [22].
C. Application To Different Signal ConstellationsInner
Decoders
MSDSD, MSDSD-FM, and MSDD-Fano break the exponen-
tial dependence of decoder complexity on N. Also, in order to
break exponential dependence of decoder complexity on N
T
and R it is necessary to give some thought on the search for in-
dividual candidate symbols
V
n
. In this regard, it is important to
note that minimizing
n
is equivalent to the problem of nding
the ML solution for CDD. This observation allows us to imple-
ment MSDD for DSTM using a nested structure of an outer and
N 1 identical inner decoders. The outer decoder, which solves
(6), initializes an inner decoder at stage n, 1 n N 1, with
matrices
R
n,n
and X
n
to nd a new candidate
V
n
that mini-
mizes
n
. It further provides the inner decoder with: 1) a list of
candidates that have been examined previously given the same
set of tentative decisions
V
j
, n + 1 j N 1, and thus,
are to be excluded from the search; and 2) a start radius r to
force
n
< r such that (10) is still fullled. In our investigations,
we considered diagonal (cyclic) and dicyclic group codes [24]
and orthogonal [11] and Cayley [4] nongroup codes. Due to
space limitations, however, we present details only for diagonal
and orthogonal constellations.
1) Diagonal Constellations: Diagonal constellations are de-
ned by the set [1], [2]
V
D
=
_
diage
2
L
c
1
, . . . , e
2
L
c
N
T
l 0, . . . , L 1
_
.
(14)
Astraightforward approach, which we will refer to as full search
(FS) inner decoding, is to compute
n
for all elements of V
D
and
to process them in order of increasing
n
. In this case, MSDD
benets from the efciency of the outer tree-search algorithm,
but its complexity is still exponential in N
T
and R.
In order to overcome this exponential dependence of com-
plexity on N
T
and R we turn to a more sophisticated strat-
egy using a lattice-based CDD algorithm proposed in [25] (see
also [12]). We will refer to this approach as inner lattice decod-
ing (LD). Using the cosine-approximation for small arguments,
the problem of nding the minimizer for
n
can be turned into a
(degenerated) N
T
-dimensional lattice-decoding problem of the
form
x = argmin
x
N
T
|Bx t|, (15)
where the N
T
N
T
matrix B has non-zero entries b
i,j
only
in the rst column j = 1 and on the main diagonal i = j, and
V
n
= diage
2
L
c
1
, . . . , e
2
L
c
N
T
mod( x
1
,L)
(see [12], [25] for
details). Due to the cosine-approximation,
V
n
is not necessarily
the optimal solution, but as will be seen in Section V only small
performance degradations result. Since x in (15) and as such
V
n
can be obtained from a tree search, we propose two tree-
search algorithms, which are particularly suited for efcient
inner decoding.
Sphere Decoding Algorithm: Due to the sparse lower-
triangular structure of B in (15), a particularly efcient sphere
decoder can be applied. The decoder may use the SE strategy
for nding candidates for x
1
, and increase i, 2 i N
T
as
long as
2
i
= [b
1,1
x
1
t
1
[
2
+
i
j=2
[b
j,1
x
1
+b
j,j
x
j
t
j
[
2
<
r
2
with x
j
= (t
j
b
j,1
x
1
)/b
j,j
|. If
N
T
< r, the current x
1
is stored as tentative decoding result, and the decoder checks
the next candidate for x
1
with updated r =
N
T
. The decoder
terminates when [b
1,1
x
1
t
1
[ > r. Thus, the decoder actually
searches only one dimension of the alleged N
T
-dimensional
search space. We found that this implementation of the in-
ner sphere decoder is more efcient than performing a QR-
decomposition and subsequent SD as was advocated in [12] for
CDD. The reason for this is that in MSDD previously examined
candidates are excluded at the very beginning of the search as
they correspond to x
1
.
Stack Decoding Algorithm: One drawback of the SD algo-
rithm is that the search restarts from i = 1 and x
1
= t
1
/b
1,1
|
every time the inner decoder is called to provide the next best
candidate V
n
, given the same tentative decisions
V
j
, n + 1
j N 1. The stack algorithm (e.g., [22]) seems to be an in-
teresting alternative for the inner decoder, since it maintains a
sorted list of examined paths, and decoding can be continued if
the inner decoder is called again.
As the regular stack algorithm is not directly applicable for
solving (15), because x
i
, 1 i N
T
, are not from a nite set,
we propose a modied stack decoder. Instead of replacing the
path x
(i)
= [x
1
, . . . , x
i
] currently at the top of the stack with
all its extensions x
(i+1)
= [x
(i)
, x
i+1
], we only consider its
best extension x
(i+1)
, and due to the sparse structure of the
matrix in (15), the next candidate for x
i
is provided only when
i = 1 according to the SE strategy. The search can be terminated
without loss of optimality, if the metric of the path at the top
of the stack exceeds the start radius r. As for the regular stack
algorithm, the rst path of length N
T
to reach the top of the stack
is the optimal path corresponding to the (next-)best candidate
for V
n
. It remains to store the stack and to continue with this
stack when the next candidate is required. While, theoretically,
limitation of the stack size may lead to decoding errors, we
PAULI AND LAMPE: TREE-SEARCH MULTIPLE-SYMBOL DIFFERENTIAL DECODING FOR UNITARY SPACE-TIME MODULATION 1571
found that limiting the maximum stack size to 2L does not
incur a noticeable performance degradation.
2) Orthogonal Designs (ODs): A second important class of
nongroup DSTM codes are ODs. In particular, we consider
DSTM based on Alamoutis code [11], where data symbols V
are taken from the set
V
O
=
_
1
2
_
a b
b a
a, b
L PSK
_
. (16)
For ODs, it is relatively straightforward to show that
2
n
can be
expressed as
2
n
=
n
+ Rea
n
n
+ Reb
n
n
, 1 n N 1
(17)
with
n
=
2
N
R
j=1
r
n,n,1,j
x
n,1,j
+ r
n,n,2,j
x
n,2,j
,
n
N
R
j=1
r
n,n,1,j
x
n,2,j
r
n,n,2,j
x
n,1,j
, and
n
= |
R
n,n
|
2
+
|X
n
|
2
, where r
n,n,i,j
and x
n,i,j
denote the elements in the ith
row and jth column of
R
n,n
and X
n
, respectively. Apparently,
a
n
and b
n
that minimize
2
n
in (17) are independent of each
other. Hence, the inner decoding simplies to two SE enumera-
tions applied to
L-PSK symbols a
n
and b
n
, respectively [14].
Consequently, outer and inner decoder for MSDD can be
integrated into one decoder searching a (2N 2)-dimensional
tree of
L-PSK symbols. If the MSDSD algorithm from
Section III-B.1 is applied, this decoder is a direct extension of
single-antenna MSDSD from [14] to DSTM in the same way
as for single-antenna MSDSD.
It is worth mentioning that recently, a similar description of
MSDD for DSTM with ODs has been presented in [26]. There,
for the special case of L = 16 (ODs with 4PSK symbols) and
block fading, the detection problemis formulated as a 4(N 1)-
dimensional search in binary variables. The scheme proposed in
[26] can, however, not be extended to arbitrary fading channels
as considered in this paper.
IV. SER ANALYSIS AND SUBSET MSDD
In this section, we rst derive an approximation for the SER
of ML MSDD (Section IV-A). In particular, we consider the
individual SERs for the N 1 data symbols in each block
V .
Evaluation of these individual SERs suggests that decoding of
only a subset of all N 1 symbols can be greatly benecial.
The corresponding subset MSDDalgorithmis presented in Sec-
tion IV-B.
A. SER Analysis
It is intuitive that due to different correlations between the
received samples in
R, symbol decisions on the N 1 data
symbols in
V are not equally reliable. Especially in relatively
fast fading environments it can be expected that symbols located
in the center of the observation window can be detected more
reliably than those at the edges. In order to substantiate this in-
tuitive reasoning, we consider the individual symbol-error rates
SER
n
for V
n
, 1 n N 1.
The pairwise error probability PEP(
S) for detect-
ing
S while
S ,=
S)
=
RHpoles
Res
_
1
s
NN
T
i=1
_
1 +s
i
_
rr[
S
F
__
N
R
_
(18)
where the summation in (18) is taken over all residues corre-
sponding to poles located in the right-hand (RH) side of the
complex s-plane,
i
(X) denotes the ith eigenvalue of X, and
rr[
=
S
D
_
C I
N
T
_
S
H
D
(19)
F
=
S
D
_
C
1
I
N
T
_
S
H
D
S
D
_
C
1
I
N
T
_
S
H
D
(20)
C
= toeplitz
hh
[0],
hh
[1], . . . ,
hh
[NN
T
1]
+
2
n
I
NN
T
(21)
S
D
= diag
S
1,:
,
S
2,:
, . . . ,
S
NN
T
,:
(22)
where
S
i,:
denotes the ith row of
S. Note that
C is the au-
tocorrelation matrix of the regular fading-plus-noise process
[see (2)]. Depending on the multiplicities of the eigenvalues
j
(
rr[
S
F), analytical or numercial evaluation of (18) may be
preferrable [27][30].
With this, the SER
n
for V
n
can be upper bounded using the
union bound, averaged over all L
N
transmit sequences:
SER
n
1
L
N
S,
V
n
,=V
n
PEP(
S). (23)
Evaluating (23) is computationally tractable only for small con-
stellations and observation window sizes N. In order to obtain
a simpler approximation for SER
n
we examine the PEP of (18)
more closely. It can be shown that it depends on
S and
S only
through the matrix
P(
C I
N
T
)P
H
S
H
D
S
D
_
C
1
I
N
T
_
S
H
D
S
D
_
C
1
I
N
T
_ _
(24)
where we introduced P
=
S
H
D
S
D
.
1) Group Constellations: For L being a power of two, diag-
onal (or cyclic) and dicyclic codes [24] fully represent full-rank
unitary group constellations. In both cases, P is a sparse ma-
trix with a single one in each row. Whereas P is constant and
independent of
S
D
for diagonal constellations, there are 2
N
dif-
ferent matrices P in case of dicyclic constellations. This means
that the set of L
N
matrices
S can be partitioned into K equiva-
lence classes
S
k
P
with respect to P, 1 k K, where K = 1
and K = 2
N
for diagonal and dicyclic constellations, respec-
tively. Furthermore, since
S
H
D
S
D
= diagS
H
1
S
1
, . . . , S
H
N
S
N
with S
H
n
S
n
V, 1 n N, for group constellations, the in-
ner sum of (23) is independent of which class representative
S
S
k
P
is chosen. We can, thus, limit the outer summation to
only K L
N
terms (K = 1 or K = 2
N
).
1572 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 8, AUGUST 2007
To reduce the number of terms of the inner sum in (23), we
propose to take only the dominating instead of all L
N1
1
relevant error events into account. Following a similar reason-
ing as in [28] for single-antenna DPSK and utilizing the results
of [17] for the design of unitary space-time constellations, these
dominant error events are found to correspond to matrix pairs
S,
S
H
= Re tr
S
H
n
S
n
, 1 n N (25)
is maximized, which also maximizes . Collecting all matrices
S = [S
T
1
. . . S
T
n1
S
T
n
S
T
n+1
. . . S
T
N
]
T
with
S
n
,= S
n
maxi-
mizing
n
in sets
S
n
, 1 n N, we can approximate SER
n
by
SER
n
1
K
S
k
P
k=1. . . K
_
_
_
S
n
PEP(
S) +
S
n+1
PEP(
S)
_
_
_
(26)
for 1 n N 1.
2) Nongroup Constellations: For nongroup constellations,
e.g., orthogonal [11] and Cayley [4] codes, a simplication
of (23) based on equivalence classes with respect to P is not
possible. Furthermore, it is not guaranteed that the quadruple
Q
n
= S
n
,
S
n
, S
n+1
,
S
n+1
with
S
n
,= S
n
,
S
n+1
= S
n+1
,
and
S
n
maximizing (25) is admissible, i.e., that there is a
V
n
V such that
S
n+1
=
V
n
S
n
. Hence, the correlation
n
(25) is not necessarily an appropriate indicator for the domi-
nant error events for every realization of
S. However, we found
from numerical evaluations that the SER of nongroup codes
is approximated quite accurately using (26) with a few can-
didates
S for which Q
n
is admissible and
S
n
according to
(25), 1 n N 1. For the performance evaluation in Sec-
tion V, we use this approximation with the Ladmissible matrices
S = [I
N
T
, (V
(l)
)
T
, I
N
T
, (V
(l)
)
T
, . . .]
T
, 1 l L.
3) Evaluation: At this point, it is illustrative to consider
SER
n
for the example of diagonal DSTM with parameters
N
T
= 3, R = 1, B
h
T = 0.03, and N
R
= 1. Fig. 1 shows nu-
merical results from (26) and simulation results for the SNR
in terms of 10 log
10
(E
b
/^
0
) required to achieve, respectively,
SER
n
= 10
5
(solid lines) and the average error rate SER =
(1/N 1)
N1
n=1
SER
n
= 10
5
(dashed lines) as function of
the position n, 1 n N 1, for different window sizes N.
MSDD is implemented using MSDSD with FS inner decoding
(MSDSD-FS). Also included is the SER for coherent detection
assuming perfect CSI.
First, we observe a good agreement between the SERapproxi-
mation from (26) and the simulated SER. Second, it can be seen
that the individual error rates SER
n
are almost identical for
symbols V
n
, 2 n N 2, but signicantly deteriorate for
symbols at the edges of the observation window. More speci-
Fig. 1. Required 10 log
10
(E
b
/^
0
) to achieve SER = 10
5
for position
n in observation window of MSDSD-FS. Parameters: Diagonal constellation,
N
T
= 3, N
R
= 1, R = 1, and B
h
T = 0.03.
cally, there are differences of 58 dB in power efciency when
comparing edge and nonedge positions.
B. Subset MSDD
The observations from Fig. 1 strongly suggest a variant of
MSDD, which we refer to as subset MSDD. For subset MSDD,
only the decoding decisions for the N
/
1 N 1 symbols
V
n
located in the middle of the observation window, i.e.,
(N N
/
)/2 < n < (N +N
/
)/2, N N
/
even, are passed to
the sink, whereas the other N N
/
estimates are discarded.
Consequently, the observation window must slide forward by
N
/
1 received symbols at a time, and the decoding complex-
ity is increased by a factor of (N 1)/(N
/
1).
For the example considered here, the error-rate results in
Fig. 1 suggest that only the decisions corresponding to the very
edge symbols at positions n = 1 and n = N 1 should be
discarded, i.e., N
/
= N 2 should be used. When comparing
the resulting average SER of subset MSDD with that of MSDD,
from Fig. 1 we observe gains in power efciency of 6.8, 3.0,
and 1.5 dB for N = 4, N = 10, and N = 20, while complexity
is increased by a factor of 3, 1.29, and 1.12, respectively. It is
also interesting to note that subset MSDD with N = 10 and
N
/
= 8 approaches coherent detection with perfect CSI within
2.0 dB, which is quite remarkable considering the low error
rate SER = 10
5
and the large normalized fading bandwidth
B
h
T = 0.03.
V. PERFORMANCE EVALUATION
In this section, we present numerical and simulated SER re-
sults (Section V-A) and a complexity comparison (Section V-
B) for the different proposed MSDD implementations. Table I
provides an overview of the different MSDD algorithms (see
Sections III-B, III-C, and IV-B), where check-marks indicate
that the subset MSDD variant can be applied, which we denote
by a sufx -S in the following, and that a particular inner
PAULI AND LAMPE: TREE-SEARCH MULTIPLE-SYMBOL DIFFERENTIAL DECODING FOR UNITARY SPACE-TIME MODULATION 1573
TABLE I
SUMMARY OF ALGORITHMS FOR DIFFERENTIAL DETECTION FOR UNITARY DSTM
Fig. 2. Comparison of various decoders with N = 10 (CDD with N = 2).
Required 10 log
10
(E
b
/^
0
) to achieve SER = 10
5
versus B
h
T. Diagonal
constellation with N
T
= 3, N
R
= 1, R = 1. Lines: numerical results, Mark-
ers: simulation results.
decoder is applicable, which is denoted by an (additional) sufx
-FS (all constellations), -LD (only for diagonal constel-
lations), respectively. As benchmark algorithms, we also con-
sider: 1) CDD (N = 2); 2) DFDD [5], [12]; and 3) the branch-
intersect-detect MSD decoder (MSDD-BID) of [10], which is
a suboptimal MSDD algorithm for diagonal constellations. Un-
less specied otherwise, the channel model according to (2) is
applied for simulation.
A. SER Results
We present SER results for three illustrative examples. In
case of DSTM with diagonal constellations we used parameters
[c
1
, . . . , c
N
T
] from [2, Table I].
1) Diagonal Constellation With R = 1, N
T
= 3, N
R
= 1,
and MSDD With N = 10: Fig. 2 compares various decoders
(with FSinner decoding) in terms of the SNRrequired to achieve
SER = 10
5
as function of the normalized fading bandwidth
B
h
T. Both numerical results according to Section IV-A (solid
lines) and simulation results (markers) are plotted. As reference
curves, the SNRfor MSDSD-FS under block-fading (B
h
T = 0)
and for coherent detection with perfect CSI are also shown.
First, we note the good match of the SERapproximation from
Section IV-A and the simulated SER. The slight uctuations
Fig. 3. Comparison of FS and LD-based inner decoding. SER versus
10 log
10
(E
b
/^
0
) for MSDSD with N = 10, MSDSD-S with N = 10 and
N
/
= 8, DFDD, and CDD. Diagonal constellation with N
T
= 3, N
R
=
1, R = 1, and B
h
T = 0.03. Solid lines: FS; Dashed lines: LD.
of the numerically obtained curves are due to the particular
fading autocorrelation according to the Bessel function. Second,
we observe that CDD-FS suffers from a relatively high error
oor already in moderately fast fading with B
h
T 0.006. The
performance of MSDD-BID approaches that of ML MSDD for
B
h
T 0.01, but rapidly deteriorates as B
h
T grows. MSDSD-
FS is considerably more robust to fast fading than MSDD-BID,
and also outperforms DFDD by about 26 dB depending on
the fading bandwidth. Suboptimal MSDD-Fano-FS performs
within 0.30.7 dB of optimal MSDSD-FS. Finally, it can be
seen that subset MSDD (MSDSD-S-FS) signicantly improves
performance in the fast fading regime. Almost all the gain in
power efciency is already accomplished with N
/
= 8, which
corresponds to a moderate complexity increase by a factor of
1.29 compared to MSDD with N = 10.
In Fig. 3, we evaluate the performance with inner LD for
an exemplary fading bandwidth of B
h
T = 0.03. MSDSD,
MSDSD-S, and DFDD with N = 10 and CDD are considered,
and LD inner decoding (dashed lines) is compared with FS
inner decoding (solid lines). It can be seen that the cosine-
approximation involved in LD causes only small performance
degradations in the order of 0.20.3 dB regardless of the partic-
ular detector. We also observed (not shown in Fig. 3) that even
1574 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 8, AUGUST 2007
Fig. 4. Comparison of various decoders with N = 6 (CDD with N = 2)
and LD-based inner decoding. SER versus 10 log
10
(E
b
/^
0
) for diagonal
constellation with N
T
= 3, N
R
= 1, R = 2, and B
h
T = 0.03.
smaller performance losses occur at higher rates R > 1, since
the cosine-approximation becomes more accurate.
2) Diagonal Constellation With R = 2, N
T
= 3, N
R
=
1, B
h
T = 0.03, and MSDD With N = 6: The power efciency
of MSDSD-FM is compared to optimal MSDSD and MSDD-
Fano in Fig. 4. As FS inner decoding is computationally com-
plex for this scenario with L = 64, we consider LD in all cases.
The respective curves for CDD-LD, DFDD-LD, and MSDD-
BID are also included for comparison. MSDSD-FM-LD clearly
outperforms CDD-LD and MSDD-BID, which suffer from a
high error oor in this fast fading scenario. We further observe
that the use of the suboptimal Fano-type metric results in rela-
tively small performance losses of 0.51.0 dB. As expected,
the performance of MSDSD-FM-LD is very similar to that
of MSDD-Fano-LD. Finally, MSDSD-FM-LD and especially
MSDSD-FM-S-LD achieve signicant gains over DFDD-LD,
which has been studied in detail in [12].
3) Orthogonal Design With R = 1, N
T
= 2, N
R
= 1, and
MSDD With N = 10: Fig. 5 shows the performance of DSTM
with ODs in terms of the SNRrequired to achieve SER = 10
5
.
MSDSD, using the efcient decoder described in Section III-C,
is compared with DFDD and CDD (N = 2), respectively.
Numerical results (see Section IV-A, lines) and simulation
results (markers) are plotted. Since for DSTM with ODs the
N
T
antennas transmit constantly, the QSFC model is only an
approximation. In order to illustrate the degradation due to the
deviation of the QSFC model from the underlying channel (2),
numerical and simulation results assuming the QSFC model
are also included in Fig. 5 (dashed lines).
First, we observe that SERs fromthe approximation in Section
IV-A and simulated SERs closely match also for this nongroup
constellation. Furthermore, it can be seen that the performance
is limited by channel variations during the transmission of one
STsymbol, which are not accounted for in the QSFCmodel, and
thus, cause an error oor regardless of the detection algorithm.
MSDSD and DFDD can cope with much faster fading than
Fig. 5. Comparison of various decoders with N = 10 (CDD with N = 2).
Required 10 log
10
(E
b
/^
0
) to achieve SER = 10
5
versus B
h
T. Orthogonal
design with N
T
= 2, N
R
= 1, and R = 1. Fading channel model according
to (2) (solid lines) and QSFC (dashed lines). Lines: numerical results, Markers:
simulation results.
CDD. The performance gains due to MSDSD over DFDD are
between about 1 dB and 4 dB depending on B
h
T.
B. Computational Complexity
We now compare the computational complexity of the pro-
posed algorithms. In particular, we consider the effects of a nite
initial search radius
init
(see Section III-B) and the use of sphere
and stack decoding algorithms for inner LD (see Section III-C);
and compare the complexity of the proposed MSDD implemen-
tations with those of CDD, DFDD, and MSDD-BID. As a mea-
sure of complexity, we choose the average number of real-valued
ops per decoded symbol. Exemplarily, we present results for di-
agonal DSTMwith R = 2, N
T
= 3, N
R
= 1, B
h
T = 0.03, and
MSDD with N = 6 (see Fig. 4 for performance results).
1) Initial Radius: To study the effect of the initial radius
we consider MSDSD-FS (the results are very similar for other
MSDSD algorithms). Fig. 6 depicts the complexity for differ-
ent SNRs as a function of the constant c introduced in (13).
While complexity increases if the initial radius is chosen too
small, substantial complexity savings of up to 60% compared to
init
can be obtained by choosing an optimal nite start
radius. It is interesting to note that for the optimal parameter
c 1.7, the transmitted block
V lies inside the sphere with ra-
dius
init
with a probability of about 95% independent of the
SNR. It can also be observed that, for this relatively fast fading
example, the complexity of MSDSD with innite initial radius
may increase for larger SNR. This is due to the fact that for high
SNR the radius after the rst tentative decision
V may be much
larger than the metric for the ML decision. A nite start radius,
on the other hand, leads to the more intuitive result that the de-
coding complexity decreases monotonically with SNR. All of
the following results assume the optimal nite start radius with
c = 1.7.
PAULI AND LAMPE: TREE-SEARCH MULTIPLE-SYMBOL DIFFERENTIAL DECODING FOR UNITARY SPACE-TIME MODULATION 1575
Fig. 6. Complexity of MSDSD-FS with N = 6 versus parameter c [initial
squared radius
2
init
= (N 1)N
T
N
R
c]. Diagonal constellation with N
T
=
3, N
R
= 1, R = 2, and B
h
T = 0.03.
Fig. 7. Complexity versus 10 log
10
(E
b
/^
0
) for different MSDSD algo-
rithms with N = 6. Diagonal constellation with N
T
= 3, N
R
= 1, R = 2,
and B
h
T = 0.03. Solid lines: without Fano-type metric, Dashed lines: with
Fano-type metric. LD with sphere and stack decoder, respectively.
2) MSDSD versus MSDSD-FM and Inner Decoding With FS
versus LD: Fig. 7 compares the different possibilities to reduce
the complexity of MSDSD, i.e., MSDSD versus MSDSD-FM
and MSDSD(-FM) with inner FS versus inner LD are consid-
ered. LDis implemented using sphere and stack decoder, respec-
tively. It can be seen that the complexity of MSDSD decreases
rapidly with increasing SNR, since the search quickly terminates
for small enough noise. Inner LDleads to a signicant complex-
ity reduction, and the stack decoder is clearly advantageous over
the sphere decoder. Furthermore, applying the Fano-type met-
ric speeds up the search especially in the lower SNR range, as
was anticipated in Section III-B. Combining Fano-type metric
with inner LD using the stack algorithm, i.e., MSDSD-FM-LD,
yields a tremendous complexity reduction by a factor of about
10 to 50 compared to MSDSD-FS in the range of relevant SNR.
Fig. 8. Complexity versus 10 log
10
(E
b
/^
0
) for MSDSD-FM-LD, MSDD-
Fano-LD, and benchmark algorithms with N = 6 (CDD-LD with N = 2).
Diagonal constellation with N
T
= 3, N
R
= 1, R = 2, and B
h
T = 0.03.
MSDSD-LD-FM with inner stack algorithm.
This gain comes at the expense of a less than 1 dB loss in power
efciency (see Fig. 4).
3) Comparison With Benchmark Decoders: Fig. 8 compares
the average complexity of MSDSD-FM-LD considered earlier
with those of MSDD-Fano-LD and the benchmark detectors
CDD-LD, DFDD-LD, and MSDD-BID, respectively. While
achieving a similar power efciency (Fig. 4), MSDSD-FM-LD
requires a lower complexity than MSDD-Fano-LD. The com-
plexity of MSDSD-FM-LD is also well in the range of those
of DFDD-LD and CDD-LD, which are less power efcient and
suffer from a high error oor, respectively. MSDD-BID, which
also shows a high error oor for this transmission scenario, is
considerably more complex than MSDSD-FM-LD.
We, thus, conclude that MSDSD-FM-LD and its subset
MSDD version with the stack algorithm for inner LD offer
the overall best performance-complexity tradeoff.
VI. CONCLUSION
In this paper, we have investigated tree-search MSDD. We
have proposed a nested decoder structure consisting of an outer
and N 1 inner tree-search decoders, where the inner decoders
efciently search the signal space of the potentially large DSTM
constellation. Decoder designs tailored for diagonal and orthog-
onal DSTM codes have been given. Considering the underlying
continuous MIMO fading channel, a tight SER approximation
for MSDDhas been obtained, which can be evaluated efciently.
The novel subset MSDD scheme further improves power ef-
ciency of MSDD by discarding less reliable decisions at the
edges of the observation window. Numerical and simulation
results presented for different DSTM constellations and fad-
ing channel scenarios have shown: 1) to what extent MSDD
outperforms benchmark decoders increasing the range of fad-
ing bandwidths for which DSTM is applicable; 2) that sphere
decoding with a Fano-type metric (MSDSD-FM) achieves
the best performance-complexity tradeoff with a complexity
1576 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 8, AUGUST 2007
comparable to that of DFDD in most cases; 3) that inner stack
decoding is favorably applied to provide an ordered sequence
of candidate signal points for outer tree-search decoding; and
4) that subset MSDD yields impressive additional performance
gains at a very moderate increase in complexity.
REFERENCES
[1] B. L. Hughes, Differential space-time modulation, IEEE Trans. Inf.
Theory, vol. 46, no. 7, pp. 25672578, Nov. 2000.
[2] B. M. Hochwald and W. Sweldens, Differential unitary space-time mod-
ulation, IEEE Trans. Commun., vol. 48, no. 12, pp. 20412052, Dec.
2000.
[3] V. Tarokh and H. Jafarkhani, A differential detection scheme for transmit
diversity, IEEE J. Sel. Areas Commun., vol. 18, no. 7, pp. 11681174,
Jul. 2000.
[4] B. Hassibi and B. M. Hochwald, Cayley differential unitary space-time
codes, IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 14851503, Jun. 2002.
[5] R. Schober and L. Lampe, Noncoherent receivers for differential space-
time modulation, IEEE Trans. Commun., vol. 50, no. 5, pp. 768777,
May 2002.
[6] B. Bhukania and P. Schniter, Multiple-symbol detection of differential
unitary space-time modulation in fast-Fading channels with known corre-
lation, in Proc. Conf. Inf. Sci. Syst., Princeton, NJ, Mar. 2002.
[7] D. Divsalar and M. K. Simon, Multiple-symbol differential detection of
MPSK, IEEE Trans. Commun., vol. 38, no. 3, pp. 300308, Mar. 1990.
[8] C. Gao, A. M. Haimovich, and D. Lao, Multiple-symbol differential de-
tection for space-time block codes, in Proc. Conf. Inf. Sci. Syst., Princeton,
NJ, Mar. 2002.
[9] P. Tarasak and V. K. Bhargava, Reduced complexity multiple symbol
differential detection of space-time block code, in Proc. IEEE Wireless
Commun. Netw. Conf., Orlando, FL, Mar. 2002, vol. 1, pp. 505509.
[10] T. Cui and C. Tellambura, Multiple-symbol differential unitary space-
time demodulation with reduced-complexity, in Proc. IEEE Int. Conf.
Commun., vol. 2, Seoul, Korea, May 2005, pp. 762766.
[11] S. M. Alamouti, A simple transmitter diversity scheme for wireless com-
munications, IEEEJ. Sel. Areas Commun., vol. 16, no. 7, pp. 14511458,
Oct. 1998.
[12] C. Ling, W. H. Mow, K. H. Li, and A. C. Kot, Multiple-antenna dif-
ferential lattice decoding, IEEE J. Sel. Areas Commun., vol. 23, no. 9,
pp. 18211829, Sep. 2005.
[13] E. Chiavaccini and G. M. Vitetta, Further results on differential space-
time modulations, IEEE Trans. Commun., vol. 51, no. 7, pp. 10931101,
Jul. 2003.
[14] L. Lampe, R. Schober, V. Pauli, and C. Windpassinger, Multiple-symbol
differential sphere decoding, in Proc. IEEE Int. Conf. Commun., Jun.
2004, vol. 2, pp. 787791.
[15] P. Pun and P. Ho, The performance of Fano-multiple symbol differential
detection, in Proc. IEEE Int. Conf. Commun., Seoul, Korea, May 2005,
pp. 25162521.
[16] R. H. Clarke, A statistical theory of mobile-radio reception, Bell Syst.
Tech. J., vol. 47, pp. 9571000, Jul. 1968.
[17] B. M. Hochwald, T. L. Marzetta, T. J. Richardson, W. Sweldens, and
R. Urbanke, Systematic design of unitary space-time constellations,
IEEE Trans. Inform. Theory, vol. 46, no. 6, pp. 19621973, Sep. 2000.
[18] G. M. Vitetta and D. P. Taylor, Maximumlikelihood decoding of uncoded
and coded PSK signal sequences transmitted over Rayleigh at-Fading
channels, IEEE Trans. Commun., vol. 43, no. 11, pp. 27502758, Nov.
1995.
[19] A. D. Murugan, H. El Gamal, M. O. Damen, and G. Caire, A unied
framework for tree search decoding: Rediscovering the sequential de-
coder, IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 933953, Mar. 2006.
[20] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, Closest point search in
lattices, IEEE Trans. Inf. Theory, vol. 48, no. 8, pp. 22012214, Aug.
2002.
[21] O. Damen, H. El Gamal, and G. Caire, On maximumlikelihood detection
and the search for the closest lattice point, IEEE Trans. Inf. Theory,
vol. 49, no. 10, pp. 23892402, Oct. 2003.
[22] R. Johannesson and K.
Sh. Zigangirov, Fundamentals of Convolutional
Coding. Piscataway, NJ: IEEE, Feb. 1999.
[23] Z. Guo and P. Nilsson, Reduced complexity SchnorrEuchner decoding
algorithms for MIMO systems, IEEE Trans. Commun., vol. 8, no. 5,
pp. 286288, May 2004.
[24] B. L. Hughes, Optimal space-time constellations from groups, IEEE
Trans. Inf. Theory, vol. 49, no. 2, pp. 401410, Feb. 2003.
[25] K. L. Clarkson, W. Sweldens, and A. Zheng, Fast multiple antenna dif-
ferential decoding, IEEE Trans. Commun., vol. 49, no. 2, pp. 253261,
Feb. 2001.
[26] W.-K. Ma, C.-Y. Chi, and P. C. Ching, Extended differential unitary
space-time modulation: A non-coherent scheme with error penalty less
than 3 dB, in Proc. Int. Conf. Acoust., Speech Signal Process, Toulouse,
France, May 2006, vol. 4, pp. 529532.
[27] M. Brehler and M. K. Varanasi, Asymptotic error probability anal-
ysis of quadratic receivers in Rayleigh-fading channels with applica-
tion to a unied analysis of coherent and noncoherent space-time re-
ceivers, IEEE Trans. Inf. Theory, vol. 47, no. 6, pp. 23832399,
Sep. 2001.
[28] P. Ho and D. Fung, Error performance of multiple-symbol differen-
tial detection of PSK signals transmitted over correlated Rayleigh fad-
ing channels, IEEE Trans. Commun., vol. 40, no. 10, pp. 2529,
Oct. 1992.
[29] E. Biglieri, G. Caire, G. Taricco, and J. Ventura-Traveset, Computing
error probabilities over Fading channels: Aunied approach, Eur. Trans.
Telecommun., vol. 9, pp. 1525, 1998.
[30] S. Siwamogsatham, M. P. Fitz, and J. H. Grimm, A new view of per-
formance analysis of transmit diversity schemes in correlated Rayleigh
fading, IEEE Trans. Inf. Theory, vol. 48, no. 4, pp. 950956, Apr.
2002.
Volker Pauli (S04) received the Diploma degree
in electrical engineering from the University of
Erlangen, Erlangen, Germany, in 2003.
He is currently a Research Assistant in
the Telecommunications Laboratory, University of
Erlangen. He was a Visiting Scholar at the Ecole
Polytechnique Federale de Lausanne, Lausanne,
Switzerland, in 2002 and the University of British
Columbia, Vancouver, Canada, in 2003 and 2005.
His current research interests include noncoherent
detection, iterative receiver structures, multiple-input
multiple-output (MIMO) systems, and multicarrier modulation and equalization.
Lutz Lampe (S98M02) received the Diploma and
the Ph.D. degree from the University of Erlangen,
Germany, in 1998 and 2002, respectively, both in
electrical engineering.
He is currently an Associate Professor in the De-
partment of Electrical and Computer Engineering,
University of British Columbia, Vancouver, Canada.
His current research interests include the areas of
communications and information theory applied to
wireless and powerline transmission.
Dr. Lampe has been a Vice-Chair of the IEEE
Communications Society Technical Committee on Power Line Communica-
tions since 2004. He was General Chair of the 2005 International Symposium
on Power Line Communications and its Applications (ISPLC 2005) and a Co-
Chair of the General Symposium of the 2006 IEEE Global Telecommunications
Conference (Globecom 2006). He is an Editor for the IEEE TRANSACTIONS
ON WIRELESS COMMUNICATIONS and has been an Associate Editor for the
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY from 2004 to 2007. He
was the co-recipient of the Eurasip Signal Processing Journal Best Paper Award
2005, the Best Paper Award at the IEEE International Conference on Ultra-
Wideband (ICUWB) 2006, and the Best Student Paper Awards at the European
Wireless Conference 2000 and at the International Zurich Seminar 2002. In
2003, he received the Dissertation Award of the German Society of Information
Techniques (ITG).