
On partial transmit sequences for PAR reduction in OFDM systems

2008, IEEE Transactions on Wireless Communications

Partial transmit sequences (PTS) is a popular technique to reduce the peak-to-average power ratio (PAR) in orthogonal frequency division multiplexing (OFDM) systems. PTS is highly successful in PAR reduction and efficient redundancy utilization, but the considerable computational complexity for the required search through a high-dimensional vector space and the necessary transmission of side information (SI) to the receiver are potential problems for a practical implementation. In this paper, we revisit PTS for PAR reduction and tackle these two problems. To address the complexity issue, we formulate the search problem of PTS as a combinatorial optimization (CO) problem. This enables us to (i) unify various search strategies proposed earlier in the PTS literature and (ii) adapt efficient search algorithms known from the CO literature to PTS. We also propose a modified PTS objective function, which reduces the number of multiplications required for PTS. Numerical results show that, perhaps surprisingly, simple random search yields the best performance-complexity tradeoff for moderate PAR reduction, whereas two novel CO-based methods excel if close-to-optimum PAR reduction is desired. The SI transmission problem is solved by a simple preprocessing of the data stream before PAR reduction. This preprocessing introduces the minimal possible redundancy and allows SI embedding without affecting the PAR reduction capability of PTS or causing peak regrowth. Index terms: Orthogonal frequency division multiplexing (OFDM), peak-to-average power ratio (PAR) reduction, partial transmit sequences (PTS), combinatorial optimization, simulated annealing, tabu search, trellis shaping, side information.

Trung Thanh Nguyen and Lutz Lampe†
Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia, Canada
Email: {trungn,lampe}@ece.ubc.ca
∗ This work has been accepted in part for presentation at the 2006 IEEE Global Communications Conference (Globecom).
† Corresponding author.

Revised as Paper for publication in the IEEE Transactions on Wireless Communications, November, 2006.

1 Introduction

Orthogonal frequency division multiplexing (OFDM) is a widely used modulation technique for wireless communication over frequency-selective channels. One of its main drawbacks is the large peak-to-average power ratio (PAR) of the time-domain transmit signal. If not processed, high peaks in the OFDM signal can lead to unwanted saturation in the power amplifier. As a result, expensive linear power amplifiers need to be employed or nonlinear amplifiers must be operated power-inefficiently to avoid error-rate performance degradation and out-of-band radiation.

To alleviate this problem, many PAR reduction techniques have been proposed in the literature, cf. e.g. [1] for an overview. One of the classical and most popular techniques is known as partial transmit sequences (PTS) [2, 3]. In PTS, non-overlapping subsets of OFDM subcarriers are formed, rotated independently, and combined again. Since the signal representations corresponding to different rotations exhibit different PARs, selecting the representation with the minimum PAR leads to PAR reduction. PTS is known to achieve a high performance and redundancy utilization, but implementation problems arise from (i) a relatively high computational complexity for searching the optimum sequence of rotation factors and (ii) the need to transmit side information (SI) about the selected sequence to the receiver to undo the rotation of OFDM subcarriers.

In this paper, we take a fresh look at PTS for PAR reduction and propose solutions for both the above-mentioned problems. To tackle the complexity issue of PTS, we formulate the sequence search of PTS as a particular combinatorial optimization (CO) problem.
This new perspective enables us to (i) unify various heuristics for reducing the computational complexity of PTS presented previously in [4, 5] under the CO framework, and (ii) present two new efficient search strategies for PTS based on well-known CO algorithms. We also propose a modified PTS objective function based on an approximation of the peak amplitude, which dramatically reduces the number of multiplications required for the peak-amplitude search.

The embedding of SI for PTS has been addressed repeatedly in the literature. Efficient schemes with explicit SI transmission have been devised in e.g. [6, 7]. SI embedding, in which the SI is not explicitly transmitted via dedicated OFDM subcarriers, has been proposed in the form of differential encoding [3, 8], marking of subcarriers [9, 10], and choosing PTS vectors from special codebooks [11]. Some of the disadvantages of these methods are peak regrowth [9, 10], increased redundancy [3, 8], inapplicability to general search algorithms [9, 10, 11], and increased detection complexity [9, 10, 11]. We propose a new method to embed the PTS SI into the OFDM transmit signal, which is inspired by the trellis-shaping technique considered in [12, 13] for PAR reduction. It requires only minimal additional redundancy and complexity, does not affect the PAR reduction algorithm, avoids peak regrowth, and practically does not degrade the overall error-rate performance of the OFDM system.

We would like to point out that the different solutions devised in this paper are applicable to PTS in a modular fashion, i.e., the CO-based search algorithms, the modified objective function, and the new SI embedding scheme can be applied individually or in combination to any PTS scheme.

Organization: Section 2 briefly introduces the PAR of an OFDM signal and reviews the PTS technique for PAR reduction.
In Section 3, the sequence search of PTS is formulated as a CO problem and efficient solutions, i.e., search strategies, are presented. The modified objective function is devised in Section 4, and the new SI embedding scheme is developed in Section 5. Numerical performance and complexity results are presented in Section 6. Section 7 concludes this paper.

Notation: $(\cdot)^T$, $\oplus$, $E\{\cdot\}$, and $\|\cdot\|_\infty$ denote transpose, modulo-2 addition, statistical average, and the max norm (or $\ell_\infty$ norm), respectively. $0_K$, $1_K$, and $I_K$ denote, respectively, the all-zero vector of length $K$, the all-one vector of length $K$, and the $K \times K$ identity matrix. $\Re\{\cdot\}$ and $\Im\{\cdot\}$ are the real and imaginary parts of a complex number or vector. Finally, the $K$-dimensional vector $x = \mathrm{IDFT}_{K \times P}(X)$ is the inverse discrete Fourier transform (IDFT) of the $P$-dimensional vector $X$, $P \le K$, after appropriate zero-padding of $X$.

2 Preliminaries

In this section, we briefly review the PAR of OFDM signals and the PTS technique for PAR reduction.

2.1 PAR of OFDM Signal

We consider OFDM transmission with $N$ subcarriers and subcarrier spacing $\Delta f = 1/(NT)$, where $T$ is the modulation interval and $NT$ is the duration of the OFDM symbol excluding the guard interval. $M$-ary data symbols $X_k$ taken from a quadrature-amplitude modulation (QAM) or phase-shift keying (PSK) constellation are assigned to OFDM subcarriers at frequencies $f_k = k\Delta f$, $0 \le k \le N-1$. As is common in the literature (cf. e.g. [4, 14, 5, 13, 1]), we model the complex envelope of the OFDM signal by

$$ x(t) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k e^{j 2\pi f_k t}, \quad 0 \le t < NT, \tag{1} $$

where an idealized rectangular time-domain window is assumed. We also consider a cyclic prefix extension of $x(t)$, which does not alter the PAR. Then, the PAR of the continuous-time signal $x(t)$ is defined as

$$ \zeta^c(x(t)) \triangleq \frac{\max_{0 \le t < NT} \{|x(t)|^2\}}{E\{|x(t)|^2\}}. \tag{2} $$

In PTS, PAR reduction is based on the sampled, discrete-time signal

$$ x_n \triangleq x(nT/L) = \frac{1}{\sqrt{N}} \sum_{k=0}^{N-1} X_k e^{j 2\pi k n/(LN)}, \quad 0 \le n < LN, \tag{3} $$

where $L$ is the oversampling factor. Accordingly, let us define the vectors $X \triangleq [X_0 \ldots X_{N-1}]$ and $x \triangleq [x_0 \ldots x_{LN-1}] = \mathrm{IDFT}_{LN \times N}(X)$. The corresponding PAR approximation follows as (with $\sigma_x^2 \triangleq E\{|x_n|^2\}$)

$$ \zeta(x) \triangleq \frac{\max_{0 \le n < LN} \{|x_n|^2\}}{E\{|x_n|^2\}} = \frac{\|x\|_\infty^2}{\sigma_x^2}. \tag{4} $$

We note that, since $E\{|x(t)|^2\} = E\{|x_n|^2\}$, the continuous-time PAR is not smaller than the discrete-time PAR, i.e., $\zeta(x) \le \zeta^c(x(t))$. According to the results in [15, 16, Chapter 3], a good approximation $\zeta(x) \approx \zeta^c(x(t))$ is achieved for $L \ge 4$. We therefore adopt $L = 4$ when showing our numerical results in Section 6. Finally, we introduce the complementary cumulative distribution function (CCDF) of the PAR

$$ F_\zeta(\zeta_0) \triangleq \Pr\{\zeta(x) > \zeta_0\}, \tag{5} $$

which is commonly used to quantify the efficacy of a PAR reduction scheme.

2.2 PTS for PAR Reduction

In PTS [2, 3], data symbols in $X$ are partitioned into $V$ disjoint subblocks $X^{(v)} \triangleq [X_0^{(v)} \ldots X_{N-1}^{(v)}]$ with $X_k^{(v)} = X_k$ or $0$, $0 \le v \le V-1$, such that

$$ X = \sum_{v=0}^{V-1} X^{(v)}. \tag{6} $$

The number of non-zero components of $X^{(v)}$ is denoted by $n_v$, which in general is different for different $v$. Three partitioning strategies have been proposed in [3, 8], and we apply the random partitioning strategy, which typically yields the best PAR reduction performance, for the numerical results presented in Section 6. The subblocks $X^{(v)}$ are transformed into $V$ time-domain partial transmit sequences

$$ x^{(v)} \triangleq [x_0^{(v)} \ldots x_{LN-1}^{(v)}] = \mathrm{IDFT}_{LN \times N}(X^{(v)}). \tag{7} $$

These sequences are independently rotated by some phase factors $b_v \triangleq e^{j\phi_v}$ and then combined to produce the time-domain OFDM signal

$$ x = \sum_{v=0}^{V-1} b_v x^{(v)}. \tag{8} $$

Assuming that $\phi_v$ and thus $b_v$ can attain $W$ different values and taking into account that $b_0 = 1$ can be fixed without loss in PAR reduction, there are $W^{V-1}$ alternative representations for an OFDM symbol. These representations correspond to all possible vectors $b \triangleq [b_1 \ldots b_{V-1}]$. PTS selects a vector $\hat{b}$, as described in detail in Section 3 below, such that the PAR $\zeta(\hat{x})$ of the corresponding transmit sequence $\hat{x}$ is the lowest among all examined sequences. The SI that needs to be transmitted to inform the receiver about the selected vector is $R = (V-1)\log_2(W)$ bits. Since PTS with binary weighting factors $b_v \in \{\pm 1\}$, i.e., $W = 2$ and $R = V-1$, attains a favorable performance-redundancy tradeoff (cf. e.g. [8]), we concentrate on this choice in the following.

3 Search Strategies for PTS

In this section, we first formulate the sequence search of PTS as a binary CO problem and identify various known implementations for this search as so-called descent heuristics. We then adapt two search strategies from the CO literature to PTS and optimize their respective parameters.

3.1 Sequence Search as a CO Problem

Using the notation introduced in Section 2, we can state the optimization problem of PTS, i.e., to find the vector $b$ that yields the transmit signal with the minimum PAR, as the following binary CO problem:

$$ \underset{b}{\text{minimize}} \;\; f(b) \quad \text{subject to: } b \in \{\pm 1\}^{V-1}, \tag{9} $$

with the objective function (see (4), (8))

$$ f(b) = \|x\|_\infty^2 = \left\| \sum_{v=0}^{V-1} b_v x^{(v)} \right\|_\infty^2. \tag{10} $$

Thereby, we have exploited the fact that the phase rotations in (8) do not change the average signal power $\sigma_x^2$. Finding the exact solution of (9) requires full enumeration of all $2^{V-1}$ possible phase vectors. Each evaluation of the objective function involves (i) combining partial transmit sequences $x^{(v)}$ and (ii) computing the peak power of the corresponding sequence $x$.
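As an illustration of (3), (4), and (7)-(10), the following minimal Python sketch builds the partial transmit sequences for a toy OFDM symbol and solves (9) by full enumeration. The sizes (N = 16, V = 4), the QPSK alphabet, the seed, and all function names are assumptions chosen to keep the example small; they are not the parameters or the implementation used for the numerical results.

```python
import cmath
import itertools
import math
import random

def idft_oversampled(X, L):
    """Zero-padded IDFT as in (3): N symbols -> L*N time-domain samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / (L * N)) for k in range(N))
            / math.sqrt(N) for n in range(L * N)]

def combine(parts, b):
    """x = sum_v b_v x^(v) as in (8); b holds b_1..b_{V-1}, b_0 = 1 is fixed."""
    weights = (1,) + tuple(b)
    return [sum(w * p[n] for w, p in zip(weights, parts))
            for n in range(len(parts[0]))]

def peak_power(x):
    """Squared max norm ||x||_inf^2, i.e. the objective f(b) in (10)."""
    return max(abs(s) ** 2 for s in x)

def par_db(x, sigma2):
    """Discrete-time PAR (4) in dB for a given average power sigma2."""
    return 10 * math.log10(peak_power(x) / sigma2)

def full_enumeration(parts):
    """Exact solution of (9): evaluate all 2^(V-1) sign vectors."""
    best_b, best_f = None, float("inf")
    for b in itertools.product((1, -1), repeat=len(parts) - 1):
        f = peak_power(combine(parts, b))
        if f < best_f:
            best_b, best_f = b, f
    return best_b, best_f

# Toy setup (assumed): QPSK symbols, random partition into V equal-size subblocks.
rng = random.Random(0)
N, L, V = 16, 4, 4
X = [rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) for _ in range(N)]
owner = [k % V for k in range(N)]
rng.shuffle(owner)                      # subcarrier k belongs to subblock owner[k]
parts = [idft_oversampled([X[k] if owner[k] == v else 0 for k in range(N)], L)
         for v in range(V)]

sigma2 = 2.0                            # E{|x_n|^2} for this QPSK alphabet
x_plain = combine(parts, (1,) * (V - 1))
b_opt, f_opt = full_enumeration(parts)
x_opt = combine(parts, b_opt)
print(f"PAR without PTS: {par_db(x_plain, sigma2):.2f} dB")
print(f"PAR with optimal PTS: {par_db(x_opt, sigma2):.2f} dB, b = {b_opt}")
```

Because the IDFT is linear, the partial transmit sequences need to be computed only once per OFDM symbol; each trial vector then costs only sign changes, additions, and one peak search.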
Assuming that complex numbers are stored as pairs of real and imaginary parts and since $b_v \in \{\pm 1\}$, calculating the transmit sequence involves $2NL(V-1)$ additions and computing the peak power requires $2NL$ real multiplications per trial vector. Especially the large number of multiplications renders full enumeration computationally prohibitive even for moderate values of $V$. Hence, the development of nonexact (in the sense that the chosen solution cannot be guaranteed to be optimal with respect to (9)) PTS algorithms performing a reduced number of searches for the best phase vector becomes attractive. We refer to such algorithms as heuristics [17] and from the CO literature there is a rich set of computationally efficient heuristics at our disposal (for a detailed discussion we refer the reader to [17]). In the following, we (re-)introduce nonexact PTS algorithms which fall into two classes, so-called descent heuristics (Section 3.2) and metaheuristics (Section 3.3).

In passing, it is worth noting that following similar steps as in [16, Section 4.3], the CO problem (9) could be written as a mixed integer (MI) linear program (LP) if $x^{(v)}$ are real-valued and as a MI quadratically constrained quadratic program (QCQP) if $x^{(v)}$ are complex-valued, respectively. However, relaxation of the integrality restrictions to arrive at a computationally simpler LP and QCQP, respectively, will often return the all-zero vector as solution. Hence, the relaxation approach frequently applied to CO problems is not useful in the case of PTS.

3.2 Descent Heuristics for PTS

In the description of heuristics in this paper, $\tilde{b}$ is the current solution and $\hat{b}$ is the best-so-far solution. A move is an attempt to replace the current solution $\tilde{b}$ by a trial solution $b$. In descent heuristics, only downhill moves, i.e., $\tilde{b}$ is replaced by $b$ only if $f(b) < f(\tilde{b})$, are accepted.
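The downhill-only acceptance rule can be sketched in a few lines of Python. The trial moves used here are single sign flips visited in cyclic order, which is one possible choice; the random complex Gaussian sequences standing in for the partial transmit sequences, the sizes, the seed, and the function names are all assumptions made for illustration.

```python
import random

def peak_power(parts, b):
    """f(b) = max_n |sum_v b_v x_n^(v)|^2 with b_0 = 1 fixed."""
    w = [1] + list(b)
    return max(abs(sum(wv * p[n] for wv, p in zip(w, parts))) ** 2
               for n in range(len(parts[0])))

def cyclic_bit_flipping(parts, max_evals):
    """Descent search: flip one sign at a time and keep it only if f decreases."""
    V = len(parts)
    b_cur = [1] * (V - 1)
    f_cur = peak_power(parts, b_cur)
    evals, pos, rejected_in_a_row = 1, 0, 0
    # Stop when the budget is spent or a full cycle brought no improvement.
    while evals < max_evals and rejected_in_a_row < V - 1:
        trial = list(b_cur)
        trial[pos] = -trial[pos]
        f_trial = peak_power(parts, trial)
        evals += 1
        if f_trial < f_cur:              # downhill move: accept
            b_cur, f_cur = trial, f_trial
            rejected_in_a_row = 0
        else:                            # uphill or equal: reject
            rejected_in_a_row += 1
        pos = (pos + 1) % (V - 1)        # continue in a cyclic fashion
    return b_cur, f_cur

# Toy stand-in for the partial transmit sequences (assumed, not real OFDM data).
rng = random.Random(1)
V, n_samples = 8, 64
parts = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_samples)]
         for _ in range(V)]
f_start = peak_power(parts, [1] * (V - 1))
b_hat, f_hat = cyclic_bit_flipping(parts, max_evals=100)
print(f"peak power: {f_start:.2f} -> {f_hat:.2f}")
```

Since no uphill move is ever accepted, the search stops at the first vector for which none of the V - 1 single flips lowers the peak power, which may be a local optimum only.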
Hence, $\tilde{b}$ is also the best-so-far solution $\hat{b}$. Interestingly, a number of suboptimal PTS algorithms presented in the literature can be categorized as descent heuristics. In particular, the bit-flipping search proposed in [4] is known as a construction greedy algorithm and the PTS method devised in [5] is a particular implementation of a local search algorithm, where the local neighborhood of the vector $\tilde{b}$ is defined in terms of the Hamming distance from $\tilde{b}$. In this paper, we extend the bit-flipping algorithm from [4] in that bit flipping can be continued in a cyclic fashion and the local-search algorithm from [5] in that we maintain a list of all vectors that have been searched in previous iterations, which allows early termination if no more downhill moves can be proposed. Another descent heuristic that is considered in this paper is "simple" random search, where a trial solution $b$ is randomly generated. Random search was already mentioned in [4].

3.3 Metaheuristics for PTS

Descent heuristics usually do not perform very well when dealing with hard CO problems. The reason is that the solution is always updated in the direction of improvements and hence is quickly trapped in a local optimum [17]. Metaheuristics, on the other hand, have mechanisms to accept uphill moves to avoid early convergence to local optima. We consider the application of two well-known metaheuristics to PTS.

3.3.1 Simulated Annealing

The simulated annealing metaheuristic [18] is deduced from the physical annealing process of solid materials. In simulated annealing, a solution vector $b$ represents a system state and the value $f(b)$ represents the energy of the system at that state.
A move from the current state $\tilde{b}$ to another state $b$ is accepted or rejected based on the Metropolis criterion [19]:

If $\delta \le 0$, accept the move,
If $\delta > 0$, accept the move with probability $e^{-\delta/T}$,

where $\delta \triangleq f(b) - f(\tilde{b})$ and $T$ is the temperature. At fixed $T$, the system is likely to approach thermal equilibrium, where the probability of a state $\tilde{b}$ is proportional to $e^{-f(\tilde{b})/T}$. If the system is cooled down sufficiently slowly, simulated annealing returns, to a certain approximation, the global minimum. Unfortunately, such cooling schedules are normally too slow to be practical. Instead, simulated annealing implementations aim at obtaining near-optimum solutions under limited time constraints.

The simulated annealing implementation that we propose for PTS sequence search is summarized in Fig. 1 (upper part). The trial solution $b$ is derived from $\tilde{b}$ by the same procedure as in cyclic bit-flipping. This cyclic order avoids leaving any solution in the neighborhood of $\tilde{b}$ unvisited for a long time. The range of the temperature $T$ and how it is cooled down are important factors in simulated annealing. We consider the following cooling schedules.

• Geometric cooling: The temperature $T$ is decreased to $\alpha T$, $0 < \alpha < 1$, after each iteration. If the starting temperature is $T_s$, the final temperature is $T_f$, and the number of iterations is $I$, then $\alpha = (T_f/T_s)^{1/I}$. Experiments are required to find a suitable range for the temperature. In Section 6 we present annealing curves [20] to determine good values for $T_s$ and $T_f$.

• Gaussian-like cooling: This cooling schedule from [21] allows a more flexible control of the temperature decrement. The temperature at the $i$-th iteration is

$$ T_i = T_s\, a^{-(i/(cI))^{b}}, \tag{11} $$

where $b \triangleq \log[\log(T_s/T_f)/\log(a)]/\log(1/c)$ to achieve $T_I = T_f$. The two parameters $a$ and $c$ can be selected so that the cooling curve can go through any temperature $T_s > T_i > T_f$ at a given iteration $i$.

• Constant temperature: At too high a temperature, the annealing process becomes completely random, whereas at too low a temperature, it becomes a descent search. Since we are limited by the maximum number of iterations $I$, it may be advantageous to spend all the time at a fixed temperature in the middle. Such a temperature should be high enough to allow escape from local minima and low enough to ensure a thorough search in the regions near good solutions.

The performances of simulated annealing with the three cooling schedules will be compared in Section 6.

3.3.2 Tabu Search

The tabu search metaheuristic [22, 23] uses information from the search history to drive the search into regions that might have better solutions. Two highly important elements of tabu search are intensification and diversification strategies. The former encourage a more thorough search near good visited solutions, whereas the latter take the search to unexplored areas. Tabu search implementations realize these strategies by employing, for example, tabu lists and aspiration criteria. Tabu lists prohibit moves which would be ineffective. Aspiration criteria, on the other hand, overrule tabu lists by allowing tabu moves that are deemed beneficial to the search.

In this paper we apply to PTS a simple tabu search implementation that uses only short-term history to avoid ineffective cyclic move sequences. The algorithm, presented in Fig. 1 (lower part), is based on local search. The neighborhood $\mathcal{N}(\tilde{b})$ of $\tilde{b}$ is the set of vectors that have Hamming distance $d_H = 1$ from $\tilde{b}$, i.e., $|\mathcal{N}(\tilde{b})| = V-1$. After each iteration, the new solution differs from the last solution at exactly one bit position. That position is recorded in the tabu list $\mathcal{L}$ and it will not be flipped in the next $B$ iterations.
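A short-term-memory tabu search of this kind can be sketched as follows. Since Fig. 1 is not reproduced here, the move rule (go to the best non-tabu neighbor, whether uphill or downhill) and the absence of an aspiration criterion are assumptions; the toy Gaussian sequences, sizes, seed, and names are likewise hypothetical.

```python
import random
from collections import deque

def peak_power(parts, b):
    """f(b) = max_n |sum_v b_v x_n^(v)|^2 with b_0 = 1 fixed."""
    w = [1] + list(b)
    return max(abs(sum(wv * p[n] for wv, p in zip(w, parts))) ** 2
               for n in range(len(parts[0])))

def tabu_search(parts, b_init, iterations, tenure):
    """Local search over Hamming-distance-1 neighbors with a tabu list of length B."""
    V = len(parts)
    if not 0 <= tenure < V - 1:
        raise ValueError("tabu tenure B must satisfy 0 <= B < V - 1")
    b_cur = list(b_init)
    b_best, f_best = list(b_cur), peak_power(parts, b_cur)
    tabu = deque(maxlen=tenure)          # positions flipped in the last B iterations
    for _ in range(iterations):
        candidates = []
        for pos in range(V - 1):
            if pos in tabu:
                continue                 # this position must not be flipped yet
            trial = list(b_cur)
            trial[pos] = -trial[pos]
            candidates.append((peak_power(parts, trial), pos))
        f_new, pos = min(candidates)     # best non-tabu neighbor, may be uphill
        b_cur[pos] = -b_cur[pos]
        tabu.append(pos)                 # oldest entry drops out automatically
        if f_new < f_best:
            b_best, f_best = list(b_cur), f_new
    return b_best, f_best

# Toy stand-in for the partial transmit sequences (assumed, not real OFDM data).
rng = random.Random(2)
V, n_samples = 8, 64
parts = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_samples)]
         for _ in range(V)]
b_init = [rng.choice((1, -1)) for _ in range(V - 1)]   # random initial solution
f_init = peak_power(parts, b_init)
b_best, f_best = tabu_search(parts, b_init, iterations=20, tenure=3)
print(f"peak power: {f_init:.2f} -> {f_best:.2f}")
```

In this sketch each iteration evaluates up to V - 1 neighbors, so the number of objective evaluations is roughly the iteration count times the number of non-tabu positions.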
Since both uphill and downhill moves are allowed, the role of this tabu list is to avoid possible cycles of visited solutions. The search starts with a randomly generated initial solution and an empty tabu list. The only tunable parameter of the algorithm is the tabu tenure $B$. Again, we refer to Section 6 for numerical results.

4 Objective Function Approximation for PTS

The heuristics reviewed and proposed in the previous section aim at reducing the number of sequence searches, i.e., the number of evaluations of the cost function $f(b)$ required for PTS. In this section we consider the cost function itself. In particular, we propose an approximation for $f(b)$ that is simpler to compute than the exact peak power while retaining the PAR reduction capability of PTS.

4.1 CO Problem with Approximate Objective Function

The peak power is the square of the radius of the circle that encloses the selected PTS transmit sequence $\hat{x}$ in the complex plane. Following an approach from [24], we approximate this circle by a symmetrical polygon of $P$ sides, where we require $P$ to be a multiple of four. This approximation is illustrated in Fig. 2 for different values of $P$. Applying the polygon approximation, we can replace the PTS CO problem (9) by

$$ \underset{b}{\text{minimize}} \;\; g(b) \quad \text{subject to: } b \in \{\pm 1\}^{V-1}, \tag{12} $$

with the new objective function

$$ g(b) \triangleq \max\{\Re\{x e^{j 2\pi p/P}\},\; p = 0, \ldots, P-1\}. \tag{13} $$

The selected sequence $\hat{x}$ is now enclosed by the $P$-polygon. For efficient computation of $g(b)$, we generate the real-valued sequences

$$ y^{(v)} \triangleq \left[ \Re\{x^{(v)}\} \;\; \Im\{x^{(v)}\} \;\; \ldots \;\; \Re\{x^{(v)} e^{j 2\pi (P/4-1)/P}\} \;\; \Im\{x^{(v)} e^{j 2\pi (P/4-1)/P}\} \right] \tag{14} $$

of length $NLP/2$, $0 \le v \le V-1$. Then, for different vectors $b$, the combined sequence

$$ y \triangleq \sum_{v=0}^{V-1} b_v y^{(v)} \tag{15} $$

is formed and the value of $g(b)$ is obtained by a simple maximum search,

$$ g(b) = \|y\|_\infty = \max\{y, -y\}. \tag{16} $$
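The computation in (14)-(16) can be illustrated with a small Python sketch. The toy Gaussian sequences, the sizes, the seed, and the function names are assumptions; the point is that rotations (multiplications) occur only once when building the sequences of (14), while each evaluation of g(b) needs only additions, subtractions, and a maximum search.

```python
import cmath
import math
import random

def build_y(x_v, P):
    """Real-valued sequence y^(v) of (14): Re and Im of x^(v) rotated by
    2*pi*p/P for p = 0..P/4-1; the result has length 2 * len(x_v) * P/4."""
    y = []
    for p in range(P // 4):
        rot = cmath.exp(2j * math.pi * p / P)
        for sample in x_v:
            z = sample * rot
            y.extend((z.real, z.imag))
    return y

def g(ys, b):
    """Polygon objective (15)-(16); b_v in {+1, -1}, so no multiplications."""
    signs = [1] + list(b)
    total = [0.0] * len(ys[0])
    for s, y_v in zip(signs, ys):
        if s == 1:
            total = [t + a for t, a in zip(total, y_v)]
        else:
            total = [t - a for t, a in zip(total, y_v)]
    return max(max(total), -min(total))      # max{y, -y}

def peak_amplitude(parts, b):
    """Exact peak amplitude sqrt(f(b)), for comparison only."""
    signs = [1] + list(b)
    return max(abs(sum(s * p[n] for s, p in zip(signs, parts)))
               for n in range(len(parts[0])))

# Toy stand-in for the partial transmit sequences (assumed, not real OFDM data).
rng = random.Random(3)
V, n_samples, P = 4, 32, 16
parts = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_samples)]
         for _ in range(V)]
ys = [build_y(x_v, P) for x_v in parts]
b = [1, -1, -1]
approx, exact = g(ys, b), peak_amplitude(parts, b)
print(f"g(b) = {approx:.4f}, exact peak amplitude = {exact:.4f}")
```

In this sketch g(b) measures an amplitude rather than a power, and geometrically it lies between cos(π/P) times the exact peak amplitude and the exact peak amplitude itself, which is why the gap closes as P grows.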
We observe that all algorithms introduced in Section 3 can directly be applied to the new CO problem (12). It is also clear that the solution of (12) approaches that of (9) as $P$ increases in the sense that the corresponding PARs approach identical values. Finally, we note that the approximation of a circle by a symmetrical polygon has also been used in [25] in the context of PAR reduction, but there a different optimization problem, which arises in PAR reduction via tone reservation, has been considered.

4.2 Computational Complexity

It is insightful to compare the number of operations involved in the evaluation of $f(b)$ and $g(b)$. Assuming again that complex numbers are stored in Cartesian form and that $I$ is the total number of searches, we have:

objective function | # of multiplications | # of additions
$f(b)$ | $2NLI$ (to find the peak value) | $2NL(V-1)I$ (to generate $x$)
$g(b)$ | $4NLV(P/4-1)$ (to generate $y^{(v)}$) | $(P/2)NL(V-1)I$ (to generate $y$) $+\; 2NLV(P/4-1)$ (to generate $y^{(v)}$)

Although the number of additions is increased by about a factor of $P/4$ for the new objective function compared to that for $f(b)$, the number of multiplications is reduced by a factor of $I/[2V(P/4-1)]$. Considering multiplications as computationally most expensive, using the objective function approximation $g(b)$ is advantageous when more than $2V(P/4-1)$ searches are performed. The numerical results in Section 6 show that this is generally the case for good PAR reduction with PTS.

5 Novel Side Information (SI) Embedding Scheme for PTS

The transmission and detection of the SI $b$ is critical for PAR reduction with PTS. In this section, we present a new SI embedding technique that has the following attractive features. (i) The PTS PAR reduction algorithm is not affected and no peak regrowth is caused.
(ii) The added redundancy is the minimum value of $R = V-1$ bits (considering $W = 2$), i.e., one bit per subblock. (iii) It can be applied to general QAM/PSK constellations and combined with any PTS search algorithm. (iv) SI detection has very low complexity and performance degradation due to erroneously detected SI is minimal.

The only assumption we make concerns the mapping of binary data to $M$-ary data symbols $X_k$. In particular, we apply a sign-bit mapping where inverting the most significant bit (MSB) inverts the sign of the signal point. The less significant bits (LSBs) may be mapped according to e.g. Gray labeling.

5.1 Implementation of the SI Scheme

For the sake of a clear exposition, we think of the multiplication with $b_v$ (see (8)) as being executed in the frequency domain. Then, let us define $u^{(v)}$ as the $n_v$-dimensional vector of the MSBs of the $n_v$ non-zero components $X_k$ in the subblock $X^{(v)}$, and $s^{(v)}$ as this vector for the rotated signal $b_v X^{(v)}$. We also refer to $u^{(v)}$ and $s^{(v)}$ as unshaped and shaped MSBs, respectively. Let us further define

$$ c^{(v)} \triangleq \begin{cases} 1_{n_v}, & \text{if } b_v = -1 \\ 0_{n_v}, & \text{if } b_v = 1 \end{cases} \tag{17} $$

Taking the sign-bit mapping into account, the PTS PAR reduction can then be formulated as

$$ s^{(v)} = u^{(v)} \oplus c^{(v)}. \tag{18} $$

Since $c^{(v)} \in \{1_{n_v}, 0_{n_v}\}$ are the two codewords of the $(n_v, 1, n_v)$-repetition code, the PAR reduction operation (18) can be made transparent for data transmission by obtaining the $n_v$ unshaped MSBs from

$$ u^{(v)} = d^{(v)} H^{-T} \tag{19} $$

at the transmitter and applying the "inverse" operation

$$ s^{(v)} H^T = (d^{(v)} H^{-T} \oplus c^{(v)}) H^T = d^{(v)} \tag{20} $$

to the shaped MSBs at the receiver. $H$ is the $(n_v-1) \times n_v$ repetition code parity-check matrix and $H^{-T}$ the left-inverse of $H^T$. We note that any $(n_v-1) \times n_v$ matrix $H$ with $n_v-1$ independent rows which satisfies $1_{n_v} H^T = 0_{n_v-1}$ is a valid parity-check matrix of the code.
For any such matrix, there exists at least one left inverse. Since $d^{(v)}$ contains $n_v-1$ bits, one redundancy bit per block is implicitly embedded as PTS SI.

The block diagram of the proposed SI embedding scheme is shown in Fig. 3. It is interesting to note that this implementation of PTS can be regarded as a special case of trellis-shaping for PAR reduction recently presented in [12, 13] with a constellation mapping referred to as Type-I in [13]. The complexity for SI embedding and detection is extremely low as $H$ and $H^{-T}$ can be made sparse, e.g.,

$$ H_1^T = \begin{bmatrix} I_{n_v-1} \\ 1_{n_v-1} \end{bmatrix}, \quad H_1^{-T} = \begin{bmatrix} I_{n_v-1} & 0_{n_v-1}^T \end{bmatrix} \tag{21} $$

are valid matrices, and operations are in the Galois field GF(2).

5.2 Effect on Error-rate Performance

While the particular choice of $H$ does not have any impact on SI embedding, the distribution of errors in $\hat{d}^{(v)}$ in case of erroneous MSBs $\hat{s}^{(v)}$ depends on $H$. For example, consider $H_1^T$ in (21) and

$$ H_2^T = \begin{bmatrix} I_{n_v-1} \\ 0_{n_v-1} \end{bmatrix} \oplus \begin{bmatrix} 0_{n_v-1} \\ I_{n_v-1} \end{bmatrix}. \tag{22} $$

Whereas a single error in the $n_v$-th position of $s^{(v)}$ will cause a burst of $n_v-1$ errors if $H_1$ is applied, single errors in the $n_v-2$ center positions of $s^{(v)}$ will lead to double-adjacent errors for $H_2$. Hence, in case of coded transmission using block codes and hard-decision decoding, separate error control coding for LSBs and MSBs could be advisable, and the code protecting $d^{(v)}$ should take the different error patterns depending on $H$ into account (cf. e.g. [26, Table 5.1] for a list of double-adjacent-error-correcting codes). If convolutional codes or more powerful concatenated codes in combination with iterative decoding are applied, the syndrome forming (20) should be incorporated in the iterative decoding and demapping process, cf. e.g. [27].
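The GF(2) operations (17)-(20) and the two error patterns discussed above can be traced with a small Python sketch operating on MSB vectors only. Instead of explicit matrices, the sketch applies the action of H_1 from (21) and H_2 from (22) directly; the left inverse used for H_2 (a running XOR with the last bit set to zero) is one valid choice assumed here, and all names and the block length are hypothetical.

```python
import random

def unshape_h1(d):
    """u = d H1^{-T}: append a zero bit."""
    return d + [0]

def syndrome_h1(s):
    """d = s H1^T: XOR every bit with the last bit of s."""
    return [bit ^ s[-1] for bit in s[:-1]]

def unshape_h2(d):
    """One left inverse for H2^T: u_j = d_j xor u_{j+1}, with the last bit zero."""
    u = [0] * (len(d) + 1)
    for j in range(len(d) - 1, -1, -1):
        u[j] = d[j] ^ u[j + 1]
    return u

def syndrome_h2(s):
    """d = s H2^T: XOR of adjacent bits."""
    return [a ^ b for a, b in zip(s, s[1:])]

def shape(u, b_v):
    """(17)-(18): rotating the subblock by b_v = -1 inverts all MSBs."""
    c = 1 if b_v == -1 else 0
    return [bit ^ c for bit in u]

rng = random.Random(4)
n_v = 8
d = [rng.randint(0, 1) for _ in range(n_v - 1)]     # n_v - 1 data bits per subblock

# The receiver recovers d without knowing which rotation the search selected.
for b_v in (1, -1):
    assert syndrome_h1(shape(unshape_h1(d), b_v)) == d
    assert syndrome_h2(shape(unshape_h2(d), b_v)) == d

# Error patterns: flip one received MSB and count the resulting data-bit errors.
s1 = shape(unshape_h1(d), -1)
s1[-1] ^= 1                                          # error in the n_v-th position
errors_h1 = sum(a != b for a, b in zip(syndrome_h1(s1), d))

s2 = shape(unshape_h2(d), -1)
s2[3] ^= 1                                           # error in a center position
errors_h2 = [j for j, (a, b) in enumerate(zip(syndrome_h2(s2), d)) if a != b]
print(errors_h1, errors_h2)
```

With H_1 and this encoder, the last unshaped MSB is always zero, so the last shaped MSB directly reveals whether the subblock was rotated, although the receiver does not need that information to recover the data.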
For the case of uncoded transmission let us consider $M$-ary QAM transmission and the bit-error rate (BER) after demapping, $\mathrm{BER}_k$, for sufficiently large signal-to-noise ratio (SNR) $\gamma_k$ in subcarrier $k$. For the $(m-1)$ LSBs per QAM symbol ($m \triangleq \log_2(M)$) we find [28, Section 5.2.9]

$$ \mathrm{BER}_{k,\mathrm{LSB}} \approx \frac{4\sqrt{M} - 4}{\sqrt{M}\,(m-1)}\, Q\!\left(\sqrt{\frac{3}{M-1}\gamma_k}\right), \tag{23} $$

where $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\, dt$ is the Gaussian Q-function. Considering the matrices $H_1^T$ in (21) and $H_2^T$ in (22) as an example, simple counting of the cases of error propagation leads to the approximation

$$ \mathrm{BER}_{k,\mathrm{MSB}} \approx \frac{2}{\sqrt{M}}\, \frac{2n_v - 2}{n_v}\, Q\!\left(\sqrt{\frac{3}{M-1}\gamma_k}\right) \tag{24} $$

for the data bits $d^{(v)}$ representing MSBs in both cases. Assuming further that $n_v \gg 1$, an approximation for the overall error rate is given by

$$ \mathrm{BER}_k = \frac{1}{m}\left[(m-1)\,\mathrm{BER}_{k,\mathrm{LSB}} + \frac{n_v-1}{n_v}\,\mathrm{BER}_{k,\mathrm{MSB}}\right] \approx \frac{4}{m}\, Q\!\left(\sqrt{\frac{3}{M-1}\gamma_k}\right). \tag{25} $$

We note that this approximation is independent of $n_v$ (as long as $n_v \gg 1$) and identical to that for uncoded Gray labelled $M$-QAM transmission without shaping for large $\gamma_k$. We therefore conclude that the asymptotic error rate is not affected by the proposed SI embedding scheme.

To obtain an expression for the average BER, we assume transmission over a Rayleigh fading channel. We generalize this model in that we consider $D$-fold diversity with maximal ratio combining achieved either through the use of multiple receive branches or spreading across subcarriers for time-dispersive channels. Using (25) and [28, Section 14.4.1] the average BER is approximated by

$$ \mathrm{BER} \approx \frac{4 p^D}{m} \sum_{i=0}^{D-1} \binom{D-1+i}{i} (1-p)^i, \quad p = \frac{1}{2}\left(1 - \sqrt{\frac{\bar{\gamma}}{\beta + \bar{\gamma}}}\right), \tag{26} $$

where $\beta \triangleq 2(M-1)/3$ and $\bar{\gamma} \triangleq E\{\gamma_k\}/D = (\bar{E}_s/N_0)/D$, and $\bar{E}_s$ and $N_0$ denote the average received energy per symbol and the equivalent complex baseband noise power spectral density, respectively.
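The approximations (23)-(26) are straightforward to evaluate numerically. The following Python sketch is an illustration under assumed parameter values (16QAM, a few SNR points, D = 1 and D = 2); the function names are hypothetical and the Q-function is expressed through the complementary error function.

```python
import math

def q_func(x):
    """Gaussian Q-function via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def ber_lsb(gamma, M):
    """Approximation (23) for the m - 1 LSBs."""
    m = math.log2(M)
    return (4 * math.sqrt(M) - 4) / (math.sqrt(M) * (m - 1)) * q_func(math.sqrt(3 * gamma / (M - 1)))

def ber_msb(gamma, M, n_v):
    """Approximation (24) for the data bits carried by the MSBs."""
    return 2 / math.sqrt(M) * (2 * n_v - 2) / n_v * q_func(math.sqrt(3 * gamma / (M - 1)))

def ber_overall(gamma, M, n_v):
    """Left-hand side of (25): weighted combination of (23) and (24)."""
    m = math.log2(M)
    return ((m - 1) * ber_lsb(gamma, M) + (n_v - 1) / n_v * ber_msb(gamma, M, n_v)) / m

def ber_limit(gamma, M):
    """Right-hand side of (25), valid for n_v >> 1."""
    return 4 / math.log2(M) * q_func(math.sqrt(3 * gamma / (M - 1)))

def ber_rayleigh(es_n0, M, D):
    """Average BER (26) for D-fold diversity; es_n0 is a linear ratio, not dB."""
    m = math.log2(M)
    beta = 2 * (M - 1) / 3
    gamma_bar = es_n0 / D
    p = 0.5 * (1 - math.sqrt(gamma_bar / (beta + gamma_bar)))
    return 4 * p ** D / m * sum(math.comb(D - 1 + i, i) * (1 - p) ** i for i in range(D))

M = 16
for snr_db in (10, 20, 30):
    es_n0 = 10 ** (snr_db / 10)
    print(snr_db, ber_rayleigh(es_n0, M, D=1), ber_rayleigh(es_n0, M, D=2))
```

Evaluating `ber_overall` for increasing `n_v` shows how quickly the left-hand side of (25) approaches its limit; for example, with 16QAM the two differ by well under one percent once `n_v` reaches a few hundred.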
6 Results and Discussion

In this section, we present performance results for PAR reduction using PTS with the techniques introduced in the previous sections. We first discuss the PAR reduction performance and the PAR-reduction vs. complexity tradeoff for the various search strategies devised in Section 3. Then, we compare PTS with the exact and the approximate objective function developed in Section 4. Finally, BER results when applying the new SI embedding scheme from Section 5 are shown. As relevant and typical system parameters, we choose 16QAM modulation and $N = 256$ subcarriers, which are divided into $V = 16$ subblocks employing random partition [3, 8].

6.1 Performance of PTS with Different Heuristics

Structure of the CO problem: First, it is insightful to illustrate the structure of the CO problem (9), especially the dependency of the objective function $f(b)$ on $b$. For this purpose, we employ the CDF $G_b(\zeta_0)$ of the PAR $\zeta(x) = f(b)/\sigma_x^2$, considering the rotation vector $b$ as a random variable uniformly distributed over $\{-1, +1\}^{V-1}$ and a fixed data vector $X$. $G_b(\zeta_0)$ is plotted in Fig. 4 for 10 different realizations of $X$. We observe that a fairly large number of vectors $b$ lead to moderately low PAR. For example, a PAR of less than 8 dB is achieved for 20% to 50% of all vectors $b$. However, the flatness of the left end of the graphs shows that only very few vectors lead to a very low PAR close to the minimum PAR value. We therefore conclude that it is hard to achieve close-to-optimum PAR reduction with an unstructured search strategy, i.e., random search (see Section 3.2).

PTS with random search: Fig. 5 shows the CCDF of the PAR when random search with $I \in \{2, 4, 16, 64, 256, 1024, 4096\}$ searches is applied. Also included are the CCDFs for OFDM without PAR reduction and for optimal PTS with full enumeration of all $I = 2^{V-1} = 32768$ vectors $b$.
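For reference, random search over the sign vectors can be sketched in a few lines of Python. The toy Gaussian sequences standing in for the partial transmit sequences, the sizes, the seed, and the names are assumptions; the sketch only shows the mechanics and does not reproduce the curves in Fig. 5.

```python
import random

def peak_power(parts, b):
    """f(b) = max_n |sum_v b_v x_n^(v)|^2 with b_0 = 1 fixed."""
    w = [1] + list(b)
    return max(abs(sum(wv * p[n] for wv, p in zip(w, parts))) ** 2
               for n in range(len(parts[0])))

def random_search(parts, n_trials, rng):
    """Draw n_trials sign vectors uniformly at random and keep the best one."""
    V = len(parts)
    best_b, best_f = None, float("inf")
    for _ in range(n_trials):
        b = [rng.choice((1, -1)) for _ in range(V - 1)]
        f = peak_power(parts, b)
        if f < best_f:
            best_b, best_f = b, f
    return best_b, best_f

# Toy stand-in for the partial transmit sequences (assumed, not real OFDM data).
data_rng = random.Random(6)
V, n_samples = 8, 64
parts = [[complex(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(n_samples)]
         for _ in range(V)]

# Re-seeding makes the trial sets nested, so the best value cannot get worse.
results = [random_search(parts, n, random.Random(7))[1] for n in (2, 4, 16, 64)]
print([round(f, 2) for f in results])
```

Each additional trial costs exactly one objective evaluation, so the search budget I translates directly into complexity.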
In good agreement with the conclusions from Fig. 4, it can be seen that (i) considerable improvements in PAR are already achieved with relatively few random searches, e.g., for $I \le 64$, but (ii) closing the remaining gap of about 1 dB to optimum PTS appears quite difficult with random search unless the number of searches, and thus the complexity, is increased to a level comparable to that of PTS with full search.

PTS with simulated annealing: In order to apply simulated annealing, we need to determine a reasonable range of the temperature parameter $T$ (see Section 3.3.1 and Fig. 1). For this purpose, we consider the annealing curve, i.e., $E\{f(\tilde{b})\}/\sigma_x^2$ vs. $T/\sigma_x^2$ [20], where $E\{f(\tilde{b})\}/\sigma_x^2$ is the value of $f(\tilde{b})/\sigma_x^2$ averaged with respect to $\tilde{b}$ at a fixed temperature $T$. The annealing curves for three different data vectors $X$ are plotted in Fig. 6. We observe that for about $T/\sigma_x^2 > 10$ almost all proposed moves are accepted, i.e., $r < e^{-\delta/T}$ (see Fig. 1), and the value of $E\{f(\tilde{b})\}$ is close to the average over all possible solutions $b$. The annealing process becomes random and no improvement over random search is expected. On the other hand, hardly any uphill moves are accepted for $T/\sigma_x^2 < 0.1$, i.e., $\tilde{b}$ is replaced by a new trial vector $b$ only if the corresponding peak value $f(b)$ is smaller than $f(\tilde{b})$. In this case, simulated annealing reduces to bit-flipping search. These results suggest that temperatures in the range $0.1 \le T/\sigma_x^2 < 10$ should be applied for simulated annealing for PTS. After some further experimentation, we found $T_s/\sigma_x^2 = 1$ and $T_f/\sigma_x^2 = 0.2$ for the geometric and Gaussian-like cooling, with $a = 0.5$ and $c = 0.4$ for the latter, and $T/\sigma_x^2 = 0.5$ for constant temperature annealing to be appropriate values.
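The role of the temperature range can be illustrated with the acceptance rule of Fig. 1. The sketch below is only an illustration: all quantities are assumed to be normalized by $\sigma_x^2$, the function names are hypothetical, and the geometric schedule $T_i = T_s \alpha^i$ with $\alpha$ chosen such that the last temperature equals $T_f$ is one common form that is assumed here rather than taken from the text.

```python
import math

def acceptance_probability(delta, T):
    """Probability that a move with cost change delta is accepted at temperature T.

    Follows the rule of Fig. 1: accept if delta < 0 or r < exp(-delta / T), r uniform in (0, 1).
    """
    return 1.0 if delta < 0 else math.exp(-delta / T)

def geometric_schedule(T_s, T_f, I):
    """Assumed geometric cooling from T_s down to T_f over I iterations."""
    alpha = (T_f / T_s) ** (1.0 / (I - 1))
    return [T_s * alpha ** i for i in range(I)]

# An uphill move that raises the normalized peak power by 1:
for T in (10.0, 1.0, 0.1):
    print(T, acceptance_probability(1.0, T))   # about 0.905, 0.368, 4.5e-5

schedule = geometric_schedule(T_s=1.0, T_f=0.2, I=1024)
print(schedule[0], schedule[-1])
```

At $T/\sigma_x^2 = 10$ such an uphill move is accepted about nine times out of ten, whereas at $T/\sigma_x^2 = 0.1$ it is practically never accepted, which matches the two limiting behaviors described above.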
In order to compare the performances of the three cooling schedules, we plot the PAR of the best-so-far solution $\hat{b}$ as a function of the number of iterations in Fig. 7 for three different and representatively chosen data vectors $X$ and $I = 1024$. For the first data vector (graphs at the top), geometric cooling yields the best results for simulated annealing with more than 50 iterations (i.e., searches). For example, it outperforms constant temperature and Gaussian-like cooling by 0.8 dB and 0.6 dB, respectively. However, the relative performances are switched for the second and third data vectors, where, respectively, constant temperature and Gaussian-like cooling perform best. Since the data $X$ shapes the cost function of the CO problem, the optimum cooling schedule for simulated annealing also depends on $X$, and as seen in Fig. 7, no single cooling schedule is optimal in all cases. In fact, considering average performance, e.g., the CCDF $F_\zeta(\zeta_0)$, we found that all three cooling schedules yield very similar PAR reduction performance. We therefore only show average PAR results for simulated annealing with constant temperature in the following.

In Fig. 8, simulated annealing is compared with random search for $I \in \{16, 64, 256, 1024, 4096\}$ searches. Except for the case of $I = 16$ and $F_\zeta(\zeta_0) < 10^{-2}$, simulated annealing provides consistent improvements over random search, with some significant gains in the low PAR range. In particular, simulated annealing needs to perform only about one-fourth of the number of searches required for random search to achieve PAR reduction close to that of optimal PTS with full search.

Comparison of PTS with different heuristics: Finally, we compare all the search strategies introduced in Section 3. For the case of tabu search, only the tabu tenure $B$ needs to be optimized (see Fig. 1).
Numerical results, which are omitted for the sake of brevity, showed that a relatively wide range of values for $B$ is optimum with respect to PAR reduction performance. Therefore, $B = 9$ is chosen for all the following results. A fair comparison of the performance-complexity tradeoffs for the different PTS search strategies is provided in Fig. 9, where the numbers of searches required to achieve $F_\zeta(\zeta_0) = 10^{-3}$ are plotted as a function of $\zeta_0$. These numbers are averages for the extended bit-flipping search and local search, since these algorithms terminate early if no further improvement in peak-power reduction is achieved. It is quite remarkable that the "trivial" random search heuristic performs at least as well as, and often outperforms, the bit-flipping and local search heuristics originally proposed in [4] and [5], respectively. Only the application of the simulated annealing and tabu search metaheuristics to solve the CO problem of PTS yields an enhanced tradeoff in the low-PAR range (e.g., at 7 dB), where about four times fewer searches are required than for random search to achieve the same PAR reduction. It should be noted that the curves in Fig. 9 depend on the target value of $F_\zeta(\zeta_0)$, and a somewhat different comparison may result for values other than $F_\zeta(\zeta_0) = 10^{-3}$ (for example, see the curves for $I = 16$ in Fig. 8).

6.2 Objective Function Approximation for PTS

Results for PTS when applying the objective function $g(b)$ introduced in Section 4 as an approximation of the true peak power $f(b)$ are presented in Fig. 10. PTS using simulated annealing with $I = 1024$ and $g(b)$ with $P = 4, 8, 16$ (see Eq. (13)) is considered. For $P = 4$, no multiplications are needed to set up and solve the approximate CO problem, compared to $2NLI$ multiplications required for the original CO problem, and only 0.6 dB in PAR reduction is lost at $F_\zeta(\zeta_0) = 10^{-3}$.
When $P = 8$, $32NL$ multiplications are required to set up the approximate CO problem, and PAR reduction is still improved compared to optimizing $f(b)$ in 64 searches, which involve $128NL$ multiplications, i.e., four times as many. With $P = 16$ we see that the approximation yields essentially the same performance as the original objective function, but the number of multiplications is reduced by a factor of 10.

6.3 SI Embedding Scheme for PTS

In order to verify the analytical approximations for the BER when the new SI embedding scheme is applied for PTS, Fig. 11 shows the BER performance for uncoded OFDM transmission with (i) conventional Gray labelled 16QAM (i.e., either perfect explicit SI is available or PAR reduction is not applied) and (ii) 16QAM with sign-bit mapping and the novel SI embedding scheme. Unfaded additive white Gaussian noise (AWGN) and fading AWGN channels with D-fold diversity are considered, and simulation and numerical results using Eqs. (25) and (26) are plotted. It can be seen that (i) numerical and simulation results match very well and (ii) the loss in BER performance compared to OFDM without SI embedding is very small (about a factor of 1.2 in BER). This, together with the other features outlined in Section 5, renders the new SI embedding scheme an interesting solution for OFDM systems employing PTS-based PAR reduction.

7 Conclusion

In this paper, we revisited PTS for PAR reduction in OFDM systems and tackled the two main challenges of PTS, which are computational complexity and transmission of SI. We formulated the sequence search of PTS as a particular CO problem, which allowed us (i) to unify various heuristics for finding the optimal phase vector proposed in the PTS literature and (ii) to adapt efficient search algorithms known from the CO literature to PTS.
A comparison of the different search strategies led to the interesting conclusion that simple random search performs best for relatively small numbers of sequence searches. The newly proposed simulated annealing and tabu search methods showed their strengths if close-to-optimum PAR reduction is desired. We also proposed a modified PTS objective function, which reduces the number of multiplications required for PTS while yielding excellent PAR reduction results. Finally, we presented a novel SI embedding scheme, which is transparent to PAR reduction, requires only minimal additional bandwidth and complexity, and practically does not degrade the BER performance.

References

[1] S. Han and J. Lee, "An overview of peak-to-average power ratio reduction techniques for multicarrier transmission," IEEE Trans. Wireless Commun., vol. 12, no. 2, pp. 56–65, Apr. 2005.
[2] S. H. Müller and J. B. Huber, "OFDM with reduced peak-to-average power ratio by optimum combining of partial transmit sequences," Electron. Lett., vol. 33, no. 5, pp. 368–369, Feb. 1997.
[3] S. H. Müller and J. B. Huber, "A novel peak power reduction scheme for OFDM," in Proc. of Intern. Symp. on Personal, Indoor and Mobile Radio Communications (PIMRC), 1997, pp. 1090–1094.
[4] L. J. Cimini, Jr. and N. R. Sollenberger, "Peak-to-average power ratio reduction of an OFDM signal using partial transmit sequences," IEEE Commun. Lett., vol. 4, no. 3, pp. 86–88, Mar. 2000.
[5] S. H. Han and J. H. Lee, "PAPR reduction of OFDM signals using a reduced complexity PTS technique," IEEE Sig. Processing Lett., vol. 11, no. 11, pp. 887–890, Nov. 2004.
[6] C.-C. Feng, C.-Y. Wang, C.-Y. Lin, and Y.-H. Hung, "Protection and transmission of side information for peak-to-average power ratio reduction of an OFDM signal using partial transmit sequences," in Proc. of Veh. Technol. Conf. (VTC), vol. 4, Oct. 2003, pp. 2461–2465.
[7] A. D. S. Jayalath and C. Tellambura, "Side information in PAR reduced PTS-OFDM signals," in Proc. of Intern. Symp. on Personal, Indoor and Mobile Radio Communications (PIMRC), vol. 1, Sept. 2003, pp. 226–230.
[8] S. H. Müller-Weinfurtner, OFDM for wireless communications: Nyquist windowing, peak-power reduction, and synchronization. Aachen: Shaker Verlag, 2000.
[9] L. J. Cimini, Jr. and N. R. Sollenberger, "Peak-to-average power ratio reduction of an OFDM signal using partial transmit sequences with embedded side information," in Proc. of IEEE Global Telecommun. Conf. (GLOBECOM), vol. 2, Nov.-Dec. 2000, pp. 746–750.
[10] C.-C. Feng, Y.-T. Wu, and C.-Y. Chi, "Embedding and detection of side information for peak-to-average power ratio reduction of an OFDM signal using partial transmit sequences," in Proc. of Veh. Technol. Conf. (VTC), vol. 2, Oct. 2003, pp. 1354–1358.
[11] A. D. S. Jayalath and C. Tellambura, "SLM and PTS peak-power reduction of OFDM signals without side information," IEEE Trans. Wireless Commun., vol. 4, pp. 2006–2013, Sept. 2005.
[12] W. Henkel and B. Wagner, "Another application for trellis shaping: PAR reduction for DMT (OFDM)," IEEE Trans. Commun., vol. 48, no. 9, pp. 1471–1476, Sept. 2000.
[13] H. Ochiai, "A novel trellis-shaping design with both peak and average power reduction for OFDM systems," IEEE Trans. Commun., vol. 52, no. 11, pp. 1916–1926, Nov. 2004.
[14] C. Tellambura, "Improved phase factor computation for the PAR reduction of an OFDM signal using PTS," IEEE Commun. Lett., vol. 5, no. 4, pp. 135–137, Apr. 2001.
[15] C. Tellambura, "Computation of the continuous-time PAR of an OFDM signal with BPSK subcarriers," IEEE Commun. Lett., vol. 5, no. 5, pp. 185–187, May 2001.
[16] J. Tellado, Multicarrier modulation with low PAR: applications to DSL and wireless. Norwell, MA, USA: Kluwer Academic Publishers, 2000.
[17] C. R. Reeves, Ed., Modern heuristic techniques for combinatorial problems. London: Blackwell Scientific Publications, 1993.
[18] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671–680, 1983.
[19] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, and A. H. Teller, "Equation of state calculations by fast computing machines," Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, June 1953.
[20] S. R. White, "Concepts of scale in simulated annealing," in Proc. IEEE Int. Conference on Computer Design, 1984, pp. 646–651.
[21] M. M. Atiqullah, "An efficient simple cooling schedule for simulated annealing," in Proc. Int. Conf. Computational Science and Its Applications (ICCSA), Assisi, Italy, 2004, pp. 396–404.
[22] F. Glover, "Future paths for integer programming and links to artificial intelligence," Comput. Oper. Res., vol. 13, no. 5, pp. 533–549, 1986.
[23] F. Glover and F. Laguna, Tabu Search. Norwell, MA, USA: Kluwer Academic Publishers, 1997.
[24] X. Chen and T. Parks, "Design of FIR filters in the complex domain," IEEE Trans. Acoust., Speech, Signal Processing, vol. 35, no. 2, pp. 144–153, Feb. 1987.
[25] B. S. Krongold and D. L. Jones, "An active-set approach for OFDM PAR reduction via tone reservation," IEEE Trans. Signal Processing, vol. 52, no. 2, pp. 495–509, Feb. 2004.
[26] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, Massachusetts: Addison–Wesley, 1983.
[27] C. Lee, S. Ng, L. Piazzo, and L. Hanzo, "TCM, TTCM, BICM and iterative BICM assisted OFDM-based digital video broadcasting to mobile receivers," in Proc. of Veh. Technol. Conf. (VTC), Rhodes, Greece, May 2001, pp. 732–736.
[28] J. Proakis, Digital Communications, 4th ed. New York: McGraw–Hill, 2001.

// Simulated Annealing for PTS
Input: temperature T
Select initial vector b̃, best solution b̂ = b̃
for (i = 1, ..., I − 1)
    Select next trial vector b from b̃ by cyclic bit-flipping
    Calculate δ = f(b) − f(b̃)
    Generate a uniform random number r in (0, 1)
    if (δ < 0 or r < e^(−δ/T))
        b̃ = b
    end if
    if (f(b) < f(b̂))
        b̂ = b
    end if
    Update T according to cooling schedule
end for
Return b̂

// Tabu Search for PTS
Input: tabu tenure B
Initial tabu list L(B) = ∅
Select initial vector b̃, best solution b̂ = b̃
for (k = 1, ..., K − 1)
    Create acceptance set A = N(b̃) \ L(B)
    Select new solution b̃ = argmin_{b ∈ A} {f(b)}
    if (f(b̃) < f(b̂))
        b̂ = b̃
    end if
    Update tabu list L(B)
end for
Return b̂

Figure 1: Pseudocode for metaheuristics proposed for PTS. Top: simulated annealing, bottom: tabu search.

[Drawing omitted. Shown: a circle and inscribed polygons with P = 4, 8, 12, and 16.]
Figure 2: Approximation of a circle with a P-sided symmetrical polygon.

[Block diagram omitted. Labels recoverable from the transmitter side: "from PTS algorithm", $c^{(v)}$, $d^{(v)}$, $H^{-T}$, $u^{(v)}$, $s^{(v)}$, MSB, LSB, data, Mapping, $X^{(v)}$. Labels recoverable from the receiver side: $\hat{X}^{(v)}$, Demapping, MSB, $\hat{s}^{(v)}$, $H^{T}$, $\hat{d}^{(v)}$, LSB.]
Figure 3: Block diagram of SI embedding scheme for one subblock. Top: transmitter side, bottom: receiver side.

[Plot omitted. Horizontal axis: $10\log_{10}(\zeta_0)$ [dB], 6 to 12. Vertical axis: $G_b(\zeta_0)$, 0 to 1. Curves: different realizations of $X$.]
Figure 4: CDF $G_b(\zeta_0)$ of $\zeta(x) = f(b)/\sigma_x^2$, treating $b$ as a random variable uniformly distributed over $\{-1,+1\}^{V-1}$ for fixed $X$. Plot shows $G_b(\zeta_0)$ for 10 different realizations of $X$.

[Plot omitted. Horizontal axis: $10\log_{10}(\zeta_0)$ [dB], 6 to 11. Vertical axis: $F_\zeta(\zeta_0)$, $10^{-3}$ to $10^{0}$. Curves: original OFDM, RS with [2, 4, 16, 64, 256, 1024, 4096] searches, optimal PTS.]
Figure 5: CCDF of PAR for random search (RS) with different numbers of searches.
Also shown: OFDM without PAR reduction and optimal PTS with $2^{15}$ searches.

[Plot omitted. Horizontal axis: $T/\sigma_x^2$, $10^{-2}$ to $10^{2}$. Vertical axis: $E\{f(\tilde{b})\}/\sigma_x^2$, 3.5 to 7.5. Curves: different realizations of $X$.]
Figure 6: Normalized average of $f(\tilde{b})$ with respect to $\tilde{b}$ for simulated annealing at fixed $T$ and three (arbitrarily chosen and) different data vectors $X$ as a function of $T/\sigma_x^2$.

[Plot omitted. Three panels: 1st, 2nd, and 3rd data vector. Horizontal axis: iteration $i$, up to 1000. Vertical axis: $f(\hat{b})/\sigma_x^2$, 4 to 6. Curves: geometric cooling, Gaussian-like cooling, constant temperature.]
Figure 7: PAR $f(\hat{b})/\sigma_x^2$ of best-so-far solution $\hat{b}$ for simulated annealing with three different cooling schedules vs. number of iterations $i$. Results are plotted for three (representatively chosen and) different data vectors $X$.

[Plot omitted. Horizontal axis: $10\log_{10}(\zeta_0)$ [dB], 6 to 8.5. Vertical axis: $F_\zeta(\zeta_0)$, $10^{-3}$ to $10^{0}$. Curves: RS and SA with 16, 64, 256, 1024, and 4096 searches, optimal PTS.]
Figure 8: CCDF of PAR for simulated annealing (SA) with constant temperature and random search (RS) for different numbers of searches. Also shown: optimal PTS with $2^{15}$ searches.

[Plot omitted. Horizontal axis: $10\log_{10}(\zeta_0)$ [dB], 6.8 to 8.8. Vertical axis: (average) number of searches required for $F_\zeta(\zeta_0) = 10^{-3}$, $10^{1}$ to $10^{3}$. Curves: RS, SA, TS, BFS, LS with $d_H = 1$, LS with $d_H = 2$.]
Figure 9: (Average) number of searches required to achieve $F_\zeta(\zeta_0) = 10^{-3}$ for PTS with different search strategies (random search (RS), simulated annealing (SA), tabu search (TS), bit-flipping search (BFS), local search (LS) with Hamming distances $d_H = 1$ and $d_H = 2$).
[Plot omitted. Horizontal axis: $10\log_{10}(\zeta_0)$ [dB], 6 to 11.5. Vertical axis: $F_\zeta(\zeta_0)$, $10^{-3}$ to $10^{0}$. Curves: original OFDM; PTS with true peak power as objective function (SA with 64 searches, SA with 1024 searches, optimal PTS); PTS with approximated peak power as objective function ($P = 4$, $P = 8$, $P = 16$).]
Figure 10: CCDF of PAR for simulated annealing (SA) with $I = 1024$ searches, using approximation of the peak power with $P = 4, 8, 16$. For comparison: SA with 64 searches, optimal PTS, and OFDM without PAR reduction.

[Plot omitted. Horizontal axis: $10\log_{10}(\bar{E}_s/N_0)$ [dB], 0 to 30. Vertical axis: BER, $10^{-6}$ to $10^{0}$. Curves: SI embedding (numerical), SI embedding (simulation), no SI embedding, each for AWGN and for $D = 1$, $D = 2$, $D = 4$.]
Figure 11: BER vs. $10\log_{10}(\bar{E}_s/N_0)$ for OFDM with and without SI embedding. AWGN and Rayleigh fading with D-fold diversity reception.
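The two metaheuristics of Fig. 1 can be turned into a short runnable sketch. The code below is an illustration under stated assumptions and not the authors' implementation: the oversampling factor `L = 4`, the scaling to $\sigma_x^2 = 1$, the unrotated first subblock, one flipped position per step for cyclic bit-flipping, a Hamming-distance-1 neighbourhood $N(\tilde{b})$, and a tabu list holding the last $B$ visited solutions are all choices made here, and all function names are hypothetical.

```python
from collections import deque
import math
import numpy as np

def make_pts_instance(rng, N=256, V=16, L=4):
    """V partial transmit sequences (V x N*L) for 16QAM, scaled so that sigma_x^2 = 1."""
    levels = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
    X = rng.choice(levels, N) + 1j * rng.choice(levels, N)
    owner = rng.permutation(N) % V                 # random partition
    parts = np.zeros((V, N * L), dtype=complex)
    parts[owner, np.arange(N)] = X
    return np.fft.ifft(parts, axis=1) * (N * L) / np.sqrt(N)

def peak_power(b, x_parts):
    """f(b): peak power of the combined signal, first subblock kept unrotated."""
    return np.max(np.abs(x_parts[0] + b @ x_parts[1:]) ** 2)

def simulated_annealing(x_parts, I, T, rng):
    """Fig. 1 (top) with constant temperature T (same units as f)."""
    n = x_parts.shape[0] - 1
    cur = np.ones(n)
    f_cur = peak_power(cur, x_parts)
    best, f_best = cur.copy(), f_cur
    for i in range(1, I):
        trial = cur.copy()
        trial[i % n] *= -1                         # cyclic bit-flipping (assumed form)
        f_trial = peak_power(trial, x_parts)
        delta = f_trial - f_cur
        if delta < 0 or rng.random() < math.exp(-delta / T):
            cur, f_cur = trial, f_trial
        if f_trial < f_best:
            best, f_best = trial.copy(), f_trial
    return best, f_best

def tabu_search(x_parts, K, B):
    """Fig. 1 (bottom) with the last B visited solutions declared tabu (assumed form)."""
    n = x_parts.shape[0] - 1
    cur = np.ones(n)
    best, f_best = cur.copy(), peak_power(cur, x_parts)
    tabu = deque([tuple(cur)], maxlen=B)
    for _ in range(1, K):
        candidates = []
        for j in range(n):                         # Hamming-distance-1 neighbours
            nb = cur.copy()
            nb[j] *= -1
            if tuple(nb) not in tabu:
                candidates.append((peak_power(nb, x_parts), j))
        if not candidates:
            break
        f_cur, j = min(candidates)                 # best admissible neighbour, even if worse
        cur[j] *= -1
        tabu.append(tuple(cur))
        if f_cur < f_best:
            best, f_best = cur.copy(), f_cur
    return best, f_best

rng = np.random.default_rng(0)
x_parts = make_pts_instance(rng)
f0 = peak_power(np.ones(x_parts.shape[0] - 1), x_parts)
b_sa, f_sa = simulated_annealing(x_parts, I=1024, T=0.5, rng=rng)
b_ts, f_ts = tabu_search(x_parts, K=64, B=9)
print("PAR [dB] without PTS, SA, TS:", [10 * np.log10(f) for f in (f0, f_sa, f_ts)])
```

Note that each tabu iteration evaluates up to $V - 1$ neighbours, so the iteration count `K` is not directly comparable to the number of searches `I` of simulated annealing.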