Shi 2009

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

ARTICLE IN PRESS

Signal Processing 89 (2009) 121–132

Contents lists available at ScienceDirect

Signal Processing
journal homepage: www.elsevier.com/locate/sigpro

An efficient acoustic echo cancellation design for systems with long


room impulses and nonlinear loudspeakers
Kun Shi, Xiaoli Ma , G. Tong Zhou
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA

a r t i c l e in fo abstract

Article history: Acoustic echo is an annoying issue for many hands-free telecommunication systems.
Received 31 December 2007 Because of room acoustics and delay in the transmission path, echoes affect the sound
Received in revised form quality and may hamper communications. Thus, acoustic echo cancellers (AECs) are
1 July 2008
critical for enhancing the audio quality. Echo cancellation is challenging because long
Accepted 2 July 2008
Available online 22 July 2008
room impulse response slows down the convergence rate and increases the computa-
tional complexity of the AEC designs. Nonlinearity of the power amplifier or the
Keywords: loudspeaker further exacerbates the problem. In this paper, we propose a novel AEC
Acoustic echo cancellation algorithm when the loudspeaker-enclosure-microphone system is described by a
Cascade structure
Hammerstein model. We show that by introducing a channel shortening filter, the
Channel shortening
length of the ‘‘effective’’ acoustic echo path is reduced considerably. Hence, the
Nonlinear adaptive filter
computational complexity of AECs is reduced and the convergence rate is increased.
Adaptive algorithms are developed for the proposed structure and their convergence is
shown analytically. The effectiveness of our method is illustrated by computer
simulations.
Published by Elsevier B.V.

1. Introduction and subtracting an estimate of the echo signal from the


microphone received signal.
Acoustic echo is noticeable and annoying in many Many adaptive filtering algorithms were developed to
hands-free telecommunication systems. Highly reverber- remove echoes while keeping full-duplex communica-
ant spaces and long room impulse response degrade the tions. One of the well-adopted methods is the least mean
sound quality and may hamper the effectiveness of square (LMS) algorithm. Several techniques based on
communication. Acoustic echo cancellers (AECs) are affine projection (AP) algorithm or recursive least-squares
developed to improve the audio quality. The general setup (RLS) algorithm have been developed to cope with an ill-
for echo cancellation is shown in Fig. 1. The received far- conditioned input autocorrelation matrix that degrades
end speech is the output at the near-end loudspeaker, LMS performance [1]. Based on a first-order approxima-
passing through the loudspeaker-enclosure-microphone tion, one may assume that the room impulse response
system (LEMS) to cause the echo signal. The microphone decays exponentially. A measure for the degree of this
signal is composed of the echo signal, near-end signal, decay is the reverberation time. Large reverberation time
noise, and any other distortion. Most AEC designs seek to leads to long room impulse response. Usually, the finite
remove the echo, noise, and distortion by reconstructing impulse response (FIR) filters which model the room
response can occupy several hundred to several thousand
taps [1]. It is known that long filters have a much better
performance than the shorter ones. However, the increase
 Corresponding author. of filter length will slow down the convergence rate
E-mail address: xiaoli@ece.gatech.edu (X. Ma). dramatically due to step-size constraint [2,3]. Hence, they

0165-1684/$ - see front matter Published by Elsevier B.V.


doi:10.1016/j.sigpro.2008.07.009
ARTICLE IN PRESS
122 K. Shi et al. / Signal Processing 89 (2009) 121–132

Near-end speech goal. As a result, the adaptive identification algorithm for


Near-end speech +
+ noise the Hammerstein system will converge faster and will
residual echo + noise
incur lower computational load. It is worth pointing out
that the proposed method can also be applied to linear-
echo Mic AEC AEC only scenarios. Here, we consider the nonlinear AEC
problem because the computational burden is much
Speaker heavier due to the coupling of linear and nonlinear parts.
PA Thus, the advantage of our proposed method is more
Far-end clearly demonstrated.
speech Far-end speech The rest of the paper is organized as follows. In Section 2,
we introduce the system model based on the proposed
LEMS
nonlinear AEC structure. Adaptation of the nonlinear filter
Fig. 1. General setup for acoustic echo cancellation. and shortening filter is considered in Section 3. In Section 4,
the convergence rate and computational complexity
are analyzed. Finally, simulation results are shown in
are more complicated and more expensive in terms of Section 5 and Section 6 concludes the paper.
convergence rate and computational complexity for AECs.
On the other hand, competitive consumer audio products
2. Nonlinear AEC design with a shortening filter
demand low-cost and small-sized analog components
(such as loudspeakers) which can exhibit nonlinear
characteristics. According to [4], linear AECs fail when Fig. 2 shows the architecture of our proposed nonlinear
nonlinearity is present in the LEMS. Therefore, this paper AEC. Let the LEMS consist of a (memoryless) nonlinear
focuses on the AEC design for systems with both long PA/loudspeaker followed by a linear subsystem (the room
room impulse responses and nonlinear loudspeakers. impulse response), it can be well represented by a
The nonlinearities in LEMS can be roughly classified Hammerstein model. The linear room impulse response
into two categories: nonlinearity with and without followed by an FIR filter wðnÞ is still a linear system. In the
memory. Nonlinearity with memory usually happens in adaptive compensation path, we use an auxiliary non-
high-quality audio equipments when the time constants linear block uð; hÞ and an FIR filter hðnÞ to approximate the
of the loudspeaker’s electro-mechanical system are large LEMS-shortening concatenation. The signal from the
compared to the sampling rate [5]. Volterra series-based far-end user sðnÞ first goes through the nonlinear
filters and neural networks are often used as nonlinear PA/loudspeaker. It then convolves with the room impulse
adaptive configurations for this category (see e.g., [6–10]). response to form the nonlinear acoustic echo cðnÞ. Next,
Memoryless nonlinearities occur typically at low-cost cðnÞ and vðnÞ are fed into a linear FIR filter called
power amplifier (PA) or the loudspeaker, such as in mobile shortening filter wðnÞ to generate the reference signal
equipments, where weight constraints call for low supply dðnÞ. The purpose of introducing wðnÞ is to make the
voltages [11]. To model an acoustic echo path, nonlinear ‘‘effective’’ channel which is the convolution of the room
cascade filters are usually exploited instead of Volterra impulse response and wðnÞ to have a smaller number of
filters or neural networks. Since the memoryless non- dominant taps. Although in theory, the convolution of two
linearities are primarily of the third and higher orders, sequences yields a longer sequence, when designed
adaptive Volterra filters are not computationally feasible carefully, a more compact sequence can result [16–18].
in this application. Thus, most AEC methods that have The shortening filter design has also been proposed in
been proposed in the literature are based on the cascade other applications. For example, a shortening filter has
structure [11–15]. However, by introducing a nonlinear been used to shorten the long impulse response of twisted
filter before the linear filter makes the adaptive filter pair telephone lines, and it is more common to be referred
design problem more challenging. Several methods have as channel shortening equalizer (CSE) [19].
been proposed to design nonlinear AECs while taking into Suppose that the shortening filter wðnÞ and the
account the convergence rate and computational com- AEC filter hðnÞ have lengths Lw and Lh , respectively.
plexity. For example, in [15], the convergence rate is
improved by using orthogonal polynomial bases to model noise v (n)
loudspeaker nonlinearity. However, the choice of ortho- y (n) d (n)
gonal bases depends on the statistical characteristics of w (n) e (n)
echo c (n) z (n)
the input signal. In [11], the fast convergence is achieved
by signal orthogonalization or RLS adaptation of the h (n)
coefficients for nonlinear part, but the computational x (n)
complexity still remains high. This paper aims at design-
ing efficient AECs that exhibit fast convergence rate and u (⋅;)
low complexity. s (n)
In this paper, we consider nonlinear AEC design for a
Hammerstein system, i.e., memoryless nonlinearity fol-
LEMS
lowed by an FIR filter. We show that with a shortening
filter, a shorter linear filter can be utilized to meet the AEC Fig. 2. Proposed nonlinear AEC structure.
ARTICLE IN PRESS
K. Shi et al. / Signal Processing 89 (2009) 121–132 123

Define vectors important. As long as the mean-square error of the echo


signal is small enough, the design is acceptable. Later, we
wðnÞ ¼ ½w0 ðnÞ; . . . ; wLw 1 ðnÞT , (1)
wil illustrate how to approximate the mean-square error
T
yðnÞ ¼ ½ yðnÞ; . . . ; yðn  Lw þ 1Þ . (2) in (12) and use simulation to verify it.
The reference signal dðnÞ can be expressed as
3.1. Adaptive algorithm to update linear filters
dðnÞ ¼ wT ðnÞyðnÞ. (3)
For the AEC branch, we model the nonlinear function Since the error signal in (10) is a linear function of hðnÞ
uð; hÞ as a linear combination of basis functions bk ðsÞ with and wðnÞ, the update equations can be derived using the
corresponding coefficients yk LMS algorithm [20] by finding the partial derivatives of
X
K e2 ðnÞ. For the AEC filter hðnÞ, we obtain
uðs; hÞ ¼ yk bk ðsÞ; h ¼ ½y1 ; y2 ; . . . ; yK T , (4)
k¼1 hðn þ 1Þ ¼ hðnÞ þ mh eðnÞrh fhT ðnÞSðnÞhðnÞg
where the basis functions are assumed known. Define ¼ hðnÞ þ mh eðnÞST ðnÞhðnÞ, (13)
vectors
where rh denotes the derivative over h. Similarly, the
bðnÞ ¼ ½b1 ðsðnÞÞ; b2 ðsðnÞÞ; . . . ; bK ðsðnÞÞT , (5) update equation for the shortening filter wðnÞ is [cf. (11)]
hðnÞ ¼ ½h0 ðnÞ; . . . ; hLh 1 ðnÞT , (6) wðn þ 1Þ ¼ wðnÞ  mw eðnÞyðnÞ. (14)
xðnÞ ¼ ½xðnÞ; . . . ; xðn  Lh þ 1ÞT (7)
Note that the idea of the shortening filter is similar to the
and a matrix one in [18].
The update equations (13) and (14) do not ensure
SðnÞ ¼ ½bðnÞ; bðn  1Þ; . . . ; bðn  Lh þ 1Þ. (8)
stability unless a strong condition is imposed on the step
The output of the AEC branch is sizes mh and mw . The optimum step size for the LMS
algorithm which guarantees stability and fast convergence
zðnÞ ¼ hT ðnÞxðnÞ ¼ hT ðnÞSðnÞhðnÞ. (9)
leads to the so-called normalized LMS (NLMS) algorithm
Our goal is to design filters wðnÞ, uð; hÞ, and hðnÞ such that [20]. Plugging (10) into (13), we obtain
dðnÞ and zðnÞ approximately cancel each other in a single-
talk scenario (i.e., local speech is not present). The purpose hðn þ 1Þ ¼ hðnÞ þ mh ½wT ðnÞyðnÞ  hT ðnÞSðnÞhðnÞST ðnÞhðnÞ
of the shortening filter wðnÞ is to reduce the required ¼ ½I  mh MðnÞhðnÞ þ pðnÞ, (15)
number of taps in hðnÞ and thus reduce the complexity
where MðnÞ ¼ ST ðnÞhðnÞhT ðnÞSðnÞ and pðnÞ ¼ ST ðnÞhðnÞwT
and improve the convergence rate of the AEC.
ðnÞyðnÞ. If we suppose that the coefficients hðnÞ and wðnÞ
of different filters, and the statistical characteristics of the
3. Adaptive algorithms for the nonlinear AEC
input signal are constant during the adaptation of the
parameters hðnÞ, the matrix E½MðnÞ and the vector E½pðnÞ
In this section, we discuss the adaptive mechanism for
are constant. In addition, E½MðnÞ is a positive semi-
finding the unknown coefficients of the different filters.
definite matrix, i.e., its eigenvalues are non-negative.
Recall that the unknown parameters include the coeffi-
Therefore, the update equation of hðnÞ for the NLMS
cients of the CSE wðnÞ, the coefficients h in the auxiliary
algorithm can be formulated as [20]
nonlinear block uð; hÞ, and the AEC filter hðnÞ. From Fig. 2,
the error signal eðnÞ can be written as [cf. (3) and (9)] mh
hðn þ 1Þ ¼ hðnÞ þ eðnÞST ðnÞhðnÞ. (16)
T T
kST ðnÞhðnÞk22
eðnÞ ¼ dðnÞ  zðnÞ ¼ w ðnÞyðnÞ  h ðnÞSðnÞhðnÞ. (10)
The convergence of the average tap weight vector E½hðnÞ
Then, the mean-square error (MSE) can be expressed as
can be guaranteed when appropriate mh is chosen [21],
Jðh; h; wÞ ¼ E½e2 ðnÞ ¼ E½ðwT ðnÞyðnÞ  hT ðnÞSðnÞhðnÞÞ2 , (11) which will be discussed in detail later. Moreover, subject
to the constraint in (12), we need to normalize hðnÞ after
where E½ denotes the statistical expectation. We propose
each update. Similarly, we obtain the NLMS update of the
the following criterion to solve the unknown parameters
shortening filter as
½h^ ; h;
^ w
^ ¼ arg min Jðh; h; wÞ; subject to khk2 ¼ 1; khk2 ¼ 1, mw
h;h;w wðn þ 1Þ ¼ wðnÞ  eðnÞyðnÞ. (17)
kyðnÞk22
(12)
where the constraints are added to avoid trivial solutions 3.2. Adaptive algorithms to update nonlinear coefficients
and k  k2 denotes the l2 norm.
In the following, we propose an adaptive method to In this section, we present three methods to update the
estimate the coefficients of the three filters. It can be seen nonlinear coefficients. These three methods are widely
from (12) that interactions exist among the updating adopted for nonlinear AECs. However, different schemes
processes of different parameters and it is difficult to exhibit different characteristics in terms of convergence
guarantee the adaptive algorithm convergence to the rate and computational complexity based on different
global minimum. However, AEC design is different from criteria [11,22]. We will show the performance of these
system identification issues where identifiability is methods when a shortening filter is employed.
ARTICLE IN PRESS
124 K. Shi et al. / Signal Processing 89 (2009) 121–132

3.2.1. NLMS adaptation Table 1


Following the similar procedure as that developed for Adaptive algorithm for the nonlinear AEC with a CSE
the coefficients of linear filters, the update equation for
Update hðn þ 1Þ by nonlinearity identification algorithms
the nonlinear coefficients hðnÞ can be derived as
e ¼ dðnÞ  hT ðn þ 1ÞuðnÞ
mh
my hðn þ 1Þ ¼ hðnÞ þ eðnÞST ðnÞhðnÞ
hðn þ 1Þ ¼ hðnÞ þ eðnÞSðnÞhðnÞ. (18) kST ðnÞhðnÞÞk22
2
kSðnÞhðnÞk2 hðn þ 1Þ
hðn þ 1Þ ¼
khðn þ 1Þk2
mw
wðn þ 1Þ ¼ wðnÞ  eðnÞyðnÞ
3.2.2. RLS adaptation kyðnÞk22

The adaptation of nonlinear coefficients h can also be


performed by the RLS algorithm to speed up the
convergence [11]. Taking into account the shortening
filter, we develop the RLS adaptation for the nonlinear
coefficients in our proposed nonlinear AEC structure.
Denote uðnÞ ¼ SðnÞhðnÞ. If the coefficients of the non- where
linear block uð; hÞ are the only unknown parameters, Z 0:5
H
given N0 þ 1 samples, we wish to minimize the cost R1 ¼ S1
yy ðf Þsyb ðf Þsyb ðf Þ df , (24)
0:5
function
R2 ¼ E½bðnÞ bT ðnÞ (25)
n¼N
X0 n¼N
X0 and syb ðf Þ is a vector with the kth element being the cross
JðhÞ ¼ e2 ðnÞ ¼ ½ðdðnÞ  hT ðnÞuðnÞÞ2 . (19) spectral density between yðnÞ and bk ðsðnÞÞ, 1pkpK; lmax is
n¼0 n¼0
the largest generalized eigenvalue for the pair (R1 , R2 ).
The least-squares (LS) method leads to So far, we introduce the adaptive schemes for both
linear and nonlinear parts. The entire step-by-step algo-
rithm is summarized in Table 1.
h^ ¼ ðUT ðN0 ÞUðN 0 ÞÞ1 UT ðN 0 ÞdðN0 Þ, (20)

where UðN0 Þ ¼ ½uð0Þ; uð1Þ; . . . ; uðN 0 ÞT and dðN 0 Þ ¼ ½dð0Þ; 4. Performance analysis of AECs with channel shortener
dð1Þ; . . . ; dðN 0 ÞT . The RLS algorithm calculates h recur-
sively to avoid matrix inversion. In this section, we discuss the performance of the
proposed methods and several practical issues related to
3.2.3. Coherence adaptation real-time implementations.
For NLMS- and RLS-based methods, it is seen that the
non-quadratic form of (12) induces interactions between
the updating equations of nonlinear part and linear 4.1. Residual echo power
part. Thus, the stability is guaranteed by small-enough
step size [11]. An alternative method to estimate the Residual echo power is an important figure of merit to
nonlinear coefficients is based on the pseudo magni- measure the performance of AECs. To separate the effect
tude squared coherence (MSC) function [22]. The MSC- of linear and nonlinear filters, we assume that there is no
based method guarantees stability because it performs model mismatch of the system’s nonlinearity, i.e., the
nonlinearity identification independent of linear part. nonlinearity in the loudspeaker can be modeled using
Denoted by Syx ðf Þ the cross spectral density between uð; hÞ with perfect knowledge of h. The block diagram
yðnÞ and xðnÞ at frequency f, the pseudo-MSC function for this analysis is given in Fig. 3. The LEMS consists of
between random processes yðnÞ and xðnÞ at frequency f is loudspeaker nonlinearity uð; hÞ and room impulse re-
defined as sponse hr ðnÞ. In the following, for any quantity x, x^ stands
for its corresponding estimate.
jSyx ðf Þj2 Consider the background noise at the microphone,
C yx ðf Þ ¼ , (21) i.e., yðnÞ is the corrupted version of cðnÞ by the noise vðnÞ.
Syy ðf Þs2x
Suppose that the original room impulse response hr ðnÞ has
where Syy ðf Þ is power spectral density (PSD) of yðnÞ, length Lo . Define signal vectors
and s2x ¼ E½x2 ðnÞ is the power of xðnÞ. Based on the
~
xðnÞ ¼ ½xðnÞ; xðn  1Þ; . . . ; xðn  Lw  Lo þ 2ÞT , (26)
property of the pseudo-MSC function, the nonlinearity
coefficients h can be solved by maximizing the following vðnÞ ¼ ½vðnÞ; vðn  1Þ; . . . ; vðn  Lw þ 1ÞT (27)
cost function [22]: and a matrix
Z 2 3
0:5 hr ð0Þ    hr ðLo  1Þ 0  0
JðhÞ ¼ C yx ðf ; hÞ df , (22) 6 7
6 0 hr ð0Þ  hr ðLo  1Þ 0  7
0:5 6 7
H¼6 . .. 7.
6 . . . 7
with the solution satisfying 4 5
0  0 hr ð0Þ  hr ðLo  1Þ
R1 h ¼ lmax R2 h, (23) (28)
ARTICLE IN PRESS
K. Shi et al. / Signal Processing 89 (2009) 121–132 125

v (n)  There is no mismatch between the nonlinear model


and the loudspeaker nonlinearity.
s (n) s (n) c (n) y (n) dˆ (n)  The optimal solution h approximates the first Lh taps
u (;⋅) hr (n) ˆ (n)
w
of g~ .
LEMS  There is no noise in the microphone received signal,
i.e., vðnÞ ¼ 0.
 Double talk situation does not exist before the filters
converge.
ˆ
u(;⋅) ĥ (n)
x̂ (n) ẑ (n) e (n)  The input signal sðnÞ is wide-sense stationary.

Define
Fig. 3. System structure for performance analysis.
~
SðnÞ ¼ ½SðnÞ SD ðnÞ; g^~ ðnÞ ¼ ½^gT ðnÞ g^ TD ðnÞT , (35)
Over a block of Lw output symbols, the input–output where
relationship of hr ðnÞ can be cast in the matrix form
g^ ðnÞ ¼ ½g^ 0 ðnÞ; g^ 1 ðnÞ; . . . ; g^ Lh 1 ðnÞT ,
yðnÞ ¼ HxðnÞ
~ þ vðnÞ. (29)
g^ D ðnÞ ¼ ½g^ Lh ðnÞ; g^ Lh þ1 ðnÞ; . . . ; g^ Lh þD1 ðnÞT ,
Define the correlation matrices Rxx ¼ E½xðnÞ ~ x~ T ðnÞ,
SD ðnÞ ¼ ½bðn  Lh Þ; bðn  Lh  1Þ; . . . ; bðn  Lh  D þ 1Þ.
Ryy ¼ E½yðnÞyT ðnÞ, Rvv ¼ E½vðnÞvT ðnÞ, and Rxy ¼ RTyx ¼
E½xðnÞy
~ T
ðnÞ. Under the assumption of the perfect knowl- ^
The reference signal dðnÞ and the estimated signal z^ ðnÞ can
edge of the nonlinear coefficients, i.e., h^ ¼ h, the optimal be expressed as
solution for h^ based on (11) is the eigenvector of matrix RD
^
dðnÞ ¼ hT SðnÞ
~ g^~ ðnÞ ¼ hT SðnÞ^gðnÞ þ hT SD ðnÞ^g ðnÞ, (36)
corresponding to the smallest eigenvalue [18] D
T
RD h^ ¼ lmin h^ (30) z^ ðnÞ ¼ h^ ðnÞSðnÞhðnÞ.
^ (37)

and the corresponding optimal solution for the shortening Then, the error signal can be written as
filter w is [18] ^ T
eðnÞ ¼ dðnÞ  z^ ðnÞ ¼ hT SðnÞ^gðnÞ þ hT SD ðnÞ^gD ðnÞ  h^ ðnÞSðnÞhðnÞ.
^
^ ¼
w Rxy R1 ^ (31)
yy h, (38)
where D ¼ Lo þ Lw  1  Lh , RD ¼ ½ILh 0Lh D   Rx=y  Define
½ILh 0Lh D T , and T
ey ðnÞ9hT SðnÞh  h^ ðnÞSðnÞh ¼ eTy ðnÞuðnÞ, (39)
T
Rx=y ¼ Rxx  Rxy R1
yy Ryx ¼ R1
xx þH R1
vv H. (32)
where ey ðnÞ is the nonlinear coefficients error ey ðnÞ ¼
Define the ‘‘filtered’’ room impulse response as gðnÞ ^ ¼ h  h^ ðnÞ. Note that ey ðnÞ can be interpreted as the esti-
^
hr ðnÞ  wðnÞ with its vector form as g^~ ðnÞ ¼ ½g^ 0 ðnÞ; g^ 1 ðnÞ; . . . ; mation error produced by the nonlinear AEC filter under
g^ Lo þLw 2 ðnÞT . Therefore, the residual echo signal can be ^
the assumption of perfect linear coefficients, i.e., hðnÞ ¼h
written as and wðnÞ
^ ¼ w. Similarly, define the tracking errors caused
^ by imperfect h or w, respectively,
^
eres ðnÞ ¼ xðnÞ  gðnÞ  xðnÞ  hðnÞ, (33)
where  denotes linear convolution. The minimum mean- eh ðnÞ ¼ eTh ðnÞxðnÞ, (40)
square error (MMSE) (i.e., residual echo power) can be ew ðnÞ ¼ eTw ðnÞyðnÞ, (41)
obtained [18]
where eh ðnÞ and ew ðnÞ are errors of the coefficients of AEC
E½e2res ðnÞ ¼ E½ðxðnÞ  gðnÞ
^ ^
 xðnÞ  hðnÞÞ2
 ¼ lmin . (34) filter hðnÞ and shortening filter wðnÞ, respectively,
^
eh ðnÞ ¼ h  hðnÞ, (42)
4.2. Convergence analysis of the NLMS adaptation
ew ðnÞ ¼ w  wðnÞ.
^ (43)

A number of methods have been proposed for Note that the desired w should give g^ D  0. Thus, the
Hammerstein system identification but few of them estimation error of w leads to an imperfect g~ ðnÞ. Therefore,
provide stability and convergence analysis [23–26]. an alternative way to express the error signal due to the
Among them only [26] has known the convergence to inaccurate estimate of w is given by
the global minimum of the error surface. However, for our
ew ðnÞ ¼ hT SðnÞeg ðnÞ  hT SD ðnÞegD ðnÞ, (44)
setup, the introduction of shortening filter makes the
convergence analysis more challenging. Based on the where eg ¼ g  g^ ðnÞ  h  g^ ðnÞ, and egD ¼ gD  g^ D ðnÞ.
following assumptions, we discuss the convergence Finally, the error signal in (38) can be approximated by
behavior of the residual echo power for NLMS adaptation. the first order terms
Denote the optimal solutions of filters hðnÞ, wðnÞ, and
eðnÞ  ey ðnÞ þ eh ðnÞ þ ew ðnÞ þ hT SD ðnÞgD , (45)
uð; hðnÞÞ by h, w, and h, respectively.
where the second order terms are neglected. It is pointed
 The nonlinearity of loudspeaker and linear room out that this approximation ignores the interactions
impulse response are time invariant. between update of different parameters. This is suitable
ARTICLE IN PRESS
126 K. Shi et al. / Signal Processing 89 (2009) 121–132

when each coefficients vector gets more and more For the ith entry of ēw ðnÞ we obtain
converged. We will use some simulation results to show !n
that the solution found from the proposed method is not m ðiÞ
¯ ðiÞ
 w ðnÞ ¼ 1 w l ¯ ðiÞ
w ð0Þ. (53)
far from the ‘‘perfect’’ one. Based on the assumption that s2y y
h  g and (33) we obtain
Since ew ðnÞ is independent of yðnÞ, we may replace the
eres ðnÞ  xTD ðnÞgD ¼ hT SD ðnÞgD , (46) stochastic product yðnÞyT ðnÞ by its expected value and
hence write
where xD ðnÞ ¼ ½xðn  Lh Þ; . . . ; xðn  Lh  D þ 1ÞT . Combin-
ing (45) and (46), we obtain E½e2w ðnÞ ¼ E½eTw ðnÞyðnÞyT ðnÞew ðnÞ ¼ E½eTw ðnÞRy ew ðnÞ. (54)
eðnÞ  ey ðnÞ þ eh ðnÞ þ ew ðnÞ þ eres ðnÞ. (47)
Using (52) and (53), we may express E½e2w ðnÞ in (54) as
Interestingly, the first-order approximation of the error
term decouples the effects of the three filters, where the
Lw
X
E½e2w ðnÞ ¼ E½ēTw ðnÞKy ēw ðnÞ ¼ lðiÞ ðiÞ
y E½j¯
2
w ðnÞj 
first three terms are the errors caused by the estimation i¼1
errors of the individual unknown coefficients, and the last !2n
X
Lw
mw ðiÞ
one is due to the imperfect shortening. Note that, during lðiÞ 1 ð¯ðiÞ 2
¼ y l w ð0ÞÞ . (55)
the update of each filter’s coefficients, the form of (11) i¼1
s2y y
makes it difficult to analyze the global convergence.
Similarly, the MSEs of the estimates of h and h are
However, the decoupling of the residual error allows us
obtained, respectively,
to approximate the algorithm’s performance, because
each decoupled error term is generated by the estimation Lh
X  2n
mh ðiÞ
error of only one filter. E½e2h ðnÞ ¼ lðiÞ
x 1 l ð¯ðiÞ ð0ÞÞ2 , (56)
s2x x h
In the following, we discuss the convergence charac- i¼1
X
K  2n
teristic of the algorithm. Based on (47), the MSE can be my ðiÞ 2
E½e2y ðnÞ ¼ lðiÞ
u 1 l ð¯ðiÞ
y ð0ÞÞ , (57)
expressed as
i¼1
s2u u
JðnÞ ¼ E½e2 ðnÞ ðiÞ ðiÞ
where lx and lu are the ith eigenvalues of the auto-
 E½e2w ðnÞ þ E½e2h ðnÞ 2
þ E½ey ðnÞ þ E½e2res ðnÞ correlation matrices Rx ¼ E½xðnÞxT ðnÞ and Ru ¼ E½uðnÞuT ðnÞ,
þ 2E½ew ðnÞeh ðnÞ þ 2E½ew ðnÞey ðnÞ respectively. s2x , s2u , ¯ ðiÞ
h
, and ¯ ðiÞ
y are defined in a similar
þ 2E½eh ðnÞey ðnÞ þ 2E½ew ðnÞeres ðnÞ. (48) way as s2y and ¯ ðiÞ
w .
For the cross terms in (48), we assume that different
Consider only w as an unknown parameter, which is coefficients are independent on each other. Following the
updated by (17). Combining (43) and (17), we obtain similar procedure, we obtain
ew ðn þ 1Þ ¼ w  wðn
^ þ 1Þ Lh
Lw X
X
" #
mw E½ew ðnÞeh ðnÞ ¼  Ryx ði; jÞ¯ðiÞ hðjÞ ð0Þ
w ð0Þ¯
T
¼ ILw  yðnÞy ðnÞ ew ðnÞ, (49) i¼1 j¼1
kyðnÞk22 !n  n
m ðiÞ mh ðjÞ
where eðnÞ ¼ ew ðnÞ ¼ e T
According to the direct-  1 w l 1 l , (58)
w ðnÞyðnÞ. s2y y s2x x
averaging method [20, p. 259], the solution of the
difference equation (49), operating under the assumption
of a small step-size, is close to the solution of the
Lw X
X K
E½ew ðnÞey ðnÞ ¼  Ryu ði; jÞ¯ðiÞ ðjÞ
w ð0Þ¯ y ð0Þ
following stochastic difference equation: i¼1 j¼1
" # !n  n
mw m ðiÞ my ðjÞ
ew ðn þ 1Þ ¼ E ILw  T
yðnÞy ðnÞ ew ðnÞ. (50)  1 w l 1 l , (59)
kyðnÞk2 2 s2y y s2u u
Assuming that the size of yðnÞ is long enough, we obtain
Lh X
X K
" #
yðnÞyT ðnÞ E½yðnÞyT ðnÞ E½eh ðnÞey ðnÞ ¼ Rxu ði; jÞ¯ðiÞ
h
ð0Þ¯ðjÞ
y ð0Þ
E  . i¼1 j¼1
kyðnÞk22 Lw E½y2 ðmÞ    
m ðiÞ n m ðjÞ n
 1  2h lx 1  2y lu , (60)
Denote s2y ¼ Lw E½y2 ðmÞ and Ry ¼ E½yðnÞyT ðnÞ. By applying sx su
the eigenvalue decomposition on Ry , we obtain
!n
Ry ¼ Q y Ky Q Ty , (51) X
Lw
m ðiÞ
E½ew ðnÞeres ðnÞ ¼  r yx ðiÞ¯ ðiÞ
 w ð0Þ 1 w l , (61)
i¼1
s2y y
where Q y is a unitary matrix and Ky is a diagonal matrix
ðiÞ
consisting of the eigenvalues ly ; i ¼ 1; 2; . . . ; Lw . Define where Ryx ði; jÞ, Ryu ði; jÞ and Rxu ði; jÞ are the entries at the ith
T
ēw ðnÞ ¼ Q y ew ðnÞ. We rewrite (50) as column and the jth row of the matrices Ryx ¼ Q Tw
! E½yðnÞxT ðnÞQ h , Ryu ¼ Q Tw E½yðnÞuT ðnÞQ y , and Rxu ¼ Q Th
mw
ēw ðn þ 1Þ ¼ I  K ē ðnÞ. (52) E½xðnÞuT ðnÞQ y , respectively, and r yx ðiÞ is the ith entry of
s2y y w the vector ryx ¼ Q Tw E½yðnÞxTD ðnÞgD. Therefore, the MSE in
ARTICLE IN PRESS
K. Shi et al. / Signal Processing 89 (2009) 121–132 127

(48) can be written in Eq. (62) (cf. (55)–(61)). several hundreds or even close to a thousand. With a CSE,
!2n the AEC filter hðnÞ with a much smaller length Lh than Lo
X
Lw
mw ðiÞ can be used to estimate the echo signal. The dominant
2
JðnÞ  lmin þ lðiÞ
y 1 l ð¯ðiÞ
w ð0ÞÞ
i¼1
s2y y factor of computational complexity for AEC without a
Lh
X  2n shortening filter resides in the terms Lo and KLo . When Lh
mh ðiÞ
þ lðiÞ
x 1 l ð¯ðiÞ ð0ÞÞ2 and Lw are much smaller than Lo , the computational
i¼1
s2x x h
complexity of the proposed methods is reduced consider-
X
K  2n
my ðiÞ 2
ably relative to the existing ones.
þ lðiÞ
u 1 l ð¯ðiÞ
y ð0ÞÞ
i¼1
s2u u
!n  n 4.4. Implementation issues
X Lh
Lw X
mw ðiÞ mh ðjÞ
 R1 ði; jÞ¯ðiÞ ðjÞ
w ð0Þ¯ ð0Þ 1  l 1 l
i¼1 j¼1
h s2y y s2x x
!n  4.4.1. Double-talk issue
X
Lw X
K n
m ðiÞ m ðjÞ The purpose of the echo canceller is to identify the
 Ryu ði; jÞ¯ ðiÞ
  ðjÞ
w ð0Þ¯ y ð0Þ 1 w l 1  2y lu
i¼1 j¼1
s2y y su acoustic echo path and subtract an estimated echo signal,
Lh X     thereby achieving cancellation. However, when the speech
X K
mh ðiÞ n m ðjÞ n of the two talkers (one at each end) arrives simulta-
þ Rxu ði; jÞ¯ðiÞ
h
ð0Þ¯ðjÞ
y ð0Þ 1  2 lx 1  2y lu
i¼1 j¼1
sx su neously at the canceller, there is a double-talk situation.
!n
X
Lw
mw ðiÞ These occurrences may cause the problem for identifica-
 r yx ðiÞ¯ðiÞ
w ð0Þ 1  l . (62) tion of the echo path. The disturbed near-end speech may
i¼1
s2y y
cause the adaptive filter to diverge, or allow audibly
According to (62) we know that the convergence rate annoying echo to pass through to the other side [1]. The
ðiÞ ðiÞ
depends on the eigenvalues ly ; i ¼ 1; 2; . . . ; Lw , lx ; i ¼ usual way to alleviate this problem is to slow down or
ðiÞ
1; 2; . . . ; Lh , and lu ; i ¼ 1; 2; . . . ; K, and the smallest eigen- completely halt the filter adaptation when a near-end
value dominates the convergence rate. The step sizes speech is detected. This means that a double-talk detector
should be small enough to guarantee the convergence, (DTD) is needed. Our proposed AEC can work in the same
and the selection of step size depends on the statistical fashion as the traditional ones, which operate associa-
properties of signals. Different from other AECs, we notice tively with a DTD. The adaptation procedure of all the
that as time goes on, there is a residual echo power lmin filter blocks in Fig. 2 has to be inhibited during double-
which is due to the imperfect shortening filter. talk periods.

4.3. Computational complexity analysis


4.4.2. Equalizer in AEC
Since a shortening filter is inserted following the
Table 2 shows the computational complexity in terms microphone received signal, the speech signal at the
of the number of multiplications and additions required near-end is linearly ‘‘filtered’’ before it arrives at the far-
per iteration by the proposed and existing algorithms. For end. Moreover, the background noise is constantly filtered
all algorithms in Section 3, the auxiliary nonlinear block with the time-varying shortening filter. This may result in
uð; hÞ uses the same order K. For the algorithms without an audibly annoying ‘‘modulation’’ of the noise character-
shortening filter (NLMS, RLS, and MSC), the AEC filter hðnÞ istic. The speech quality is described by the intelligibility,
adopts the same length Lo as the original room impulse which is usually quantified by D50 measure. D50 is
response. The computational complexity of these algo- defined as the ratio of the energy within 50 ms after the
rithms depends on K and Lo . For the algorithms with first peak of a room impulse response versus the entire
shortening filter (NLMS-CSE, RLS-CSE, MSC-CSE), the impulse response energy [27]. Therefore, if the audibility
computational complexity is given in terms of K, Lh , of the filtered speech is degraded to below a certain
and Lw . threshold, it can be improved by using an equalizer to
Usually the order of the nonlinear model K is small, compensate for this convolution effect. With the equal-
while the length of room impulse response Lo can be izer, not only the near-end speech is ‘‘deconvolved’’, but
also the background noise is ‘‘demodulated’’. The equal-
Table 2 izer works associatively with the near-end voice activity
Computational complexity comparison of the proposed and existing
detector (VAD) and is switched on in the presence of near-
methods
end speech. Since the shortening filter freezes to update
Algorithms Multiplications Additions during the double-talk period, the equalizer can be
designed based on the latest coefficients of shortening
NLMS 2KLo þ 2Lo þ 3K þ 4 2KLo þ Lo þ 2K  2 filter. The design of equalizers is mature in the area of
RLS 2KLo þ 3K 2 þ 2Lo þ 5K þ 2 2KLo þ 2K 2 þ Lo þ 2K  1 communication systems and is beyond the scope of this
MSC 2KLo þ 2ðK þ 1Þ log2 N þ 8K 2 2KLo þ 3ðK þ 1Þ log2 N þ 8K 2
paper.
þ2Lo þ 17K þ 4 þLo þ 6K  3
NLMS-CSE 2KLh þ 4Lh þ 5K þ 3Lw þ 6 2KLh þ 2Lh þ 3K þ 3Lw  6
RLS-CSE 2KLh þ 3K 2 þ 4Lh þ 7K 2KLh þ 2K 2 þ 2Lh þ 3K 5. Simulation results
þ3Lw þ 4 þ3Lw  5
MSC-CSE 2KLh þ 2ðK þ 1Þ log2 N þ 8K 2 2KLh þ 3ðK þ 1Þ log2 N þ 8K 2
In this section, the performance of the proposed
þ4Lh þ 17K þ Lw þ 4 þ2Lh þ 6K þ Lw  5
methods is assessed via computer simulations. The
ARTICLE IN PRESS
128 K. Shi et al. / Signal Processing 89 (2009) 121–132

far-end signal sðnÞ was generated according to an i.i.d.


Gaussian distribution. In the acoustic echo path, non-
linearity typically manifests itself as in saturation curves 25
in the PA driving the loudspeaker, or in the loudspeaker
itself [11]. Thus, the PA/loudspeaker nonlinearity is
modeled by 20

dðsÞ ¼ tanhðsÞ ¼ ðe2s  1Þ=ðe2s þ 1Þ, (63)

ERLE (dB)
where tanh denotes the hyperbolic tangent function. This 15
is equivalent to the sigmoid function utilized in [13]. The
room impulse response is generated by an FIR filter whose
coefficients were obtained via 10

bðnÞean ; 4pnpLo ;
hr ðnÞ ¼ (64) NLMS−CSE with Lh = 100
0 otherwise
5 NLMS with Lh = 300
with bðnÞ being standard normal distributed; Lo ¼ 300 and
a ¼ 0:02. NLMS with Lh = 100
In [11], it has been shown that polynomial model is 0 NLMS linear with Lh = 300
appropriate for the nonlinear effect in real situations.
Thus, in the simulations, the auxiliary nonlinear block uðÞ 5000 10000 15000
in Fig. 2 adopts polynomial basis functions Number of samples
bk ðsÞ ¼ sk ; k ¼ 1; 2; . . . ; K, (65) Fig. 4. NLMS-based algorithms with/without CSE.
where the maximum nonlinearity order K ¼ 7. Note here
we assume a mismatch between the PA nonlinearity and
its corresponding model. We set the filter length Lh ¼ 100
and Lw ¼ 300, respectively. For comparison purpose, we 0.2
also implement algorithms without the shortening filter,
0.1
whereby the linear block is assumed to have length
hr (n)

300 ð¼ Lo Þ. In our tests, the signal-to-noise ratio (SNR) was 0


set at 30 dB. −0.1
The microphone received signal yðnÞ was generated −0.2
under the single-talk situation. Echo return loss enhance-
ment (ERLE) is used to measure the performance of the 0 100 200 300 400 500
proposed nonlinear echo canceller. Usually, ERLE is given n
as the ratio of the microphone received echo power to the
residual echo power. However, due to the introduction of
0.2
the shortening filter, the ERLE in our case is defined as
hr (n)∗ w (n)

0.1
2
E½d ðnÞ 0
ERLE (dB) ¼ 10 log10 , (66)
E½e2 ðnÞ −0.1
where dðnÞ and eðnÞ represent the ‘‘filtered’’ microphone −0.2
received signal and residual echo signal, respectively. It is
pointed out that instead of using microphone signal yðnÞ, 0 100 200 300 400 500
we use its ‘‘filtered’’ version dðnÞ in (66). Because the n
power of signal eðnÞ is different with and without the Fig. 5. Impulse response: (a) The original room impulse response hr ðnÞ
shortening filter. The effect of shortening filter to micro- has length Lo ¼ 300. (b) In theory, hr ðnÞ  wðnÞ would have length
phone signal yðnÞ and residual error signal eðnÞ is that the Lo þ Lw  1 ¼ 599; the actual effective duration of hr ðnÞ  wðnÞ is
gain kwðnÞk2 is applied to both of them. Lh ¼ 100, illustrating the effect of the channel shortening filter.

Appropriate initialization should be selected to fulfill


the requirements in (12). In the simulations, the initial
coefficients of AEC filter hðnÞ is randomly generated and calculated the theoretical residual error power in (34) by
then normalized to be unit norm. Shortening filter is finding the minimum eigenvalue of matrix RD in (32).
initialized with wð0Þ
^ ¼ 0. A natural and successful choice With SNR ¼ 30 dB, the theoretical result from Section 4.1
for nonlinear coefficients is h^ ð0Þ ¼ ½1; 0; . . . ; 0T . Fig. 4 give us the maximum ERLE as 25.3 dB, which is consistent
shows the results of NLMS-based algorithms with/without with the simulated result. This also shows that the
the shortening filter. It can be seen that the ERLE of the parameter estimation errors (the first three terms in
NLMS algorithm without the CSE can reach 29 dB, while (47)) converge to zero, i.e., global optimal points. Although
the ERLE-CSE saturated at around 25 dB. This effect is the proposed method has a 4 dB loss in ERLE, it increases
ascribed to the residual error in (34). To analytically justify the convergence rate a lot while achieving considerable
the 4 dB performance loss of our proposed method, we good echo cancellation performance. Moreover, the
ARTICLE IN PRESS
K. Shi et al. / Signal Processing 89 (2009) 121–132 129

proposed method reduces the computational complexity 30


significantly (see Table 6). In Fig. 4, we also show the
ERLEs of the NLMS algorithm with a shorter linear filter
length (‘‘NLMS with Lh ¼ 100’’) and the traditional linear 25
AEC (‘‘NLMS linear with Lh ¼ 300’’) as benchmarks. None
of them can achieve reasonable performance. This 20
simulation shows the effectiveness of the proposed
method.
Fig. 5 depicts the original and the shortened impulse 15

ERLE (dB)
response. We see that the method is quite successful in
reducing the effective impulse response length. A more
10
complete elimination of the tail of hr ðnÞ  wðnÞ can be
achieved with a longer AEC filter, but even with the given
hðnÞ of length Lh ¼ 100, a fairly high ERLE is achieved 5
(see Fig. 4).
For RLS- and MSC-based methods, Figs. 6 and 7 show MSC−CSE with Lh = 100
three cases with/without shortening filter. In these 0
MSC with Lh = 300
figures, we observe that the algorithms with a shortening
filter and a shorter AEC filter converge faster due to the MSC with Lh = 100
−5
small number of filter taps. However, without shortening
filter, the performance of the algorithms with a short AEC 5000 10000 15000
filter degrades about 15 dB. For the three different Number of samples
methods (NLMS, RLS and MSC) to update nonlinear
Fig. 7. MSC-based algorithms with/without CSE.
coefficients with a shortening filter, they have very similar
convergence rate. The RLS-CSE and MSC-CSE algorithms
achieve better ERLE compared to NLMS-CSE algorithm.
To quantitatively evaluate the system identification
Table 3
performance, misalignment is taken as a figure of merit.
Dh (dB) and Dy (dB) of different methods
For the linear part misalignment, we use the distance
measure defined as NLMS-CSE RLS-CSE MSC-CSE

^ 2
k^g  hk Dh (dB) 42.1 42.8 43.4
2
Dh ðdBÞ ¼ 10 log10 . (67) Dy (dB) 44.9 45.5 45.8
k^gk22

Recall g^ is the first part of the shortened room impulse


response. For the nonlinear part misalignment, we adopt

the distance measure as

30 kh  h^ k22
Dy ðdBÞ ¼ 10 log10 . (68)
khk22
25
For the proposed methods, Dh and Dy are calculated when
the iterative process is terminated and at SNR ¼ 30 dB
20 (see Table 3). The results show that the estimates of the
nonlinear coefficients converge to the true values since
the misalignment Dy is very small. The results also
ERLE (dB)

15 illustrate the effectiveness of the shortening filter since


the residual energy is negligible relative to the energy of
the dominant part after shortening.
10 Furthermore, we evaluate the performance of the
proposed method for different ratios of the room impulse
response length Lo with respect to the AEC filter length Lh
5
RLS−CSE with Lh = 100 and shortening filter length Lw . First, we fix Lo ¼ 300 and
Lw ¼ 300. ERLEs with respect to different Lo =Lh are shown
RLS with Lh = 300 in Table 4. It can be seen that decreasing Lh degrades the
0
RLS with Lh = 100 echo cancellation performance due to the increase of the
uncompensated residual error. Second, ERLEs for different
5000 10000 15000 Lo =Lw are shown in Table 5 with Lo ¼ 300 and Lh ¼ 100. It
Number of samples can be observed that larger Lw achieves better shortening
performance, thus leads to better ERLE performance,
Fig. 6. RLS-based algorithms with/without CSE. however more computational burden is incurred.
ARTICLE IN PRESS
130 K. Shi et al. / Signal Processing 89 (2009) 121–132

Table 4
ERLE (dB) for different Lo =Lh 2

s (n)
Criterion n Lo =Lh 1 2 3 4 5 6 0
−2
NLMS-CSE 25.4 25.2 25.2 22.1 21.4 20.6
0 1 2 3 4 5 6 7
RLS-CSE 25.8 25.5 25.3 22.3 21.4 20.8
MSC-CSE 25.6 25.5 25.4 22.7 21.6 20.8
n x 104

v (n)
0
Table 5
ERLE (dB) for different Lo =Lw −2
0 1 2 3 4 5 6 7
Criterion n Lo =Lw 0.3 0.5 1 2 3 4 5 n x 104

NLMS-CSE 27.4 25.7 25.2 21.2 19.6 18.9 18.0


RLS-CSE 28.0 25.9 25.3 21.2 19.8 18.8 18.1 2
MSC-CSE 27.4 26.3 25.4 21.5 19.8 18.7 18.2

e (n)
0
−2
0 1 2 3 4 5 6 7
30 n x 104

Fig. 9. Different signals for nonlinear acoustic echo cancellation: (a) far-
end speech sðnÞ, (b) near-end speech vðnÞ, (c) Nonlinear AEC output eðnÞ.
25

20 Table 6
Computational complexity comparison with specific parameters

15 Algorithms Multiplications Additions


ERLE (dB)

NLMS 4825 4512


10 RLS 4984 4611
MSC 5427 5099
NLMS-CSE 3515 3372
RLS-CSE 3674 3471
5
MSC-CSE 3501 3354

RLS−CSE−RES with Lh = 100


0
RLS with Lh = 300
To evaluate the performance of the proposed method
RLS−RES with Lh = 100 in double-talk situations, we carry out simulation with the
−5
same setup as the previous one. In the simulation, an ideal
5000 10000 15000 double-talk detector is assumed. During the presence of
Number of samples near-end speech, all filters are freezed to update. More-
over, an equalizer of length 600 is employed in the
Fig. 8. AEC algorithms with/without CSE with a RES.
presence of near-end speech. The far-end speech sðnÞ is
shown in Fig. 9(a). The near-end speech vðnÞ is depicted in
Fig. 9(b). In Fig. 9(c) the nonlinear AEC output signal eðnÞ is
Although the proposed methods improve the conver- shown. It can be seen that the echo signal has been
gence rate and reduce the computational complexity, they sufficiently suppressed. The near-end speech is slightly
introduce some performance loss. To compensate this distorted. The origin of the distortion is due to the
loss, we take advantage of the residual echo suppressor combined effect of shortening filter and the following
(RES) after the echo canceller. A practical AEC usually equalizer.
consists of a RES to achieve sufficient echo reduction, With Lo ¼ 300, Lh ¼ 100, Lw ¼ 300 and K ¼ 7, the
since it is difficult to reach the echo attenuation as average number of operations during each iteration of
required by ITU and ETSI recommendations with an echo all methods are shown in Table 6. Note that in Table 6 we
canceller filter alone [1]. For demonstration purpose, we also take into account the RES and equalizer used
implement a RES based on [28] by designing a gain filter previously. For implementing the RES, the numbers of
in frequency domain. Fig. 8 shows the ERLE performance multiplications and additions needed in each iteration are
of our proposed method with a RES. We notice that the 174 and 258, respectively. For the equalizer with length of
RES can sufficiently suppress the residual echo. Also, even 600, the numbers of multiplications and additions needed
with a RES, a short AEC filter without CSE cannot achieve in each iteration are 600 and 599. It is shown that the
satisfactory ERLE, which further indicates the effective- computational complexity is reduced significantly by the
ness of our proposed method. proposed nonlinear AEC algorithms using a shortening
ARTICLE IN PRESS
K. Shi et al. / Signal Processing 89 (2009) 121–132 131

30

25 25

20
20
15
ERLE (dB)

ERLE (dB)
15
10

10 5

0
5
RLS−CSE−RES with Lh = 300 RLS
−5
RLS with Lh = 1024 MSC
0 RLS−CSE
RLS−RES with Lh = 300 −10 MSC−CSE

0.5 1 1.5 2 0.5 1 1.5 2


Number of samples x 104 Number of samples x 104

Fig. 10. AEC algorithms with/without CSE using long room impulse Fig. 12. AEC algorithms with/without CSE for real speech data.
response.
method performs well in the very long impulse response
environment and converges much faster than the one
without CSE but using a long linear filter. Note that the
complexity of our proposed method is much lower than
the one with long linear filter. Moreover, we evaluate the
25
performance of nonlinear AEC in the echo path change
scenario, which is a typical case for acoustic echo
20 cancellation application due to the time-varying acoustic
environment. The echo path change is simulated by
toggling between different room impulse responses. We
ERLE (dB)

15
consider two echo path change cases: echo path changes
before and after the convergence of nonlinear AEC. The
10 resulting ERLE is shown in Fig. 11. It can be observed that
for both types of path changes, the nonlinear AEC can
echo path change retrack the true echo path and the reconvergence rate is
5 after convergence not affected by the path changes. It is also illustrated that
introducing of CSE does not have reconvergence issue
0 echo path change during the echo path changes, because all of the filters in
before convergence the system are updated simultaneously. The performance
of the proposed methods is also justified using a real
0.5 1 1.5 2 2.5 3 speech signal, which is illustrated in Fig. 12. It is shown
that our proposed methods outperform the existing ones.
Number of samples x 104

Fig. 11. ERLE with echo path changes.


6. Conclusions
filter. For the shortening algorithms with different non-
linear update schemes, NLMS- and RLS-based algorithms In this paper, we proposed an efficient AEC design
need a few more operations for each iteration, but, gain a method. A very long room impulse response can pose
little bit performance in terms of ERLE (see Figs. 4, 6, and significant challenges for AEC design, especially on the
7). Moreover, RLS- and MSC-based methods outperform convergence rate and computational complexity. We
NLMS-based method with respect to the convergence rate. proposed a novel AEC structure by introducing CSE, which
Finally, we evaluate the performance in more realistic can be applied in both linear and nonlinear AECs. We
scenarios. In the following simulations, the IMAGE developed adaptive algorithms to update the unknown
method [29] is adopted to generate a room impulse coefficients. The shortening filter reshapes the room
response of length 1024 with sampling rate 8 kHz. The response impulse. In consequence, the ‘‘filtered’’ version
nonlinearity in (63) is also used to describe the loudspea- of the linear part in LEMS can be modeled by an FIR filter
ker nonlinearity. Fig. 10 shows the ERLE with i.i.d. with fewer taps. Therefore, the convergence rate is
Gaussian far-end signal. It can be seen that the proposed increased and computational complexity is reduced.
ARTICLE IN PRESS
132 K. Shi et al. / Signal Processing 89 (2009) 121–132

References [15] G.-Y. Jiang, S.-F. Hsieh, Nonlinear acoustic echo cancellation using
orthogonal polynomial, in: Proceedings of the IEEE ICASSP,
Toulouse, France, May 2006, pp. 273–276.
[1] E. Hänsler, G. Schmidt, Acoustic Echo and Noise Control: A Practical [16] S. Qureshi, E. Newhall, An adaptive receiver for data transmission
Approach, Wiley, NJ, 2004. over time-dispersive channels, IEEE Trans. Inform. Theory 19 (July
[2] T.-K. Woo, Fast hierarchical least mean square algorithm, IEEE 1973) 448–457.
Signal Process. Lett. 8 (November 2001) 289–291. [17] W. Lee, F. Hill, A maximum-likelihood sequence estimator with
[3] Y. Gu, K. Tang, H. Cui, W. Du, Convergence analysis of a deficient-
decision-feedback equalization, IEEE Trans. Comm. 25 (September
length LMS filter and optimal-length sequence to model exponen-
1977) 971–979.
tial decay impulse response, IEEE Signal Process. Lett. 10 (January
[18] N. Al-Dhahir, J.M. Cioffi, Efficiently computed reduced-parameter
2003) 4–7.
input-aided MMSE equalizers for ML detection: a unified approach,
[4] A.N. Birkett, R.A. Goubran, Limitations of handsfree acoustic echo
IEEE Trans. Inform. Theory 42 (May 1996) 903–915.
cancellers due to nonlinear loudspeaker distortion and enclosure
[19] P.J.W. Melsa, R.C. Younce, C.E. Rohrs, Impulse response shortening
vibration effects, in: Proceedings of the IEEE Workshop on
for discrete multitone transceivers, IEEE Trans. Comm. 44 (Decem-
Applications of Signal Processing to Audio and Acoustics,
ber 1996) 1662–1672.
New Paltz, New York, October 1995, pp. 103–106.
[20] S. Haykin, Adaptive Filter Theory, fourth ed., Prentice-Hall, Engle-
[5] F.X.Y. Gao, W.M. Snelgrove, Adaptive linearization of a loudspeaker,
wood Cliffs, NJ, 2002.
in: Proceedings of the IEEE ICASSP, Toronto, Canada, May 1991,
[21] N.J. Bershad, S. Bouchired, F. Castanie, Stochastic analysis of
pp. 3589–3592.
adaptive gradient identification of Wiener-Hammerstein systems
[6] A. Stenger, L. Trautmann, R. Rabenstein, Nonlinear acoustic echo
for Gaussian inputs, IEEE Trans. Signal Process. 48 (February 2000)
cancellation with 2nd order adaptive Volterra filters, in: Proceed-
ings of the IEEE ICASSP, vol. 2, Phoenix, AZ, March 1999, 557–560.
pp. 877–880. [22] K. Shi, X. Ma, G. T. Zhou, Adaptive nonlinearity identification in a
[7] F. Küch, W. Kellermann, Partitioned block frequency-domain Hammerstein system using a pseudo coherence function, in: IEEE
adaptive second-order Volterra filter, IEEE Trans. Signal Process. Statistical Signal Processing Workshop, Madison, WI, August 2007,
53 (February 2004) 564–575. pp. 26–29.
[8] A. Guérin, G. Faucon, R. Le Bouquin-Jeannes, Nonlinear acoustic [23] N. Bershad, P. Celka, J.-M. Vesin, Stochastic analysis of gradient
echo cancellation based on Volterra filters, IEEE Trans. Speech Audio adaptive identification of nonlinear systems with memory for
Process. 11 (November 2003) 672–683. Gaussian data and noisy input and output measurements, IEEE
[9] A.N. Birkett, R.A. Goubran, Nonlinear loudspeaker compensation Trans. Signal Process. 47 (March 1999) 675–689.
for hands free acoustic echo cancellation, IEE Electron. Lett. 32 [24] A.E. Nordsjö, L.H. Zetterberg, Identification of certain time-varying
(June 1996) 1063–1064. nonlinear Wiener and Hammerstein systems, IEEE Trans. Signal
[10] G. Sentoni, A. Altenberg, Nonlinear acoustic echo canceller with Process. 49 (March 2001) 577–592.
DABNET þ FIR structure, in: Proceedings of the IEEE Workshop on [25] J. Jeraj, V.J. Mathews, A stable adaptive Hammerstein filter
Applications of Signal Processing to Audio and Acoustics, New Paltz, employing partial orthogonalization of the input signals, IEEE
NY, October 2005, pp. 37–40. Trans. Signal Process. 54 (April 2006) 1412–1420.
[11] A. Stenger, W. Kellermann, Adaptation of a memoryless preproces- [26] J. Jeraj, V.J. Mathews, Stochastic mean-square performance analysis
sor for nonlinear acoustic echo cancelling, Signal Processing 80 of an adaptive Hammerstein filter, IEEE Trans. Signal Process. 54
(September 2000) 1747–1760. (June 2006) 2168–2177.
[12] B.S. Nollett, D.L. Jones, Nonlinear echo cancellation for hands-free [27] International Organization for Standardization (ISO), ISO Norm
speakerphones, in: Proceedings of the IEEE Workshop on Nonlinear 3382: Acoustics—measurement of the reverberation time of rooms
Signal and Image Processing, Mackinac Island, MI, September 1997. with reference to other acoustical parameters, 1997.
[13] J.-P. Costa, A. Lagrange, A. Arliaud, Acoustic echo cancellation using [28] S. Gustafsson, R. Martin, P. Vary, Combined acoustic echo control
nonlinear cascade filters, in: Proceedings of the IEEE ICASSP, vol. 5, and noise reduction for hands-free telephony, Signal Processing 64
Hong Kong, China, April 2003, pp. 389–392. (January 1998) 21–32.
[14] H. Dai, W.-P. Zhu, Compensation of loudspeaker nonlinearity in [29] J.B. Allen, D.A. Berkley, Image method for efficiently simulating
acoustic echo cancellation using raised-cosine function, IEEE Trans. small-room acoustics, J. Acoustic Soc. Amer. 65 (1979)
Circuits Systems 53 (November 2006) 1190–1194. 943–950.

You might also like