Shi 2009
Signal Processing
journal homepage: www.elsevier.com/locate/sigpro
Article info

Article history: Received 31 December 2007; received in revised form 1 July 2008; accepted 2 July 2008; available online 22 July 2008.

Keywords: Acoustic echo cancellation; Cascade structure; Channel shortening; Nonlinear adaptive filter

Abstract

Acoustic echo is an annoying issue for many hands-free telecommunication systems. Because of room acoustics and delay in the transmission path, echoes affect the sound quality and may hamper communications. Thus, acoustic echo cancellers (AECs) are critical for enhancing the audio quality. Echo cancellation is challenging because long room impulse response slows down the convergence rate and increases the computational complexity of the AEC designs. Nonlinearity of the power amplifier or the loudspeaker further exacerbates the problem. In this paper, we propose a novel AEC algorithm when the loudspeaker-enclosure-microphone system is described by a Hammerstein model. We show that by introducing a channel shortening filter, the length of the "effective" acoustic echo path is reduced considerably. Hence, the computational complexity of AECs is reduced and the convergence rate is increased. Adaptive algorithms are developed for the proposed structure and their convergence is shown analytically. The effectiveness of our method is illustrated by computer simulations.

Published by Elsevier B.V.
where $\mathbf{U}(N_0) = [\mathbf{u}(0), \mathbf{u}(1), \ldots, \mathbf{u}(N_0)]^T$ and $\mathbf{d}(N_0) = [d(0), d(1), \ldots, d(N_0)]^T$. The RLS algorithm calculates $\boldsymbol{\theta}$ recursively to avoid matrix inversion.

3.2.3. Coherence adaptation

For NLMS- and RLS-based methods, it is seen that the non-quadratic form of (12) induces interactions between the updating equations of nonlinear part and linear part. Thus, the stability is guaranteed by small-enough step size [11]. An alternative method to estimate the nonlinear coefficients is based on the pseudo magnitude squared coherence (MSC) function [22]. The MSC-based method guarantees stability because it performs nonlinearity identification independent of linear part. Denoted by $S_{yx}(f)$ the cross spectral density between $y(n)$ and $x(n)$ at frequency $f$, the pseudo-MSC function between random processes $y(n)$ and $x(n)$ at frequency $f$ is defined as

$$C_{yx}(f) = \frac{|S_{yx}(f)|^2}{S_{yy}(f)\,\sigma_x^2}, \tag{21}$$

where $S_{yy}(f)$ is power spectral density (PSD) of $y(n)$, and $\sigma_x^2 = E[x^2(n)]$ is the power of $x(n)$. Based on the property of the pseudo-MSC function, the nonlinearity coefficients $\boldsymbol{\theta}$ can be solved by maximizing the following cost function [22]:

$$J(\boldsymbol{\theta}) = \int_{-0.5}^{0.5} C_{yx}(f; \boldsymbol{\theta})\, df, \tag{22}$$

with the solution satisfying

$$\mathbf{R}_1 \boldsymbol{\theta} = \lambda_{\max} \mathbf{R}_2 \boldsymbol{\theta}, \tag{23}$$

4. Performance analysis of AECs with channel shortener

In this section, we discuss the performance of the proposed methods and several practical issues related to real-time implementations.

4.1. Residual echo power

Residual echo power is an important figure of merit to measure the performance of AECs. To separate the effect of linear and nonlinear filters, we assume that there is no model mismatch of the system's nonlinearity, i.e., the nonlinearity in the loudspeaker can be modeled using $u(\cdot; \boldsymbol{\theta})$ with perfect knowledge of $\boldsymbol{\theta}$. The block diagram for this analysis is given in Fig. 3. The LEMS consists of loudspeaker nonlinearity $u(\cdot; \boldsymbol{\theta})$ and room impulse response $h_r(n)$. In the following, for any quantity $x$, $\hat{x}$ stands for its corresponding estimate.

Consider the background noise at the microphone, i.e., $y(n)$ is the corrupted version of $c(n)$ by the noise $v(n)$. Suppose that the original room impulse response $h_r(n)$ has length $L_o$. Define signal vectors

$$\tilde{\mathbf{x}}(n) = [x(n), x(n-1), \ldots, x(n - L_w - L_o + 2)]^T, \tag{26}$$

$$\mathbf{v}(n) = [v(n), v(n-1), \ldots, v(n - L_w + 1)]^T \tag{27}$$

and a matrix

$$\mathbf{H} = \begin{bmatrix} h_r(0) & \cdots & h_r(L_o-1) & 0 & \cdots & 0 \\ 0 & h_r(0) & \cdots & h_r(L_o-1) & \cdots & 0 \\ \vdots & & \ddots & & \ddots & \vdots \\ 0 & \cdots & 0 & h_r(0) & \cdots & h_r(L_o-1) \end{bmatrix}. \tag{28}$$
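As a rough numerical illustration of the structure in (28), the sketch below builds the banded Toeplitz matrix $\mathbf{H}$ from a short impulse response and checks that $\mathbf{H}\tilde{\mathbf{x}}(n)$ reproduces $L_w$ consecutive samples of the linear convolution of $x(n)$ with $h_r(n)$ (the noise $v(n)$ is omitted). The function name `convolution_matrix`, the toy sizes, and the random test signals are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np


def convolution_matrix(h_r, L_w):
    """Build the L_w x (L_w + L_o - 1) banded Toeplitz matrix of (28).

    Row i holds h_r(0), ..., h_r(L_o - 1) starting at column i.
    """
    h_r = np.asarray(h_r, dtype=float)
    L_o = h_r.size
    H = np.zeros((L_w, L_w + L_o - 1))
    for i in range(L_w):
        H[i, i:i + L_o] = h_r
    return H


# Toy sizes (hypothetical, far shorter than a real room response).
rng = np.random.default_rng(0)
L_o, L_w = 5, 3
h_r = rng.standard_normal(L_o)
x = rng.standard_normal(50)
n = 20

# x_tilde(n) = [x(n), x(n-1), ..., x(n - L_w - L_o + 2)]^T as in (26).
x_tilde = x[n - (L_w + L_o - 2): n + 1][::-1]

H = convolution_matrix(h_r, L_w)
c_block = H @ x_tilde  # noise-free echo samples [c(n), ..., c(n - L_w + 1)]

# Reference: full linear convolution of x(n) with h_r(n).
c_full = np.convolve(x, h_r)
```

Each entry of `c_block` should agree with the corresponding sample `c_full[n - i]`, which is simply the statement that the rows of $\mathbf{H}$ are shifted copies of the impulse response.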
ARTICLE IN PRESS
K. Shi et al. / Signal Processing 89 (2009) 121–132 125
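Fig. 3. System structure for performance analysis.

Over a block of $L_w$ output symbols, the input–output relationship of $h_r(n)$ can be cast in the matrix form

$$\mathbf{y}(n) = \mathbf{H}\tilde{\mathbf{x}}(n) + \mathbf{v}(n). \tag{29}$$

Define the correlation matrices $\mathbf{R}_{xx} = E[\tilde{\mathbf{x}}(n)\tilde{\mathbf{x}}^T(n)]$, $\mathbf{R}_{yy} = E[\mathbf{y}(n)\mathbf{y}^T(n)]$, $\mathbf{R}_{vv} = E[\mathbf{v}(n)\mathbf{v}^T(n)]$, and $\mathbf{R}_{xy} = \mathbf{R}_{yx}^T = E[\tilde{\mathbf{x}}(n)\mathbf{y}^T(n)]$. Under the assumption of the perfect knowledge of the nonlinear coefficients, i.e., $\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}$, the optimal solution for $\hat{\mathbf{h}}$ based on (11) is the eigenvector of matrix $\mathbf{R}_\Delta$ corresponding to the smallest eigenvalue [18]

$$\mathbf{R}_\Delta \hat{\mathbf{h}} = \lambda_{\min} \hat{\mathbf{h}} \tag{30}$$

and the corresponding optimal solution for the shortening filter $\mathbf{w}$ is [18]

$$\hat{\mathbf{w}} = \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1}\hat{\mathbf{h}}, \tag{31}$$

where $\Delta = L_o + L_w - 1 - L_h$, $\mathbf{R}_\Delta = [\mathbf{I}_{L_h}\ \mathbf{0}_{L_h \times \Delta}]\, \mathbf{R}_{x/y}\, [\mathbf{I}_{L_h}\ \mathbf{0}_{L_h \times \Delta}]^T$, and

$$\mathbf{R}_{x/y} = \mathbf{R}_{xx} - \mathbf{R}_{xy}\mathbf{R}_{yy}^{-1}\mathbf{R}_{yx} = \left(\mathbf{R}_{xx}^{-1} + \mathbf{H}^T \mathbf{R}_{vv}^{-1} \mathbf{H}\right)^{-1}. \tag{32}$$

Define the "filtered" room impulse response as $\hat{g}(n) = h_r(n) \ast \hat{w}(n)$ with its vector form as $\hat{\tilde{\mathbf{g}}}(n) = [\hat{g}_0(n), \hat{g}_1(n), \ldots, \hat{g}_{L_o+L_w-2}(n)]^T$. Therefore, the residual echo signal can be written as

$$e_{\mathrm{res}}(n) = x(n) \ast \hat{g}(n) - x(n) \ast \hat{h}(n), \tag{33}$$

where $\ast$ denotes linear convolution. The minimum mean-square error (MMSE) (i.e., residual echo power) can be obtained [18]

$$E[e_{\mathrm{res}}^2(n)] = E[(x(n) \ast \hat{g}(n) - x(n) \ast \hat{h}(n))^2] = \lambda_{\min}. \tag{34}$$

4.2. Convergence analysis of the NLMS adaptation

A number of methods have been proposed for Hammerstein system identification but few of them provide stability and convergence analysis [23–26]. Among them, only the method in [26] is known to converge to the global minimum of the error surface. However, for our setup, the introduction of the shortening filter makes the convergence analysis more challenging. Based on the following assumptions, we discuss the convergence behavior of the residual echo power for NLMS adaptation.

• Denote the optimal solutions of filters $\hat{\mathbf{h}}(n)$, $\hat{\mathbf{w}}(n)$, and $u(\cdot; \hat{\boldsymbol{\theta}}(n))$ by $\mathbf{h}$, $\mathbf{w}$, and $\boldsymbol{\theta}$, respectively.
• The nonlinearity of the loudspeaker and the linear room impulse response are time invariant.

Define

$$\tilde{\mathbf{S}}(n) = [\mathbf{S}(n)\ \ \mathbf{S}_\Delta(n)], \qquad \hat{\tilde{\mathbf{g}}}(n) = [\hat{\mathbf{g}}^T(n)\ \ \hat{\mathbf{g}}_\Delta^T(n)]^T, \tag{35}$$

where

$$\hat{\mathbf{g}}(n) = [\hat{g}_0(n), \hat{g}_1(n), \ldots, \hat{g}_{L_h-1}(n)]^T,$$

$$\hat{\mathbf{g}}_\Delta(n) = [\hat{g}_{L_h}(n), \hat{g}_{L_h+1}(n), \ldots, \hat{g}_{L_h+\Delta-1}(n)]^T,$$

$$\mathbf{S}_\Delta(n) = [\mathbf{b}(n-L_h), \mathbf{b}(n-L_h-1), \ldots, \mathbf{b}(n-L_h-\Delta+1)].$$

The reference signal $\hat{d}(n)$ and the estimated signal $\hat{z}(n)$ can be expressed as

$$\hat{d}(n) = \boldsymbol{\theta}^T \tilde{\mathbf{S}}(n)\hat{\tilde{\mathbf{g}}}(n) = \boldsymbol{\theta}^T \mathbf{S}(n)\hat{\mathbf{g}}(n) + \boldsymbol{\theta}^T \mathbf{S}_\Delta(n)\hat{\mathbf{g}}_\Delta(n), \tag{36}$$

$$\hat{z}(n) = \hat{\boldsymbol{\theta}}^T(n)\mathbf{S}(n)\hat{\mathbf{h}}(n). \tag{37}$$

Then, the error signal can be written as

$$e(n) = \hat{d}(n) - \hat{z}(n) = \boldsymbol{\theta}^T \mathbf{S}(n)\hat{\mathbf{g}}(n) + \boldsymbol{\theta}^T \mathbf{S}_\Delta(n)\hat{\mathbf{g}}_\Delta(n) - \hat{\boldsymbol{\theta}}^T(n)\mathbf{S}(n)\hat{\mathbf{h}}(n). \tag{38}$$

Define

$$e_\theta(n) \triangleq \boldsymbol{\theta}^T \mathbf{S}(n)\mathbf{h} - \hat{\boldsymbol{\theta}}^T(n)\mathbf{S}(n)\mathbf{h} = \boldsymbol{\epsilon}_\theta^T(n)\mathbf{u}(n), \tag{39}$$

where $\boldsymbol{\epsilon}_\theta(n)$ is the nonlinear coefficients error $\boldsymbol{\epsilon}_\theta(n) = \boldsymbol{\theta} - \hat{\boldsymbol{\theta}}(n)$. Note that $e_\theta(n)$ can be interpreted as the estimation error produced by the nonlinear AEC filter under the assumption of perfect linear coefficients, i.e., $\hat{\mathbf{h}}(n) = \mathbf{h}$ and $\hat{\mathbf{w}}(n) = \mathbf{w}$. Similarly, define the tracking errors caused by imperfect $\mathbf{h}$ or $\mathbf{w}$, respectively,

$$e_h(n) = \boldsymbol{\epsilon}_h^T(n)\mathbf{x}(n), \tag{40}$$

$$e_w(n) = \boldsymbol{\epsilon}_w^T(n)\mathbf{y}(n), \tag{41}$$

where $\boldsymbol{\epsilon}_h(n)$ and $\boldsymbol{\epsilon}_w(n)$ are errors of the coefficients of AEC filter $\mathbf{h}(n)$ and shortening filter $\mathbf{w}(n)$, respectively,

$$\boldsymbol{\epsilon}_h(n) = \mathbf{h} - \hat{\mathbf{h}}(n), \tag{42}$$

$$\boldsymbol{\epsilon}_w(n) = \mathbf{w} - \hat{\mathbf{w}}(n). \tag{43}$$

Note that the desired $\mathbf{w}$ should give $\hat{\mathbf{g}}_\Delta \approx \mathbf{0}$. Thus, the estimation error of $\mathbf{w}$ leads to an imperfect $\tilde{\mathbf{g}}(n)$. Therefore, an alternative way to express the error signal due to the inaccurate estimate of $\mathbf{w}$ is given by

$$e_w(n) = \boldsymbol{\theta}^T \mathbf{S}(n)\boldsymbol{\epsilon}_g(n) - \boldsymbol{\theta}^T \mathbf{S}_\Delta(n)\boldsymbol{\epsilon}_{g_\Delta}(n), \tag{44}$$

where $\boldsymbol{\epsilon}_g = \mathbf{g} - \hat{\mathbf{g}}(n) \approx \mathbf{h} - \hat{\mathbf{g}}(n)$, and $\boldsymbol{\epsilon}_{g_\Delta} = \mathbf{g}_\Delta - \hat{\mathbf{g}}_\Delta(n)$. Finally, the error signal in (38) can be approximated by the first order terms

$$e(n) \approx e_\theta(n) + e_h(n) + e_w(n) + \boldsymbol{\theta}^T \mathbf{S}_\Delta(n)\mathbf{g}_\Delta, \tag{45}$$

where the second order terms are neglected. It is pointed out that this approximation ignores the interactions between update of different parameters. This is suitable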
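when each coefficients vector gets more and more converged. We will use some simulation results to show that the solution found from the proposed method is not far from the "perfect" one. Based on the assumption that $\mathbf{h} \approx \mathbf{g}$ and (33) we obtain

$$e_{\mathrm{res}}(n) \approx \mathbf{x}_\Delta^T(n)\mathbf{g}_\Delta = \boldsymbol{\theta}^T \mathbf{S}_\Delta(n)\mathbf{g}_\Delta, \tag{46}$$

where $\mathbf{x}_\Delta(n) = [x(n-L_h), \ldots, x(n-L_h-\Delta+1)]^T$. Combining (45) and (46), we obtain

$$e(n) \approx e_\theta(n) + e_h(n) + e_w(n) + e_{\mathrm{res}}(n). \tag{47}$$

Interestingly, the first-order approximation of the error term decouples the effects of the three filters, where the first three terms are the errors caused by the estimation errors of the individual unknown coefficients, and the last one is due to the imperfect shortening. Note that, during the update of each filter's coefficients, the form of (11) makes it difficult to analyze the global convergence. However, the decoupling of the residual error allows us to approximate the algorithm's performance, because each decoupled error term is generated by the estimation error of only one filter.

In the following, we discuss the convergence characteristic of the algorithm. Based on (47), the MSE can be expressed as

$$\begin{aligned} J(n) = E[e^2(n)] \approx{} & E[e_w^2(n)] + E[e_h^2(n)] + E[e_\theta^2(n)] + E[e_{\mathrm{res}}^2(n)] \\ & + 2E[e_w(n)e_h(n)] + 2E[e_w(n)e_\theta(n)] \\ & + 2E[e_h(n)e_\theta(n)] + 2E[e_w(n)e_{\mathrm{res}}(n)]. \end{aligned} \tag{48}$$

Consider only $\mathbf{w}$ as an unknown parameter, which is updated by (17). Combining (43) and (17), we obtain

$$\boldsymbol{\epsilon}_w(n+1) = \mathbf{w} - \hat{\mathbf{w}}(n+1) = \left[\mathbf{I}_{L_w} - \frac{\mu_w}{\|\mathbf{y}(n)\|_2^2}\,\mathbf{y}(n)\mathbf{y}^T(n)\right]\boldsymbol{\epsilon}_w(n), \tag{49}$$

where $e(n) = e_w(n) = \boldsymbol{\epsilon}_w^T(n)\mathbf{y}(n)$. According to the direct-averaging method [20, p. 259], the solution of the difference equation (49), operating under the assumption of a small step-size, is close to the solution of the following stochastic difference equation:

$$\boldsymbol{\epsilon}_w(n+1) = E\left[\mathbf{I}_{L_w} - \frac{\mu_w}{\|\mathbf{y}(n)\|_2^2}\,\mathbf{y}(n)\mathbf{y}^T(n)\right]\boldsymbol{\epsilon}_w(n). \tag{50}$$

Assuming that the size of $\mathbf{y}(n)$ is long enough, we obtain

$$E\left[\frac{\mathbf{y}(n)\mathbf{y}^T(n)}{\|\mathbf{y}(n)\|_2^2}\right] \approx \frac{E[\mathbf{y}(n)\mathbf{y}^T(n)]}{L_w E[y^2(m)]}.$$

Denote $\sigma_y^2 = L_w E[y^2(m)]$ and $\mathbf{R}_y = E[\mathbf{y}(n)\mathbf{y}^T(n)]$. By applying the eigenvalue decomposition on $\mathbf{R}_y$, we obtain

$$\mathbf{R}_y = \mathbf{Q}_y \boldsymbol{\Lambda}_y \mathbf{Q}_y^T, \tag{51}$$

where $\mathbf{Q}_y$ is a unitary matrix and $\boldsymbol{\Lambda}_y$ is a diagonal matrix consisting of the eigenvalues $\lambda_y^{(i)}$, $i = 1, 2, \ldots, L_w$. Define $\bar{\boldsymbol{\epsilon}}_w(n) = \mathbf{Q}_y^T \boldsymbol{\epsilon}_w(n)$. We rewrite (50) as

$$\bar{\boldsymbol{\epsilon}}_w(n+1) = \left(\mathbf{I} - \frac{\mu_w}{\sigma_y^2}\boldsymbol{\Lambda}_y\right)\bar{\boldsymbol{\epsilon}}_w(n). \tag{52}$$

For the $i$th entry of $\bar{\boldsymbol{\epsilon}}_w(n)$ we obtain

$$\bar{\epsilon}_w^{(i)}(n) = \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^n \bar{\epsilon}_w^{(i)}(0). \tag{53}$$

Since $\boldsymbol{\epsilon}_w(n)$ is independent of $\mathbf{y}(n)$, we may replace the stochastic product $\mathbf{y}(n)\mathbf{y}^T(n)$ by its expected value and hence write

$$E[e_w^2(n)] = E[\boldsymbol{\epsilon}_w^T(n)\mathbf{y}(n)\mathbf{y}^T(n)\boldsymbol{\epsilon}_w(n)] = E[\boldsymbol{\epsilon}_w^T(n)\mathbf{R}_y\boldsymbol{\epsilon}_w(n)]. \tag{54}$$

Using (52) and (53), we may express $E[e_w^2(n)]$ in (54) as

$$\begin{aligned} E[e_w^2(n)] &= E[\bar{\boldsymbol{\epsilon}}_w^T(n)\boldsymbol{\Lambda}_y\bar{\boldsymbol{\epsilon}}_w(n)] = \sum_{i=1}^{L_w} \lambda_y^{(i)} E[|\bar{\epsilon}_w^{(i)}(n)|^2] \\ &= \sum_{i=1}^{L_w} \lambda_y^{(i)} \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^{2n} \left(\bar{\epsilon}_w^{(i)}(0)\right)^2. \end{aligned} \tag{55}$$

Similarly, the MSEs of the estimates of $\mathbf{h}$ and $\boldsymbol{\theta}$ are obtained, respectively,

$$E[e_h^2(n)] = \sum_{i=1}^{L_h} \lambda_x^{(i)} \left(1 - \frac{\mu_h}{\sigma_x^2}\lambda_x^{(i)}\right)^{2n} \left(\bar{\epsilon}_h^{(i)}(0)\right)^2, \tag{56}$$

$$E[e_\theta^2(n)] = \sum_{i=1}^{K} \lambda_u^{(i)} \left(1 - \frac{\mu_\theta}{\sigma_u^2}\lambda_u^{(i)}\right)^{2n} \left(\bar{\epsilon}_\theta^{(i)}(0)\right)^2, \tag{57}$$

where $\lambda_x^{(i)}$ and $\lambda_u^{(i)}$ are the $i$th eigenvalues of the autocorrelation matrices $\mathbf{R}_x = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ and $\mathbf{R}_u = E[\mathbf{u}(n)\mathbf{u}^T(n)]$, respectively. $\sigma_x^2$, $\sigma_u^2$, $\bar{\epsilon}_h^{(i)}$, and $\bar{\epsilon}_\theta^{(i)}$ are defined in a similar way as $\sigma_y^2$ and $\bar{\epsilon}_w^{(i)}$.

For the cross terms in (48), we assume that different coefficients are independent of each other. Following a similar procedure, we obtain

$$E[e_w(n)e_h(n)] = \sum_{i=1}^{L_w}\sum_{j=1}^{L_h} \mathcal{R}_{yx}(i,j)\,\bar{\epsilon}_w^{(i)}(0)\,\bar{\epsilon}_h^{(j)}(0) \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^n \left(1 - \frac{\mu_h}{\sigma_x^2}\lambda_x^{(j)}\right)^n, \tag{58}$$

$$E[e_w(n)e_\theta(n)] = \sum_{i=1}^{L_w}\sum_{j=1}^{K} \mathcal{R}_{yu}(i,j)\,\bar{\epsilon}_w^{(i)}(0)\,\bar{\epsilon}_\theta^{(j)}(0) \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^n \left(1 - \frac{\mu_\theta}{\sigma_u^2}\lambda_u^{(j)}\right)^n, \tag{59}$$

$$E[e_h(n)e_\theta(n)] = \sum_{i=1}^{L_h}\sum_{j=1}^{K} \mathcal{R}_{xu}(i,j)\,\bar{\epsilon}_h^{(i)}(0)\,\bar{\epsilon}_\theta^{(j)}(0) \left(1 - \frac{\mu_h}{\sigma_x^2}\lambda_x^{(i)}\right)^n \left(1 - \frac{\mu_\theta}{\sigma_u^2}\lambda_u^{(j)}\right)^n, \tag{60}$$

$$E[e_w(n)e_{\mathrm{res}}(n)] = \sum_{i=1}^{L_w} r_{yx}(i)\,\bar{\epsilon}_w^{(i)}(0) \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^n, \tag{61}$$

where $\mathcal{R}_{yx}(i,j)$, $\mathcal{R}_{yu}(i,j)$ and $\mathcal{R}_{xu}(i,j)$ are the entries at the $i$th row and the $j$th column of the matrices $\mathcal{R}_{yx} = \mathbf{Q}_w^T E[\mathbf{y}(n)\mathbf{x}^T(n)]\mathbf{Q}_h$, $\mathcal{R}_{yu} = \mathbf{Q}_w^T E[\mathbf{y}(n)\mathbf{u}^T(n)]\mathbf{Q}_\theta$, and $\mathcal{R}_{xu} = \mathbf{Q}_h^T E[\mathbf{x}(n)\mathbf{u}^T(n)]\mathbf{Q}_\theta$, respectively, and $r_{yx}(i)$ is the $i$th entry of the vector $\mathbf{r}_{yx} = \mathbf{Q}_w^T E[\mathbf{y}(n)\mathbf{x}_\Delta^T(n)]\mathbf{g}_\Delta$. Therefore, the MSE in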
(48) can be written in Eq. (62) (cf. (55)–(61)).

$$\begin{aligned} J(n) \approx{} & \lambda_{\min} + \sum_{i=1}^{L_w} \lambda_y^{(i)} \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^{2n} \left(\bar{\epsilon}_w^{(i)}(0)\right)^2 \\ & + \sum_{i=1}^{L_h} \lambda_x^{(i)} \left(1 - \frac{\mu_h}{\sigma_x^2}\lambda_x^{(i)}\right)^{2n} \left(\bar{\epsilon}_h^{(i)}(0)\right)^2 \\ & + \sum_{i=1}^{K} \lambda_u^{(i)} \left(1 - \frac{\mu_\theta}{\sigma_u^2}\lambda_u^{(i)}\right)^{2n} \left(\bar{\epsilon}_\theta^{(i)}(0)\right)^2 \\ & - \sum_{i=1}^{L_w}\sum_{j=1}^{L_h} \mathcal{R}_{yx}(i,j)\,\bar{\epsilon}_w^{(i)}(0)\,\bar{\epsilon}_h^{(j)}(0) \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^n \left(1 - \frac{\mu_h}{\sigma_x^2}\lambda_x^{(j)}\right)^n \\ & - \sum_{i=1}^{L_w}\sum_{j=1}^{K} \mathcal{R}_{yu}(i,j)\,\bar{\epsilon}_w^{(i)}(0)\,\bar{\epsilon}_\theta^{(j)}(0) \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^n \left(1 - \frac{\mu_\theta}{\sigma_u^2}\lambda_u^{(j)}\right)^n \\ & + \sum_{i=1}^{L_h}\sum_{j=1}^{K} \mathcal{R}_{xu}(i,j)\,\bar{\epsilon}_h^{(i)}(0)\,\bar{\epsilon}_\theta^{(j)}(0) \left(1 - \frac{\mu_h}{\sigma_x^2}\lambda_x^{(i)}\right)^n \left(1 - \frac{\mu_\theta}{\sigma_u^2}\lambda_u^{(j)}\right)^n \\ & - \sum_{i=1}^{L_w} r_{yx}(i)\,\bar{\epsilon}_w^{(i)}(0) \left(1 - \frac{\mu_w}{\sigma_y^2}\lambda_y^{(i)}\right)^n. \end{aligned} \tag{62}$$

According to (62) we know that the convergence rate depends on the eigenvalues $\lambda_y^{(i)}$, $i = 1, 2, \ldots, L_w$, $\lambda_x^{(i)}$, $i = 1, 2, \ldots, L_h$, and $\lambda_u^{(i)}$, $i = 1, 2, \ldots, K$, and the smallest eigenvalue dominates the convergence rate. The step sizes should be small enough to guarantee the convergence, and the selection of step size depends on the statistical properties of signals. Different from other AECs, we notice that as time goes on, there is a residual echo power $\lambda_{\min}$ which is due to the imperfect shortening filter.

several hundreds or even close to a thousand. With a CSE, the AEC filter $h(n)$ with a much smaller length $L_h$ than $L_o$ can be used to estimate the echo signal. The dominant factor of computational complexity for AEC without a shortening filter resides in the terms $L_o$ and $KL_o$. When $L_h$ and $L_w$ are much smaller than $L_o$, the computational complexity of the proposed methods is reduced considerably relative to the existing ones.

4.4. Implementation issues

4.4.1. Double-talk issue

The purpose of the echo canceller is to identify the acoustic echo path and subtract an estimated echo signal, thereby achieving cancellation. However, when the speech of the two talkers (one at each end) arrives simultaneously at the canceller, there is a double-talk situation. Such occurrences can hinder identification of the echo path. The disturbing near-end speech may cause the adaptive filter to diverge, or allow audibly annoying echo to pass through to the other side [1]. The usual way to alleviate this problem is to slow down or completely halt the filter adaptation when near-end speech is detected. This means that a double-talk detector (DTD) is needed. Our proposed AEC can work in the same fashion as the traditional ones, which operate in conjunction with a DTD. The adaptation procedure of all the filter blocks in Fig. 2 has to be inhibited during double-talk periods.
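The gating described above can be sketched as follows, under simplifying assumptions: a single linear NLMS filter stands in for the full cascade of Fig. 2, and the double-talk decision is supplied as a precomputed boolean flag rather than by an actual detector. The names `nlms_step` and `near_end_active`, the filter length, and the step size are hypothetical choices for illustration, not values from the paper.

```python
import numpy as np


def nlms_step(h_hat, x_vec, d, mu=0.5, double_talk=False, eps=1e-8):
    """One NLMS iteration; adaptation is skipped while double_talk is True.

    Returns the (possibly unchanged) coefficients and the a priori error.
    """
    e = d - h_hat @ x_vec
    if double_talk:
        return h_hat, e  # freeze coefficients, still output the error
    return h_hat + mu * e * x_vec / (x_vec @ x_vec + eps), e


rng = np.random.default_rng(1)
L, N = 8, 2000
h_true = rng.standard_normal(L)   # toy echo path (assumption)
h_hat = np.zeros(L)
x = rng.standard_normal(N)        # far-end signal (white noise stand-in)

# Hypothetical DTD output: near-end speech present for samples 1000..1199.
near_end_active = np.zeros(N, dtype=bool)
near_end_active[1000:1200] = True

frozen_snapshots = []
for n in range(L - 1, N):
    x_vec = x[n - L + 1:n + 1][::-1]
    v = 5.0 * rng.standard_normal() if near_end_active[n] else 0.0
    d = h_true @ x_vec + v        # echo plus (optional) near-end signal
    h_hat, _ = nlms_step(h_hat, x_vec, d, double_talk=near_end_active[n])
    if near_end_active[n]:
        frozen_snapshots.append(h_hat.copy())
```

Because the update is skipped whenever the flag is set, the strong near-end disturbance in this toy run never reaches the coefficients, and the estimate keeps the value it had before the double-talk period began.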
where tanh denotes the hyperbolic tangent function. This is equivalent to the sigmoid function utilized in [13]. The room impulse response is generated by an FIR filter whose coefficients were obtained via

$$h_r(n) = \begin{cases} b(n)\,e^{-\alpha n}, & 4 \le n \le L_o, \\ 0, & \text{otherwise} \end{cases} \tag{64}$$

with $b(n)$ being standard normal distributed; $L_o = 300$ and $\alpha = 0.02$.

In [11], it has been shown that a polynomial model is appropriate for the nonlinear effect in real situations. Thus, in the simulations, the auxiliary nonlinear block $u(\cdot)$ in Fig. 2 adopts polynomial basis functions

$$b_k(s) = s^k, \quad k = 1, 2, \ldots, K, \tag{65}$$

where the maximum nonlinearity order $K = 7$. Note here we assume a mismatch between the PA nonlinearity and its corresponding model. We set the filter lengths $L_h = 100$ and $L_w = 300$, respectively. For comparison purposes, we also implement algorithms without the shortening filter, whereby the linear block is assumed to have length

$$\mathrm{ERLE\ (dB)} = 10 \log_{10} \frac{E[d^2(n)]}{E[e^2(n)]}, \tag{66}$$

where $d(n)$ and $e(n)$ represent the "filtered" microphone received signal and residual echo signal, respectively. It is pointed out that instead of using the microphone signal $y(n)$, we use its "filtered" version $d(n)$ in (66), because the power of the signal $e(n)$ is different with and without the shortening filter. The effect of the shortening filter on the microphone signal $y(n)$ and the residual error signal $e(n)$ is that the gain $\|w(n)\|^2$ is applied to both of them.

Fig. 4. NLMS-based algorithms with/without CSE. [Plot of ERLE (dB) versus number of samples; curves: NLMS-CSE with $L_h = 100$, NLMS with $L_h = 300$, NLMS with $L_h = 100$, NLMS linear with $L_h = 300$.]

Fig. 5. Impulse response: (a) The original room impulse response $h_r(n)$ has length $L_o = 300$. (b) In theory, $h_r(n) \ast w(n)$ would have length $L_o + L_w - 1 = 599$; the actual effective duration of $h_r(n) \ast w(n)$ is $L_h = 100$, illustrating the effect of the channel shortening filter. [Plot of $h_r(n)$ versus $n$.]
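The simulation ingredients in (64)–(66) can be sketched in a few lines. This is only an illustrative setup under stated assumptions: the impulse response is stored in an array of length $L_o$ with indices $0, \ldots, L_o - 1$ (so the upper index limit in (64) is truncated by one sample), the expectations in (66) are replaced by sample means, and the all-ones `w_placeholder` is a stand-in used solely to check the length of the cascade, not a shortening filter. All function names are hypothetical.

```python
import numpy as np


def room_impulse_response(L_o=300, alpha=0.02, n_start=4, seed=0):
    """Exponentially decaying Gaussian impulse response in the spirit of (64)."""
    rng = np.random.default_rng(seed)
    n = np.arange(L_o)
    h_r = rng.standard_normal(L_o) * np.exp(-alpha * n)
    h_r[:n_start] = 0.0  # taps before n_start are zero
    return h_r


def polynomial_basis(s, K=7):
    """Return the matrix whose column k-1 holds b_k(s) = s**k, k = 1..K, as in (65)."""
    s = np.asarray(s, dtype=float)
    return s[:, None] ** np.arange(1, K + 1)


def erle_db(d, e):
    """Sample-mean version of the ERLE in (66)."""
    return 10.0 * np.log10(np.mean(np.square(d)) / np.mean(np.square(e)))


h_r = room_impulse_response()
w_placeholder = np.ones(300)           # NOT a shortening filter; length check only
g = np.convolve(h_r, w_placeholder)    # cascade h_r(n) * w(n)
basis = polynomial_basis([2.0], K=3)   # powers 2, 4, 8
erle_example = erle_db(10.0 * np.ones(100), np.ones(100))
```

The cascade length `len(g)` equals $L_o + L_w - 1 = 599$, matching the theoretical length quoted in the caption of Fig. 5; how much of that length carries significant energy depends on the actual shortening filter, which this sketch does not compute.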
response. We see that the method is quite successful in reducing the effective impulse response length. A more complete elimination of the tail of $h_r(n) \ast w(n)$ can be achieved with a longer AEC filter, but even with the given $h(n)$ of length $L_h = 100$, a fairly high ERLE is achieved (see Fig. 4).

For RLS- and MSC-based methods, Figs. 6 and 7 show three cases with/without shortening filter. In these figures, we observe that the algorithms with a shortening filter and a shorter AEC filter converge faster due to the small number of filter taps. However, without a shortening filter, the performance of the algorithms with a short AEC filter degrades by about 15 dB. The three methods for updating the nonlinear coefficients (NLMS, RLS and MSC), when used with a shortening filter, have very similar convergence rates. The RLS-CSE and MSC-CSE algorithms achieve better ERLE compared to the NLMS-CSE algorithm.

Fig. 7. MSC-based algorithms with/without CSE. [Plot of ERLE (dB) versus number of samples; curves: MSC-CSE with $L_h = 100$, MSC with $L_h = 300$, MSC with $L_h = 100$.]
To quantitatively evaluate the system identification performance, misalignment is taken as a figure of merit. For the linear part misalignment, we use the distance measure defined as

$$\Delta_h\ (\mathrm{dB}) = 10 \log_{10} \frac{\|\hat{\mathbf{g}} - \hat{\mathbf{h}}\|_2^2}{\|\hat{\mathbf{g}}\|_2^2}. \tag{67}$$

$$\Delta_\theta\ (\mathrm{dB}) = 10 \log_{10} \frac{\|\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\|_2^2}{\|\boldsymbol{\theta}\|_2^2}. \tag{68}$$

For the proposed methods, $\Delta_h$ and $\Delta_\theta$ are calculated when the iterative process is terminated and at SNR = 30 dB (see Table 3). The results show that the estimates of the nonlinear coefficients converge to the true values since the misalignment $\Delta_\theta$ is very small. The results also

Table 3
$\Delta_h$ (dB) and $\Delta_\theta$ (dB) of different methods

                        NLMS-CSE   RLS-CSE   MSC-CSE
$\Delta_h$ (dB)          -42.1      -42.8     -43.4
$\Delta_\theta$ (dB)     -44.9      -45.5     -45.8
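The misalignment measures (67) and (68) share one form, a normalized squared distance expressed in decibels, so a single helper covers both. The following is a minimal sketch; the function name `misalignment_db` and the toy vectors are assumptions for illustration.

```python
import numpy as np


def misalignment_db(reference, estimate):
    """10*log10(||reference - estimate||^2 / ||reference||^2), as in (67)/(68)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    err = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(err / np.sum(reference ** 2))


# Toy example: a 10% error on one tap of a unit-norm reference gives -20 dB.
g_hat = np.array([1.0, 0.0])
h_hat = np.array([1.0, 0.1])
delta_h = misalignment_db(g_hat, h_hat)
```

For (67) the reference is $\hat{\mathbf{g}}$ and the estimate is $\hat{\mathbf{h}}$; for (68) they are $\boldsymbol{\theta}$ and $\hat{\boldsymbol{\theta}}$. More negative values indicate a closer match.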
Table 4
ERLE (dB) for different $L_o/L_h$

Criterion \ $L_o/L_h$     1      2      3      4      5      6
NLMS-CSE                25.4   25.2   25.2   22.1   21.4   20.6
RLS-CSE                 25.8   25.5   25.3   22.3   21.4   20.8
MSC-CSE                 25.6   25.5   25.4   22.7   21.6   20.8

Table 5
ERLE (dB) for different $L_o/L_w$

Criterion \ $L_o/L_w$    0.3    0.5     1      2      3      4      5

Fig. 9. Different signals for nonlinear acoustic echo cancellation: (a) far-end speech $s(n)$, (b) near-end speech $v(n)$, (c) nonlinear AEC output $e(n)$. [Three panels plotted versus $n$ ($\times 10^4$).]

Table 6
Computational complexity comparison with specific parameters

Fig. 10. AEC algorithms with/without CSE using long room impulse response. [Plot of ERLE (dB); curves: RLS-CSE-RES with $L_h = 300$, RLS with $L_h = 1024$, RLS-RES with $L_h = 300$.]

Fig. 12. AEC algorithms with/without CSE for real speech data. [Plot of ERLE (dB); curves: RLS, MSC, RLS-CSE, MSC-CSE.]
method performs well in the very long impulse response environment and converges much faster than the one without CSE but using a long linear filter. Note that the complexity of our proposed method is much lower than that of the one with a long linear filter. Moreover, we evaluate the performance of the nonlinear AEC in the echo path change scenario, which is a typical case for acoustic echo cancellation applications due to the time-varying acoustic environment. The echo path change is simulated by toggling between different room impulse responses. We consider two echo path change cases: echo path changes before and after the convergence of the nonlinear AEC. The resulting ERLE is shown in Fig. 11. It can be observed that for both types of path changes, the nonlinear AEC can retrack the true echo path and the reconvergence rate is not affected by the path changes. It is also illustrated that introducing the CSE does not cause a reconvergence issue during the echo path changes, because all of the filters in the system are updated simultaneously. The performance of the proposed methods is also verified using a real speech signal, as illustrated in Fig. 12. It is shown that our proposed methods outperform the existing ones.

[Fig. 11: plot of ERLE (dB) versus number of samples ($\times 10^4$); curves: echo path change after convergence, echo path change before convergence.]
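The echo-path-change experiment can be mimicked in miniature. The sketch below is not the proposed nonlinear AEC with CSE: it uses a plain linear NLMS filter, white-noise input, no measurement noise, and two random 16-tap paths, all of which are assumptions chosen to keep the example short. It only illustrates the toggling idea, i.e. the true path switches halfway through and the adaptive filter retracks it.

```python
import numpy as np


def nlms_track(x, d, L, mu=0.5, eps=1e-8):
    """Run NLMS over the whole record and return the coefficient history."""
    h_hat = np.zeros(L)
    history = np.zeros((len(x), L))
    for n in range(L - 1, len(x)):
        x_vec = x[n - L + 1:n + 1][::-1]
        e = d[n] - h_hat @ x_vec
        h_hat = h_hat + mu * e * x_vec / (x_vec @ x_vec + eps)
        history[n] = h_hat
    return history


def mis_db(h, h_hat):
    """Normalized coefficient misalignment in dB."""
    return 10.0 * np.log10(np.sum((h - h_hat) ** 2) / np.sum(h ** 2))


rng = np.random.default_rng(2)
L, N = 16, 8000
h_a = rng.standard_normal(L)   # echo path before the change (toy)
h_b = rng.standard_normal(L)   # echo path after the change (toy)
x = rng.standard_normal(N)

# Toggle the echo path at the midpoint of the record.
d = np.empty(N)
d[:N // 2] = np.convolve(x, h_a)[:N // 2]
d[N // 2:] = np.convolve(x, h_b)[N // 2:N]

history = nlms_track(x, d, L)
before_change = mis_db(h_a, history[N // 2 - 1])
just_after_change = mis_db(h_b, history[N // 2])
end_of_run = mis_db(h_b, history[N - 1])
```

In this toy run the misalignment is very low just before the switch, jumps immediately afterwards because the filter still holds the old path, and returns to a very low level by the end of the record.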
References

[1] E. Hänsler, G. Schmidt, Acoustic Echo and Noise Control: A Practical Approach, Wiley, NJ, 2004.
[2] T.-K. Woo, Fast hierarchical least mean square algorithm, IEEE Signal Process. Lett. 8 (November 2001) 289–291.
[3] Y. Gu, K. Tang, H. Cui, W. Du, Convergence analysis of a deficient-length LMS filter and optimal-length sequence to model exponential decay impulse response, IEEE Signal Process. Lett. 10 (January 2003) 4–7.
[4] A.N. Birkett, R.A. Goubran, Limitations of handsfree acoustic echo cancellers due to nonlinear loudspeaker distortion and enclosure vibration effects, in: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, October 1995, pp. 103–106.
[5] F.X.Y. Gao, W.M. Snelgrove, Adaptive linearization of a loudspeaker, in: Proceedings of the IEEE ICASSP, Toronto, Canada, May 1991, pp. 3589–3592.
[6] A. Stenger, L. Trautmann, R. Rabenstein, Nonlinear acoustic echo cancellation with 2nd order adaptive Volterra filters, in: Proceedings of the IEEE ICASSP, vol. 2, Phoenix, AZ, March 1999, pp. 877–880.
[7] F. Küch, W. Kellermann, Partitioned block frequency-domain adaptive second-order Volterra filter, IEEE Trans. Signal Process. 53 (February 2004) 564–575.
[8] A. Guérin, G. Faucon, R. Le Bouquin-Jeannes, Nonlinear acoustic echo cancellation based on Volterra filters, IEEE Trans. Speech Audio Process. 11 (November 2003) 672–683.
[9] A.N. Birkett, R.A. Goubran, Nonlinear loudspeaker compensation for hands free acoustic echo cancellation, IEE Electron. Lett. 32 (June 1996) 1063–1064.
[10] G. Sentoni, A. Altenberg, Nonlinear acoustic echo canceller with DABNET + FIR structure, in: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2005, pp. 37–40.
[11] A. Stenger, W. Kellermann, Adaptation of a memoryless preprocessor for nonlinear acoustic echo cancelling, Signal Processing 80 (September 2000) 1747–1760.
[12] B.S. Nollett, D.L. Jones, Nonlinear echo cancellation for hands-free speakerphones, in: Proceedings of the IEEE Workshop on Nonlinear Signal and Image Processing, Mackinac Island, MI, September 1997.
[13] J.-P. Costa, A. Lagrange, A. Arliaud, Acoustic echo cancellation using nonlinear cascade filters, in: Proceedings of the IEEE ICASSP, vol. 5, Hong Kong, China, April 2003, pp. 389–392.
[14] H. Dai, W.-P. Zhu, Compensation of loudspeaker nonlinearity in acoustic echo cancellation using raised-cosine function, IEEE Trans. Circuits Systems 53 (November 2006) 1190–1194.
[15] G.-Y. Jiang, S.-F. Hsieh, Nonlinear acoustic echo cancellation using orthogonal polynomial, in: Proceedings of the IEEE ICASSP, Toulouse, France, May 2006, pp. 273–276.
[16] S. Qureshi, E. Newhall, An adaptive receiver for data transmission over time-dispersive channels, IEEE Trans. Inform. Theory 19 (July 1973) 448–457.
[17] W. Lee, F. Hill, A maximum-likelihood sequence estimator with decision-feedback equalization, IEEE Trans. Comm. 25 (September 1977) 971–979.
[18] N. Al-Dhahir, J.M. Cioffi, Efficiently computed reduced-parameter input-aided MMSE equalizers for ML detection: a unified approach, IEEE Trans. Inform. Theory 42 (May 1996) 903–915.
[19] P.J.W. Melsa, R.C. Younce, C.E. Rohrs, Impulse response shortening for discrete multitone transceivers, IEEE Trans. Comm. 44 (December 1996) 1662–1672.
[20] S. Haykin, Adaptive Filter Theory, fourth ed., Prentice-Hall, Englewood Cliffs, NJ, 2002.
[21] N.J. Bershad, S. Bouchired, F. Castanie, Stochastic analysis of adaptive gradient identification of Wiener-Hammerstein systems for Gaussian inputs, IEEE Trans. Signal Process. 48 (February 2000) 557–560.
[22] K. Shi, X. Ma, G.T. Zhou, Adaptive nonlinearity identification in a Hammerstein system using a pseudo coherence function, in: IEEE Statistical Signal Processing Workshop, Madison, WI, August 2007, pp. 26–29.
[23] N. Bershad, P. Celka, J.-M. Vesin, Stochastic analysis of gradient adaptive identification of nonlinear systems with memory for Gaussian data and noisy input and output measurements, IEEE Trans. Signal Process. 47 (March 1999) 675–689.
[24] A.E. Nordsjö, L.H. Zetterberg, Identification of certain time-varying nonlinear Wiener and Hammerstein systems, IEEE Trans. Signal Process. 49 (March 2001) 577–592.
[25] J. Jeraj, V.J. Mathews, A stable adaptive Hammerstein filter employing partial orthogonalization of the input signals, IEEE Trans. Signal Process. 54 (April 2006) 1412–1420.
[26] J. Jeraj, V.J. Mathews, Stochastic mean-square performance analysis of an adaptive Hammerstein filter, IEEE Trans. Signal Process. 54 (June 2006) 2168–2177.
[27] International Organization for Standardization (ISO), ISO Norm 3382: Acoustics - measurement of the reverberation time of rooms with reference to other acoustical parameters, 1997.
[28] S. Gustafsson, R. Martin, P. Vary, Combined acoustic echo control and noise reduction for hands-free telephony, Signal Processing 64 (January 1998) 21–32.
[29] J.B. Allen, D.A. Berkley, Image method for efficiently simulating small-room acoustics, J. Acoustic Soc. Amer. 65 (1979) 943–950.