ARTICLE IN PRESS
Signal Processing 89 (2009) 1762–1773
Contents lists available at ScienceDirect
Signal Processing
journal homepage: www.elsevier.com/locate/sigpro
An algorithm for mixing matrix estimation in instantaneous blind
source separation
V.G. Reju , Soo Ngee Koh, Ing Yann Soon
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
a r t i c l e i n f o
abstract
Article history:
Received 19 July 2008
Received in revised form
7 March 2009
Accepted 15 March 2009
Available online 26 March 2009
Sparsity of signals in the frequency domain is widely used for blind source separation
(BSS) when the number of sources is more than the number of mixtures (underdetermined BSS). In this paper we propose a simple algorithm for detection of points in
the time–frequency (TF) plane of the instantaneous mixtures where only single source
contributions occur. Samples at these points in the TF plane can be used for the mixing
matrix estimation. The proposed algorithm identifies the single-source-points (SSPs) by
comparing the absolute directions of the real and imaginary parts of the Fourier
transform coefficient vectors of the mixed signals. Finally, the SSPs so obtained are
clustered using the hierarchical clustering algorithm for the estimation of the mixing
matrix. The proposed idea for the SSP identification is simpler than the previously
reported algorithms.
& 2009 Published by Elsevier B.V.
Keywords:
Blind source separation
Sparse component analysis
Underdetermined blind source separation
Mixing matrix estimation
1. Introduction
Blind source separation (BSS) is the process of
separating the original signals from their mixtures without any knowledge about the mixing process or the
signals. Many algorithms have been proposed for both
instantaneous and convolutive BSS [1,2]. In the case where
the number of sources is less than or equal to the number
of mixtures, independent component analysis (ICA)
method is the most popularly used. However, in practical
situations the number of sources may be more than the
number of mixed signals, and cases like this are called
underdetermined BSS. In this paper we address the
problem of estimation of mixing matrix for the separation
of sources from their instantaneous underdetermined
mixtures. The instantaneous mixing process can be
Corresponding author.
E-mail addresses: rejuvg@yahoo.com.sg, reju@ntu.edu.sg (V.G. Reju),
esnkoh@ntu.edu.sg (S.N. Koh), eiysoon@ntu.edu.sg (I.Y. Soon).
URLS: http://www.ntu.edu.sg/eee/home/esnkoh (S.N. Koh),
http://www.ntu.edu.sg/home/eiysoon (I.Y. Soon).
0165-1684/$ - see front matter & 2009 Published by Elsevier B.V.
doi:10.1016/j.sigpro.2009.03.017
mathematically expressed as
(1)
xðtÞ ¼ AsðtÞ
T
where xðtÞ ¼ ½x1 ðtÞ; . . . ; xP ðtÞ are the P mixed signals, A is
the mixing matrix of order P Q with apq as its ðp; qÞth
element, sðtÞ ¼ ½s1 ðtÞ; . . . ; sQ ðtÞT are the Q sources, t is the
time instant and T is the transpose operator.
Generally the algorithms for underdetermined case are
suitable for determined and overdetermined cases also.
This is also true for the proposed algorithm in this paper.
When mixing is underdetermined, the performances of
the ICA methods are poor and instead sparse component
analysis (SCA) is used. In SCA, sparsity of the signals is
utilized to separate the signals from their mixed signals.
A signal is said to be sparse in the temporal domain, if the
signal amplitude is zero during most of the time period.
However, in practice the natural signals like speech are
not very sparse in the time domain. In [3], Bofill et al.
show that signals like speech are more sparse in the
frequency domain than in the time domain and hence if
we transform the time domain signal into the frequency
domain, the sparsity can be utilized to separate the signals
from their mixtures.
ARTICLE IN PRESS
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
The idea of BSS based on TF representation was first
reported by Belouchrani and Amin [4]. The algorithm is for
the separation of nonstationary sources in the overdetermined case (number of observations 4 number of
sources) based on joint-diagonalization of a set of spatial
time frequency distributions (STFDs) of the whitened
observations at selected TF locations. The algorithm is
further extended in [5] to make it suitable for underdetermined cases also, under the assumption that
the sources are W-disjoint orthogonal in the TF domain.
The idea has been further extended in [6,7]. In [8], the
algorithm proposed in [4] is extended for the case of
stochastic sources and a criterion is proposed for the
selection of the points in the TF plane where the spatial
matrices should be jointly diagonalized.
By utilizing sparsity in the TF domain, many algorithms
have been proposed for blind source separation of underdetermined mixtures [2,3,5–20]. In [19] the fact that, at
single-source-points (SSPs), the direction of the modulus of
the mixture vectors in the TF domain will be the same as
those of the column vectors of the mixing matrix is utilized
to develop an algorithm called searching-and-averagingbased method, which relaxes the degree of sparsity needed.
Searching-and-averaging-based algorithm for time domain
signals is also proposed by the same authors in [20], where
for the estimation of the mixing matrix, the algorithm
removes the samples which are not in the same or inverse
direction of the columns of the mixing matrix.
In [14], it is assumed that the signals are W-disjoint
orthogonal in the TF plane, i.e., only one source will occur
in each TF window, which is quite restrictive. Later, it is
shown that approximate W-disjoint orthogonality is
sufficient to separate most speech signals and an
algorithm called Degenerate Unmixing Estimation Technique
(DUET) is proposed in [15]. Aı̈ssa-El-Bey et al. [6] relax the
disjoint orthogonality constraint but assume that at any
time the number of active sources in the TF plane is
strictly less than the number of mixtures. Hence, when
the number of sensors is two, the condition in [6] is
exactly the same as W-disjoint orthogonality condition.
The algorithm proposed in [11] also assumes that the
maximum number of active sources at any instant is less
than the number of mixtures. In [16,13], these constrains
are again relaxed with the only requirement that each
source occurs alone in a tiny set of adjacent TF windows
while several sources may co-exist everywhere else in the
TF plane. This method can therefore be used even when
the sources overlap in most of the areas in the TF plane.
The algorithm proposed in [16] is based on the complex
ratio of the mixtures in the TF domain and it is called the
TIme Frequency Ratio Of Mixtures (TIFROM) method. In the
TF domain if only one source occurs in several adjacent
windows, then the complex ratio of the mixtures in those
windows will remain constant and it will take different
values only if more than one source occur. Hence
identifying the area where this ratio remains constant is
equivalent to identifying the SSPs. The constant complex
ratios of the mixtures at the SSPs are called canceling
coefficients and these canceling coefficients can be used
for the estimation of the sources from their mixtures. The
TIFROM algorithm is further improved in [21].
1763
One of the problems with the TIFROM method is its
performance degradation because of the inconsistent
estimation of the mixing system. This inconsistency is
due to the fact that the TIFROM algorithm uses a series of
minimum variances of the ratios of the mixed signals in
the TF domain taken over the selected windows for the
estimation of the column vectors of the mixing matrix.
The absolute values of these variances monotonically
increase with the increase in the mean of the corresponding ratios or the corresponding columns of the mixing
matrix. Since the TIFROM algorithm looks for the mean
corresponding to the minimum variance, in cases where
the column matrix and hence the ratios and the
corresponding variances are high, the algorithm will end
up with a wrong result as it will take the mean of the
ratios corresponding to a smaller variance as the column
of the mixing matrix. This problem is solved in [17] by
normalizing the variances. Even though the normalization
of the variances created uniformity, if the TF windows
used for estimating one of the column of the mixing
matrix is sparser than the TF windows used for estimating
another column of the mixing matrix, the variance
corresponding to first case will be less than that of the
second case [17]. This difference in variance may lead to
mixing matrix estimation error, and to solve this problem
an algorithm based on k-means clustering is proposed
in [17].
The restriction of the TIFROM algorithm, i.e., the
requirement of single-source-zone, is further relaxed in
[18] where it requires only two adjacent points in the
same frequency bin with single source contributions for
the estimation of the SSPs. In [18], the fact that at SSPs the
mixture vectors in the TF domain will be proportional in
magnitude to one of the columns of the mixing matrix is
used, i.e., jXðt; kÞj ’ aj jSj ðt; kÞj. Hence if we plot the scatter
diagram using the magnitude of the observed data in the
TF domain, it will have a clear orientation towards the
directions of the column vectors of the mixing matrix if
the sources are sufficiently sparse. In situation where the
sources are not sufficiently sparse, the orientation of the
scatter diagram will not be very clear. Under such
situations, the estimation of the directions of the columns
of the mixing vectors will be difficult. Now, at points ðt; kÞ
and ðt þ 1; kÞ in the TF plane of the mixtures, if more than
one source component are present, the direction of the
mixture vectors Xðt; kÞ and Xðt þ 1; kÞ will be the same
only if the amplitudes of all the sources remain the same
at both the points ðt; kÞ and ðt þ 1; kÞ, i.e., at two
consecutive time frames. Since this condition is very
unlikely to happen, the mixture vectors Xðt; kÞ and
Xðt þ 1; kÞ, 8t, which keeps the directions the same can
be considered as SSPs. Utilizing this fact, in [18] the points
which satisfy the above condition, i.e.,
ffjXðt; kÞj ffjXðt þ 1; kÞjoyth
(2)
where ffz represents the angle of the vector z and yth is
the threshold angle, are selected as the SSPs. Then
histogram method is applied to the estimated SSPs for
the estimation of the mixing matrix.
In [12], a two stage approach consisting of mixing
matrix estimation followed by source estimation is
ARTICLE IN PRESS
1764
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
proposed. The restriction of the necessary condition for
TIFROM algorithm, i.e., the requirement of some adjacent
TF regions where only one of the sources occurs, is relaxed
in [12]. The mixing matrix estimation procedure proposed
in [12] is an extension of the DUET [15] and TIFROM [16]
methods; it is based on the ratios of the TF transforms of
the mixtures. From the transformation ratio matrix so
obtained, several submatrices each of which has identical
columns are detected and these columns will correspond
to the points where only one of the sources occurs in the
transformation ratio matrix.
It can be seen that the main objective in all these
algorithms is the detection of the points in the TF domain
where only one source occurs at a time. In this paper we
propose a simple algorithm to identify these points and
use them for the estimation of the mixing matrix using
the hierarchical clustering algorithm which is well known
because of its versatility [22]. The proposed algorithm can
be used for the mixtures where the sources are overlapped
in the TF plane, except for some points. Unlike in [16,18],
these SSPs need not to be adjacent points in the TF domain
and the proposed algorithm is simpler than that in [12],
which requires many tuning parameters and a long
procedure. Since the algorithms proposed in [16,12] can
be directly used for source estimation, either from the
identified SSPs [16] or the estimated mixing matrix
[16,12], we are not repeating the same and our focus in
this paper is on SSP identification and the mixing matrix
estimation only.
The paper is structured as follows. The proposed
algorithm is derived in Section 2; in Section 3, some
experimental results are given and finally conclusions are
drawn in Section 4.
2. Proposed method
2.1. Single-source-point identification
The instantaneous mixing model in (1) can be
expressed in the TF domain using short time Fourier
transform (STFT) as
Xðt; kÞ ¼ ASðt; kÞ
¼
Q
X
aq Sq ðt; kÞ
(3)
q¼1
where
Xðt; kÞ ¼ ½X 1 ðt; kÞ; . . . ; X P ðt; kÞT
and
Sðt; kÞ ¼
½S1 ðt; kÞ; . . . ; SQ ðt; kÞT are, respectively, the STFT coefficients
of the mixtures and sources in the kth frequency bin at
time frame t and aq ¼ ½a1q ; . . . ; aPq T is the qth column of
the mixing matrix A. For ease of explanation, assume that
there are only two sources, i.e., Q ¼ 2, and number of
mixtures is P. Now at any point in the TF plane, say ðt 1 ; k1 Þ,
if the source component from only one of the sources, say
that of s1 , is present, i.e., S1 ðt 1 ; k1 Þa0 and S2 ðt 1 ; k1 Þ ¼ 0.
Then, Eq. (3) can be written as
Xðt 1 ; k1 Þ ¼ a1 S1 ðt 1 ; k1 Þ
RfXðt 2 ; k2 Þg ¼ a2 RfS2 ðt 2 ; k2 Þg
(7)
IfXðt2 ; k2 Þg ¼ a2 IfS2 ðt 2 ; k2 Þg
(8)
Hence at ðt 2 ; k2 Þ the absolute directions of RfXðt 2 ; k2 Þg and
IfXðt2 ; k2 Þg are the same which are same as that of a2 . Now
consider another point ðt 3 ; k3 Þ where the contributions
from both the sources are present. Then at ðt 3 ; k3 Þ, the
directions of RfXðt 3 ; k3 Þg and IfXðt 3 ; k3 Þg will be
RfXðt 3 ; k3 Þg ¼ a1 RfS1 ðt 3 ; k3 Þg þ a2 RfS2 ðt 3 ; k3 Þg
IfXðt 3 ; k3 Þg ¼ a1 IfS1 ðt 3 ; k3 Þg þ a2 IfS2 ðt 3 ; k3 Þg
(9)
(10)
From (9) and (10) it can be seen that the absolute direction
of R Xðt 3 ; k3 Þ will be the same as that of I Xðt 3 ; k3 Þ only
if
RfS1 ðt 3 ; k3 Þg RfS2 ðt 3 ; k3 Þg
¼
IfS1 ðt 3 ; k3 Þg
IfS2 ðt 3 ; k3 Þg
(11)
However, in practice, the probability to satisfy the above
condition is very low. This fact is experimentally shown in
Fig. 1, where the mean of the percentage of the points in
the TF plane which are below the absolute value of
difference between the ratios, i.e., jðRfS1 ðt; kÞg=IfS1 ðt; kÞgÞ
ðRfS2 ðt; kÞg=IfS2 ðt; kÞgÞj, calculated for 15 pairs of randomly
selected speech utterances of length 10 s each is shown.
For example, from Fig. 1, there is only 0.3% of the total
multi-source-points (MSPs) (i.e., the point in the TF plane
of the mixture where more than one source occur) in the
TF plane with difference between the ratios of less than
0:01, i.e., jðRfS1 ðt; kÞg=IfS1 ðt; kÞgÞ ðRfS2 ðt; kÞg=IfS2 ðt; kÞgÞj
o0:01. It can also be seen from Fig. 1 that the probability
to satisfy the condition in (11) is almost zero. Hence we
can say that, in practice, a point, ðt; kÞ, in the TF plane of
the mixture will be a SSP if the absolute direction of
RfXðt; kÞg is the same as that of IfXðt; kÞg; otherwise, it will
be a MSP.
For a general case of P mixtures and Q sources, at a MSP
ðt; kÞ, the real and imaginary parts of Xðt; kÞ can be written
as
RfXðt; kÞg ¼
Q
X
aq RfSq ðt; kÞg
(12)
aq IfSq ðt; kÞg
(13)
q¼1
IfXðt; kÞg ¼
(5)
(6)
where Rfxg and Ifxg, respectively, represent the real and
imaginary part of x. From (5) and (6) it can be seen that
the absolute directions of RfXðt 1 ; k1 Þg and IfXðt1 ; k1 Þg are
the same, which is same as that of a1 . Similarly at another
point, say ðt 2 ; k2 Þ, if only the contribution from source s2 is
present, i.e., S1 ðt 2 ; k2 Þ ¼ 0 and S2 ðt 2 ; k2 Þa0, then from (3)
we can write
(4)
Equating real and imaginary parts of (4), we will get
RfXðt 1 ; k1 Þg ¼ a1 RfS1 ðt 1 ; k1 Þg
IfXðt1 ; k1 Þg ¼ a1 IfS1 ðt 1 ; k1 Þg
Q
X
q¼1
ARTICLE IN PRESS
1765
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
102
Percentage of the total MSPs which are below the
magnitude of the difference between the ratios
101
100
10−1
10−2
10−3
10−4
10−5
10−6
10−8
10−6
10−4
10−2
100
102
Magnitude of the difference between the ratios
104
106
108
R {S2 (t,k)}
R {S1 (t,k)}
−
I {S1 (t,k)}
I {S2 (t,k)}
Fig. 1. Percentage of samples which are below the magnitude of the difference between the ratios of the real and imaginary parts of the DFT coefficient of
the signals.
Table 1
Algorithm for the detection of the single-source-points.
Now, the angle between (12) and (13) is given by
0
RfXðt; kÞgT IfXðt; kÞg
1
C
y ¼ cos1 B
@qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiqffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiA
¼
0
RfXðt; kÞgT RfXðt; kÞg
IfXðt; kÞgT IfXðt; kÞg
1
PQ
PQ
p¼1 ðð
q¼1 apq RfSq ðt; kÞgÞð
q¼1 apq IfSq ðt; kÞgÞÞ
C
1 B
cos @qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
A (14)
PQ
PQ
PP
2 PP
2
p¼1 ð
p¼1 ð
q¼1 apq RfSq ðt; kÞgÞ
q¼1 apq IfSq ðt; kÞgÞ
PP
In the above equation, y will become 0 or 180 if
RfSq ðt; kÞg
RfS1 ðt; kÞg
RfSQ ðt; kÞg
¼ ¼
¼ ¼
IfSq ðt; kÞg
IfS1 ðt; kÞg
IfSQ ðt; kÞg
Step
1:
Step
2:
Step
3:
Step
4:
Convert x in the time domain to the TF domain to get X.
Check the condition in (16).
If the condition in (16) is satisfied, then Xðt; kÞ is a sample at the
SSP, which we keep for mixing matrix estimation; otherwise,
discard the point.
Repeat Steps 2–3 for all the points in the TF plane or until
sufficient number of SSPs are obtained.
(15)
Hence, for the absolute directions of RfXðt; kÞg and IfXðt; kÞg
to be the same, at any point ðt; kÞ in the TF plane, either the
point must be a SSP or the ratios between the real and
imaginary parts of the Fourier transform coefficients of all
the signals at that points must be the same. However, as
shown previously, the probability for the second case is
extremely low and this probability will decrease as the
number of sources increases. Hence we can conclude that
SSPs in the TF plane are the points where the absolute
direction of RfXðt; kÞg is the same as that of IfXðt; kÞg.
The probability of getting SSPs where the amplitudes
of all the source contributions except one are exactly equal
to zero is very low in a practical situation. Hence, we relax
the condition for SSP as the point in the TF plane where
the component of one of the sources is significantly higher
than that of the remaining sources. As a result, the point in
the TF plane where the difference between the absolute
directions of RfXðt; kÞg and IfXðt; kÞg is less than Dy is taken
as SSP, i.e., SSPs are the points in the TF plane where the
following condition is satisfied:
RfXðt; kÞgT IfXðt; kÞg
kRfXðt; kÞgkkIfXðt; kÞgk4 cosðDyÞ
(16)
pffiffiffiffiffiffiffiffi
where j j represent the absolute value and kyk ¼ yT y.
Samples at these SSPs are used for the clustering
algorithm in Section 2.2. The algorithm to locate the SSPs
in the TF plane is summarized in Table 1.
ARTICLE IN PRESS
1766
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
1
6
First cut
7
0.5
5
4
0.5
x2
10
9
1
0
8
2
3
−0.5
1−|cos (θ)|
0.4
Second cut
0.3
0.2
11
12
14
0.1
13
15
−1
0
−1
−0.5
0
x1
0.5
1
2 10 9
3
1
8
4 11 5 14 7 12 6 13 15
Sample index
Fig. 2. Hierarchical clustering: (a) scatter diagram, (b) dendrogram.
2.2. Mixing matrix estimation
After identifying the SSPs in the TF plane, the next
stage is the estimation of the mixing matrix. Here we are
using the hierarchical clustering technique [22,23] for the
estimation of mixing matrix. We are not claiming that this
is the best algorithm to cluster the samples as other
algorithms can also be used [7]. Readers are also referred
to [23] and the references therein for a detailed review on
clustering algorithms. The main contribution of this paper
is the efficient algorithm proposed in Section 2.1 for the
detection of SSPs. The real and imaginary parts of Xðt; kÞ at
~ and
the SSPs in the TF plane are stacked into an array, X,
this array is used as the input for clustering. It can be seen
that either the real or imaginary parts of the sample
vectors at the SSPs are sufficient for clustering as the
absolute directions of RfXðt; kÞg and IfXðt; kÞg are the same,
except for a difference of maximum Dy. See Section 3 for
more explanation.
For hierarchical clustering, we used 1 j cosðyÞj as the
~T X
~
~
~
distance measure, where cosðyÞ ¼ X
m n =ðkXm kkXn kÞ is the
cosine of the angle between mth and nth sample vectors
~ n , respectively, in X.
~ This
~ m and X
(column vectors) X
clustering is illustrated with a simple example in Fig. 2,
where the scatter diagram of the data and its dendrogram
are shown. To get a clear idea about the clustering
algorithm used, Matlab code (only up to the hierarchical
tree generation) is also provided in Table 2. In hierarchical
clustering, data are partitioned into different clusters by
cutting the dendrogram at suitable distance, as shown in
Fig. 2. If the data contains outliers, the selection of the
distance and hence the selection of the number of clusters
is important. For example in Fig. 2, if we divide the
dendrogram into two clusters, we will have one point
(point 15), which is the outlier, in one cluster and the
Table 2
Matlab code for clustering algorithm.
Y
Y
Z
T
¼
¼
¼
¼
pdist(Xtilde, ‘cosine’);
1-abs(1-Y);
linkage(Y,‘average’);
cluster(Z,‘maxclust’,C);
remaining will be in the second cluster. In this particular
case we must form three clusters and discard the cluster
with the least number of samples so that the outlier will
be removed. Automatic selection of the number of clusters
without any knowledge about the data is difficult.1 Hence,
here we assume that out of the valid clusters (if there are
Q sources, there must be Q valid clusters), the cluster with
the minimum number of samples will contain at least 5%
of the average number of samples in the remaining valid
clusters. We also assume that the maximum number of
outliers is less than 5% of the total number of samples in
the valid clusters. Hence in the algorithm for cutting the
dendrogram to form clusters, we will first cut the
dendrogram at a suitable height to form Q clusters and
if the clusters do not satisfy the above conditions, the
dendrogram will be cut at another height to form Q þ 1
clusters. This process is repeated until the above condition
is satisfied or when the maximum number of clusters is
equal to two times the number of sources. In our
experiments the total number of clusters never exceeded
2Q .
1
It may be noted that there are some advanced techniques for the
automatic estimation of the number of sources, e.g., [7].
ARTICLE IN PRESS
1767
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
60
80
60
40
40
20
20
0
0
−20
−20
−40
−40
−60
−80
−80
−60
−40
−20
0
20
40
60
80
−60
−60
−40
−20
0
20
40
60
60
40
20
0
−20
−40
−60
−60
−40
−20
0
20
40
60
Fig. 3. Scatter diagram of the mixtures taking samples from 40 frequency bins; P ¼ 2; Q ¼ 6; and Dy ¼ 0:8 (a) all the DFT coefficients, (b) samples at SSPs
obtained by comparing the directions of RfXðt; kÞg with that of IfXðt; kÞg; (c) samples at SSPs obtained after elimination of the outliers.
~ contains only the samples at SSPs, the scatter
Since X
plot will have a clear orientation towards the directions
of the column vectors in the mixing matrix, as shown in
~ will cluster into Q groups.
Fig. 3, and hence the points in X
After clustering, the column vectors of the mixing matrix
are determined by calculating the centroid of each cluster.
The points lying in the left-hand side of the vertical axis in
the scatter diagram (for two mixture case) are mapped to
the right-hand side (by changing their sign) before
calculating the centroid; otherwise, very small value or
zero will result.
The mixing matrix estimation error can be further
reduced by removing the points which are away from the
mean direction of the cluster. Here we remove all the
points which are away from the mean direction of
the cluster by sfq , where is a constant and sfq is the
standard deviation of the directions of the samples in the
qth cluster. In other words, ith sample in the qth cluster is
removed if jfq ðiÞ mfq j4sfq , where fq ðiÞ is the absolute
direction of the ith sample in the qth cluster and mfq is the
mean of the absolute direction of the samples in the qth
cluster. This is illustrated in Fig. 3.
ARTICLE IN PRESS
1768
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
3. Experimental results
In all the experiments in this paper, except for the
cluster diagram and in Section. 3.1, average performance
of 100 randomly selected combinations, from a set of 11
speech utterances, which are not sparse in the time
domain, are used. The other experimental conditions are:
sampling frequency 16 kHz, STFT size 1024, Hanning
window as the weighting function and ¼ 0:5.
To show that the proposed algorithm is effective in
identifying the SSPs and hence in estimating the mixing
matrix, six speech utterances are mixed using the mixing
matrix
A¼
0:0872
0:3420
0:7071
0:9848
0:8660
0:5000
0:9962
0:9397
0:7071
0:1736
0:5000
0:8660
The scatter diagram in Fig. 3 clearly shows the effectiveness of the proposed method for selecting the SSPs, which
are in the direction of the column vectors of the mixing
matrix, and rejecting the other points. The mixing matrix
estimation error obtained is
^ ¼
AA
0:0020
0:0049
0:0032
0:0005
0:0007
0:0056
0:0002
0:0018
0:0032
0:0029
0:0012
0:0032
^ is the estimated mixing matrix, which correwhere A
sponds to 47:61 dB normalized mean square error
(NMSE). The NMSE in dB is defined as
!
P
2
^
p;q ðapq apq Þ
NMSE ¼ 10 log10
(17)
P
2
p;q ðapq Þ
where a^ pq is the ðp; qÞth element of the estimated matrix
^ Since the number of samples to be used for clustering
A.
and estimation of the mixing matrix is significantly
reduced, the computational time and memory requirement for the clustering algorithm are also reduced. For
hierarchical clustering the computational complexity is
OðN2 Þ, where N is the number of samples to be clustered
[23]. Generally, in the TF domain, the number of samples
having very small values dominates and these samples
can be removed without much impact on the mixing
matrix estimation error. In all our experiments, except
where it is mentioned, samples in the TF domain having
magnitude below 0.25 (i.e., kRfXðt; kÞgjjo0:25) are removed.
The advantage of elimination of the outliers from the
samples at SSPs estimated by comparing the absolute
direction of RfXðt; kÞg and IfXðt; kÞg is illustrated in Fig. 4,
where the mixing matrix is A whose qth column vector is
½cosðyq Þ; sinðyq ÞT , with yq ¼ ðp=2:4 þ ðq 1Þp=6Þ and
q ¼ 1; 2; . . . ; 6. In Fig. 4, the mixing matrix estimation
error when the initial samples at SSPs obtained by
comparing the absolute direction of RfXðt; kÞg and
IfXðt; kÞg is shown by dotted lines and that obtained after
eliminating the outliers as explained in Section 2.2 and
recalculating the centroid is shown with solid lines.
In Fig. 5, the mixing matrix estimation error obtained
by recalculating the centroid of the clusters after eliminating the outliers is compared with that obtained by reclustering the outlier free samples. It can be seen from the
figure that there is no advantage in re-clustering samples
after eliminating the outliers. Since the SSPs are identified
0
∆θ = 0.8 (By clustering initial SSPs)
∆θ = 0.2 (By clustering initial SSPs)
∆θ = 0.1 (By clustering initial SSPs)
∆θ = 0.05 (By clustering initial SSPs)
∆θ = 0.8 (After elimination of outliers)
∆θ = 0.2 (After elimination of outliers)
∆θ = 0.1 (After elimination of outliers)
∆θ = 0.05 (After elimination of outliers)
−5
−10
NMSE (dB)
−15
−20
−25
−30
−35
−40
−45
0
5
10
15
20
25
Total no. of frequency bins used
30
35
40
Fig. 4. NMSE on mixing matrix estimation before (dotted lines) and after (solid lines) elimination of the outliers from the initial estimated samples at SSPs
for various values of Dy; P ¼ 2 and Q ¼ 6.
ARTICLE IN PRESS
1769
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
0
∆θ = 0.8 (After elimination of outliers)
∆θ = 0.2 (After elimination of outliers)
∆θ = 0.1 (After elimination of outliers)
∆θ = 0.05 (After elimination of outliers)
∆θ = 0.8 (After re−clustering the outlier−free samples)
∆θ = 0.2 (After re−clustering the outlier−free samples)
∆θ = 0.1 (After re−clustering the outlier−free samples)
∆θ = 0.05 (After re−clustering the outlier−free samples)
−5
−10
NMSE (dB)
−15
−20
−25
−30
−35
−40
−45
0
5
10
15
20
25
Total no. of frequency bins used
30
35
40
Fig. 5. NMSE on mixing matrix estimation before and after re-clustering the outlier-free samples for various values of Dy; P ¼ 2 and Q ¼ 6.
by comparing the absolute direction of RfXðt; kÞg with that
of IfXðt; kÞg, at SSPs the maximum difference in direction
between the two vectors will be only Dy. Hence there will
not be much difference in performance even if we use
RfXðt; kÞg or IfXðt; kÞg alone instead of both. This is
illustrated in Fig. 6 where the variation of mixing matrix
estimation error for different values of Dy, when RfXðt; kÞg
alone (solid lines) and RfXðt; kÞg together with IfXðt; kÞg
(dotted lines) are used as the data for clustering, as a
function of total number of frequency bins taken is shown.
In all the experiments in this paper the frequency bins
corresponding to one mixture, x1 , were sorted in the
descending order of their variance and the order of the
frequency bins of other mixtures were modified according
to that of x1 before starting the SSP detection. This is
because most of the energy will be concentrated in nearly
10% of the frequency bins [10] and by sorting, the
unnecessary computation in the frequency bins where
the energy is low can be avoided. From Figs. 4, 5, 6 and 8,
it is clear that with a properly selected Dy, only 2–4% of
the frequency bins are sufficient to obtain an accurate
estimate of the mixing matrix. When the number of
sources is fewer, the first few bins will be sufficient to
obtain an accurate estimate of the mixing matrix because
the number of SSPs will increase as the number of sources
decreases [6].
For the case when P ¼ 3, Q ¼ 6 and a randomly
selected mixing matrix
0:6330
6
A ¼ 4 0:5179
0:5754
2
0:7650
0:2892
0:6843
0:0612
0:8156
0:5621
0:7455
0:3364
0:4994
0:1988
0:8156
0:7589
3
0:6284
7
0:5201 5
0:5804
is illustrated in Fig. 7 and the performance is shown in
Fig. 8. In Fig. 8 the error in mixing matrix estimation
obtained when all the samples in RfXg are used is also
shown, where the same procedure described in Section
2.2 is used for the mixing matrix estimation.
3.1. Comparison with other algorithms
3.1.1. Determined case
Here we compare the performance of the proposed
algorithm with several classical algorithms, using the
ICALAB Ver 3 toolbox available at [24]. The references for
the algorithms are available in Help included in the
toolbox. In this experiment the separation performance
of each of the algorithm is obtained for five pairs of speech
utterances each of length 10 s, and for each pair, the
performance is obtained for 100 randomly selected 2 2
mixing matrices. The mean Signal to Interference Ratios
(SIR) in dB, so obtained, for the different algorithms are
shown in Fig. 9. From the figure, it can be seen that the
proposed algorithm outperforms the other classical algorithms.
3.1.2. Underdetermined case
In this experiment we compare the proposed algorithm
with one of the recently reported algorithms. The
algorithm, presented in [12], is an extension of the DUET
and TIFROM algorithms. Unlike the DUET method, for [12]
the spectra of the sources can overlap in the TF domain,
i.e, the W-disjoint orthogonality condition need not be
met. Furthermore, unlike the TIFROM algorithm, the
ARTICLE IN PRESS
1770
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
0
∆θ = 0.8 (Using data from both real and imaginary parts)
∆θ = 0.2 (Using data from both real and imaginary parts)
∆θ = 0.1 (Using data from both real and imaginary parts)
∆θ = 0.05 (Using data from both real and imaginary parts)
∆θ = 0.8 (Using data from real part only)
∆θ = 0.2 (Using data from real part only)
∆θ = 0.1 (Using data from real part only)
∆θ = 0.05 (Using data from real part only)
−5
−10
NMSE (dB)
−15
−20
−25
−30
−35
−40
−45
0
5
10
15
20
25
30
35
40
Total no. of frequency bins used
Fig. 6. Comparison of mixing matrix estimation error when samples at SSPs from RfXðt; kÞg alone is used with that when samples at SSPs from both
RfXðt; kÞg and IfXðt; kÞg are used, for various values of Dy; P ¼ 2 and Q ¼ 6.
60
60
40
40
20
20
0
0
−20
−20
−40
−40
60
−60
60
40
40
20
20
0
0
−20
−40
−20
−40
−60 −60
60
−60
60
40
40
20
20
0
−20
0
−20
−40
−40
−60 −60
Fig. 7. Scatter diagram of the mixtures taking samples from 40 frequency bins; P ¼ 3; Q ¼ 6; and Dy ¼ 0:8 (a) all the DFT coefficients, (b) samples at SSPs
after elimination of the outliers.
‘single source region’ is also not needed. This is true for
the proposed algorithm also. Hence we compare the
proposed algorithm with that reported in [12]. Here we
conducted 12 experiments (3-sensor, 4-sensor and 5sensor cases each for 4–7 sources) as shown in Fig. 10.
Each experiment is repeated with 100 different randomly
generated mixing matrices and the mean NMSEs so
obtained are shown in Fig. 10. From the figure, it can be
seen that the proposed algorithm outperforms that in [12]
in all the cases.
Both the algorithms are implemented in the STFT
domain and the number of frequency bins used for both
ARTICLE IN PRESS
1771
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
5
After first clustering − Using all the point
After outlier elimination − Using all the point
After second clustering − − Using all the point
After first clustering − Using estimated single−source−points
After outlier elimination − Using estimated single−source−points
After second clustering − − Using estimated single−source−points
0
−5
NMSE (dB)
−10
−15
−20
−25
−30
−35
−40
−45
0
2
4
6
8
10
12
14
Total no. of frequency bins used
16
18
20
Fig. 8. Comparison of NMSE on estimation of the mixing matrix using all the DFT coefficients in the TF plane with that using the estimated SSPs; P ¼ 3;
Q ¼ 6; and Dy ¼ 0:8 .
Algorithms
Proposed
SYM−WHITE
FOBI−E
UNICA
SIMBEC
ERICA
THIN−ICA
NG−FICA
SANG
FPICA
JADE−TD
JADE−OP
SONS
SOBI−BPF
SOBI−RO
SOBI
EVD2
AMUSE
0
10
20
30
40
50
60
SIR (dB)
Fig. 9. Comparison of the proposed algorithm with classical algorithms for determined case, P ¼ Q ¼ 2.
the algorithms are the same. To decide the number of
frequency bins to be used, for each experiment, the
number of frequency bins is increased until the proposed
algorithm detects a minimum of 1000 SSPs.
For the proposed algorithm, the magnitude of the
real part of the mixture vectors at any point ðt; kÞ which
are less than 5% of the maximum magnitude of all the
vectors in the TF plane, i.e., the points with kRfXðt; kÞgk
o0:05maxðkRfXgkÞ, are discarded.
In the algorithm used for comparison, to cluster the
column vectors in E (refer [12]) according to their
directions, the hierarchical clustering algorithm proposed
ARTICLE IN PRESS
1772
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
0
−5
−10
−15
NMSE (dB)
−20
−25
−30
−35
−40
−45
−50
Proposed method
Method reported in [12]
−55
3x4
3x5
3x6
3x7
4x4
4x5
4x6
4x7
5x4
5x5
5x6
5x7
Order of the mixing matrices (PxQ)
Fig. 10. Comparison of the proposed algorithm with that proposed in [12].
in this paper is used. The other parameters used are the
same as that used in [12], i.e., M0 ¼ 400, J1 ¼ 100 and
J2 ¼ 100 (refer [12] for more details). In cases where the
algorithm fails to identify sufficient number of submatrices, M0, J1 and J2 are divided by two to obtain their
new values and the experiment is repeated using the new
values.
4. Conclusion
In this paper we have derived a simple and effective
algorithm for single-source-point identification in the TF
plane of the mixture signals for the estimation of mixing
matrix in underdetermined blind source separation. The
algorithm can be used for the mixtures where the spectra
of the sources overlap and the single-source-points occur
only at a small number of locations. The proposed
algorithm does not have any restriction on the number
of sources and mixtures and the single-source-points need
not be in adjacent locations in the TF plane. Since only the
samples at single-source-points are used for the clustering
algorithm for the estimation of the mixing matrix, the
estimation error, computation time and memory requirement are reduced as compared to using all the samples in
the TF plane.
References
[1] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis,
Wiley, New York, 2001.
[2] A. Cichocki, S. Amari, Adaptive Blind Signal and Image Processing
Learning Algorithms and Applications, Wiley, New York, 2002.
[3] P. Bofill, M. Zibulevsky, Underdetermined blind source separation
using sparse representation, Signal Processing 81 (11) (2001)
2353–2362.
[4] A. Belouchrani, M.G. Amin, Blind source separation based on
time–frequency signal representations, IEEE Transactions on Signal
Processing 46 (11) (1998) 2888–2897.
[5] L.-T. Nguyen, A. Belouchrani, K. Abed-Meraim, B. Boashash,
Separating more sources than sensors using time–frequency
distributions, in: Proceedings of the International Symposium on
Signal Processing and its Applications, 2001, pp. 583–586.
[6] A. Aı̈ssa-El-Bey, N. Linh-Trung, K. Abed-Meraim, A. Belouchrani, Y.
Grenier, Underdetermined blind separation of nondisjoint sources
in the time–frequency domain, IEEE Transactions on Signal
Processing 55 (3) (2007) 897–907.
[7] Y. Luo, W. Wang, J.A. Chambers, S. Lambotharan, I. Proudler,
Exploitation of source nonstationarity in underdetermined
blind source separation with advanced clustering techniques, IEEE
Transactions on Signal Processing 54 (6) (2006) 2198–2212.
[8] C. Févotte, C. Doncarli, Two contributions to blind source separation
using time–frequency distributions, IEEE Signal Processing Letters
11 (3) (2004) 386–389.
[9] N. Mitianoudis, T. Stathaki, Batch and online underdetermined
source separation using Laplacian mixture models, IEEE Transactions
on Audio, Speech and Language Processing 15 (6) (2007) 1818–1832.
[10] R. Saab, O. Yilmaz, M.J. McKeown, R. Abugharbieh, Underdetermined
q
anechoic blind source separation via l -basis-pursuit with qo1, IEEE
Transactions on Signal Processing 55 (8) (2007) 4004–4017.
[11] P. Georgiev, F. Theis, A. Cichocki, Sparse component analysis and
blind source separation of underdetermined mixtures, IEEE Transactions on Neural Networks 16 (4) (2005) 992–996.
[12] Y. Li, S. Amari, A. Cichocki, D.W.C. Ho, S. Xie, Underdetermined blind
source separation based on sparse representation, IEEE Transactions on Signal Processing 54 (2) (2006) 423–437.
[13] Y. Deville, M. Puigt, Temporal and time–frequency correlationbased blind source separation methods. Part I: determined and
underdetermined linear instantaneous mixtures, Signal Processing
87 (3) (2007) 374–407.
[14] A. Jourjine, S. Rickard, O. Yilmaz, Blind separation of disjoint
orthogonal signals: demixing N sources from 2 mixtures, in:
Proceedings of the ICASSP, 2000, pp. 2986–2988.
[15] O. Yilmaz, S. Rickard, Blind separation of speech mixtures via
time–frequency masking, IEEE Transactions on Signal Processing 52
(7) (2004) 1830–1847.
ARTICLE IN PRESS
V.G. Reju et al. / Signal Processing 89 (2009) 1762–1773
[16] F. Abrard, Y. Deville, A time–frequency blind signal separation
method applicable to underdetermined mixtures of dependent
sources, Signal Processing 85 (7) (2005) 1389–1403.
[17] D. Smith, J. Lukasiak, I.S. Burnett, A two channel, block-adaptive
audio separation technique based upon time–frequency information, in: Proceedings of the 12th European Signal Processing
Conference, pp. 393–396.
[18] P. Bofill, Identifying single source data for mixing matrix estimation
in instantaneous blind source separation, in: ICANN (1), 2008, pp.
759–767.
[19] M. Xiao, S. Xie, Y. Fu, A novel approach for underdetermined blind
sources separation in frequency domain—ISNN 2005, Advances in
Neural Networks 2005 (3497) (2005) 484–489.
1773
[20] X. Ming, X. ShengLi, F. YuLi, Searching-and-averaging method of
underdetermined blind speech signal separation in time domain,
Science in China Series F: Information Sciences 50 (5) (2007) 771–782.
[21] Y. Deville, M. Puigt, B. Albouy, Time–frequency blind signal
separation: extended methods, performance evaluation for speech
sources, in: Proceedings of the International Joint Conference on
Neural Networks, 2004, pp. 255–260.
[22] A.K. Jain, M.N. Murty, P.J. Flynn, Data clustering: a review, ACM
Computing Surveys 31 (3) (1999) 264–323.
[23] R. Xu, D. Wunsch II, Survey of clustering algorithms, IEEE
Transactions on Neural Networks 16 (3) (2005) 645–678.
[24] A. Cichocki, S. Amari, K. Siwek, T. Tanaka, A.H. Phan, et al., ICALAB
Toolboxes hhttp://www.bsp.brain.riken.jp/ICALABi.