Speech 1

Applied Acoustics 148 (2019) 40–45
Contents lists available at ScienceDirect
Applied Acoustics
journal homepage: www.elsevier.com/locate/apacoust
The role of direct sound spherical harmonics representation in

externalization using binaural reproduction
Eran Miller, Boaz Rafaely
Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
a r t i c l e i n f o a b s t r a c t
Article history: The importance of the information in the direct sound to human perception of spatial sound sources is an
Received 31 July 2018 ongoing research topic. The classification between direct sound and diffuse or reverberant sound forms
Received in revised form 11 October 2018 the basis of numerous studies in the field of spatial audio. In particular, parametric spatial audio repre-
Accepted 6 December 2018
sentation methods use this classification and employ signal processing in order to enhance the audio
quality at reproduction. However, current literature does not provide information concerning the impact
of ideal direct sound representation on externalization, in the context of Ambisonics. This paper aims to
assess the importance of the spatial information in the direct sound in the externalization of a sound field
when using binaural reproduction. This is done in the spherical harmonics (SH) domain, where an ideal
direct sound representation within an otherwise Ambisonics signal is simulated, and its perceived exter-
nalization is evaluated in a formal listening test. This investigation leads to the conclusion that external-
ization of a first order Ambisonics signal may be significantly improved by enhancing the direct sound
component, up to a level similar to a third order Ambisonics signal.
Ó 2018 Elsevier Ltd. All rights reserved.
1. Introduction metric spatial sound formats have emerged [6–8]. These methods
usually employ a B-format Ambisonics microphone and enhance
Spatial audio is attracting increasing attention in research and the spatial audio quality by using signal processing that is based
industry, for applications of virtual reality, music, telecommunica- on the attributes of human spatial hearing. One such attribute is
tion and more. Spatial audio recording and reproduction methods the classification between direct sound and diffuse sound, forming
have been developed to deliver a 3D sound experience. This is the basis for Directional Audio Coding (DirAC) [6,9], an established
achieved with the playback of sound via a loudspeaker array or parametric spatial sound representation. In DirAC, the signal is
by using a set of headphones, via binaural reproduction. divided into two streams, one corresponding to the directional part
A popular format of spatial audio is Ambisonics [1,2]. This is a and the other to the diffuse part. This is done by the estimation of
linear non-parametric representation based on sound field decom- two main parameters for each time–frequency bin: the direction-
position into spherical harmonics (SH) of the first order. Ambison- of-arrival (DOA) and the diffuseness. In the reproduction stage,
ics can encode either simulated sound fields, or measured sound the direct sound stream is reproduced as a plane-wave arriving
fields using recording systems such as the SoundField microphone. from the estimated DOA, while the diffuse sound stream is ren-
Using a suitable decoder, the Ambisonics signals can be played dered using plane waves propagating in a wide range of directions
back using loudspeaker arrays or headphones. Due to the linear after decorrelation. The perceptually based non-linear processing
processing, Ambisonics does not introduce non-linear distortion, of methods such as DirAC were found to be preferable over
but has the limitation of low spatial resolution due to the inherent Ambisonics [10,11] and even over HOA with limited order [12],
first SH order [3,4]. Higher order Ambisonics (HOA) [5], aims to showing the potential benefit of appropriate manipulation of the
achieve a more physically accurate reconstruction of the sound direct and diffuse parts of the sound field.
field, with higher spatial resolution than Ambisonics. This, in turn, In addition to DirAC, developed for spatial coding, other studies
requires recording with a larger number of microphones, imposing investigated the importance of the direct and reverberant parts of a
practical constraints on the recording system. sound field for spatial hearing. These have shown that the acoustic
With the aim of employing simple recording systems yet information in the direct sound dominates over that found in a
achieving high quality at reproduction, perceptually driven para- single or in multiple reflections [13–15]. Moreover, more recent
studies employing binaural reproduction based on binaural room
impulse responses (BRIRs), reveal that this dominance has a
E-mail addresses: eranmil@post.bgu.ac.il (E. Miller), br@bgu.ac.il (B. Rafaely).
https://doi.org/10.1016/j.apacoust.2018.12.011
0003-682X/Ó 2018 Elsevier Ltd. All rights reserved.
E. Miller, B. Rafaely / Applied Acoustics 148 (2019) 40–45 41
perceptual impact on a variety of attributes, such as localization, the pressure at a listener’s left and right ears can be formulated
source width and timbre, for example [16–18]. Concerning exter- as an integration over a sphere [22]:
nalization in particular, which is the focus of this work, in [19] it Z 2p Z p
l;r
was shown that as long as the direct part of the BRIR is kept pl;r ðkÞ ¼ aðk; h; /Þh ðk; h; /Þ sin hdhd/; ð1Þ
unchanged, the level of externalization is barely affected by spec- 0 0
tral smoothing of the reverberant part of the BRIR. In [20], replac- where aðk; h; /Þ is the complex amplitude density of a plane-wave of
ing the reverberant part of BRIRs with a monaural response did not wavenumber k, with ðh; /Þ denoting the plane wave arrival direc-
affect the externalization in some cases, e.g. lateral sound sources, l;r
tion. h ðk; h; /Þ are the complex amplitudes of the head related
as long as the direct part was kept unchanged. While these exper-
transfer functions (HRTFs), representing the frequency response at
iments investigated the effect of the direct sound in a controlled
the ear canals at wavenumber k due to a far-field sound source pro-
manner by manipulating BRIRs, and even tested externalization,
ducing a plane wave arriving from direction ðh; /Þ. The superscripts
they did not investigate the importance of the direct sound in
l; r represent the left and right ears, respectively.
the context of spatial audio coding, and, in particular, in the l;r
context of Ambisonics. aðk; h; /Þand h ðk; h; /Þ can be represented as a weighted sum of
Although previous studies with Ambisonics and SH representa- SH, defining their inverse spherical Fourier transform (ISFT) [23]:
tions manipulated the direct sound in the time-frequency domain X
Na X
n
(e.g. DirAC, SASC [8]), they also included manipulations of the dif- aðk; h; /Þ ¼ anm ðkÞY m
n ðh; /Þ; ð2Þ
fuse part. Therefore, conclusions regarding the explicit importance n¼0 m¼n
of the direct sound in the context of Ambisonics cannot be drawn

from these studies. Furthermore, these studies focused mostly on l;r
X
Nh X
n
l;r
h ðk; h; /Þ ¼ hnm ðkÞY m
n ðh; /Þ; ð3Þ
the general quality and not specifically on externalization, and so
n¼0 m¼n
conclusions regarding the explicit importance of the direct sound
l;r
to Ambisonics with respect to externalization also cannot be where hnm ðkÞ and anm ðkÞ are the spherical Fourier transform (SFT)
drawn from these studies. coefficients and N a and N h are the SH orders of a and h, respectively.
In this work, the importance for externalization of an ideal anm ðkÞ can be either analytically synthesized, in the case of a virtual
direct sound representation within an otherwise Ambisonics signal environment, or calculated by sampling the sound field using a
is investigated. It should be noted that while other attributes such spherical microphone array and, later, employing plane-wave
as localization and source width are also important, this study decomposition (PWD) in the SH domain [24,25]. For anm ðkÞ com-
focuses only on externalization. The study of other attributes is puted from microphone array measurements, the order N a may be
proposed for future work. A simulated binaural signal is manipu- limited by the number of microphones. For example, for the Eigen-
lated using a mixed SH order scheme. This leads to a sound field mike array [21] a maximum order of N a ¼ 4 can be computed.
that is divided into a direct component and a reverberant compo- l;r
hnm ðkÞ can be computed from measurements or numerical simula-
nent. The direct part is rendered with a high SH order, leading to an l;r
ideal spatial representation, while the reverberant part is rendered tions of h ðk; h; /Þ, which is typically sampled at thousands of direc-
with first order SH. The sound field is then rendered for head- tions [26,27]. The typical order for HRTF databases is around
phones listening via binaural reproduction. Externalization of this Nh ¼ 30 [28,29]. Y m n ðh; /Þ are a set of SH functions of order n P 0
mixed-order signal is then compared to a reference signal, ren- and degree n 6 m 6 n, and ðÞ denotes the complex conjugate.
dered with high-order Ambisonics of order 30, and another high- Substituting Eqs. (2) and (3) into Eq. (1), using the approach
order Ambisonics signal of order 3. The latter was chosen as an developed by Rafaely and Avni [30], the sound pressure function
intermediate representation. First, to be consistent with previous at the left and right ears can be calculated using the SH represen-
studies [12], second, because it represents the output of practical tation of the sound field and the HRTFs:
arrays such as the Eigenmike [21], and third, because it was per- X
minðN a ;N h Þ X
n
ceptually similar to the mixed-order signal when evaluated in a pl;r ðkÞ ¼ ~nm ðkÞ hl;r
a nm ðkÞ; ð4Þ
preliminary informal listening test. The hypothesis of this research n¼0 m¼n
was that a high-order SH representation of the direct sound will
~nm ðkÞ is the SFT of aðk; h; /Þ , defined as:
where a
significantly enhance the externalization of an otherwise Ambison-
Z 2p Z p
ics signal. In addition to refuting or validating this hypothesis, the
~nm ðkÞ ¼
a aðk; h; /Þ Y m
n ðh; /Þ sin hdhd/: ð5Þ
aim of this research is to quantify the extent of this enhancement, 0 0
and its dependence on the acoustic environment. The results indi-
cate that an enhancement of the direct component of the sound This, using the orthogonality property of the SH, ensures Parseval’s
field leads to a signal that is perceived to be more externalized than relation is satisfied [31].
a first order Ambisonics signal, and in many cases similar to a third Ambisonics, for example, is based on spatial encoding using SH
order Ambisonics signal for different audio content and acoustic of the first order [32], which could be obtained using a B-format
environments. microphone array. Utilizing the Ambisonics signals for binaural
reproduction within Eq. (4) yields the truncation of the summation
at minðN a ; N h Þ ¼ 1. This kind of order truncation leads to the repro-
2. Binaural sound reproduction based on spherical harmonics duction of a sound field of low spatial resolution [25] which could
lead to undesired effects on key perceptual attributes such as: tim-
In this section an overview of the mathematical basis for binau- bral artifacts, loss of externalization and degraded localization
ral sound reproduction is presented. Consider a sound pressure [30,33–36,4].
function pðk; r; h; /Þ, where ðr; h; /Þare the standard spherical coor-
dinates, h 2 ½0; p being the elevation angle, measured downwards 3. Sound field representation using mixed SH order
from the Cartesian z-axis, and / 2 ½p; pÞ being the azimuth angle,
measured counter-clockwise from the Cartesian x-axis on the xy- In this section the mathematical formulation for the representa-
plane, and k ¼ x=c is the wave number, with x being the radial tion of a sound field with mixed SH order is presented. Eq. (4) can
frequency and c being the speed of sound. A representation of be further modified such that the direct component and the
42 E. Miller, B. Rafaely / Applied Acoustics 148 (2019) 40–45
reverberant component of the sound field are represented using The binaural signals were generated in two virtual acoustic
different orders. Assuming a source in the far field, the direct part environments, referred to as environment 1 and environment 2.
of the sound field is composed of a single plane wave arriving from The two different environments were chosen to diversify the
direction ðhk ; /k Þ. In the SH domain, its amplitude density function acoustic environment condition, to ensure that conclusions are
m
nm ðkÞ ¼ AðkÞ½Y n ðhk ; /k Þ [31]. Now, the SH represen-
is of the form aDIR not specific to a single environment. The two environments differ
tation of the amplitude density function of the sound field can be in the source position relative to the listener, which was located
written as at position [9,7]. For environment 1, the source was located at
1.5 times the critical distance from the listener’s location (i.e
anm ðkÞ ¼ aDIR
nm ðkÞ þ anm ðkÞ;
REV
ð6Þ 3:315 m), while for environment 2, the source was located at 3
times the critical distance from the listener’s location (i.e. 6:63
nm ðkÞ is the amplitude density function of the reverberant
where aREV
m), leading to a reduction of 6 dB in the direct sound energy rela-
part of the sound field. This leads to the mixed SH formulation of
tive to the reverberant sound energy, compared to environment 1.
the sound field:
It should be noted that both environments represent relatively
X
Nd X
n reverberant conditions, as the listener is positioned further away
l;r
pl;r ðkÞ ¼ ~DIR
½anm ðkÞ hnm ðkÞ from the source compared to the critical distance. This setting
n¼0 m¼n
ð7Þ was chosen due to its improved externalization, and for studying
X
Nr X
n
l;r the effect of enhancing the direct sound in acoustic environments
þ ~REV
½anm ðkÞ hnm ðkÞ; with distant sources and negative DRR. For both environments, the
n¼0 m¼n
source was located at 30 from the listener, relative to the HRTF
where N d and N r are the orders for the direct and reverberant com- coordinate system, and at the same height as the listener’s head.
ponents, respectively, and are not necessarily equal, allowing for An important goal of this experiment is to evaluate the externaliza-
the reproduction of spatial audio with enhanced direct sound. tion of the mixed order signal in both environments, which differ
in their direct-to-reverberant ratio (DRR).
Two source signals were used:
4. Methodology
1. A pink noise repeating burst (1s duration including 20 ms fade
With the aim of studying the importance for externalization of in and fade out, followed by a 0.3 s pause before the next burst),
the information in the direct sound using binaural reproduction, a chosen for its wide bandwidth.
listening test based on Recommendation ITU-R BS.1534-1 2. Speech segment in the English language (3.26 s duration) from
(MUSHRA, MUltiple Stimuli with Hidden Reference and Anchor) the TIMIT corpus [39], chosen as it represents a typical real life
was developed. audio content.
A rectangular room of dimensions 15.5 9.8 7.5 m with a
wall reflection coefficient of R ¼ 0:8 and T 60 ¼ 0:75 s was simu- The different environments and signals are summarized in Tables 1
lated using the image method [37]. The critical distance for this and 2, respectively.
room is rd ¼ 2:21 m. A room impulse response from a point source Previous studies [40] have shown that the truncation of the SH
to a listener’s position was calculated and represented in the form series to a lower order may alter the timbre of a binaural signal.
of aðt; h; /Þ, and further encoded in the SH domain as anm ðtÞ, with t This may affect the task of rating the signals according to external-
representing time. The sound field, anm ðkÞ, was then calculated at ization level only. In order to overcome this issue, a spectral equal-
the listener’s position with orders N d and N r according to Eq. (7), ization filter was employed, as described in [41], ensuring all
by convolving the room impulse response anm ðtÞ with the source signals were equalized to the reference signal. The signals were
signal, sðtÞ, leading to anm ðkÞ after transformation to frequency. rendered using the Sound Scape Renderer (SSR) software [42]
anm ðkÞ was later multiplied with a set of HRTFs of matching orders and played back using AKG701 reference headphones. All signals
N d and N r in order to provide a binaural signal. For this investiga- were convolved with a matching headphone compensation filter.
tion, the Cologne HRTF compilation of the Neumann KU-100 [28] For spatial realism, horizontal head movements were allowed,
was used. and the headphones were mounted with a Razor AHRS head
In order to asses the impact of the direct part to the perceived tracker. The signals were generated to support head rotations, cov-
externalization, four binaural signals were generated: ering the horizontal plane with a resolution of 1 . The latency for
the SSR, under the settings used in this experiment, is 17:5 ms.
1. Mixed order signal: a binaural signal with an ideal representa- Together with the latency of the head tracker, the total latency is
tion of the direct sound, encoded with N d ¼ 30, and low order lower than 60 ms, which is sufficient for acceptable levels of local-
representation of the reverberant sound encoded with N r ¼ 1. ization accuracy [43].
2. Reference: a binaural signal with an ideal representation of 15 normal hearing subjects participated in this experiment. 6 of
both the direct and reverberant sound components. This was them are expert listeners and the rest are naive listeners. The
implemented in practice using encoding with a SH order of experiment included a total of 4 MUSHRA screens: 2 screens for
N d ¼ N r ¼ 30 . This high SH order represents the most accurate each of the two environments, according to Table 1. Each screen
representation of the binaural signal for this system It also presented 4 signals according to Table 2. In each screen the sub-
avoids spatial aliasing in the range of f 6 20 kHz [31] and also jects were instructed to rate the degree to which the sound source
avoids the need for other methods of interpolation for produc- is perceived to be originating from inside or outside the head, com-
ing HRTFs directions that are unavailable in the database. Fur-
thermore, according to [38], HRTF representation of such a
Table 1
high SH order should be sufficient to yield correct spatial
Parameters of the acoustic environments.
details.
3. Anchor: a signal encoded with N d ¼ N r ¼ 1, representing Environment 1 Environment 2
encoding in the Ambisonics format. Signal type Noise, Speech Noise, Speech
4. Third order signal N d ¼ Nr ¼ 3 was chosen as an intermediate DRR 3:52 dB 9:52 dB
Source-listener dist. 3:315 m 6:63 m
representation, providing an additional reference point.
Table 2 externalized. The median scores of all signals differ with signif-
Binaural signals SH order. icance (p < 0.001 for both speech and noise), except for the
Signal Nd Nr mixed order signal and the third order signal. As in environ-
Mixed order 30 1 ment 1, no statistically significant difference was found
Reference 30 30 between the scores of the mixed order signal and the third
Third order 3 3 order signal. The p-value between these signals are p = 0.11
Anchor 1 1 for the noise source and p = 0.89 for the speech source.
pared to the reference, on a scale from 0 to 100. Before rating, the

subjects conducted a training task, in order to familiarize them 6. Discussion
with the experiment and stimuli.
The effect of enhancing the spatial representation of the direct
5. Results part of a sound field is shown to impact the externalization of a
binaural signal, as the mixed order signal was perceived to be more
Externalization results are presented in this section, divided externalized than the first order signal in both environments and
into the two environments. The one-way analysis of variance for both noise and speech. This proves that even with a decrease
(ANOVA) was used as the statistical test. In order to. in the DRR (see Table 1), the information encoded in the direct part
contributes to externalization.
1) Environment 1: Figs. 1a and b present the results for environ- The scores for the mixed order signal and the third order signal
ment 1. The reference is shown to be perceived as the most were found to be of no statistically significant difference. This
externalized for both source signals. The median scores of all shows that to some extent, with respect to externalization,
signals differ with significance, with p < 0.001 for both speech enhancing the direct part in a first-order Ambisonics signal could
and noise, except for the difference of the scores between the yield a signal which resembles a third order Ambisonics signal.
mixed order signal and the third order signal, which is not This is in some agreement with the results presented in [12], com-
statistically significant with p = 0.33 for the noise source and paring DirAc to Ambisonics for overall quality.
p = 0.36 for the speech source. Because the externalization of the reference signal was not
2) Environment 2: Fig. 1c and d present the results for environ- compared directly between the two environments, it is not possi-
ment 2. The reference is shown to be perceived as the most ble to deduce which of the two environments is perceived to be
Fig. 1. Results for the externalization ratings in environment 1 and environment 2. Box plot visualization, marking the median with a red line where the bottom and top
edges of the box represent the 25th and 75th percentiles, respectively. The whiskers represent the variability of the ratings outside the upper and lower quartiles. Outliers are
marked with red ‘+’. The width of the box plot notches has been calculated such that boxes with non-overlapping notches have medians which are different at the 5%
significance level. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
44 E. Miller, B. Rafaely / Applied Acoustics 148 (2019) 40–45
more externalized. Nevertheless, the effect of spherical harmonics [3] Bertet Stéphanie, Daniel Jérôme, Parizet Etienne, Warusfel Olivier.
Investigation on localisation accuracy for first and higher order ambisonics
order (for the four signal types) seems similar under the two envi-
reproduced sound sources. Acta Acust United Acust 2013;99(4):642–57.
ronments. Indeed, a multi-factor ANOVA test confirms that the [4] Braun Sebastian, Frank Matthias. Localization of 3d ambisonic recordings and
interaction between the spherical harmonics order factor and the ambisonic virtual sources. In: 1st international conference on spatial audio,
environment factor is not significant. These results can be (Detmold).
[5] Daniel Jérôme, Moreau Sébastien. Further study of sound field coding with
explained by the similar nature of the two environments. They higher order ambisonics. In: Audio engineering society convention 116. Audio
are both reverberant, with the listener’s distance from the source Engineering Society; 2004.
being greater than the critical. [6] Pulkki Ville. Spatial sound reproduction with directional audio coding. J Audio
Eng Soc 2007;55(6):503–16.
It is interesting to note that the score for the mixed order signal [7] Berge Svein, Barrett Natasha. High angular resolution planewave expansion.
for both environments, seems to slightly improve compared to the In: Proc. of the 2nd international symposium on ambisonics and spherical
third order, when comparing speech to noise. In environment 2 acoustics. p. 6–7.
[8] Goodwin Michael, Jot Jean-Marc. Spatial audio scene coding. In: Audio
(see Fig. 1c, d), for example, for the noise source, the mixed order engineering society convention 125. Audio Engineering Society; 2008.
median score is 44 while the third order signal score is 72. On [9] Politis Archontis, Vilkamo Juha, Pulkki Ville. Sector-based parametric sound
the other hand, for the speech source, the score of the mixed order field reproduction in the spherical harmonic domain. IEEE J Selec Top Signal
Process 2015;9(5):852–66.
signal is 72 while the score of the third order signal is 65. A possi- [10] Vilkamo Juha, Lokki Tapio, Pulkki Ville. Directional audio coding: virtual
ble explanation could be the non-stationarity of speech, which microphone-based synthesis and subjective evaluation. J Audio Eng Soc
includes repeated onsets, compared to the noise signal that has a 2009;57(9):709–24.
[11] Barrett Natasha, Berge Svein. A new method for b-format to binaural
single onset every 1 s. Therefore, with speech (or any other signal
transcoding. In: Audio engineering society conference: 40th international
that contains numerous onsets), the direct sound may have a conference: spatial audio: sense the sound of space. Audio Engineering
greater importance and its enhancement may therefore lead to a Society; 2010.
greater impact on externalization. [12] Politis Archontis, McCormack Leo, Pulkki Ville. Enhancement of ambisonic
binaural reproduction using directional audio coding with optimal adaptive
mixing. In: Applications of signal processing to audio and acoustics (WASPAA),
2017 IEEE workshop on. IEEE; 2017. p. 379–83.
7. Conclusion [13] Wallach Hans, Newman Edwin B, Rosenzweig Mark R. A precedence effect in
sound localization. J Acoust Soc Am 1949;21(4). 468–468.
In this paper, a binaural signal was manipulated in the SH [14] Litovsky Ruth Y, Steven Colburn H, Yost William A, Guzman Sandra J. The
precedence effect. J Acoust Soc Am 1999;106(4):1633–54.
domain, facilitating an evaluation of the improvement in the exter-
[15] Zurek Patrick M. The precedence effect. In: Directional hearing. Springer; 1987.
nalization of a sound field that is composed of a reverberant part of p. 85–105.
low spatial resolution and a direct part of ideal spatial representa- [16] Klockgether Stefan, van Dorp Schuitman Jasper, Van de Par Steven. Perceptual
limits for detecting interaural-cue manipulations measured in reverberant
tion. A subjective listening test showed that an enhancement of the
settings. In: Proceedings of meetings on acoustics ICA2013, vol. 19. ASA; 2013.
direct part alone yields a sound field which is perceived to be p. 015004.
externalized at a level similar to that of a sound field represented [17] Devore Sasha, Ihlefeld Antje, Hancock Kenneth, Shinn-Cunningham Barbara,
in the third order, and more externalized than a sound field repre- Delgutte Bertrand. Accurate sound localization in reverberant environments is
mediated by robust encoding of spatial cues in the auditory midbrain. Neuron
sented in the first order for all cases inspected. The results pre- 2009;62(1):123–34.
sented in this paper could have implications for various [18] Ihlefeld Antje, Shinn-Cunningham Barbara G. Effect of source spectrum on
applications of spatial audio: sound localization in an everyday reverberant room. J Acoust Soc Am
2011;130(1):324–33.
[19] Hassager Henrik Gert, Gran Fredrik, Dau Torsten. The role of spectral detail in
1) For applications in which the room impulse response is mea- the binaural transfer function on perceived externalization in a reverberant
sured with a first-order SH microphone array such as the environment. J Acoust Soc Am 2016;139(5):2992–3000.
[20] Catic Jasmina, Santurette Sébastien, Dau Torsten. The role of reverberation-
Soundfield microphone [44], the benefit to externalization by related binaural cues in the externalization of speech. J Acoust Soc Am
rendering the direct sound with a high spatial quality is shown 2015;138(2):1154–67.
to be significant as quantified in this paper. Room acoustics [21] MH Acoustics. Em32 eigenmike microphone array release notes (v17. 0). 25
Summit Ave, Summit, NJ 07901, USA; 2013.
auralization and binaural playback of anechoic audio signals
[22] Menzies Dylan, Al-Akaidi Marwan. Nearfield binaural synthesis and
through measured room impulse responses could be example ambisonics. J Acoust Soc Am 2007;121(3):1559–63.
applications. [23] Driscoll James R, Healy Dennis M. Computing fourier transforms and
convolutions on the 2-sphere. Adv Appl Math 1994;15(2):202–50.
2) For applications in which audio signals are directly recorded
[24] Park Munhum, Rafaely Boaz. Sound-field analysis by plane-wave
with a microphone array of a first order, enhancing the spatial decomposition using spherical microphone array. J Acoust Soc Am 2005;118
resolution of the direct sound may require a preliminary stage (5):3094–103.
of identifying direct-sound components. Although this [25] Rafaely Boaz. Plane-wave decomposition of the sound field on a sphere by
spherical convolution. J Acoust Soc Am 2004;116(4):2149–57.
approach is already implemented in DirAC [6], new methods [26] Evans Michael J, Angus James AS, Tew Anthony I. Analyzing head-related
that were recently developed for identifying direct-sound com- transfer function measurements using surface spherical harmonics. J Acoust
ponents in the time-frequency domain, can be studied to fur- Soc Am 1998;104(4):2400–11.
[27] Ralph Algazi V, Duda Richard O, Thompson Dennis M, Avendano Carlos. The
ther improve the quality of Ambisonics signals, motivated by cipic hrtf database. In: Applications of Signal Processing to Audio and
the potential benefit to externalization, as presented here. Acoustics, 2001 IEEE Workshop on the. IEEE; 2001. p. 99–102.
These methods include the direct-path-dominance (DPD) test [28] Bernschütz Benjamin. A spherical far field HRIR/HRTF compilation of the
neumann ku 100. In: Proceedings of the 40th Italian (AIA) annual conference
[45], with its recent extensions [46]47. on acoustics and the 39th German annual conference on acoustics (DAGA)
conference on acoustics. p. 29.
Similar studies, concerning other attributes, such as localization, [29] Zhang Wen, Abhayapala Thushara D, Kennedy Rodney A, Duraiswami Ramani.
Insights into head-related transfer function: spatial dimensionality and
source width, timbre perception, and more, are proposed for future
continuous representation. J Acoust Soc Am 2010;127(4):2347–57.
work. [30] Rafaely Boaz, Avni Amir. Interaural cross correlation in a sound field
represented by spherical harmonics. J Acoust Soc Am 2010;127(2):
823–8.
References [31] Rafaely Boaz. Fundamentals of spherical array processing, vol. 8. Springer;
2015.
[1] Gerzon Michael A. Periphony: with-height sound reproduction. J Audio Eng Soc [32] Frank Matthias, Zotter Franz, Sontacchi Alois. Producing 3d audio in
1973;21(1):2–10. ambisonics. In: Audio engineering society conference: 57th international
[2] Gerzon Michael A. Ambisonics in multichannel broadcasting and video. J Audio conference: the future of audio entertainment technology-cinema, television
Eng Soc 1985;33(11):859–71. and the internet. Audio Engineering Society; 2015.
[33] Bertet Stéphanie, Daniel Jérôme, Gros Laëtitia, Parizet Etienne, Warusfel [40] Avni Amir, Ahrens Jens, Geier Matthias, Spors Sascha, Wierstorf Hagen, Rafaely
Olivier. Investigation of the perceived spatial resolution of higher order Boaz. Spatial perception of sound fields recorded by spherical microphone
ambisonics sound fields: a subjective evaluation involving virtual and real 3d arrays with varying spatial resolution. J Acoust Soc Am 2013;133(5):2711–21.
microphones. In: Audio engineering society conference: 30th international [41] Ben-Hur Zamir, Brinkmann Fabian, Sheaffer Jonathan, Weinzierl Stefan,
conference: intelligent audio environments. Audio Engineering Society; 2007. Rafaely Boaz. Spectral equalization in binaural signals represented by order-
[34] Romigh Griffin D, Brungart Douglas S, Stern Richard M, Simpson Brian D. truncated spherical harmonics. J Acoust Soc Am 2017;141(6):4087–96.
Efficient real spherical harmonic representation of head-related transfer [42] Geier Matthias, Ahrens Jens, Spors Sascha. The soundscape renderer: a unified
functions. IEEE J Selec Top Signal Process 2015;9(5):921–30. spatial audio reproduction framework for arbitrary rendering methods. In:
[35] Thresh Lewis, Armstrong Cal, Kearney Gavin. A direct comparison of 124th AES Conv.. Citeseer; 2008.
localization performance when using first, third, and fifth ambisonics order [43] Brungart Douglas S, Simpson Brian D, McKinley Richard L, Kordik Alexander J,
for real loudspeaker and virtual loudspeaker rendering. In: Audio engineering Dallman Ronald C, Ovenshire David A. The interaction between head-tracker
society convention 143. Audio Engineering Society; 2017. latency, source duration, and response time in the localization of virtual sound
[36] Liu Yang, Xie BS. Analysis on the timbre of ambisonics reproduction using a sources. Georgia Institute of Technology; 2004.
binaural loudness model. In: The 21 international congress on sound and [44] Farrar Ken. Soundfield microphone. Wireless World 1979;85(1526):48–50.
vibration. [45] Nadiri Or, Rafaely Boaz. Localization of multiple speakers under high
[37] Allen Jont B, Berkley David A. Image method for efficiently simulating small- reverberation using a spherical microphone array and the direct-path
room acoustics. J Acoust Soc Am 1979;65(4):943–50. dominance test. IEEE/ACM Trans Audio Speech Lang Process 2014;22
[38] Ahrens Jens, Thomas Mark RP, Tashev Ivan. HRTF magnitude modeling using a (10):1494–505.
non-regularized least-squares fit of spherical harmonics coefficients on [46] Rafaely Boaz, Alhaiany Koby. Speaker localization using direct path dominance
incomplete data. In: Signal & information processing association annual test based on sound field directivity. Signal Process 2018;143:42–7.
summit and conference (APSIPA ASC), 2012 Asia-Pacific. IEEE; 2012. p. 1–5. [47] Schymura Christopher, Guo Peng, Maymon Yanir, Rafaely Boaz, Kolossa
[39] Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL. Darpa Dorothea. Exploiting structures of temporal causality for robust speaker
timit acoustic phonetic continuous speech corpus (vol. ldc93s1). Philadelphia: localization in reverberant environments. In: International conference on
Linguistic Data Consortium 1993. latent variable analysis and signal separation. Springer; 2018. p. 228–37.

Speech 1

Uploaded by

Copyright:

Available Formats

Speech 1

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Speech 1

Uploaded by

Copyright:

Available Formats

Applied Acoustics 148 (2019) 40–45

Contents lists available at ScienceDirect

The role of direct sound spherical harmonics representation in

of the direct sound in the context of Ambisonics cannot be drawn

pared to the reference, on a scale from 0 to 100. Before rating, the

You might also like