IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 45, NO. 6, JUNE 1998
Wavelet Analysis of Click-Evoked Otoacoustic Emissions
Gabriella Tognola,* Ferdinando Grandori, and Paolo Ravazzani
Abstract— Time-frequency distribution methods are widely used for the analysis of a variety of biomedical signals. Recently, they have also been applied to the study of otoacoustic emissions (OAE’s), the active acoustic response of the hearing end organ. Click-evoked otoacoustic emissions (CEOAE’s) are time-varying signals with a clear frequency dispersion along the time axis. The analysis of CEOAE’s is of considerable interest because of their close relation to cochlear mechanisms. In this paper, several basic time-frequency distribution methods are considered and compared on the basis of both simulated signals and real CEOAE’s. The particular structure of CEOAE’s requires a method with satisfactory resolution in both time and frequency. Results from simulations and real CEOAE’s revealed that the wavelet approach is highly suitable for the analysis of such signals. Some examples of the application of the wavelet transform to CEOAE’s are provided here. Applications range from the extraction of normative data from adult and neonatal OAE’s to the extraction of quantitative parameters for clinical purposes.
Index Terms— Choi–Williams distribution, click-evoked otoacoustic emissions, full-term neonates, noise-induced hearing loss,
normal adults, short-time Fourier transform, time-frequency resolution properties, wavelet transform, Wigner–Ville distribution.
I. INTRODUCTION
During the last decade, time-frequency distribution methods have been systematically used for the analysis of otoacoustic emissions (OAE’s) [1]–[5]. OAE’s are acoustic signals emitted by the cochlea; they reflect the active processes involved in the transduction of mechanical energy into electrical energy [6]–[8]. One of the most attractive features of OAE’s is their tight relation to cochlear status: OAE’s are present, to varying degrees, in all healthy cochleae, whereas they are generally absent or greatly reduced in ears with mild hearing losses. This property, together with the ease of performing the test and its high short- and long-term reproducibility, has made OAE recording an increasingly widespread neonatal hearing screening method [9], [10]. Owing to their good long-term reproducibility, OAE’s can also be used to monitor cochlear function in patients exposed to prolonged noise (see, e.g., [11] and [12]) or to ototoxic agents (see, e.g., [13] and [14]).
OAE’s can be classified according to the type of stimulus that elicits them. In the case of click-evoked otoacoustic emissions (CEOAE’s), the stimulus is an acoustic click of brief duration (about 100 μs). CEOAE’s are nonstationary signals and exhibit a clear frequency dispersion over time: in the OAE response to a click, high-frequency components predominate at short latencies and low-frequency components at longer latencies. Interestingly, this distribution of frequencies at different latencies is very similar to the place-frequency distribution along the cochlea [6], [7], [15], [16].
Manuscript received February 19, 1997; revised January 13, 1998. Asterisk indicates corresponding author.
*G. Tognola is with the Department of Biomedical Engineering, Polytechnic of Milan, Piazza Leonardo da Vinci, 32, I-20133 Milan, Italy (e-mail: tognola@biomed.polimi.it).
F. Grandori and P. Ravazzani are with the CNR Center of Biomedical Engineering, Polytechnic of Milan, I-20133 Milan, Italy.
Publisher Item Identifier S 0018-9294(98)03614-3.
Analysis of the time-frequency properties of CEOAE’s is, therefore, of considerable interest because of their close relation to cochlear mechanisms. In particular, since a click evokes a cumulative response from the whole cochlea, the analysis of CEOAE’s can yield a global view of cochlear function. On the other hand, measurements of the time-frequency properties of CEOAE’s have encountered a variety of technical problems, such as the difficulty of determining the contribution of each elementary frequency component. Accurate results therefore require appropriate signal processing techniques.
In a recent paper [4], the application of wavelet analysis to CEOAE’s was presented in detail; here, we illustrate the theoretical background that justifies that choice. Several time-frequency distributions are considered, namely the short-time Fourier transform (STFT), the wavelet transform (WT), the Wigner–Ville distribution, and the Choi–Williams distribution (CWD). Their relative performance (e.g., their resolution in the time and frequency domains) is compared in specific situations by means of both simulated signals and real CEOAE’s. Finally, examples of the application of time-frequency distributions to various kinds of CEOAE’s (from adults, neonates, and hearing-impaired subjects) are provided and discussed.
The paper is organized as follows. A brief mathematical
background of the time-frequency distributions taken into
consideration is presented in Section II. Section III deals with
the detailed description of the material used in this study
(i.e., simulated signals and real CEOAE’s), the description
of the analysis windows used for the STFT and WT, and the
definition of the quantitative measures of time and frequency
resolutions. Results from simulations and examples of the applications of the WT to CEOAE’s are described in Sections IV
and V, respectively.
II. TIME-FREQUENCY DISTRIBUTIONS: MATHEMATICAL BACKGROUND
This section briefly describes the principal features of the time-frequency distributions used in this study. For fundamentals or a more detailed description, see [17].
0018–9294/98$10.00 © 1998 IEEE
TOGNOLA et al.: WAVELET ANALYSIS OF CEOAE’S
A. General Remarks
Time-frequency distributions are conventionally classified into two categories: linear and quadratic. Linear time-frequency distributions satisfy the superposition (linearity) principle, whereas quadratic (or energetic) time-frequency distributions describe a signal in terms of its time-frequency energy distribution and satisfy the quadratic superposition principle [17]. In the quadratic case, the time-frequency distribution of the signal contains two types of terms: auto-terms (i.e., the true time-frequency distribution of each signal component) and interference terms (disturbance terms). The number of interference terms grows quadratically with the number of signal components. Interference terms may overlap with auto-terms, making the time-frequency distribution more difficult to interpret. In general, there is a tradeoff between interference terms and time-frequency resolution: attenuating the interference terms inevitably worsens the time-frequency resolution.
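To make the quadratic growth of interference terms concrete, the quadratic superposition principle can be written out for an $N$-component signal. The notation (weights $c_k$ and the cross-distribution $T_{x_k,x_l}$) is ours, following the usual convention for real-valued quadratic distributions such as the WD, spectrogram, and scalogram, for which $T_{x_l,x_k} = T_{x_k,x_l}^{*}$:

```latex
x(t) = \sum_{k=1}^{N} c_k\,x_k(t)
\;\Longrightarrow\;
T_x(t,f) = \underbrace{\sum_{k=1}^{N} |c_k|^2\,T_{x_k}(t,f)}_{N\ \text{auto-terms}}
 + \underbrace{\sum_{k=1}^{N-1}\sum_{l=k+1}^{N} 2\,\mathrm{Re}\{c_k c_l^{*}\,T_{x_k,x_l}(t,f)\}}_{N(N-1)/2\ \text{interference terms}}
```

For the five-gammatone simulated CEOAE of Fig. 1, for instance, this gives 5 auto-terms and 10 interference terms.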
Also, for any time-frequency distribution there is a tradeoff between time resolution $\Delta t$ and frequency resolution $\Delta f$. Time and frequency resolution cannot be simultaneously arbitrarily good, since they satisfy the uncertainty principle [18]
$$\Delta t \cdot \Delta f \ge \frac{1}{4\pi} \qquad (1)$$
Expression (1) means that an improvement in time resolution results in a loss of frequency resolution, and vice versa.
B. The Short-Time Fourier Transform
The STFT is a linear time-frequency distribution. The STFT of the signal $x(t)$ is defined as
$$\mathrm{STFT}_x(t,f) = \int x(\tau)\,\gamma^{*}(\tau - t)\,e^{-j2\pi f\tau}\,d\tau \qquad (2)$$
where $\gamma(t)$ is the sliding analysis window and $^{*}$ denotes complex conjugation. Time and frequency resolutions depend on the length and on the bandwidth of $\gamma(t)$ (see [20] for an exhaustive discussion). Since the same analysis window $\gamma(t)$ is used for all analysis frequencies $f$, time and frequency resolutions are fixed over the time-frequency plane once the analysis window has been chosen.
The energetic version of the STFT is the spectrogram, defined as the squared magnitude of the STFT
$$\mathrm{SPEC}_x(t,f) = |\mathrm{STFT}_x(t,f)|^2 \qquad (3)$$
It can be shown [21], [22] that the interference terms of the spectrogram are oscillatory structures and occur only where the transforms of the auto-terms overlap.
C. The Wavelet Transform
The WT is a linear distribution and is defined as
$$\mathrm{WT}_x(t,f) = \sqrt{\left|\frac{f}{f_0}\right|}\int x(\tau)\,\psi^{*}\!\left(\frac{f}{f_0}(\tau - t)\right)d\tau \qquad (4)$$
(The WT was originally introduced as a time-scale version [23], [24], which can be obtained from the time-frequency version (4) by introducing the scale parameter $a = f_0/f$.)
The analysis (or basis) functions $\psi_{t,f}(\tau) = \sqrt{|f/f_0|}\,\psi\big((f/f_0)(\tau - t)\big)$, called wavelets, are scaled and shifted versions of the same prototype function $\psi(t)$, the mother wavelet. $\psi(t)$ is a function with finite energy, centered around time $t = 0$; its FT $\Psi(f)$ is a bandpass function centered around frequency $f_0$. The wavelets have a constant relative bandwidth, i.e., the quality factor $Q$ (center frequency/bandwidth) is constant.
Time and frequency resolutions satisfy the uncertainty principle (1), but, unlike in the STFT, they are not fixed over the entire time-frequency plane: time resolution improves at higher frequencies, whereas frequency resolution improves at lower frequencies.
The energetic version of the WT is the scalogram, defined as the squared magnitude of the WT
$$\mathrm{SCAL}_x(t,f) = |\mathrm{WT}_x(t,f)|^2 \qquad (5)$$
The scalogram is a quadratic distribution; as with the spectrogram, the interference terms are restricted to the regions of the time-frequency plane where the time-frequency distributions of the auto-terms overlap.
A signal $x(t)$ can be reconstructed from its WT [25] by
$$x(t) = \frac{1}{C_\psi}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}\mathrm{WT}_x(\tau,a)\,\frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-\tau}{a}\right)d\tau\,\frac{da}{a^2} \qquad (6)$$
where $\mathrm{WT}_x(\tau,a) = |a|^{-1/2}\int x(s)\,\psi^{*}\big((s-\tau)/a\big)\,ds$ is (4) rewritten in terms of the scale $a = f_0/f$, and $C_\psi = \int |\Psi(f)|^2/|f|\,df$ (finite for admissible wavelets) is a constant that depends only on the FT of $\psi(t)$.
D. The Wigner Distribution
The Wigner distribution (WD) is a quadratic distribution and is defined as [26]
$$\mathrm{WD}_x(t,f) = \int x\!\left(t+\frac{\tau}{2}\right)x^{*}\!\left(t-\frac{\tau}{2}\right)e^{-j2\pi f\tau}\,d\tau \qquad (7)$$
The WD has a very high time-frequency concentration [27], whereas SPEC and SCAL introduce some broadening in time and frequency. The main drawback of the WD is the presence of cross-terms. In the WD, cross-terms oscillate at relatively high frequencies and can have a peak value as high as twice that of the auto-terms. Cross-terms are present even if the signal components do not overlap in the time-frequency plane. They can be attenuated by smoothing, which is a sort of two-dimensional low-pass filtering. The smoothing is achieved by convolving the WD with a kernel $\Phi(t,f)$ [17]
$$\mathrm{SWD}_x(t,f) = \iint \mathrm{WD}_x(t',f')\,\Phi(t-t',\,f-f')\,dt'\,df' \qquad (8)$$
The smoothing results in a loss of time-frequency concentration. In particular, a broad smoothing over the time-frequency plane yields good attenuation of interference terms but poor time-frequency concentration; conversely, a narrow smoothing yields poor interference-term attenuation but good time-frequency concentration. The most common smoothed Wigner–Ville distributions (SWD’s) are the following.
• The smoothed pseudo-WD (SPWD) [28], for which $\Phi(t,f) = g(t)\,H(f)$, where $H(f)$ is the FT of the window $h(t)$. The smoothing along the time and frequency directions is determined by the lengths of the windows $g(t)$ and $h(t)$. A particular case of the SPWD is the so-called pseudo-WD (PWD), where $g(t) = \delta(t)$ (i.e., there is no smoothing along the time direction).
• The Choi–Williams distribution (CWD) [29], [30], for which $\Phi(t,f) = \int \sqrt{\frac{\sigma}{4\pi}}\,\frac{1}{|\tau|}\,e^{-\sigma t^2/(4\tau^2)}\,e^{-j2\pi f\tau}\,d\tau$. The smoothing is controlled by the parameter $\sigma$. A good compromise between time-frequency resolution and interference-term attenuation can be obtained for an appropriate choice of $\sigma$ [29].
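The cross-term behavior of the WD (located midway between two components, oscillating, and up to twice the auto-term) can be checked numerically. The sketch below evaluates a discrete pseudo-WD (the PWD case, here with a rectangular lag window) for two analytic tones. Frequencies are normalized to the sampling rate and kept below 0.25 to avoid the aliasing of the discrete WD; the tone spacing and window length are chosen so that window leakage cancels exactly. All function and parameter names are illustrative and are not taken from the paper.

```python
import cmath
import math


def two_tones(n, f1, f2):
    """Sample n of the analytic two-tone signal exp(j2*pi*f1*n) + exp(j2*pi*f2*n)."""
    return cmath.exp(2j * math.pi * f1 * n) + cmath.exp(2j * math.pi * f2 * n)


def pseudo_wd(x, n, f, half_len):
    """Discrete pseudo-WD of x (a function of the sample index) at sample n and
    normalized frequency f, using a rectangular lag window of 2*half_len + 1 lags.
    The product x[n+m] x*[n-m] is conjugate-symmetric in m, so the sum is real
    up to rounding error."""
    acc = 0j
    for m in range(-half_len, half_len + 1):
        acc += x(n + m) * x(n - m).conjugate() * cmath.exp(-4j * math.pi * f * m)
    return acc.real


# Spacing f2 - f1 = 0.08 with 25 lags makes all leakage sums vanish, so each
# value below contains only an auto-term or only the cross-term.
f1, f2, M = 0.05, 0.13, 12
f_mid = 0.5 * (f1 + f2)


def x(n):
    return two_tones(n, f1, f2)


auto = pseudo_wd(x, 0, f1, M)       # auto-term of the first tone
cross = pseudo_wd(x, 0, f_mid, M)   # cross-term midway between the tones
print(round(auto, 6), round(cross, 6))  # 25.0 50.0
print(round(pseudo_wd(x, 3, f_mid, M), 3))  # three samples later the cross-term is near zero
```

At $n = 0$ the cross-term is exactly twice the auto-term; it then oscillates as $\cos(2\pi(f_2 - f_1)n)$, which is why smoothing along time (the SPWD) or a CWD kernel attenuates it.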
III. MATERIALS AND METHODS
Both synthesized and real OAE’s are examined with the
distributions described in Section II.
A. Synthesized Signals
The following synthesized signals were considered.
1) Sum of three tone-bursts (9).
2) Sum of two tones and two Dirac impulses (10).
3) Sum of two chirp signals
$$x(t) = x_1(t) + x_2(t),\qquad x_i(t) = e^{j2\pi(f_i + k_i t)t} \qquad (11)$$
where $f_i$ and $k_i$ are the initial frequency and the chirp rate of the $i$th chirp (the values used are given in the caption of Fig. 2).
4) Synthesized CEOAE. Some model studies suggest that the emissions evoked by a click can be viewed as the sum of the single responses of the emission generators excited by the acoustic stimulus (see, e.g., [31] and [32]). Each generator behaves as a narrow bandpass filter and is characterized by a specific central frequency, which varies along the basilar membrane. The central frequency (or characteristic frequency) of a cochlear filter decreases with increasing distance from the stapes. As a first approximation, it has been shown by us and by others [1], [4], [33] that a CEOAE can be synthesized by summing the impulse responses of gammatone functions. We have considered here a synthesized CEOAE obtained by the summation of a number of gammatones
$$x(t) = \sum_{i=1}^{N}\gamma_i(t),\qquad \gamma_i(t) = a\,t^{3}\,e^{-2\pi\beta f_i t}\cos(2\pi f_i t) \qquad (12)$$
where $a$ is a constant, $\beta$ is a decay parameter, and $f_i$ is the central frequency of the $i$th gammatone (see Fig. 1).
Fig. 1. Simulated CEOAE obtained by the summation of five gammatones $\gamma_i(t)$: $x(t) = \sum_{i=1}^{5}\gamma_i(t)$, where $\gamma_i(t) = a\,t^{3}\,e^{-2\pi\beta f_i t}\cos(2\pi f_i t)$; $\beta = 0.1$, $a = (2\pi f_i)^{3.5}$, and $f_{1\ldots5}$ = 1.0, 1.5, 2.2, 3.3, and 5.0 kHz are the central frequencies. The trace in the first row is the simulated CEOAE; the traces below are two gammatone components with central frequencies equal to 1.0 and 5.0 kHz. Note that the higher the central frequency, the shorter the duration of the gammatone and the wider its spectrum.
B. Real Data
CEOAE’s are recorded using a probe inserted into the outer ear canal. The probe contains a miniaturized microphone and a transmitter that delivers the acoustic stimulus. Eight normal adults, 333 full-term neonates on their third day of life, and 13 hearing-impaired subjects were tested. For normal and hearing-impaired adults, measurements were made in a sound-proof cabin with the subject seated in an armchair; the recording session lasted about 20 min. For neonates, measurements were made in a quiet room close to the nursery; the recording session (monaural measurements) lasted about 5 min. In the present study, CEOAE’s were recorded using a standard ILO88 system (Otodynamics Ltd.). Clicks were delivered at different intensities (from 47 to 80 dB SPL in 3-dB steps for normal adults; at 80 dB SPL for neonates; at 83 dB SPL for hearing-impaired subjects). Responses were filtered with the ILO88 default procedure (second-order high-pass filter at 330 Hz, gain 1.57, and fourth-order low-pass filter at 10.6 kHz, gain 2.6) and digitized at a rate of 25 000 samples/s. Responses to 260 repetitions of the click train (four clicks per train) were averaged according to the “linear” mode of operation for normal adults; for neonates and hearing-impaired subjects, the more commonly used “nonlinear” mode of operation [8], [34], [35] was used. (In the “nonlinear” mode of operation, a train of three clicks followed by a click of greater amplitude and inverted polarity is used; this method takes advantage of the nonlinear behavior of OAE’s.) Finally, the averaged data were windowed using the default ILO88 window (2.5–20 ms) and digitally filtered off-line with a second-order digital bandpass filter.
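Why the “nonlinear” mode suppresses the stimulus artifact can be shown with a toy model. The sketch assumes that the inverted click is three times the amplitude of each of the other three clicks (an assumption; the text above only says “greater amplitude”), and it uses made-up response functions: a linear artifact and a saturating stand-in for the emission. None of the names come from the ILO88 software.

```python
import math


def nonlinear_train_sum(response, click_amplitude=1.0):
    """Summed response to one hypothetical 'nonlinear' click train.

    Assumption: three clicks of amplitude +A followed by one inverted click of
    amplitude -3A. Any response component proportional to the stimulus then
    cancels exactly; only the nonlinear part of the response survives.
    """
    train = [click_amplitude] * 3 + [-3.0 * click_amplitude]
    return sum(response(a) for a in train)


def linear_artifact(a):
    return 0.8 * a           # toy passive artifact, linear in stimulus amplitude


def compressive_emission(a):
    return math.tanh(a)      # toy saturating response standing in for an OAE


print(nonlinear_train_sum(linear_artifact))       # cancels (zero up to rounding)
print(nonlinear_train_sum(compressive_emission))  # a nonzero residue survives
```

Because the sum is linear in the response, artifact plus emission together yield the same residue as the emission alone. The price of this design is that the part of the emission that grows linearly with the stimulus is discarded as well.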
Hearing-impaired subjects, who suffered from noise-induced hearing loss (due to weapon noise and occupational noise), were also evaluated by means of a pure-tone audiogram and a Békésy sweep-frequency audiogram. For all these patients, the cutoff frequency $f_c$ above which the hearing loss was greater than 30 dB HL was determined.
C. Analysis Windows Used in the STFT and WT Computation
For the STFT, we considered four different analysis windows: the Hamming, the Gauss, the Hann, and the Kaiser window (for the analytic expressions, see [20]).
For the WT, we considered the following.
1) The Morlet wavelet [36] (13).
2) The family of functions proposed by Wit et al. [1] and by Tognola et al. [4], which depends on a parameter $n$ (14).
3) The modified version of the Morlet wavelet [37] (15), where $\alpha$ determines the duration of the Gaussian window.
Using the family of wavelets (15), Meste and colleagues [37] introduced a modified scalogram (MOD SCAL), which combines the WT’s obtained for two different values of $\alpha$
$$\mathrm{MOD\,SCAL}_x(t,f) = \mathrm{WT}_x^{(\alpha_1)}(t,f)\,\big[\mathrm{WT}_x^{(\alpha_2)}(t,f)\big]^{*} \qquad (16)$$
where $\alpha_1 < \alpha_2$ (in this study, $\alpha_1 = 1$ and $\alpha_2 = 5$). The MOD SCAL takes advantage of both the good time resolution of $\mathrm{WT}^{(\alpha_1)}$ and the good frequency resolution of $\mathrm{WT}^{(\alpha_2)}$ and, thus, can achieve a better time-frequency resolution than the SCAL with the Morlet wavelet. However, it shows more pronounced interference terms.
D. Resolution Properties
A quantitative measure of the time-frequency resolution has been derived for the STFT and WT. The time-frequency resolutions of the STFT and the WT depend critically on the choice of the analysis window and the mother wavelet, respectively. The duration and the bandwidth of the analysis window (or the wavelet) determine the local resolutions of the transforms around any point of the time-frequency plane. The duration $\Delta t$ and bandwidth $\Delta f$ of a generic function $x(t)$ can be expressed in terms of root mean square (rms) values [23], [38], [39]
$$\Delta t^2 = \frac{1}{E}\int (t-t_0)^2\,|x(t)|^2\,dt \qquad (17)$$
$$\Delta f^2 = \frac{1}{E}\int (f-f_0)^2\,|X(f)|^2\,df \qquad (18)$$
where $t_0$ and $f_0$ are the centers of gravity of $|x(t)|^2$ and $|X(f)|^2$, respectively,
$$t_0 = \frac{1}{E}\int t\,|x(t)|^2\,dt \qquad (19)$$
$$f_0 = \frac{1}{E}\int f\,|X(f)|^2\,df \qquad (20)$$
and $E = \int |x(t)|^2\,dt = \int |X(f)|^2\,df$ is the energy of $x(t)$ and of its spectrum $X(f)$. Expressions (17) and (18) are the variances of the functions $|x(t)|^2/E$ and $|X(f)|^2/E$. The time-bandwidth product $\Delta t\cdot\Delta f$ satisfies the Heisenberg inequality (1). The value of the $\Delta t\cdot\Delta f$ product is an indicator of the performance of a time-frequency distribution method: the smaller the product, the higher the time-frequency resolution.
Typically, in the STFT case the analysis windows $\gamma(t)$ of (2) are real and even, and their energy spectrum $|\Gamma(f)|^2$ is symmetric around zero. The centers of gravity are therefore $t_0 = 0$ and $f_0 = 0$. Time and frequency resolutions are fixed over the entire time-frequency plane since the same window $\gamma(t)$ is used at all frequencies.
In the WT case, it can be shown [23], [39], [40] that $\Delta t$ and $\Delta f$ are proportional to the scale parameter $a$ and to its inverse, respectively
$$\Delta t(a) = a\,\Delta t_\psi,\qquad \Delta f(a) = \frac{\Delta f_\psi}{a} \qquad (21)$$
where $\Delta t_\psi$ and $\Delta f_\psi$ are the duration and bandwidth of the mother wavelet $\psi(t)$, respectively. For the most commonly used wavelet functions, $t_0 = 0$. If the wavelet is real, expressions (18) and (20) must be modified. The spectrum of a real function has an even magnitude with two peaks (one at positive frequencies, around $f_0$, and one at negative frequencies, around $-f_0$). As a result, if expressions (18) and (20) are used, $f_0$ will be zero and $\Delta f$ will measure the spread between the two peaks rather than the width of each. It is therefore more appropriate to compute them over positive frequencies only [23], [38]
$$f_0 = \frac{2}{E}\int_0^{\infty} f\,|\Psi(f)|^2\,df,\qquad \Delta f^2 = \frac{2}{E}\int_0^{\infty}(f-f_0)^2\,|\Psi(f)|^2\,df \qquad (22)$$
On the contrary, if $\psi(t)$ is analytic, expressions (18) and (20) are still applicable.
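Expressions (17) and (18) are straightforward to evaluate numerically. The sketch below assumes a real, even window sampled at a uniform step (so $t_0 = f_0 = 0$, as in the STFT case) and obtains $\Delta f$ through Parseval's relation for the derivative, $\int f^2|X(f)|^2\,df = \frac{1}{4\pi^2}\int |x'(t)|^2\,dt$, which avoids computing the spectrum. That shortcut is our choice and only suits smooth windows; truncated windows such as those of Table I have edge discontinuities and would need a DFT-based estimate instead. Function names are illustrative.

```python
import math


def rms_duration_bandwidth(x, step):
    """rms duration (17) and bandwidth (18) of a real, even window sampled every `step` s.

    Assumes t0 = 0 and f0 = 0. The bandwidth uses Parseval's relation
    int f^2 |X(f)|^2 df = (1 / (4 pi^2)) int |x'(t)|^2 dt.
    """
    n = len(x)
    t = [(k - (n - 1) / 2) * step for k in range(n)]
    energy = sum(v * v for v in x) * step
    var_t = sum(tk * tk * v * v for tk, v in zip(t, x)) * step / energy
    # central differences approximate x'(t) at the interior samples
    deriv = [(x[k + 1] - x[k - 1]) / (2 * step) for k in range(1, n - 1)]
    var_f = sum(d * d for d in deriv) * step / (4 * math.pi ** 2 * energy)
    return math.sqrt(var_t), math.sqrt(var_f)


def gaussian_window(width_s, step, span=8.0):
    """Samples of exp(-t^2 / (2 width_s^2)) over +/- span * width_s."""
    half = round(span * width_s / step)
    return [math.exp(-0.5 * (k * step / width_s) ** 2) for k in range(-half, half + 1)]


step = 1e-5  # 100 kHz sampling, fine enough for a 1-ms window
d_t, d_f = rms_duration_bandwidth(gaussian_window(1e-3, step), step)
print(f"dt = {1e3 * d_t:.3f} ms, df = {d_f:.1f} Hz, product = {d_t * d_f:.4f}")
```

For an untruncated Gaussian the product reaches the lower bound $1/(4\pi) \approx 0.080$ of (1); doubling the width doubles $\Delta t$ and halves $\Delta f$, mirroring the scale dependence expressed by (21).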
IV. SIMULATIONS
A. Resolution Properties
Values of $\Delta t$, $\Delta f$ (rms values), and $\Delta t\cdot\Delta f$ are listed in Tables I and II for the STFT and WT, respectively. For the STFT (Table I), $\Delta t$ and $\Delta f$ are constant over the entire time-frequency plane and depend only on the type and duration of the window. In contrast, the time and frequency resolutions of the WT (Table II) are not fixed. The performances of the Hamming and the Gauss windows are very similar. Their $\Delta t\cdot\Delta f$ products are the smallest among the STFT windows; as a result, the Hamming and Gauss windows offer a good compromise between time and frequency resolution. Although the $\Delta t\cdot\Delta f$ product is a good indicator of the performance of a method, the choice of a window must also take into account the particular time-frequency features of the signal to be analyzed. In the CEOAE case, the signal typically has a duration of 20 ms and a bandwidth of 0.5/5.0 kHz (0.5/6 kHz for neonates), so 1.0–1.5 ms and 100 Hz could be reasonable values for the time and frequency resolution, respectively. The 5-ms Hamming and Gauss windows fulfill these requirements (for the Hamming window, $\Delta t$ = 1.57 ms and $\Delta f$ = 100.9 Hz; for the Gauss window, $\Delta t$ = 1.44 ms and $\Delta f$ = 109.7 Hz). In contrast, the 2.5-ms Hamming window, which exhibits the smallest $\Delta t\cdot\Delta f$ product, has poor frequency resolution ($\Delta f$ = 201.7 Hz).
The $\Delta t\cdot\Delta f$ products of all the considered wavelet functions (Table II) are notably smaller than those of the STFT windows (for the WT, $\Delta t\cdot\Delta f$ ranges from 0.080 to 0.106; for the STFT, from 0.1573 to 0.1637). In particular, the products of the Morlet wavelets with $\alpha = 1$ and $\alpha = 5$ are the smallest and approach the lower bound given by (1). As expected, the Morlet wavelet with $\alpha = 1$ has good time resolution, whereas with $\alpha = 5$ it has good frequency resolution. For the family of wavelets (14), the best performance is reached for one particular value of $n$ (see Table III). Even if the $\Delta t\cdot\Delta f$ product of this wavelet is slightly larger than that of the Morlet wavelets (0.084 versus 0.080), its time-frequency resolution properties are fairly good (Table II): on average, its time resolution is 0.84 ms and its frequency resolution is 136.92 Hz. In contrast, the Morlet wavelet with $\alpha = 1$ has the best time resolution (0.32 ± 0.23 ms) but poor frequency resolution (337.6 ± 178.0 Hz), whereas with $\alpha = 5$ it has the best frequency resolution (67.5 ± 35.6 Hz) but poor time resolution (1.62 ± 1.15 ms).
TABLE I
TIME RESOLUTION $\Delta t$, FREQUENCY RESOLUTION $\Delta f$ (RMS MEASURES), AND $\Delta t\cdot\Delta f$ PRODUCT FOR DIFFERENT STFT ANALYSIS WINDOWS
For the analytic expressions of the analysis windows, see [20]. Note that the $\Delta t\cdot\Delta f$ product has a lower bound of $1/(4\pi)$ (≈0.080).
TABLE II
TIME RESOLUTION $\Delta t$, FREQUENCY RESOLUTION $\Delta f$ (RMS MEASURES), AND $\Delta t\cdot\Delta f$ PRODUCT AT DIFFERENT ANALYSIS FREQUENCIES FOR DIFFERENT WAVELET FUNCTIONS
The values in Table II have been computed only for the characteristic frequencies of a typical OAE signal. Note that the $\Delta t\cdot\Delta f$ product has a lower bound of $1/(4\pi)$ (≈0.080).
TABLE III
$\Delta t\cdot\Delta f$ PRODUCT FOR THE WAVELETS OF THE FAMILY (14) FOR DIFFERENT VALUES OF THE PARAMETER $n$
Note that the $\Delta t\cdot\Delta f$ product has a lower bound of $1/(4\pi)$ (≈0.080).
B. Simulated Signals
Figs. 2 and 3 show the time-frequency distributions of signal (11) and of the simulated CEOAE (12) (results from signals (9) and (10) are omitted because they are similar). Hereafter, the wavelet obtained from the family (14) for the best-performing value of $n$ (see Table III) will be referred to as the “proposed wavelet.” For all the considered signals, it can be noted that: 1) The proposed wavelet
[Figs. 2(a) and 3(a)] shows a good compromise between time-frequency resolution and interference-term attenuation; as a result, signal components are accurately resolved in both the time and frequency domains. 2) The Morlet wavelet with $\alpha = 1$ [Figs. 2(b) and 3(b)] has the best time resolution among the wavelets but suffers from very poor frequency resolution. 3) In contrast, the Morlet wavelet with $\alpha = 5$ [Figs. 2(c) and 3(c)] has the best frequency resolution among the wavelets but suffers from very poor time resolution. 4) The MOD SCAL [Figs. 2(d) and 3(d)] takes advantage of both the good time resolution of the Morlet wavelet with $\alpha = 1$ and the good frequency resolution obtained with $\alpha = 5$. As expected, the MOD SCAL exhibits more interference terms than the Morlet wavelets and the proposed wavelet. 5) Although the SPEC [Figs. 2(e) and 3(e)] is not particularly contaminated by interference terms, its time-frequency resolution is definitely worse than that of the proposed wavelet and the MOD SCAL. 6) Compared with all the other time-frequency distributions, the WD [Figs. 2(f) and 3(f)] has the best time-frequency resolution but is highly corrupted by interference terms. Interference terms are notably reduced in the SPWD [Figs. 2(g) and 3(g)] and the CWD [Figs. 2(h) and 3(h)], but the time-frequency concentration in these last two cases is lower than for the proposed wavelet.
Fig. 2. Time-frequency distributions of the simulated signal $x(t) = x_1(t) + x_2(t)$, where $x_i(t) = e^{j2\pi(f_i + k_i t)t}$; $f_1 = f_2$ = 1 kHz, $k_1 = 8\cdot10^5$ kHz/ms, and $k_2 = 4\cdot10^5$ kHz/ms. (a) SCAL (proposed wavelet); (b) SCAL (Morlet wavelet, $\alpha = 1$); (c) SCAL (Morlet wavelet, $\alpha = 5$); (d) MOD SCAL ($\alpha_1 = 1$ and $\alpha_2 = 5$); (e) SPEC (Hamming window, 5 ms); (f) WD; (g) SPWD; (h) CWD ($\sigma = 2$).
Fig. 3. Time-frequency distributions of a simulated CEOAE (see Fig. 1). (a) SCAL (proposed wavelet); (b) SCAL (Morlet wavelet, $\alpha = 1$); (c) SCAL (Morlet wavelet, $\alpha = 5$); (d) MOD SCAL ($\alpha_1 = 1$ and $\alpha_2 = 5$); (e) SPEC (Hamming window, 5 ms); (f) WD; (g) SPWD; (h) CWD ($\sigma = 1$).
Fig. 4. CEOAE’s from (a) a normal hearing adult and (b) a full-term baby. To reduce the influence of the stimulus artifact, responses have been windowed 2.5/20 ms post-stimulus time. In each row, two replicate recordings from the same ear (A and B replicate recordings in ILO equipment) are superimposed. Numbers on the left of each panel are the reproducibility values (in percentage points) between the two replicates.
V. APPLICATIONS TO REAL CEOAE’S
Results from Sections IV-A and IV-B have revealed that the proposed wavelet can yield a fairly accurate description of the time-frequency features of a multicomponent signal. The particular structure of the wavelet filters (narrow bandwidth and long duration for low-frequency filters; broad bandwidth and brief duration for high-frequency filters) makes the WT approach highly suitable for signals with long-lasting low-frequency components and brief high-frequency components, as in the case of OAE’s evoked by clicks (see the simulated CEOAE in Fig. 1). Also, it has been shown [40] that, to a first approximation, the human ear analyzes sounds by means of a kind of WT.
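The match between the wavelet tiling and CEOAE structure can be made explicit with the simulated CEOAE of (12). The sketch below assumes the reading of (12) given in the Fig. 1 caption ($\beta = 0.1$, $a = (2\pi f_i)^{3.5}$) and the 25 000 samples/s rate of Section III-B. It generates the signal and computes where each gammatone envelope $t^3 e^{-2\pi\beta f_i t}$ peaks, at $t = 3/(2\pi\beta f_i)$, so higher-frequency components peak earlier and decay faster. Names are illustrative.

```python
import math

BETA = 0.1                                      # decay parameter of (12), per Fig. 1
CENTER_FREQS_HZ = [1000.0, 1500.0, 2200.0, 3300.0, 5000.0]
FS = 25_000.0                                   # samples/s, as in Section III-B


def gammatone(t, fc, beta=BETA):
    """One component a * t^3 * exp(-2*pi*beta*fc*t) * cos(2*pi*fc*t), t in seconds."""
    a = (2 * math.pi * fc) ** 3.5
    return a * t ** 3 * math.exp(-2 * math.pi * beta * fc * t) * math.cos(2 * math.pi * fc * t)


def simulated_ceoae(duration_s=0.020, fs=FS):
    """Sum of the five gammatones, sampled from t = 0 (click onset)."""
    n = round(duration_s * fs)
    return [sum(gammatone(k / fs, fc) for fc in CENTER_FREQS_HZ) for k in range(n)]


def envelope_peak_latency(fc, beta=BETA):
    """t^3 * exp(-b t) peaks where 3 t^2 = b t^3, i.e. at t = 3 / b with b = 2*pi*beta*fc."""
    return 3.0 / (2 * math.pi * beta * fc)


for fc in CENTER_FREQS_HZ:
    print(f"{fc / 1000:.1f} kHz: envelope peak at {1e3 * envelope_peak_latency(fc):.2f} ms")
```

The printed latencies fall from roughly 4.8 ms at 1 kHz to roughly 0.95 ms at 5 kHz, the same frequency-dispersion pattern that the constant-$Q$ wavelet filters resolve well.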
In this section, applications of the proposed wavelet to typical CEOAE’s from normal adults, full-term neonates, and hearing-impaired subjects are presented. Fig. 4 shows two typical examples of CEOAE’s from a normal adult (subject A030R1, female, 25 years old) and a full-term neonate (subject N360L0, female). Adult CEOAE’s show a clear frequency dispersion, i.e., high-frequency components appear at shorter latencies than low-frequency components. Frequency dispersion is less pronounced in the OAE response of the neonate: the response has a typical burst-like behavior and presents sustained activity up to 20 ms (and probably beyond, but our analysis window is limited to this upper value). The time-frequency distribution of the adult subject [Fig. 5(a)] shows several components in the 0.5–5.0-kHz range, with a predominance in the 1.0–2.0-kHz region. Low-frequency components (e.g., the 1.0-kHz component) last longer than high-frequency components (e.g., the 3.5-kHz component) and reach their maximal amplitude at longer latencies.
Fig. 5. (a) Time-frequency distribution (energy density, normalized arbitrary units) of a CEOAE at 80-dB SPL of subject A030R1 (normal hearing adult). (b) Time-frequency distribution (energy density, normalized arbitrary units) of a CEOAE at 80-dB SPL of subject N360L0 (full-term neonate).
As expected, the time-frequency structure of the neonatal CEOAE [Fig. 5(b)] is different: the region of the predominant components is shifted toward higher frequencies (2.5 to 5.0 kHz) compared with the adult response.
Although the three-dimensional energy distributions yield
an immediate representation of the time-frequency structure
of a signal, in some circumstances it may be useful to have a
more precise description of the behavior of each single signal
component.
Fig. 6. Elementary components of the emissions of a normal hearing adult (subject A030R0, 80-dB SPL), a full-term neonate (subject N360L0, 80-dB SPL),
and a hearing-impaired adult (subject P300P4, 83-dB SPL). In each panel and each row two traces are superimposed. The two traces in the first row from top are
the two original emission replicates (A and B in the ILO equipment); the traces below are the elementary components derived from each replicate. To reduce the
number of plots in each panel, OAE components are derived from 500-Hz-wide bands instead of 200-Hz-wide bands (as stated in the text); the central frequency
of each band is shown on the right side of the figure. The numbers on the left are the correlation values (in percentage points) between the two replicates.
For this purpose, a method based on the inverse WT (6) has been developed for decomposing a signal into elementary components (for details, see [4] and [5]). In this study, CEOAE’s are decomposed into 200-Hz-wide components in the 0.5–5.0-kHz range. The temporal behavior of the elementary components is shown in Fig. 6 for the two previous subjects and for a hearing-impaired subject suffering from noise-induced hearing loss. The correlation between the reconstructed CEOAE (obtained by summing all the elementary components) and the original CEOAE is greater than 99% for all the examined subjects.
To better analyze the relative contribution of each elementary component to the compound CEOAE, the rms values of the elementary components were computed. The results in Fig. 7 show that for normal adults the greatest contribution is associated with the lower frequencies (i.e., around 1.0–2.0 kHz), whereas for neonates the greatest contribution is shifted toward higher frequencies, from 1.5 to 4 kHz. It is interesting to note that most spontaneous OAE’s (i.e., OAE’s emitted in the absence of any stimulation) are found in the 1.0–2.0-kHz region for adults and in the 2.0–4.0-kHz region for neonates [41]. As a consequence, the greater amplitude of the midfrequency CEOAE components may result from the synchronous capture of multiple spontaneous OAE’s. Moreover, it is believed that the frequency content of a CEOAE mainly reflects the middle-ear transfer function, whose transmission is most efficient in the 1.0–1.5-kHz region for adults [42] and is shifted toward higher frequencies for neonates [43].
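The rms measure behind Fig. 7 can be sketched as follows. The sketch assumes each elementary component is available as a list of samples at 25 000 samples/s with sample 0 at the click onset, and uses the 2.5–20-ms window stated in the caption of Fig. 7. The function names and the dictionary layout (band center frequency mapped to samples) are illustrative assumptions.

```python
import math

FS = 25_000.0  # samples/s, the digitization rate of Section III-B


def windowed_rms(samples, fs=FS, t_start=2.5e-3, t_end=20e-3):
    """rms of one elementary component inside the post-stimulus window [t_start, t_end).

    Sample 0 is assumed to coincide with the click onset.
    """
    i0 = round(t_start * fs)
    i1 = min(round(t_end * fs), len(samples))
    if i1 <= i0:
        raise ValueError("window lies outside the recording")
    segment = samples[i0:i1]
    return math.sqrt(sum(v * v for v in segment) / len(segment))


def rms_profile(components, fs=FS):
    """Map each band center frequency (Hz) to the windowed rms of its component."""
    return {fc: windowed_rms(x, fs) for fc, x in components.items()}
```

Averaging such profiles across subjects gives curves of the kind plotted in Fig. 7.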
Fig. 7. RMS values (mean values) of elementary components from eight normal hearing adults (dotted line) and 333 full-term babies (solid line). For each component, rms was computed in the 2.5–20-ms window. For both groups, CEOAE’s were evoked at 80-dB SPL.
Fig. 8. Pooled CEOAE latency data from eight normal hearing adults. The solid curves are the exponential regression fit (Fisher, p < 0.01). Stimulus levels range from 48-dB SPL (upper trace) to 80-dB SPL (bottom trace). Note that data are plotted over a logarithmic scale.
Emissions of subject A030R1 (normal adult) are characterized by a mid-duration with a progressive amplitude attenuation (Fig. 6). The activity associated with low-frequency components (1.0/1.5 kHz) lasts about 10/15 ms and exhibits a maximum around 8/10 ms. In contrast, the first 2.5–6 ms are dominated by high-frequency components whose maxima are reached at about 4.5 ms. Decomposition of a CEOAE into elementary components can be useful to study the latency of OAE components (defined as the time interval from the stimulus onset to the maximum of the envelope of the component). Analysis of the relation between the latency and the frequency of the elementary components reveals (Fig. 8) that latency is inversely related to frequency (more precisely, latency is inversely proportional to the logarithm of frequency). This trend of shorter latencies for higher frequencies is very similar to the relation between the characteristic frequency of the cochlear filters and their spatial location along the basilar membrane. The latency is stimulus dependent, in the sense that an increase in the stimulus level is accompanied by a progressive shortening of latency (Fig. 8). Similar results have also been obtained with other types of OAE’s, for example, with tone-burst OAE’s [44] and constant-tone evoked OAE’s [45]. More interestingly, it has been shown [4], [5] that the latency data of CEOAE components are in close agreement with latency data derived from electrophysiological measurements (compound action potentials [46] and auditory brain stem responses [47]).
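The latency definition above (stimulus onset to the envelope maximum of a component) can be sketched as follows. Taking the envelope as the magnitude of the discrete analytic signal is our assumption, since the text does not say how the envelope was obtained; the construction matches the DFT-domain one used by `scipy.signal.hilbert`, written here with a naive O(N²) DFT so the block needs only the standard library. Sample 0 is assumed to coincide with the click onset, and the names are illustrative.

```python
import cmath
import math


def analytic_signal(x):
    """Discrete analytic signal via a DFT-domain Hilbert transform (O(N^2), stdlib only)."""
    n = len(x)
    spectrum = [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
                for m in range(n)]
    # keep DC (and Nyquist for even n), double positive frequencies, zero negative ones
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        for m in range(1, n // 2):
            weights[m] = 2.0
    else:
        for m in range(1, (n + 1) // 2):
            weights[m] = 2.0
    return [sum(weights[m] * spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)) / n for k in range(n)]


def component_latency_s(component, fs):
    """Latency: time from the onset (sample 0) to the maximum of |analytic signal|."""
    envelope = [abs(z) for z in analytic_signal(component)]
    return envelope.index(max(envelope)) / fs
```

Applied to each 200-Hz-wide elementary component, this yields one latency per center frequency, i.e., the kind of latency-versus-frequency data pooled in Fig. 8.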
Also, the decomposition of CEOAE's into elementary components can give an accurate estimate of the test-retest correlation in individual frequency bands. Since OAE's have very good intra-subject reproducibility, the correlation between two OAE replicates is generally high and can be used as an indicator of the signal-to-noise ratio of the recording. A low correlation is typically associated either with poor recording conditions or with the absence of a true cochlear response [48]. In normal ears with good-quality emissions, the reproducibility is typically greater than about 70%, and good reproducibility should also be found for the frequency bands in the 1.0/4.0 kHz range (1.5/4.5 kHz for neonates) [10], [49]. Our results from normal adults (Fig. 9) indicate that the reproducibility is very high (> 80%) for all bands. For the adult responses, high correlations are also found for the nondominant components (see, for example, the components at 0.5 kHz and 3.0/5.0 kHz in Fig. 9). In neonatal responses (Fig. 9), good reproducibility (> 80%) is found in the 1.5/4.5 kHz range. OAE's from neonates are typically noisier than those from adults. This is due both to environmental noise (neonatal OAE's are usually recorded in the nursery rather than in a booth) and to patient noise (such as snoring, sneezing, cable rub, etc.), which typically affects the lower frequencies. For neonates, the nondominant components (i.e., the components below 1.5 kHz and above 4.5 kHz) show poor correlation, probably because of a different input-output "transfer function" of the neonatal end organ and because of high levels of patient noise.
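A band-wise test-retest correlation of the kind described above can be sketched as follows. The assumptions are ours: each replicate is split into bands with Gaussian band-pass filters of constant relative bandwidth (a simple stand-in for one scale of a wavelet decomposition, not the paper's inverse-WT procedure), and reproducibility is taken as the Pearson correlation coefficient between the two band-limited replicates, expressed in percent. The function names and the bandwidth parameter `q` are hypothetical.

```python
import numpy as np


def constant_q_band(x, fs, fc, q=4.0):
    """Part of x around center frequency fc (Hz).

    Hypothetical stand-in for one wavelet scale: a zero-phase Gaussian
    band-pass applied in the frequency domain, with standard deviation
    fc / q, so the bandwidth grows in proportion to fc (constant Q).
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    sigma = fc / q
    gain = np.exp(-0.5 * ((freqs - fc) / sigma) ** 2)
    return np.fft.irfft(spectrum * gain, n=x.size)


def band_reproducibility(rep_a, rep_b, fs, center_freqs, q=4.0):
    """Reproducibility (%) per band: Pearson correlation of the two
    replicates after both are filtered into the same band."""
    result = {}
    for fc in center_freqs:
        a = constant_q_band(rep_a, fs, fc, q)
        b = constant_q_band(rep_b, fs, fc, q)
        result[fc] = 100.0 * float(np.corrcoef(a, b)[0, 1])
    return result


if __name__ == "__main__":
    fs = 20_000.0
    t = np.arange(512) / fs
    envelope = np.exp(-((t - 0.010) / 0.001) ** 2)
    low = envelope * np.cos(2 * np.pi * 2000 * t)   # identical in both replicates
    high = envelope * np.cos(2 * np.pi * 6000 * t)  # sign-flipped between replicates
    print(band_reproducibility(low + high, low - high, fs, [2000.0, 6000.0]))
```

In this toy example the 2-kHz band comes out close to +100% and the 6-kHz band close to -100%, because the second component flips sign between replicates; uncorrelated noise in a band would instead pull its value toward zero, which is why a low band correlation flags a noisy or absent response.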
Fig. 9. Reproducibility (mean values ± s.d.) as a function of the frequency of the elementary components from eight normal-hearing adults and 333 full-term babies.
The proposed approach can be useful for revealing differences between the CEOAE's of normal and pathological ears. Results from a few subjects suffering from noise-induced hearing loss showed differences both in the time-frequency structure of the CEOAE's and in the correlation values of the various frequency bands. As an example, Fig. 10 illustrates the time-frequency distribution of the CEOAE of a patient (P300P4) suffering from noise-induced hearing loss. For this patient, the frequency above which hearing loss was greater than 30-dB HL was 2.5 kHz. His audiogram was characterized by normal hearing thresholds up to 2.5 kHz, a hearing loss greater than 30-dB HL in a 1.0-octave band centered around 3.5 kHz, and nearly normal thresholds (≤ 20-dB HL) at 6 and 8 kHz. The lack of OAE response at frequencies above 2.5 kHz can easily be observed in the time-frequency distribution [Fig. 10(a)]. This is emphasized by the analysis of the elementary components in Fig. 10(b). In particular, the reproducibility is high for the components at or below 2.5 kHz (i.e., at frequencies at which the hearing threshold level is normal), decreases sharply at frequencies above 2.5 kHz, exhibits an evident notch around 3.5 kHz, and seems to recover at the highest frequencies (probably because of the fairly good hearing thresholds at 6 and 8 kHz). However, the correlation of the "normal" frequency components (i.e., the components below 2.5 kHz) is slightly lower than for the normal adults. This may indicate that some changes have also occurred at sites where the hearing threshold was presumed to be normal [50].
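As an arithmetic check on the audiogram description, and assuming the octave band is centered geometrically (the usual audiometric convention, which the text does not state explicitly), a 1.0-octave band with center frequency f_0 = 3.5 kHz (notation introduced here for illustration) has edges f_0/√2 and √2·f_0:

$$
f_{\mathrm{lo}} = \frac{f_0}{\sqrt{2}} = \frac{3.5\ \mathrm{kHz}}{\sqrt{2}} \approx 2.47\ \mathrm{kHz},
\qquad
f_{\mathrm{hi}} = \sqrt{2}\, f_0 \approx 4.95\ \mathrm{kHz}.
$$

Under this assumption the lower edge of the impaired band falls at about 2.5 kHz, consistent with the frequency above which this patient's hearing loss exceeded 30-dB HL and the OAE response was lacking.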
VI. CONCLUSION
A few basic time-frequency distribution methods (the STFT, the WT, the WD, the SPWD, and the CWD) have been presented and compared on the basis of several simulated signals, with the aim of identifying a method suitable for analyzing otoacoustic emissions. Both simulations and quantitative estimates of the time-frequency resolution properties revealed that there is no optimal method in an absolute sense. The choice of a particular approach must inevitably take into account the time-frequency properties of the signal to be analyzed. The particular structure of CEOAE's requires a method able to discriminate both high-frequency components of brief duration and low-frequency components of long duration; in other words, both a "good" time resolution and a "good" frequency resolution are required. In the evaluation of a time-frequency analysis method, interference terms play an important role, since their peak value can be as high as twice that of the auto-terms and they can, thus, totally obscure the "true" time-frequency distribution. Among the methods examined here, the WT seems to offer the best compromise between time-frequency resolution and attenuation of interference terms. In addition, the structure of the wavelet analysis filters, whose bandwidth is proportional to their center frequency, makes this approach well suited to the analysis of CEOAE's.
Applications of the WT to CEOAE's range from the extraction of normative parameters from both adults and neonates to the examination of pathological responses. In particular, by means of the inverse WT it is possible to decompose CEOAE responses into elementary components and to study their temporal behavior. This approach is useful for describing the time-frequency structure of OAE responses with quantitative data, such as latency measures and estimates of the noise level in frequency bands.
Fig. 10. (a) Time-frequency distribution (energy density, normalized arbitrary units) of the CEOAE at 83-dB SPL of subject P300P4 (suffering from noise-induced hearing loss). (b) Reproducibility as a function of the frequency of the elementary components for the pathologic subject P300P4 and control subjects (same as in Fig. 9).
ACKNOWLEDGMENT
The authors would like to thank G. Pastorino, P. Sergi, and G. Montanari from the Service of Neurophysiopathology of the Istituti Clinici di Perfezionamento, Milan, for providing the normal CEOAE's from full-term neonates. They would also like to thank P. Avan from the Laboratory of Biophysics, University of Auvergne, Clermont-Ferrand, for providing pathological data. This work was done within the framework of the European Concerted Action AHEAD, Biomedicine and Health Programme of the European Commission.
A more detailed analysis of this set of data will be presented in a separate publication.
REFERENCES
[1] H. P. Wit, P. van Dijk, and P. Avan, “Wavelet analysis of real and
synthesised click evoked otoacoustic emissions,” Hearing Res., vol. 73,
pp. 141–147, 1994.
[2] E. G. Pasanen, J. D. Travis, and R. J. Thornhill, “Wavelet-type analysis
of transient-evoked otoacoustic emissions,” Biomed. Sci. Instrum., vol.
30, pp. 75–80, 1994.
[3] J. Cheng, “Time-frequency analysis of transient evoked otoacoustic
emissions via smoothed pseudo Wigner distribution,” Scand. Audiol.,
vol. 24, pp. 91–96, 1995.
[4] G. Tognola, F. Grandori, and P. Ravazzani, “Time-frequency distributions of click-evoked otoacoustic emissions,” Hearing Res., vol. 106,
pp. 112–122, 1997.
[5] G. Tognola, F. Grandori, and P. Ravazzani, “Latency distribution of click-evoked otoacoustic emissions,” in Proc. 1996 IEEE Eng. Med. Biol., Amsterdam, the Netherlands, 1996.
[6] D. T. Kemp, “Evidence of mechanical nonlinearity and frequency
selective wave amplification in the cochlea,” Arch. Otolaryngol., vol.
224, pp. 37–45, 1979.
[7] D. T. Kemp, “Stimulated acoustic emissions from within the human auditory system,” J. Acoust. Soc. Amer., vol. 64, pp. 1386–1391, 1978.
[8] F. Grandori, G. Cianfrone, and D. T. Kemp, Eds., Cochlear Mechanisms
and Otoacoustic Emissions. Basel, Switzerland: Karger, 1990.
[9] S. J. Norton, “Application of transient evoked otoacoustic emissions to
pediatric populations,” Ear Hearing, vol. 14, pp. 64–73, 1993.
[10] K. R. White, B. R. Vohr, and T. R. Behrens, “Universal newborn hearing
screening using transient evoked otoacoustic emissions: Results of the
Rhode Island Hearing Assessment Project,” Semin. Hearing, vol. 14,
pp. 18–29, 1993.
[11] B. Engdahl and D. T. Kemp, “The effect of noise exposure on the details
of distortion product otoacoustic emissions in humans,” J. Acoust. Soc.
Amer., vol. 99, pp. 1573–1587, 1996.
[12] M. A. Hotz, “Monitoring the effects of noise exposure using transiently evoked otoacoustic emissions,” Arch. Otolaryngol., vol. 113, pp.
478–482, 1993.
[13] R. Rubsamen, D. M. Mills, and E. W. Rubel, “Effects of furosemide
on distortion product otoacoustic emissions and on neuronal responses
in the anteroventral cochlear nucleus,” J. Neurophysiol., vol. 74, pp.
1628–1638, 1995.
[14] R. Hauser, R. Probst, and F. J. Harris, “Influence of general anesthesia on
transiently evoked otoacoustic emissions in humans,” Ann. Otol. Rhinol.
Laryngol., vol. 101, pp. 994–999, 1992.
[15] F. Grandori, “Nonlinear phenomena in click and tone-burst evoked
otoacoustic emissions from human ears,” Audiol., vol. 24, pp. 71–80,
1985.
[16] F. Grandori and A. Antonelli, “Temporal stability, influence of the head
position and modeling considerations for evoked otoacoustic emissions,”
Scand. Audiol., vol. 25, pp. 97–108, 1986.
[17] F. Hlawatsch and G. F. Boudreaux-Bartels, “Linear and quadratic time-frequency signal representations,” IEEE Signal Processing Mag., vol. 9,
pp. 21–67, 1992.
[18] A. Papoulis, Signal Analysis. New York: McGraw-Hill, 1977.
[19] D. Gabor, “Theory of communication,” J. Inst. Elect. Eng., vol. 93, pp.
429–457, 1946.
[20] F. J. Harris, “On the use of windows for harmonic analysis with the
discrete Fourier transform,” Proc. IEEE, vol. 66, pp. 51–83, 1978.
[21] F. Hlawatsch and P. Flandrin, “The interference structure of the Wigner
distribution and related time-frequency signal representations,” in The
Wigner Distribution—Theory and Applications in Signal Processing,
W. Mecklenbrauker, Ed. Amsterdam, the Netherlands: North Holland/Elsevier, 1992.
[22] S. Kadambe and G. F. Boudreaux-Bartels, “A comparison of the
existence of ‘cross terms’ in the Wigner distribution and the squared
magnitude of the wavelet transform and the short-time Fourier transform,” IEEE Trans. Signal Processing, vol. 40, pp. 2498–2516, 1992.
[23] I. Daubechies, “The wavelet transform, time-frequency localization and
signal analysis,” IEEE Trans. Inform. Theory, vol. 36, pp. 961–1005,
1990.
[24] S. G. Mallat, “A theory for multiresolution signal decomposition: The
wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell., vol. 11,
pp. 674–693, 1989.
[25] O. Rioul and M. Vetterli, “Wavelets and signal processing,” IEEE Signal
Processing Mag., vol. 8, pp. 14–38, 1991.
[26] T. A. C. M. Claasen and W. F. G. Mecklenbrauker, “The Wigner distribution: A tool for time-frequency signal analysis. Part I: Continuous-time signals,” Philips J. Res., vol. 35, pp. 217–250, 1980.
[27] D. L. Jones and T. W. Parks, “A resolution comparison of several time-frequency representations,” IEEE Trans. Signal Processing, vol. 40, pp.
413–420, 1992.
[28] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes,” IEEE Trans. Acoust., Speech, Signal Processing, vol.
ASSP-33, pp. 1461–1470, 1985.
[29] H. I. Choi and W. J. Williams, “Improved time-frequency representation
of multicomponent signals using exponential kernels,” IEEE Trans.
Acoust., Speech, Signal Processing, vol. 37, pp. 862–871, 1989.
[30] J. Jeong and W. J. Williams, “Kernel design for reduced interference
distributions,” IEEE Trans. Signal Processing, vol. 40, pp. 402–412,
1992.
[31] E. Zwicker, “Delayed evoked oto-acoustic emissions and their suppression by Gaussian-shaped pressure impulses,” Hearing Res., vol. 11, pp.
359–371, 1983.
[32] E. Zwicker, “A hardware cochlear nonlinear preprocessing model with
active feedback,” J. Acoust. Soc. Amer., vol. 80, pp. 154–162, 1986.
[33] R. D. Patterson, K. Robinson, J. Holdsworth, D. McKeown, C. Zhang,
and M. Allerhand, “Complex sounds and arbitrary images,” in Auditory
Physiology and Perception, Y. Cazals, L. Demany, and K. Horner, Eds.
Oxford, U.K.: Pergamon, 1992, pp. 429–446.
[34] F. Grandori and P. Ravazzani, “Nonlinearities of click-evoked otoacoustic emissions and the derived nonlinear technique,” Br. J. Audiol., vol.
27, pp. 97–102, 1993.
[35] P. Ravazzani, G. Tognola, and F. Grandori, “‘Derived nonlinear’ versus
‘linear’ click-evoked otoacoustic emissions,” Audiol., vol. 35, pp. 73–86,
1996.
[36] R. Kronland-Martinet, J. Morlet, and A. Grossmann, “Analysis of sound
patterns through wavelet transforms,” Int. J. Pattern Recogn., Artificial
Intell., vol. 1, pp. 273–301, 1987.
[37] O. Meste, H. Rix, P. Caminal, and N. V. Thakor, “Ventricular late
potentials characterization in time-frequency domain by means of a
wavelet transform,” IEEE Trans. Biomed. Eng., vol. 41, pp. 625–633,
1994.
[38] N. Hess-Nielsen and M. V. Wickerhauser, “Wavelets and time-frequency
analysis,” Proc. IEEE, vol. 84, pp. 523–540, 1996.
[39] B. Jawerth and W. Sweldens, “An overview of wavelet-based multiresolution analyses,” SIAM Rev., vol. 36, pp. 377–416, 1994.
[40] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: CBMS,
SIAM, 1992.
[41] M. R. Kok, G. A. van Zanten, and M. P. Brocaar, “Aspects of
spontaneous otoacoustic emissions in healthy newborns,” Hearing Res.,
vol. 69, pp. 115–123, 1993.
[42] D. T. Kemp, P. Bray, L. Alexander, and A. M. Brown, “Acoustic
emissions cochleography: Practical aspects,” in Cochlear Mechanics
and Otoacoustic Emissions, G. Cianfrone and F. Grandori, Eds., Scand.
Audiol. (Suppl. 25), pp. 71–95, 1986.
[43] T. Morlet, L. Collet, B. Salle, and A. Morgon, “Functional maturation
of cochlear active mechanisms and of the medial olivocochlear system
in humans,” Acta Otolaryngol., vol. 113, pp. 271–277, 1993.
[44] S. T. Neely, S. J. Norton, M. P. Gorga, and W. Jesteadt, “Latency
of auditory brain-stem responses and otoacoustic emissions using tone-burst stimuli,” J. Acoust. Soc. Amer., vol. 83, pp. 652–656, 1988.
[45] D. Brass and D. T. Kemp, “Time-domain observation of otoacoustic
emissions during constant tone stimulation,” J. Acoust. Soc. Amer., vol.
90, pp. 2415–2427, 1991.
[46] J. J. Eggermont, “Analysis of compound action potential responses to
tone bursts in the human and guinea pig cochlea,” J. Acoust. Soc. Amer.,
vol. 60, pp. 1132–1139, 1976.
[47] M. Don and J. J. Eggermont, “Analysis of the click-evoked brainstem
potentials in man using high-pass noise masking,” J. Acoust. Soc. Amer.,
vol. 63, pp. 1084–1092, 1978.
[48] G. Tognola, P. Ravazzani, and F. Grandori, “An optimal filtering
technique to reduce the influence of low-frequency noise on click-evoked otoacoustic emissions,” Br. J. Audiol., vol. 29, pp. 153–160,
1995.
[49] B. R. Vohr, K. R. White, A. B. Maxon, and M. J. Johnson, “Factors
affecting the interpretation of transient evoked otoacoustic emissions
results in neonatal hearing screening,” Semin. Hearing, vol. 14, pp.
57–72, 1993.
[50] P. Avan, P. Bonfils, D. Loth, and M. François, “Temporal structure of
transient evoked otoacoustic emissions: Relationship to basal cochlear
function,” in Advances in Otoacoustic Emission-Fundamentals and Clinical Applications, F. Grandori, L. Collet, D. T. Kemp, G. Salomon, K.
Schorn, and R. Thornton, Eds. Lecco, Italy: Casa editrice G. Stefanoni,
1994, pp. 85–94.
Gabriella Tognola was born in 1969 in Italy. She
received the M.Sc. degree in electronic engineering
from the Polytechnic of Milan, Italy, in 1993. She
is currently a Ph.D. degree student at the Department
of Biomedical Engineering of the Polytechnic of
Milan, Italy.
In 1993, she joined the Department of
Biomedical Engineering of the Polytechnic of
Milan, Italy. Her primary research interests are
in techniques of signal processing for biomedical
signals, time-frequency and time-scale representations, analysis and modeling of auditory functions, otoacoustic emissions,
speech, and EEG.
Ferdinando Grandori was born in Milan, Italy, in
1946. He received the doctoral degree in electronic
engineering from the Polytechnic of Milan, Italy, in
1970.
He joined the Department of Electronics of the
Polytechnic of Milan in 1970. Since 1976 he has
been a Researcher of the Italian National Research
Council (CNR) at the Centre of System Theory,
Milan, Italy. Since 1997 he has been the Director
of the CNR Centre of Biomedical Engineering. His
research interests include techniques of signal processing for evoked potentials, methods of source localization for bioelectrical
signals, models of auditory functions, otoacoustic emissions, and magnetic
stimulation of the nervous system.
Paolo Ravazzani was born in Milan in 1961. He
received the doctoral degree in electronic engineering in 1988 and the Ph.D. degree in biomedical
engineering in 1996 from the Polytechnic of Milan.
In 1988 he joined the Department of Electronics
(now Department of Biomedical Engineering) of the
Polytechnic of Milan. He is currently a Researcher
of the Italian National Research Council (CNR)
at the Center of Biomedical Engineering. His research concerns the analysis and modeling of magnetic
stimulation of the nervous system and of auditory
functions, otoacoustic emissions, analysis of EEG and evoked potentials, and
biomedical signal processing.