
Wavelet analysis of click-evoked otoacoustic emissions

1998, IEEE Transactions on Biomedical Engineering

Time-frequency distribution methods are being widely used for the analysis of a variety of biomedical signals. Recently, they have been applied also to study otoacoustic emissions (OAE's), the active acoustic response of the hearing end organ. Click-evoked otoacoustic emissions (CEOAE's) are time-varying signals with a clear frequency dispersion along with the time axis. Analysis of CEOAE's is of considerable interest due to their close relation with cochlear mechanisms. In this paper, several basic time-frequency distribution methods are considered and compared on the basis of both simulated signals and real CEOAE's. The particular structure of CEOAE's requires a method with both a satisfactory time and frequency resolution. Results from simulations and real CEOAE's revealed that the wavelet approach is highly suitable for the analysis of such signals. Some examples of the application of the wavelet transform to CEOAE's are provided here. Applications range from the extraction of normative data from adult and neonatal OAE's to the extraction of quantitative parameters for clinical purposes.

686 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 45, NO. 6, JUNE 1998

Gabriella Tognola,* Ferdinando Grandori, and Paolo Ravazzani

Index Terms: Choi–Williams distribution, click-evoked otoacoustic emissions, full-term neonates, noise-induced hearing loss, normal adults, short-time Fourier transform, time-frequency resolution properties, wavelet transform, Wigner–Ville distribution.

I. INTRODUCTION

During the last decade, time-frequency distribution methods have been systematically used for the analysis of otoacoustic emissions (OAE's) [1]–[5]. OAE's are acoustic signals emitted by the cochlea and reflect the active processes that are involved in the transduction of mechanical energy into electrical energy [6]–[8].
One of the most attractive features of OAE's is their tight relation to the cochlear status: OAE's are present to varying degrees in all healthy cochleae, whereas they are generally not observed, or are greatly reduced, in ears with mild hearing losses. This aspect, together with the ease of performing the test and its high reproducibility in both the short and the long term, has made OAE's an increasingly widespread neonatal hearing screening method [9], [10]. Owing to their good long-term reproducibility, OAE's can also be used to monitor cochlear function in patients exposed to prolonged noise (see, e.g., [11] and [12]) or ototoxic agents (see, e.g., [13] and [14]).

OAE's can be classified according to the type of stimulus that elicits them; in the case of click-evoked otoacoustic emissions (CEOAE's), the stimulus is an acoustic click of brief duration (about 100 µs). CEOAE's are nonstationary signals and exhibit a clear frequency dispersion over time: in the OAE response to a click, high-frequency components predominate at short latencies and low-frequency components at longer latencies. Interestingly, this distribution of frequencies at different latencies is very similar to the place-frequency distribution along the cochlea [6], [7], [15], [16]. Analysis of the time-frequency properties of CEOAE's is, therefore, of considerable interest due to their close relation with cochlear mechanisms.

Manuscript received February 19, 1997; revised January 13, 1998. Asterisk indicates corresponding author.
*G. Tognola is with the Department of Biomedical Engineering, Polytechnic of Milan, Piazza Leonardo da Vinci, 32, I-20133 Milan, Italy (e-mail: tognola@biomed.polimi.it).
F. Grandori and P. Ravazzani are with the CNR Center of Biomedical Engineering, Polytechnic of Milan, I-20133 Milan, Italy.
Publisher Item Identifier S 0018-9294(98)03614-3.
In particular, since clicks evoke a cumulative OAE response from the whole cochlea, the analysis of CEOAE's can yield a global view of cochlear function. On the other hand, measurements of the time-frequency properties of CEOAE's have encountered a variety of technical problems, such as the difficulty in determining the contribution of each single elementary frequency component. To obtain accurate results, appropriate signal processing techniques are required.

In a recent paper [4], the application of wavelet analysis to CEOAE's has been exhaustively presented; here, a more theoretical background is given to justify our final choice. Four time-frequency distributions are considered: the short-time Fourier transform (STFT), the wavelet transform (WT), the Wigner–Ville distribution, and the Choi–Williams distribution (CWD). The relative performances of these methods (such as the resolution in the time and frequency domains) are compared in some specific situations by means of both simulated signals and real CEOAE's. Finally, examples of the application of time-frequency distributions to various kinds of CEOAE's (from adults, neonates, and hearing-impaired subjects) are provided and discussed.

The paper is organized as follows. A brief mathematical background of the time-frequency distributions taken into consideration is presented in Section II. Section III gives a detailed description of the material used in this study (i.e., simulated signals and real CEOAE's), the analysis windows used for the STFT and WT, and the definition of the quantitative measures of time and frequency resolution. Results from simulations and examples of the application of the WT to CEOAE's are described in Sections IV and V, respectively.

II. TIME-FREQUENCY DISTRIBUTIONS: MATHEMATICAL BACKGROUND

This section briefly describes the principal features of the time-frequency distributions that were used in this study. For fundamentals or a more detailed description see [17].

A. General Remarks

Time-frequency distributions are conventionally classified into two categories: linear and quadratic. Linear time-frequency distributions satisfy the superposition (or linearity) principle, whereas quadratic (or energetic) time-frequency distributions describe a signal in terms of its time-frequency energy distribution and satisfy the quadratic superposition principle [17]. In the quadratic case, the time-frequency distribution of the signal contains two types of terms: the auto-terms (i.e., the true time-frequency distribution of each signal component) and the interference terms (terms of disturbance). The number of interference terms grows quadratically with the number of signal components. Interference terms may overlap with auto-terms, thus making the time-frequency distribution harder to analyze.

As a general remark, there is a tradeoff between interference terms and time-frequency resolution, since an attenuation of the interference terms inevitably leads to a worsening of the time-frequency resolution. Also, for any time-frequency distribution there is a tradeoff between time resolution and frequency resolution. Time resolution $\Delta t$ and frequency resolution $\Delta f$ cannot simultaneously be arbitrarily good, since they satisfy the uncertainty principle [18]

$$\Delta t \, \Delta f \ge \frac{1}{4\pi} \tag{1}$$

Expression (1) means that an improvement in time resolution results in a loss of frequency resolution, and vice versa.

B. The Short-Time Fourier Transform

The STFT is a linear time-frequency distribution. The STFT of the signal $x(t)$ is defined as

$$\mathrm{STFT}_x(t,f) = \int x(\tau)\, w^{*}(\tau - t)\, e^{-j 2\pi f \tau}\, d\tau \tag{2}$$

where $w(t)$ is the sliding analysis window and $^{*}$ denotes complex conjugation. Time and frequency resolutions depend on the length and on the bandwidth of $w(t)$ (see [20] for an exhaustive discussion). Since the analysis window is the same for all the analysis frequencies $f$, time and frequency resolutions are fixed on the time-frequency plane once the analysis window $w(t)$ has been chosen.

The energetic version of the STFT is the spectrogram, defined as the squared magnitude of the STFT

$$\mathrm{SPEC}_x(t,f) = \left|\mathrm{STFT}_x(t,f)\right|^{2} \tag{3}$$

It can be demonstrated [21], [22] that the interference terms of the spectrogram are oscillatory structures and occur only if the transforms of the auto-terms overlap.

C. The Wavelet Transform

The WT is a linear distribution and is defined as

$$\mathrm{WT}_x(t,f) = \int x(\tau)\, \sqrt{\frac{f}{f_0}}\; \psi^{*}\!\left(\frac{f}{f_0}(\tau - t)\right) d\tau \tag{4}$$

(The WT was originally introduced as a time-scale version [23], [24], which can be obtained from the time-frequency version (4) by introducing the scale parameter $a = f_0/f$.) The analysis (or basis) functions, called wavelets, are scaled and shifted versions of the same prototype function $\psi(t)$, the mother wavelet. $\psi(t)$ is a function with finite energy and centered around time $t = 0$; its FT is a bandpass function centered around frequency $f_0$. The wavelets have a constant relative bandwidth, i.e., the quality factor $Q$ (= center frequency/bandwidth) is constant. Time and frequency resolutions satisfy the uncertainty principle (1), but, unlike in the STFT, they are not fixed over the entire time-frequency plane: time resolution becomes good at higher frequencies, whereas frequency resolution becomes good at lower frequencies.

The energetic version of the WT is the scalogram, defined as the squared magnitude of the WT

$$\mathrm{SCAL}_x(t,f) = \left|\mathrm{WT}_x(t,f)\right|^{2} \tag{5}$$

The scalogram is a quadratic distribution; as in the spectrogram, the interference terms are restricted to the area of the time-frequency plane where the time-frequency distributions of the auto-terms overlap.

A signal $x(t)$ can be reconstructed from its WT [25] by

$$x(t) = c_\psi \iint \mathrm{WT}_x(t',f)\, \sqrt{\frac{f}{f_0}}\; \psi\!\left(\frac{f}{f_0}(t - t')\right) dt'\, df \tag{6}$$

where $c_\psi$ is a constant that depends only on the FT of $\psi(t)$.

D. The Wigner Distribution

The Wigner distribution (WD) is a quadratic distribution and is defined as [26]

$$\mathrm{WD}_x(t,f) = \int x\!\left(t + \frac{\tau}{2}\right) x^{*}\!\left(t - \frac{\tau}{2}\right) e^{-j 2\pi f \tau}\, d\tau \tag{7}$$

The WD has a very high time-frequency concentration [27], whereas SPEC and SCAL introduce some broadening with respect to time and frequency. The main drawback of the WD is the presence of cross-terms. In the WD, cross-terms oscillate at relatively high frequencies and can have a peak value as high as twice that of the auto-terms. Cross-terms are present even if the signal components do not overlap in the time-frequency plane. They can be attenuated by means of a smoothing, which is a sort of two-dimensional low-pass filtering. The smoothing is achieved by convolving the WD [17] with a kernel $\Phi(t,f)$

$$\mathrm{SWD}_x(t,f) = \iint \Phi(t - t', f - f')\, \mathrm{WD}_x(t',f')\, dt'\, df' \tag{8}$$

The smoothing results in a loss of time-frequency concentration. In particular, a broad smoothing over the time-frequency plane yields good attenuation of interference terms but poor time-frequency concentration; on the contrary, a narrow smoothing yields poor attenuation of interference terms and good time-frequency concentration. The most common smoothed Wigner–Ville distributions (SWD's) are the following.

• The smoothed pseudo-WD (SPWD) [28], for which the kernel is separable into a time window and a frequency window. The smoothing along the time and frequency directions is determined by the lengths of the two windows. A particular case of the SPWD is the so-called pseudo-WD (PWD), where the time window is an impulse (i.e., there is no smoothing along the time direction).

• The Choi–Williams distribution (CWD) [29], [30], for which the smoothing is controlled by a single parameter. A good compromise between time-frequency resolution and interference term attenuation can be obtained for an appropriate choice of this parameter [29].

III. MATERIALS AND METHODS

Both synthesized and real OAE's are examined with the distributions described in Section II.

A. Synthesized Signals

The following synthesized signals were considered.

1) Sum of three tone-bursts (9).

2) Sum of two tones and two Dirac impulses (10).

3) Sum of two chirp signals

$$x(t) = x_1(t) + x_2(t), \qquad x_i(t) = e^{\,j 2\pi (f_i + \alpha_i t)\, t} \tag{11}$$

with the parameter values given in the caption of Fig. 2.

4) Synthesized CEOAE. Some model studies suggest that the emissions evoked by a click can be viewed as the sum of the single responses of the emission generators that are excited by the acoustic stimulus (see, e.g., [31] and [32]). Each generator behaves as a narrow bandpass filter and is characterized by a specific central frequency, which varies along the basilar membrane. The central frequency (or characteristic frequency) of a cochlear filter is inversely related to the distance from the stapes. As a first approximation, it has been shown by us and by others [1], [4], [33] that a CEOAE can be synthesized by summing the impulse responses of gammatone functions. We have considered here a synthesized CEOAE obtained by the summation of a number of gammatones

$$x(t) = \sum_{i} \gamma_i(t), \qquad \gamma_i(t) = a\, t^{3}\, e^{-2\pi \beta f_i t} \cos(2\pi f_i t) \tag{12}$$

where $\beta$ is a constant, $a = (2\pi f_i)^{3.5}$, and $f_i$ is the central frequency of the $i$th gammatone (see Fig. 1).

Fig. 1. Simulated CEOAE obtained by the summation of five gammatones, $x(t) = \sum_{i=1}^{5} \gamma_i(t)$, where $\gamma_i(t) = a \cdot t^{3} \cdot e^{-2\pi \beta f_i t} \cdot \cos(2\pi f_i t)$; $\beta = 0.1$; $a = (2\pi f_i)^{3.5}$; and $f_{1-5}$ = 1.0, 1.5, 2.2, 3.3, and 5.0 kHz are the central frequencies. The trace in the first row is the simulated CEOAE; the traces below are two gammatone components with central frequencies equal to 1.0 and 5.0 kHz. Note that the higher the central frequency, the shorter the duration of the gammatone and the wider its spectrum.

B. Real Data

CEOAE's were recorded using a probe inserted into the outer ear canal. The probe contained a miniaturized microphone and a transmitter that delivered the acoustic stimulus. Eight normal adults, 333 full-term babies on their third day of life, and 13 hearing-impaired subjects were tested with OAE's. For normal and hearing-impaired adults, measurements were done in a sound-proof cabin with the subject seated in an armchair during the recording session, which lasted for about 20 min. For neonates, measurements were done in a quiet room, close to the nursery.
The recording session (monaural measurements) lasted for about 5 min. In the present study, CEOAE's were recorded using a standard ILO88 system (Otodynamics Ltd.). Clicks were delivered at different intensities (from 47 to 80 dB SPL in 3-dB steps for normal adults; at 80 dB SPL for neonates; at 83 dB SPL for hearing-impaired subjects). Responses were filtered with the ILO88 default procedure (second-order high-pass set at 330 Hz, gain 1.57, and fourth-order low-pass set at 10.6 kHz, gain 2.6) and digitized at a rate of 25 000 samples/s. Responses to 260 repetitions of the click-train (four clicks per train) were averaged according to the "linear" mode of operation for normal adults; for neonates and hearing-impaired subjects the most commonly used "nonlinear" mode of operation [8], [34], [35] was used. (In the "nonlinear" mode of operation, a train of three clicks followed by a click of greater amplitude and inverted polarity is used. This method takes advantage of the nonlinear behavior of OAE's.) Finally, averaged data were windowed using the default ILO88 window ( ms) and digitally filtered off-line (second-order digital bandpass set at Hz).

Hearing-impaired subjects, who suffered from noise-induced hearing loss (due to weapon noise and occupational noise), were also examined by means of a pure-tone audiogram and a Békésy sweep-frequency audiogram. For all these patients, the cutoff frequency above which the hearing loss was greater than 30 dB HL was determined.

C. Analysis Windows Used in the STFT and WT Computation

For the STFT, we have considered four different analysis windows: the Hamming, the Gauss, the Hann, and the Kaiser window (for the analytic expressions, see [20]). For the WT we have considered the following.

1) The Morlet wavelet [36] (13).

2) The family of functions proposed by Wit et al. [1] and by Tognola et al. [4] (14).

3) The modified version of the Morlet wavelet [37] (15), in which a parameter determines the duration of the Gaussian window.

Using the family of wavelets (15), Meste and colleagues [37] have introduced a modified scalogram (MOD SCAL), which combines the WT's obtained for two different values of this parameter (16). The MOD SCAL takes advantage of both the good time resolution of one WT and the good frequency resolution of the other and, thus, can achieve a better time-frequency resolution than the SCAL with the Morlet wavelet. However, it shows more pronounced interference terms.

D. Resolution Properties

A quantitative measure of the time-frequency resolution has been derived for the STFT and WT. Time-frequency resolutions of the STFT and the WT depend critically on the choice of the analysis window and the mother wavelet, respectively. The duration and the bandwidth of the analysis window (or the wavelet) determine the local resolutions of the transforms around any point of the time-frequency plane. The duration $\Delta t$ and bandwidth $\Delta f$ of a generic function $g(t)$, with FT $G(f)$, can be expressed in terms of root mean square (rms) values [23], [38], [39]

$$\Delta t^{2} = \frac{1}{E} \int (t - t_0)^{2}\, |g(t)|^{2}\, dt \tag{17}$$

$$\Delta f^{2} = \frac{1}{E} \int (f - f_0)^{2}\, |G(f)|^{2}\, df \tag{18}$$

where $E$ is the energy of $g(t)$, and $t_0$ and $f_0$ are the centers of gravity of $|g(t)|^{2}$ and its spectrum $|G(f)|^{2}$, respectively

$$t_0 = \frac{1}{E} \int t\, |g(t)|^{2}\, dt \tag{19}$$

$$f_0 = \frac{1}{E} \int f\, |G(f)|^{2}\, df \tag{20}$$

Expressions (17) and (18) are equivalent to the variances of the functions $|g(t)|^{2}/E$ and $|G(f)|^{2}/E$. The time-bandwidth product $\Delta t\, \Delta f$ satisfies the Heisenberg inequality (1). The value of the product is an indicator of the performance of a time-frequency distribution method, i.e., the smaller the product, the higher the time-frequency resolution.

Typically, in the STFT case the analysis windows are real and even, and their energy spectrum is symmetric around zero. The centers of gravity are $t_0 = 0$ and $f_0 = 0$. Time and frequency resolutions are fixed over the entire time-frequency plane since the same window is used at all frequencies.

In the WT case, it can be demonstrated [23], [39], [40] that $\Delta t$ and $\Delta f$ depend on the scale parameter $a = f_0/f$

$$\Delta t(f) = \frac{f_0}{f}\, \Delta t_\psi \quad \text{and} \quad \Delta f(f) = \frac{f}{f_0}\, \Delta f_\psi \tag{21}$$

where $\Delta t_\psi$ and $\Delta f_\psi$ are the duration and bandwidth of the mother wavelet $\psi(t)$, respectively. For the most commonly used wavelet functions, $t_0 = 0$. If the wavelet is real, expressions (18) and (20) must be modified: the spectrum of a real function is even and has two peaks (one at positive and one at negative frequencies), so that the center frequency computed from (20) is zero. Therefore, it is more appropriate to replace (18) with expression (22) [23], [38]. On the contrary, if $\psi(t)$ is analytic, expressions (18) and (20) are still applicable.

IV. SIMULATIONS

A. Resolution Properties

Values of $\Delta t$, $\Delta f$ (rms values), and $\Delta t\, \Delta f$ are listed in Tables I and II for the STFT and WT, respectively. For the STFT (Table I), $\Delta t$ and $\Delta f$ are constant over the entire time-frequency plane and depend only on the type and duration of the window. On the contrary, time and frequency resolutions of the WT (Table II) are not fixed.

TABLE I. Time resolution $\Delta t$, frequency resolution $\Delta f$ (rms measures), and $\Delta t \cdot \Delta f$ product for different STFT analysis windows. For the analytic expressions of the analysis windows, see [20]. Note that the $\Delta t \cdot \Delta f$ product has a lower bound of $1/(4\pi)$ (≈ 0.080).

TABLE II. Time resolution $\Delta t$, frequency resolution $\Delta f$ (rms measures), and $\Delta t \cdot \Delta f$ product at different analysis frequencies for different wavelet functions.

The performances of the Hamming and the Gauss windows are very similar. Their $\Delta t\, \Delta f$ products are the smallest among the STFT windows. As a result, the Hamming and the Gauss windows show a good compromise between time and frequency resolution. Although the $\Delta t\, \Delta f$ product is a good indicator of the performance of a method, the choice of a window must also take into account the specific time-frequency features of the signal to be analyzed. Typically, in the CEOAE case, the signal has a duration of 20 ms and a bandwidth of 0.5–5.0 kHz (0.5–6 kHz for neonates).
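As an illustration of the rms measures (17)–(20), the following minimal sketch evaluates $\Delta t$, $\Delta f$, and their product numerically for a real, even window. It is not the procedure used to build Tables I and II and does not attempt to reproduce the tabulated values: the function name `rms_duration_bandwidth`, the 1-ms Gaussian width, the sampling grid, and the use of Parseval's relation for the derivative (instead of an explicit Fourier transform) are all assumptions made for the example.

```python
import math


def rms_duration_bandwidth(window, t_min, t_max, n=20001):
    """Return (dt, df): rms duration [s] and rms bandwidth [Hz] of a real, even window.

    Assumes the window is centred at t = 0 and is negligible (or zero) at the
    interval ends, so that both centres of gravity t0 and f0 are zero.
    """
    h = (t_max - t_min) / (n - 1)
    ts = [t_min + k * h for k in range(n)]
    gs = [window(t) for t in ts]
    energy = sum(g * g for g in gs) * h
    second_moment = sum(t * t * g * g for t, g in zip(ts, gs)) * h
    # Parseval for the derivative: int f^2 |G(f)|^2 df = int |g'(t)|^2 dt / (4 pi^2),
    # with g'(t) approximated by central differences.
    deriv_energy = sum(((gs[k + 1] - gs[k - 1]) / (2 * h)) ** 2
                       for k in range(1, n - 1)) * h
    dt = math.sqrt(second_moment / energy)
    df = math.sqrt(deriv_energy / energy) / (2 * math.pi)
    return dt, df


SIGMA = 1.0e-3      # hypothetical Gaussian width [s]
HANN_LEN = 5.0e-3   # hypothetical Hann window length [s]


def gaussian(t):
    return math.exp(-t * t / (2 * SIGMA * SIGMA))


def hann(t):
    if abs(t) > HANN_LEN / 2:
        return 0.0
    return 0.5 * (1.0 + math.cos(2 * math.pi * t / HANN_LEN))


LOWER_BOUND = 1.0 / (4.0 * math.pi)  # right-hand side of the uncertainty principle (1)

gauss_dt, gauss_df = rms_duration_bandwidth(gaussian, -8 * SIGMA, 8 * SIGMA)
hann_dt, hann_df = rms_duration_bandwidth(hann, -HANN_LEN / 2, HANN_LEN / 2)
gauss_product = gauss_dt * gauss_df
hann_product = hann_dt * hann_df

print("bound    :", LOWER_BOUND)
print("Gaussian :", gauss_product)
print("Hann     :", hann_product)
```

Under these energy-weighted definitions an untruncated Gaussian attains the lower bound $1/(4\pi)$, while the Hann window lies slightly above it.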
Time and frequency resolutions of 1.0–1.5 ms and 100 Hz, respectively, could be reasonable values. The 5-ms Hamming and Gauss windows fulfill these requirements (for the Hamming window, $\Delta t$ = 1.57 ms and $\Delta f$ = 100.9 Hz; for the Gauss window, $\Delta t$ = 1.44 ms and $\Delta f$ = 109.7 Hz). On the contrary, the 2.5-ms Hamming window, which exhibits the smallest $\Delta t\, \Delta f$ product, has a poor frequency resolution ($\Delta f$ = 201.7 Hz).

The $\Delta t\, \Delta f$ products of all the considered wavelet functions (Table II) are notably smaller than for the STFT windows (for the WT, $\Delta t\, \Delta f$ ranges from 0.080 to 0.106; for the STFT, $\Delta t\, \Delta f$ ranges from 0.1573 to 0.1637). In particular, the products of the Morlet wavelets with duration parameter equal to 1 and to 5 are the smallest ones and approach the value of the lower bound (see (1)). As expected, the Morlet wavelet with duration parameter 1 has a good time resolution, whereas with duration parameter 5 it has a good frequency resolution. For the family of wavelets (14), the best performance is reached for a specific value of the parameter $n$ (see Table III). Even if the $\Delta t\, \Delta f$ product of this wavelet is slightly bigger than for the Morlet wavelets (0.084 versus 0.080), its time-frequency resolution properties are fairly good (Table II). On average, its time resolution is 0.84 ms and its frequency resolution is 136.92 Hz. On the contrary, the Morlet wavelet with duration parameter 1 has the best time resolution (0.32 ± 0.23 ms) but a poor frequency resolution (337.6 ± 178.0 Hz), whereas the Morlet wavelet with duration parameter 5 has the best frequency resolution (67.5 ± 35.6 Hz) but a poor time resolution (1.62 ± 1.15 ms).

Note to Table II: $\Delta t$ and $\Delta f$ values have been computed only for the characteristic frequencies of a typical OAE signal. Note that the $\Delta t \cdot \Delta f$ product has a lower bound of $1/(4\pi)$ (≈ 0.080).

TABLE III. $\Delta t \cdot \Delta f$ product for the wavelets of the family (14) for different values of the parameter $n$. Note that the $\Delta t \cdot \Delta f$ product has a lower bound of $1/(4\pi)$ (≈ 0.080).

B. Simulated Signals

Figs. 2 and 3 show the time-frequency distributions of signal (11) and the simulated CEOAE (12) (results from signals (9) and (10) were omitted because they are similar).
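For readers who wish to experiment with the simulated CEOAE (12), the sum of gammatones can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the gammatone form $t^{3} e^{-2\pi \beta f_i t} \cos(2\pi f_i t)$ with a decay constant of 0.1 and the five central frequencies are read from the caption of Fig. 1, the 25 000 samples/s rate and 20-ms duration are borrowed from the real recordings, and each gammatone is scaled to unit peak envelope instead of using the amplitude factor of the caption. All function and constant names are hypothetical.

```python
import math

BETA = 0.1                                               # decay constant, as read from Fig. 1
CENTER_FREQS_HZ = [1000.0, 1500.0, 2200.0, 3300.0, 5000.0]
FS = 25000.0                                             # samples/s, assumed from Section III-B
DURATION_S = 0.020                                       # assumed 20-ms analysis interval


def envelope_peak_time(f_c, beta=BETA):
    """Time [s] at which the envelope t^3 * exp(-2*pi*beta*f_c*t) is maximal."""
    return 3.0 / (2.0 * math.pi * beta * f_c)


def gammatone(t, f_c, beta=BETA):
    """Unit-peak-envelope gammatone (normalization is an assumption of this sketch)."""
    b = 2.0 * math.pi * beta * f_c
    peak = (3.0 / b) ** 3 * math.exp(-3.0)   # value of t^3 * exp(-b t) at t = 3 / b
    return (t ** 3) * math.exp(-b * t) / peak * math.cos(2.0 * math.pi * f_c * t)


def synth_ceoae(freqs=CENTER_FREQS_HZ, fs=FS, duration=DURATION_S):
    """Return (times, signal): the sum of one gammatone per central frequency."""
    n = int(round(duration * fs))
    times = [k / fs for k in range(n)]
    signal = [sum(gammatone(t, f) for f in freqs) for t in times]
    return times, signal


times, signal = synth_ceoae()
for f in CENTER_FREQS_HZ:
    print(f"{f / 1000:.1f} kHz: envelope peaks at {1000 * envelope_peak_time(f):.2f} ms")
```

In this sketch the envelope of each gammatone peaks earlier as the central frequency increases, which reproduces the qualitative frequency dispersion of a CEOAE (high frequencies at short latencies).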
Hereafter, the wavelet obtained from the family (14) for the best-performing value of $n$ will be referred to as the "proposed wavelet." For all the considered signals, it can be noted that:

1) The proposed wavelet [Figs. 2(a) and 3(a)] shows a good compromise between time-frequency resolution and interference term attenuation. As a result, signal components are accurately resolved in both the time and the frequency domain.

2) The Morlet wavelet with duration parameter 1 [Figs. 2(b) and 3(b)] has the best time resolution among the wavelets, but it suffers from a very poor frequency resolution.

3) On the contrary, the Morlet wavelet with duration parameter 5 [Figs. 2(c) and 3(c)] has the best frequency resolution among the wavelets, but it suffers from a very poor time resolution.

4) The MOD SCAL [Figs. 2(d) and 3(d)] takes advantage of both the good time resolution of the Morlet wavelet with duration parameter 1 and the good frequency resolution obtained with duration parameter 5. As expected, the MOD SCAL exhibits more interference terms than either Morlet wavelet and the proposed wavelet.

5) Although the SPEC [Figs. 2(e) and 3(e)] is not particularly contaminated by interference terms, its time-frequency resolution is definitely worse than that of the proposed wavelet and the MOD SCAL.

6) Compared with all the other time-frequency distributions, the WD [Figs. 2(f) and 3(f)] has the best time-frequency resolution but is highly corrupted by interference terms. Interference terms are notably reduced in the SPWD [Figs. 2(g) and 3(g)] and the CWD [Figs. 2(h) and 3(h)], but the time-frequency concentration in these last two cases is lower than for the proposed wavelet.

Fig. 2. Time-frequency distributions of the simulated signal $x(t) = x_1(t) + x_2(t)$, where $x_i(t) = e^{\,j[2\pi (f_i + \alpha_i t) \cdot t]}$; $f_1 = f_2$ = 1 kHz, $\alpha_1 = 8 \cdot 10^{5}$ kHz/ms, and $\alpha_2 = 4 \cdot 10^{5}$ kHz/ms. (a) SCAL (proposed wavelet); (b) SCAL (Morlet wavelet, duration parameter = 1); (c) SCAL (Morlet wavelet, duration parameter = 5); (d) MOD SCAL (duration parameters 1 and 5); (e) SPEC (Hamming window, 5 ms); (f) WD; (g) SPWD; (h) CWD (smoothing parameter = 2).

Fig. 3. Time-frequency distributions of a simulated CEOAE (see Fig. 1). (a) SCAL (proposed wavelet); (b) SCAL (Morlet wavelet, duration parameter = 1); (c) SCAL (Morlet wavelet, duration parameter = 5); (d) MOD SCAL (duration parameters 1 and 5); (e) SPEC (Hamming window, 5 ms); (f) WD; (g) SPWD; (h) CWD (smoothing parameter = 1).

V. APPLICATIONS TO REAL CEOAE'S

Results from Sections IV-A and IV-B have revealed that the proposed wavelet can yield a fairly accurate description of the time-frequency features of a multicomponent signal. The particular structure of the wavelet filters (narrow bandwidth and long duration for low-frequency filters; broad bandwidth and brief duration for high-frequency filters) makes the WT approach highly suitable for signals with low-frequency components of long duration and high-frequency components of brief duration, as in the case of OAE's evoked by clicks (see the simulated CEOAE in Fig. 1). Also, it can be demonstrated [40] that, as a first approximation, the human ear analyzes sounds by means of a sort of WT.

Fig. 4. CEOAE's from (a) a normal hearing adult and (b) a full-term baby. To reduce the influence of the stimulus artifact, responses have been windowed at 2.5–20 ms post-stimulus time. In each row, two replicate recordings from the same ear (A and B replicate recordings in the ILO equipment) are superimposed. Numbers on the left of each panel are the reproducibility values (in percentage points) between the two replicates.

In this section, applications of the proposed wavelet to typical CEOAE's from normal adults, full-term neonates, and hearing-impaired subjects are presented. Fig. 4 shows two typical examples of CEOAE's from a normal adult (subject A030R1, female, 25 years old) and a full-term neonate (subject N360L0, female).
Adult CEOAE's show a clear frequency dispersion, i.e., high-frequency components are present at shorter latencies than low-frequency components. Frequency dispersion is less pronounced in the OAE response of the neonate: the response has a typical burst-like behavior and presents sustained activity up to 20 ms (and probably more, but our analysis window is limited to this upper value). The time-frequency distribution of the adult subject [Fig. 5(a)] shows the presence of several components in the 0.5–5.0-kHz range, with a predominance in the 1.0–2.0-kHz region. Low-frequency components (see, for example, the 1.0-kHz component) have a longer duration than high-frequency components (see, for example, the 3.5-kHz component) and reach their maximal amplitude at longer latencies than high-frequency components. As expected, the time-frequency structure of the neonatal CEOAE [Fig. 5(b)] is different: the region of the predominant components is shifted toward higher frequencies (from 2.5 to 5.0 kHz) than in the adult response.

Fig. 5. (a) Time-frequency distribution (energy density, normalized arbitrary units) of a CEOAE at 80 dB SPL of subject A030R1 (normal hearing adult). (b) Time-frequency distribution (energy density, normalized arbitrary units) of a CEOAE at 80 dB SPL of subject N360L0 (full-term neonate).

Although the three-dimensional energy distributions yield an immediate representation of the time-frequency structure of a signal, in some circumstances it may be useful to have a more precise description of the behavior of each single signal component. For this purpose, a method based on the inverse WT (6) has been developed for the decomposition of a signal into elementary components (for details, see [4] and [5]). In this study, CEOAE's are decomposed into 200-Hz-wide components in the 0.5–5.0-kHz range. The temporal behavior of the elementary components is shown in Fig. 6 for the two previous subjects and for a hearing-impaired subject suffering from noise-induced hearing loss. The correlation between the reconstructed CEOAE (obtained by the summation of all the elementary components) and the original CEOAE is greater than 99% for all the examined subjects.

Fig. 6. Elementary components of the emissions of a normal hearing adult (subject A030R0, 80 dB SPL), a full-term neonate (subject N360L0, 80 dB SPL), and a hearing-impaired adult (subject P300P4, 83 dB SPL). In each panel and each row two traces are superimposed. The two traces in the first row from the top are the two original emission replicates (A and B in the ILO equipment); the traces below are the elementary components derived from each replicate. To reduce the number of plots in each panel, OAE components are derived from 500-Hz-wide bands instead of 200-Hz-wide bands (as stated in the text); the central frequency of each band is shown on the right side of the figure. The numbers on the left are the correlation values (in percentage points) between the two replicates.

To better analyze the relative contribution of each elementary component to the compound CEOAE, rms values of the elementary components were computed. Results in Fig. 7 show that for normal adults the greatest contribution is associated with the lowest frequencies (i.e., around 1.0–2.0 kHz), whereas for neonates the greatest contribution is shifted toward higher frequencies, from 1.5 to 4 kHz. It is interesting to note that the majority of spontaneous OAE's (i.e., OAE's emitted in the absence of any stimulation) are found in the 1.0–2.0-kHz region for adults and in the 2.0–4.0-kHz region for neonates [41]. As a consequence, the greater amplitude of the midfrequency CEOAE components may be a result of the synchronous capturing of multiple spontaneous OAE's.
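The rms measure applied to the elementary components can be sketched in a few lines. The example below is only illustrative: the two "components" are hypothetical pure tones rather than components obtained from the inverse WT (6), and the 2.5–20-ms window and the 25 000 samples/s rate are taken from the description of the recordings and of Fig. 7. The helper names `rms_in_window` and `reconstruct` are not from the paper.

```python
import math

FS = 25000.0  # samples/s, acquisition rate given in Section III-B


def rms_in_window(x, fs=FS, t_start=0.0025, t_stop=0.020):
    """Root mean square of x restricted to the post-stimulus window [t_start, t_stop)."""
    i0 = int(round(t_start * fs))
    i1 = min(len(x), int(round(t_stop * fs)))
    segment = x[i0:i1]
    return math.sqrt(sum(v * v for v in segment) / len(segment))


def reconstruct(components):
    """Sample-by-sample sum of the elementary components."""
    return [sum(values) for values in zip(*components)]


# Hypothetical toy components (20 ms each): a strong 1-kHz tone and a weak 3-kHz tone.
N = 500
comp_1k = [2.0 * math.sin(2 * math.pi * 1000.0 * k / FS) for k in range(N)]
comp_3k = [0.5 * math.sin(2 * math.pi * 3000.0 * k / FS) for k in range(N)]

rms_1k = rms_in_window(comp_1k)
rms_3k = rms_in_window(comp_3k)
compound = reconstruct([comp_1k, comp_3k])

print("rms of 1-kHz component:", rms_1k)
print("rms of 3-kHz component:", rms_3k)
```

Comparing the rms values band by band gives the kind of profile summarized in Fig. 7, where the dominant bands are those with the largest rms.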
Moreover, it is believed that the frequency content of a CEOAE mainly reflects the middle ear transfer function, which reaches its most efficient transmission in the 1.0–1.5-kHz region for adults [42], whereas for neonates it is shifted toward higher frequencies [43].

Fig. 7. RMS values (mean values) of elementary components from eight normal hearing adults (dotted line) and 333 full-term babies (solid line). For each component, rms was computed in the 2.5–20-ms window. For both groups, CEOAE's were evoked at 80 dB SPL.

Emissions of subject A030R1 (normal adult) are characterized by a mid-duration with a progressive amplitude attenuation (Fig. 6). The activity associated with low-frequency components (1.0–1.5 kHz) lasts about 10–15 ms and exhibits a maximum around 8–10 ms. On the contrary, the first 2.5–6 ms are dominated by high-frequency components whose maxima are reached at about 4.5 ms.

Decomposition of CEOAE's into elementary components can be useful to study the latency of OAE components (defined as the time interval from the stimulus onset to the maximum of the envelope of the component). Analysis of the relation between the latency and the frequency of the elementary components reveals (Fig. 8) that latency decreases with frequency (more precisely, latency is inversely proportional to the logarithm of frequency). This trend of shorter latencies for higher frequencies is very similar to the relation between the characteristic frequency of the cochlear filters and their spatial location along the basilar membrane.

Fig. 8. Pooled CEOAE latency data from eight normal hearing adults. The solid curves are the exponential regression fit (Fisher test, significance level 0.01). Stimulus levels range from 48 dB SPL (upper trace) to 80 dB SPL (bottom trace). Note that data are plotted on a logarithmic scale.
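The latency definition given above (time from stimulus onset to the maximum of the component envelope) can be illustrated with a short sketch. The text does not state how the envelope is computed; the example assumes it is the magnitude of the analytic signal, obtained here with a plain DFT so that only the standard library is needed. The test component is a hypothetical Gaussian-windowed 1-kHz tone centred at 8 ms, and the function names are illustrative.

```python
import cmath
import math

FS = 25000.0  # samples/s, acquisition rate given in Section III-B


def analytic_envelope(x):
    """Magnitude of the analytic signal of x, via a plain O(n^2) DFT."""
    n = len(x)
    w = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]
    half = n // 2
    # Forward DFT, non-negative frequencies only.
    spectrum = [sum(x[j] * w[(j * k) % n] for j in range(n)) for k in range(half + 1)]
    # One-sided spectrum: keep DC (and Nyquist for even n) once, double the rest.
    gains = [1.0] + [2.0] * half
    if n % 2 == 0:
        gains[half] = 1.0
    envelope = []
    for j in range(n):
        # Inverse DFT restricted to the retained bins.
        z = sum(gains[k] * spectrum[k] * w[(-j * k) % n] for k in range(half + 1))
        envelope.append(abs(z) / n)
    return envelope


def latency_s(component, fs=FS):
    """Latency [s]: time of the envelope maximum, with stimulus onset at sample 0."""
    envelope = analytic_envelope(component)
    return envelope.index(max(envelope)) / fs


# Hypothetical component: Gaussian-windowed 1-kHz tone whose envelope peaks at 8 ms.
N = 500
T_PEAK = 0.008
WIDTH = 0.0015
component = [
    math.exp(-0.5 * ((k / FS - T_PEAK) / WIDTH) ** 2)
    * math.cos(2 * math.pi * 1000.0 * (k / FS - T_PEAK))
    for k in range(N)
]

estimated_latency = latency_s(component)
print("estimated latency [ms]:", 1000 * estimated_latency)
```

Applied to each elementary component in turn, such an estimate yields one latency value per frequency band, which is the kind of data pooled in Fig. 8.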
The latency is stimulus dependent, in the sense that an increase in the stimulus level is accompanied by a progressive shortening of latency (Fig. 8). Similar results can be obtained also with other types of OAE’s, for example with tone-burst OAE’s [44] and constant-tone evoked OAE’s [45]. More interestingly, it has been shown [4], [5] that latency data of CEOAE components are in close agreement with latency data derived from electrophysiological measurements (compound action potentials [46] and auditory brain stem responses [47]).

Also, the decomposition of CEOAE’s into elementary components can give an accurate estimate of the test-retest correlation of frequency bands. Since OAE’s are signals with a very good intra-subject reproducibility, the correlation between two OAE replicates is generally high and can be used as an indicator of the value of the signal-to-noise ratio of the recording. A low correlation is typically associated either with bad recording conditions or with the absence of a true cochlear response [48]. In normal ears and with good-quality emissions, the reproducibility value is typically greater than about 70% and a good reproducibility should be found also for the frequency bands in the 1.0–4.0 kHz range (1.5–4.5 kHz for neonates) [49], [10].

Our results from normal adults (Fig. 9) indicate that the reproducibility is very high (> 80%) for all bands. For the adult responses, high correlations are found also for the nondominant components (see, for example, in Fig. 9, the components at 0.5 kHz and 3.0–5.0 kHz). In neonatal responses (Fig. 9), a good reproducibility (> 80%) is found in the 1.5–4.5 kHz range. OAE’s from neonates are typically noisier than those from adults. This is due both to environmental noise (OAE’s are usually recorded in the nursery and not in a cabin booth) and to patient noise (such as snoring, sneezing, cable rub, etc.), which typically affects the lower frequencies.
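The test-retest reproducibility discussed above can be sketched as the Pearson correlation between the two replicates, expressed in percent. The replicates below are synthetic (a shared tone plus independent Gaussian noise with a fixed seed); the sampling rate, noise level, and function names are assumptions for illustration, and in practice the same function would be applied to each pair of band components.

```python
import math
import random


def reproducibility(rep_a, rep_b):
    """Pearson correlation between two replicates, in percent."""
    n = len(rep_a)
    mean_a, mean_b = sum(rep_a) / n, sum(rep_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(rep_a, rep_b))
    var_a = sum((a - mean_a) ** 2 for a in rep_a)
    var_b = sum((b - mean_b) ** 2 for b in rep_b)
    return 100.0 * cov / math.sqrt(var_a * var_b)


# Assumed values: 20-ms record at 12.5 kHz, noise s.d. 0.2, seed 0.
fs = 12500.0
n = 250
rng = random.Random(0)
response = [math.sin(2 * math.pi * 1500 * i / fs) for i in range(n)]


def noisy(signal, sigma):
    """Add independent Gaussian noise to a copy of the signal."""
    return [s + rng.gauss(0.0, sigma) for s in signal]


# Case 1: both replicates contain the same response plus independent noise.
with_response = reproducibility(noisy(response, 0.2), noisy(response, 0.2))

# Case 2: no response at all, only independent noise in each replicate.
silence = [0.0] * n
noise_only = reproducibility(noisy(silence, 0.2), noisy(silence, 0.2))
```

When both replicates share the same response and carry independent noise of equal power, the expected correlation is approximately SNR/(1 + SNR), with SNR the ratio of signal power to noise power. This is why a high reproducibility indicates a good signal-to-noise ratio, while a band with no response gives a correlation near zero.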
For neonates, the nondominant components (i.e., the components below 1.5 kHz and above 4.5 kHz) are characterized by a poor correlation, probably because of a different input-output “transfer function” of the neonatal end organ and substantial patient noise.

Fig. 9. Reproducibility (mean values ± s.d.) as a function of the frequency of the elementary components from eight normal hearing adults and 333 full-term babies.

The proposed approach can be useful to reveal differences between CEOAE’s of normal and pathological ears. Results from a few subjects suffering from noise-induced hearing loss showed differences both in the time-frequency structure of the CEOAE’s and in the values of correlation in the various frequency bands. As an example, Fig. 10 illustrates the time-frequency distribution of the CEOAE of a patient (P300P4) suffering from noise-induced hearing loss. For this patient the frequency above which hearing loss was greater than 30-dB HL was 2.5 kHz. His audiogram was characterized by normal hearing thresholds up to 2.5 kHz, hearing loss > 30-dB HL in a 1.0-octave band centered around 3.5 kHz, and quite normal thresholds (< 20 dB) at 6 and 8 kHz. Lack of OAE response at frequencies greater than 2.5 kHz can be easily observed in the time-frequency distribution [Fig. 10(a)]. This is emphasized by the analysis of the elementary components in Fig. 10(b). In particular, the reproducibility is high for the components lower than or equal to 2.5 kHz (i.e., at frequencies at which the hearing threshold level is normal), has a sudden decrease at frequencies greater than 2.5 kHz, exhibits an evident notch around 3.5 kHz, and seems to recover at the highest frequencies (probably because of the quite good hearing threshold at 6 and 8 kHz). However, it can be observed that the correlation of the “normal” frequency components (i.e., the components at frequencies lower than 2.5 kHz) is slightly lower than for the normal adult.
This may indicate that some modifications could have occurred also at the sites where the hearing threshold was supposed to be normal [50].

Fig. 10. (a) Time-frequency distribution (energy density, normalized arbitrary units) of the CEOAE at 83-dB SPL of subject P300P4 (suffering from noise-induced hearing loss). (b) Reproducibility as a function of the frequency of the elementary components for the pathologic subject P300P4 and control subjects (same as in Fig. 9).

VI. CONCLUSION

A few basic time-frequency distribution methods (the STFT, the WT, the WD, the SPWD, and the CWD) are presented and compared on the basis of several simulated signals with the aim of identifying a method to analyze otoacoustic emissions. Both simulations and quantitative estimates of the time-frequency resolution properties revealed that there is not an optimal method in an absolute sense. The choice of a particular approach must inevitably take into account the time-frequency properties of the signal to be analyzed. The particular structure of CEOAE’s requires a method able to discriminate both high-frequency components of brief duration and low-frequency components of long duration. In other words, both a “good” time resolution and a “good” frequency resolution are required. In the evaluation of a method of time-frequency analysis, interference terms play an important role since they can have a peak value as high as twice that of the auto-term and, thus, they can totally obscure the “true” time-frequency distribution. Among the methods briefly examined here, the WT seems to be the best compromise between time-frequency resolution and attenuation of interference terms. In addition, the peculiar structure of the wavelet analysis filters makes this approach very suitable for the analysis of CEOAE’s. Applications of the WT to CEOAE’s range from the extraction of normative parameters from both adults and neonates to the examination of pathological responses. In particular, by means of the inverse WT it is possible to decompose the CEOAE responses into elementary components and to study their temporal behavior. This approach is useful to describe the time-frequency structure of OAE responses by means of quantitative data, such as, for example, latency measures and estimates of the level of noise in frequency bands.

ACKNOWLEDGMENT

The authors would like to thank G. Pastorino, P. Sergi, and G. Montanari from the Service of Neurophysiopathology of the Istituti Clinici di Perfezionamento, Milan, for providing the normal CEOAE’s from full-term neonates. They would also like to thank P. Avan from the Laboratory of Biophysics, University of Auvergne, Clermont-Ferrand, for providing pathological data. This work was done within the framework of the European Concerted Action AHEAD, Biomedicine and Health Programme of the European Commission. A more detailed analysis of this set of data will be presented in a separate publication.

REFERENCES

[1] H. P. Wit, P. van Dijk, and P. Avan, “Wavelet analysis of real and synthesised click evoked otoacoustic emissions,” Hearing Res., vol. 73, pp. 141–147, 1994.
[2] E. G. Pasanen, J. D. Travis, and R. J. Thornhill, “Wavelet-type analysis of transient-evoked otoacoustic emissions,” Biomed. Sci. Instrum., vol. 30, pp. 75–80, 1994.
[3] J. Cheng, “Time-frequency analysis of transient evoked otoacoustic emissions via smoothed pseudo Wigner distribution,” Scand. Audiol., vol. 24, pp. 91–96, 1995.
[4] G. Tognola, F. Grandori, and P. Ravazzani, “Time-frequency distributions of click-evoked otoacoustic emissions,” Hearing Res., vol. 106, pp. 112–122, 1997.
[5] G. Tognola, F. Grandori, and P. Ravazzani, “Latency distribution of click-evoked otoacoustic emissions,” in Proc. 1996 IEEE Eng. Med. Biol., Amsterdam, the Netherlands, 1996.
[6] D. T. Kemp, “Evidence of mechanical nonlinearity and frequency selective wave amplification in the cochlea,” Arch. Otolaryngol., vol. 224, pp. 37–45, 1979.
[7] D. T. Kemp, “Stimulated acoustic emissions from within the human auditory system,” J. Acoust. Soc. Amer., vol. 64, pp. 1386–1391, 1978.
[8] F. Grandori, G. Cianfrone, and D. T. Kemp, Eds., Cochlear Mechanisms and Otoacoustic Emissions. Basel, Switzerland: Karger, 1990.
[9] S. J. Norton, “Application of transient evoked otoacoustic emissions to pediatric populations,” Ear Hearing, vol. 14, pp. 64–73, 1993.
[10] K. R. White, B. R. Vohr, and T. R. Behrens, “Universal newborn hearing screening using transient evoked otoacoustic emissions: Results of the Rhode Island Hearing Assessment Project,” Semin. Hearing, vol. 14, pp. 18–29, 1993.
[11] B. Engdahl and D. T. Kemp, “The effect of noise exposure on the details of distortion product otoacoustic emissions in humans,” J. Acoust. Soc. Amer., vol. 99, pp. 1573–1587, 1996.
[12] M. A. Hotz, “Monitoring the effects of noise exposure using transiently evoked otoacoustic emissions,” Arch. Otolaryngol., vol. 113, pp. 478–482, 1993.
[13] R. Rubsamen, D. M. Mills, and E. W. Rubel, “Effects of furosemide on distortion product otoacoustic emissions and on neuronal responses in the anteroventral cochlear nucleus,” J. Neurophysiol., vol. 74, pp. 1628–1638, 1995.
[14] R. Hauser, R. Probst, and F. J. Harris, “Influence of general anesthesia on transiently evoked otoacoustic emissions in humans,” Ann. Otol. Rhinol. Laryngol., vol. 101, pp. 994–999, 1992.
[15] F. Grandori, “Nonlinear phenomena in click and tone-burst evoked otoacoustic emissions from human ears,” Audiol., vol. 24, pp. 71–80, 1985.
[16] F. Grandori and A. Antonelli, “Temporal stability, influence of the head position and modeling considerations for evoked otoacoustic emissions,” Scand. Audiol., vol. 25, pp. 97–108, 1986.
[17] F. Hlawatsch and G. F. Boudreaux-Bartels, “Linear and quadratic time-frequency signal representations,” IEEE Signal Processing Mag., vol. 9, pp. 21–67, 1992.
[18] A. Papoulis, Signal Analysis.
New York: McGraw-Hill, 1977.
[19] D. Gabor, “Theory of communication,” J. Inst. Elect. Eng., vol. 93, pp. 429–457, 1946.
[20] F. J. Harris, “On the use of windows for harmonic analysis with the discrete Fourier transform,” in Proc. IEEE, 1978, vol. 66, pp. 51–83.
[21] F. Hlawatsch and P. Flandrin, “The interference structure of the Wigner distribution and related time-frequency signal representations,” in The Wigner Distribution: Theory and Applications in Signal Processing, W. Mecklenbrauker, Ed. Amsterdam, the Netherlands: North Holland/Elsevier, 1992.
[22] S. Kadambe and G. F. Boudreaux-Bartels, “A comparison of the existence of ‘cross terms’ in the Wigner distribution and the squared magnitude of the wavelet transform and the short-time Fourier transform,” IEEE Trans. Signal Processing, vol. 40, pp. 2498–2516, 1992.
[23] I. Daubechies, “The wavelet transform, time-frequency localization and signal analysis,” IEEE Trans. Inform. Theory, vol. 36, pp. 961–1005, 1990.
[24] S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Pattern Anal. Machine Intell., vol. 11, pp. 674–693, 1989.
[25] O. Rioul and M. Vetterli, “Wavelets and signal processing,” IEEE Signal Processing Mag., vol. 8, pp. 14–38, 1991.
[26] T. A. C. M. Claasen and W. F. G. Mecklenbrauker, “The Wigner distribution: A tool for time-frequency signal analysis. Part I: Continuous-time signals,” Philips J. Res., vol. 35, pp. 217–250, 1980.
[27] D. L. Jones and T. W. Parks, “A resolution comparison of several time-frequency representations,” IEEE Trans. Signal Processing, vol. 40, pp. 413–420, 1992.
[28] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1461–1470, 1985.
[29] H. I. Choi and W. J. Williams, “Improved time-frequency representation of multicomponent signals using exponential kernels,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp.
862–871, 1989.
[30] J. Jeong and W. J. Williams, “Kernel design for reduced interference distributions,” IEEE Trans. Signal Processing, vol. 40, pp. 402–412, 1992.
[31] E. Zwicker, “Delayed evoked oto-acoustic emissions and their suppression by Gaussian-shaped pressure impulses,” Hearing Res., vol. 11, pp. 359–371, 1983.
[32] E. Zwicker, “A hardware cochlear nonlinear preprocessing model with active feedback,” J. Acoust. Soc. Amer., vol. 80, pp. 154–162, 1986.
[33] R. D. Patterson, K. Robinson, J. Holdsworth, D. McKeown, C. Zhang, and M. Allerhand, “Complex sounds and arbitrary images,” in Auditory Physiology and Perception, Y. Cazals, L. Demany, and K. Horner, Eds. Oxford, U.K.: Pergamon, 1992, pp. 429–446.
[34] F. Grandori and P. Ravazzani, “Nonlinearities of click-evoked otoacoustic emissions and the derived nonlinear technique,” Br. J. Audiol., vol. 27, pp. 97–102, 1993.
[35] P. Ravazzani, G. Tognola, and F. Grandori, “‘Derived nonlinear’ versus ‘linear’ click-evoked otoacoustic emissions,” Audiol., vol. 35, pp. 73–86, 1996.
[36] R. Kronland-Martinet, J. Morlet, and A. Grossmann, “Analysis of sound patterns through wavelet transforms,” Int. J. Pattern Recogn. Artificial Intell., vol. 1, pp. 273–301, 1987.
[37] O. Meste, H. Rix, P. Caminal, and N. V. Thakor, “Ventricular late potentials characterization in time-frequency domain by means of a wavelet transform,” IEEE Trans. Biomed. Eng., vol. 41, pp. 625–633, 1994.
[38] N. Hess-Nielsen and M. V. Wickerhauser, “Wavelets and time-frequency analysis,” Proc. IEEE, vol. 84, pp. 523–540, 1996.
[39] B. Jawerth and W. Sweldens, “An overview of wavelet-based multiresolution analyses,” SIAM Rev., vol. 36, pp. 377–416, 1994.
[40] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA: CBMS, SIAM, 1992.
[41] M. R. Kok, G. A. van Zanten, and M. P. Brocaar, “Aspects of spontaneous otoacoustic emissions in healthy newborns,” Hearing Res., vol. 69, pp. 115–123, 1993.
[42] D. T. Kemp, P. Bray, L. Alexander, and A. M.
Brown, “Acoustic emissions cochleography: Practical aspects,” in Cochlear Mechanics and Otoacoustic Emissions, G. Cianfrone and F. Grandori, Eds., Scand. Audiol. (Suppl. 25), pp. 71–95, 1986.
[43] T. Morlet, L. Collet, B. Salle, and A. Morgon, “Functional maturation of cochlear active mechanisms and of the medial olivocochlear system in humans,” Acta Otolaryngol., vol. 113, pp. 271–277, 1993.
[44] S. T. Neely, S. J. Norton, M. P. Gorga, and W. Jesteadt, “Latency of auditory brain-stem responses and otoacoustic emissions using toneburst stimuli,” J. Acoust. Soc. Amer., vol. 83, pp. 652–656, 1988.
[45] D. Brass and D. T. Kemp, “Time-domain observation of otoacoustic emissions during constant tone stimulation,” J. Acoust. Soc. Amer., vol. 90, pp. 2415–2427, 1991.
[46] J. J. Eggermont, “Analysis of compound action potential responses to tone bursts in the human and guinea pig cochlea,” J. Acoust. Soc. Amer., vol. 60, pp. 1132–1139, 1976.
[47] M. Don and J. J. Eggermont, “Analysis of the click-evoked brainstem potentials in man using high-pass noise masking,” J. Acoust. Soc. Amer., vol. 63, pp. 1084–1092, 1978.
[48] G. Tognola, P. Ravazzani, and F. Grandori, “An optimal filtering technique to reduce the influence of low-frequency noise on click-evoked otoacoustic emissions,” Br. J. Audiol., vol. 29, pp. 153–160, 1995.
[49] B. R. Vohr, K. R. White, A. B. Maxon, and M. J. Johnson, “Factors affecting the interpretation of transient evoked otoacoustic emissions results in neonatal hearing screening,” Semin. Hearing, vol. 14, pp. 57–72, 1993.
[50] P. Avan, P. Bonfils, D. Loth, and M. François, “Temporal structure of transient evoked otoacoustic emissions: Relationship to basal cochlear function,” in Advances in Otoacoustic Emission-Fundamentals and Clinical Applications, F. Grandori, L. Collet, D. T. Kemp, G. Salomon, K. Schorn, and R. Thornton, Eds. Lecco, Italy: Casa editrice G. Stefanoni, 1994, pp. 85–94.

Gabriella Tognola was born in 1969 in Italy.
She received the M.Sc. degree in electronic engineering from the Polytechnic of Milan, Italy, in 1993. She is currently a Ph.D. degree student at the Department of Biomedical Engineering of the Polytechnic of Milan, Italy, which she joined in 1993. Her primary research interests are in techniques of signal processing for biomedical signals, time-frequency and time-scale representations, analysis and modeling of auditory functions, otoacoustic emissions, speech, and EEG.

Ferdinando Grandori was born in Milan, Italy, in 1946. He received the doctoral degree in electronic engineering from the Polytechnic of Milan, Italy, in 1970. He joined the Department of Electronics of the Polytechnic of Milan in 1970. Since 1976 he has been a Researcher of the Italian National Research Council (CNR) at the Centre of System Theory, Milan, Italy. Since 1997 he has been the Director of the CNR Centre of Biomedical Engineering. His research interests include techniques of signal processing for evoked potentials, methods of source localization for bioelectrical signals, models of auditory functions, otoacoustic emissions, and magnetic stimulation of the nervous system.

Paolo Ravazzani was born in Milan in 1961. He received the doctoral degree in electronic engineering in 1988 and the Ph.D. degree in biomedical engineering in 1996 from the Polytechnic of Milan. In 1988 he joined the Department of Electronics (now Department of Biomedical Engineering) of the Polytechnic of Milan. He is currently a Researcher of the Italian National Research Council (CNR) at the Center of Biomedical Engineering. His research concerns analysis and modeling of magnetic stimulation of the nervous system and of auditory functions, otoacoustic emissions, analysis of EEG and evoked potentials, and biomedical signal processing.