
Real-time Musical Applications using Frequency Domain Signal Processing

1995, Proceedings of the 1995 Workshop on Applications of Signal Processing to Audio and Acoustics

Zack Settel† and Cort Lippe‡

† Ircam, 1 place Igor Stravinsky, 75004 Paris, France
‡ University of Buffalo, Music Dept., 222 Baird Hall, Box 604700, Buffalo, New York 14260-4700

ABSTRACT

This paper presents real-time musical applications using the IRCAM Signal Processing Workstation which make use of FFT/IFFT-based resynthesis for timbral transformation in a musical context. An intuitive and straightforward user interface, intended for use by musicians, has been developed by the authors in the Max programming environment. Techniques for high-quality time-stretching, filtering, cross-synthesis, dynamic range processing, and spectrum shaping are presented, along with dynamic control structures that allow for both fine timbral modification and control of complex sound transformations using few parameters.

Key words: convolution, time stretching, cross-synthesis, FFT/IFFT, ISPW, Max, spectral envelope.

1. Introduction

The Fast Fourier Transform (FFT) is a powerful general-purpose algorithm widely used in signal analysis. FFTs are useful when the spectral information of a signal is needed, such as in pitch tracking or vocoding algorithms. The FFT can be combined with the Inverse Fast Fourier Transform (IFFT) in order to resynthesize signals based on analyses. This application of the FFT/IFFT is of particular interest in electro-acoustic music because it allows for a high degree of control of a given signal's spectral information (an important aspect of timbre), allowing for flexible and efficient implementation of signal processing algorithms. This paper presents real-time musical applications using the IRCAM Signal Processing Workstation (ISPW) [1] which make use of FFT/IFFT-based resynthesis for timbral transformation in a compositional context. Taking a pragmatic approach, the authors have developed a user interface in the Max programming environment [2] for the prototyping and development of signal processing applications intended for use by musicians. Development in the Max programming environment [3] tends to be simple and quite rapid: digital signal processing (DSP) programming in Max requires no compilation; control and DSP objects run on the same processor; and the DSP library provides a wide range of unit generators, including the FFT and IFFT modules. Techniques for filtering, cross-synthesis, noise reduction, and dynamic spectral shaping have been explored, as well as control structures derived from real-time signal analyses via pitch-tracking, envelope following, noise gating, and signal compression and expansion [4]. These real-time musical applications offer composers an intuitive approach to timbral transformation in electro-acoustic music, and new possibilities in the domain of live signal processing that promise to be of general interest to musicians.

2. The FFT in Real Time

Traditionally, the FFT/IFFT has been widely used outside of real time for various signal analysis/re-synthesis applications that modify the durations and spectra of pre-recorded sound [5]. With the ability to use the FFT/IFFT in real time, live signal-processing in the spectral domain becomes possible, offering attractive alternatives to standard time-domain signal processing techniques. Some of these alternatives offer a great deal of power, run-time economy, and flexibility, as compared with standard time-domain techniques [6]. In addition, the FFT offers both a high degree of precision in the spectral domain and straightforward means for exploitation of this information. Finally, since real-time use of the FFT has been prohibitive for musicians in the past due to computational limitations of computer music systems, this research offers some relatively new possibilities in the domain of real time.

2.1. Algorithms and basic operations

All of the signal processing applications discussed in this paper modify incoming signals and are based on the same general DSP configuration. The DSP configuration includes the following basic steps: (1) transformation of the input signals into the spectral domain using the FFT, (2) operations on the signals' spectra, and (3) resynthesis of the modified spectra using the IFFT. Operations in the spectral domain (FFT data in the form of rectangular coordinates) include applying functions (often stored in tables), convolution (complex multiplication), addition, taking the square root (used in obtaining an amplitude spectrum), noise gating (of both frequency and amplitude), and compression and expansion. Differences in the choice of spectral domain operations, kinds of input signals used, and signal routing determine the nature of a given application: small changes to the topology of the DSP configuration can result in significant changes to its functionality. Thus, we are able to reuse much of our code in diverse applications. (For example, though functionally dissimilar, high-resolution filtering and subtractive synthesis differ only slightly in terms of their implementation.)
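(To make the three-step configuration concrete, here is a minimal sketch in Python/NumPy of the generic analysis/modify/resynthesis loop with windowed overlap-add. It is an illustration only, not the authors' ISPW/Max implementation; the function and parameter names are invented, and the constant gain introduced by the overlapping windows is ignored for brevity.)

```python
import numpy as np

def process_stft(x, modify, n_fft=1024, hop=256):
    """Generic loop: (1) FFT of each windowed frame, (2) an arbitrary
    operation on the spectrum, (3) IFFT and overlap-add resynthesis."""
    win = np.hanning(n_fft)
    y = np.zeros(len(x))
    for i in range(0, len(x) - n_fft, hop):
        spectrum = np.fft.rfft(x[i:i + n_fft] * win)    # (1) to spectral domain
        spectrum = modify(spectrum)                     # (2) operate on the spectrum
        y[i:i + n_fft] += np.fft.irfft(spectrum) * win  # (3) resynthesize, overlap-add
    return y

# Example: a static spectral envelope (a filter shape that could be drawn
# by hand or stored in a table), applied by per-bin multiplication.
# gains = np.linspace(1.0, 0.0, 1024 // 2 + 1)   # hypothetical low-pass shape
# y = process_stft(x, lambda s: s * gains)
```

Because the per-frame operation is a single function of one spectrum, swapping it out is exactly the small change in topology described above: the surrounding analysis/resynthesis code is reused unchanged.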
3. Applications

3.1. Complex spectral envelopes and cross synthesis

A static spectral envelope used in a simple filtering application or in subtractive synthesis can be drawn by hand or obtained through signal analysis. In cross synthesis, a continuously changing (dynamic) filter can be created by doing an FFT analysis of a signal (signal B) and extracting its spectral envelope, or amplitude spectrum, which describes how another signal (signal A) will be filtered. Thus, the pitch/phase information of signal A and the time-varying spectral envelope of signal B are combined to form the output signal. Audio signals produced by standard signal processing modules such as frequency modulation are of particular interest for cross synthesis because they can produce rich, easily modified, smoothly and nonlinearly varying spectra [7] which can yield complex time-varying spectral envelopes. Other standard signal processing techniques, such as amplitude modulation (AM), additive synthesis, or band-pass filtering, offer rich, varying spectral information using relatively simple means with few control parameters. One of the advantages of using standard modules is that electronic musicians are familiar with them, and have a certain degree of control and understanding of their spectra. It should also be noted that interesting transformations can be produced by simply convolving signal A's spectrum with signal B's spectrum (keeping both the phase and amplitude information of signal B as well as signal A). In this case, the phase (frequency) and amplitude information from each signal figures in the output signal.
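(Both variants reduce to a few lines per analysis frame. The sketch below is hypothetical Python/NumPy, meant to slot into an analysis/resynthesis loop like the one in section 2.1 with both inputs analyzed frame by frame; the names are invented for illustration.)

```python
import numpy as np

def cross_synth(spec_a, spec_b, eps=1e-12):
    """Signal B's amplitude spectrum filters signal A: keep A's phases
    (reduced to unit-magnitude phasors) and impose B's magnitudes."""
    phases_a = spec_a / np.maximum(np.abs(spec_a), eps)
    return np.abs(spec_b) * phases_a

def spectral_product(spec_a, spec_b):
    """Complex multiplication of the two spectra (the text's convolution
    operation), which convolves the two time-domain signals: phase and
    amplitude information of both inputs figure in the output."""
    return spec_a * spec_b
```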
3.2. Mapping qualities of one signal to another

Musically, we have found that in some cases the relationship between signal A and signal B can become much more unified if certain parameters of signal A are used to control signal B. In other words, real-time continuous control parameters can be derived from, for instance, signal A, and used to control signal B. For example, the pitch of signal A can be tracked and applied to signal B (an FM pair) to control the two oscillators' frequencies. Envelope following of signal A can yield expressive information which can be used to control the intensity of frequency modulation (FM index) of signal B.

3.3. Frequency dependent spatialization

In the spectral domain, the phases of a given signal's frequency components can be independently rotated in order to change each component's energy distribution in the real and imaginary parts of the output signal. Since the real and imaginary parts of the IFFT's output can be assigned to separate output channels, which are in turn connected to different loudspeakers, it is possible to control a given frequency's energy level in each loudspeaker using phase rotation.

3.4. Band-limited energy dependent noise-gate

In the spectral domain, the energy of a given signal's frequency components can be independently modified. Our noise reduction algorithm is based on a technique [8] that allows independent amplitude gating threshold levels to be specified for up to 512 frequency band-limited regions of a given signal. With a user-defined transfer function, the energy in a given frequency range can be altered based on its intensity and a user-specified threshold level. This technique, besides being potentially useful for noise reduction, can be exaggerated in order to create unusual spectral transformations of input signals, resembling extreme chorusing effects. Using non-linear transfer functions, it is possible to modify the relative intensities of the input's frequency components, allowing, for example, masked or less important components to be emphasized and brought to the aural foreground.
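(A per-frame sketch of the gate, in hypothetical Python/NumPy with invented names: each region is scaled by a user-defined transfer function of its amplitude relative to that region's threshold. For simplicity each FFT bin stands in for one band-limited region; a 1024-point analysis gives roughly 512 such regions, matching the count above.)

```python
import numpy as np

def energy_gate(spectrum, thresholds, transfer):
    """Scale each frequency region by a user-defined transfer function of
    its amplitude relative to an independent per-region threshold.
    `thresholds` is a scalar or an array matching the spectrum length."""
    ratio = np.abs(spectrum) / thresholds
    return spectrum * transfer(ratio)

# A hard noise gate: regions below their threshold are silenced.
hard_gate = lambda r: (r >= 1.0).astype(float)

# An exaggerated non-linear transfer that boosts weak (masked) components
# toward the aural foreground, as suggested in the text.
boost_weak = lambda r: np.where(r < 1.0, 2.0, 1.0)
```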
3.5. Band-limited frequency dependent noise-gate

Similar to the noise-gate described above, this module functions independently of gain; its output depends on the stability of the frequency components of the input signal. Using a technique borrowed from phase-vocoding [6], time-varying frequency differences of components in a given band-limited region of the spectrum are used to determine the stability of those components. Pitched components in the input signal tend to be stable and can thus be independently boosted or attenuated.

3.6. Improvements to frequency domain-based time expansion/compression techniques

Useful techniques for sound manipulation in the frequency domain are proposed by the phase vocoder [9, 10]. One particularly useful application of this technique, popularly referred to as time stretching, allows a sound's length to be modified independently of its pitch (frequency content). However, this application has two particular shortcomings which have been revealed in practice, especially when a sound's duration is lengthened. The first problem is that transient components, normally quite brief, become "unusually long". An additional problem is that noise components tend to become "less agitated", since their rate of change is not retained during time stretching. These shortcomings can be clearly heard, for example, when time stretching speech. A spoken text rapidly loses intelligibility when: A) transients are protracted, so that consonants are no longer recognizable and lose their functional role as articulators; and B) noise components are "de-agitated", so that sibilant components such as "sh" lose their distinctive "windy", noisy quality. In order to overcome these two problems, we have made the following additions to our phase vocoding algorithm: 1) selective time stretching, which acts only on non-transient portions of the signal; and 2) a resynthesis stage that reproduces the original rate of change (with respect to frequency) of noise components, using statistical approximations.
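(For reference, here is a bare-bones phase-vocoder time stretch in hypothetical Python/NumPy, without the two refinements just described: it stretches everything, transients included, and makes no attempt to re-agitate noise components. Analysis frames are taken at one hop size and resynthesized at another, while each bin's measured instantaneous frequency is accumulated so that pitch is unchanged.)

```python
import numpy as np

def time_stretch(x, rate, n_fft=1024, hop_a=256):
    """Stretch x by `rate` (rate > 1 lengthens) at constant pitch."""
    hop_s = int(round(hop_a * rate))                 # synthesis hop
    win = np.hanning(n_fft)
    bin_freqs = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft  # rad/sample
    n_frames = max((len(x) - n_fft) // hop_a, 0)
    y = np.zeros(n_frames * hop_s + n_fft)
    prev_phase = np.zeros(n_fft // 2 + 1)            # first estimate is rough
    phase_acc = np.zeros(n_fft // 2 + 1)
    for t in range(n_frames):
        spec = np.fft.rfft(win * x[t * hop_a: t * hop_a + n_fft])
        phase = np.angle(spec)
        # deviation of each bin's phase advance from its nominal value,
        # wrapped to [-pi, pi], gives the bin's true instantaneous frequency
        dphi = phase - prev_phase - bin_freqs * hop_a
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        true_freq = bin_freqs + dphi / hop_a
        phase_acc += true_freq * hop_s               # advance at the new rate
        prev_phase = phase
        frame = np.fft.irfft(np.abs(spec) * np.exp(1j * phase_acc))
        y[t * hop_s: t * hop_s + n_fft] += win * frame
    return y
```

The paper's refinements would sit around a loop of this kind: detected transients could bypass the stretch (hop_s = hop_a for those frames), and noise components could have their original rate of change restored by statistical approximation at the resynthesis stage.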
3.7. Combining processing modules

The modules described above may be placed in series in order to perform multiple processing operations on a sound. Since most of the cost of processing occurs in time/frequency domain conversion (FFT and IFFT), modules operating in the frequency domain may be ganged together with great efficiency, requiring no additional steps of conversion. For example, the 512-band filter module described above may be easily connected to another module, providing the additional possibility of high-resolution filtering.

3.8. Input scaling and additional control parameters for cross-synthesis

When two input signals of similar intensities and spectral distributions (e.g., two singers' voices) are convolved, the resulting spectral distribution can be quite different. In such a case, strong "low-mid-range" components typically become much louder, while weaker higher-frequency components virtually disappear. For example, when convolving a signal with itself, the resulting energy of a given component is its square, as shown below (the energy of signal $x$ is noted $\langle x \rangle$):

$$\langle y \rangle = \langle x \rangle^{2} \qquad \text{if } y = x \ast x \ \ (\ast \text{ denotes convolution}) \quad [11].$$

By scaling the input or output signal(s), the "original" spectral distribution can be preserved. If, in the above example, we take the square root of the output signal, a given component's energy is now the geometric mean of the energies of its input signals, and not the square. In this case we recover the original input signal:

$$\langle y \rangle = \sqrt{\langle x \rangle^{2}} = \langle x \rangle.$$

3.9. Including dynamic range processing in cross-synthesis algorithms

The algorithms for cross-synthesis discussed above can be expanded to include additional processing units that allow for scaling and provide more control parameters. A compressor/expander based on the energy-dependent noise-gate mentioned earlier is applied to one of the two input signals before convolution. The choice of compression/expansion functions determines the way the energy in the output signal will be scaled; certain choices (such as a square root function) help preserve the spectral distribution of the input signals. Thus, two input signals of similar intensities and spectral distributions will combine to form a spectrum whose components' relative energies remain little changed. By choosing a compression/expansion function that boosts or cuts the energy of weaker components in one of the input signals, the degree of spectral intersection of the two inputs can be modified. The compression/expansion ratio parameter (or intensity of compression/expansion) provides dynamic control over the "degree of spectral intersection" of the two input signals. This parameter is particularly useful when cross-synthesizing dissimilar sounds such as voice and percussion.

4. Future Directions

The authors are currently working on alternative methods of sampling and granular synthesis that operate in the frequency domain, based on real-time phase vocoding [12]. At present we are able to modify a sound's spectrum and duration independently, and are working towards being able to perform pitch transposition independently of the spectral envelope (formant structure), thus allowing one to change the pitch of a sound without seriously altering its timbral quality. Additionally, we are exploring techniques for smooth sample looping and cross-fading between sounds.

5. Summary

With the arrival of the real-time FFT/IFFT in flexible, relatively general, and easily programmable DSP/control environments such as Max, non-engineers may begin to explore new possibilities in signal processing. Though our work is still at an initial stage, we have gained some valuable practical experience in manipulating sounds in the spectral domain. Real-time convolution can be quite straightforward and is a powerful tool for transforming sounds. The flexibility with which spectral transformations can be done is appealing. Our DSP configuration is fairly simple, and changes to its topology and parameters can be made quickly. Control signals resulting from detection and tracking of musical parameters offer composers and performers a rich palette of possibilities, lending themselves equally well to studio and live performance.

Acknowledgments

The authors would like to thank Miller Puckette, Bennett Smith and Stefan Bilbao for their invaluable technical and musical insights.

References

1. E. Lindemann, M. Starkier, and F. Dechelle. The architecture of the IRCAM music workstation. Computer Music J., 15(3):41-49, 1991.
2. M. Puckette. The patcher. Proc. of International Computer Music Conference, 1988.
3. M. Puckette. FTS: A real-time monitor for multiprocessor music synthesis. Proc. of International Computer Music Conference, 1991.
4. C. Lippe and M. Puckette. Musical performance using the IRCAM workstation. Proc. of International Computer Music Conference, 1991.
5. R. Haddad and T. Parsons. Digital Signal Processing: Theory, Applications and Hardware. Computer Science Press, New York, 1991.
6. J. Gordon and J. Strawn. An introduction to the phase vocoder. Technical report, CCRMA, Department of Music, Stanford University, Feb 1987.
7. J. Chowning. The synthesis of complex audio spectra by means of frequency modulation. J. Audio Eng. Soc., 21(7):526-534, 1973.
8. J. A. Moorer and M. Berger. Linear-phase bandsplitting: Theory and applications. J. Audio Eng. Soc., 34(3):143-152, 1986.
9. M. Dolson. The phase vocoder: A tutorial. Computer Music J., 10(4):14-27, 1986.
10. R. Nieberle and M. Warstat. Implementation of an analysis/synthesis system on a DSP56001 for general purpose sound processing. Proc. of International Computer Music Conference, 1992.
11. G. W. McNally. Dynamic range control of digital audio signals. J. Audio Eng. Soc., 32(5), May 1984.
12. E. van der Heide. Private communication, 1993.
13. B. K. Smith. Private communication, 1994.