Real-time Musical Applications using Frequency Domain Signal
Processing
Zack Settel and Cort Lippe
IRCAM, 1 place Igor Stravinsky, 75004 Paris, France
University of Buffalo: Music Dept., 222 Baird Hall, Box 604700, Buffalo, New York 14260-4700
ABSTRACT
This paper presents real-time musical applications using the IRCAM Signal Processing Workstation which make use of FFT/IFFT-based resynthesis for timbral transformation in a musical context. An intuitive and straightforward user interface, intended for use by musicians, has been developed by the authors in the Max programming environment. Techniques for high quality time-stretching, filtering, cross-synthesis, dynamic range processing, and spectrum shaping are presented, along with dynamic control structures that allow for both fine timbral modification and control of complex sound transformations using few parameters.
Key words: convolution, time stretching, cross-synthesis, FFT/IFFT, ISPW, Max, spectral envelope.
1. Introduction
The Fast Fourier Transform (FFT) is a powerful general-purpose algorithm widely used in signal analysis. FFTs are useful when the spectral information of a signal is needed, such as in pitch tracking or vocoding algorithms. The FFT can be combined with the Inverse Fast Fourier Transform (IFFT) in order to resynthesize signals based on analyses. This application of the FFT/IFFT is of particular interest in electro-acoustic music because it allows for a high degree of control of a given signal's spectral information (an important aspect of timbre), allowing for flexible and efficient implementation of signal processing algorithms.
This paper presents real-time musical applications using the
IRCAM Signal Processing Workstation (ISPW) [l] which
make use of F l T m - b a s e d resynthesis for timbral transformation in a compositional context. Taking a pragmatic
approach, the authors have developed a user interfixe in the
Max programming environment [2] for the prototyping and
development of signal processing applications intended for
use by musicians. Development in the Max programming environment [3] tends to be simple and quiterapid: digitalsignal
processing @SP) programming in Max requires no compilation; control and DSP objects run on the same processor,
and the DSP library provides a wide range of unit generators, including the FFT and IFI;T modules. Teehniques for
filtering, cross-synthesis,noise reduction, and dynamic spectral shaping have been explored, as well as control structures
derived from real-time signal analyses via pitch-tracking, envelope following, noise gating, and signal compression and
expansion [4]. These real-time musical applications offer
composers an intuitiveapproach to timbral transformation in
electto-acoustic music, and new possibilitiesin the domain of
live signal processing thatpromise to be of general interest to
musicians.
2. The FFT in Real Time
Traditionally, the FFT/IFFT has been widely used outside of real-time for various signal analysis/re-synthesis applications that modify the durations and spectra of pre-recorded sound [5]. With the ability to use the FFT/IFFT in real-time, live signal-processing in the spectral domain becomes possible, offering attractive alternatives to standard time-domain signal processing techniques. Some of these alternatives offer a great deal of power, run-time economy, and flexibility, as compared with standard time-domain techniques [6]. In addition, the FFT offers both a high degree of precision in the spectral domain, and straightforward means for exploitation of this information. Finally, since real-time use of the FFT has been prohibitive for musicians in the past due to computational limitations of computer music systems, this research offers some relatively new possibilities in the domain of real time.
2.1. Algorithms and basic operations
All of the signal processing applications discussed in this paper modify incoming signals and are based on the same general DSP configuration. The DSP configuration includes the following basic steps: (1) transformation of the input signals into the spectral domain using the FFT; (2) operations on the signals' spectra; and (3) resynthesis of the modified spectra using the IFFT. Operations in the spectral domain (FFT data in the form of rectangular coordinates) include applying functions (often stored in tables), convolution (complex multiplication), addition, taking the square root (used in obtaining an amplitude spectrum), noise gating (of both frequency and amplitude), and compression and expansion. Differences in the choice of spectral domain operations, kinds of input signals used, and signal routing determine the nature of a given application: small changes to the topology of the DSP configuration can result in significant changes to its functionality. Thus, we are able to reuse much of our code in diverse
applications. (For example, though functionally dissimilar, high-resolution filtering and subtractive synthesis differ only slightly in terms of their implementation.)
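The three-step configuration above can be sketched outside the ISPW. The following is a minimal numpy illustration (not the authors' Max/ISPW implementation), with a `modify` callback standing in for step (2), and windowed overlap-add handling the FFT/IFFT framing:

```python
import numpy as np

def spectral_process(x, frame=1024, hop=256, modify=lambda X: X):
    """FFT -> spectral-domain operation -> IFFT, with windowed overlap-add.

    `modify` stands in for step (2): any operation on a frame's spectrum.
    """
    win = np.hanning(frame)
    y = np.zeros(len(x) + frame)
    for start in range(0, len(x) - frame + 1, hop):
        X = np.fft.fft(x[start:start + frame] * win)        # (1) to spectral domain
        Y = modify(X)                                        # (2) operate on spectrum
        y[start:start + frame] += np.real(np.fft.ifft(Y)) * win  # (3) resynthesis
    # compensate the constant gain of the overlapped squared window
    return y[:len(x)] * (hop / np.sum(win ** 2))
```

Swapping the body of `modify` (gating, convolution with another spectrum, compression, and so on) changes the application while the surrounding configuration stays fixed, which mirrors the code-reuse point made above.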
3. Applications
3.1. Complex spectral envelopes and cross synthesis
A static spectral envelope used in a simple filtering application or in subtractive synthesis can be drawn by hand or obtained through signal analysis. In cross synthesis, a continuously changing (dynamic) filter can be created by doing an FFT analysis of a signal (signal B) and extracting its spectral envelope, or amplitude spectrum, which describes how another signal (signal A) will be filtered. Thus, the pitch/phase information of signal A and the time-varying spectral envelope of signal B are combined to form the output signal. Audio signals produced by standard signal processing modules such as frequency modulation (FM) are of particular interest for cross synthesis because they can produce rich, easily modified, smoothly and nonlinearly varying spectra [7] which can yield complex time-varying spectral envelopes. Other standard signal processing techniques, such as amplitude modulation (AM), additive synthesis, or band-pass filtering, offer rich varying spectral information using relatively simple means with few control parameters. One of the advantages of using standard modules is that electronic musicians are familiar with them, and have a certain degree of control and understanding of their spectra.
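For a single analysis frame, this combination of signal B's amplitude information with signal A's pitch/phase information can be sketched in a few lines of numpy. Note that the sketch uses B's raw amplitude spectrum as a coarse stand-in for an extracted spectral envelope:

```python
import numpy as np

def cross_synth_frame(a, b):
    """One FFT frame of cross synthesis: signal B's amplitude spectrum
    shapes the output, while signal A contributes the phase information."""
    A = np.fft.fft(a)
    B = np.fft.fft(b)
    # B's magnitudes combined with A's phases; the product stays
    # Hermitian-symmetric for real inputs, so the IFFT is real
    return np.real(np.fft.ifft(np.abs(B) * np.exp(1j * np.angle(A))))
```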
It should also be noted that interesting transformations can be produced by simply convolving signal A's spectrum with signal B's spectrum (keeping both the phase and amplitude information of signal B as well as signal A). In this case, the phase (frequency) and amplitude information from each signal figures in the output signal.
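A minimal sketch of this spectral-domain convolution: a per-bin complex product, which keeps both signals' amplitude and phase information, and is equivalent to circular convolution of the time-domain signals:

```python
import numpy as np

def spectral_convolve(a, b):
    """Convolution as per-bin complex multiplication of the two spectra;
    both signals' amplitudes and phases figure in the output."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
```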
3.2. Mapping qualities of one signal to another
Musically, we have found that in some cases, the relationship
between signal A and signal B can become much more unified
if certain parameters of signal A are used to control signal B.
In other words, real-time continuous control parameters can
be derived from, for instance, signal A, and used to control
signal B. For example, the pitch of signal A can be tracked
and applied to signal B (an FM pair) to control the two oscillators' frequencies. Envelope following of signal A can
yield expressive information which can be used to control the
intensity of frequency modulation (FM index) of signal B.
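This kind of control mapping can be sketched with a hypothetical one-pole envelope follower driving the index of a simple FM pair (an illustration only, not the ISPW's analysis objects):

```python
import numpy as np

def envelope_follow(x, coeff=0.99):
    """One-pole envelope follower: a smoothed measure of |x| usable as a
    real-time control signal (here, driving an FM index)."""
    env = np.empty(len(x))
    e = 0.0
    for i, v in enumerate(np.abs(x)):
        e = coeff * e + (1.0 - coeff) * v
        env[i] = e
    return env

def fm_pair(freq_c, freq_m, index, n, sr=44100):
    """An FM oscillator pair; `index` may be an array, i.e. a control signal."""
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * freq_c * t + index * np.sin(2 * np.pi * freq_m * t))
```

A louder signal A thus yields a larger index and a brighter, richer spectrum from signal B's FM pair.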
3.3. Frequency dependent spatialization
In the spectral domain, the phases of a given signal's frequency components can be independently rotated in order to change the components' energy distribution in the real and imaginary parts of the output signal. Since the real and imaginary parts of the IFFT's output can be assigned to separate output channels, which are in turn connected to different loudspeakers, it is possible to control a given frequency's energy level in each loudspeaker using phase rotation.
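The effect can be demonstrated in numpy: multiplying a component's bin (and its mirrored bin) by e^(i*theta) redistributes that component's energy between the real and imaginary parts of the IFFT output, i.e. between the two channels. This is a single-frame sketch, not the ISPW implementation:

```python
import numpy as np

def rotate_component(x, k, theta):
    """Rotate the phase of frequency component k by theta, then split the
    complex IFFT output: real part -> one speaker, imaginary part -> the other."""
    X = np.fft.fft(x)
    X[k] *= np.exp(1j * theta)
    X[-k] *= np.exp(1j * theta)   # rotate the mirrored bin identically
    y = np.fft.ifft(X)            # complex output: two channels
    return y.real, y.imag
```

For a cosine component, theta = 0 sends all of its energy to the first channel and theta = pi/2 sends all of it to the second; intermediate angles pan between them as cos^2(theta) / sin^2(theta).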
3.4. Band-limited energy-dependent noise-gate
In the spectral domain, the energy of a given signal's frequency components can be independently modified. Our
noise reduction algorithm is based on a technique [8] that
allows independent amplitude gating threshold levels to be
specified for up to 512 frequency band-limited regions of a
given signal. With a user-defined transfer function, the energy in a given frequency range can be altered based on its
intensity and a user-specified threshold level. This technique,
besides being potentially useful for noise reduction, can be
exaggerated in order to create unusual spectral transformations of input signals, resembling extreme chorusing effects.
Using non-linear transfer functions, it is possible to modify
the relative intensities of the input's frequency components,
allowing for example, masked or less important components
to be emphasized and brought to the aural foreground.
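A reduced sketch of the idea follows, with the user-defined transfer function simplified to a hard per-bin threshold; the actual technique of [8] operates on up to 512 band-limited regions with arbitrary transfer functions:

```python
import numpy as np

def spectral_gate(x, thresholds):
    """Gate each frequency bin against its own threshold: energy below the
    per-band threshold is removed (the simplest possible transfer function)."""
    X = np.fft.fft(x)
    mag = np.abs(X) / len(x)           # normalised per-bin amplitude
    X[mag < thresholds] = 0.0          # kill bins under their threshold
    return np.real(np.fft.ifft(X))
```

Replacing the hard cut with a non-linear transfer function on `mag` gives the compression/expansion behaviour described above.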
3.5. Band-limited frequency-dependent noise-gate
Similar to the noise-gate described above, this module functions independently of gain; its output depends on the stability of the frequency components of the input signal. Using a technique borrowed from phase-vocoding [6], time-varying
frequency differences of components in a given band-limited
region of the spectrum are used to determine the stability of
those components. Pitched components in the input signal
tend to be stable and can thus be independently boosted or
attenuated.
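The stability measure can be sketched as the deviation of each bin's measured inter-frame phase advance from the advance expected at the bin centre (a simplified reading of the phase-vocoder technique in [6]; stable, pitched components show near-zero deviation):

```python
import numpy as np

def stability(x, frame=256, hop=64):
    """Per-bin deviation of the measured phase advance between two
    successive frames from the bin-centre advance; near zero = stable."""
    X0 = np.fft.fft(x[:frame])
    X1 = np.fft.fft(x[hop:hop + frame])
    k = np.arange(frame)
    expected = 2 * np.pi * k * hop / frame
    dphi = np.angle(X1) - np.angle(X0) - expected
    return np.abs((dphi + np.pi) % (2 * np.pi) - np.pi)   # wrap to [-pi, pi]
```

Bins whose deviation stays under a threshold can then be boosted or attenuated independently of their amplitude.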
3.6. Improvements to frequency domain-based time expansion/compression techniques
Useful techniques for sound manipulation in the frequency domain are proposed by the phase vocoder [9, 10]. One particularly useful application of this technique, popularly referred to as time stretching, allows a sound's length to be modified independently of its pitch (frequency content). However, this application has two particular shortcomings which have been revealed in practice, especially when a sound's duration is lengthened. The first problem is that transient components, normally quite brief, become "unusually long". An additional problem is that noise components tend to become "less agitated", since their rate of change is not retained during time stretching. These shortcomings can be clearly heard, for example, when time stretching speech. A spoken text rapidly loses intelligibility when: A) transients are protracted; consonants
are no longer recognizable and lose their functional role as articulators; and B) noise components are "de-agitated"; sibilant components such as "sh" lose their distinctive "windy" noisy quality.
In order to overcome these two problems, we have made the following additions to our phase vocoding algorithm: 1) Selective time stretching, which acts only on non-transient portions of the signal. 2) A resynthesis stage that reproduces the original rate of change (with respect to frequency) of noise components, using statistical approximations.
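For reference, a bare-bones phase-vocoder time stretch is sketched below: each bin's inter-frame phase advance is rescaled by the stretch factor, with analysis hop `hop` and synthesis hop `stretch * hop`. This minimal version deliberately omits the transient and noise refinements just described, and so exhibits exactly the shortcomings discussed above:

```python
import numpy as np

def time_stretch(x, stretch, frame=1024, hop=256):
    """Minimal phase-vocoder time stretch (no transient/noise handling)."""
    win = np.hanning(frame)
    hop_s = int(round(hop * stretch))                # synthesis hop
    expected = 2 * np.pi * np.arange(frame) * hop / frame
    n_frames = (len(x) - frame) // hop + 1
    spectra = [np.fft.fft(x[i * hop:i * hop + frame] * win)
               for i in range(n_frames)]
    y = np.zeros((n_frames - 1) * hop_s + frame)
    phase = np.angle(spectra[0])
    y[:frame] += np.real(np.fft.ifft(spectra[0])) * win
    for i in range(1, n_frames):
        # deviation of the measured phase advance from the bin-centre advance
        d = np.angle(spectra[i]) - np.angle(spectra[i - 1]) - expected
        d = (d + np.pi) % (2 * np.pi) - np.pi        # wrap to [-pi, pi]
        phase += (expected + d) * stretch            # rescaled phase advance
        out = np.real(np.fft.ifft(np.abs(spectra[i]) * np.exp(1j * phase)))
        y[i * hop_s:i * hop_s + frame] += out * win
    return y
```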
3.7. Combining processing modules
The modules described above may be placed in series in order to perform multiple processing operations on a sound. Since most of the cost of processing occurs in time/frequency domain conversion (FFT and IFFT), modules operating in the frequency domain may be ganged together with great efficiency, requiring no additional steps of conversion. For example, the 512-band filter module described above may be easily connected to another module, providing the additional possibility of high resolution filtering.
3.8. Input scaling and additional control parameters for cross-synthesis
When two input signals of similar intensities and spectral distributions (e.g. two singers' voices) are convolved, the resulting spectral distribution can be quite different. In such a case, strong "low-mid-range" components typically become much louder, while weaker higher-frequency components virtually disappear. For example, when convolving a signal with itself, the resulting energy of a given component is its square, as shown below (the energy of signal x is noted < x >):

< y > = < x >^2   if y = x * x
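The squaring relation is easy to verify numerically. In the sketch below, raw FFT bins stand in for the spectral components: self-convolution squares each bin's magnitude, and taking the square root restores the original magnitudes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
X = np.fft.fft(x)

Y = X * X                      # self-convolution: per-bin complex product
# each component's magnitude is now the square of the original...
restored = np.sqrt(np.abs(Y))  # ...and square-root scaling recovers |X|
```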
By scaling the input or output signal(s), the "original" spectral distribution can be preserved. If, in the above example, we take the square root of the output signal, a given component's energy is now the logarithmic mean of the energy of its input signals, and not the square. In this case we recover the original input signal:

< y > = sqrt( < x >^2 ) = < x >

3.9. Including dynamic range processing in cross-synthesis algorithms
The algorithms for cross-synthesis discussed above can be expanded to include additional processing units that allow for scaling and provide more control parameters. A compressor/expander based on the energy-dependent noise-gate mentioned earlier is applied to one of the two input signals before convolution [11]. The choice of compression/expansion functions determines the way the energy in the output signal will be scaled; certain choices (such as a square-root function) help preserve the spectral distribution of the input signals. Thus, two input signals of similar intensities and spectral distributions will combine to form a spectrum whose components' relative energies remain little changed. By choosing a compression/expansion function that boosts or cuts the energy of weaker components in one of the input signals, the degree of spectral intersection of the two inputs can be modified. The compression/expansion ratio parameter (or intensity of compression/expansion) provides dynamic control over the "degree of spectral intersection" of the two input signals. This parameter is particularly useful when cross-synthesizing dissimilar sounds such as voice and percussion.

4. Future Directions
The authors are currently working on alternative methods of sampling and granular synthesis that operate in the frequency domain, based on real-time phase vocoding [12]. At present we are able to modify a sound's spectrum and duration independently, and are working towards being able to perform pitch transposition independently of the spectral envelope (formant structure), thus allowing one to change the pitch of a sound without seriously altering its timbral quality. Additionally, we are exploring techniques for smooth sample looping and cross-fading between sounds.

5. Summary
With the arrival of the real-time FFT/IFFT in flexible, relatively general, and easily programmable DSP/control environments such as Max, non-engineers may begin to explore new possibilities in signal processing. Though our work is still at an initial stage, we have gained some valuable practical experience in manipulating sounds in the spectral domain. Real-time convolution can be quite straightforward and is a powerful tool for transforming sounds. The flexibility with which spectral transformations can be done is appealing. Our DSP configuration is fairly simple, and changes to its topology and parameters can be made quickly. Control signals resulting from detection and tracking of musical parameters offer composers and performers a rich palette of possibilities, lending themselves equally well to studio and live performance applications.
Acknowledgments
The authors would like to thank Miller Puckette, Bennett Smith, and Stefan Bilbao for their invaluable technical and musical insights.
References
1. E. Lindemann, M. Starkier, and F. Dechelle. The architecture of the IRCAM Music Workstation. Computer Music J., 15(3):41-49, 1991.
2. M. Puckette. The Patcher. Proc. of International Computer Music Conference, 1988.
3. M. Puckette. FTS: A real-time monitor for multiprocessor music synthesis. Proc. of International Computer Music Conference, 1991.
4. C. Lippe and M. Puckette. Musical performance using the IRCAM workstation. Proc. of International Computer Music Conference, 1991.
5. R. Haddad and T. Parsons. Digital Signal Processing: Theory, Applications and Hardware. Computer Science Press, New York, 1991.
6. J. Gordon and J. Strawn. An introduction to the phase vocoder. Technical report, CCRMA, Department of Music, Stanford University, Feb 1987.
7. J. Chowning. The synthesis of complex audio spectra by means of frequency modulation. J. Audio Eng. Soc., 21(7):526-534, 1973.
8. J. A. Moorer and M. Berger. Linear-phase bandsplitting: Theory and applications. J. Audio Eng. Soc., 34(3):143-152, 1986.
9. M. Dolson. The phase vocoder: A tutorial. Computer Music J., 10(4):14-27, 1986.
10. R. Nieberle and M. Warstat. Implementation of an analysis/synthesis system on a DSP56001 for general purpose sound processing. Proc. of International Computer Music Conference, 1992.
11. G. W. McNally. Dynamic range control of digital audio signals. J. Audio Eng. Soc., 32(5), May 1984.
12. E. van der Heide. Private communication, 1993.
13. B. K. Smith. Private communication, 1994.