
Chapter 1

Fundamentals of digital audio processing

Federico Avanzini and Giovanni De Poli

Copyright © 2005-2019 Federico Avanzini and Giovanni De Poli, except for paragraphs labeled as adapted from <reference>.

This book is licensed under the CreativeCommons Attribution-NonCommercial-ShareAlike 3.0 license. To
view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/, or send a letter to
Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.

1.1 Introduction
The purpose of this chapter is to provide the reader with fundamental concepts of digital signal processing, which will be used extensively in the remainder of the book. Since the focus is on audio signals, all the examples deal with sound. Those who are already fluent in DSP may skip this chapter.

1.2 Discrete-time signals and systems


1.2.1 Discrete-time signals
Signals play an important role in our daily life. Examples of signals that we encounter frequently are speech, music, picture and video signals. A signal is a function of independent variables such as time, distance, position, temperature and pressure. For example, speech and music signals represent air pressure as a function of time at a point in space.
Most signals we encounter are generated by natural means. However, a signal can also be generated synthetically or by computer simulation. In this chapter we will focus our attention on a particular class of signals: the so-called discrete-time signals. This class of signals is the most important way to describe and model sound signals with the aid of the computer.

1.2.1.1 Main definitions


We define a signal x as a function x : D → C from a domain D to a codomain C. For our purposes
the domain D represents a time variable, although it may have different meanings (e.g. it may represent
spatial variables). A signal can be classified based on the nature of D and C. In particular these sets can
1-2 Algorithms for Sound and Music Computing [v.February 2, 2019]

[Figure: four panels plotting amplitude (mV) against t (s).]
Figure 1.1: (a) Analog, (b) quantized-analog, (c) discrete-time, and (d) numerical signals.

be countable or non-countable. Moreover, C may be a subset of R or C, i.e. the signal x may be either a
real-valued or a complex-valued function.
When D = R we talk of continuous-time signals x(t), where t ∈ R, while when D = Z we talk of
discrete-time signals x[n]. In this latter case n ∈ Z identifies discrete time instants tn: the most common
and important example is tn = nTs, where Ts is a fixed quantity. In many practical applications a
discrete-time signal xd is obtained by periodically sampling a continuous-time signal xc, as follows:

xd[n] = xc(nTs), −∞ < n < ∞. (1.1)

The quantity Ts is called the sampling period, measured in s. Its reciprocal is the sampling frequency,
measured in Hz, and is usually denoted as Fs = 1/Ts. Note also the use of square brackets in the notation
for a discrete-time signal x[n], which avoids ambiguity with the notation x(t) used for a continuous-time
signal.
When C = R we talk of continuous-amplitude signals, while when C = Z we talk of discrete-amplitude signals. Typically the range of a discrete-amplitude signal is a finite set of M values {xk}, k = 1, . . . , M, and the most common example is that of a uniformly quantized signal with xk = kq (where q is called the quantization step).
By combining the above options we obtain the following classes of signals, depicted in Fig. 1.1:
1. D = R, C = R: analog signal.
2. D = R, C = Z: quantized analog signal.
3. D = Z, C = R: sequence, or sampled signal.
4. D = Z, C = Z: numerical, or digital, signal. This is the type of signal that can be processed with
the aid of the computer.


In these sections we will focus on discrete-time signals, regardless of whether they are quantized or not.
We will equivalently use the terms discrete-time signal and sequence. We will always refer to a single
value x[n] as the n-th sample of the sequence x, regardless of whether the sequence has been obtained
by sampling a continuous-time signal or not.
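As a concrete sketch of Eq. (1.1), the following Python fragment samples a continuous-time sinusoid; the 5 Hz frequency and Fs = 100 Hz are hypothetical values chosen only for illustration:

```python
import numpy as np

# Hypothetical values for the demo: a 5 Hz continuous-time sinusoid
# sampled at Fs = 100 Hz, i.e. Ts = 0.01 s.
Fs = 100.0          # sampling frequency (Hz)
Ts = 1.0 / Fs       # sampling period (s)

def x_c(t):
    """Continuous-time signal, evaluated at arbitrary time instants t."""
    return np.cos(2 * np.pi * 5.0 * t)

n = np.arange(100)          # discrete time indices
x_d = x_c(n * Ts)           # Eq. (1.1): x_d[n] = x_c(n * Ts)
```

Since the continuous-time signal has period 0.2 s, the sampled sequence repeats every Fs/5 = 20 samples.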

1.2.1.2 Basic sequences and operations


Sequences are manipulated through various basic operations. The product and sum between two se-
quences are simply defined as the sample-by-sample product sequence and sum sequence, respectively.
Multiplication by a constant is defined as the sequence obtained by multiplying each sample by that
constant. Another important operation is time shifting or translation: we say that a sequence y[n] is a
shifted version of x[n] if
y[n] = x[n − n0 ], (1.2)
with n0 ∈ Z. For n0 > 0 this is a delaying operation while for n0 < 0 it is an advancing operation.
Several basic sequences are relevant in discussing discrete-time signals and systems. The simplest
and the most useful sequence is the unit sample sequence δ[n], often referred to as unit impulse or simply
impulse:

δ[n] = { 1, n = 0; 0, n ≠ 0 }. (1.3)
The unit impulse is also the simplest example of a finite-length sequence, defined as a sequence that is
zero except for a finite interval n1 ≤ n ≤ n2. One trivial but fundamental property of the δ[n] sequence
is that any sequence can be represented as a linear combination of delayed impulses:


x[n] = ∑_{k=−∞}^{+∞} x[k] δ[n − k]. (1.4)

The unit step sequence is denoted by u[n] and is defined as


u[n] = { 1, n ≥ 0; 0, n < 0 }. (1.5)

The unit step is the simplest example of a right-sided sequence, defined as a sequence that is zero except
for a right-infinite interval n1 ≤ n < +∞. Similarly, left-sided sequences are defined as sequences
that are zero except for a left-infinite interval −∞ < n ≤ n1.
The unit step is related to the impulse by the following equalities:

u[n] = ∑_{k=0}^{+∞} δ[n − k] = ∑_{k=−∞}^{n} δ[k]. (1.6)

Conversely, the impulse can be written as the first backward difference of the unit step:

δ[n] = u[n] − u[n − 1]. (1.7)
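These basic sequences, and the backward-difference relation of Eq. (1.7), can be sketched in a few lines of Python (the range of indices is an arbitrary choice for the demo):

```python
import numpy as np

N = 16                           # arbitrary half-length for the demo
n = np.arange(-N, N + 1)         # time indices -N .. N

delta = (n == 0).astype(float)   # unit impulse, Eq. (1.3)
u = (n >= 0).astype(float)       # unit step, Eq. (1.5)

# Eq. (1.7): the impulse is the first backward difference of the step.
backward_diff = u[1:] - u[:-1]   # u[n] - u[n-1], for n = -N+1 .. N
```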

The general form of the real sinusoidal sequence with constant amplitude is

x[n] = A cos(ω0 n + ϕ), −∞ < n < ∞, (1.8)

where A, ω0 and ϕ are real numbers. By analogy with continuous-time functions, ω0 is called angular
frequency of the sinusoid, and ϕ is called the phase. Note however that, since n is dimensionless, the


dimension of ω0 is radians. Very often we will say that the dimension of n is “samples” and therefore
we will specify the units of ω0 to be radians/sample. If x[n] has been sampled from a continuous-time
sinusoid with a given sampling rate Fs , we will also use the term normalized angular frequency, since in
this case ω0 is the continuous-time angular frequency normalized with respect to Fs .
Another relevant numerical signal is constructed as the sequence of powers of a real or complex
number α. Such sequences are termed exponential sequences and their general form is

x[n] = A α^n, −∞ < n < ∞, (1.9)

where A and α are real or complex constants. When α is complex, x[n] has real and imaginary parts
that are exponentially weighted sinusoids. Specifically, if α = |α| e^{jω0} and A = |A| e^{jϕ}, then x[n] can be
expressed as

x[n] = |A| |α|^n e^{j(ω0 n + ϕ)} = |A| |α|^n (cos(ω0 n + ϕ) + j sin(ω0 n + ϕ)). (1.10)

Therefore x[n] can be expressed as x[n] = xRe[n] + j xIm[n], with xRe[n] = |A| |α|^n cos(ω0 n + ϕ) and
xIm[n] = |A| |α|^n sin(ω0 n + ϕ). These sequences oscillate with an exponentially growing magnitude
if |α| > 1, or with an exponentially decaying magnitude if |α| < 1. When |α| = 1, the sequences
xRe[n] and xIm[n] are real sinusoidal sequences with constant amplitude and x[n] is referred to as a
complex exponential sequence.
An important property of real sinusoidal and complex exponential sequences is that substituting the
frequency ω0 with ω0 + 2πk (with k integer) results in sequences that are indistinguishable from each
other. This can be easily verified and is ultimately due to the fact that n is integer. We will see the
implications of this property when discussing the Sampling Theorem in Sec. 1.4.1, for now we will
implicitly assume that ω0 varies in an interval of length 2π, e.g. (−π, π], or [0, 2π).
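This indistinguishability is easy to check numerically; a minimal sketch, with arbitrary parameter values:

```python
import numpy as np

# Arbitrary demo parameters: amplitude, frequency, phase, integer k.
A, omega0, phi, k = 1.5, 0.3 * np.pi, 0.25, 2
n = np.arange(64)

x1 = A * np.cos(omega0 * n + phi)
x2 = A * np.cos((omega0 + 2 * np.pi * k) * n + phi)  # frequency shifted by 2*pi*k
```

Since n is an integer, the extra term 2πkn in the argument of the cosine is a whole number of turns, so x1 and x2 coincide sample by sample.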
Real sinusoidal sequences and complex exponential sequences are also examples of periodic sequences: we define a sequence to be periodic with period N ∈ N if it satisfies the equality x[n] =
x[n + kN ], for −∞ < n < ∞, and for any k ∈ Z. The fundamental period N0 of a periodic signal is
the smallest value of N for which this equality holds. In the case of Eq. (1.8) the condition of periodicity
implies that ω0 N0 = 2πk. If k = 1 satisfies this equality we can say that the sinusoidal sequence is
periodic with period N0 = 2π/ω0 , but this is not always true: the period may be longer or, depending on
the value of ω0 , the sequence may not be periodic at all.

1.2.1.3 Measures of discrete-time signals


We now define a set of useful metrics and measures of signals, and focus exclusively on digital signals.
The first important metric is energy: in physics, energy is the ability to do work and is measured in
N·m or kg·m²/s², while in digital signal processing physical units are typically discarded and signals are
renormalized whenever convenient. The total energy of a sequence x[n] is then defined as:


Ex = ∑_{n=−∞}^{+∞} |x[n]|². (1.11)

Note that an infinite-length sequence with finite sample values may or may not have finite energy. The rate
of transporting energy is known as power. The average power of a sequence x[n] is then defined as the
average energy per sample:
Px = Ex / N = (1/N) ∑_{n=0}^{N−1} |x[n]|². (1.12)


Another common description of a signal is its root mean square (RMS) level. The RMS level of a
signal x[n] is simply √Px. In practice, especially in audio, the RMS level is typically computed after
subtracting out any nonzero mean value, and is typically used to characterize periodic sequences in
which Px is computed over a cycle of oscillation: as an example, the RMS level of a sinusoidal sequence
x[n] = A cos(ω0 n + ϕ) is A/√2.
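The A/√2 result can be verified by computing the RMS level over exactly one cycle; a sketch with hypothetical parameter values:

```python
import numpy as np

A, phi = 2.0, 0.7        # arbitrary amplitude and phase for the demo
N0 = 32                  # fundamental period, in samples
omega0 = 2 * np.pi / N0  # chosen so that the sequence has period exactly N0

n = np.arange(N0)
x = A * np.cos(omega0 * n + phi)

Px = np.mean(np.abs(x) ** 2)  # average power over one cycle, Eq. (1.12)
rms = np.sqrt(Px)             # RMS level, expected to equal A / sqrt(2)
```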
In the case of sound signals, x[n] will typically represent a sampled acoustic pressure signal. As a
pressure wave travels in a medium (e.g., air), the RMS power is distributed all along the surface of the
wavefront so that the appropriate measure of the strength of the wave is power per unit area of wavefront,
also known as intensity.
Intensity is still proportional to the RMS level of the acoustic pressure, and relates to the sound level
perceived by a listener. However, the usual definition of sound pressure level (SPL) does not directly use
intensity. Instead the SPL of a pressure signal is measured in decibels (dB), and is defined as

SPL = 10 log10(I/I0) (dB), (1.13)
where I and I0 are the RMS intensity of the signal and a reference intensity, respectively. In particular,
in an absolute dB scale I0 is chosen to be the smallest sound intensity that can be heard (more on this in
the chapter on auditory-based processing). The function of the dB scale is to transform ratios into differences: if I2 is
twice I1, then SPL2 − SPL1 ≃ 3 dB, no matter what the actual value of I1 might be.¹
Because sound intensity is proportional to the square of the RMS pressure, it is easy to express level
differences in terms of pressure ratios:
SPL2 − SPL1 = 10 log10(p2²/p1²) = 20 log10(p2/p1) (dB). (1.14)
Therefore, depending on the physical quantity which is being used the prefactor 20 or 10 may be em-
ployed in a decibel calculation. To resolve the uncertainty of which is the correct one, note that there are
two kinds of quantities for which a dB scale is appropriate: “energy-like” quantities and “dynamical”
quantities. An energy-like quantity is real and never negative: examples of such quantities are acoustical
energy, intensity or power, electrical energy or power, optical luminance, etc., and the appropriate pref-
actor for these quantities in a dB scale is 10. Dynamical quantities may be positive or negative, or even
complex in some representations: examples of such quantities are mechanical displacement or velocity,
acoustical pressure, velocity or volume velocity, electrical voltage or current, etc., and the appropriate
prefactor for these quantities in a dB scale is 20 (since they have the property that their squares are
energy-like quantities).
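The equivalence of the two prefactors is easy to verify numerically; in this sketch the pressure values are arbitrary, and doubling the pressure gives the familiar +6 dB:

```python
import math

# Arbitrary RMS pressures for the demo; doubling the pressure
# quadruples the intensity, since intensity ~ pressure squared.
p1, p2 = 0.1, 0.2
I1, I2 = p1 ** 2, p2 ** 2

dL_intensity = 10 * math.log10(I2 / I1)  # prefactor 10: energy-like quantity
dL_pressure = 20 * math.log10(p2 / p1)   # prefactor 20: dynamical quantity
```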

1.2.1.4 Random signals


1.2.2 Discrete-time systems
Signal processing systems can be classified along the same lines used in Sec. 1.2 to classify signals. Here
we are interested in discrete-time systems, that act on sequences and produce sequences as output.

1.2.2.1 Basic systems and block schemes


We define a discrete-time system as a transformation T that maps an input sequence x[n] into an output
sequence y[n]:
y[n] = T {x}[n]. (1.15)
¹This is a special case of the Weber-Fechner law, which attempts to describe the relationship between the physical magnitudes of stimuli and the perceived intensity of the stimuli. The law states that this relation is logarithmic: if a stimulus varies as a geometric progression (i.e., is multiplied by a fixed factor), the corresponding perception is altered in an arithmetic progression (i.e., by additive constant amounts).



Figure 1.2: Block schemes of discrete-time systems; (a) a generic system T {·}; (b) ideal delay system
Tn0 ; (c) moving average system TM A . Note the symbols used to represent sums of signals and multipli-
cation by a constant (these symbols will be formally introduced in Sec. 1.6.3).

Discrete-time systems are typically represented pictorially through block schemes that depict the signal
flow graph. As an example, the block scheme of Fig. 1.2(a) represents the generic discrete-time system of
Eq. (1.15).
The simplest concrete example of a discrete-time system is the ideal delay system Tn0 , defined as

y[n] = Tn0 {x}[n] = x[n − n0 ], (1.16)

where the integer n0 is the delay of the system and can be both positive and negative: if n0 > 0 then
y corresponds to a time-delayed version of x, while if n0 < 0 then the system operates a time advance.
This system can be represented with the block scheme of Fig. 1.2(b): this block scheme also provides
an example of a cascade connection of systems, based on the trivial observation that the system Tn0 can
be seen as a cascade of n0 unit delay systems T1.
A slightly more complex example is the moving average system TM A , defined as

y[n] = TMA{x}[n] = (1/(M1 + M2 + 1)) ∑_{k=−M1}^{M2} x[n − k]. (1.17)

The n-th sample of the sequence y is the average of (M1 + M2 + 1) samples of the sequence x, cen-
tered around the sample x[n], hence the name of the system. Fig. 1.2(c) depicts a block scheme of the
moving average system: in this case all the branches carrying the shifted versions of x[n] form a parallel
connection in which they are all summed up and subsequently multiplied by the factor 1/(M1 + M2 +1).
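A direct (and deliberately naive) implementation of Eq. (1.17) can be sketched as follows; treating samples outside the input range as zero is an assumption made for the demo, since the text leaves boundary handling open:

```python
import numpy as np

def moving_average(x, M1, M2):
    """Moving average system of Eq. (1.17), written as an explicit loop.
    Samples outside the input range are treated as zero (an assumption
    made here for the demo)."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(-M1, M2 + 1):
            if 0 <= n - k < len(x):
                y[n] += x[n - k]
        y[n] /= M1 + M2 + 1
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = moving_average(x, M1=1, M2=1)   # 3-point centered average
```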

1.2.2.2 Classes of discrete-time systems

Classes of systems are defined by placing constraints on the properties of the transformation T {·}. Doing
so often leads to very general mathematical representations.
We define a system to be memoryless if the output sequence y[n] at every value of n depends only on
the value of the input sequence x[n] at the same value of n. As an example, the system y[n] = sin(x[n])
is a memoryless system. On the other hand, the ideal delay system and the moving average system
described in the previous section are not memoryless: these systems are referred to as having memory,
since they must “remember” past (or even future) values of the sequence x in order to compute the
“present” output y[n].


We define a system to be linear if it satisfies the principle of superposition. If y1 [n], y2 [n] are the
responses of a system T to the inputs x1 [n], x2 [n], respectively, then T is linear if and only if

T {a1 x1 + a2 x2 }[n] = a1 T {x1 }[n] + a2 T {x2 }[n], (1.18)

for any pair of arbitrary constants a1 and a2 . Equivalently we say that a linear system possesses an
additive property and a scaling property. As an example, the ideal delay system and the moving average
system described in the previous section are linear systems. On the other hand, the memoryless system
y[n] = sin(x[n]) discussed above is clearly non-linear.
We define a system to be time-invariant (or shift-invariant) if a time shift of the input sequence causes
a corresponding shift in the output sequence. Specifically, let y = T {x}. Then T is time-invariant if and
only if
T {Tn0 {x}} [n] = y[n − n0 ] ∀n0 , (1.19)
where Tn0 is the ideal delay system defined previously. This relation between the input and the output
must hold for any arbitrary input sequence x and its corresponding output. All the systems that we have
examined so far are time-invariant. On the other hand, an example of a non-time-invariant system is y[n] =
x[Mn] (with M ∈ N). This system creates y by selecting one out of every M samples of x. One can easily see
that T{Tn0{x}}[n] = x[Mn − n0], which is in general different from y[n − n0] = x[M(n − n0)].
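The failure of time invariance for this downsampler can be checked numerically. In the sketch below, shift is a hypothetical helper implementing the ideal delay on finite sequences (padding the start with zeros):

```python
import numpy as np

M = 2   # downsampling factor (arbitrary choice for the demo)

def shift(x, n0):
    """Ideal delay T_n0 on a finite sequence, padding the start with zeros."""
    return np.concatenate([np.zeros(n0), x[:len(x) - n0]])

x = np.arange(1.0, 11.0)                 # x = [1, 2, ..., 10]

y = x[::M]                               # y[n] = x[M n]
y_shifted = shift(y, 1)                  # y[n - 1]
y_of_shifted_input = shift(x, 1)[::M]    # T{T_1{x}}[n] = x[M n - 1]
```

Delaying the output and feeding a delayed input give different sequences, so the system is not time-invariant.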
We define a system to be causal if for every choice of n0 the output sequence sample y[n0 ] depends
only on the input sequence samples x[n] with n ≤ n0 . This implies that, if y1 [n], y2 [n] are the responses
of a causal system to the inputs x1 [n], x2 [n], respectively, then

x1 [n] = x2 [n] ∀n < n0 ⇒ y1 [n] = y2 [n] ∀n < n0 . (1.20)

The moving average system discussed in the previous section is an example of a non-causal system,
since it needs to know M1 “future” values of the input sequence in order to compute the current value
y[n]. Apart from this, all the systems that we have examined so far are causal.
We define a system to be stable if and only if every bounded input sequence produces a bounded
output sequence. A sequence x[n] is said to be bounded if there exists a positive constant Bx such that

| x[n] | ≤ Bx ∀n. (1.21)

Stability then requires that for such an input sequence there exists a positive constant By such that
| y[n] | ≤ By ∀n. This notion of stability is often referred to as bounded-input bounded-output (BIBO)
stability. All the systems that we have examined so far are BIBO-stable. On the other hand, an example of
an unstable system is y[n] = ∑_{k=−∞}^{n} x[k]. This is called the accumulator system, since y[n] accumulates
the sum of all past values of x. In order to see that the accumulator system is not stable it is sufficient to
verify that y[n] is not bounded when x[n] is the unit step sequence.
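A quick numerical check of this instability: the unit step input is bounded by 1, yet the accumulator output grows without bound (the horizon N is an arbitrary demo choice):

```python
import numpy as np

N = 1000                 # arbitrary time horizon for the demo
u = np.ones(N)           # unit step input, bounded by Bx = 1
y = np.cumsum(u)         # accumulator output: y[n] = n + 1 for the step
```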

1.2.3 Linear Time-Invariant Systems


Linear time-invariant (LTI) systems are a particularly relevant class of systems. A LTI system is any system that
is both linear and time-invariant according to the definitions given in the previous section. As we will
see in this section, LTI systems are mathematically easy to analyze and to characterize.

1.2.3.1 Impulse response and convolution


Let T be a LTI system, y[n] = T {x}[n] be the output sequence given a generic input x, and h[n] the
impulse response of the system, i.e. h[n] = T {δ}[n]. Now, recall that every sequence x[n] can be


represented as a linear combination of delayed impulses (see Eq. (1.4)). If we use this representation and
exploit the linearity and time-invariance properties, we can write:
y[n] = T{ ∑_{k=−∞}^{+∞} x[k] δ[n − k] } = ∑_{k=−∞}^{+∞} x[k] T{δ[n − k]} = ∑_{k=−∞}^{+∞} x[k] h[n − k], (1.22)

where in the first equality we have used the representation (1.4), in the second equality we have used the
linearity property, and in the last equality we have used the time-invariance property.
Equation (1.22) states that a LTI system can be completely characterized by its impulse response
h[n], since the response to any input sequence x[n] can be written as ∑_{k=−∞}^{+∞} x[k] h[n − k]. This can
be interpreted as follows: the k-th input sample, seen as a single impulse x[k]δ[n − k], is transformed by
the system into the sequence x[k]h[n − k], and for each k these sequences are summed up to form the
overall output sequence y[n].
The sum on the right-hand side of Eq. (1.22) is called convolution sum of the sequences x[n] and
h[n], and is usually denoted with the sign ∗. Therefore we have just proved that a LTI system T has the
property
y[n] = T {x}[n] = (x ∗ h)[n]. (1.23)
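The convolution sum can be evaluated directly for finite-length sequences (assuming, for simplicity, that both start at n = 0); the sketch below matches NumPy's built-in np.convolve:

```python
import numpy as np

def convolve_sum(x, h):
    """Direct evaluation of the convolution sum of Eq. (1.22) for
    finite-length sequences, both assumed to start at n = 0."""
    N = len(x) + len(h) - 1
    y = np.zeros(N)
    for n in range(N):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

x = np.array([1.0, 2.0, 3.0])
h = np.array([1.0, -1.0])    # a first-difference impulse response
y = convolve_sum(x, h)
```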
Let us consider again the systems defined in the previous sections: we can find their impulse re-
sponses through the definition, i.e. by computing their response to an ideal impulse δ[n]. For the ideal
delay system the impulse response is simply a shifted impulse:
hn0 [n] = δ[n − n0 ]. (1.24)
The impulse response of the moving average system is easily computed as

h[n] = (1/(M1 + M2 + 1)) ∑_{k=−M1}^{M2} δ[n − k] = { 1/(M1 + M2 + 1), −M1 ≤ n ≤ M2; 0, elsewhere }. (1.25)

Finally the accumulator system has the following impulse response:

h[n] = ∑_{k=−∞}^{n} δ[k] = { 1, n ≥ 0; 0, n < 0 }. (1.26)

There is a fundamental difference between these impulse responses. The first two responses have a finite
number of non-zero samples (1 and M1 + M2 + 1, respectively): systems that possess this property are
called finite impulse response (FIR) systems. On the other hand, the impulse response of the accumulator
has an infinite number of non-zero samples: systems that possess this property are called infinite impulse
response (IIR) systems.

1.2.3.2 Properties of LTI systems


Since the convolution sum of Eq. (1.23) completely characterizes a LTI system, the most relevant properties of this class of systems can be understood by inspecting properties of the convolution operator.
Clearly convolution is linear, otherwise T would not be a linear system, which it is by hypothesis. Convolution is also associative:

(x ∗ (h1 ∗ h2))[n] = ((x ∗ h1) ∗ h2)[n]. (1.27)
Moreover convolution is commutative:

(x ∗ h)[n] = ∑_{k=−∞}^{+∞} x[k] h[n − k] = ∑_{m=−∞}^{+∞} x[n − m] h[m] = (h ∗ x)[n], (1.28)



Figure 1.3: Properties of LTI system connections, and equivalent systems; (a) cascade, and (b) parallel
connections.

where we have substituted the variable m = n − k in the sum. This property implies that a LTI system
with input h[n] and impulse response x[n] will have the same output as a LTI system with input x[n]
and impulse response h[n]. More importantly, associativity and commutativity have implications on the
properties of cascade connections of systems. Consider the block scheme in Fig. 1.3(a) (upper panel):
the output from the first block is x ∗ h1 , therefore the final output is (x ∗ h1 ) ∗ h2 , which equals both
(x ∗ h2 ) ∗ h1 and x ∗ (h1 ∗ h2 ). As a result the three block schemes in Fig. 1.3(a) represent three systems
with the same impulse response.
Linearity and commutativity imply that the convolution is distributive over addition. From the defi-
nition (1.23) it is straightforward to prove that

(x ∗ (h1 + h2 )) [n] = (x ∗ h1 )[n] + (x ∗ h2 )[n]. (1.29)

Distributivity has implications on the properties of parallel connections of systems. Consider the block
scheme in Fig. 1.3(b) (upper panel): the final output is (x ∗ h1) + (x ∗ h2), which equals x ∗ (h1 + h2).
As a result the two block schemes in Fig. 1.3(b) represent two systems with the same impulse response.
In the case of a LTI system, the notions of causality and stability given in the previous sections can
also be related to properties of the impulse response. As for causality, it is a straightforward exercise to
show that a LTI system is causal if and only if

h[n] = 0 ∀n < 0. (1.30)

For this reason, sequences that satisfy the above condition are usually termed causal sequences.
As for stability, recall that a system is BIBO-stable if any bounded input produces a bounded output.
The response of a LTI system to a bounded input | x[n] | ≤ Bx satisfies

| y[n] | = | ∑_{k=−∞}^{+∞} x[k] h[n − k] | ≤ ∑_{k=−∞}^{+∞} | x[k] | | h[n − k] | ≤ Bx ∑_{k=−∞}^{+∞} | h[n − k] |. (1.31)

From this chain of inequalities we find that a sufficient condition for the stability of the system is


∑_{k=−∞}^{+∞} | h[n − k] | = ∑_{k=−∞}^{+∞} | h[k] | < ∞. (1.32)

One can prove that this is also a necessary condition for stability. Assume that Eq. (1.32) does not hold
and define the input x[n] = h∗[−n]/ | h[−n] | for h[−n] ≠ 0 (x[n] = 0 elsewhere): this input is bounded by


unity, however one can immediately prove that y[0] = +∞ k=−∞ | h[k] | = +∞. In conclusion, a LTI
system is stable if and only if h is absolutely summable, or h ∈ L1 (Z). A direct consequence of this
property is that FIR systems are always stable, while IIR systems may not be stable.
Using the properties demonstrated in this section, we can look back at the impulse responses of
Eqs. (1.24,1.25,1.26), and we can immediately immediately prove whether they are stable and causal.
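As a numerical illustration of the stability condition (1.32), consider the causal exponential impulse response h[n] = α^n u[n]; the value α = 0.9 is an arbitrary choice, and the partial absolute sum converges to 1/(1 − α):

```python
import numpy as np

alpha = 0.9               # arbitrary pole with |alpha| < 1
n = np.arange(200)        # truncation horizon for the partial sum
h = alpha ** n            # causal exponential impulse response h[n]

abs_sum = np.sum(np.abs(h))   # partial sum of Eq. (1.32)
# For |alpha| < 1 the series converges to 1 / (1 - alpha) = 10 here,
# so the system is BIBO-stable; for |alpha| >= 1 it diverges.
```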

1.2.3.3 Constant-coefficient difference equations


Consider the following constant-coefficient difference equation:


∑_{k=0}^{N} ak y[n − k] = ∑_{k=0}^{M} bk x[n − k]. (1.33)

Question: given a set of values for {ak} and {bk}, does this equation define a LTI system? The answer is
no, because a given input x[n] does not uniquely determine the output y[n]. In fact it is easy to see that,
if x[n], y[n] are two sequences satisfying Eq. (1.33), then the equation is also satisfied by the sequences
x[n], y[n] + yh[n], where yh is any sequence that satisfies the homogeneous equation:


∑_{k=0}^{N} ak yh[n − k] = 0. (1.34)


One could show that yh has the general form yh[n] = ∑_{m=1}^{N} Am zm^n, where the zm's are roots of the
polynomial equation ∑_{k=0}^{N} ak z^{−k} = 0 (this can be verified by substituting the general form of yh into Eq. (1.34)).
The situation is very much like that of linear constant-coefficient differential equations in continuous-
time: since yh has N undetermined coefficients Am , we must specify N additional constraints in order
for the equation to admit a unique solution. Typically we set some initial conditions. For Eq. (1.33),
an initial condition is a set of N consecutive “initial” samples of y[n]. Suppose that the samples
y[−1], y[−2], . . . , y[−N] have been fixed: then all the remaining samples of y can be recursively
determined through the recurrence equations
y[n] = −∑_{k=1}^{N} (ak/a0) y[n − k] + ∑_{m=0}^{M} (bm/a0) x[n − m], for n ≥ 0,

y[n] = −∑_{k=0}^{N−1} (ak/aN) y[n + N − k] + ∑_{m=0}^{M} (bm/aN) x[n + N − m], for n ≤ −N − 1. (1.35)

In particular, y[0] is determined with the above equation using the initial values y[−1], y[−2], . . . , y[−N],
then y[1] is determined using the values y[0], y[−1], . . . , y[−N + 1], and so on.
Another way of guaranteeing that Eq. (1.33) specifies a unique solution is requiring that the LTI
system is also causal. If we look back at the definition of causality, we see that in this context it implies
that if the input satisfies x[n] = 0 ∀n < n0, then y[n] = 0 ∀n < n0. Then again we have sufficient initial
conditions to recursively compute y[n] for n ≥ n0: in this case we speak of initial rest conditions.
All the LTI systems that we have examined so far can actually be written in the form (1.33). As an
example, let us examine the accumulator system: we can write


y[n] = ∑_{k=−∞}^{n} x[k] = x[n] + ∑_{k=−∞}^{n−1} x[k] = x[n] + y[n − 1], (1.36)



Figure 1.4: Block schemes for constant-coefficient difference equations representing (a) the accumulator
system, and (b) the moving average system.

where in the second equality we have simply separated the term x[n] from the sum, and in the third equal-
ity we have applied the definition of the accumulator system for the output sample y[n − 1]. Therefore,
input and output of the accumulator system satisfy the equation
y[n] − y[n − 1] = x[n], (1.37)
which is in form (1.33) with N = 1, M = 0 and with a0 = 1, a1 = −1, b0 = 1. This also means that
we can implement the accumulator with the block scheme of Fig. 1.4(a).
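The recursion of Eq. (1.37) can be sketched directly, assuming initial rest (y[−1] = 0); the loop reproduces a running cumulative sum:

```python
import numpy as np

def accumulator(x):
    """Accumulator from the difference equation (1.37):
    y[n] = y[n-1] + x[n], with initial rest (y[-1] = 0)."""
    y = np.zeros(len(x))
    prev = 0.0
    for n in range(len(x)):
        y[n] = prev + x[n]
        prev = y[n]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
y = accumulator(x)           # running sum of the input
```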
As a second example, consider the moving average system, with M1 = 0 (so that the system is
causal). Then Eq. (1.17) is already a constant-coefficient difference equation. But we can also represent
the system with a different equation by noting that

y[n] = (1/(M2 + 1)) ∑_{k=0}^{M2} x[n − k] = (1/(M2 + 1)) ∑_{k=−∞}^{n} (x[k] − x[k − M2 − 1]) . (1.38)

Now we note that the sum on the right-hand side represents an accumulator applied to the signal x1 [n] =
(x[n] − x[n − M2 − 1]). Therefore we can apply Eq. (1.37) and write
y[n] − y[n − 1] = (1/(M2 + 1)) x1 [n] = (1/(M2 + 1)) (x[n] − x[n − M2 − 1]) . (1.39)
We have then found a totally different equation, which is still in the form (1.33) and still represents the
moving average system. The corresponding block scheme is given in Fig. 1.4(b). This example also
shows that different equations of the form (1.33) can represent the same LTI system.
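The equivalence of these two representations can be checked numerically. The following Python sketch (ours, not part of the original text; the function names are hypothetical) implements both the direct moving-average form and the recursive accumulator-based form under initial rest conditions, and verifies that they produce the same output:

```python
import numpy as np

def moving_average_direct(x, M2):
    """Direct FIR form: y[n] = 1/(M2+1) * sum_{k=0}^{M2} x[n-k]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = 0.0
        for k in range(M2 + 1):
            if n - k >= 0:              # x[n] = 0 for n < 0 (initial rest)
                acc += x[n - k]
        y[n] = acc / (M2 + 1)
    return y

def moving_average_recursive(x, M2):
    """Recursive form: y[n] = y[n-1] + (x[n] - x[n-M2-1])/(M2+1)."""
    y = np.zeros(len(x))
    prev = 0.0                           # y[-1] = 0 (initial rest)
    for n in range(len(x)):
        xd = x[n - M2 - 1] if n - M2 - 1 >= 0 else 0.0
        prev = prev + (x[n] - xd) / (M2 + 1)
        y[n] = prev
    return y
```

The recursive form needs only one addition, one subtraction and one division per output sample, regardless of M2, which is why such recursive reformulations are of practical interest.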

1.3 Signal generators


In this section we describe methods and algorithms used to directly generate a discrete-time signal.
Specifically we will examine periodic waveform generators and noise generators, which are both partic-
ularly relevant in audio applications.

1.3.1 Digital oscillators


Many relevant musical sounds are almost periodic in time. The most direct method for synthesizing a
periodic signal is repeating a single period of the corresponding waveform. An algorithm that implements this method is called an oscillator. The simplest algorithm consists of computing the appropriate value of
the waveform for every sample, assuming that the waveform can be approximately described through a
polynomial or rational truncated series. However this is definitely not the most efficient approach. More
efficient algorithms are presented in the remainder of this section.

1-12 Algorithms for Sound and Music Computing [v.February 2, 2019]

1.3.1.1 Table lookup oscillator


A very efficient approach is to pre-compute the samples of the waveform, store them in a table which is
usually implemented as a circular buffer, and access them from the table whenever needed. If a copy of
one period of the desired waveform is stored in such a wavetable, a periodic waveform can be generated
by cycling over the wavetable with the aid of a circular pointer. When the pointer reaches the end of the
table, it wraps around and points again at the beginning of the table.
Given a table of length L samples, the period T0 of the generated waveform depends on the sampling
period Ts at which samples are read. More precisely, the period is given by T0 = LTs , and consequently
the fundamental frequency is f0 = Fs /L. This implies that in order to change the frequency (while maintaining the same sampling rate), we would need the same waveform to be stored in tables of different lengths.
A better solution is the following. Imagine that a single wavetable is stored, composed of a very
large number L of equidistant samples of the waveform. Then for a given sampling rate Fs and a desired
signal frequency f0 , the number of samples to be generated in a single cycle is Fs /f0 . From this, we can
define the sampling increment (SI), which is the distance in the table between two samples at subsequent
instants. The SI is given by the following equation:

SI = L / (Fs /f0 ) = f0 L / Fs . (1.40)

Therefore the SI is proportional to f0 . Having defined the sampling increment, samples of the desired signal are generated by reading one every SI samples of the table. If the SI is not an integer, the closest sample of the table will be chosen (obviously, the larger L, the better the approximation). In this way, the oscillator resamples the table to generate waveforms with different fundamental frequencies.

M-1.1
Implement in Matlab a circular look-up from a table of length L and with sampling increment SI.

M-1.1 Solution
phi=mod(phi+SI,L);            %advance the reading point circularly, phi in [0,L)
s=A*tab(mod(round(phi),L)+1); %read nearest entry (Matlab arrays are 1-based)

where phi is a state variable indicating the reading point in the table, A is a scaling parameter, and s is the output signal sample. The function mod(x,y) computes the remainder of the division x/y and is used here both to implement circular reading of the table and to keep the rounded index within the valid range 1 . . . L. Notice that phi can take non-integer values. In order to use it as an array index, it has to be truncated, or rounded to the nearest integer (as we did in the code above). A more accurate output can be obtained by linear interpolation between adjacent table values.
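As a complement to the Matlab fragment, here is a Python sketch (ours, not from the book; the function name is hypothetical) of a complete table-lookup oscillator that uses the more accurate linear interpolation between adjacent table entries:

```python
import math

def table_lookup_osc(tab, SI, num_samples, phi0=0.0):
    """Circular table lookup with linear interpolation between entries."""
    L = len(tab)
    out = []
    phi = phi0
    for _ in range(num_samples):
        i = int(phi)                  # integer part: left neighbour
        frac = phi - i                # fractional part in [0, 1)
        # weighted average of the two neighbouring table entries
        out.append((1 - frac) * tab[i] + frac * tab[(i + 1) % L])
        phi = (phi + SI) % L          # circular pointer update, Eq. (1.40)
    return out
```

With a 1024-point sine table, Fs = 44100 Hz and f0 = 440 Hz, the sampling increment is SI = f0·L/Fs ≈ 10.22, and the generated samples approximate sin(2π f0 n/Fs) very closely.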

1.3.1.2 Recurrent sinusoidal signal generators


Sinusoidal signals can be generated also by recurrent methods. A first method is based on the following
equation:
y[n + 1] = 2 cos(ω0 )y[n] − y[n − 1] (1.41)
where ω0 = 2πf0 /Fs is the normalized angular frequency of the sinusoid. Then one can prove that given
the initial values y[0] = cos ϕ and y[−1] = cos(ϕ − ω0 ) the generator produces the sequence

y[n] = cos(ω0 n + ϕ). (1.42)


In particular, with initial values y[0] = 1 and y[−1] = cos ω0 the generator produces the sequence
y[n] = cos(ω0 n), while with initial conditions y[0] = 0 and y[−1] = − sin ω0 it produces the sequence
y[n] = sin(ω0 n). This property can be justified by recalling the trigonometric relation cos ω0 · cos ϕ =
0.5[cos(ϕ + ω0 ) + cos(ϕ − ω0 )].
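A quick numerical check (a Python sketch of ours, not from the book) confirms that, with the stated initial values, the recursion of Eq. (1.41) reproduces cos(ω0 n + ϕ) up to rounding:

```python
import math

def recurrent_cosine(w0, phi, num_samples):
    """Generate y[n] = cos(w0*n + phi) via y[n+1] = 2cos(w0)y[n] - y[n-1]."""
    y_prev = math.cos(phi - w0)       # initial value y[-1]
    y_curr = math.cos(phi)            # initial value y[0]
    out = [y_curr]
    for _ in range(num_samples - 1):
        y_next = 2 * math.cos(w0) * y_curr - y_prev
        y_prev, y_curr = y_curr, y_next
        out.append(y_curr)
    return out
```

Only one multiplication per sample is needed once 2cos(ω0) has been precomputed, which is the appeal of this generator.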
A second recursive method for generating sinusoidal sequence combines both the sinusoidal and
cosinusoidal generators and is termed coupled form. It is described by the equations

x[n + 1] = cos ω0 · x[n] − sin ω0 · y[n],
y[n + 1] = sin ω0 · x[n] + cos ω0 · y[n]. (1.43)

With x[0] = 1 and y[0] = 0 the sequences x[n] = cos(ω0 n) and y[n] = sin(ω0 n) are generated.
This property can be verified by noting that for the complex exponential sequence the trivial relation
ejω0 (n+1) = ejω0 ejω0 n holds. From this relation, the above equations are immediately proved by calling
x[n] and y[n] the real and imaginary parts of the complex exponential sequence, respectively.
A major drawback of both these recursive methods is that they are not robust against quantization.
Small quantization errors in the computation will cause the generated signals either to grow exponentially
or to decay rapidly into silence. To avoid this problem, a periodic re-initialization is advisable. It is also
possible to use a slightly different set of coefficients to produce absolutely stable sinusoidal waveforms:

x[n + 1] = x[n] − c · y[n],
y[n + 1] = c · x[n + 1] + y[n], (1.44)

where c = 2 sin(ω0 /2). With x[0] = 1 and y[0] = c/2 we have x[n] = cos(ω0 n).
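The behaviour of Eq. (1.44) can be illustrated with a short Python sketch (ours; the function name is hypothetical). With c = 2 sin(ω0/2) and the stated initial values, x[n] tracks cos(ω0 n) without amplitude drift:

```python
import math

def stable_sin_cos(w0, num_samples):
    """Stable coupled oscillator of Eq. (1.44); returns x[n] = cos(w0*n)."""
    c = 2 * math.sin(w0 / 2)
    x, y = 1.0, c / 2                 # initial conditions x[0]=1, y[0]=c/2
    xs = []
    for _ in range(num_samples):
        xs.append(x)
        x_new = x - c * y             # x[n+1] = x[n] - c*y[n]
        y = c * x_new + y             # y[n+1] = c*x[n+1] + y[n]
        x = x_new
    return xs
```

The key point is that y[n+1] is updated from the already-updated x[n+1]: this one-sample "skew" is what keeps the state trajectory on a closed orbit instead of slowly spiralling in or out.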

1.3.1.3 Control signals and envelope generators


Amplitude and frequency of a sound are usually required to be time-varying parameters. Amplitude
control can be needed to define suitable sound envelopes, or to create effects such as tremolo (quasi-
periodic amplitude variations around an average value). Frequency control can be needed to simulate
continuous gliding between two tones (portamento, in musical terms), or to obtain subtle pitch variations
in the sound attack/release, or to create effects such as vibrato (quasi-periodic pitch variations around an
average value), and so on. We then want to construct a digital oscillator of the form

x[n] = a[n] · tab{ϕ[n]}, (1.45)

where a[n] scales the amplitude of the signal, while the phase ϕ[n] relates to the instantaneous frequency
f0 [n] of the signal: if f0 [n] is not constant, then ϕ[n] does not increase linearly in time. Figure 1.5(a)
shows the symbol usually adopted to depict an oscillator with fixed waveform and varying amplitude and
frequency.
The signals a[n] and f0 [n] are usually referred to as control signals, as opposed to audio signals. The reason for this distinction is that control signals vary on a much slower time-scale than audio signals (as an example, a musical vibrato usually has a frequency of no more than ∼ 5 Hz). Accordingly, many
sound synthesis languages define control signals at a different (smaller) rate than the audio sampling rate
Fs . This second rate is called control rate, or frame rate: a frame is a time window with pre-defined
length (e.g. 5 or 50 ms), in which the control signals can be reasonably assumed to have small variations.
We will use the notation Fc for the control rate.
Suitable control signals can be synthesized using envelope generators. An envelope generator can be
constructed through the table-lookup approach described previously. In this case however the table will
be read only once since the signal to be generated is not periodic. Given a desired duration (in seconds)
of the control signal, the appropriate sampling increment will be chosen accordingly.


Figure 1.5: Controlling a digital oscillator; (a) symbol of the digital oscillator controlled in amplitude and frequency; (b) example of an amplitude control signal generated with an ADSR envelope.

Alternatively, envelope generators can be constructed by specifying values of control signals at a few
control points and interpolating the signal in between them. In the simplest formulation, linear interpo-
lation is used. In order to exemplify this approach, we discuss the so-called Attack, Decay, Sustain, and
Release (ADSR) envelope typically used in sound synthesis applications to describe the time-varying amplitude a[n]. This envelope is shown in Fig. 1.5(b): amplitude values are specified only at the boundaries
between ADSR phases, and within each phase the signal varies linearly.
The attack and release phases mark the identity of the sound, while the central phases are associated
with the steady-state portion of the sound. Therefore, in order to synthesize two sounds with a similar
identity (or timbre) but different durations, it is advisable to only slightly modify the duration of attack
and release, while the decay and especially sustain can be lengthened more freely.

M-1.2
Write a function that realizes a line-segment envelope generator. The input to the function are a vector of time
instants and a corresponding vector of envelope values.

M-1.2 Solution
function env = envgen(t,a,method); %t= vector of control time instants
%a= vector of envelope values

global Fs; global SpF; %global variables: sample rate, samples-per-frame

if (nargin<3) method='linear'; end

frt=floor(t*Fs/SpF+1); %control time instants as frame numbers


nframes=frt(length(frt)); %total number of frames
env=interp1(frt,a,[1:nframes],method); %linear (or other method) interpolation

The envelope shape is specified by break-points, described as couples (time instant in seconds, amplitude value). The function generates the envelope at frame rate. Notice that the interpolation function interp1 makes it easy to use cubic or spline interpolation as well.

The use of waveform and envelope generators makes it possible to generate quasi-periodic sounds with very limited hardware, and constitutes the building block of many more sophisticated algorithms.


M-1.3
Assume that a function sinosc(t0,a,f,ph0) realizes a sinusoidal oscillator controlled in frequency and am-
plitude, with t0 initial time, a,f frame-rate amplitude and frequency vectors, and ph0 initial phase (see exam-
ple M-1.4). Then generate a sinusoid with varying amplitude and constant frequency.

M-1.3 Solution
global Fs; global SpF; %global variables: sample rate, samples-per-frame

Fs=22050;
framelength=0.01; %frame length (in s)
SpF=round(Fs*framelength); %samples per frame

%%% define controls %%%


slength=4; %sound length (in s), matching the envelope below
a=envgen([0,.2,3,3.5,4],[0,1,.8,.5,0],'linear'); %ADSR amp. envelope
nframes=length(a); %total no. of frames
f=50*ones(1,nframes); %constant frequency (Hz)

s=sinosc(0,a,f,0); % compute sound signal

Note the structure of this simple example: in the “headers” section some global parameters are defined that also need to be known to auxiliary functions; a second section defines the control parameters; and finally the audio signal is computed.

1.3.1.4 Frequency controlled oscillators


While realizing an amplitude modulated oscillator is quite straightforward, realizing a frequency modulated oscillator requires some more work. First of all we have to understand what the instantaneous frequency of such an oscillator is and how it relates to the phase function ϕ. This can be better understood
in the continuous time domain. When the oscillator frequency is constant the phase is a linear function of
time, ϕ(t) = 2πf0 t. In the more general case in which the frequency varies at frame rate, the following
equation holds:
f0 (t) = (1/2π) (dϕ/dt)(t), (1.46)
which simply says that the instantaneous angular frequency ω0 (t) = 2πf0 (t) is the instantaneous angular
velocity of the time-varying phase ϕ(t). If f0 (t) is varying slowly enough (i.e. it is varying at frame rate),
we can say that in the k-th frame the following first-order approximation holds:
(1/2π) (dϕ/dt)(t) = f0 (t) ≃ f0 (tk ) + Fc [f0 (tk+1 ) − f0 (tk )] · (t − tk ), (1.47)
where tk , tk+1 are the initial instants of frames k and k +1, respectively. The term Fc [f0 (tk+1 ) − f0 (tk )]
approximates the derivative df0 /dt inside the kth frame. We can then find the phase function by integrat-
ing equation (1.47):
ϕ(t) = ϕ(tk ) + 2πf0 (tk )(t − tk ) + 2πFc [f0 (tk+1 ) − f0 (tk )] (t − tk )²/2 . (1.48)
From this equation, the discrete-time signal ϕ[n] can be computed within the kth frame, i.e. for the time
indexes (k − 1) · SpF + n, with n = 0 . . . (SpF − 1).
In summary, Eq. (1.48) allows one to compute ϕ[n] at sample rate inside the kth frame, given the frame-rate frequency values f0 (tk ) and f0 (tk+1 ). The key ingredient of this derivation is the linear interpolation (1.47).
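The per-frame phase computation of Eq. (1.48) can be sketched in Python as follows (our illustration, not from the book; the function name and argument names are ours). For a constant frequency the computed phase reduces, as expected, to the familiar linear ramp 2π f0 n/Fs:

```python
import math

def frame_phase(phi_k, f0_k, f0_k1, Fc, Fs, SpF):
    """Sample-rate phase within one frame, following Eq. (1.48).

    phi_k: phase at the frame start; f0_k, f0_k1: frame-rate frequencies at
    the start of this frame and of the next one; Fc = Fs/SpF is the control
    rate; SpF is the number of samples per frame."""
    phases = []
    for n in range(SpF):
        t = n / Fs                    # time offset from the frame start
        phases.append(phi_k
                      + 2 * math.pi * f0_k * t
                      + 2 * math.pi * Fc * (f0_k1 - f0_k) * t * t / 2)
    return phases
```

Chaining frames, the last computed phase of frame k is passed as phi_k of frame k+1, which guarantees phase continuity even when the frequency jumps between frames.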


M-1.4
Realize the sinosc(t0,a,f,ph0) function that we have used in M-1.3. Use equation (1.48) to compute the
phase given the frame-rate frequency vector f.

M-1.4 Solution
function s = sinosc(t0,a,f,ph0);

global Fs; global SpF; %global variables: sample rate, samples-per-frame

nframes=length(a); %total number of frames


if (length(f)==1) f=f*ones(1,nframes); end
if (length(f)~=nframes) error('wrong f length!'); end

s=zeros(1,nframes*SpF); %initialize signal vector to 0


lasta=a(1); lastf=f(1); lastph=ph0; %initialize amplitude, frequency, phase

for i=1:nframes %cycle on the frames


naux=1:SpF; %count samples within frame
%%%%%%%%%%%% compute amplitudes and phases within frame %%%%%%%%%%%%%
ampl=lasta + (a(i)-lasta)/SpF.*naux;
phase=lastph +pi/Fs.*naux.*(2*lastf +(1/SpF)*(f(i)-lastf).*naux);
%%%%%%%%%%%%%%%% compute output samples %%%%%%%%%%%%%%%%%%%%%%%%%%%%
s(((i-1)*SpF+1):i*SpF)=ampl.*cos(phase); %here cos() plays the role of the table
%%%%%%%%% save last values of amplitude, frequency, phase
lasta=a(i); lastf=f(i); lastph=phase(SpF);
end
s=[zeros(1,round(t0*Fs)) s]; %add initial silence of t0 sec.

Both the amplitude a and frequency f envelopes are defined at frame rate and are interpolated at
sample rate inside the function body. Note in particular the computation of the phase vector within
each frame.

We can finally listen to a sinusoidal oscillator controlled both in amplitude and in frequency.

M-1.5
Synthesize a sinusoid modulated both in amplitude and frequency, using the functions sinosc and envgen.

M-1.5 Solution
global Fs; global SpF; %global variables: sample rate, samples-per-frame

Fs=22050;
framelength=0.01; %frame length (in s)
SpF=round(Fs*framelength); %samples per frame

%%% define controls %%%


a=envgen([0,.2,3,3.5,4],[0,1,.8,.5,0],'linear'); %ADSR amp. envelope
f=envgen([0,.2,3,4],[200,250,250,200],'linear'); %pitch envelope
f=f+max(f)*0.05*... %pitch envelope with vibrato added
sin(2*pi*5*(SpF/Fs)*[0:length(f)-1]).*hanning(length(f))';

%%% compute sound %%%


s=sinosc(0,a,f,0);

Amplitude a and frequency f control signals are shown in Fig. 1.6.


Figure 1.6: Amplitude (a) and frequency (b) control signals.

1.3.2 Noise generators


Up to now, we have considered signals whose behavior at any instant is supposed to be perfectly knowable. These signals are called deterministic signals. Besides these, random signals of unknown or
only partly known behavior may be considered. For random signals, only some general characteristics,
called statistical properties, are known or are of interest. The statistical properties are characteristic of
an entire signal class rather than of a single signal. A set of random signals is represented by a random
process. Particular numerical procedures simulate random processes, producing sequences of random
(or more precisely, pseudorandom) numbers.
Random sequences can be used both as signals (i.e., to produce white or colored noise used as input
to a filter) and as control functions, to provide variety in the synthesis parameters that are most perceptible to the
listener. In the analysis of natural sounds, some characteristics vary in an unpredictable way; their mean
statistical properties are perceptibly more significant than their exact behavior. Hence, the addition of a
random component to the deterministic functions controlling the synthesis parameters is often desirable.
In general, a combination of random processes is used because the temporal organization of the musical
parameters often has a hierarchical aspect. It cannot be well described by a single random process, but
rather by a combination of random processes evolving at different rates. For example this technique is
employed to generate 1/f noise.

1.3.2.1 White noise generators


The spread part of a spectrum is perceived as random noise. In order to generate a random sequence, we need a random number generator. There are many algorithms that generate random numbers, typically uniformly distributed over the standardized interval [0, 1). However it is hard to find good random number generators, i.e. generators that pass all or most criteria of randomness. The most common is the so-called linear congruential generator. It can produce fairly long sequences of independent random numbers, typically of the order of two billion numbers before repeating periodically. Given an initial number (seed) I[0] in the interval 0 ≤ I[0] < M , the algorithm is described by the recursive equations

I[n] = ( aI[n − 1] + c ) mod M ,
u[n] = I[n]/M , (1.49)

where a and c are two constants that should be chosen very carefully in order to obtain a maximal length sequence, i.e. one that is M samples long before repetition. The actual generated sequence depends on the initial


value I[0]; that is why the sequence is called pseudorandom. The numbers are uniformly distributed
over the interval 0 ≤ u[n] < 1. The mean is E[u] = 1/2 and the variance is σu2 = 1/12. The
transformation s[n] = 2u[n] − 1 generates a zero-mean uniformly distributed random sequence over the
interval [−1, 1). This sequence corresponds to a white noise signal because the generated numbers are
mutually independent. The power spectral density is given by S(f ) = σu2 . Thus the sequence contains
all the frequencies in equal proportion and exhibits equally slow and rapid variation in time.
With a suitable choice of the coefficients a and c, the generator produces pseudorandom sequences with flat spectral density magnitude (white noise). Different spectral shapes can be obtained using white noise as input to a filter.
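A Python sketch of Eq. (1.49) follows (ours, not from the book). The constants a = 1664525, c = 1013904223, M = 2³² are one well-known pair from the literature on linear congruential generators; the book does not prescribe specific values:

```python
def lcg_uniform(seed, count, a=1664525, c=1013904223, M=2**32):
    """Linear congruential generator of Eq. (1.49): u[n] in [0, 1)."""
    I = seed % M
    out = []
    for _ in range(count):
        I = (a * I + c) % M           # integer recursion I[n]
        out.append(I / M)             # normalized output u[n]
    return out
```

As stated in the text, the sequence has mean 1/2 and variance 1/12, and the same seed always reproduces the same sequence.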

M-1.6
A method of generating a Gaussian distributed random sequence is based on the central limit theorem, which
states that the sum of a large number of independent random variables is Gaussian. As exercise, implement a
very good approximation of a Gaussian noise, by summing 12 independent uniform noise generators.
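A possible solution sketch in Python (ours; the function name is hypothetical): the sum of 12 independent U[0,1) variables has mean 6 and variance 12 · (1/12) = 1, so subtracting 6 yields an approximately standard Gaussian sample:

```python
import random

def gaussian_noise(n, seed=0):
    """Approximate N(0,1) samples via the central limit theorem:
    each output is the sum of 12 independent U[0,1) values, minus 6."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(12)) - 6.0 for _ in range(n)]
```

Note that by construction the samples can never exceed ±6, so the far tails of the true Gaussian are missing; for audio purposes this approximation is usually more than adequate.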

If we desire that the numbers vary at a slower rate, we can generate a new random number every d
sampling instants and hold the previous value in the interval (holder) or interpolate between two succes-
sive random numbers (interpolator). In this case the power spectrum is given by

S(f ) = (σu² /d) |H(f )|² ,

with

|H(f )| = sin(πf d/Fs ) / sin(πf /Fs )

for the holder, and

|H(f )| = (1/d) [ sin(πf d/Fs ) / sin(πf /Fs ) ]²

for linear interpolation.

1.3.2.2 Pink noise generators


1/f noise generators. A so-called pink noise is characterized by a power spectrum that falls in frequency like 1/f :

S(f ) = A/f . (1.50)
For this reason pink noise is also called 1/f noise. To avoid the infinity at f = 0, this behaviour is
assumed valid for f ≥ fmin , where fmin is a desired minimum frequency. The spectrum is characterized
by a 3 dB per octave drop, i.e. S(2f ) = S(f )/2. The amount of power contained within a frequency
interval [f1 , f2 ] is

∫_{f1}^{f2} S(f ) df = A ln(f2 /f1 ).
This implies that the amount of power in any octave is the same. 1/f noise is ubiquitous in nature and is related to fractal phenomena. In the audio domain it is known as pink noise. It represents the psychoacoustic equivalent of white noise, because it excites the critical bands approximately uniformly. The physical interpretation is a phenomenon that depends on many processes evolving on different time scales. Accordingly, a 1/f signal can be generated as the sum of several white noise generators filtered through first-order filters whose time constants are successively larger and larger, forming a geometric progression.


M-1.7
In the Voss 1/f noise generation algorithm, the role of the low pass filters is played by the hold filter seen in the
previous paragraph. The 1/f noise is generated by taking the average of several periodically held generators
yi [n], with periods forming a geometric progression di = 2^i , i.e.

y[n] = (1/M ) ∑_{i=1}^{M} yi [n]. (1.51)

The power spectrum does not have an exact 1/f shape, but it is close to it for frequencies f ≥ Fs /2^M . As
exercise, implement a 1/f noise generator and use it to assign the pitches to a melody.
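A hedged Python sketch (ours; the function name and parameter defaults are hypothetical) of the periodically-held-average scheme of Eq. (1.51), with generator i refreshed every d_i = 2^i samples:

```python
import random

def voss_pink(num_samples, M=8, seed=0):
    """Approximate 1/f noise: average of M white generators, the i-th
    held for d_i = 2**i samples before being refreshed (Voss-style)."""
    rng = random.Random(seed)
    held = [rng.random() for _ in range(M)]   # current held values y_i[n]
    out = []
    for n in range(num_samples):
        for i in range(1, M + 1):
            if n % (2 ** i) == 0:             # refresh generator i every 2**i samples
                held[i - 1] = rng.random()
        out.append(sum(held) / M)             # Eq. (1.51)
    return out
```

Note that the classic Voss-McCartney formulation refreshes exactly one generator per sample (chosen from the binary representation of n); the periodic-hold variant above follows the description in the text more literally.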

M-1.8
The music derived from 1/f noise is close to human music: it has neither the unpredictability and randomness of white noise nor the predictability of brown noise. 1/f processes correlate logarithmically with the past: the averaged activity of the last ten events has as much influence on the current value as that of the last hundred events, or the last thousand. Thus they have a relatively long-term memory.
1/f noise is a fractal signal; it exhibits self-similarity, one of the properties of fractal objects. In a self-similar sequence, the pattern of the small details matches the pattern of the larger forms, but on a different scale. In this case, one says that 1/f fractional noise exhibits statistical self-similarity. The pink noise algorithm for generating pitches has become a standard in algorithmic music. Use the 1/f generator developed in M-1.7 to produce a fractal melody.

1.4 Spectral analysis of discrete-time signals


Spectral analysis is one of the most powerful analysis tools in several fields of engineering. The fact that we can decompose complex signals into the superposition of simpler signals, commonly sinusoids or complex exponentials, highlights signal features that are sometimes very hard to discover otherwise. Furthermore, the decomposition into simpler functions in the frequency domain is very useful when we want to perform modifications on a signal, since it gives the possibility to manipulate single spectral components, which is hard if not impossible to do on the time-domain waveform.
A rigorous and comprehensive treatment of spectral analysis is outside the scope of this book. In this section we introduce the Discrete-Time Fourier Transform (DTFT), which is the discrete-time version of the classical Fourier Transform of continuous-time signals. Using the DTFT machinery, we then briefly discuss the main problems related to the process of sampling a continuous-time signal, namely frequency aliasing. This discussion leads us to the sampling theorem.

1.4.1 The discrete-time Fourier transform


1.4.1.1 Definition
Recall that for a continuous-time signal x(t) the Fourier Transform is defined as:

F{x}(ω) = X(ω) = ∫_{−∞}^{+∞} x(t)e^{−j2πf t} dt = ∫_{−∞}^{+∞} x(t)e^{−jωt} dt, (1.52)

where the variable f is frequency and is expressed in Hz, while the angular frequency ω has been defined
as ω = 2πf and expressed in radians/s. Note that we are following the conventional notation by which
time-domain signals are denoted using lowercase symbols (e.g., x(t) or x[n]) while frequency-domain signals
are denoted in uppercase (e.g., X(ω)).
We can try to find an equivalent expression in the case of a discrete-time signal x[n]. If we think
of x[n] as the sampled version of a continuous-time signal x(t) with a sampling interval Ts = 1/Fs ,


i.e. x[n] = x(nTs ), we can define the discrete-time Fourier transform (DTFT) starting from Eq. (1.52)
where the integral is substituted by a summation:

F{x}(ωd ) = X(ωd ) = ∑_{n=−∞}^{+∞} x(nTs )e^{−j2πf n/Fs} = ∑_{n=−∞}^{+∞} x[n]e^{−jωd n} . (1.53)

There are two remarks to be made about this equation. First, we have omitted the scaling factor Ts in
front of the summation, which would be needed to have a perfect correspondence with Eq. (1.52) but
is irrelevant to our treatment. Second, we have defined a new variable ωd = 2πf /Fs : we call this the
normalized (or digital) angular frequency. This is not to be confused with the angular frequency ω used
in Eq. (1.52): ωd is measured in radians/sample, and varies in the range [−2π, 2π] when f varies in the
range [−Fs , Fs ]. In this book we use the notation ω to indicate the angular frequency in radians/s, and
ωd to indicate the normalized angular frequency in radians/sample.
As one can verify from Eq. (1.53), X(ωd ) is a periodic function in ωd with a period 2π. Note that
this periodicity of 2π in ωd corresponds to a periodicity of Fs in the domain of the absolute-frequency
f . Moreover X(ωd ) is in general a complex function, and can thus be written in terms of its real and
imaginary parts, or alternatively in polar form as

X(ωd ) = | X(ωd ) | e^{j arg[X(ωd )]} , (1.54)

where | X(ωd ) | is the magnitude function and arg[X(ωd )] is the phase function. Both are real-valued
functions. Given the 2π periodicity of X(ωd ) we will arbitrarily assume that −π < arg[X(ωd )] < π.
We informally refer to | X(ωd ) | also as the spectrum of x[n].
The inverse discrete-time Fourier transform (IDTFT) is found by observing that Eq. (1.53) represents
the Fourier series of the periodic function X(ωd ). As a consequence, one can apply Fourier theory for
periodic functions of continuous variables, and compute the Fourier coefficients x[n] as
F^{−1}{X}[n] = x[n] = (1/2π) ∫_{−π}^{π} X(ωd )e^{jωd n} dωd . (1.55)
Equations (1.53) and (1.55) together form a Fourier representation for the sequence x[n]. Equation (1.55)
can be regarded as a synthesis formula, since it represents x[n] as a superposition of infinitesimally
small complex sinusoids, with X(ωd ) determining the relative amount of each sinusoidal component.
Equation (1.53) can be regarded as an analysis formula, since it provides an expression for computing
X(ωd ) from the sequence x[n] and determining its sinusoidal components.

1.4.1.2 DTFT of common sequences


We can apply the DTFT definition to some of the sequences that we have examined. The DTFT of the
unit impulse δ[n] is the constant 1:

F{δ}(ωd ) = ∑_{n=−∞}^{+∞} δ[n]e^{−jωd n} = 1. (1.56)

The unit step sequence u[n] does not have a DTFT, because the sum in Eq. (1.53) takes infinite values.
The exponential sequence (1.9) also does not admit a DTFT. However if we consider the right-sided exponential sequence x[n] = a^n u[n], in which the unit step is multiplied by an exponential with | a | < 1, then this admits a DTFT:

F{x}(ωd ) = ∑_{n=−∞}^{+∞} a^n u[n]e^{−jωd n} = ∑_{n=0}^{+∞} ( ae^{−jωd} )^n = 1 / (1 − ae^{−jωd}) . (1.57)
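The closed form (1.57) can be checked numerically; the following Python sketch (ours, not from the book) compares a long partial sum of the geometric series with the closed-form expression:

```python
import cmath

def dtft_right_exp_partial(a, wd, terms=200):
    """Partial sum of sum_{n>=0} (a e^{-j*wd})^n, valid for |a| < 1."""
    return sum((a * cmath.exp(-1j * wd)) ** n for n in range(terms))

def dtft_right_exp_closed(a, wd):
    """Closed form 1 / (1 - a e^{-j*wd}) from Eq. (1.57)."""
    return 1.0 / (1.0 - a * cmath.exp(-1j * wd))
```

For |a| well below 1 (e.g. a = 0.5), the truncation error after 200 terms is far below machine precision, so the two values coincide to rounding.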


Property                    Time-domain sequence            Frequency-domain DTFT
                            x[n], y[n]                      X(ωd ), Y (ωd )
Linearity                   ax[n] + by[n]                   aX(ωd ) + bY (ωd )
Time-shifting               x[n − n0 ]                      e^{−jωd n0} X(ωd )
Frequency-shifting          e^{jω0 n} x[n]                  X(ωd − ω0 )
Frequency differentiation   nx[n]                           j (dX/dωd )(ωd )
Convolution                 (x ∗ y)[n]                      X(ωd ) · Y (ωd )
Multiplication              x[n] · y[n]                     (1/2π) ∫_{−π}^{π} X(θ)Y (ωd − θ)dθ
Parseval relation           ∑_{n=−∞}^{+∞} x[n]y∗[n] = (1/2π) ∫_{−π}^{π} X(ωd )Y ∗(ωd )dωd

Table 1.1: General properties of the discrete-time Fourier transform.

The complex exponential sequence x[n] = e^{jω0 n} and the real sinusoidal sequence x[n] = cos(ω0 n + ϕ) are other examples of sequences that do not have a DTFT, because the sum in Eq. (1.53) takes infinite values. In general a sequence does not necessarily admit a Fourier representation, meaning that the series in Eq. (1.53) may not converge. One can show that x[n] being absolutely summable (we have defined absolute summability in Eq. (1.32)) is a sufficient condition for the convergence of the series. Note that an absolutely summable sequence always has finite energy, while the opposite is not always true, since ∑ | x[n] |² ≤ (∑ | x[n] |)². Therefore a finite-energy sequence does not necessarily admit a Fourier representation.²

1.4.1.3 Properties
Table 1.1 lists a number of properties of the DTFT which are useful in digital signal processing applications. Time- and frequency-shifting are interesting properties in that they show that a shifting operation in either domain corresponds to multiplication by a complex exponential function in the other domain. The proof of these properties is straightforward from the definition of the DTFT.
The convolution property is extremely important: it says that a convolution in the time domain becomes a simple multiplication in the frequency domain. This can be demonstrated as follows:

F{x ∗ y}(ωd ) = ∑_{n=−∞}^{+∞} ( ∑_{k=−∞}^{+∞} x[k]y[n − k] ) e^{−jωd n} = ∑_{m=−∞}^{+∞} ∑_{k=−∞}^{+∞} x[k]y[m]e^{−jωd (k+m)}
             = ∑_{k=−∞}^{+∞} x[k]e^{−jωd k} · ∑_{m=−∞}^{+∞} y[m]e^{−jωd m} = X(ωd ) · Y (ωd ), (1.58)

where in the second equality we have substituted m = n − k. The multiplication property is dual to
the convolution property: a multiplication in the time-domain becomes a convolution in the frequency
domain.
The Parseval relation is also very useful: if we think of the sum on the left-hand side as an inner
product between the sequences x and y, we can restate this property by saying that the DTFT preserves
² For non-absolutely summable sequences like the unit step or the sinusoidal sequence, the DTFT can still be defined if we
resort to the Dirac delta δ_D(ω_d − ω_0). Since this is not a function but rather a distribution, extending the DTFT formalism to
non-summable sequences requires diving into the theory of distributions, which we will not do here.

This book is licensed under the CreativeCommons Attribution-NonCommercial-ShareAlike 3.0 license,


© 2005-2019 by the authors except for paragraphs labeled as adapted from <reference>
1-22 Algorithms for Sound and Music Computing [v.February 2, 2019]


Figure 1.7: Example of frequency aliasing occurring for three sinusoids.

the inner product (apart from the scaling factor 1/2π). In particular, when x = y, it preserves the energy
of the signal x. The Parseval relation can be demonstrated by noting that the DTFT of the sequence
y*[−n] is Y*(ω_d). Then we can write:

F⁻¹{X · Y*}[n] = (1/2π) ∫_{−π}^{+π} X(ω_d) Y*(ω_d) e^{jω_d n} dω_d = ∑_{k=−∞}^{+∞} x[k] y*[k − n],     (1.59)

where in the first equality we have simply used the definition of the IDTFT, while in the second equality
we have exploited the convolution property. Evaluating this expression for n = 0 proves the Parseval
relation.
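The Parseval relation is easy to check numerically; a minimal sketch with NumPy, using the DFT as the computable finite-length counterpart of the DTFT (for length-N sequences the analogous relation is ∑ x[n]y*[n] = (1/N) ∑ X[k]Y*[k], with 1/N playing the role of the 1/2π factor):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

X, Y = np.fft.fft(x), np.fft.fft(y)

# Inner product in the time domain vs. the frequency domain:
ip_time = np.sum(x * np.conj(y))
ip_freq = np.sum(X * np.conj(Y)) / N    # note the 1/N scaling factor

assert np.allclose(ip_time, ip_freq)
```

With x = y the same check shows that the transform preserves the energy of the signal.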

1.4.2 The sampling problem


1.4.2.1 Frequency aliasing
With the aid of the DTFT machinery, we can now go back to the concept of “sampling” and introduce
some fundamental notions. Let us start with an example.
Consider three continuous-time sinusoids xi (t) (i = 1, 2, 3) defined as

x1 (t) = cos(6πt), x2 (t) = cos(14πt), x3 (t) = cos(26πt). (1.60)

These sinusoids have frequencies 3, 7, and 13 Hz, respectively. Now we construct three sequences
xi [n] = xi (n/Fs ) (i = 1, 2, 3), each obtained by sampling one of the above signals, with a sampling
frequency Fs = 10 Hz. We obtain the sequences

x1 [n] = cos(0.6πn), x2 [n] = cos(1.4πn), x3 [n] = cos(2.6πn). (1.61)

Figure 1.7 shows the plots of both the continuous-time sinusoids and the sampled sequences: note that
all sequences have exactly the same sample values for all n, i.e. they actually are the same sequence.
This phenomenon of a higher frequency sinusoid acquiring the identity of a lower frequency sinusoid
after being sampled is called frequency aliasing.
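The example above can be reproduced numerically; a minimal sketch with NumPy (array names are ours):

```python
import numpy as np

Fs = 10.0                # sampling frequency (Hz)
n = np.arange(8)         # a few sample indices

# The three sinusoids of Eq. (1.60), sampled at t = n/Fs:
x1 = np.cos(6 * np.pi * n / Fs)     # 3 Hz
x2 = np.cos(14 * np.pi * n / Fs)    # 7 Hz
x3 = np.cos(26 * np.pi * n / Fs)    # 13 Hz

# All three sequences are sample-by-sample identical:
# 7 = 10 - 3 and 13 = 10 + 3 both alias onto the 3 Hz sinusoid.
assert np.allclose(x1, x2) and np.allclose(x1, x3)
```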
In fact we can understand the aliasing phenomenon in a more general way using the Fourier theory.
Consider a continuous-time signal x(t) and its sampled version x_d[n] = x(nT_s). Then we can prove
that the Fourier transform X(ω) of x(t) and the DTFT X_d(ω_d) of x_d[n] are related via the following
equation:

X_d(ω_d) = F_s ∑_{m=−∞}^{+∞} X(ω_d F_s + 2mπF_s).     (1.62)

Chapter 1. Fundamentals of digital audio processing 1-23

This equation states a fundamental result: X_d(ω_d) is a periodization of X(ω), i.e. it is a periodic func-
tion (of period 2π) made of a sum of shifted and scaled replicas of X(ω). The terms of this sum for
m ≠ 0 are aliasing terms and are said to alias into the so-called base band [−πF_s, πF_s]. There-
fore if two continuous-time signals x_1(t), x_2(t) have Fourier transforms with the property X_2(ω) =
X_1(ω + 2mπF_s) for some m ∈ Z, sampling these signals will produce identical DTFTs and therefore
identical sequences. This is the case of the sinusoids in Eq. (1.61).
In the remainder of this section we provide a proof of Eq. (1.62). We first write x(t) and xd [n] in
terms of their Fourier transforms:
x(t) = (1/2π) ∫_{−∞}^{+∞} X(ω) e^{jωt} dω,     x_d[n] = (1/2π) ∫_{−π}^{+π} X_d(ω_d) e^{jω_d n} dω_d.     (1.63)

The first integral can be broken up into an infinite sum of integrals computed on the disjoint intervals
[(2m − 1)πF_s, (2m + 1)πF_s] (with m ∈ Z), each of length 2πF_s. Then
The first integral can be broken up into an infinite sum of integrals computed on the disjoint intervals
[(2m − 1)πFs , (2m + 1)πFs ], (with m ∈ Z) each of length 2πFs . Then
x(t) = (1/2π) ∑_{m=−∞}^{+∞} ∫_{(2m−1)πF_s}^{(2m+1)πF_s} X(ω) e^{jωt} dω
                                                                         (1.64)
     = (1/2π) ∫_{−πF_s}^{πF_s} e^{jθt} ∑_{m=−∞}^{+∞} X(θ + 2mπF_s) e^{j2mπF_s t} dθ,

where in the second equality we have substituted ω = θ + 2mπFs in the integral, and we have swapped
the integral and the series. If we sample this representation to obtain xd [n] = x(nTs ), we can write
x_d[n] = x(nT_s) = (1/2π) ∫_{−πF_s}^{πF_s} e^{jθnT_s} ∑_{m=−∞}^{+∞} X(θ + 2mπF_s) e^{j2mπF_s nT_s} dθ
                                                                         (1.65)
       = (1/2π) ∫_{−πF_s}^{πF_s} e^{jθnT_s} ( ∑_{m=−∞}^{+∞} X(θ + 2mπF_s) ) dθ,

because the exponentials inside the sum are all equal to 1 (ej2mπFs nTs = ej2nmπ = 1). If we finally
substitute ωd = θTs we obtain
x_d[n] = (F_s/2π) ∫_{−π}^{+π} e^{jω_d n} ( ∑_{m=−∞}^{+∞} X(ω_d F_s + 2mπF_s) ) dω_d,     (1.66)

which proves Eq. (1.62).

1.4.2.2 The sampling theorem and the Nyquist frequency


Consider the three cases depicted in Fig. 1.8. The magnitude of the Fourier transform in Fig. 1.8(a) (upper
panel) is zero everywhere outside the base band, and Eq. (1.62) tells us that the magnitude of the sampled
signal looks like the plot in the lower panel. In Fig. 1.8(b) (upper panel) we have a similar situation except
that the magnitude is non-zero in the band [πFs , 3πFs ]. The magnitude of the corresponding sampled
signal then looks like the plot in the lower panel, and is identical to the one in Fig. 1.8(a). Yet another
situation is depicted in Fig. 1.8(c) (upper panel): in this case we are using a smaller sampling frequency
Fs , so that the magnitude now extends to more than one band. As a result the shifted replicas of | X |
overlap and | Xd | is consequently distorted.
These examples suggest that a “correct” sampling of a continuous signal x(t) corresponds to the
situation of Fig. 1.8(a), while in the cases depicted in Figs. 1.8(b) and 1.8(c) we lose information about



Figure 1.8: Examples of sampling a continuous time signal: (a) spectrum limited to the base band; (b)
the same spectrum shifted by 2π; (c) spectrum larger than the base band.

the original signal. The sampling theorem formalizes this intuition by saying that x(t) can be exactly
reconstructed from its samples x[n] = x(nT_s) if and only if X(ω) = 0 outside the base band (i.e. for
all |ω| ≥ πF_s). The frequency f_Ny = F_s/2 Hz, corresponding to the upper limit of the base band, is
called the Nyquist frequency.
Based on what we have just said, when we sample a continuous-time signal we must choose F_s in
such a way that the Nyquist frequency is above any frequency of interest, otherwise frequencies above
f_Ny will be aliased. In the case of audio signals, we know from psychoacoustics that humans perceive
audio frequencies up to ∼ 20 kHz: therefore, in order to guarantee that no artifacts are introduced by
the sampling procedure, we must use F_s > 40 kHz, and in fact the most widespread standard is F_s =
44.1 kHz. In some specific cases we may use lower sampling frequencies: as an example, it is known
that the spectrum of a speech signal is essentially limited to ∼ 4 kHz, and accordingly the most widespread standard in
telephony is F_s = 8 kHz.
In the remainder of this section we sketch the proof of the sampling theorem. If X(ω) ≠ 0 only in
the base band, then all the sum terms in Eq. (1.62) are 0 except for the one with m = 0. Therefore

X_d(ω_d) = F_s X(ω_d F_s)    for ω_d ∈ (−π, π).     (1.67)
In order to reconstruct x(t) we can take the inverse Fourier transform:

x(t) = (1/2π) ∫_{−∞}^{+∞} X(ω) e^{jωt} dω = (1/2π) ∫_{−πF_s}^{πF_s} X(ω) e^{jωt} dω = (1/2πF_s) ∫_{−πF_s}^{+πF_s} X_d(ω/F_s) e^{jωt} dω,     (1.68)
where in the second equality we have exploited the hypothesis X ≡ 0 outside the base band and in the
third one we have used Eq. (1.67). If we now reapply the definition of the DTFT we obtain

x(t) = (1/2πF_s) ∫_{−πF_s}^{πF_s} [ ∑_{n=−∞}^{+∞} x(nT_s) e^{−jωT_s n} ] e^{jωt} dω = ∑_{n=−∞}^{+∞} [ x(nT_s)/(2πF_s) ] ∫_{−πF_s}^{πF_s} e^{jω(t−nT_s)} dω,     (1.69)

where in the second equality we have swapped the sum with the integral. Now look at the integral on the
right hand side. We can solve it explicitly and write

(1/2πF_s) ∫_{−πF_s}^{πF_s} e^{jω(t−nT_s)} dω = (1/2πF_s) · [ e^{jπF_s(t−nT_s)} − e^{−jπF_s(t−nT_s)} ] / [ j(t − nT_s) ]
                                                                         (1.70)
                                             = sin[πF_s(t − nT_s)] / [πF_s(t − nT_s)] = sinc[F_s(t − nT_s)].


That is, the integral is a cardinal sine function, defined as sinc(t) ≜ sin(πt)/πt (the use of π in the
definition has the effect that the sinc function has zero crossings on the non-zero integers). In conclusion,
we can rewrite Eq. (1.68) as


x(t) = ∑_{n=−∞}^{+∞} x(nT_s) sinc( t/T_s − n ).     (1.71)

We have just proved that if X ≡ 0 outside the base band then x(t) can be reconstructed from its samples
through Eq. (1.71). The opposite implication is obvious: if x(t) can be reconstructed through its samples
it must be true that X ≡ 0 outside the base band, since a sampled signal only supports frequencies up to
fNy by virtue of Eq. (1.62).
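The reconstruction formula of Eq. (1.71) can be tried out numerically. The sketch below (NumPy, names and parameter values ours) reconstructs a bandlimited sinusoid at an off-grid time instant from its samples, using a truncated version of the sinc sum, so the result is only approximate:

```python
import numpy as np

Fs, Ts = 100.0, 1.0 / 100.0       # sampling frequency and period
f0 = 10.0                         # sinusoid frequency, well below Nyquist (50 Hz)

n = np.arange(-200, 201)          # finite range of sample indices (truncated sum)
samples = np.cos(2 * np.pi * f0 * n * Ts)

def reconstruct(t):
    # Truncated form of Eq. (1.71): x(t) = sum_n x(nTs) sinc(t/Ts - n).
    # np.sinc(u) = sin(pi*u)/(pi*u), matching the text's definition.
    return np.sum(samples * np.sinc(t / Ts - n))

# Evaluate at an instant that falls between sampling points and compare
# with the true signal value; the truncation leaves a small residual error.
t = 0.0137
assert abs(reconstruct(t) - np.cos(2 * np.pi * f0 * t)) < 1e-2
```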

1.5 Short-time Fourier analysis


In this section we introduce the most common spectral analysis tool: the Short-Time Fourier Transform
(STFT). Sounds are time-varying signals, and it is therefore important to develop analysis techniques to in-
spect their time-varying features. The STFT allows joint analysis of the temporal and frequency
features of the sound signal; in other words, it allows us to follow the temporal evolution of the spectral
parameters of a sound. The main building block of the STFT is the Discrete Fourier Transform (DFT),
which can be thought of as a specialization of the DTFT for sequences of finite length.

1.5.1 The Discrete Fourier Transform


The Discrete Fourier Transform (DFT) is a special case of the DTFT applied to finite-length sequences.
As such, it is also a useful tool for representing periodic sequences. As we said in the previous section, a
periodic sequence does not have a DTFT in a strict sense. However, periodic sequences are in one-to-one
correspondence with finite-length sequences, meaning that a finite-length sequence can be taken
to represent one period of a periodic sequence.

1.5.1.1 Definitions and properties

The Discrete Fourier Transform (DFT) is a special case of the DTFT applied to finite-length sequences
x[n] with 0 ≤ n ≤ N − 1. Let us consider one such sequence: we define the DFT of x[n] as the
sequence X[k] obtained by uniformly sampling the DTFT X(ω_d) on the ω_d-axis between 0 ≤ ω_d < 2π,
at the points ω_k = 2πk/N, 0 ≤ k ≤ N − 1. If x[n] has been sampled from a continuous-time signal, i.e.
x[n] = x(nT_s), the points ω_k correspond to the frequency points f_k = kF_s/N (in Hz).
From Eq. (1.53) one can then write

X[k] = X(ω_d)|_{ω_d=2πk/N} = ∑_{n=0}^{N−1} x[n] e^{−j(2π/N)kn} = ∑_{n=0}^{N−1} x[n] W_N^{kn},    0 ≤ k ≤ N − 1,     (1.72)

where we have used the notation WN = e−j2π/N . Note that the DFT is also a finite-length sequence in
the frequency domain, with length N . The inverse discrete Fourier Transform (IDFT) is given by

x[n] = (1/N) ∑_{k=0}^{N−1} X[k] W_N^{−kn},    0 ≤ n ≤ N − 1.     (1.73)


This relation can be verified by multiplying both sides by WNln , with l integer, and summing the result
from n = 0 to n = N − 1:
∑_{n=0}^{N−1} x[n] W_N^{ln} = ∑_{n=0}^{N−1} (1/N) ∑_{k=0}^{N−1} X[k] W_N^{−(k−l)n} = (1/N) ∑_{k=0}^{N−1} X[k] [ ∑_{n=0}^{N−1} W_N^{−(k−l)n} ],     (1.74)

where the last equality has been obtained by interchanging the order of summation. Now, the summation
∑_{n=0}^{N−1} W_N^{−(k−l)n} has the interesting property that it takes the value N when k − l = rN (r ∈ Z), and
takes the value 0 for any other values of k and l. Therefore Eq. (1.74) reduces to the definition of the DFT,
and therefore Eq. (1.73) is verified.
We have just proved that, for a length-N sequence x[n], the N values of its DTFT X(ωd ) at points
ωd = ωk are sufficient to determine x[n], and hence X(ωd ), uniquely. This justifies our definition of
Discrete Fourier Transform of finite-length sequences given in Eq. (1.72). The DFT is at the heart of
digital signal processing, because it is a computable transformation.
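As a sanity check of Eq. (1.72), a direct O(N²) evaluation of the DFT can be compared against a library FFT routine; a minimal sketch with NumPy:

```python
import numpy as np

def dft(x):
    """Direct evaluation of Eq. (1.72): X[k] = sum_n x[n] W_N^{kn}."""
    N = len(x)
    n = np.arange(N)
    k = n.reshape(-1, 1)
    W = np.exp(-2j * np.pi * k * n / N)   # matrix of powers W_N^{kn}
    return W @ x

x = np.random.default_rng(1).standard_normal(64)
assert np.allclose(dft(x), np.fft.fft(x))
```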
Most of the DTFT properties listed in Table 1.1 have a direct translation for the DFT. Clearly the DFT
is linear. The time- and frequency-shifting properties still correspond to a multiplication by a complex
number; however, these properties become periodic with period N. As an example, the time-shifting
property for the DFT becomes

x_m[n] = x[n − m]  ⇒  X_m[k] = W_N^{km} X[k].     (1.75)

Clearly a shift of m + lN samples cannot be distinguished from a shift by m samples, since W_N^{km} =
W_N^{k(m+lN)}. In other words, the ambiguity of the shift in the time domain has a direct counterpart in the
frequency domain.
The convolution property also holds for the DFT and is stated as follows:

z[n] = (x ∗ y)[n] ≜ ∑_{m=0}^{N−1} x[m] y[n − m]  ⇒  Z[k] = (X · Y)[k],     (1.76)

where in this case the symbol ∗ indicates the periodic convolution. The proof of this property is similar
to the one given for the DTFT.
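The periodic convolution property of Eq. (1.76) is easy to verify numerically; a minimal sketch with NumPy, taking the indices of y modulo N to make the convolution circular:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 32
x = rng.standard_normal(N)
y = rng.standard_normal(N)

# Periodic (circular) convolution of Eq. (1.76): indices of y taken modulo N.
z = np.array([sum(x[m] * y[(n - m) % N] for m in range(N)) for n in range(N)])

# The DFT of the circular convolution equals the product of the DFTs.
assert np.allclose(np.fft.fft(z), np.fft.fft(x) * np.fft.fft(y))
```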

1.5.1.2 Resolution, leakage and zero-padding


Consider the complex exponential sequence x[n] = e^{jω_0 n} over a finite number of points 0 ≤ n < N.
In Sec. 1.2 we have already shown that this sequence is periodic over the interval [0, N] only if ω_0 N =
2πk for some integer k. This implies that there are exactly N periodic complex exponential sequences
representable with N samples, i.e. those for which ω_0 = 2πk_0/N, with k_0 = 0 . . . N − 1: these are
the sequences x[n] = e^{j2πk_0 n/N} = W_N^{−k_0 n}. From the definitions of the DFT and the IDFT it follows
immediately that X[k] = N δ[k − k_0], i.e. the DFT associated to the sequence W_N^{−k_0 n} takes the value N
for k = k_0 and is zero everywhere else. As an example, Fig. 1.9(a) shows a 64-point DFT (computed
numerically) for the sequence x[n] = W_N^{−k_0 n} with k_0 = 20.
Since the resolution of the DFT is limited by the number of DFT points, one may wonder what the
DFT looks like for complex exponential sequences that are not periodic over the interval [0, N]. An
example is shown in Fig. 1.9(b) for the complex exponential sequence with ω_0 = 2πk_0/N, where we
have used a non-integer value k_0 = 20.5 and N = 64. Although the value of ω_0 is very similar to the
one in Fig. 1.9(a), the DFT looks very different. This happens because we have chosen a value of ω_0
that falls in the crack between two DFT points, and consequently the DFT does a poor job in resolving



Figure 1.9: Examples of DFT applied to complex exponential sequences: (a) N = 64, k_0 = 20, the
sequence is periodic and the DFT is a delta-sequence in the frequency domain; (b) N = 64, k_0 = 20.5,
the sequence is not periodic and the DFT exhibits leakage; (c) N = 128, k_0 = 41, the sequence is the
windowed exponential of Eq. (1.77) and the DFT is a shifted version of the rectangular window DFT.

the frequency of this particular signal. We call this effect leakage: since ω0 does not line up with one of
the “allowed” frequencies, the energy of the DFT leaks all over the base band.
In order to understand the leakage effect we now look at a third example. Figure 1.9(c) depicts the
DFT of the sequence

x[n] = { e^{j2πk_0 n/N},  0 ≤ n < N/2,
       { 0,               N/2 ≤ n < N,     (1.77)
with k0 = 41 and N = 128. In other words, the sequence x[n] is constructed by taking the complex
exponential sequence ej2πk0 n/N over 64 points, and by zero-padding this sequence over the remaining
64 points. Note that the complex exponential sequence of this example is clearly the same that we have
considered in Fig. 1.9(b) (we have simply doubled the values of k0 and N , so that ω0 has the same
value), and is clearly periodic over N = 128 points. Note also that the DFTs of Fig. 1.9(b) and 1.9(c)
are identical, except that the one in Fig. 1.9(c) has twice the points and consequently a better resolution
in frequency.
The sequence x[n] of Eq. (1.77) can also be written as

x[n] = e^{j2πk_0 n/N} · w_{N/2}[n] = W_N^{−k_0 n} w_{N/2}[n],   where   w_{N/2}[n] = { 1,  0 ≤ n < N/2,
                                                                                   { 0,  N/2 ≤ n < N,     (1.78)

where w_{N/2}[n] is a rectangular window of length N/2. More generally, we call w_M[n] a rectangular
window of length M.
The advantage of this representation is that we can now compute explicitly the DFT X[k], since we
know that multiplying by W_N^{−k_0 n} in time corresponds to shifting by k_0 samples in frequency. Therefore
X[k] equals the DFT of w_{N/2}[n], shifted by k_0 samples. The DFT of the generic rectangular window
w_M[n] can be computed from the definition as

F{w_M}[k] = ∑_{n=0}^{N−1} w_M[n] W_N^{kn} = ∑_{n=0}^{M−1} W_N^{kn} = (1 − W_N^{kM})/(1 − W_N^k) = e^{−j(πk/N)(M−1)} · sin(πkM/N)/sin(πk/N),     (1.79)

where in the third equality we have used the property of the geometric series ∑_{k=0}^{M−1} q^k = (1 − q^M)/(1 − q),
and in the fourth equality we have applied some straightforward trigonometry. Figure 1.10 shows
three plots of this DFT, for different window lengths M. In particular Fig. 1.10(c) shows the case



Figure 1.10: Examples of DFT (N = 128) applied to a rectangular window: (a) M = N/16, (b)
M = N/8, (c) M = N/2.

M = N/2: note that this plot coincides with the one in Fig. 1.9(c), except for a shift in frequency,
which is what we expected.
To summarize, in this section we have shown that (a) the frequency resolution of the DFT is limited
by the number N of DFT points, (b) the DFT of a complex exponential sequence which is not periodic
over N points exhibits poor resolution and leakage, and (c) the leakage effect can be interpreted as the
effect of a rectangular window of length N applied to the sequence.

M-1.9
Compute the DFT of the complex exponential sequence x[n] = e^{j2πk_0 n/N} for integer and non-integer values of k_0, in
order to explore leakage effects. Then compute the DFT of zero-padded complex exponential sequences, in order
to see how frequency resolution and leakage are affected.
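A possible sketch of this experiment with NumPy (parameter values are ours):

```python
import numpy as np

N = 64
n = np.arange(N)

# Integer k0: the sequence is periodic over N points, the DFT is a single line.
X_int = np.fft.fft(np.exp(2j * np.pi * 20 * n / N))
assert np.argmax(np.abs(X_int)) == 20
assert np.sum(np.abs(X_int) > 1e-6) == 1          # all energy in one bin

# Non-integer k0 = 20.5: the energy leaks over all bins of the base band.
X_frac = np.fft.fft(np.exp(2j * np.pi * 20.5 * n / N))
assert np.all(np.abs(X_frac) > 1e-6)              # no bin is exactly zero

# Zero-padding to 2N doubles the number of frequency points (finer sampling
# of the DTFT), but the leakage pattern itself remains.
X_pad = np.fft.fft(np.exp(2j * np.pi * 20.5 * n / N), 2 * N)
assert len(X_pad) == 2 * N
```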

1.5.1.3 Fast computation of the DFT: the FFT algorithm


A brute-force approach to DFT computation has O(N²) running time, since we need to perform O(N)
multiplications and additions to compute each of the N DFT samples: we need more efficient solutions
to the problem. The Fast Fourier Transform (FFT) algorithm provides the most efficient computation of
the DFT, as it runs in O(N log N) time. The algorithm is based on a divide-and-conquer strategy, which
computes the DFT of x[n] using the DFTs of two half-length subsequences of x[n]. Additionally,
it exploits some nice properties of the N-th roots of unity W_N^k.
Assume for simplicity that the sequence x[n] has length N = 2^s for some s ∈ N. Then we can write
the DFT sequence X[k] as

X[k] = ∑_{n=0}^{N/2−1} x[2n] W_N^{2kn} + ∑_{n=0}^{N/2−1} x[2n+1] W_N^{k(2n+1)}

     = ∑_{n=0}^{N/2−1} x[2n] W_N^{2kn} + W_N^k ∑_{n=0}^{N/2−1} x[2n+1] W_N^{2kn}     (1.80)

     = ∑_{n=0}^{N/2−1} x[2n] W_{N/2}^{kn} + W_N^k ∑_{n=0}^{N/2−1} x[2n+1] W_{N/2}^{kn},

where in the first equality we have separated the terms involving even elements and odd elements, in
the second one we have factorized a term WNk , and in the third one we have exploited the property


W_{aN}^{ak} = W_N^k for any a ∈ N. If we now look at the two sums in the last row, we see that they are the DFTs
of the sequences x_0[n] = {x[0], x[2], . . . , x[N − 2]} and x_1[n] = {x[1], x[3], . . . , x[N − 1]}, respectively.
Both x_0[n] and x_1[n] have length N/2. Therefore the problem of computing X[k] reduces to the problem
of computing two half-length DFTs X_0[k], X_1[k], and then summing their values according to Eq. (1.80).
The resulting computational procedure is detailed in Algorithm 1.1. It is quite obvious that the running
time T(N) of this algorithm is given by the recurrence equation T(N) = 2T(N/2) + O(N), therefore
T(N) = O(N log N).

Algorithm 1.1: RECURSIVE-FFT(x)


1 N ← length(x) // N is a power of 2
2 if N = 1 then return x
3 WN ← e^{−2πj/N}
4 W ← 1
5 x0[n] ← {x[0], x[2], . . . , x[N − 2]}
6 x1[n] ← {x[1], x[3], . . . , x[N − 1]}
7 X0[k] ← RECURSIVE-FFT(x0)
8 X1[k] ← RECURSIVE-FFT(x1)
9 for k ← 0 to N/2 − 1 do
10 X[k] ← X0[k] + W X1[k]
11 X[k + N/2] ← X0[k] − W X1[k]
12 W ← W · WN
13 return X

M-1.10
Realize Algorithm 1.1 and assess its functioning by comparing it with the fft function.
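A possible realization of Algorithm 1.1, here sketched in Python and checked against NumPy's FFT routine (standing in for the fft function mentioned above):

```python
import numpy as np

def recursive_fft(x):
    """Radix-2 decimation-in-time FFT (Algorithm 1.1); len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return np.asarray(x, dtype=complex)
    X0 = recursive_fft(x[0::2])          # DFT of the even-indexed subsequence
    X1 = recursive_fft(x[1::2])          # DFT of the odd-indexed subsequence
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)    # twiddle factors W_N^k
    # Butterfly combination, using W_N^{k+N/2} = -W_N^k:
    return np.concatenate([X0 + W * X1, X0 - W * X1])

x = np.random.default_rng(3).standard_normal(128)
assert np.allclose(recursive_fft(x), np.fft.fft(x))
```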

1.5.1.4 Iterative FFT algorithms and parallel realizations


Note that in writing the last cycle of Algorithm 1.1 we have exploited a relevant property of the W_N^k
coefficients, namely W_N^{(k+N/2)} = −W_N^k. Thanks to this property the value W X_1[k] is used twice (it is
a common subexpression): first it is added to X_0[k] to compute X[k], then it is subtracted from X_0[k] to
compute X[k + N/2]. This is known as the butterfly operation, and is the key element in the construction of
a more efficient, iterative implementation of the FFT algorithm.
Figure 1.11(a) depicts the tree of calls in an invocation of the recursive algorithm, in the case N = 8.
Looking at the tree we observe that, if we could arrange the elements in the order in which they appear in
the leaves, we could compute the DFT as follows: first we take the elements in pairs and combine each
pair with one butterfly operation, thus obtaining four 2-element DFTs; second, we take these DFTs in
pairs and combine each pair with two butterfly operations, thus obtaining two 4-element DFTs; finally,
we combine these two DFTs with four butterfly operations, thus obtaining the final 8-element DFT. The
resulting scheme is an iterative FFT implementation.
The only problem left is how to arrange the elements in the order in which they appear in the leaves.
Luckily the solution is straightforward: this order is a bit-reversal permutation, that is x[n] is placed in the
position obtained by reversing the bits of the binary representation of n. We can then write Algorithm 1.2.
Clearly the algorithm is still O(N log N ), since the total number of butterfly operations is O(N log N ),
and since the bit-reversal permutation is also a O(N log N ) procedure (we have to reverse N integers,


each made of log N bits). Figure 1.11(b) shows an efficient parallel implementation of this algorithm: it
is made of log N stages, each performing N/2 butterfly operations.

Algorithm 1.2: ITERATIVE-FFT(x)


1 BIT-REVERSE-COPY(x, X)
2 N ← length(x)
3 for s ← 1 to log2 (N ) do
4 m ← 2s
5 Wm ← e−2πj/m
6 for k ← 0 to N − 1 by m do
7 W ←1
8 for l ← 0 to m/2 − 1 do
9 t ← W X[k + l + m/2]
10 u ← X[k + l]
11 X[k + l] ← u + t
12 X[k + l + m/2] ← u − t
13 W ← W · Wm

14 return X

M-1.11
Realize Algorithm 1.2 and assess its functioning by comparing it with the fft function.
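A possible realization of Algorithm 1.2 in Python, again checked against NumPy's FFT (helper names are ours):

```python
import numpy as np

def bit_reverse_copy(x):
    """Place x[n] at the position obtained by reversing the bits of n."""
    N = len(x)
    bits = N.bit_length() - 1            # log2(N), N a power of 2
    X = np.empty(N, dtype=complex)
    for n in range(N):
        rev = int(format(n, f"0{bits}b")[::-1], 2)
        X[rev] = x[n]
    return X

def iterative_fft(x):
    """Iterative radix-2 FFT (Algorithm 1.2); len(x) must be a power of 2."""
    X = bit_reverse_copy(x)
    N = len(X)
    for s in range(1, N.bit_length()):   # log2(N) stages
        m = 2 ** s
        Wm = np.exp(-2j * np.pi / m)
        for k in range(0, N, m):
            W = 1.0
            for l in range(m // 2):
                t = W * X[k + l + m // 2]
                u = X[k + l]
                X[k + l] = u + t                  # butterfly operation
                X[k + l + m // 2] = u - t
                W *= Wm
    return X

x = np.random.default_rng(4).standard_normal(64)
assert np.allclose(iterative_fft(x), np.fft.fft(x))
```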

1.5.2 The Short-Time Fourier Transform


1.5.2.1 Definition and examples

Given a sequence x[n] and a window sequence w[n], the Short-Time Fourier Transform of x[n] is defined as

X_n(ω_d) = ∑_{m=−∞}^{+∞} w[n − m] x[m] e^{−jω_d m}.     (1.81)

If w ≡ 1, then this equation reduces to the DTFT of x[n]. However in practical applications we are inter-
ested in using finite-length windows, in order to analyze the spectral properties of x[n] over a finite time
interval. In such applications what we really do is computing, for each n, the DFT of the finite-length
sequence w[n − m]x[m], and what we obtain is a finite-length sequence Xn [k] = Xn (ωd )|ωd =2πk/N .
X_n(ω_d) is the DTFT of the sequence x_n[m] = w[n − m]x[m], therefore

w[n − m]x[m] = (1/2π) ∫_{−π}^{+π} X_n(ω_d) e^{jω_d m} dω_d,     (1.82)

from which, for m = n and under the hypothesis w[0] ≠ 0,

x[n] = (1/2πw[0]) ∫_{−π}^{+π} X_n(ω_d) e^{jω_d n} dω_d.     (1.83)

This equation shows that the sequence x can be reconstructed from its STFT.
The magnitude of the STFT is called the spectrogram. Since the STFT is a function of two variables (i.e., time
n and frequency ω_d), the plot of the spectrogram lives in a 3-D space. A typical 2-D representation of



Figure 1.11: (a) Tree of calls in an invocation of RECURSIVE-FFT: the leaves contain a bit-reversal
permutation of x[n]; (b) parallel realization of ITERATIVE-FFT, where butterfly operations involve bit-
reversed elements of x[n].

a spectrogram uses two axes for time and frequency, and magnitude values are represented with some
color map (e.g. a greyscale map).
Figure 1.12 shows an example of short-time spectral analysis applied to a simple time-varying signal,
the chirp. The chirp sequence is defined as

x[n] = A cos[ (ω_0/2) (nT_s)² ].     (1.84)

We have already seen in Sec. 1.3.1 that in general the instantaneous frequency of a signal cos[ϕ(t)] is
dϕ/dt(t): therefore the instantaneous frequency of this chirp signal is ω_0 nT_s, i.e. it is not constant but
increases linearly in time. A portion of a chirp signal with ω_0 = 2π · 800 rad/s² is shown in Fig. 1.12(a).
Now we segment this signal into a set of subsequences with short finite length, e.g. using a rectangular
window w[n], as in Eq. (1.81). If the window is sufficiently short, we can reasonably assume that the
frequency of the chirp is approximately constant in each subsequence. In fact the resulting spectrogram,
shown in Fig. 1.12(b), shows that for a given time index n the STFT is essentially the DFT of a sinu-
soidal sequence: the STFT magnitude has large non-zero values around the instantaneous frequency, and
much smaller non-zero values at other frequency points. Moreover the instantaneous frequency increases
linearly and after 5 seconds it reaches the value 4000 Hz (= 800 · 5 Hz), as expected.
So far in this example we have implicitly assumed that the sampling period T_s is sufficiently small,
so that no aliasing occurs: in fact in drawing Fig. 1.12(b) we have used a sampling rate F_s = 8 kHz.
However, aliasing will occur if we use smaller sampling rates. Figure 1.12(c) shows the spectrogram
obtained using F_s = 4 kHz: in this case frequencies above f_Ny = 2 kHz are aliased, and this effect
appears in the spectrogram when the black line starts moving down instead of continuing upwards.

M-1.12
Synthesize a chirp signal and compute its STFT using different sampling rates, in order to verify the occurrence
of aliasing.
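A possible sketch of this experiment with NumPy only, using a small hand-rolled magnitude STFT with a rectangular window (function names and parameter values are ours):

```python
import numpy as np

def chirp(Fs, dur, omega0):
    # Chirp of Eq. (1.84): instantaneous frequency omega0 * t rad/s.
    t = np.arange(int(dur * Fs)) / Fs
    return np.cos(0.5 * omega0 * t ** 2)

def stft_mag(x, N, hop):
    # Magnitude STFT with a rectangular window of length N and a hop size.
    frames = [x[i:i + N] for i in range(0, len(x) - N, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

omega0 = 2 * np.pi * 800          # rad/s^2, as in the text's example
Fs, N, hop = 8000, 256, 128

S = stft_mag(chirp(Fs, 5.0, omega0), N, hop)

# In the frame around t = 2 s the instantaneous frequency should be ~1600 Hz.
frame = int(2.0 * Fs / hop)
peak_hz = np.argmax(S[frame]) * Fs / N
assert abs(peak_hz - 1600) < 100
```

Repeating the same computation with Fs = 4000 should show the spectral peak folding back once the instantaneous frequency crosses the Nyquist frequency.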



Figure 1.12: Short-time spectral analysis of a chirp signal: (a) initial portion of the chirp signal, with
ω0 = 2π · 800 rad/s2 ; (b) spectrogram obtained with Fs = 8 kHz; (c) spectrogram obtained with
Fs = 4 kHz.

1.5.2.2 Windowing and the uncertainty principle


We have previously examined the resolution and leakage problems associated with the DFT: decreased
frequency resolution and spectral energy leakage occur because the spectrum is convolved with that of
a rectangular window. As the width of the rectangular window increases (see Fig. 1.10), the energy of
its spectrum becomes more and more concentrated around the origin. In the limit, the spectrum of an
infinite-width rectangular window is a δ[k] sequence, and no leakage occurs.
This qualitative analysis provides an example of the so-called uncertainty principle: increasing res-
olution in frequency diminishes resolution in time, and vice versa. Although a certain trade-off between
time resolution and frequency resolution is inevitable (and determined by the window length), one may
wonder if such a trade-off can be improved by using windows with different shapes (and thus different
spectra) from the rectangular window.
In fact the answer is yes. Some of the most commonly used window functions are listed below:
(Hann)      w[n] = (1/2) [ 1 + cos( 2πn/(2M+1) ) ]

(Hamming)   w[n] = 0.54 + 0.46 cos( 2πn/(2M+1) )     (1.85)

(Blackman)  w[n] = 0.42 + 0.5 cos( 2πn/(2M+1) ) + 0.08 cos( 4πn/(2M+1) )
Figure 1.13(a) depicts the plots of these windows in the time domain, while a portion of the correspond-
ing spectra is shown in Fig. 1.13(b). Note that these spectra share some common characteristics: all have
a large main “lobe” at 0 frequency, plus side lobes with decreasing amplitude. More precisely, two main
spectral features have to be taken into account when analyzing the properties of these windows: first,
the ability to resolve two nearby spectral components of a windowed signal depends mostly on the main
lobe width, i.e. the nearest zero crossings on both sides of the main lobe; second, the amount of leakage
from one frequency component to neighbouring bands depends on the amplitude of the side lobes, and
primarily on the relative side lobe level, i.e. the difference in dB between the amplitudes of the main lobe
and the largest side lobe.
Figure 1.13(b) shows that the rectangular window has the smallest main lobe width, therefore it is
better than other windows in resolving nearby sinusoids. On the other hand, it has the largest relative side
lobe level, therefore it causes considerable leakage. Other windows have different performances in terms



Figure 1.13: Comparison of different windows; (a) time-domain window sequences, symmetrical with
respect to n = 0; (b) window spectra in the vicinity of k = 0. Differences in terms of main lobe width
and relative side lobe level can be appreciated.

of main lobe width and relative side lobe level. The Hamming window is often considered to provide the
best trade-off for short-term spectral analysis of generic signals; however, the choice of the window to
use depends in general on many factors, including the characteristics of the signal to be analyzed.

M-1.13
Compute the DFT of complex exponential sequences x[n] = ej2πk0 n/N , windowed with rectangular, Hann, Ham-
ming, and Blackman windows. Verify the performance of each window with respect to resolution and leakage
effects, for integer and non-integer values of k0 .
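Exercise M-1.13 can be sketched numerically, for instance in Python/NumPy. In the following fragment the window length, the bin index k0 and the leakage measure are our illustrative choices, not prescribed by the exercise:

```python
import numpy as np

# Compare leakage of rectangular vs. Hann windows on a complex exponential
# whose frequency index k0 falls halfway between two DFT bins (worst case).
N = 64
n = np.arange(N)
k0 = 10.5
x = np.exp(2j * np.pi * k0 * n / N)

rect = np.ones(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # periodic Hann window

def leakage(window):
    """Fraction of spectral energy falling outside the two bins nearest k0."""
    X = np.abs(np.fft.fft(x * window)) ** 2
    return 1.0 - (X[10] + X[11]) / X.sum()

leak_rect, leak_hann = leakage(rect), leakage(hann)
print(leak_rect, leak_hann)   # the Hann window leaks far less into distant bins
```

With an integer k0 the rectangular window would instead give zero leakage (the sinusoid falls exactly on a bin), which is part of what M-1.13 asks to verify.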

1.6 Digital filters


1.6.1 The z-Transform
1.6.1.1 Definitions
The z-Transform is an operator that maps a sequence x[n] into a function X : C → C. Definition:

    Z{x}(z) = X(z) = Σ_{n=−∞}^{+∞} x[n] z^{−n} ,                    (1.86)

with z ∈ C a complex variable.

There is a close relationship with the DTFT: if z = e^{jωd}, i.e. if we restrict the variable z to the complex unit circle, then the z-Transform reduces to the DTFT. In particular the point z = 1 corresponds to frequency ωd = 0, and z = −1 corresponds to ωd = π. Therefore evaluation of the z-Transform on the upper half of the complex unit circle gives the DTFT up to the (normalized angular) Nyquist frequency. In general we can write:

    F{x}(ωd) = Z{x}(z)|_{z=e^{jωd}} .                               (1.87)
For this reason we sometimes write X(e^{jωd}) to indicate the DTFT of the sequence x.
The series in Eq. (1.86) does not converge for all values of z. For any sequence x, the set of values z for which the series converges is called the region of convergence (ROC). Since |X(z)| ≤ Σ_{n=−∞}^{+∞} |x[n]| |z|^{−n}, if a point z0 belongs to the ROC, then all points z that lie on the complex circle with radius |z0| also

1-34 Algorithms for Sound and Music Computing [v.February 2, 2019]

Property             Sequence            z-Transform           ROC
                     x[n], y[n]          X(z), Y(z)            Rx, Ry
Linearity            ax[n] + by[n]       aX(z) + bY(z)         contains Rx ∩ Ry
Time-shifting        x[n − n0]           z^{−n0} X(z)          Rx
z-scaling            z0^n x[n]           X(z/z0)               |z0| Rx
z-differentiation    n x[n]              −z dX/dz (z)          Rx
Conjugation          x*[n]               X*(z*)                Rx
Time-reversal        x*[−n]              X*(1/z*)              1/Rx
Convolution          (x ∗ y)[n]          X(z) · Y(z)           contains Rx ∩ Ry
Initial value theorem: if x[n] is causal (i.e. x[n] = 0 ∀n < 0), then lim_{z→∞} X(z) = x[0].

Table 1.2: General properties of the z-Transform.

belong to the ROC. Therefore the ROC is in general a ring in the complex plane. Such ring may or may
not include the unit circle, in other words the z-Transform of x[n] may exist in a certain region of the
complex plane even if the DTFT of x[n] does not exist.
Table 1.2 lists a set of relevant properties of the z-Transform that are particularly useful in the study of discrete-time signals and digital filters. Most of them have direct counterparts in DTFT properties (see Table 1.1), and can be proved from the definition (1.86).
The inverse z-transform is formally defined as

    x[n] = (1/2πj) ∮_C X(z) z^{n−1} dz ,                            (1.88)

where the integral is evaluated over the contour C, which can be any closed contour that belongs to
the ROC of X(z) and encircles z = 0. Without entering into details, we remark that for the kinds of
sequences and z-transforms typically encountered in digital signal processing applications, less formal
procedures are sufficient and preferable. We propose some examples in Sec. 1.6.2.

1.6.1.2 z-transforms of common sequences


We illustrate the z-transform with some notable examples.
The unit impulse δ[n] has z-transform 1, and the ROC is the entire complex plane. The ideal delay δ[n − n0] has z-transform z^{−n0}, and the ROC is the entire complex plane.
Consider the finite-length exponential sequence

    x[n] = a^n ,  0 ≤ n < N ;    x[n] = 0 elsewhere.                (1.89)

The z-transform is

    X(z) = Σ_{n=0}^{N−1} a^n z^{−n} = Σ_{n=0}^{N−1} (a z^{−1})^n = (1 − a^N z^{−N}) / (1 − a z^{−1}).   (1.90)

In particular since the series has only a finite number of terms the ROC is the entire complex plane. We
can generalize and state that any finite-length sequence admits a z-transform whose ROC is the entire
complex plane.


A slightly more complex example is the right-sided exponential sequence x[n] = a^n u[n], already examined in Sec. 1.4.1. In this case

    X(z) = Σ_{n=−∞}^{+∞} a^n u[n] z^{−n} = Σ_{n=0}^{+∞} (a z^{−1})^n = 1 / (1 − a z^{−1}) ,   (1.91)

where the last equality can be written only if |a z^{−1}| < 1, otherwise the series does not converge. In other words the ROC is the region |z| > |a|. It is easy to verify that the left-sided exponential sequence x[n] = −a^n u[−n − 1] also has a z-transform, identical to the one in (1.91), but with a different ROC (the region |z| < |a|).
This example shows that the algebraic expression of the z-transform does not completely specify the corresponding sequence, and that the ROC must also be specified. The example also shows a case of a sequence that has a z-transform but does not have a DTFT: for a ≥ 1 the right-sided exponential sequence still admits the z-transform 1/(1 − a z^{−1}) in the region |z| > a ≥ 1, although it does not decay in time and does not have a DTFT.
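Eq. (1.91) can be checked numerically at a point inside the ROC. A minimal Python/NumPy sketch (the values of a and z are arbitrary illustrative choices):

```python
import numpy as np

# For x[n] = a^n u[n], a truncated version of the series sum_n x[n] z^{-n}
# should approach 1/(1 - a z^{-1}) at any z inside the ROC |z| > |a|:
# there the terms (a/z)^n decay geometrically, so the tail is negligible.
a = 0.8
z = 1.1 * np.exp(0.3j)            # |z| = 1.1 > |a| = 0.8: inside the ROC

n = np.arange(200)
series = np.sum((a / z) ** n)     # truncated sum of x[n] z^{-n}
closed_form = 1.0 / (1.0 - a / z)
print(abs(series - closed_form))  # tiny truncation error
```

Outside the ROC (|z| < |a|) the partial sums grow without bound instead of converging.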
Consider next the right-sided real sinusoidal sequence x[n] = cos(ω0 n) u[n]. Note that it can be written as a sum of two exponential sequences: x[n] = (e^{jω0 n} u[n] + e^{−jω0 n} u[n])/2. Therefore

    X(z) = (1/2) [ 1/(1 − e^{jω0} z^{−1}) + 1/(1 − e^{−jω0} z^{−1}) ] = (1 − [cos ω0] z^{−1}) / (1 − 2[cos ω0] z^{−1} + z^{−2}).   (1.92)

Since |cos ω0| ≤ 1, the ROC is clearly the region |z| > 1. This is a second example of a sequence that does not admit a DTFT but admits a z-transform.
A final important example is the exponentially damped right-sided sinusoidal sequence, defined as x[n] = r^n cos(ω0 n) u[n], with 0 < r < 1. In this case

    X(z) = (1 − [r cos ω0] z^{−1}) / (1 − 2[r cos ω0] z^{−1} + r² z^{−2}).   (1.93)

The ROC is the region |z| > r. Note that in this case the ROC includes the unit circle, therefore the sequence x[n] also admits a DTFT. In fact, as we will see, this sequence represents the impulse response of a second-order resonating filter.

1.6.1.3 Rational z-transforms


A very important and useful family of z-transforms is that of rational transforms, i.e. those that can be written as

    X(z) = P(z) / Q(z) ,  with P(z), Q(z) polynomials.              (1.94)

In fact in the previous section we have just examined a few examples of sequences with rational transforms, in particular the right-sided exponential sequence (1.91). Recalling that the z-transform is linear, we can say that any sequence that can be expressed as a linear combination of right-sided exponentials has a rational z-transform:

    x[n] = Σ_{k=1}^{N} A_k a_k^n u[n]   ⇒   X(z) = Σ_{k=1}^{N} A_k / (1 − a_k z^{−1}) = [ Σ_{k=1}^{N} A_k ∏_{m≠k} (1 − a_m z^{−1}) ] / ∏_{k=1}^{N} (1 − a_k z^{−1}).   (1.95)

The values z for which P(z) = 0 (and therefore X(z) = 0) are called zeros of X(z). The values z for which Q(z) = 0 are called poles of X(z). A number of important relations exist between the poles of a rational transform and its ROC.



Figure 1.14: Pole-zero plots and ROCs of some simple transforms: (a) finite-length exponential se-
quence; (b) right-sided exponential sequence with | a | > 1; (c) left-sided exponential sequence with
| a | > 1; (d) exponentially damped, right-sided sinusoidal sequence. In all plots the dotted circle is the
unit circle and the gray-shaded region is the ROC of the corresponding transform.

First, the ROC cannot contain poles by definition, since X(z) is not defined on the poles. It follows
immediately that a finite-length sequence cannot have any poles. As an example, looking at Eq. (1.90)
one notices that the pole at z = a cancels with one of the zeros of the numerator (1 − aN z −N ), therefore
there are no poles. In a similar line of reasoning one can prove that for any right-sided sequence the
ROC extends outwards from the pole with largest absolute value towards | z | → +∞, and that for any
left-sided sequence the ROC extends inwards from the pole with smallest absolute value towards z → 0.
For a generic sequence that extends infinitely on both sides, the ROC consists of a ring bounded both on the interior and on the exterior by a pole.
So-called pole-zero plots are typically used to represent z-transforms and their associated ROCs.
Conventionally a zero is denoted with a “◦” symbol and a pole is denoted with a “×” symbol. As an example, Fig. 1.14 shows the pole-zero plots for some of the transforms discussed in the previous
section. Note in particular that the right-sided and left-sided exponential sequences have identical pole-
zero patterns, but have different ROCs.
Since the pole-zero pattern does not completely define the corresponding sequence, it is sometimes
convenient to specify some time-domain properties of the sequence that implicitly define the ROC. As
an example, consider the pole-zero plot of either Fig. 1.14(b) or Fig. 1.14(c) and assume that the ROC
is not known. If one states that the corresponding sequence is absolutely summable, then this implies
that it admits a DTFT and consequently implies that the ROC must be that of Fig. 1.14(c). Alternatively
one may state that the corresponding sequence is causal: this implies that the ROC must extend towards
| z | → +∞ and consequently implies that the ROC must be that of Fig. 1.14(b).

1.6.2 Transfer function and frequency response of a LTI system


1.6.2.1 Definitions

In Sec. 1.2.3 we have seen that a LTI system is completely characterized by its impulse response h[n], since the response to any input sequence x[n] can be written as the convolution between x and h (see Eqs. (1.22, 1.23)). Using the convolution property given in Table 1.2, one can restate this by saying that
for an LTI system an input sequence x is related to the corresponding output sequence y through the
equation
Y (z) = H(z)X(z), (1.96)


where X(z), Y (z), H(z) are the z-transforms of x[n], y[n], h[n], respectively. We call H(z) the transfer
function of the LTI system. Assuming that an appropriate ROC is specified for H, we can say that the
LTI system is completely characterized by its transfer function.
If the ROC includes the unit circle, then h[n] admits a Fourier representation. In this case we can also write

    Y(e^{jωd}) = H(e^{jωd}) X(e^{jωd}) ,                            (1.97)

where X(e^{jωd}), Y(e^{jωd}), H(e^{jωd}) are the DTFTs of x[n], y[n], h[n], respectively. We call H(e^{jωd}) the frequency response of the system. Assuming that the DTFTs are expressed in polar form (see Eq. (1.54)), we call |H(e^{jωd})| and arg[H(e^{jωd})] the magnitude response and the phase response of the system, respectively.
If the LTI system under consideration has been expressed through a constant-coefficient difference
equation (see Sec. 1.2.3), then one can immediately write the corresponding transfer function as a rational
function:

    Σ_{k=0}^{N} a_k y[n−k] = Σ_{k=0}^{M} b_k x[n−k]   ⇔   Σ_{k=0}^{N} a_k z^{−k} Y(z) = Σ_{k=0}^{M} b_k z^{−k} X(z) ,   (1.98)

from which it follows immediately that

    H(z) = Y(z)/X(z) = [ Σ_{k=0}^{M} b_k z^{−k} ] / [ Σ_{k=0}^{N} a_k z^{−k} ].   (1.99)

Such z-transforms arise frequently in digital signal processing applications. By looking at a transform of this kind, one can easily find the corresponding sequence (this is an example of an informal procedure for inverting a z-transform). First one can note that H(z) can be written as

    H(z) = (b0/a0) · [ ∏_{k=1}^{M} (1 − c_k z^{−1}) ] / [ ∏_{k=1}^{N} (1 − d_k z^{−1}) ] ,   (1.100)

where the c_k's and the d_k's are the zeros and the poles of H, respectively. If M ≤ N and the poles are all first-order, then by applying a partial fraction expansion the above equation can be rewritten as

    H(z) = Σ_{k=1}^{N} A_k / (1 − d_k z^{−1}) ,  with A_k = (1 − d_k z^{−1}) H(z)|_{z=d_k} .   (1.101)

Looking back at Eq. (1.91), we can conclude that h[n] is a linear combination of right-sided exponential
sequences (or left-sided exponential sequences, depending on the ROC).
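A partial fraction expansion in powers of z⁻¹, like the one of Eq. (1.101), is available in SciPy as `scipy.signal.residuez`. A hedged sketch with arbitrary example coefficients:

```python
import numpy as np
from scipy.signal import residuez, lfilter

# Example: H(z) = 1 / (1 - 0.75 z^-1 + 0.125 z^-2), poles at z = 0.5 and 0.25.
b = [1.0]
a = [1.0, -0.75, 0.125]

# r, p, k: residues, poles, and direct terms of H(z) = sum_i r[i]/(1 - p[i] z^-1).
r, p, k = residuez(b, a)

# Assuming the causal ROC, h[n] is the corresponding sum of right-sided
# exponentials (Eq. 1.91 applied term by term).
n = np.arange(20)
h = sum(ri * pi ** n for ri, pi in zip(r, p)).real

# Cross-check against the impulse response obtained by direct filtering.
impulse = np.zeros(20)
impulse[0] = 1.0
h_ref = lfilter(b, a, impulse)
print(np.max(np.abs(h - h_ref)))   # agreement up to rounding
```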

1.6.2.2 The concept of filtering


The magnitude and phase responses of a LTI system describe how the system transforms a sinusoidal input x[n] = A cos(ω0 n) = A (e^{jω0 n} + e^{−jω0 n})/2: the corresponding output is

    y[n] = (h ∗ x)[n] = A |H(e^{jω0})| · cos( ω0 n + arg[H(e^{jω0})] ) ,   (1.102)

i.e. the magnitude and phase responses at ω0 define the gain and the phase shift applied by the system at the frequency ωd = ω0. Similar considerations apply to a generic input x[n]: the corresponding output can be described in terms of the system magnitude and phase responses as

    |Y(e^{jωd})| = |H(e^{jωd})| · |X(e^{jωd})| ,
    arg[Y(e^{jωd})] = arg[H(e^{jωd})] + arg[X(e^{jωd})] .           (1.103)


The first equation in (1.103) says that frequency components of the input are emphasized or attenuated (or even suppressed) depending on the values of |H(e^{jωd})| at those frequencies. For this reason we typically refer to an LTI system as a frequency selective filter, or simply a filter. Thinking of audio,
equalization is an obvious example of filtering an input sound by emphasizing certain frequency ranges
and attenuating other ranges.
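Eq. (1.102) can be illustrated by filtering a sinusoid and comparing its steady-state amplitude with |H(e^{jω0})|; a Python/SciPy sketch (the moving-average coefficients and the test frequency are our arbitrary choices):

```python
import numpy as np
from scipy.signal import lfilter, freqz

b, a = [0.5, 0.5], [1.0]      # 2-point moving average: a simple low-pass
w0 = 0.3 * np.pi

n = np.arange(2000)
y = lfilter(b, a, np.cos(w0 * n))

# Predicted gain: magnitude response evaluated at w0 only.
_, H = freqz(b, a, worN=[w0])
gain_predicted = np.abs(H[0])

# Measured gain: steady-state RMS of the output times sqrt(2)
# (the amplitude of a sinusoid is sqrt(2) times its RMS value).
gain_measured = np.sqrt(2.0 * np.mean(y[100:] ** 2))
print(gain_predicted, gain_measured)
```

For this filter |H(e^{jω})| = cos(ω/2), so both values should come out close to cos(0.15π).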
The second equation in (1.103) says that frequency components of the input are delayed in a frequency-
dependent manner. The amount and type of tolerable phase distortion depends on the application. Often
phase responses are disregarded in audio applications because phase distortions are to a large extent in-
audible. However taking into account the phase response can be important in certain cases, e.g. when
one wants to preserve the shape of the time-domain waveform.
A generally tolerable type of phase distortion is linear distortion. A filter with a linear phase response produces the same phase delay for all frequencies: as an example, the ideal delay system h_{n0}[n] = δ[n − n0] has a linear phase response arg[H_{n0}(e^{jωd})] = −ωd n0. A convenient measure of the linearity of the phase response of a filter is the group delay, defined as³

    τ(ωd) = − d/dωd { arg[H(e^{jωd})] } .                           (1.104)
The deviation of τ from a constant indicates the degree of phase non-linearity. Figure 1.15(a) provides a
graphical comparison of the phase delay and the group delay.
The reason why τ is termed group delay is that this quantity relates to the effect of the phase on a quasi-sinusoidal signal. More precisely, consider the signal x[n] = a[n] e^{jω0 n}, and assume that a[n] is varying slowly (equivalently, assume that the spectrum A(e^{jω}) is concentrated near ω = 0). Then x will look like the signal in Fig. 1.15(b) (upper panel). The signal can also be rewritten as

    x[n] = a[n] e^{jω0 n} = [ (1/2π) ∫_{−ϵ}^{+ϵ} A(e^{jω}) e^{jωn} dω ] e^{jω0 n} ,   (1.105)

where ϵ ≪ π is the upper limit of the band of A. Therefore x can be viewed as a superposition of neighboring sinusoidal components, or a group, around ω0.
Since X(e^{jωd}) ≠ 0 only in the vicinity of ω0, the phase response can be approximated in that neighborhood as a line with slope −τ(ω0). With this approximation it is then quite straightforward to show that the filter output y[n] looks like in Fig. 1.15(b) (lower panel), i.e. τ(ω0) represents the delay applied to the slowly-varying amplitude a[n].
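The group delay of Eq. (1.104) can be computed numerically with `scipy.signal.group_delay`; for the ideal delay it is constant, as stated above. A minimal sketch (n0 = 5 is an arbitrary choice):

```python
import numpy as np
from scipy.signal import group_delay

# For the ideal delay h[n] = delta[n - n0] the phase is linear (-wd*n0),
# so the group delay equals the constant n0 at every frequency.
n0 = 5
b = np.zeros(n0 + 1)
b[n0] = 1.0                        # FIR taps of a pure n0-sample delay

w, gd = group_delay((b, [1.0]), w=512)
print(gd.min(), gd.max())          # both approximately equal to n0
```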

1.6.2.3 Stability, causality, and transfer functions


We have defined in Sec. 1.2.3 the notion of BIBO stability, and we have proved that an LTI system is
stable if and only if its impulse response is absolutely summable. This latter condition can be rewritten
as

    Σ_{n=−∞}^{+∞} | h[n] z^{−n} | < ∞ ,                             (1.106)

for | z | = 1. Therefore the condition of stability is equivalent to the condition that the ROC of the
transfer function includes the unit circle. The examples depicted in Fig. 1.14 confirm this finding.
Consider the relevant particular case of rational transfer functions associated with causal systems: for these transfer functions the condition of stability is equivalent to the condition that all the poles are inside the unit circle, because the ROC extends outwards from the pole with the largest absolute value.
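For causal rational filters this criterion is easy to check numerically: find the roots of the denominator and verify that they lie strictly inside the unit circle. A hedged sketch (the helper name and example coefficients are ours, not from the text):

```python
import numpy as np

def is_stable(a):
    """Causal-filter stability test. a = [a0, a1, ..., aN] are the denominator
    coefficients in powers of z^-1; the poles are the roots of the polynomial
    a0 z^N + a1 z^(N-1) + ... + aN, which np.roots computes directly."""
    poles = np.roots(a)
    return bool(np.all(np.abs(poles) < 1.0))

print(is_stable([1.0, -0.9]))        # pole at z = 0.9: stable
print(is_stable([1.0, -1.8, 0.81]))  # double pole at z = 0.9: stable
print(is_stable([1.0, -2.0]))        # pole at z = 2.0: unstable
```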
³ Our definition uses ωd, so that τ is adimensional. Usual definitions employ ω, so that τ is in seconds.



Figure 1.15: Comparing phase delay and group delay: (a) evaluation of phase delay and group delay for
a generic non-linear phase response; (b) illustration of phase delay and group delay for a narrowband
signal.

Another relevant particular case is that of FIR systems. We have already seen that a FIR system is always stable, since its impulse response is always absolutely summable. This property can be “seen” in the z domain by noting that the transfer function of a FIR system does not have poles, therefore the ROC is always the entire z plane and includes the unit circle.

1.6.3 Digital filter structures and design approaches


In the following we only give a quick overview; the next chapters will discuss many specific filters. In particular we will examine the second-order resonant filter in Chapter Sound modeling: signal based approaches; comb and all-pass filters in Chapter Sound modeling: source based approaches; and more comb and all-pass structures in Chapter Sound in space. We do not discuss filter structures (direct forms, etc.) in depth, and limit ourselves to a few words; the same holds for design techniques. In Chapter Sound modeling: source based approaches we discuss the bilinear transform, s-to-z mappings, and impulse response invariance, all techniques used to design a digital filter from an analog one. In Chapter Sound in space we will see pole-zero approximations.

1.6.3.1 Block diagram representations

Most of the LTI systems that we are going to realize are represented as linear constant-coefficient difference equations. As we have seen, the output of a LTI system represented in this way can be computed recursively, provided that past values of input and output are available. These values undergo two operations in the computation of the filter output: multiplication with coefficients, and addition.
Therefore the three basic elements for the implementation of a filter are memories for storing past values, adders, and multipliers. A filter structure is created by properly interconnecting these elements.
Figure 1.16 shows the pictorial symbols that are typically used for representing them. With these ele-
ments, a general filter


    y[n] − Σ_{k=1}^{N} a_k y[n−k] = Σ_{k=0}^{M} b_k x[n−k]   ⇒   H(z) = [ Σ_{k=0}^{M} b_k z^{−k} ] / [ 1 − Σ_{k=1}^{N} a_k z^{−k} ]   (1.107)

can be represented with the block diagram of Fig. 1.16(b).
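The recursion of Eq. (1.107) can be implemented directly with the three elements above (delays, multipliers, adders); a Python sketch, cross-checked against `scipy.signal.lfilter` (which uses the opposite sign convention for the feedback coefficients):

```python
import numpy as np
from scipy.signal import lfilter

def filter_direct(b, a_fb, x):
    """Direct recursion y[n] = sum_k b[k] x[n-k] + sum_{k>=1} a_fb[k] y[n-k],
    i.e. Eq. (1.107) with feedback coefficients a_fb[1:]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc += sum(a_fb[k] * y[n - k] for k in range(1, len(a_fb)) if n >= k)
        y[n] = acc
    return y

b, a = [0.2, 0.3], [1.0, 0.5]        # arbitrary example coefficients
x = np.random.default_rng(0).standard_normal(50)

y_ref = lfilter(b, a, x)             # lfilter: y[n] = sum b x - sum a[1:] y
y = filter_direct(b, [1.0, -0.5], x) # same filter in the (1.107) convention
print(np.max(np.abs(y - y_ref)))     # agreement up to rounding
```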



Figure 1.16: Block diagram representation of filters: (a) conventional pictorial symbols for delay, adder,
and multiplier; (b) a general filter structure.

1.6.3.2 Filter classification

Filters can be classified according to the most salient characteristics of their frequency responses, most typically their magnitude. Basic filter categories are represented in Fig. 1.17. Other types of filters can in general be described as a combination of these basic elements.
Low-pass filters (see Fig. 1.17(a), upper panel) select low frequencies up to a given cut-off frequency ωc, and attenuate higher frequencies. High-pass filters (see Fig. 1.17(a), lower panel) have the opposite behavior: they select frequencies above ωc, and attenuate lower frequencies.
Band-pass filters (see Fig. 1.17(b), upper panel) select frequencies within a frequency band, specified
by two cut-off frequencies ωc1 and ωc2 , while frequencies outside this band are attenuated. Band-reject
filters (see Fig. 1.17(b), lower panel) have the opposite behavior: they select frequencies outside the band
[ωc1 , ωc2 ], and attenuate frequencies within the band.
Resonator filters (see Fig. 1.17(c), upper panel) amplify frequencies in a narrow band around a cut-off
frequency ωc . Conversely, notch filters (see Fig. 1.17(c), lower panel) attenuate frequencies in a narrow
band around ωc . Finally, when the magnitude response is perfectly flat the filter is called an all-pass
filter, since all frequencies are passed. Note however that an all-pass filter modifies the phase of the input
signal. In the next chapter we will see some important uses of all-pass filters.
In order to optimize their frequency selective properties, ideal filters should have magnitude responses exhibiting a vertical transition between selected and rejected frequencies. Moreover they should have a null or linear phase response, in order not to introduce phase distortion. As an example, the ideal low-pass filter has the frequency response

    Hlp(e^{jω}) = 1 for |ω| ≤ ωc ;   Hlp(e^{jω}) = 0 for ωc < |ω| ≤ π.   (1.108)

However the corresponding impulse response is

    hlp[n] = sin(ωc n) / (πn) ,  −∞ < n < +∞,                       (1.109)



Figure 1.17: Classification of filters into basic categories, depending on their magnitude response
| H(ωd ) |: (a) low-pass and high-pass filter; (b) band-pass and band-reject filter; (c) resonator and
notch filter.

i.e. the filter is non-causal and has a two-sided infinite impulse response. Therefore it is not possible
to compute its output either recursively or non-recursively. In other words, the filter is not realizable.
Similar considerations apply to other types of ideal filters.
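Although the ideal low-pass filter is not realizable, a standard workaround is to truncate hlp[n] to 2M+1 samples, taper it with one of the windows of Sec. 1.5.2, and delay it by M samples to make it causal. A hedged sketch (M, ωc and the Hann taper are our illustrative choices):

```python
import numpy as np

M, wc = 32, 0.4 * np.pi
n = np.arange(-M, M + 1)

# Truncated ideal impulse response sin(wc n)/(pi n); np.sinc handles n = 0.
h_ideal = (wc / np.pi) * np.sinc(wc * n / np.pi)
window = 0.5 + 0.5 * np.cos(np.pi * n / M)     # Hann taper, zero at n = +/-M
h = h_ideal * window                           # shift right by M for causality

# Magnitude response of the resulting FIR approximation.
w = np.linspace(0, np.pi, 512)
H = np.abs(np.array([np.sum(h * np.exp(-1j * wk * n)) for wk in w]))
print(H[0], H[-1])   # close to 1 in the passband, close to 0 in the stopband
```

The sharp "vertical" transition of Eq. (1.108) is replaced by a transition band whose width shrinks as M grows.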
The simplest examples of realizable filters are first-order filters (i.e. filters with no more than one pole and/or one zero). First-order FIR low-pass and high-pass filters are defined as follows:

    Hlp(z) = (1/2)(1 + z^{−1}) ,   Hhp(z) = (1/2)(1 − z^{−1}).      (1.110)

They have a zero in z = −1 and z = 1, respectively. Therefore their magnitude responses decrease and increase monotonically, respectively. The low-pass filter in particular can be recognized to be a moving average filter that averages two contiguous samples.
First-order IIR low-pass and high-pass filters are defined as follows:

    Hlp(z) = [(1 − α)/2] · (1 + z^{−1})/(1 − αz^{−1}) ,   Hhp(z) = [(1 + α)/2] · (1 − z^{−1})/(1 − αz^{−1}).   (1.111)

Both have a pole in z = α, therefore |α| < 1 is required for stability, and α ∈ R in order for the impulse responses to be real-valued. They have a zero in z = −1 and z = 1, respectively. For α = 0 they reduce to the FIR filters above, while for α > 0 they have steeper responses.

M-1.14
Study the frequency responses of the first-order low-pass and high-pass filters of Eqs. (1.110, 1.111). Apply them
to a broad-band audio signal (e.g. a snaredrum), and study their effect.
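A quick numerical check of Eqs. (1.110, 1.111), along the lines of M-1.14 (the function names and the value of α are our illustrative choices):

```python
import numpy as np

def H_lp(w, alpha):
    """First-order IIR low-pass of Eq. (1.111), evaluated at frequency w."""
    z = np.exp(1j * w)
    return (1 - alpha) / 2 * (1 + 1 / z) / (1 - alpha / z)

def H_hp(w, alpha):
    """First-order IIR high-pass of Eq. (1.111), evaluated at frequency w."""
    z = np.exp(1j * w)
    return (1 + alpha) / 2 * (1 - 1 / z) / (1 - alpha / z)

alpha = 0.7
# Unit gain in the passband, a spectral zero in the stopband.
print(abs(H_lp(0.0, alpha)), abs(H_lp(np.pi, alpha)))   # ~1 and ~0
print(abs(H_hp(0.0, alpha)), abs(H_hp(np.pi, alpha)))   # ~0 and ~1
```

Applying the corresponding difference equations to a broad-band sound then makes the low-pass/high-pass behavior directly audible, as the exercise suggests.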

1.7 Commented bibliography


A general reference for digital signal processing is [Oppenheim et al., 1999]. Another classic is [Mitra, 2005], which is more implementation oriented, with Matlab examples.


Primers on digital audio processing are Rocchesso [2003] and Steiglitz [1996]. Examples focused on audio are also found in [Zölzer, 2002, Chapter 1].
A useful reading about Fourier analysis for discrete-time signals is provided in [Smith, 2008]. Our
discussion of FFT algorithms is based on [Cormen et al., 2001].
A discussion on recursive generators of sinusoidal signals is found e.g. in [Orfanidis, 1996]. Models
for fractal signals are also partially discussed in [Orfanidis, 1996].

References
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press,
Cambridge, MA, 2001.

Sanjit K Mitra. Digital Signal Processing. McGraw-Hill, third edition, 2005.

Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. Discrete-Time Signal Processing. Prentice Hall, Upper Saddle
River, NJ, 1999.

Sophocles J. Orfanidis. Introduction to Signal Processing. Prentice Hall, 1996.

Davide Rocchesso. Introduction to Sound Processing. Mondo Estremo, Firenze, 2003. http://profs.sci.univr.it/˜rocchess/SP.

Julius O. Smith. Mathematics of the Discrete Fourier Transform (DFT). http://ccrma.stanford.edu/~jos/mdft/, 2008. Online book, accessed Oct. 2008.

Kenneth Steiglitz. A Digital Signal Processing Primer - With Applications to Digital Audio and Computer Music. Prentice
Hall, 1996.

Udo Zölzer, editor. DAFX – Digital Audio Effects. John Wiley & Sons, 2002.

Contents

1 Fundamentals of digital audio processing 1-1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2 Discrete-time signals and systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2.1 Discrete-time signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2.1.1 Main definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2.1.2 Basic sequences and operations . . . . . . . . . . . . . . . . . . . . . 1-3
1.2.1.3 Measures of discrete-time signals . . . . . . . . . . . . . . . . . . . . 1-4
1.2.1.4 Random signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
1.2.2 Discrete-time systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-5
1.2.2.1 Basic systems and block schemes . . . . . . . . . . . . . . . . . . . . 1-5
1.2.2.2 Classes of discrete-time systems . . . . . . . . . . . . . . . . . . . . 1-6
1.2.3 Linear Time-Invariant Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-7
1.2.3.1 Impulse response and convolution . . . . . . . . . . . . . . . . . . . . 1-7
1.2.3.2 Properties of LTI systems . . . . . . . . . . . . . . . . . . . . . . . . 1-8
1.2.3.3 Constant-coefficient difference equations . . . . . . . . . . . . . . . . 1-10
1.3 Signal generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
1.3.1 Digital oscillators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-11
1.3.1.1 Table lookup oscillator . . . . . . . . . . . . . . . . . . . . . . . . . 1-12
1.3.1.2 Recurrent sinusoidal signal generators . . . . . . . . . . . . . . . . . 1-12
1.3.1.3 Control signals and envelope generators . . . . . . . . . . . . . . . . 1-13
1.3.1.4 Frequency controlled oscillators . . . . . . . . . . . . . . . . . . . . . 1-15
1.3.2 Noise generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-17
1.3.2.1 White noise generators . . . . . . . . . . . . . . . . . . . . . . . . . 1-17
1.3.2.2 Pink noise generators . . . . . . . . . . . . . . . . . . . . . . . . . . 1-18
1.4 Spectral analysis of discrete-time signals . . . . . . . . . . . . . . . . . . . . . . . . . . 1-19
1.4.1 The discrete-time Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . 1-19
1.4.1.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-19
1.4.1.2 DTFT of common sequences . . . . . . . . . . . . . . . . . . . . . . 1-20
1.4.1.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-21
1.4.2 The sampling problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-22
1.4.2.1 Frequency aliasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-22
1.4.2.2 The sampling theorem and the Nyquist frequency . . . . . . . . . . . 1-23
1.5 Short-time Fourier analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25
1.5.1 The Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . 1-25
1.5.1.1 Definitions and properties . . . . . . . . . . . . . . . . . . . . . . . . 1-25
1.5.1.2 Resolution, leakage and zero-padding . . . . . . . . . . . . . . . . . . 1-26
1.5.1.3 Fast computation of the DFT: the FFT algorithm . . . . . . . . . . . . 1-28


1.5.1.4 Iterative FFT algorithms and parallel realizations . . . . . . . . . . . . 1-29


1.5.2 The Short-Time Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . 1-30
1.5.2.1 Definition and examples . . . . . . . . . . . . . . . . . . . . . . . . . 1-30
1.5.2.2 Windowing and the uncertainty principle . . . . . . . . . . . . . . . . 1-32
1.6 Digital filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33
1.6.1 The z-Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33
1.6.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-33
1.6.1.2 z-transforms of common sequences . . . . . . . . . . . . . . . . . . . 1-34
1.6.1.3 Rational z-transforms . . . . . . . . . . . . . . . . . . . . . . . . . . 1-35
1.6.2 Transfer function and frequency response of a LTI system . . . . . . . . . . . . 1-36
1.6.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-36
1.6.2.2 The concept of filtering . . . . . . . . . . . . . . . . . . . . . . . . . 1-37
1.6.2.3 Stability, causality, and transfer functions . . . . . . . . . . . . . . . . 1-38
1.6.3 Digital filter structures and design approaches . . . . . . . . . . . . . . . . . . . 1-39
1.6.3.1 Block diagram representations . . . . . . . . . . . . . . . . . . . . . 1-39
1.6.3.2 Filter classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-40
1.7 Commented bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-41

