Fundamentals of digital audio processing
1.1 Introduction
The purpose of this chapter is to provide the reader with fundamental concepts of digital signal processing, which will be used extensively in the remainder of the book. Since the focus is on audio signals, all the examples deal with sound. Those who are already fluent in DSP may skip this chapter.
Figure 1.1: (a) Analog, (b) quantized-analog, (c) discrete-time, and (d) numerical signals.
In general terms, a signal can be defined as a function x : D → C, where the domain D and the codomain C may each be countable or non-countable. Moreover, C may be a subset of R or C, i.e. the signal x may be either a
real-valued or a complex-valued function.
When D = R we talk of continuous-time signals x(t), where t ∈ R, while when D = Z we talk of discrete-time signals x[n]. In this latter case n ∈ Z identifies discrete time instants t_n: the most common and important example is when t_n = nT_s, where T_s is a fixed quantity. In many practical applications a discrete-time signal x_d is obtained by periodically sampling a continuous-time signal x_c, as follows:

x_d[n] = x_c(nT_s), \quad -\infty < n < \infty. (1.1)

The quantity T_s is called the sampling period, measured in s. Its reciprocal is the sampling frequency, measured in Hz, and is usually denoted as F_s = 1/T_s. Note also the use of square brackets in the notation for a discrete-time signal x[n], which avoids ambiguity with the notation x(t) used for a continuous-time signal.
When C = R we talk of continuous-amplitude signals, while when C = Z we talk of discrete-amplitude signals. Typically the range of a discrete-amplitude signal is a finite set of M values \{x_k\}_{k=1}^{M}, and the most common example is that of a uniformly quantized signal with x_k = kq (where q is called the quantization step).
By combining the above options we obtain the following classes of signals, depicted in Fig. 1.1:
1. D = R, C = R: analog signal.
2. D = R, C = Z: quantized analog signal.
3. D = Z, C = R: sequence, or sampled signal.
4. D = Z, C = Z: numerical, or digital, signal. This is the type of signal that can be processed with
the aid of the computer.
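These four classes are easy to visualize with a short Matlab fragment; the following is a minimal sketch, in which the sampling rate, quantization step, and sinusoid parameters are arbitrary example values.

Fs = 20;  Ts = 1/Fs;             % example sampling frequency (Hz) and period (s)
q  = 0.25;                       % example quantization step
t  = 0:0.001:0.8;                % dense time axis, approximating continuous time
xa = 2 + 1.5*sin(2*pi*2*t);      % (1) analog signal
xq = q*round(xa/q);              % (2) quantized-analog signal
n  = 0:floor(0.8/Ts);            % discrete time indices
xd = 2 + 1.5*sin(2*pi*2*n*Ts);   % (3) discrete-time signal x[n] = x(nTs)
xn = q*round(xd/q);              % (4) numerical (digital) signal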
In the following sections we will focus on discrete-time signals, regardless of whether they are quantized or not.
We will equivalently use the terms discrete-time signal and sequence. We will always refer to a single
value x[n] as the n-th sample of the sequence x, regardless of whether the sequence has been obtained
by sampling a continuous-time signal or not.
The unit step is the simplest example of a right-sided sequence, defined as a sequence that is zero except for a right-infinite interval n_1 ≤ n < +∞. Similarly, left-sided sequences are defined as sequences that are zero except for a left-infinite interval −∞ < n ≤ n_1.
The unit step is related to the impulse by the following equalities:

u[n] = \sum_{k=0}^{\infty} \delta[n-k] = \sum_{k=-\infty}^{n} \delta[k]. (1.6)
Conversely, the impulse can be written as the first backward difference of the unit step:

\delta[n] = u[n] - u[n-1]. (1.7)
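These relations can be checked numerically over a finite index range; a minimal sketch:

n     = -5:10;              % a finite index range around the origin
delta = (n == 0);           % unit impulse delta[n]
u     = cumsum(delta);      % running sum of the impulse yields the unit step u[n]
d     = [u(1), diff(u)];    % backward difference u[n]-u[n-1] recovers delta[n]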
The general form of the real sinusoidal sequence with constant amplitude is

x[n] = A \cos(\omega_0 n + \phi), \quad -\infty < n < \infty, (1.8)
where A, ω0 and ϕ are real numbers. By analogy with continuous-time functions, ω0 is called angular
frequency of the sinusoid, and ϕ is called the phase. Note however that, since n is dimensionless, the
dimension of ω0 is radians. Very often we will say that the dimension of n is “samples” and therefore
we will specify the units of ω0 to be radians/sample. If x[n] has been sampled from a continuous-time
sinusoid with a given sampling rate Fs , we will also use the term normalized angular frequency, since in
this case ω0 is the continuous-time angular frequency normalized with respect to Fs .
Another relevant numerical signal is constructed as the sequence of powers of a real or complex
number α. Such sequences are termed exponential sequences and their general form is

x[n] = A \alpha^n, \quad -\infty < n < \infty, (1.9)
where A and α are real or complex constants. When α is complex, x[n] has real and imaginary parts that are exponentially weighted sinusoids. Specifically, if α = |α|e^{jω_0} and A = |A|e^{jϕ}, then x[n] can be expressed as

x[n] = |A|\,|\alpha|^n\, e^{j(\omega_0 n + \phi)}. (1.10)
Therefore x[n] can be expressed as x[n] = x_{Re}[n] + j x_{Im}[n], with x_{Re}[n] = |A|\,|\alpha|^n \cos(\omega_0 n + \phi) and x_{Im}[n] = |A|\,|\alpha|^n \sin(\omega_0 n + \phi). These sequences oscillate with an exponentially growing magnitude if |α| > 1, or with an exponentially decaying magnitude if |α| < 1. When |α| = 1, the sequences x_{Re}[n] and x_{Im}[n] are real sinusoidal sequences with constant amplitude and x[n] is referred to as the complex exponential sequence.
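As an illustration, the following minimal sketch generates a complex exponential sequence with exponentially decaying magnitude (all parameter values are arbitrary examples):

n     = 0:63;                          % index range
A     = exp(1j*pi/4);                  % complex amplitude A = |A| e^{j phi}
alpha = 0.95*exp(1j*0.2*pi);           % |alpha| < 1: decaying magnitude
x     = A*alpha.^n;                    % x[n] = A alpha^n
xRe   = real(x);  xIm = imag(x);       % exponentially weighted sinusoids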
An important property of real sinusoidal and complex exponential sequences is that substituting the
frequency ω0 with ω0 + 2πk (with k integer) results in sequences that are indistinguishable from each
other. This can be easily verified and is ultimately due to the fact that n is integer. We will see the
implications of this property when discussing the Sampling Theorem in Sec. 1.4.1, for now we will
implicitly assume that ω0 varies in an interval of length 2π, e.g. (−π, π], or [0, 2π).
Real sinusoidal sequences and complex exponential sequences are also examples of a periodic se-
quence: we define a sequence to be periodic with period N ∈ N if it satisfies the equality x[n] =
x[n + kN ], for −∞ < n < ∞, and for any k ∈ Z. The fundamental period N0 of a periodic signal is
the smallest value of N for which this equality holds. In the case of Eq. (1.8) the condition of periodicity implies that ω_0 N_0 = 2πk for some k ∈ Z. If k = 1 satisfies this equality we can say that the sinusoidal sequence is periodic with period N_0 = 2π/ω_0, but this is not always true: the period may be longer or, depending on the value of ω_0, the sequence may not be periodic at all. For example, ω_0 = 3π/4 gives N_0 = 8 (with k = 3), while for ω_0 = 1 no integer N satisfies the condition and the sequence is not periodic.
Note that an infinite-length sequence with finite sample values may or may not have finite energy E_x = \sum_n |x[n]|^2. The rate of transporting energy is known as power. The average power of a length-N sequence x[n] is then defined as the average energy per sample:

P_x = \frac{E_x}{N} = \frac{1}{N} \sum_{n=0}^{N-1} |x[n]|^2. (1.12)
Figure 1.2: Block schemes of discrete-time systems; (a) a generic system T {·}; (b) ideal delay system
Tn0 ; (c) moving average system TM A . Note the symbols used to represent sums of signals and multipli-
cation by a constant (these symbols will be formally introduced in Sec. 1.6.3).
Discrete-time systems are typically represented pictorially through block schemes that depict the signal
flow graph. As an example the block scheme of Fig.1.2(a) represents the generic discrete-time system of
Eq. (1.15).
The simplest concrete example of a discrete-time system is the ideal delay system T_{n_0}, defined as

y[n] = T_{n_0}\{x\}[n] = x[n - n_0], (1.16)
where the integer n0 is the delay of the system and can be both positive and negative: if n0 > 0 then
y corresponds to a time-delayed version of x, while if n0 < 0 then the system operates a time advance.
This system can be represented with the block-scheme of Fig. 1.2(b): this block-scheme also provides
an example of cascade connection of systems, based on the trivial observation that the system Tn0 can
be seen as cascaded of n0 unit delay systems T1 .
A slightly more complex example is the moving average system TM A , defined as
y[n] = T_{MA}\{x\}[n] = \frac{1}{M_1 + M_2 + 1} \sum_{k=-M_1}^{M_2} x[n-k]. (1.17)
The n-th sample of the sequence y is the average of (M1 + M2 + 1) samples of the sequence x, cen-
tered around the sample x[n], hence the name of the system. Fig. 1.2(c) depicts a block scheme of the
moving average system: in this case all the branches carrying the shifted versions of x[n] form a parallel
connection in which they are all summed up and subsequently multiplied by the factor 1/(M1 + M2 +1).
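A direct Matlab implementation of Eq. (1.17) makes this structure explicit; a minimal sketch with example values of M1 and M2:

M1 = 2;  M2 = 2;                          % example averaging extents
x  = randn(1,100);                        % an arbitrary input sequence
y  = zeros(1,100);
for n = M2+1:100-M1                       % indices where the window fits inside x
    y(n) = sum(x(n-M2:n+M1))/(M1+M2+1);   % average of M1+M2+1 samples around x[n]
end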
Classes of systems are defined by placing constraints on the properties of the transformation T {·}. Doing
so often leads to very general mathematical representations.
We define a system to be memoryless if the output sequence y[n] at every value of n depends only on
the value of the input sequence x[n] at the same value of n. As an example, the system y[n] = sin(x[n])
is a memoryless system. On the other hand, the ideal delay system and the moving average system
described in the previous section are not memoryless: these systems are referred to as having memory,
since they must “remember” past (or even future) values of the sequence x in order to compute the
“present” output y[n].
We define a system to be linear if it satisfies the principle of superposition. If y1 [n], y2 [n] are the
responses of a system T to the inputs x_1[n], x_2[n], respectively, then T is linear if and only if

T\{a_1 x_1 + a_2 x_2\}[n] = a_1 y_1[n] + a_2 y_2[n], (1.18)

for any pair of arbitrary constants a_1 and a_2. Equivalently we say that a linear system possesses an
additive property and a scaling property. As an example, the ideal delay system and the moving average
system described in the previous section are linear systems. On the other hand, the memoryless system
y[n] = sin(x[n]) discussed above is clearly non-linear.
We define a system to be time-invariant (or shift-invariant) if a time shift of the input sequence causes
a corresponding shift in the output sequence. Specifically, let y = T {x}. Then T is time-invariant if and
only if
T {Tn0 {x}} [n] = y[n − n0 ] ∀n0 , (1.19)
where Tn0 is the ideal delay system defined previously. This relation between the input and the output
must hold for any arbitrary input sequence x and its corresponding output. All the systems that we have
examined so far are time-invariant. On the other hand, an example of non-time-invariant system is y[n] =
x[M n] (with M ∈ N). This system creates y by selecting one every M samples of x. One can easily see
that T {Tn0 {x}} [n] = x[M n − n0 ], which is in general different from y[n − n0 ] = x[M (n − n0 )].
We define a system to be causal if for every choice of n0 the output sequence sample y[n0 ] depends
only on the input sequence samples x[n] with n ≤ n_0. This implies that, if y_1[n], y_2[n] are the responses of a causal system to the inputs x_1[n], x_2[n], respectively, then

x_1[n] = x_2[n] \ \forall n \le n_0 \quad \Rightarrow \quad y_1[n] = y_2[n] \ \forall n \le n_0. (1.20)
The moving average system discussed in the previous section is an example of a non-causal systems,
since it needs to know M1 “future” values of the input sequence in order to compute the current value
y[n]. Apart from this, all the systems that we have examined so far are causal.
We define a system to be stable if and only if every bounded input sequence produces a bounded
output sequence. A sequence x[n] is said to be bounded if there exists a positive constant B_x such that

|x[n]| \le B_x \quad \forall n. (1.21)

Stability then requires that for such an input sequence there exists a positive constant B_y such that
| y[n] | ≤ By ∀n. This notion of stability is often referred to as bounded-input bounded-output (BIBO)
stability. All the systems that we have examined so far are BIBO-stable. On the other hand, an example of unstable system is y[n] = \sum_{k=-\infty}^{n} x[k]. This is called the accumulator system, since y[n] accumulates the sum of all past values of x. In order to see that the accumulator system is not stable it is sufficient to verify that y[n] is not bounded when x[n] is the step sequence.
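A minimal numerical check: feeding the accumulator with the bounded unit step produces an output that grows without bound.

x = ones(1,1000);     % unit step input, bounded by Bx = 1
y = cumsum(x);        % accumulator output y[n]
y(end)                % equals 1000, and keeps growing with the signal length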
The key observation is that any sequence can be represented as a linear combination of delayed impulses (see Eq. (1.4)). If we use this representation and exploit the linearity and time-invariance properties, we can write:
y[n] = T\left\{ \sum_{k=-\infty}^{+\infty} x[k]\,\delta[n-k] \right\}[n] = \sum_{k=-\infty}^{+\infty} x[k]\, T\{\delta[n-k]\}[n] = \sum_{k=-\infty}^{+\infty} x[k]\, h[n-k], (1.22)
where in the first equality we have used the representation (1.4), in the second equality we have used the
linearity property, and in the last equality we have used the time-invariance property.
Equation (1.22) states that a LTI system can be completely characterized by its impulse response h[n], since the response to any input sequence x[n] can be written as \sum_{k=-\infty}^{+\infty} x[k]h[n-k]. This can
be interpreted as follows: the k-th input sample, seen as a single impulse x[k]δ[n − k], is transformed by
the system into the sequence x[k]h[n − k], and for each k these sequences are summed up to form the
overall output sequence y[n].
The sum on the right-hand side of Eq. (1.22) is called convolution sum of the sequences x[n] and
h[n], and is usually denoted with the sign ∗. Therefore we have just proved that a LTI system T has the
property
y[n] = T {x}[n] = (x ∗ h)[n]. (1.23)
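Numerically, Eq. (1.23) is just a call to Matlab's conv function; a minimal sketch using the (causal) moving average as the LTI system:

M2 = 4;                          % example: causal moving average (M1 = 0)
h  = ones(1,M2+1)/(M2+1);        % its impulse response
x  = randn(1,200);               % an arbitrary input sequence
y  = conv(x, h);                 % y = x * h (length 200 + M2)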
Let us consider again the systems defined in the previous sections: we can find their impulse re-
sponses through the definition, i.e. by computing their response to an ideal impulse δ[n]. For the ideal
delay system the impulse response is simply a shifted impulse:
hn0 [n] = δ[n − n0 ]. (1.24)
The impulse response of the moving average system is easily computed as

h[n] = \frac{1}{M_1+M_2+1} \sum_{k=-M_1}^{M_2} \delta[n-k] = \begin{cases} \frac{1}{M_1+M_2+1}, & -M_1 \le n \le M_2, \\ 0, & \text{elsewhere}. \end{cases} (1.25)
Finally, the impulse response of the accumulator system is the unit step:

h[n] = \sum_{k=-\infty}^{n} \delta[k] = u[n]. (1.26)

There is a fundamental difference between these impulse responses. The first two responses have a finite number of non-zero samples (1 and M_1 + M_2 + 1, respectively): systems that possess this property are called finite impulse response (FIR) systems. On the other hand, the impulse response of the accumulator has an infinite number of non-zero samples: systems that possess this property are called infinite impulse response (IIR) systems.
Figure 1.3: Properties of LTI system connections, and equivalent systems; (a) cascade, and (b) parallel
connections.
The convolution operator is commutative:

(x * h)[n] = \sum_{k=-\infty}^{+\infty} x[k]\, h[n-k] = \sum_{m=-\infty}^{+\infty} h[m]\, x[n-m] = (h * x)[n], (1.27)

where we have substituted the variable m = n − k in the sum. This property implies that a LTI system with input h[n] and impulse response x[n] will have the same output as a LTI system with input x[n] and impulse response h[n]. More importantly, associativity and commutativity have implications on the
properties of cascade connections of systems. Consider the block scheme in Fig. 1.3(a) (upper panel):
the output from the first block is x ∗ h1 , therefore the final output is (x ∗ h1 ) ∗ h2 , which equals both
(x ∗ h2 ) ∗ h1 and x ∗ (h1 ∗ h2 ). As a result the three block schemes in Fig. 1.3(a) represent three systems
with the same impulse response.
Linearity and commutativity imply that the convolution is distributive over addition. From the definition (1.23) it is straightforward to prove that

(x * (h_1 + h_2))[n] = (x * h_1)[n] + (x * h_2)[n]. (1.29)
Distributivity has implications on the properties of parallel connections of systems. Consider the block
scheme in Fig. 1.3(b) (upper panel): the final output is (x ∗ h1 ) + (x ∗ h2 ), which equals x ∗ (h1 + h2 ).
As a result the two block schemes in Fig. 1.3(b) represent two systems with the same impulse response.
In the case of a LTI system, the notions of causality and stability given in the previous sections can
also be related to properties of the impulse response. As for causality, it is a straightforward exercise to
show that a LTI system is causal if and only if

h[n] = 0 \quad \forall n < 0. (1.30)

For this reason, sequences that satisfy the above condition are usually termed causal sequences.
As for stability, recall that a system is BIBO-stable if any bounded input produces a bounded output.
The response of a LTI system to a bounded input |x[n]| ≤ B_x is

|y[n]| = \left| \sum_{k=-\infty}^{+\infty} x[k]\, h[n-k] \right| \le \sum_{k=-\infty}^{+\infty} |x[k]|\,|h[n-k]| \le B_x \sum_{k=-\infty}^{+\infty} |h[n-k]|. (1.31)
From this chain of inequalities we find that a sufficient condition for the stability of the system is
\sum_{k=-\infty}^{+\infty} |h[n-k]| = \sum_{k=-\infty}^{+\infty} |h[k]| < \infty. (1.32)
One can prove that this is also a necessary condition for stability. Assume that Eq. (1.32) does not hold, and define the input x[n] = h^*[-n]/|h[-n]| for h[-n] ≠ 0 (x[n] = 0 elsewhere): this input is bounded by unity, however one can immediately verify that y[0] = \sum_{k=-\infty}^{+\infty} |h[k]| = +\infty. In conclusion, a LTI system is stable if and only if h is absolutely summable, or h ∈ L^1(Z). A direct consequence of this property is that FIR systems are always stable, while IIR systems may not be stable.
Using the properties demonstrated in this section, we can look back at the impulse responses of Eqs. (1.24, 1.25, 1.26), and we can immediately establish whether they are stable and causal.
Consider now the class of systems whose input and output satisfy a linear constant-coefficient difference equation of the form

\sum_{k=0}^{N} a_k\, y[n-k] = \sum_{k=0}^{M} b_k\, x[n-k]. (1.33)
Question: given a set of values for {ak } and {bk }, does this equation define a LTI system? The answer is
no, because a given input x[n] does not univocally determine the output y[n]. In fact it is easy to see that,
if x[n], y[n] are two sequences satisfying Eq. (1.33), then the equation is satisfied also by the sequences
x[n], y[n] + yh [n], where yh is any sequence that satisfies the homogeneous equation:
∑
N
ak yh [n − k] = 0. (1.34)
k=0
One could show that y_h has the general form y_h[n] = \sum_{m=1}^{N} A_m z_m^n, where the z_m's are the roots of the characteristic polynomial \sum_{k=0}^{N} a_k z^{N-k} (this can be verified by substituting the general form of y_h into Eq. (1.34)).
The situation is very much like that of linear constant-coefficient differential equations in continuous-
time: since yh has N undetermined coefficients Am , we must specify N additional constraints in order
for the equation to admit a unique solution. Typically we set some initial conditions. For Eq. (1.33),
an initial condition is a set of N consecutive “initial” samples of y[n]. Suppose that the samples
y[−1], y[−2], . . . y[−N ] have been fixed: then all the infinite remaining samples of y can be recursively
determined through the recurrence equations
y[n] = \begin{cases} -\sum_{k=1}^{N} \frac{a_k}{a_0}\, y[n-k] + \sum_{m=0}^{M} \frac{b_m}{a_0}\, x[n-m], & n \ge 0, \\ -\sum_{k=0}^{N-1} \frac{a_k}{a_N}\, y[n+N-k] + \sum_{m=0}^{M} \frac{b_m}{a_N}\, x[n+N-m], & n \le -N-1. \end{cases} (1.35)
In particular, y[0] is determined with the above equation using the initial values y[-1], y[-2], \ldots, y[-N]; then y[1] is determined using the values y[0], y[-1], \ldots, y[-N+1], and so on.
Another way of guaranteeing that Eq. (1.33) specifies a unique solution is requiring that the LTI
system is also causal. If we look back at the definition of causality, we see that in this context it implies
that for an input x[n] = 0 ∀n < n_0, then y[n] = 0 ∀n < n_0. Then again we have sufficient initial conditions to recursively compute y[n] for n ≥ n_0: in this case we can speak of initial rest conditions.
All the LTI systems that we have examined so far can actually be written in the form (1.33). As an
example, let us examine the accumulator system: we can write
y[n] = \sum_{k=-\infty}^{n} x[k] = x[n] + \sum_{k=-\infty}^{n-1} x[k] = x[n] + y[n-1], (1.36)
Figure 1.4: Block schemes for constant-coefficient difference equations representing (a) the accumulator
system, and (b) the moving average system.
where in the second equality we have simply separated the term x[n] from the sum, and in the third equal-
ity we have applied the definition of the accumulator system for the output sample y[n − 1]. Therefore,
input and output of the accumulator system satisfy the equation
y[n] − y[n − 1] = x[n], (1.37)
which is in form (1.33) with N = 1, M = 0 and with a0 = 1, a1 = −1, b0 = 1. This also means that
we can implement the accumulator with the block scheme of Fig. 1.4(a).
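In Matlab, difference equations of the form (1.33) map directly onto the filter function, whose arguments are the coefficient vectors {b_k} and {a_k}; for the accumulator (1.37):

x = randn(1,100);          % an arbitrary input sequence
y = filter(1, [1 -1], x);  % b0 = 1; a0 = 1, a1 = -1: the accumulator
% y coincides (up to round-off) with cumsum(x)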
As a second example, consider the moving average system, with M1 = 0 (so that the system is
causal). Then Eq. (1.17) is already a constant-coefficient difference equation. But we can also represent
the system with a different equation by noting that
y[n] = \frac{1}{M_2+1} \sum_{k=0}^{M_2} x[n-k] = \frac{1}{M_2+1} \sum_{k=-\infty}^{n} \big( x[k] - x[k-M_2-1] \big). (1.38)
Now we note that the sum on the right-hand side represents an accumulator applied to the signal x1 [n] =
(x[n] − x[n − M2 − 1]). Therefore we can apply Eq. (1.37) and write
y[n] - y[n-1] = \frac{1}{M_2+1}\, x_1[n] = \frac{1}{M_2+1} \big( x[n] - x[n-M_2-1] \big). (1.39)
We have then found a totally different equation, which is still in the form (1.33) and still represents the
moving average system. The corresponding block scheme is given in Fig. 1.4(b). This example also
shows that different equations of the form (1.33) can represent the same LTI system.
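The equivalence of the two representations can be verified with filter; a minimal sketch:

M2 = 4;  x = randn(1,200);                 % example order and input
y1 = filter(ones(1,M2+1)/(M2+1), 1, x);    % direct form, Eq. (1.17) with M1 = 0
b  = [1 zeros(1,M2) -1]/(M2+1);            % b coefficients of Eq. (1.39)
y2 = filter(b, [1 -1], x);                 % recursive form, Eq. (1.39)
max(abs(y1-y2))                            % of the order of machine precision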
For a table of length L storing one period of the waveform, the sampling increment (SI) needed to produce a fundamental frequency f_0 at sampling rate F_s is

SI = \frac{L}{F_s/f_0} = \frac{L f_0}{F_s}. (1.40)
Therefore the SI is proportional to f_0. Having defined the sampling increment, samples of the desired signal are generated by reading one every SI samples of the table. If the SI is not an integer, the closest sample of the table will be chosen (obviously, the larger L, the better the approximation). In this way, the oscillator resamples the table to generate a waveform with different fundamental frequencies.
M-1.1
Implement in Matlab a circular look-up from a table of length L and with sampling increment SI.
M-1.1 Solution
phi = mod(phi + SI, L);               % circularly advance the reading point
s   = A*tab(mod(round(phi), L) + 1);  % nearest-sample lookup (1-based indexing)
where phi is a state variable indicating the reading point in the table, A is a scaling parameter, and s is the output signal sample. The function mod(x,y) computes the remainder of the division x/y and is used here to implement circular reading of the table. Notice that phi can be a non-integer value. In order to use it as an array index, it has to be truncated or rounded to the nearest integer (as we did in the code above). A more accurate output can be obtained by linear interpolation between adjacent table values.
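A minimal sketch of the linearly interpolating variant (same variable names as in M-1.1):

phi  = mod(phi + SI, L);                   % circularly advance the reading point
i0   = floor(phi);                         % table index just below phi
frac = phi - i0;                           % fractional part, 0 <= frac < 1
s    = A*((1-frac)*tab(i0+1) + frac*tab(mod(i0+1,L)+1)); % interpolated lookup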
A first recursive method for generating sinusoidal sequences is based on the difference equation

y[n] = 2\cos(\omega_0)\, y[n-1] - y[n-2].

In particular, with initial values y[0] = 1 and y[-1] = \cos\omega_0 the generator produces the sequence y[n] = \cos(\omega_0 n), while with initial conditions y[0] = 0 and y[-1] = -\sin\omega_0 it produces the sequence y[n] = \sin(\omega_0 n). This property can be justified by recalling the trigonometric relation \cos\omega_0 \cdot \cos\phi = 0.5[\cos(\phi + \omega_0) + \cos(\phi - \omega_0)].
A second recursive method for generating sinusoidal sequences combines both the sinusoidal and cosinusoidal generators and is termed coupled form. It is described by the equations

x[n+1] = \cos(\omega_0)\, x[n] - \sin(\omega_0)\, y[n],
y[n+1] = \sin(\omega_0)\, x[n] + \cos(\omega_0)\, y[n].

With x[0] = 1 and y[0] = 0 the sequences x[n] = \cos(\omega_0 n) and y[n] = \sin(\omega_0 n) are generated.
This property can be verified by noting that for the complex exponential sequence the trivial relation
ejω0 (n+1) = ejω0 ejω0 n holds. From this relation, the above equations are immediately proved by calling
x[n] and y[n] the real and imaginary parts of the complex exponential sequence, respectively.
A major drawback of both these recursive methods is that they are not robust against quantization.
Small quantization errors in the computation will cause the generated signals either to grow exponentially
or to decay rapidly into silence. To avoid this problem, a periodic re-initialization is advisable. It is
possible to use a slightly different set of coefficients to produce absolutely stable sinusoidal waveforms:

x[n+1] = x[n] - c\, y[n],
y[n+1] = c\, x[n+1] + y[n],

where c = 2\sin(\omega_0/2). With x[0] = 1 and y[0] = c/2 we have x[n] = \cos(\omega_0 n).
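A minimal sketch of the coupled-form generator (frequency and length are arbitrary example values):

w0 = 2*pi*440/22050;  N = 1000;            % example frequency (rad/sample), length
x = zeros(1,N);  y = zeros(1,N);
x(1) = 1;  y(1) = 0;                       % initial conditions x[0] = 1, y[0] = 0
for n = 1:N-1
    x(n+1) = cos(w0)*x(n) - sin(w0)*y(n);  % x[n] = cos(w0 n) in exact arithmetic
    y(n+1) = sin(w0)*x(n) + cos(w0)*y(n);  % y[n] = sin(w0 n) in exact arithmetic
end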
More generally, the signal generated by a controlled oscillator can be written as

x[n] = a[n] \cos(\phi[n]),

where a[n] scales the amplitude of the signal, while the phase ϕ[n] relates to the instantaneous frequency f_0[n] of the signal: if f_0[n] is not constant, then ϕ[n] does not increase linearly in time. Figure 1.5(a)
shows the symbol usually adopted to depict an oscillator with fixed waveform and varying amplitude and
frequency.
The signals a[n], and f0 [n] are usually referred to as control signals, as opposed to audio signals. The
reason for this distinction is that control signals vary on a much slower time-scale than audio signals (as
an example, a musical vibrato usually has a frequency of no more than ∼ 5 Hz). Accordingly, many
sound synthesis languages define control signals at a different (smaller) rate than the audio sampling rate
Fs . This second rate is called control rate, or frame rate: a frame is a time window with pre-defined
length (e.g. 5 or 50 ms), in which the control signals can be reasonably assumed to have small variations.
We will use the notation Fc for the control rate.
Suitable control signals can be synthesized using envelope generators. An envelope generator can be
constructed through the table-lookup approach described previously. In this case however the table will
be read only once since the signal to be generated is not periodic. Given a desired duration (in seconds)
of the control signal, the appropriate sampling increment will be chosen accordingly.
Figure 1.5: Controlling a digital oscillator; (a) symbol of the digital oscillator controlled in amplitude and frequency; (b) example of an amplitude control signal generated with an ADSR envelope.
Alternatively, envelope generators can be constructed by specifying values of control signals at a few
control points and interpolating the signal in between them. In the simplest formulation, linear interpo-
lation is used. In order to exemplify this approach, we discuss the so-called Attack, Decay, Sustain, and
Release (ADSR) envelope typically used in sound synthesis applications to describe the time-varying am-
plitude a[n]. This envelope is shown in Fig. 1.5(b): amplitude values are specified only at the boundaries
between ADSR phases, and within each phase the signal varies linearly.
The attack and release phases mark the identity of the sound, while the central phases are associated with the steady-state portion of the sound. Therefore, in order to synthesize two sounds with a similar identity (or timbre) but different durations, it is advisable to only slightly modify the duration of attack and release, while the decay and especially the sustain can be lengthened more freely.
M-1.2
Write a function that realizes a line-segment envelope generator. The inputs to the function are a vector of time instants and a corresponding vector of envelope values.
M-1.2 Solution
function env = envgen(t,a,method)  %t= vector of control time instants (t(1)=0)
%a= vector of envelope values
global Fs; global SpF; Fc = Fs/SpF;          %frame rate
env = interp1(t, a, 0:1/Fc:t(end), method);  %envelope sampled at frame rate

The envelope shape is specified by break-points, described as couples (time instant in seconds, amplitude). The function generates the envelope at frame rate. Notice that the interpolation function interp1 allows to easily use cubic or spline interpolations.
The use of waveform and envelope generators allows one to generate quasi-periodic sounds with very limited hardware and constitutes the building block of many more sophisticated algorithms.
M-1.3
Assume that a function sinosc(t0,a,f,ph0) realizes a sinusoidal oscillator controlled in frequency and amplitude, with t0 the initial time, a,f frame-rate amplitude and frequency vectors, and ph0 the initial phase (see example M-1.4). Then generate a sinusoid with varying amplitude and constant frequency.
M-1.3 Solution
global Fs; global SpF;  %global variables: sample rate, samples-per-frame
Fs=22050;
framelength=0.01;  %frame length (in s)
SpF=round(Fs*framelength);  %samples per frame
%%% control parameters (example values)
amp = envgen([0 .2 1.5 2],[0 1 .8 0],'linear');  %frame-rate amplitude envelope
f0  = 440*ones(size(amp));                       %constant frequency (Hz)
s   = sinosc(0, amp, f0, 0);                     %compute the audio signal
Note the structure of this simple example: in the "headers" section some global parameters are defined, which need to be known also to auxiliary functions; a second section defines the control parameters; finally the audio signal is computed.
M-1.4
Realize the sinosc(t0,a,f,ph0) function that we have used in M-1.3. Use equation (1.48) to compute the
phase given the frame-rate frequency vector f.
M-1.4 Solution
function s = sinosc(t0,a,f,ph0);
Both the amplitude a and frequency f envelopes are defined at frame rate and are interpolated at
sample rate inside the function body. Note in particular the computation of the phase vector within
each frame.
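Since Eq. (1.48) is not reproduced here, the following is only a plausible sketch of sinosc: amplitude and frequency are interpolated linearly within each frame, and the phase is accumulated as ϕ[n] = ϕ[n−1] + 2πf[n]/Fs (this recursion is our assumption, not necessarily the book's Eq. (1.48)).

function s = sinosc(t0,a,f,ph0)
% a, f: frame-rate amplitude and frequency vectors; t0 is unused in this sketch
global Fs; global SpF;
nf = length(a);  s = zeros(1, nf*SpF);  ph = ph0;
av0 = a(1);  fv0 = f(1);                      % values at the start of each frame
for i = 1:nf
    av  = av0 + (a(i)-av0)*(1:SpF)/SpF;       % sample-rate amplitude (linear)
    fv  = fv0 + (f(i)-fv0)*(1:SpF)/SpF;       % sample-rate frequency (linear)
    phv = ph + cumsum(2*pi*fv/Fs);            % phase vector within the frame
    s((i-1)*SpF+1:i*SpF) = av.*cos(phv);
    ph = phv(end);  av0 = a(i);  fv0 = f(i);  % carry state to the next frame
end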
We can finally listen to a sinusoidal oscillator controlled both in amplitude and in frequency.
M-1.5
Synthesize a sinusoid modulated both in amplitude and frequency, using the functions sinosc and envgen.
M-1.5 Solution
global Fs; global SpF;  %global variables: sample rate, samples-per-frame
Fs=22050;
framelength=0.01;  %frame length (in s)
SpF=round(Fs*framelength);  %samples per frame
amp = envgen([0 .5 3.5 4],[0 1 .7 0],'linear');  %amplitude envelope (example values)
f0  = envgen([0 2 4],[200 260 180],'linear');    %frequency envelope in Hz (example values)
s   = sinosc(0, amp, f0, 0);                     %compute the audio signal
[Figure: (a) amplitude envelope (adim) and (b) frequency envelope (Hz) generated in example M-1.5, plotted versus time t (s).]
A common way of obtaining pseudorandom sequences is the linear congruential method:

I[n] = (a\, I[n-1] + c) \bmod M, \qquad u[n] = I[n]/M,

where a and c are two constants that should be chosen very carefully in order to have a maximal length sequence, i.e. one that is M samples long before repetition. The actual generated sequence depends on the initial value I[0]; that is why the sequence is called pseudorandom. The numbers are uniformly distributed over the interval 0 ≤ u[n] < 1. The mean is E[u] = 1/2 and the variance is σ_u² = 1/12. The transformation s[n] = 2u[n] − 1 generates a zero-mean uniformly distributed random sequence over the interval [−1, 1). This sequence corresponds to a white noise signal because the generated numbers are mutually independent. The power spectral density is S(f) = σ_u². Thus the sequence contains all the frequencies in equal proportion and exhibits equally slow and rapid variation in time.
With a suitable choice of the coefficients a and c, this method produces pseudorandom sequences with flat spectral density magnitude (white noise). Different spectral shapes can be obtained using white noise as input to a filter.
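A minimal sketch of a linear congruential generator; the constants are the classic Park-Miller values, used here purely as an example:

M = 2^31 - 1;  a = 16807;  c = 0;             % example (Park-Miller) constants
N = 10000;  I = zeros(1,N+1);  I(1) = 12345;  % arbitrary seed I[0]
for n = 1:N
    I(n+1) = mod(a*I(n) + c, M);              % linear congruential recursion
end
s = 2*I(2:end)/M - 1;                         % zero-mean white noise in [-1,1)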
M-1.6
A method of generating a Gaussian-distributed random sequence is based on the central limit theorem, which states that the sum of a large number of independent random variables is Gaussian. As an exercise, implement a good approximation of Gaussian noise by summing 12 independent uniform noise generators.
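A possible solution sketch for M-1.6: the sum of 12 independent uniform variables on [0,1) has mean 6 and variance 1, so recentering it yields approximately standard Gaussian samples.

N = 10000;
g = sum(rand(12,N), 1) - 6;   % each column: 12 uniforms summed, then recentered
% mean(g) is close to 0 and var(g) close to 1 (one uniform has variance 1/12)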
If we desire that the numbers vary at a slower rate, we can generate a new random number every d sampling instants and hold the previous value in the interval (holder) or interpolate between two successive random numbers (interpolator). In this case the power spectrum is given by

S(f) = \frac{\sigma_u^2}{d} |H(f)|^2,

with

|H(f)| = \left| \frac{\sin(\pi f d / F_s)}{\sin(\pi f / F_s)} \right|

for the holder and

|H(f)| = \frac{1}{d} \left[ \frac{\sin(\pi f d / F_s)}{\sin(\pi f / F_s)} \right]^2

for linear interpolation.
M-1.7
In the Voss 1/f noise generation algorithm, the role of the low-pass filters is played by the hold filter seen in the previous paragraph. The 1/f noise is generated by taking the average of several periodically held generators y_i[n], with periods forming a geometric progression d_i = 2^i, i.e.

y[n] = \frac{1}{M} \sum_{i=1}^{M} y_i[n]. (1.51)

The power spectrum does not have an exact 1/f shape, but it is close to it for frequencies f ≥ F_s/2^M. As exercise, implement a 1/f noise generator and use it to assign the pitches to a melody.
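A minimal sketch of the Voss generator for M-1.7 (the number of generators M and the output length N are example values):

M = 8;  N = 2^14;                        % number of generators, output length
y = zeros(M,N);
for i = 1:M
    d  = 2^i;                            % hold period d_i = 2^i
    v  = 2*rand(1,ceil(N/d)) - 1;        % zero-mean uniform values
    yi = reshape(repmat(v,d,1), 1, []);  % hold each value for d samples
    y(i,:) = yi(1:N);
end
pink = mean(y,1);                        % 1/f-like noise, Eq. (1.51)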
M-1.8
The music derived from 1/f noise is close to human music: it does not have the unpredictability and randomness of white noise, nor the predictability of brown noise. 1/f processes correlate logarithmically with the past: the averaged activity of the last ten events has as much influence on the current value as that of the last hundred events, and of the last thousand. Thus they have a relatively long-term memory.
1/f noise is fractal: it exhibits self-similarity, one of the properties of fractal objects. In a self-similar sequence, the pattern of the small details matches the pattern of the larger forms, but on a different scale; in this case, it is customary to say that 1/f fractional noise exhibits statistical self-similarity. The pink noise algorithm for generating pitches has become a standard in algorithmic music. Use the 1/f generator developed in M-1.7 to produce a fractal melody.
The Fourier transform of a continuous-time signal x(t) is defined as

X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-j2\pi f t}\, dt, (1.52)

where the variable f is frequency and is expressed in Hz, while the angular frequency ω has been defined as ω = 2πf and is expressed in radians/s. Note that we are following the conventional notation by which time-domain signals are denoted using lowercase symbols (e.g., x(t)) while frequency-domain signals are denoted in uppercase (e.g., X(ω)).
We can try to find an equivalent expression in the case of a discrete-time signal x[n]. If we think
of x[n] as the sampled version of a continuous-time signal x(t) with a sampling interval Ts = 1/Fs ,
i.e. x[n] = x(nTs ), we can define the discrete-time Fourier transform (DTFT) starting from Eq. (1.52)
where the integral is substituted by a summation:
F\{x\}(\omega_d) = X(\omega_d) = \sum_{n=-\infty}^{+\infty} x(nT_s)\, e^{-j2\pi f \frac{n}{F_s}} = \sum_{n=-\infty}^{+\infty} x[n]\, e^{-j\omega_d n}. (1.53)
There are two remarks to be made about this equation. First, we have omitted the scaling factor T_s in front of the summation, which would be needed to have a perfect correspondence with Eq. (1.52) but is irrelevant to our treatment. Second, we have defined a new variable ω_d = 2πf/F_s: we call this the
normalized (or digital) angular frequency. This is not to be confused with the angular frequency ω used
in Eq. (1.52): ωd is measured in radians/sample, and varies in the range [−2π, 2π] when f varies in the
range [−Fs , Fs ]. In this book we use the notation ω to indicate the angular frequency in radians/s, and
ωd to indicate the normalized angular frequency in radians/sample.
As one can verify from Eq. (1.53), X(ωd ) is a periodic function in ωd with a period 2π. Note that
this periodicity of 2π in ωd corresponds to a periodicity of Fs in the domain of the absolute-frequency
f . Moreover X(ωd ) is in general a complex function, and can thus be written in terms of its real and
imaginary parts, or alternatively in polar form as
where | X(ωd ) | is the magnitude function and arg[X(ωd )] is the phase function. Both are real-valued
functions. Given the 2π periodicity of X(ωd ) we will arbitrarily assume that −π < arg[X(ωd )] < π.
We informally refer to | X(ωd ) | also as the spectrum of x[n].
The inverse discrete-time Fourier transform (IDTFT) is found by observing that Eq. (1.53) represents
the Fourier series of the periodic function X(ωd ). As a consequence, one can apply Fourier theory for
periodic functions of continuous variables, and compute the Fourier coefficients x[n] as
F^{-1}\{X\}[n] = x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega_d)\, e^{j\omega_d n}\, d\omega_d. (1.55)
Equations (1.53) and (1.55) together form a Fourier representation for the sequence x[n]. Equation (1.55)
can be regarded as a synthesis formula, since it represents x[n] as a superposition of infinitesimally
small complex sinusoids, with X(ωd ) determining the relative amount of each sinusoidal component.
Equation (1.53) can be regarded as an analysis formula, since it provides an expression for computing
X(ωd ) from the sequence x[n] and determining its sinusoidal components.
The unit step sequence u[n] does not have a DTFT, because the sum in Eq. (1.53) takes infinite values.
The exponential sequence (1.9) also does not admit a DTFT. However if we consider the right-sided exponential sequence x[n] = a^n u[n], in which the unit step is multiplied by an exponential with |a| < 1, then this admits a DTFT:

F\{x\}(\omega_d) = \sum_{n=-\infty}^{+\infty} a^n u[n]\, e^{-j\omega_d n} = \sum_{n=0}^{+\infty} \left( a e^{-j\omega_d} \right)^n = \frac{1}{1 - a e^{-j\omega_d}}. (1.57)
The complex exponential sequence x[n] = ejω0 n or the real sinusoidal sequence x[n] = cos(ω0 n+ϕ)
are other examples of sequences that do not have a DTFT, because the sum in Eq. (1.53) takes infinite
values. In general a sequence does not necessarily admit a Fourier representation, meaning that the series in Eq. (1.53) may not converge. One can show that x[n] being absolutely summable (we have defined absolute summability in Eq. (1.32)) is a sufficient condition for the convergence of the series. Note that an absolutely summable sequence always has finite energy, while the opposite is not always true, since \sum |x[n]|^2 \le (\sum |x[n]|)^2. Therefore a finite-energy sequence does not necessarily admit a Fourier representation.²
1.4.1.3 Properties
Table 1.1 lists a number of properties of the DTFT which are useful in digital signal processing applications. Time- and frequency-shifting are interesting properties in that they show that a shifting operation in either domain corresponds to multiplication by a complex exponential function in the other domain.
Proof of these properties is straightforward from the definition of DTFT.
The convolution property is extremely important: it says that a convolution in the time domain be-
comes a simple multiplication in the frequency domain. This can be demonstrated as follows:
F\{x*y\}(\omega_d) = \sum_{n=-\infty}^{+\infty} \left( \sum_{k=-\infty}^{+\infty} x[k]\, y[n-k] \right) e^{-j\omega_d n} = \sum_{m=-\infty}^{+\infty} \sum_{k=-\infty}^{+\infty} x[k]\, y[m]\, e^{-j\omega_d (k+m)}
= \sum_{k=-\infty}^{+\infty} x[k]\, e^{-j\omega_d k} \cdot \sum_{m=-\infty}^{+\infty} y[m]\, e^{-j\omega_d m} = X(\omega_d)\, Y(\omega_d), (1.58)
where in the second equality we have substituted m = n − k. The multiplication property is dual to
the convolution property: a multiplication in the time-domain becomes a convolution in the frequency
domain.
The Parseval relation is also very useful: if we think of the sum on the left-hand side as an inner
product between the sequences x and y, we can restate this property by saying that the DTFT preserves
² For non-absolutely summable sequences like the unit step or the sinusoidal sequence, the DTFT can still be defined if we resort to the Dirac delta δ_D(ω_d − ω_0). Since this is not a function but rather a distribution, extending the DTFT formalism to non-summable sequences requires diving into the theory of distributions, which we are not willing to do.
Figure 1.7: The three continuous-time sinusoids x1(t), x2(t), x3(t) of Eq. (1.60) and the single sampled sequence x1,2,3[n] that they all produce (amplitude versus t (s)).
the inner product (apart from the scaling factor 1/2π). In particular, when x = y, it preserves the energy
of the signal x. The Parseval relation can be demonstrated by noting that the DTFT of the sequence
y ∗ [−n] is Y ∗ (ωd ). Then we can write:
F^{-1}\{X Y^*\}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega_d)\, Y^*(\omega_d)\, e^{j\omega_d n}\, d\omega_d = \sum_{k=-\infty}^{+\infty} x[k]\, y^*[k-n], (1.59)
where in the first equality we have simply used the definition of the IDTFT, while in the second equality
we have exploited the convolution property. Evaluating this expression for n = 0 proves the Parseval
relation.
Consider the three continuous-time sinusoids

x_1(t) = \cos(6\pi t), \quad x_2(t) = \cos(14\pi t), \quad x_3(t) = \cos(26\pi t). (1.60)

These sinusoids have frequencies 3, 7, and 13 Hz, respectively. Now we construct three sequences x_i[n] = x_i(n/F_s) (i = 1, 2, 3), each obtained by sampling one of the above signals, with a sampling frequency F_s = 10 Hz. We obtain the sequences

x_1[n] = \cos(0.6\pi n), \quad x_2[n] = \cos(1.4\pi n), \quad x_3[n] = \cos(2.6\pi n). (1.61)
Figure 1.7 shows the plots of both the continuous-time sinusoids and the sampled sequences: note that
all sequences have exactly the same sample values for all n, i.e. they actually are the same sequence.
This phenomenon of a higher frequency sinusoid acquiring the identity of a lower frequency sinusoid
after being sampled is called frequency aliasing.
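A minimal numerical check of this example:

Fs = 10;  n = 0:9;                  % sampling frequency (Hz) and sample indices
x1 = cos(2*pi*3*n/Fs);              % 3 Hz sinusoid sampled at 10 Hz
x2 = cos(2*pi*7*n/Fs);              % 7 Hz sinusoid sampled at 10 Hz
x3 = cos(2*pi*13*n/Fs);             % 13 Hz sinusoid sampled at 10 Hz
max(abs(x1-x2)), max(abs(x1-x3))    % both of the order of machine precision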
In fact we can understand the aliasing phenomenon in a more general way using Fourier theory. Consider a continuous-time signal x(t) and its sampled version x_d[n] = x(nT_s). Then we can prove that the Fourier transform X(ω) of x(t) and the DTFT X_d(ω_d) of x_d[n] are related via the following equation:

X_d(\omega_d) = F_s \sum_{m=-\infty}^{+\infty} X(\omega_d F_s + 2m\pi F_s). (1.62)
This equation states a fundamental result: X_d(ω_d) is a periodization of X(ω), i.e. it is a periodic function (of period 2π) made of a sum of shifted and scaled replicas of X(ω). The terms of this sum for m ≠ 0 are aliasing terms and are said to alias into the so-called base band [−πF_s, πF_s]. There-
fore if two continuous-time signals x1 (t), x2 (t) have Fourier transforms with the property X2 (ω) =
X_1(ω + 2mπF_s) for some m ∈ Z, sampling these signals will produce identical DTFTs and therefore identical sequences. This is the case of the sinusoids in Eq. (1.61).
In the remainder of this section we provide a proof of Eq. (1.62). We first write x(t) and xd [n] in
terms of their Fourier transforms:
x(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega)\, e^{j\omega t}\, d\omega, \qquad x_d[n] = \frac{1}{2\pi} \int_{-\pi}^{+\pi} X_d(\omega_d)\, e^{j\omega_d n}\, d\omega_d. (1.63)
The first integral can be broken up into an infinite sum of integrals computed on the disjoint intervals
[(2m − 1)πFs , (2m + 1)πFs ], (with m ∈ Z) each of length 2πFs . Then
x(t) = \frac{1}{2\pi} \sum_{m=-\infty}^{+\infty} \int_{(2m-1)\pi F_s}^{(2m+1)\pi F_s} X(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \int_{-\pi F_s}^{\pi F_s} e^{j\theta t} \sum_{m=-\infty}^{+\infty} X(\theta + 2m\pi F_s)\, e^{j2m\pi F_s t}\, d\theta, (1.64)
where in the second equality we have substituted ω = θ + 2mπFs in the integral, and we have swapped
the integral and the series. If we sample this representation to obtain xd [n] = x(nTs ), we can write
x_d[n] = x(nT_s) = \frac{1}{2\pi} \int_{-\pi F_s}^{\pi F_s} e^{j\theta n T_s} \sum_{m=-\infty}^{+\infty} X(\theta + 2m\pi F_s)\, e^{j2m\pi F_s n T_s}\, d\theta = \frac{1}{2\pi} \int_{-\pi F_s}^{\pi F_s} e^{j\theta n T_s} \left( \sum_{m=-\infty}^{+\infty} X(\theta + 2m\pi F_s) \right) d\theta, (1.65)
because the exponentials inside the sum are all equal to 1 (ej2mπFs nTs = ej2nmπ = 1). If we finally
substitute ωd = θTs we obtain
x_d[n] = \frac{F_s}{2\pi} \int_{-\pi}^{+\pi} e^{j\omega_d n} \left( \sum_{m=-\infty}^{+\infty} X(\omega_d F_s + 2m\pi F_s) \right) d\omega_d. (1.66)

Comparing Eq. (1.66) with the second of Eqs. (1.63) proves Eq. (1.62).
Figure 1.8: Examples of sampling a continuous time signal: (a) spectrum limited to the base band; (b)
the same spectrum shifted by 2π; (c) spectrum larger than the base band.
Figure 1.8 suggests the intuition: as long as the replicas in Eq. (1.62) do not overlap, the base-band term still carries all the information needed to recover the original signal. The sampling theorem formalizes this intuition by saying that x(t) can be exactly reconstructed from its samples x[n] = x(nT_s) if and only if X(ω) = 0 outside the base band (i.e. for all |ω| ≥ πF_s). The frequency f_{Ny} = F_s/2 Hz, corresponding to the upper limit of the base band, is
called Nyquist frequency.
Based on what we have just said, when we sample a continuous-time signal we must choose Fs in
such a way that the Nyquist frequency is above any frequency of interest, otherwise frequencies above
fNy will be aliased. In the case of audio signals, we know from psychoacoustics that humans perceive
audio frequencies up to ∼ 20 kHz: therefore in order to guarantee that no artifacts are introduced by the sampling procedure we must use F_s > 40 kHz, and in fact the most widespread standard is F_s = 44.1 kHz. In some specific cases we may use lower sampling frequencies: as an example it is known that the spectrum of a speech signal is limited to ∼ 4 kHz, and accordingly the most widespread standard in telephony is F_s = 8 kHz.
In the remainder of this section we sketch the proof of the sampling theorem. If X(ω) ≠ 0 only in the base band, then all the sum terms in Eq. (1.62) are 0 except for the one with m = 0. Therefore

X_d(\omega_d) = F_s X(\omega_d F_s) \quad \text{for } \omega_d \in (-\pi, \pi). (1.67)
In order to reconstruct x(t) we can take the inverse Fourier Transform:
x(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi} \int_{-\pi F_s}^{\pi F_s} X(\omega)\, e^{j\omega t}\, d\omega = \frac{1}{2\pi F_s} \int_{-\pi F_s}^{+\pi F_s} X_d\!\left( \frac{\omega}{F_s} \right) e^{j\omega t}\, d\omega, (1.68)
where in the second equality we have exploited the hypothesis X ≡ 0 outside the base band and in the
third one we have used Eq. (1.67). If we now reapply the definition of the DTFT we obtain
x(t) = \frac{1}{2\pi F_s} \int_{-\pi F_s}^{\pi F_s} \left[ \sum_{n=-\infty}^{+\infty} x(nT_s)\, e^{-j\omega T_s n} \right] e^{j\omega t}\, d\omega = \sum_{n=-\infty}^{+\infty} \frac{x(nT_s)}{2\pi F_s} \int_{-\pi F_s}^{\pi F_s} e^{j\omega(t-nT_s)}\, d\omega, (1.69)
where in the second equality we have swapped the sum with the integral. Now look at the integral on the
right hand side. We can solve it explicitly and write
\frac{1}{2\pi F_s} \int_{-\pi F_s}^{\pi F_s} e^{j\omega(t-nT_s)}\, d\omega = \frac{1}{2\pi F_s} \cdot \frac{e^{j\pi F_s (t-nT_s)} - e^{-j\pi F_s (t-nT_s)}}{j(t-nT_s)} = \frac{\sin[\pi F_s (t-nT_s)]}{\pi F_s (t-nT_s)} = \mathrm{sinc}[F_s(t-nT_s)]. (1.70)
That is, the integral is a cardinal sine function, defined as sinc(t) ≜ sin(πt)/πt (the use of π in the
definition has the effect that the sinc function has zero crossings on the non-zero integers). In conclusion,
we can rewrite Eq. (1.68) as
x(t) = \sum_{n=-\infty}^{+\infty} x(nT_s)\, \mathrm{sinc}\!\left( \frac{t}{T_s} - n \right). (1.71)
We have just proved that if X ≡ 0 outside the base band then x(t) can be reconstructed from its samples
through Eq. (1.71). The opposite implication is obvious: if x(t) can be reconstructed through its samples
it must be true that X ≡ 0 outside the base band, since a sampled signal only supports frequencies up to
fNy by virtue of Eq. (1.62).
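The reconstruction formula (1.71) can be tried out numerically, truncating the sum to the available samples; a minimal sketch (sinc, from the Signal Processing Toolbox, implements sin(πt)/(πt)):

Fs = 10;  Ts = 1/Fs;  n = 0:49;      % example sampling grid
x  = cos(2*pi*3*n*Ts);               % samples of a 3 Hz sinusoid (below fNy = 5 Hz)
t  = 0:Ts/20:n(end)*Ts;              % dense time axis for the reconstruction
xr = zeros(size(t));
for k = 1:length(n)                  % truncated version of Eq. (1.71)
    xr = xr + x(k)*sinc(t/Ts - n(k));
end
% away from the borders, xr closely matches cos(2*pi*3*t)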
The Discrete Fourier Transform (DFT) is a special case of the DTFT applied to finite-length sequences
x[n] with 0 ≤ n ≤ N − 1. Let us consider one such sequence: we define the DFT of x[n] as the
sequence X[k] obtained by uniformly sampling the DTFT X(ωd ) on the ωd -axis between 0 ≤ ωd < 2π,
at the points ω_k = 2πk/N, 0 ≤ k ≤ N − 1. If x[n] has been sampled from a continuous-time signal, i.e.
x[n] = x(nTs ), the points ωk correspond to the frequency points fk = kFs /N (in Hz).
From Eq. (1.53) one can then write

X[k] = X(\omega_d)\big|_{\omega_d = 2\pi k/N} = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N} kn} = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad 0 \le k \le N-1, (1.72)
where we have used the notation WN = e−j2π/N . Note that the DFT is also a finite-length sequence in
the frequency domain, with length N . The inverse discrete Fourier Transform (IDFT) is given by
x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \quad 0 \le n \le N-1. (1.73)
This relation can be verified by multiplying both sides by WNln , with l integer, and summing the result
from n = 0 to n = N − 1:
\sum_{n=0}^{N-1} x[n]\, W_N^{ln} = \sum_{n=0}^{N-1} \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-(k-l)n} = \frac{1}{N} \sum_{k=0}^{N-1} X[k] \left[ \sum_{n=0}^{N-1} W_N^{-(k-l)n} \right], (1.74)
where the last equality has been obtained by interchanging the order of summation. Now, the summation \sum_{n=0}^{N-1} W_N^{-(k-l)n} has the interesting property that it takes the value N when k − l = rN (r ∈ Z), and takes the value 0 for any other value of k and l. Therefore Eq. (1.74) reduces to the definition of the DFT, and therefore Eq. (1.73) is verified.
We have just proved that, for a length-N sequence x[n], the N values of its DTFT X(ωd ) at points
ωd = ωk are sufficient to determine x[n], and hence X(ωd ), uniquely. This justifies our definition of
Discrete Fourier Transform of finite-length sequences given in Eq. (1.72). The DFT is at the heart of
digital signal processing, because it is a computable transformation.
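A direct O(N²) implementation of Eqs. (1.72) and (1.73) takes a few lines and is a useful reference point for the FFT algorithms discussed later; a minimal sketch:

N = 64;  x = randn(1,N);             % an arbitrary length-N sequence
n = 0:N-1;  k = n';
W = exp(-2j*pi*k*n/N);               % N x N matrix of powers W_N^{kn}
X = (W*x.').';                       % DFT, Eq. (1.72)
xr = (W'*X.').'/N;                   % IDFT, Eq. (1.73): W' conjugates W
max(abs(x-xr))                       % of the order of machine precision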
Most of the DTFT properties listed in Table 1.1 have a direct translation for the DFT. Clearly the DFT is linear. The time- and frequency-shifting properties still correspond to a multiplication by a complex number, however these properties become periodic with period N. As an example, the time-shifting property for the DFT becomes

y[n] = x[(n-m) \bmod N] \quad \Rightarrow \quad Y[k] = W_N^{km}\, X[k]. (1.75)

Clearly any shift of m + lN samples cannot be distinguished from a shift by m samples, since W_N^{km} = W_N^{k(m+lN)}. In other words, the ambiguity of the shift in the time domain has a direct counterpart in the frequency domain.
The convolution property also holds for the DFT and is stated as follows:
z[n] = (x * y)[n] \triangleq \sum_{m=0}^{N-1} x[m]\, y[(n-m) \bmod N] \quad \Rightarrow \quad Z[k] = (X \cdot Y)[k], (1.76)
where in this case the symbol ∗ indicates the periodic convolution. The proof of this property is similar
to the one given for the DTFT.
Figure 1.9: Examples of DFT applied to complex exponential sequences: (a) N = 64, k0 = 20, the
sequence is periodic and the DFT is a delta-sequence in the frequency domain; (b)N = 64, k0 = 20.5,
the sequence is not periodic and the DFT exhibits leakage; (c) N = 128, k0 = 41, the sequence is the
windowed exponential of Eq. (1.77) and the DFT is a shifted version of the rectangular window DFT.
In the case of Fig. 1.9(b) the sequence is not periodic over N samples, and its DFT does not consist of a single line at the frequency of this particular signal. We call this effect leakage: since ω_0 does not line up with one of the "allowed" frequencies, the energy of the DFT leaks all over the base band.
In order to understand the leakage effect we now look at a third example. Figure 1.9(c) depicts the
DFT of the sequence

x[n] = \begin{cases} e^{j2\pi k_0 n/N}, & 0 \le n < N/2, \\ 0, & N/2 \le n < N, \end{cases} (1.77)
with k0 = 41 and N = 128. In other words, the sequence x[n] is constructed by taking the complex
exponential sequence ej2πk0 n/N over 64 points, and by zero-padding this sequence over the remaining
64 points. Note that the complex exponential sequence of this example is clearly the same that we have
considered in Fig. 1.9(b) (we have simply doubled the values of k0 and N , so that ω0 has the same
value), and is clearly periodic over N = 128 points. Note also that the DFTs of Fig. 1.9(b) and 1.9(c)
are identical, except that the one in Fig. 1.9(c) has twice the points and consequently a better resolution
in frequency.
The sequence x[n] of Eq. (1.77) can also be written as

x[n] = e^{j2\pi k_0 n/N} \cdot w_{N/2}[n] = W_N^{-k_0 n}\, w_{N/2}[n], \quad \text{where } w_{N/2}[n] = \begin{cases} 1, & 0 \le n < N/2, \\ 0, & N/2 \le n < N, \end{cases} (1.78)
where wN/2 [n] is a rectangular window of length N/2. More in general, we call wM [n] a rectangular
window of length M .
The advantage of this representation is that we can now compute explicitly the DFT X[k], since we know that multiplying by W_N^{-k_0 n} in time corresponds to shifting by k_0 samples in frequency. Therefore X[k] equals the DFT of w_{N/2}[n], shifted by k_0 samples. The DFT of the generic rectangular window w_M[n] can be computed from the definition as

F\{w_M\}[k] = \sum_{n=0}^{N-1} w_M[n]\, W_N^{kn} = \sum_{n=0}^{M-1} W_N^{kn} = \frac{1 - W_N^{kM}}{1 - W_N^{k}} = e^{-j\frac{\pi k}{N}(M-1)}\, \frac{\sin(\pi \frac{kM}{N})}{\sin(\pi \frac{k}{N})}, (1.79)

where in the third equality we have used the property of the geometric series \sum_{k=0}^{M-1} q^k = (1-q^M)/(1-q), and in the fourth equality we have applied some straightforward trigonometry. Figure 1.10 shows three plots of this DFT, for different window lengths M. In particular Fig. 1.10(c) shows the case
three plots of this DFT, for different window lengths M . In particular Fig. 1.10(c) shows the case
Figure 1.10: Examples of DFT (N = 128) applied to a rectangular window: (a) M = N/16, (b) M = N/8, (c) M = N/2.
M = N/2: note that this plot coincides with the one in Fig. 1.9(c), except for a shift in frequency,
which is what we expected.
To summarize, in this section we have shown that (a) the frequency resolution of the DFT is limited
by the number N of DFT points, (b) the DFT of a complex exponential sequence which is not periodic
over N points exhibits poor resolution and leakage, and (c) the leakage effect can be interpreted as the
effect of a rectangular window of length N applied to the sequence.
M-1.9
Compute the DFT of the complex exponential sequence x[n] = e^{j2πk_0 n/N} for integer and non-integer values of k_0, in order to explore leakage effects. Then compute the DFT of zero-padded complex exponential sequences, in order to see how frequency resolution and leakage are affected.
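A minimal sketch for M-1.9, using Matlab's built-in fft:

N  = 64;  n = 0:N-1;
X1 = abs(fft(exp(2j*pi*20*n/N)));         % integer k0 = 20: a single spectral line
X2 = abs(fft(exp(2j*pi*20.5*n/N)));       % non-integer k0 = 20.5: leakage
X3 = abs(fft(exp(2j*pi*20.5*n/N), 4*N));  % zero-padding refines the frequency grid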
The key idea of the fast Fourier transform (FFT) is to decimate x[n] into its even- and odd-indexed subsequences:

X[k] = \sum_{n=0}^{\frac{N}{2}-1} x[2n]\, W_N^{2kn} + \sum_{n=0}^{\frac{N}{2}-1} x[2n+1]\, W_N^{k(2n+1)}
= \sum_{n=0}^{\frac{N}{2}-1} x[2n]\, W_N^{2kn} + W_N^{k} \sum_{n=0}^{\frac{N}{2}-1} x[2n+1]\, W_N^{2kn}
= \sum_{n=0}^{\frac{N}{2}-1} x[2n]\, W_{N/2}^{kn} + W_N^{k} \sum_{n=0}^{\frac{N}{2}-1} x[2n+1]\, W_{N/2}^{kn}, (1.80)

where in the first equality we have separated the terms involving even elements and odd elements, in the second one we have factorized a term W_N^k, and in the third one we have exploited the property W_{aN}^{ak} = W_N^k for any a ∈ N. If we now look at the two sums in the last row, we see that they are the DFTs of the sequences x_0[n] = {x[0], x[2], \ldots, x[N-2]} and x_1[n] = {x[1], x[3], \ldots, x[N-1]}, respectively. Both x_0[n] and x_1[n] have length N/2. Therefore the problem of computing X[k] reduces to the problem of computing two half-length DFTs X_0[k], X_1[k], and then combining their values according to Eq. (1.80).
The resulting computational procedure is detailed in Algorithm 1.1. It is quite obvious that the running
time T (N ) of this algorithm is given by the recurrence equation T (N ) = 2T (N/2) + O(N ), therefore
T (N ) = O(N log N ).
M-1.10
Realize Algorithm 1.1 and assess its functioning by comparing it with the fft function.
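A minimal recursive sketch in the spirit of Algorithm 1.1, assuming the input is a row vector whose length is a power of 2; its output can be compared with fft(x):

function X = recfft(x)
N = length(x);
if N == 1, X = x; return; end
X0 = recfft(x(1:2:end));          % DFT of the even-indexed subsequence
X1 = recfft(x(2:2:end));          % DFT of the odd-indexed subsequence
W  = exp(-2j*pi*(0:N/2-1)/N);     % twiddle factors W_N^k
X  = [X0 + W.*X1, X0 - W.*X1];    % combine the halves according to Eq. (1.80)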
The leaves of the recursion tree of Fig. 1.11(a) contain the input elements in bit-reversed order, i.e. ordered by the reversed binary representations of their indices (each made of log N bits). Figure 1.11(b) shows an efficient parallel implementation of this algorithm: it is made of log N stages, each performing N/2 butterfly operations.
M-1.11
Realize Algorithm 1.2 and assess its functioning by comparing it with the fft function.
The short-time Fourier transform (STFT) of a sequence x[n] is defined as

X_n(\omega_d) = \sum_{m=-\infty}^{+\infty} w[n-m]\, x[m]\, e^{-j\omega_d m}, (1.81)

where w[n] is a window sequence. If w ≡ 1, then this equation reduces to the DTFT of x[n]. However in practical applications we are interested in using finite-length windows, in order to analyze the spectral properties of x[n] over a finite time interval. In such applications what we really do is computing, for each n, the DFT of the finite-length sequence w[n-m]x[m], and what we obtain is a finite-length sequence X_n[k] = X_n(ω_d)|_{ω_d = 2πk/N}.
X_n(ω_d) is the DTFT of the sequence x_n[m] = w[n-m]x[m], therefore

w[n-m]\, x[m] = \frac{1}{2\pi} \int_{-\pi}^{+\pi} X_n(\omega_d)\, e^{j\omega_d m}\, d\omega_d. (1.82)
This equation shows that the sequence x can be reconstructed from its STFT.
The magnitude of the STFT is called spectrogram. Since the STFT is a function of two variables (i.e., time n and frequency ω_d), the plot of the spectrogram lives in a 3-D space. A typical 2-D representation of a spectrogram uses two axes for time and frequency, and magnitude values are represented with some color map (e.g. a greyscale map).

Figure 1.11: (a) Tree of calls in an invocation of RECURSIVE-FFT: the leaves contain a bit-reversal permutation of x[n]; (b) parallel realization of ITERATIVE-FFT, where butterfly operations involve bit-reversed elements of x[n].
Figure 1.12 shows an example of short-time spectral analysis applied to a simple time-varying signal,
the chirp. The chirp sequence is defined as
x[n] = A \cos\!\left[ \frac{\omega_0}{2} (nT_s)^2 \right]. (1.84)
We have already seen in Sec. 1.3.1 that in general the instantaneous frequency of a signal cos[ϕ(t)] is dϕ/dt(t): therefore the instantaneous frequency of this chirp signal is ω_0 nT_s, i.e. it is not constant but increases linearly in time. A portion of a chirp signal with ω_0 = 2π · 800 rad/s² is shown in Fig. 1.12(a).
Now we segment this signal into a set of subsequences with short finite length, e.g. using a rectangular
window w[n], as in Eq. (1.81). If the window is sufficiently short, we can reasonably assume that the
frequency of the chirp is approximately constant in each subsequence. In fact the resulting spectrogram,
shown in Fig. 1.12(b), shows that for a given time index n the STFT is essentially the DFT of a sinu-
soidal sequence: the STFT magnitude has large non-zero values around the instantaneous frequency, and
much smaller non-zero values at other frequency points. Moreover the instantaneous frequency increases
linearly and after 5 seconds it reaches the value 4000 Hz (= 800 · 5 Hz), as expected.
So far in this example we have implicitely assumed that the sampling period Ts is sufficiently small,
so that no aliasing occurs: in fact in drawing Fig. 1.12(b) we have used a sampling rate Fs = 8 kHz.
However, aliasing will occur if we use smaller sampling rates. Figure 1.12(c) shows the spectrogram
obtained using Fs = 4 kHz: in this case frequencies above fNy = 2 kHz are aliased, and this effect
appears in the spectrogram when the black line starts moving down instead of increasing.
M-1.12
Synthesize a chirp signal and compute its STFT using different sampling rates, in order to verify the occurrence
of aliasing.
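A minimal sketch for M-1.12, assuming the Signal Processing Toolbox function spectrogram is available:

Fs = 8000;  t = 0:1/Fs:5;          % 5 s of signal at the example rate
w0 = 2*pi*800;                     % chirp parameter (rad/s^2)
x  = cos(w0/2*t.^2);               % chirp sequence, Eq. (1.84)
spectrogram(x, hann(256), 128, 256, Fs, 'yaxis');   % STFT magnitude
% repeating with Fs = 4000 shows frequencies above 2 kHz folding back (aliasing)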
Figure 1.12: Short-time spectral analysis of a chirp signal: (a) initial portion of the chirp signal, with
ω0 = 2π · 800 rad/s2 ; (b) spectrogram obtained with Fs = 8 kHz; (c) spectrogram obtained with
Fs = 4 kHz.
Figure 1.13: Comparison of different windows; (a) time-domain window sequences, symmetrical with
respect to n = 0; (b) window spectra in the vicinity of k = 0. Differences in terms of main lobe width
and relative side lobe level can be appreciated.
Different windows provide different trade-offs in terms of main lobe width and relative side lobe level. The Hamming window is often considered to provide the best trade-off for short-term spectral analysis of generic signals; however, the choice of the window depends in general on many factors, including the characteristics of the signal to be analyzed.
M-1.13
Compute the DFT of complex exponential sequences x[n] = ej2πk0 n/N , windowed with rectangular, Hann, Ham-
ming, and Blackman windows. Verify the performance of each window with respect to resolution and leakage
effects, for integer and non-integer values of k0 .
The z-transform of a sequence x[n] is defined as

Z\{x\}(z) = X(z) = \sum_{n=-\infty}^{+\infty} x[n]\, z^{-n}, (1.86)

where z is a complex variable.
The series (1.86) does not converge for every z ∈ C: the region of the complex plane where it converges is called the region of convergence (ROC). One can show that convergence depends only on |z|, so that if a point z_0 belongs to the ROC, then all points with |z| = |z_0| belong to the ROC. Therefore the ROC is in general a ring in the complex plane. Such ring may or may not include the unit circle; in other words, the z-transform of x[n] may exist in a certain region of the complex plane even if the DTFT of x[n] does not exist.
Table 1.2 lists a set of relevant properties of the z-Transform, that are particularly useful in the study
of discrete-time signals and digital filters. Most of them have direct counterparts in DTFT properties (see
Table 1.1), and can be proved from the definition (1.86).
The inverse z-transform is formally defined as

x[n] = \frac{1}{2\pi j} \oint_C X(z)\, z^{n-1}\, dz, (1.88)
where the integral is evaluated over the contour C, which can be any closed contour that belongs to
the ROC of X(z) and encircles z = 0. Without entering into details, we remark that for the kinds of
sequences and z-transforms typically encountered in digital signal processing applications, less formal
procedures are sufficient and preferable. We propose some examples in Sec. 1.6.2.
As a first example, consider the finite-length exponential sequence x[n] = a^n for 0 ≤ n ≤ N−1 (and zero elsewhere). Its z-transform is

X(z) = \sum_{n=0}^{N-1} a^n z^{-n} = \sum_{n=0}^{N-1} \left( a z^{-1} \right)^n = \frac{1 - a^N z^{-N}}{1 - a z^{-1}}. (1.90)

In particular since the series has only a finite number of terms the ROC is the entire complex plane. We can generalize and state that any finite-length sequence admits a z-transform whose ROC is the entire complex plane.
In particular since the series has only a finite number of terms the ROC is the entire complex plane. We
can generalize and state that any finite-length sequence admits a z-transform whose ROC is the entire
complex plane.
A slightly more complex example is the right-sided exponential sequence x[n] = a^n u[n], already examined in Sec. 1.4.1. In this case

X(z) = \sum_{n=-\infty}^{+\infty} a^n u[n]\, z^{-n} = \sum_{n=0}^{+\infty} \left( a z^{-1} \right)^n = \frac{1}{1 - a z^{-1}}, (1.91)
where the last equality can be written only if |az^{-1}| < 1, otherwise the series does not converge. In
other words the ROC is the region | z | > | a |. It is easy to verify that the left-sided exponential sequence
x[n] = −an u[−n] also has a z-transform, identical to the one in (1.91), but with a different ROC (the
region | z | < | a |).
This example shows that the algebraic expression of the z-transform does not completely specify
the corresponding sequence: the ROC must also be specified. The example also shows a case
of a sequence that has a z-transform but no DTFT: for a > 1 the right-sided exponential
sequence still admits the z-transform 1/(1 − a z^{-1}) in the region | z | > a > 1, although it increases
exponentially in time and its DTFT does not exist.
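This behavior is easy to probe numerically by truncating the series of Eq. (1.86) at a point inside the ROC; a minimal sketch, with arbitrary values of a and z such that | z | > | a |:

    import numpy as np

    a = 0.8
    z = 1.2 * np.exp(1j * 0.3)          # arbitrary point with |z| > |a|
    n = np.arange(2000)                 # enough terms for the tail to be negligible
    series = np.sum(a**n * z**(-n))     # truncated series of Eq. (1.91)
    closed = 1.0 / (1.0 - a / z)        # closed form 1/(1 - a z^-1)
    print(abs(series - closed))         # ~0: the series converges to the closed form

Repeating the experiment with | z | < | a | shows the partial sums diverging instead.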
Consider now the right-sided real sinusoidal sequence x[n] = \cos(ω_0 n) u[n]. Note that it can be written as a sum
of two exponential sequences, x[n] = \frac{1}{2} \left( e^{jω_0 n} u[n] + e^{-jω_0 n} u[n] \right). Therefore

X(z) = \frac{1}{2} \left[ \frac{1}{1 - e^{jω_0} z^{-1}} + \frac{1}{1 - e^{-jω_0} z^{-1}} \right] = \frac{1 - [\cos ω_0] z^{-1}}{1 - 2 [\cos ω_0] z^{-1} + z^{-2}}.    (1.92)
Since | cos ω0 | ≤ 1, the ROC is clearly the region | z | > 1. This is a second example of a sequence that
does not admit a DTFT but admits a z-transform.
A final important example is the exponentially damped right-sided sinusoidal sequence, defined as
x[n] = r^n \cos(ω_0 n) u[n], with 0 < r < 1. In this case

X(z) = \frac{1 - [r \cos ω_0] z^{-1}}{1 - 2 [r \cos ω_0] z^{-1} + r^2 z^{-2}}.    (1.93)
The ROC is the region | z | > r. Note that in this case the ROC includes the unit circle, therefore the
sequence x[n] also admits a DTFT. In fact, as we will see, this sequence represents the impulse response
of a second-order resonating filter.
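This interpretation can be checked numerically: the sketch below (with arbitrary r and ω0) filters a unit impulse through the rational function of Eq. (1.93) and compares the output with r^n cos(ω0 n):

    import numpy as np
    from scipy.signal import lfilter

    r, w0, N = 0.95, 0.2 * np.pi, 100
    b = [1.0, -r * np.cos(w0)]                  # numerator of Eq. (1.93)
    a = [1.0, -2 * r * np.cos(w0), r**2]        # denominator of Eq. (1.93)
    delta = np.zeros(N)
    delta[0] = 1.0                              # unit impulse
    h = lfilter(b, a, delta)                    # impulse response of the filter
    n = np.arange(N)
    print(np.max(np.abs(h - r**n * np.cos(w0 * n))))   # ~0 up to rounding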
Many of the transforms computed above are rational functions of z^{-1}, i.e. they can be written as
X(z) = P(z)/Q(z) with P, Q polynomials. The values of z for which P(z) = 0 (and therefore X(z) = 0)
are called zeros of X(z). The values of z for which Q(z) = 0 are called poles of X(z). A number of
important relations exist between the poles of a rational transform and its ROC.
Figure 1.14: Pole-zero plots and ROCs of some simple transforms: (a) finite-length exponential se-
quence; (b) right-sided exponential sequence with | a | > 1; (c) left-sided exponential sequence with
| a | > 1; (d) exponentially damped, right-sided sinusoidal sequence. In all plots the dotted circle is the
unit circle and the gray-shaded region is the ROC of the corresponding transform.
First, the ROC cannot contain poles, by definition, since X(z) does not converge at a pole. It follows
immediately that a finite-length sequence cannot have any poles. As an example, looking at Eq. (1.90)
one notices that the pole at z = a cancels with one of the zeros of the numerator (1 − a^N z^{-N}), so that
there are no poles. By a similar line of reasoning one can prove that for any right-sided sequence the
ROC extends outwards from the pole with largest absolute value towards | z | → +∞, and that for any
left-sided sequence the ROC extends inwards from the pole with smallest absolute value towards z → 0.
For a generic sequence that extends infinitely on both sides, the ROC consists of a ring bounded by a
pole on the interior and a pole on the exterior.
So-called pole-zero plots are typically used to represent z-transforms and their associated ROCs.
Conventionally a zero is denoted with a “◦” symbol and a pole with a “×” symbol. As an
example, Fig. 1.14 shows the pole-zero plots for some of the transforms discussed in the previous
section. Note in particular that the right-sided and left-sided exponential sequences have identical pole-
zero patterns, but different ROCs.
Since the pole-zero pattern does not completely define the corresponding sequence, it is sometimes
convenient to specify some time-domain properties of the sequence that implicitly define the ROC. As
an example, consider the pole-zero plot of either Fig. 1.14(b) or Fig. 1.14(c) and assume that the ROC
is not known. If one states that the corresponding sequence is absolutely summable, then this implies
that it admits a DTFT, and consequently that the ROC must be that of Fig. 1.14(c). Alternatively
one may state that the corresponding sequence is causal: this implies that the ROC must extend towards
| z | → +∞, and consequently that the ROC must be that of Fig. 1.14(b).
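For rational transforms, poles and zeros can be computed numerically as the roots of the numerator and denominator polynomials; a small sketch for the damped sinusoid of Eq. (1.93), with the same arbitrary r and ω0 as above:

    import numpy as np

    r, w0 = 0.95, 0.2 * np.pi
    b = np.array([1.0, -r * np.cos(w0)])            # numerator coefficients, powers of z^-1
    a = np.array([1.0, -2 * r * np.cos(w0), r**2])  # denominator coefficients, powers of z^-1
    zeros = np.roots(np.append(b, 0.0))             # pad to equal degree: zeros at r*cos(w0) and 0
    poles = np.roots(a)                             # conjugate pair at r*exp(+/- j*w0)
    print("zeros:", zeros)
    print("poles:", poles)
    print("max pole modulus:", np.abs(poles).max()) # < 1: the ROC includes the unit circle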
In Sec. 1.2.3 we have seen that a LTI system is completely characterized by its impulse response h[n],
since the response to any input sequence x[n] can be written as the convolution between x and h (see
Eqs. (1.22, 1.23)). Using the convolution property given in Table 1.2, one can restate this by saying that
for an LTI system an input sequence x is related to the corresponding output sequence y through the
equation
for an LTI system an input sequence x is related to the corresponding output sequence y through the
equation
Y (z) = H(z)X(z), (1.96)
where X(z), Y (z), H(z) are the z-transforms of x[n], y[n], h[n], respectively. We call H(z) the transfer
function of the LTI system. Assuming that an appropriate ROC is specified for H, we can say that the
LTI system is completely characterized by its transfer function.
If the ROC includes the unit circle, then h[n] admits a Fourier representation. In this case we can
also write

Y\left(e^{jω_d}\right) = H\left(e^{jω_d}\right) X\left(e^{jω_d}\right),    (1.97)

where X(e^{jω_d}), Y(e^{jω_d}), H(e^{jω_d}) are the DTFTs of x[n], y[n], h[n], respectively. We call
H(e^{jω_d}) the frequency response of the system. Assuming that the DTFTs are expressed in polar form
(see Eq. (1.54)), we call | H(e^{jω_d}) | and \arg[ H(e^{jω_d}) ] the magnitude response and the phase
response of the system, respectively.
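Numerically, the frequency response of a rational transfer function can be evaluated with scipy.signal.freqz; a minimal sketch, again on the illustrative second-order resonator used above:

    import numpy as np
    from scipy.signal import freqz

    r, w0 = 0.95, 0.2 * np.pi
    b = [1.0, -r * np.cos(w0)]
    a = [1.0, -2 * r * np.cos(w0), r**2]
    w, H = freqz(b, a, worN=1024)          # samples H(e^{jw_d}) on [0, pi)
    mag = np.abs(H)                        # magnitude response
    phase = np.unwrap(np.angle(H))         # phase response
    print("magnitude peaks at w_d = %.3f rad/sample" % w[np.argmax(mag)])

As expected, the magnitude response peaks close to ω0.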
If the LTI system under consideration has been expressed through a constant-coefficient difference
equation (see Sec. 1.2.3), then one can immediately write the corresponding transfer function as a rational
function:

\sum_{k=0}^{N} a_k y[n-k] = \sum_{k=0}^{M} b_k x[n-k] \quad \Leftrightarrow \quad \sum_{k=0}^{N} a_k z^{-k} Y(z) = \sum_{k=0}^{M} b_k z^{-k} X(z),    (1.98)

so that

H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{k=0}^{M} b_k z^{-k}}{\sum_{k=0}^{N} a_k z^{-k}}.    (1.99)
Such z-transforms arise frequently in digital signal processing applications. By looking at a transform
of this kind, one can easily find the corresponding sequence (this is an example of an informal procedure
for determining the inverse z-transform). First one can note that H(z) can be written as
H(z) = \frac{b_0}{a_0} \cdot \frac{\prod_{k=1}^{M} \left( 1 - c_k z^{-1} \right)}{\prod_{k=1}^{N} \left( 1 - d_k z^{-1} \right)},    (1.100)

where the c_k's and the d_k's are the zeros and the poles of H, respectively. If M < N and the poles are
all first-order, then by applying a partial fraction expansion the above equation can be rewritten as

H(z) = \sum_{k=1}^{N} \frac{A_k}{1 - d_k z^{-1}}, \quad \text{with} \quad A_k = \left. \left( 1 - d_k z^{-1} \right) H(z) \right|_{z = d_k}.    (1.101)
Looking back at Eq. (1.91), we can conclude that h[n] is a linear combination of right-sided exponential
sequences (or left-sided exponential sequences, depending on the ROC).
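The expansion of Eq. (1.101) is computed, for instance, by scipy.signal.residuez; a toy sketch with arbitrarily chosen first-order poles:

    import numpy as np
    from scipy.signal import residuez

    b = [1.0]                          # arbitrary numerator
    a = [1.0, -1.1, 0.18]              # (1 - 0.9 z^-1)(1 - 0.2 z^-1): poles at 0.9 and 0.2
    A, d, k = residuez(b, a)           # residues A_k, poles d_k, direct terms (empty here)
    print("A_k:", A)
    print("d_k:", d)
    # for a causal ROC, h[n] = sum_k A_k * d_k**n * u[n]: right-sided exponentials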
As an example, if the input is the sinusoidal sequence x[n] = A \cos(ω_0 n), the output is

y[n] = (h * x)[n] = A \left| H\left(e^{jω_0}\right) \right| \cdot \cos\left( ω_0 n + \arg\left[ H\left(e^{jω_0}\right) \right] \right),    (1.102)

i.e. the magnitude response at ω_0 defines the gain, and the phase response defines the phase shift, applied
by the system at the frequency ω_d = ω_0. Similar considerations apply to a generic input x[n]: the
corresponding output can be described in terms of the system magnitude and phase responses as

\left| Y\left(e^{jω_d}\right) \right| = \left| H\left(e^{jω_d}\right) \right| \cdot \left| X\left(e^{jω_d}\right) \right|,
\arg\left[ Y\left(e^{jω_d}\right) \right] = \arg\left[ H\left(e^{jω_d}\right) \right] + \arg\left[ X\left(e^{jω_d}\right) \right].    (1.103)
A related quantity is the group delay,3 defined as minus the derivative of the phase response:

τ(ω_d) = - \frac{d}{dω_d} \left\{ \arg\left[ H\left(e^{jω_d}\right) \right] \right\}.    (1.104)
The deviation of τ from a constant indicates the degree of phase non-linearity. Figure 1.15(a) provides a
graphical comparison of the phase delay and the group delay.
The reason why τ is termed group delay is that this quantity relates to the effect of the phase on a
quasi-sinusoidal signal. More precisely, consider the signal x[n] = a[n] e^{jω_0 n}, and assume that a[n] is
slowly varying (equivalently, assume that the spectrum A(e^{jω}) is concentrated near ω = 0). Then x will
look like the signal in Fig. 1.15(b) (upper panel). The signal can also be rewritten as
x[n] = a[n] e^{jω_0 n} = \left[ \frac{1}{2π} \int_{-ε}^{+ε} A\left(e^{jω}\right) e^{jωn} \, dω \right] e^{jω_0 n},    (1.105)
where ε ≪ π is the upper limit of the band of A. Therefore x can be viewed as a superposition of
neighboring sinusoidal components, or a group, around ω_0.
Since X(e^{jω_d}) ≠ 0 only in the vicinity of ω_0, the phase response can be approximated in that
neighborhood as a line with slope −τ(ω_0). With this approximation it is then quite straightforward to
show that the filter output y[n] looks like the signal in Fig. 1.15(b) (lower panel), i.e. τ(ω_0) represents
the delay applied to the slowly-varying amplitude a[n].
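The group delay of Eq. (1.104) can be estimated numerically, e.g. with scipy.signal.group_delay; a minimal sketch on the illustrative resonator used before:

    import numpy as np
    from scipy.signal import group_delay

    r, w0 = 0.95, 0.2 * np.pi
    b = [1.0, -r * np.cos(w0)]
    a = [1.0, -2 * r * np.cos(w0), r**2]
    w, gd = group_delay((b, a), w=1024)      # group delay in samples vs. frequency
    print("group delay at w0: %.2f samples" % np.interp(w0, w, gd))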
We have seen that a LTI system is stable if and only if its impulse response h[n] is absolutely summable,
which in turn guarantees that H(z) converges for | z | = 1. Therefore the condition of stability is
equivalent to the condition that the ROC of the transfer function includes the unit circle. The examples
depicted in Fig. 1.14 confirm this finding.
Consider the relevant particular case of rational transfer functions associated with causal systems: for
these transfer functions the condition of stability is equivalent to the condition that all the poles are inside
the unit circle, because the ROC extends outwards from the pole with the largest absolute value.
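In code, this stability test amounts to checking the moduli of the roots of the denominator polynomial; a minimal sketch (the coefficient vectors are arbitrary examples):

    import numpy as np

    def is_stable_causal(a):
        # causal rational system: stable iff all poles lie strictly inside the unit circle
        return bool(np.all(np.abs(np.roots(a)) < 1.0))

    print(is_stable_causal([1.0, -1.8, 0.81]))   # double pole at z = 0.9 -> True
    print(is_stable_causal([1.0, -2.0]))         # pole at z = 2 -> False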
3 Our definition uses ω_d, so that τ is dimensionless. Usual definitions employ ω, so that τ is in seconds.
Figure 1.15: Comparing phase delay and group delay: (a) evaluation of phase delay and group delay for
a generic non-linear phase response; (b) illustration of phase delay and group delay for a narrowband
signal.
Another relevant particular case is that of FIR systems. We have already seen that a FIR system is always
stable, since its impulse response is always absolutely summable. This property can be “seen” in the z
domain by noting that the transfer function of a FIR system has no poles, therefore the ROC is
the entire z plane and always includes the unit circle.
Most of the LTI systems that we are going to realize are represented as linear constant-coefficient difference
equations. As we have seen, the output of a LTI system represented in this way can be computed recursively,
provided that past values of the input and output are available. These values undergo two operations in the
computation of the filter output: multiplication by coefficients, and addition.
Therefore the three basic elements for the implementation of a filter are memory elements for storing past
values, adders, and multipliers. A filter structure is created by properly interconnecting these elements.
Figure 1.16 shows the pictorial symbols that are typically used for representing them. With these elements
one can realize a general filter, described by the difference equation

y[n] - \sum_{k=1}^{N} a_k y[n-k] = \sum_{k=0}^{M} b_k x[n-k] \quad \Rightarrow \quad H(z) = \frac{\sum_{k=0}^{M} b_k z^{-k}}{1 - \sum_{k=1}^{N} a_k z^{-k}},    (1.107)

with the structure shown in Fig. 1.16(b).
Figure 1.16: Block diagram representation of filters: (a) conventional pictorial symbols for delay, adder,
and multiplier; (b) a general filter structure.
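A direct transcription of this structure into code is a sample-by-sample recursion; the following naive (unoptimized) Python sketch implements the difference equation of Eq. (1.107):

    import numpy as np

    def direct_form_filter(b, a, x):
        # y[n] = sum_k b[k] x[n-k] + sum_k a[k] y[n-k]
        # b = [b0, ..., bM]; a = [a1, ..., aN], with the signs of Eq. (1.107)
        y = np.zeros(len(x))
        for n in range(len(x)):
            acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
            acc += sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n >= k)
            y[n] = acc
        return y

    x = np.random.randn(16)                        # arbitrary test input
    y = direct_form_filter([0.5, 0.5], [0.9], x)   # two-sample averager with feedback

Note that library routines such as scipy.signal.lfilter use the opposite sign convention for the feedback coefficients.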
Filters can be classified according to the most salient characteristics of their frequency responses, most
typically the magnitude response. Basic filter categories are represented in Fig. 1.17. Other types of filters
can in general be described as a combination of these basic elements.
Low-pass filters (see Fig. 1.17(a), upper panel) select low frequencies up to a given cut-off frequency
ωc, and attenuate higher frequencies. High-pass filters (see Fig. 1.17(a), lower panel) have the opposite
behavior: they select frequencies above ωc, and attenuate lower frequencies.
Band-pass filters (see Fig. 1.17(b), upper panel) select frequencies within a frequency band, specified
by two cut-off frequencies ωc1 and ωc2 , while frequencies outside this band are attenuated. Band-reject
filters (see Fig. 1.17(b), lower panel) have the opposite behavior: they select frequencies outside the band
[ωc1 , ωc2 ], and attenuate frequencies within the band.
Resonator filters (see Fig. 1.17(c), upper panel) amplify frequencies in a narrow band around a cut-off
frequency ωc . Conversely, notch filters (see Fig. 1.17(c), lower panel) attenuate frequencies in a narrow
band around ωc . Finally, when the magnitude response is perfectly flat the filter is called an all-pass
filter, since all frequencies are passed. Note however that an all-pass filter modifies the phase of the input
signal. In the next chapter we will see some important uses of all-pass filters.
In order to optimize their frequency selective properties, ideal filters should have magnitude re-
sponses exhibiting a vertical transition between selected frequencies and rejected ones. Moreover, they
should have null or linear phase response, in order not to introduce phase distortion. As an example, the
ideal low-pass filter has the frequency response
H_{lp}\left(e^{jω}\right) = \begin{cases} 1, & | ω | ≤ ω_c, \\ 0, & ω_c < | ω | ≤ π, \end{cases}    (1.108)

whose impulse response is

h_{lp}[n] = \frac{\sin(ω_c n)}{π n}, \quad -∞ < n < +∞,    (1.109)
Figure 1.17: Classification of filters into basic categories, depending on their magnitude response
| H(ωd ) |: (a) low-pass and high-pass filter; (b) band-pass and band-reject filter; (c) resonator and
notch filter.
i.e. the filter is non-causal and has a two-sided infinite impulse response. Therefore it is not possible
to compute its output either recursively or non-recursively. In other words, the filter is not realizable.
Similar considerations apply to other types of ideal filters.
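A standard workaround, only sketched here, is to truncate and delay the ideal impulse response of Eq. (1.109), tapering it with one of the windows of Fig. 1.13; this yields a causal FIR approximation of the ideal low-pass filter (the cutoff and length below are arbitrary choices):

    import numpy as np

    wc, L = 0.3 * np.pi, 101                     # cutoff (rad/sample) and odd FIR length
    n = np.arange(L) - (L - 1) // 2              # indices centered on the main lobe
    h = (wc / np.pi) * np.sinc(wc * n / np.pi)   # sin(wc*n)/(pi*n), Eq. (1.109)
    h = h * np.hamming(L)                        # taper to reduce truncation ripples
    H = np.fft.fft(h, 4096)
    print("gain at dc: %.4f" % abs(H[0]))        # close to 1 in the passband

The sharper the desired transition, the longer the window must be.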
The simplest examples of realizable filters are first-order filters (i.e. filters with no more than one
pole and/or one zero). First-order FIR low-pass and high-pass filters are defined as follows:
H_{lp}(z) = \frac{1}{2}\left( 1 + z^{-1} \right), \qquad H_{hp}(z) = \frac{1}{2}\left( 1 - z^{-1} \right).    (1.110)

They have a zero in z = −1 and z = 1, respectively. Therefore the magnitude responses decrease and
increase monotonically, respectively. The low-pass filter in particular can be recognized to be a moving
average filter that averages two contiguous samples.
First-order IIR low-pass and high-pass filters are defined as follows:
H_{lp}(z) = \frac{1-α}{2} \cdot \frac{1 + z^{-1}}{1 - α z^{-1}}, \qquad H_{hp}(z) = \frac{1+α}{2} \cdot \frac{1 - z^{-1}}{1 - α z^{-1}}.    (1.111)

Both have a pole in z = α; therefore | α | < 1 is required for stability, and α ∈ R in order for the impulse
responses to be real-valued. They have a zero in z = −1 and z = 1, respectively. For α = 0 they reduce
to the FIR filters above, while for α > 0 they have steeper responses.
M-1.14
Study the frequency responses of the first-order low-pass and high-pass filters of Eqs. (1.110, 1.111). Apply them
to a broad-band audio signal (e.g. a snare drum), and study their effect.
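A possible starting point in Python is sketched below; the file name snare.wav is a placeholder for any broad-band mono recording:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import lfilter

    Fs, x = wavfile.read("snare.wav")                  # placeholder file name, assumed mono
    x = x.astype(float) / (np.abs(x).max() + 1e-12)    # normalize

    alpha = 0.9                                        # pole position, |alpha| < 1
    lp = lfilter([(1 - alpha) / 2, (1 - alpha) / 2], [1.0, -alpha], x)   # H_lp of Eq. (1.111)
    hp = lfilter([(1 + alpha) / 2, -(1 + alpha) / 2], [1.0, -alpha], x)  # H_hp of Eq. (1.111)

    wavfile.write("snare_lp.wav", Fs, (lp * 32767).astype(np.int16))
    wavfile.write("snare_hp.wav", Fs, (hp * 32767).astype(np.int16))

Listening to the two outputs while varying alpha makes the effect of the pole position on the cutoff quite audible.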
Primers on digital audio processing are Rocchesso [2003] and Steiglitz [1996]. Examples focused on
audio are also found in [Zölzer, 2002, Chapter 1].
A useful reading on Fourier analysis for discrete-time signals is [Smith, 2008]. Our
discussion of FFT algorithms is based on [Cormen et al., 2001].
A discussion of recursive generators of sinusoidal signals is found e.g. in [Orfanidis, 1996]. Models
for fractal signals are also partially discussed in [Orfanidis, 1996].
References
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press,
Cambridge, MA, 2001.
Alan V. Oppenheim, Ronald W. Schafer, and John R. Buck. Discrete-Time Signal Processing. Prentice Hall, Upper Saddle
River, NJ, 1999.
Sophocles J. Orfanidis. Introduction to Signal Processing. Prentice Hall, 1996.
Davide Rocchesso. Introduction to Sound Processing. Mondo Estremo, Firenze, 2003. http://profs.sci.univr.it/~rocchess/SP.
Julius O. Smith. Mathematics of the Discrete Fourier Transform (DFT). http://ccrma.stanford.edu/~jos/mdft/, 2008. Online
book, accessed Oct. 2008.
Kenneth Steiglitz. A Digital Signal Processing Primer - With Applications to Digital Audio and Computer Music. Prentice
Hall, 1996.
Udo Zölzer, editor. DAFX – Digital Audio Effects. John Wiley & Sons, 2002.