Audio Concepts, APIs, and Architecture
Roger B. Dannenberg
Professor of Computer Science and Art
Carnegie Mellon University
Introduction
n So far, we’ve dealt with discrete, symbolic
music representations
n “Introduction to Computer Music” covers
sampling theory, sound synthesis, audio
effects
n This lecture addresses some system and
real-time issues of audio processing
n We will not delve into any DSP algorithms for
generating/transforming audio samples
Overview
n Audio Concepts
n Samples
n Frames
n Blocks
n Synchronous processing
n Audio APIs
n PortAudio
n Callback models
n Blocking API models
n Scheduling
n Architecture
n Unit generators
n Fan-In, Fan-Out
n Plug-in Architectures
Audio Concepts
n Audio is basically a stream of signal amplitudes
n Typically represented:
n Externally as a 16-bit signed integer: +/- 32K (see the conversion sketch below)
n Internally as a 32-bit float in [-1, +1]
n Floating point gives >16-bit precision
n And “headroom”: samples >1 are no problem as long as something later
(e.g. a volume control) scales them back to [-1, +1]
n Fixed sample rate, e.g. 44100 samples/second (Hz)
n Many variations:
n Sample rates from 8000 to 96000 (and more)
n Can represent frequencies from 0 to ½ sample rate
n Sample sizes from 8-bit to 24-bit integer, or 32-bit float
n About 6 dB of signal-to-noise ratio per bit
n Also 1-bit delta-sigma modulation and compressed formats
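A minimal sketch of the external/internal conversion described above, assuming the common 32768/32767 scaling convention (other conventions exist); the function names are illustrative:

#include <cstdint>

// Convert one external 16-bit sample to the internal float range.
inline float int16_to_float(int16_t s) {
    return s * (1.0f / 32768.0f);
}

// Convert back, clamping to guard against samples that drifted
// outside [-1, +1] (the "headroom" case mentioned above).
inline int16_t float_to_int16(float x) {
    if (x > 1.0f) x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    return (int16_t)(x * 32767.0f);
}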
Multi-Channel Audio
n Each channel is an
independent audio signal
n Each sample period now
has one sample per channel
n Sample period is called an
audio frame
n Formats:
n Usually stored as interleaved data
n Usually processed as independent, non-interleaved arrays (see the
deinterleaving sketch below)
n Exception: Since channels are often correlated, there are
special multi-channel compression and encoding techniques,
e.g. for surround sound on DVDs.
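A minimal sketch of converting one interleaved buffer into non-interleaved (per-channel) arrays; the names here are illustrative, not from any particular API:

// interleaved: frames * channels samples, laid out L R L R ... for stereo
// planar[ch]:  one array of 'frames' samples per channel
void deinterleave(const float *interleaved, float **planar,
                  int frames, int channels) {
    for (int f = 0; f < frames; f++)
        for (int ch = 0; ch < channels; ch++)
            planar[ch][f] = interleaved[f * channels + ch];
}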
Audio is Always Processed Synchronously
[Figure: data-flow diagram — read frames, convert interleaved to non-interleaved, then per-channel chains of audio effect and gain boxes.]
n Sometimes described as a data-flow process: each box accepts block(s) and
outputs block(s) at block time t.
Latency/Buffers Are Not
Completely Bad
n Of course, there’s no reason to increase
buffer sizes just to add delay (latency) to
audio!
n What about reducing buffer sizes?
n Very small buffers (or none) mean we cannot
benefit from block processing: more CPU load
n Small buffers (~1ms) lead to underflow if OS
does not run our application immediately after
samples become available.
n Blocks and buffers are a “necessary evil”
Buffering Schemes
n Hardware buffering schemes include:
n Circular Buffer (see the sketch after this list)
n Double Buffer
n Buffer Queues
n These may be reflected in the user-level API
n Poll for buffer position, or get an interrupt or callback
when buffers complete
n What’s a callback?
n Typically audio code generates blocks and you care
about adapting block-based processing to buffer-
based input/output. (It may or may not be 1:1)
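A minimal sketch of a single-reader/single-writer circular buffer, tracking only indices; this is illustrative, and a real implementation needs atomic (or otherwise synchronized) index updates between the audio callback and the application thread:

#define RING_SIZE 1024              /* a power of two keeps wrap-around cheap */
float ring[RING_SIZE];
unsigned int write_index = 0;       /* advanced only by the producer */
unsigned int read_index = 0;        /* advanced only by the consumer */

int ring_write(float x) {           /* returns 0 if the buffer is full */
    if (write_index - read_index >= RING_SIZE) return 0;
    ring[write_index % RING_SIZE] = x;
    write_index++;
    return 1;
}

int ring_read(float *x) {           /* returns 0 if the buffer is empty */
    if (write_index == read_index) return 0;
    *x = ring[read_index % RING_SIZE];
    read_index++;
    return 1;
}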
Latency in Detail
n Audio input/output is strictly synchronous and
precise (to < 1ns)
n Therefore, we need input/output buffers
n Assume audio block size = b samples
n Computation time = r sample periods
n Assume pauses up to c sample periods
n Worst case:
n Wait for b samples – inserts a delay of b
n Process b samples in r sample periods – delay of r
n Pause for c sample periods – delay of c
n Total delay is b + r + c sample periods
Latency In Detail: Circular Buffers
n Assumes sample-by-sample processing
n Audio latency is b + r + c sample periods
n In reality, there are going to be a few samples of buffering or
latency in the transfer from input hardware to application
memory and from application memory to output hardware.
n But this number is probably small compared to c
n Normal buffer state is: input empty, output full
Latency In Detail: Double Buffer
n Assumes block-by-block processing
n Assume buffer size is nb, a multiple of block size
n Audio latency is 2nb sample periods
[Figure: double-buffer timing diagram — input to buffer, process buffer, output from buffer; the end-to-end latency spans 2nb sample periods.]
n How long to process one buffer (worst case)? nr + c
n How long do we have? nb
n So we need nb ≥ nr + c, i.e. n ≥ c / (b – r)
n Example 1: b = 64, r = 48, c = 128, ∴ n ≥ 128 / 16 = 8
n Example 2: b = 64, r = 48, c = 16, ∴ n ≥ 16 / 16 = 1
Latency In Detail: Buffer Queues
Synchronous/blocking vs
Asynchronous/callback APIs
n Blocking APIs
n Typically provide primitives like read() and write() (see the sketch
after this list)
n Can be used with select() to interleave with other operations
n Users manage their own threads for concurrency (consider
Python, Ruby, Smalltalk, …)
n Great if your OS threading services can provide real-time
guarantees (e.g. some embedded computers, Linux)
n Callback APIs
n User provides a function pointer to be called when samples
are available/needed
n Concurrency is implicit, user must be careful with locks or
blocking calls
n You can assume the API is doing its best to be real-time
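For contrast with the callback example later, a minimal sketch of the blocking model using PortAudio's blocking I/O (passing NULL for the callback selects blocking mode, and Pa_WriteStream then blocks until there is room in the output buffer); more_audio_to_play and compute_next_block are hypothetical application code, and error checking is omitted:

PaStream *stream;
float block[FRAMES_PER_BUFFER * 2];                /* stereo, interleaved */
Pa_Initialize();
Pa_OpenDefaultStream(&stream, 0 /* no input */, 2 /* stereo out */,
                     paFloat32, SAMPLE_RATE, FRAMES_PER_BUFFER,
                     NULL /* no callback => blocking I/O */, NULL);
Pa_StartStream(stream);
while (more_audio_to_play()) {                     /* hypothetical */
    compute_next_block(block);                     /* hypothetical synthesis code */
    Pa_WriteStream(stream, block, FRAMES_PER_BUFFER);
}
Pa_StopStream(stream);
Pa_CloseStream(stream);
Pa_Terminate();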
PortAudio: An Abstraction of Audio APIs
n See http://www.portaudio.com
PortAudio Example:
Generating a Sine Wave
struct TestData {
float sine[TABLE_SIZE];
int phase;
};
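The callback itself is not shown on the slide; here is a minimal sketch consistent with the TestData struct above and PortAudio's callback signature (stereo output, table-lookup oscillator stepping one table entry per frame, no interpolation):

static int TestCallback(const void *input, void *output,
                        unsigned long frameCount,
                        const PaStreamCallbackTimeInfo *timeInfo,
                        PaStreamCallbackFlags statusFlags,
                        void *userData)
{
    TestData *data = (TestData *) userData;
    float *out = (float *) output;
    for (unsigned long i = 0; i < frameCount; i++) {
        float s = data->sine[data->phase];
        *out++ = s;                               /* left channel */
        *out++ = s;                               /* right channel */
        if (++data->phase >= TABLE_SIZE) data->phase -= TABLE_SIZE;
    }
    return paContinue;                            /* keep the stream running */
}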
PortAudio Example:
Running a Stream (1)
int main(void)
{
    TestData data;
    for (int i = 0; i < TABLE_SIZE; ++i)
        data.sine[i] = sin(M_PI * 2 *
                           ((double) i / (double) TABLE_SIZE));
    data.phase = 0;
    Pa_Initialize();
    PaStreamParameters outputParameters;
    outputParameters.device = Pa_GetDefaultOutputDevice();
    outputParameters.channelCount = 2;
    outputParameters.sampleFormat = paFloat32;
    outputParameters.suggestedLatency =
        Pa_GetDeviceInfo(outputParameters.device)->defaultLowOutputLatency;
    outputParameters.hostApiSpecificStreamInfo = NULL;
...
PortAudio Example:
Running a Stream (2)
...
    PaStream *stream;
    Pa_OpenStream(&stream, NULL /* no input */, &outputParameters,
                  SAMPLE_RATE, FRAMES_PER_BUFFER, paClipOff /* flags */,
                  TestCallback, &data);
    Pa_StartStream(stream);
    Pa_Sleep(5 * 1000);  /* let the callback run; the slide omits this pause,
                            without which the stream would stop immediately */
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
}
Modular Audio Processing
n Unit generators
n Graph evaluation
n Evaluation mechanisms
n Block-based processing
n Vector allocation strategies
n Variations
Unit Generators
n A sample-generating or sample-processing function and its
accompanying state, e.g. oscillators, filters, etc.
n A functional view:
n f(state, inputs) → (state, outputs)
n An OOP view (a fuller sketch follows below):
n class Ugen { virtual void update(float *ins[], float *outs[]); };
n In a dynamic system, the flow
between units is explicitly
represented by a
“synchronous dataflow graph”
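A minimal sketch of the OOP view as a block-processing gain unit generator; the class and constant names are illustrative, not from any particular system:

const int BLOCK_SIZE = 64;                  // samples per block (illustrative)

class Ugen {
  public:
    virtual void update(float *ins[], float *outs[]) = 0;  // process one block
};

class Gain : public Ugen {
  public:
    float gain;                             // the unit generator's state
    Gain(float g) : gain(g) { }
    void update(float *ins[], float *outs[]) {
        float *in = ins[0], *out = outs[0];
        for (int i = 0; i < BLOCK_SIZE; i++)
            out[i] = in[i] * gain;          // one multiply per sample
    }
};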
Graph Evaluation
n Generators which produce signals must be evaluated before the
generators which consume those signals*, therefore: execute in
a depth-first order starting from sinks.
n Note: depth-first implies sinks are
the last to evaluate in any graph traversal.
[Figure: example dataflow graph with nodes numbered (1)–(6) in evaluation order.]
*Why? Because otherwise outputs from a generator will not be considered until
the next “pass”, introducing a one-block delay, or even worse, if outputs go to
reusable memory buffers, the output could be overwritten.
Evaluation Mechanisms
Topological Sort
class Ugen
    var block_num
    var inputs
    def update(new_block_num)
        if new_block_num > block_num
            for input in inputs
                input.update(new_block_num)
            really_update()  // virtual method
            block_num = block_num + 1
Block-Based Processing
n Process arrays of input samples and produce arrays
of output samples
n Pros: more efficient (common subexpressions,
register loads, indexing, cache-line prefetching, loop
unrolling, SIMD, etc.)
n Cons: latency; feedback loops incur a block-size delay
n Vector size:
n Fixed (cf. Csound k-rate, Aura)
n Variable with upper bound
Buffer Allocation Strategies
n 1) One buffer/vector per generated signal, i.e. for
every Unit Generator output.
n 2) Reuse buffers once all sinks have consumed them
(cf. graph-coloring register allocation)
n Dannenberg’s measurements indicate this is wasted
effort
n Buffers are relatively small
n Cache is relatively big
n DSP is relatively expensive compared to (relatively
few) cache faults
n So speedup from buffer reuse (2) is insignificant
Feedback
Variations on Block-Based
Processing
n Hierarchical block sizes e.g. process subgraphs with
smaller blocks to reduce feedback delay
n Synchronous multi-rate: separate evaluation phases
using the same or different graphs (e.g. Csound
krate/arate passes).
n Or support signals with one sample per block time:
“Block-rate” UGs have no inner loop and support a
sample rate of
BLOCK_SR = AUDIO_SR / BLOCKSIZE.
n Combine synchronous dataflow graph for audio with
asynchronous message processing for control (e.g.
Max/MSP)
Audio Plug-Ins
n A plug-in is a software object that can extend
the functionality of an audio application, e.g.
an editor, player, or software synthesizer.
n Effectively a plug-in is a unit generator:
n audio inputs
n audio outputs
n parametric controls
n Plug-ins are
n dynamically loadable and
n self-describing
VST Plug-Ins
n Proprietary spec: Steinberg
n Commonly used and widely supported
n Multiplatform:
n Windows (a multithreaded DLL)
n Mac OS-X (a bundle)
n Linux (sort-of)
n Uses WINE (Windows emulation)
n Kjetil Matheussen's original vstserver,
n The fst project from Paul Davis and Torben Hohn,
n Chris Cannam's dssi-vst wrapper plugin
VST Conventions
Example Code
AGain::AGain(audioMasterCallback audioMaster)
    : AudioEffectX(audioMaster, 1, 1)  // 1 program, 1 parameter only
{
    fGain = 1.;                        // default to 0 dB
    setNumInputs(2);                   // stereo in
    setNumOutputs(2);                  // stereo out
    setUniqueID('Gain');               // identify
    canMono();                // makes sense to feed both inputs the same signal
    canProcessReplacing();    // supports both accumulating and replacing
    strcpy(programName, "Default");    // default program name
}
Example Code (2)
void AGain::setParameter(long index, float value)
{ fGain = value;
}
Example Code (4)
void AGain::process(float **inputs, float **outputs,
                    long sampleFrames)
{
    float *in1 = inputs[0];
    float *in2 = inputs[1];
    float *out1 = outputs[0];
    float *out2 = outputs[1];
    // The loop is cut off on the slide; the SDK's AGain example
    // accumulates into the outputs (the non-replacing convention):
    while (--sampleFrames >= 0) {
        (*out1++) += (*in1++) * fGain;
        (*out2++) += (*in2++) * fGain;
    }
}
VST on the Host Side
typedef AEffect *(*mainCall)(audioMasterCallback cb);
audioMasterCallback audioMaster;
// Assume the host has already loaded the plugin DLL and located its main():
void instanciatePlug(mainCall plugsMain)
{
    AEffect *ce = plugsMain(&audioMaster);
    if (ce && ce->magic == AEffectMagic) { .... }
}
------ the main() routine in the plugin (DLL): -------
AEffect *main(audioMasterCallback audioMaster)
{
    // check that the host supports the correct version of VST
    if (!audioMaster(0, audioMasterVersion, 0, 0, 0, 0)) return 0;
    ADelay *effect = new ADelay(audioMaster);  // create the AudioEffect
    if (!effect) return 0;
    if (oome) {  // check that nothing went wrong in the effect's constructor
        delete effect;
        return 0;
    }
    return effect->getAeffect();  // return the C interface of our plug-in
}
More VST
n Program = full set of parameters
n Bank = set of programs (user can call up preset)
n Interactive Interfaces
n Host can construct editor based on text:
n Parameter name, display, label – “Gain: -6 dB”
LADSPA – Linux Audio Developers’
Simple Plugin Architecture
n the plugin library is loaded (using a system-specific method like
dlopen or, for glib/gtk+ users, g_module_open); a host-side sketch
follows this list.
n the plugin descriptor is obtained using the plugin library's
ladspa_descriptor function, which may allocate memory.
n the host uses the plugin's instantiate function to allocate a new
(or several new) sample-processing instances.
n the host must connect buffers to every one of the plugin's ports.
It must also call activate before running samples through the
plugin.
n the host processes sample data with the plugin by filling the
input buffers it connected, then calling either run or run_adding.
The host may reconnect ports with connect_port as it sees fit.
n the host deactivates the plugin handle. It may opt to activate and
reuse the handle, or it may destroy the handle.
n the handle is destroyed using the cleanup function.
n the plugin is closed. Its _fini function is responsible for
deallocating memory.
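A minimal host-side sketch of the sequence above, assuming the library's first plugin (index 0) with an audio input on port 0 and an audio output on port 1; a real host must inspect the descriptor's PortCount and PortDescriptors instead of assuming a port layout, and all error handling is omitted:

#include <dlfcn.h>
#include <ladspa.h>

void run_ladspa_once(const char *lib_path, LADSPA_Data *in, LADSPA_Data *out,
                     unsigned long n_samples) {
    void *lib = dlopen(lib_path, RTLD_NOW);             /* load the library */
    LADSPA_Descriptor_Function desc_fn =
        (LADSPA_Descriptor_Function) dlsym(lib, "ladspa_descriptor");
    const LADSPA_Descriptor *d = desc_fn(0);            /* first plugin type */
    LADSPA_Handle h = d->instantiate(d, 44100);         /* one new instance */
    d->connect_port(h, 0, in);      /* assumed: port 0 = audio input */
    d->connect_port(h, 1, out);     /* assumed: port 1 = audio output */
    if (d->activate) d->activate(h);                    /* activate is optional */
    d->run(h, n_samples);                               /* process the buffers */
    if (d->deactivate) d->deactivate(h);
    d->cleanup(h);                                      /* destroy the handle */
    dlclose(lib);                                       /* _fini may free memory */
}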
Summary
n Audio samples, frames, blocks
n Synchronous processing:
n Never skip or duplicate samples
n Buffers are essential
n Latency comes (mostly) from buffer length
n PortAudio
n Host API
n Device
n Stream
Summary (2)
n Modular Audio Processing
n Unit Generator
n Networks of Unit Generators
n Synchronous Dataflow
n Plug-ins
n VST example
n Unit Generator that is…
n Dynamically loadable
n Self-describing
n May have its own graphical interface