Week 9 – Audio Concepts, APIs, and Architecture

Roger B. Dannenberg
Professor of Computer Science and Art
Carnegie Mellon University

Introduction
• So far, we’ve dealt with discrete, symbolic music representations
• “Introduction to Computer Music” covers sampling theory, sound synthesis, audio effects
• This lecture addresses some system and real-time issues of audio processing
• We will not delve into any DSP algorithms for generating/transforming audio samples

2 Carnegie Mellon University ⓒ 2019 by Roger B. Dannenberg

Overview
• Audio Concepts
  • Samples
  • Frames
  • Blocks
  • Synchronous processing
• Audio APIs
  • PortAudio
  • Callback models
  • Blocking API models
  • Scheduling
• Architecture
  • Unit generators
  • Fan-In, Fan-Out
  • Plug-in Architectures


Audio Concepts
• Audio is basically a stream of signal amplitudes
• Typically represented
  • Externally as 16-bit signed integer: +/- 32K
  • Internally as 32-bit float in [-1, +1]
    • Floating point gives >16-bit precision
    • And “headroom”: samples >1 are no problem as long as later, something (e.g. a volume control) scales them back to [-1, +1]
• Fixed sample rate, e.g. 44100 samples/second (Hz)
• Many variations:
  • Sample rates from 8000 to 96000 (and more)
    • Can represent frequencies from 0 to ½ sample rate
  • Sample size from 8-bit to 24-bit integer, 32-bit float
    • About 6 dB/bit signal-to-noise ratio
  • Also 1-bit delta-sigma modulation and compressed formats
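
The external/internal representations above can be sketched as a pair of conversion functions. This is only an illustration: the function names are hypothetical, and the scale factors (divide by 32768 on input, multiply by 32767 on output) are one common convention assumed here, not something the slides specify.

```c
#include <stdint.h>

/* Map a 16-bit signed sample to a float in [-1, +1). */
float int16_to_float(int16_t s)
{
    return (float)s / 32768.0f;
}

/* Map a float back to 16 bits. Values outside [-1, +1] are clipped,
   which is why something must scale "headroom" samples back into
   range before this final conversion. */
int16_t float_to_int16(float x)
{
    if (x > 1.0f) x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    float scaled = x * 32767.0f;
    /* Round to nearest; the cast alone would truncate toward zero. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

For example, an internal sample of 1.5 is harmless as long as a later gain of 0.5 brings it to 0.75 before float_to_int16() is applied; without that gain it would clip to 32767.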

Multi-Channel Audio
• Each channel is an independent audio signal
• Each sample period now has one sample per channel
• The set of samples for one sample period is called an audio frame
• Formats:
  • Usually stored as interleaved data
  • Usually processed as independent, non-interleaved arrays
  • Exception: Since channels are often correlated, there are special multi-channel compression and encoding techniques, e.g. for surround sound on DVDs.
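
A minimal sketch of converting between the two layouts, assuming float samples; the function names deinterleave() and interleave() are hypothetical helpers, not part of any particular API.

```c
#include <stddef.h>

/* Split interleaved frames (L R L R ...) into one array per channel. */
void deinterleave(const float *frames, size_t n_frames, size_t n_channels,
                  float **channels)
{
    for (size_t f = 0; f < n_frames; f++)
        for (size_t c = 0; c < n_channels; c++)
            channels[c][f] = frames[f * n_channels + c];
}

/* Merge per-channel arrays back into interleaved frames. */
void interleave(float **channels, size_t n_frames, size_t n_channels,
                float *frames)
{
    for (size_t f = 0; f < n_frames; f++)
        for (size_t c = 0; c < n_channels; c++)
            frames[f * n_channels + c] = channels[c][f];
}
```

With stereo input {1, 2, 3, 4, 5, 6}, deinterleave() yields a left array {1, 3, 5} and a right array {2, 4, 6}.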


Block Processing Reduces Overhead
• Example task: convert stereo to mono with scale factor
• Naïve organization:
    read frame into left and right
    output = scale * (left + right)
    write output
  • Per-frame costs: system call per frame; load scale and locals to registers
• Block processing organization:
    read 64 interleaved frames into data
    for (i = 0; i < 64; i++) {
        output[i] = scale * (data[i*2] + data[i*2 + 1]);
    }
    write 64 output samples
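
The difference in overhead can be made concrete with a small sketch. MockInput and mock_read() are hypothetical stand-ins for a real audio input; they only count how many read calls are made, which stands in for the per-call system overhead.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_FRAMES 64

/* Hypothetical stand-in for an audio input that counts read calls. */
typedef struct {
    const float *src;   /* interleaved stereo source */
    size_t pos;         /* next frame to deliver */
    size_t calls;       /* number of read calls so far */
} MockInput;

static void mock_read(MockInput *in, float *dst, size_t frames)
{
    memcpy(dst, in->src + 2 * in->pos, frames * 2 * sizeof(float));
    in->pos += frames;
    in->calls++;
}

/* Naive organization: one read call per frame. */
void mix_naive(MockInput *in, float *out, size_t total, float scale)
{
    float frame[2];
    for (size_t i = 0; i < total; i++) {
        mock_read(in, frame, 1);
        out[i] = scale * (frame[0] + frame[1]);
    }
}

/* Block organization: one read call per block of up to 64 frames. */
void mix_blocks(MockInput *in, float *out, size_t total, float scale)
{
    float data[2 * BLOCK_FRAMES];
    for (size_t done = 0; done < total; ) {
        size_t n = total - done < BLOCK_FRAMES ? total - done : BLOCK_FRAMES;
        mock_read(in, data, n);
        for (size_t i = 0; i < n; i++)
            out[done + i] = scale * (data[i * 2] + data[i * 2 + 1]);
        done += n;
    }
}
```

For 128 frames, the naive version makes 128 read calls and the block version makes 2, while both produce the same output samples.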

Audio is Always Processed Synchronously
• Sometimes described as a data-flow process: each box accepts block(s) and outputs block(s) at block time t.
• No samples may be dropped or duplicated (or else distortion will result)
[Figure: data-flow diagram. Read frames → Interleaved to non-interleaved → two parallel chains (Audio effect → Gain, etc.) → Non-interleaved to interleaved → Write frames]

Audio Latency Is Caused (Mostly) By Sample Buffers
• Samples arrive every 22 µs or so
• Application cannot wake up and run once for each sample frame (at least not with any efficiency)
• Repeat:
  • Capture incoming samples in input buffer while taking output samples from output buffer
  • Run application: consume some input, produce some output
• Application can’t compute too far ahead (output buffer will fill up and block the process).
• But the application can fall too far behind (input buffer overflow, output buffer underflow) – bad!

Latency/Buffers Are Not Completely Bad
• Of course, there’s no reason to increase buffer sizes just to add delay (latency) to audio!
• What about reducing buffer sizes?
  • Very small buffers (or none) mean we cannot benefit from block processing: more CPU load
  • Small buffers (~1ms) lead to underflow if the OS does not run our application immediately after samples become available.
• Blocks and buffers are a “necessary evil”
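
To put numbers on buffer sizes, the following sketch converts between frames and time. The function names are hypothetical; the arithmetic is just frames divided by sample rate.

```c
#include <assert.h>

/* Time between successive sample frames, in microseconds. */
double sample_period_us(double sample_rate)
{
    assert(sample_rate > 0.0);
    return 1.0e6 / sample_rate;
}

/* Time the hardware takes to fill (or drain) a buffer, in milliseconds. */
double buffer_ms(unsigned long frames, double sample_rate)
{
    assert(sample_rate > 0.0);
    return 1000.0 * (double)frames / sample_rate;
}
```

At 44100 Hz the sample period is about 22.7 µs, a 64-frame buffer lasts about 1.45 ms, and a 441-frame buffer lasts 10 ms, so a "~1ms" buffer holds only a few dozen frames.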

There Are Many Audio APIs


• Every OS has one or more APIs:
  • Windows: WinMM, DirectX, ASIO, Kernel Streaming
  • Mac OS X: Core Audio
  • Linux: ALSA, Jack
• APIs exist at different levels
  • Device driver – interface between OS and hardware
  • System/Kernel – manage audio streams, conversion, format
  • User space – provide higher-level services or abstractions through a user-level library or server process

Buffering Schemes
• Hardware buffering schemes include:
  • Circular Buffer
  • Double Buffer
  • Buffer Queues
• These may be reflected in the user-level API
• Poll for buffer position, or get interrupt or callback when buffers complete
  • What’s a callback?
• Typically audio code generates blocks and you care about adapting block-based processing to buffer-based input/output. (It may or may not be 1:1)
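
One way to adapt fixed-size blocks to buffers of arbitrary length is to keep the most recent block and hand out samples from it until it is used up. The sketch below assumes mono float output; BlockAdapter, adapter_init(), adapter_fill(), and ramp_compute() are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64

typedef struct {
    float block[BLOCK_SIZE];    /* most recently computed block */
    size_t used;                /* samples of block already delivered */
    void (*compute)(float *block, void *state);  /* fills one block */
    void *state;
} BlockAdapter;

void adapter_init(BlockAdapter *a, void (*compute)(float *, void *),
                  void *state)
{
    a->used = BLOCK_SIZE;       /* "empty": forces a compute on first use */
    a->compute = compute;
    a->state = state;
}

/* Fill an output buffer of any length n from fixed-size blocks. */
void adapter_fill(BlockAdapter *a, float *out, size_t n)
{
    while (n > 0) {
        if (a->used == BLOCK_SIZE) {    /* block exhausted: make another */
            a->compute(a->block, a->state);
            a->used = 0;
        }
        size_t avail = BLOCK_SIZE - a->used;
        size_t take = n < avail ? n : avail;
        memcpy(out, a->block + a->used, take * sizeof(float));
        a->used += take;
        out += take;
        n -= take;
    }
}

/* Example block generator: a ramp 0, 1, 2, ... kept in an int counter. */
void ramp_compute(float *block, void *state)
{
    int *counter = (int *)state;
    for (int i = 0; i < BLOCK_SIZE; i++)
        block[i] = (float)(*counter)++;
}
```

A request for 100 samples computes two 64-sample blocks and leaves 28 samples for the next request, so no samples are dropped or duplicated even though buffers and blocks are not 1:1.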


Latency in Detail
• Audio input/output is strictly synchronous and precise (to < 1ns)
• Therefore, we need input/output buffers
• Assume audio block size = b samples
• Assume computation time = r sample periods per block
• Assume pauses of up to c sample periods
• Worst case:
  • Wait for b samples – inserts a delay of b
  • Process b samples in r sample periods – delay of r
  • Pause for c sample periods – delay of c
  • Total delay is b + r + c sample periods

Latency In Detail: Circular Buffers
• Assumes sample-by-sample processing
• Audio latency is b + r + c sample periods
• In reality, there are going to be a few samples of buffering or latency in the transfer from input hardware to application memory and from application memory to output hardware.
  • But this number is probably small compared to c
• Normal buffer state is: input empty, output full
• Worst case: output buffer almost empty
• Oversampling A/D and D/A converters can add 0.2 to 1.5ms (each)


Latency In Detail: Double Buffer
• Assumes block-by-block processing
• Assume buffer size is nb, a multiple of block size
• Audio latency is 2nb sample periods
[Figure: timeline of three stages (Input to buffer, Process buffer, Output from buffer); the span from input to output is 2nb]
• How long to process one buffer (worst case)? nr + c
• How long do we have? nb
• So we need nr + c ≤ nb, which gives n ≥ c / (b – r)


Latency In Detail: Double Buffer (2)

• n ≥ c / (b – r)

                         Example 1             Example 2
  b                      64                    64
  r                      48                    48
  c                      128                   16
  ∴ n                    8                     1
  Audio latency = 2nb    1024 sample periods   128 sample periods

How does this compare to circular buffer?
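
The two examples can be checked with a short calculation. This is a sketch under the slide's assumptions (integer b, r, c in sample periods, with r < b); the function names are hypothetical, and circular_buffer_latency() simply evaluates the b + r + c figure given earlier for circular buffers.

```c
#include <assert.h>

/* Integer division rounded up. */
static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Smallest n with n >= c / (b - r); at least one block per buffer. */
unsigned double_buffer_n(unsigned b, unsigned r, unsigned c)
{
    assert(r < b);  /* otherwise computation can never keep up */
    unsigned n = ceil_div(c, b - r);
    return n > 0 ? n : 1;
}

/* Double-buffer latency: 2nb sample periods. */
unsigned double_buffer_latency(unsigned b, unsigned r, unsigned c)
{
    return 2 * double_buffer_n(b, r, c) * b;
}

/* Circular-buffer latency from the earlier slide: b + r + c. */
unsigned circular_buffer_latency(unsigned b, unsigned r, unsigned c)
{
    return b + r + c;
}
```

With b = 64, r = 48, c = 128 this gives n = 8 and 1024 sample periods for the double buffer versus 240 for the circular buffer; with c = 16 both come out at 128.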

Latency In Detail: Buffer Queues

• Assume queue of buffers with b samples each (buffer size = block size)
• Queues of length n on both input and output
• In the limit, this is same as circular buffers
  • In other words, circular buffer of n blocks
• If we are keeping up with audio, state is:
  [Figure: fill state of the Input and Output queues]
  • Audio latency = (n – 1)b
  • Need: (n – 2)b ≥ r + c
  • ∴ n ≥ (r + c) / b + 2
• Example 1: n = 5, latency = 256 (vs 1024 for double buffer); Example 2: n = 3, latency = 128 (same)
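
The buffer-queue figures follow from the same kind of arithmetic. A minimal sketch, assuming n must be an integer satisfying n ≥ (r + c) / b + 2; the function names are hypothetical.

```c
#include <assert.h>

/* Integer division rounded up. */
static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Smallest integer n with n >= (r + c) / b + 2. */
unsigned queue_length(unsigned b, unsigned r, unsigned c)
{
    assert(b > 0);
    return ceil_div(r + c, b) + 2;
}

/* Audio latency (n - 1) * b, in sample periods. */
unsigned queue_latency(unsigned b, unsigned r, unsigned c)
{
    return (queue_length(b, r, c) - 1) * b;
}
```

For b = 64, r = 48, c = 128: (48 + 128) / 64 = 2.75, so n = 5 and the latency is 4 × 64 = 256. For c = 16: (48 + 16) / 64 = 1, so n = 3 and the latency is 128.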


Synchronous/blocking vs Asynchronous/callback APIs
• Blocking APIs
  • Typically provide primitives like read() and write()
  • Can be used with select() to interleave with other operations
  • Users manage their own threads for concurrency (consider Python, Ruby, Smalltalk, …)
  • Great if your OS threading services can provide real-time guarantees (e.g. some embedded computers, Linux)
• Callback APIs
  • User provides a function pointer to be called when samples are available/needed
  • Concurrency is implicit, user must be careful with locks or blocking calls
  • You can assume the API is doing its best to be real-time

PortAudio: An Abstraction of Audio APIs

• PortAudio wraps multiple Host APIs providing a unified and portable interface for writing real-time audio applications
• Main entities:
  • Host API – a particular user-space audio API (e.g. JACK, DirectSound, ASIO, ALSA, WMME, CoreAudio, etc.)
    • PaHostApiInfo, Pa_GetHostApiCount(), Pa_GetHostApiInfo()
  • Device – a particular device, usually maps directly to a host API device. Can be full or half duplex depending on the host
    • PaDeviceInfo, Pa_GetDeviceCount(), Pa_GetDeviceInfo()
  • Stream – an interface for sending and/or receiving samples to an opened Device
    • PaStream, Pa_OpenStream(), Pa_StartStream()
• See http://www.portaudio.com

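PortAudio Example: Generating a Sine Wave
/* TABLE_SIZE is assumed to be #defined elsewhere (not shown on the slide). */
typedef struct {
    float sine[TABLE_SIZE];   /* one cycle of a sine wave */
    int phase;                /* current read index into the table */
} TestData;

/* Called by PortAudio each time it needs framesPerBuffer more output frames. */
static int TestCallback(const void *inputBuffer,
        void *outputBuffer, unsigned long framesPerBuffer,
        const PaStreamCallbackTimeInfo *timeInfo,
        PaStreamCallbackFlags statusFlags, void *userData) {
    TestData *data = (TestData *) userData;
    float *out = (float *) outputBuffer;   /* interleaved stereo output */

    for (unsigned long i = 0; i < framesPerBuffer; i++) {
        float sample = data->sine[data->phase++];
        *out++ = sample; /* left */
        *out++ = sample; /* right */
        if (data->phase >= TABLE_SIZE)
            data->phase -= TABLE_SIZE;     /* wrap around the table */
    }
    return paContinue;
}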
PortAudio Example: Running a Stream (1)
int main(void)
{
    TestData data;
    /* Fill the table with one cycle of a sine wave. */
    for (int i = 0; i < TABLE_SIZE; ++i)
        data.sine[i] = (float) sin(M_PI * 2 *
                ((double)i / (double)TABLE_SIZE));
    data.phase = 0;

    Pa_Initialize();   /* return codes are not checked here, for brevity */

    PaStreamParameters outputParameters;
    outputParameters.device = Pa_GetDefaultOutputDevice();
    outputParameters.channelCount = 2;          /* stereo */
    outputParameters.sampleFormat = paFloat32;  /* 32-bit float samples */
    outputParameters.suggestedLatency =
        Pa_GetDeviceInfo(outputParameters.device)->
            defaultLowOutputLatency;
    outputParameters.hostApiSpecificStreamInfo = NULL;

    ...


PortAudio Example: Running a Stream (2)
    ...

    PaStream *stream;
    Pa_OpenStream(&stream, NULL /* no input */,
                  &outputParameters,
                  SAMPLE_RATE, FRAMES_PER_BUFFER, paClipOff /*flags*/,
                  TestCallback, &data);

    Pa_StartStream(stream);   /* PortAudio now calls TestCallback as needed */

    printf("Play for %d seconds.\n", NUM_SECONDS);
    sleep(NUM_SECONDS);       /* main thread waits; audio runs in the callback */

    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}

Modular Audio Processing

• Unit generators
• Graph evaluation
• Evaluation mechanisms
• Block-based processing
• Vector allocation strategies
• Variations


Unit Generators
• A sample generating or processing function, and its accompanying state, e.g. oscillators, filters, etc.
• A functional view:
  • f(state, inputs) → (state, outputs)
• An OOP view:
  • class Ugen { virtual void Update(float *ins[], float *outs[]); };
• In a dynamic system, the flow between units is explicitly represented by a “synchronous dataflow graph”

Graph Evaluation
• Generators which produce signals must be evaluated before the generators which consume those signals*, therefore: execute in a depth-first order starting from sinks.
• Note: depth-first implies sinks are the last to evaluate in any graph traversal.
[Figure: example dataflow graph with nodes labeled (1) through (6)]
*Why? Or else, outputs from generator will not be considered until the next “pass”, introducing a one-block delay, or even worse, if outputs go to reusable memory buffers, output could be overwritten.


Evaluation Mechanisms

• Direct graph traversal (using topological sort algorithm)
  • Simple, dynamic
  • Can't modify the graph while evaluating
[Figure: example dataflow graph with nodes labeled (1) through (6)]

Topological Sort

class Ugen
    var block_num
    var inputs

    def update(new_block_num)
        if new_block_num > block_num
            for input in inputs
                input.update(new_block_num)
            really_update() // virtual method
            block_num = block_num + 1

Question: Why not just ask each block to update/compute its ancestors before running its own update/compute method instead of messing with block numbers and “if” tests?
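
The pseudocode above can be sketched in C to see what the block number buys when one output fans out to several consumers. All names here are hypothetical; compute() plays the role of really_update(), the runs counter exists only so the behavior can be inspected, and this version records new_block_num directly rather than incrementing, which is equivalent when the caller advances one block per pass.

```c
#include <stddef.h>

#define BLOCK_SIZE 4      /* tiny block, just for illustration */
#define MAX_INPUTS 2

typedef struct Ugen Ugen;
struct Ugen {
    long block_num;               /* last block this unit computed */
    int n_inputs;
    Ugen *inputs[MAX_INPUTS];
    float out[BLOCK_SIZE];
    int runs;                     /* how many times compute() has run */
    void (*compute)(Ugen *self);  /* plays the role of really_update() */
};

/* Bring u up to date for new_block_num, updating its inputs first. */
void ugen_update(Ugen *u, long new_block_num)
{
    if (new_block_num > u->block_num) {
        for (int i = 0; i < u->n_inputs; i++)
            ugen_update(u->inputs[i], new_block_num);
        u->compute(u);
        u->runs++;
        u->block_num = new_block_num;
    }
}

/* Three toy unit generators. */
void const_one(Ugen *u)
{
    for (int i = 0; i < BLOCK_SIZE; i++) u->out[i] = 1.0f;
}

void times_two(Ugen *u)
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        u->out[i] = 2.0f * u->inputs[0]->out[i];
}

void add_two(Ugen *u)
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        u->out[i] = u->inputs[0]->out[i] + u->inputs[1]->out[i];
}
```

If a const_one source feeds both a times_two unit and an add_two sink (which also reads the times_two output), updating the sink for block 1 reaches the source along two paths, yet the source computes only once per block.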

Evaluation Mechanisms (2)

• Execution sequence (list of function pointers, polymorphic object pointers, bytecodes)
  • Possibly more efficient, harder to modify
  • Decouples evaluation from traversal. Graph can be modified during traversal; later sequence/program must be computed again.
  • Essentially the same topological sort algorithm is used, but traversal order is stored as a sequence or program.
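
A minimal sketch of the execution-sequence idea: traverse the graph once, store the order in an array, then evaluate by walking the array. The names Node, Schedule, schedule_build(), and schedule_run() are hypothetical, and each node holds a single value instead of a block to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INPUTS 2
#define MAX_STEPS 16

typedef struct Node Node;
struct Node {
    int visited;               /* set while building a schedule */
    int n_inputs;
    Node *inputs[MAX_INPUTS];
    float value;               /* a one-sample "block" */
    void (*run)(Node *self);
};

typedef struct {
    Node *steps[MAX_STEPS];    /* nodes in evaluation order */
    int count;
} Schedule;

/* Depth-first traversal from a sink: inputs are appended before consumers. */
void schedule_build(Schedule *s, Node *n)
{
    if (n->visited) return;
    n->visited = 1;
    for (int i = 0; i < n->n_inputs; i++)
        schedule_build(s, n->inputs[i]);
    assert(s->count < MAX_STEPS);
    s->steps[s->count++] = n;
}

/* Evaluation no longer touches the graph structure, only the stored order. */
void schedule_run(const Schedule *s)
{
    for (int i = 0; i < s->count; i++)
        s->steps[i]->run(s->steps[i]);
}

/* Two toy node behaviors. */
void node_one(Node *n) { n->value = 1.0f; }

void node_sum(Node *n)
{
    n->value = 0.0f;
    for (int i = 0; i < n->n_inputs; i++)
        n->value += n->inputs[i]->value;
}
```

After the graph changes, the visited flags would need to be cleared and the schedule rebuilt; until then schedule_run() keeps using the old order.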

Block-Based Processing
• Process arrays of input samples and produce arrays of output samples
• Pros: more efficient (common subexpressions, register loads, indexing, cache line prefetching, loop unrolling, SIMD etc.)
• Cons: latency, feedback loops incur blocksize delay
• Vector size:
  • Fixed (cf. Csound k-rate, Aura)
  • Variable with upper bound


Variable Block Size


• Rarely used, but this is a good topic to test your understanding of unit generator implementation
• Imagine fixed block size of N and every UG has an inner sample computation loop that runs N times; samples are written to output arrays that hold N samples.
• Now imagine that N is a variable. If the next “event” – some parameter update – is scheduled 5 samples after the start time of the next block, we set N to 5 and all the UGs compute 5 samples. (Remember that all computation is synchronous, so all UGs have the same number of input and output samples.)
• After running all the UGs, we get 5 samples of output, do the event/update, and then compute the next value of N.
• We limit N to an upper bound to avoid reallocating buffers of memory that hold samples. These stay at some fixed size N_MAX.
• Main drawback: closely spaced events/updates impact efficiency, so performance is less predictable.
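
The choice of N can be sketched as a small scheduling loop. This is an illustration only: event times are given as sample counts in a sorted array, the function names are hypothetical, and the actual unit generator computation and event handling are left as comments.

```c
#include <assert.h>

/* Next block size: run up to the next event, but never more than n_max. */
unsigned long next_block_size(unsigned long now, unsigned long next_event,
                              unsigned long n_max)
{
    assert(next_event > now);
    unsigned long until_event = next_event - now;
    return until_event < n_max ? until_event : n_max;
}

/* Count the blocks needed to render `total` samples, given sorted event
   times (in samples). Events due at the start of a block are handled
   before the block is computed. */
unsigned count_blocks(const unsigned long *events, int n_events,
                      unsigned long total, unsigned long n_max)
{
    unsigned long now = 0;
    int next = 0;
    unsigned blocks = 0;
    while (now < total) {
        while (next < n_events && events[next] <= now)
            next++;                       /* perform events due now */
        unsigned long limit =
            (next < n_events && events[next] < total) ? events[next] : total;
        now += next_block_size(now, limit, n_max);  /* all UGs compute N */
        blocks++;
    }
    return blocks;
}
```

With N_MAX = 64, 128 samples, and events at samples 5 and 70, the block sizes are 5, 64, 1, and 58: four blocks instead of the two needed with no events, which is the efficiency cost mentioned above.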
Buffer Allocation Strategies
• 1) One buffer/vector per generated signal, i.e. for every Unit Generator output.
• 2) Reuse buffers once all sinks have consumed them (cf. graph coloring register allocation)
  • Dannenberg’s measurements indicate this is wasted effort
    • Buffers are relatively small
    • Cache is relatively big
    • DSP is relatively expensive compared to (relatively few) cache faults
    • So speedup from buffer reuse (2) is insignificant


Feedback

• Don't visit a node more than once during graph traversal
• Save output from previous evaluation pass so it can be consumed during next evaluation

Variations on Block-Based Processing
• Hierarchical block sizes, e.g. process subgraphs with smaller blocks to reduce feedback delay
• Synchronous multi-rate: separate evaluation phases using the same or different graphs (e.g. Csound krate/arate passes).
  • Or support signals with one sample per block time: “Block-rate” UGs have no inner loop and support a sample rate of BLOCK_SR = AUDIO_SR / BLOCKSIZE.
• Combine synchronous dataflow graph for audio with asynchronous message processing for control (e.g. Max/MSP)

Audio Plug-Ins
• A plug-in is a software object that can extend the functionality of an audio application, e.g. an editor, player, or software synthesizer.
• Effectively a plug-in is a unit generator:
  • audio inputs
  • audio outputs
  • parametric controls
• Plug-ins are
  • dynamically loadable and
  • self-describing

VST Plug-Ins
• Proprietary spec: Steinberg
• Commonly used and widely supported
• Multiplatform:
  • Windows (a multithreaded DLL)
  • Mac OS X (a bundle)
  • Linux (sort-of)
    • Uses WINE (Windows emulation)
    • Kjetil Matheussen's original vstserver,
    • The fst project from Paul Davis and Torben Hohn,
    • Chris Cannam's dssi-vst wrapper plugin


Example VST GUI

[Figure: jack_fst running the Oberon VSTi synth]

VST Conventions
• Host calls plug-in, sets up input buffers and controls buffer size and when processing is performed
• process(): must be implemented, output is added to the output buffer
• processReplacing(): optional, output overwrites data in output buffer
• Parameters range: 0.0 to 1.0 (32-bit float)
• Audio samples: -1.0 to +1.0 (32-bit float)


Example Code
AGain::AGain(audioMasterCallback audioMaster)
    : AudioEffectX(audioMaster, 1, 1)    // 1 program, 1 parameter only
{
    fGain = 1.;                      // default to 0 dB
    setNumInputs(2);                 // stereo in
    setNumOutputs(2);                // stereo out
    setUniqueID('Gain');             // identify
    canMono();                       // makes sense to feed both inputs the same signal
    canProcessReplacing();           // supports both accumulating and replacing
    strcpy(programName, "Default");  // default program name
}

AGain::~AGain() { }                  // nothing to do here

void AGain::setProgramName(char *name)
{
    strcpy(programName, name);
}

void AGain::getProgramName(char *name)
{
    strcpy(name, programName);
}
Example Code (2)
void AGain::setParameter(long index, float value)
{
    fGain = value;
}

float AGain::getParameter(long index)
{
    return fGain;
}

void AGain::getParameterName(long index, char *label)
{
    strcpy(label, "Gain");   // default max string length is 24 (!)
}

void AGain::getParameterDisplay(long index, char *text)
{
    dB2string(fGain, text);
}

void AGain::getParameterLabel(long index, char *label)
{
    strcpy(label, "dB");
}


Example Code (3)


bool AGain::getEffectName(char *name)
{
    strcpy(name, "Gain");
    return true;
}

bool AGain::getProductString(char *text)
{
    strcpy(text, "Gain");
    return true;
}

bool AGain::getVendorString(char *text)
{
    strcpy(text, "Steinberg Media Technologies");
    return true;
}

Example Code (4)
void AGain::process(float **inputs, float **outputs,
                    long sampleFrames)
{
    float *in1 = inputs[0];
    float *in2 = inputs[1];
    float *out1 = outputs[0];
    float *out2 = outputs[1];

    while (--sampleFrames >= 0)
    {
        (*out1++) += (*in1++) * fGain;   // accumulating
        (*out2++) += (*in2++) * fGain;
    }
}


Example Code (5)

void AGain::processReplacing(float **inputs, float **outputs,
                             long sampleFrames)
{
    float *in1 = inputs[0];
    float *in2 = inputs[1];
    float *out1 = outputs[0];
    float *out2 = outputs[1];

    while (--sampleFrames >= 0)
    {
        (*out1++) = (*in1++) * fGain;    // replacing
        (*out2++) = (*in2++) * fGain;
    }
}

VST on the Host Side
typedef AEffect *(*mainCall)(audioMasterCallback cb);
audioMasterCallback audioMaster;

// Assume the host has loaded the plug-in and has its main()
void instanciatePlug(mainCall plugsMain)
{
    AEffect *ce = plugsMain(&audioMaster);
    if (ce && ce->magic == AEffectMagic) { .... }
}
------ the main() routine in the plugin (DLL): -------
AEffect *main(audioMasterCallback audioMaster)
{
    // check for the correct version of VST
    if (!audioMaster(0, audioMasterVersion, 0, 0, 0, 0)) return 0;
    AGain *effect = new AGain(audioMaster);   // Create the AudioEffect
    if (!effect) return 0;
    if (oome) {                  // Check if no problem in constructor of AGain
        delete effect;
        return 0;
    }
    return effect->getAeffect(); // return C interface of our plug-in
}


More VST
• Program = full set of parameters
• Bank = set of programs (user can call up preset)
• Interactive Interfaces
  • Host can construct editor based on text:
    • Parameter name, display, label – “Gain: -6 dB”
  • Plug-In can open a window and make a GUI
  • Plug-In can use VSTGUI library to make a cross-platform GUI
• VSTi – plug-in instruments (synthesizers)
  • Plug-In has API for receiving MIDI events

LADSPA – Linux Audio Developer's Simple Plugin API
• The plugin library is loaded (using a system-specific method like dlopen or, for glib/gtk+ users, g_module_open).
• The plugin descriptor is obtained using the plugin library's ladspa_descriptor function, which may allocate memory.
• The host uses the plugin's instantiate function to allocate a new (or several new) sample-processing instances.
• The host must connect buffers to every one of the plugin's ports. It must also call activate before running samples through the plugin.
• The host processes sample data with the plugin by filling the input buffers it connected, then calling either run or run_adding. The host may reconnect ports with connect_port as it sees fit.
• The host deactivates the plugin handle. It may opt to activate and reuse the handle, or it may destroy the handle.
• The handle is destroyed using the cleanup function.
• The plugin is closed. Its _fini function is responsible for deallocating memory.

Summary
• Audio samples, frames, blocks
• Synchronous processing:
  • Never skip or duplicate samples
  • Buffers are essential
  • Latency comes (mostly) from buffer length
• PortAudio
  • Host API
  • Device
  • Stream

Summary (2)
• Modular Audio Processing
  • Unit Generator
  • Networks of Unit Generators
  • Synchronous Dataflow
• Plug-ins
  • VST example
  • Unit Generator that is…
    • Dynamically loadable
    • Self-describing
    • May have its own graphical interface
