TarsosDSP, a Real-Time Audio Processing
Framework in Java
Joren Six1,2 , Olmo Cornelis2 , Marc Leman1
1 University
Ghent, IPEM, Sint-Pietersnieuwstraat 41, 9000, Gent, Belgium
2 University
College Ghent, School of Arts, Jozef Kluyskensstraat 2, 9000 Gent, Belgium
Correspondence should be addressed to Joren Six (joren.six@ugent.be)
ABSTRACT
This paper presents TarsosDSP, a framework for real-time audio analysis and processing. Most libraries and
frameworks offer either audio analysis and feature extraction or audio synthesis and processing. TarsosDSP
is one of only a few frameworks that offer analysis, processing and feature extraction in real-time, a
unique feature in the Java ecosystem. The framework contains practical audio processing algorithms, can
be extended easily, and has no external dependencies. Each algorithm is implemented as simply as possible
thanks to a straightforward processing pipeline. TarsosDSP's features include a resampling algorithm, onset
detectors, a number of pitch estimation algorithms, a time stretch algorithm, a pitch shifting algorithm, and
an algorithm to calculate the Constant-Q transform. The framework also allows simple audio synthesis, some audio
effects, and several filters. The open-source framework is a valuable contribution to the MIR community
and an ideal fit for interactive MIR applications on Android.
1. INTRODUCTION
Frameworks or libraries1 for audio processing can be divided into two categories. The first category offers audio
analysis and feature extraction. The second category offers audio synthesis capabilities. Both types may or may
not operate in real-time. Table 1 shows a partial overview
of notable audio frameworks. It shows that only a few
frameworks offer real-time feature extraction combined
with synthesis capabilities. To the best of the authors'
knowledge, TarsosDSP is unique in that regard within
the Java ecosystem. The combination of real-time feature extraction and synthesis can be of use for music education tools or music video games. Especially for development on the Android platform there is a need for
such functionality.
TarsosDSP also fills a need for educational tools for
Music Information Retrieval. As identified by Gómez
in [14], there is a need for comprehensible, well-documented MIR frameworks which perform useful
tasks on all platforms, without the requirement of a
costly software package like Matlab. TarsosDSP serves
this educational goal; it has already been used by several
master's students as a starting point for music information
retrieval [5, 32, 28].
1 The distinction between library and framework is explained in [2]. In short, a framework is an abstract specification of an application whereby analysis and design are reused; conversely, when using a (class) library, code is reused, but a library does not enforce a design.
The framework tries to hit the sweet spot between being capable enough to get real tasks done, and compact
enough to serve as a demonstration for beginning MIR researchers of how audio processing works in practice.
TarsosDSP therefore targets both students and more experienced researchers who want to make use of the implemented features.
After this introduction, a section about the design decisions follows; then the main features of TarsosDSP
are highlighted. Section 4 covers the availability of
the framework. The paper ends with a conclusion and
future work.
2. DESIGN DECISIONS
To meet the goals stated in the introduction, a couple of
design decisions were made.
2.1. Java based
TarsosDSP was written in Java to allow portability from
one platform to another. Java's automatic memory management facilities are a great boon for such a system.
These features allow a clean implementation
AES 53RD INTERNATIONAL CONFERENCE, London, UK, 2014 January 27–29
Name                   Analysis  Synthesis  Real-Time  Technology
Aubio [7]              True      False      True       C
CLAM [3]               True      True       True       C
CSL [23]               True      True       True       C++
Essentia [6]           True      False      True       C++
Marsyas [29]           True      True       False      C++
SndObj [16]            True      True       True       C++
Sonic Visualizer [11]  True      False      False      C++
STK [25]               False     True       True       C++
Tartini [19]           True      False      True       C++
YAAFE [17]             True      False      False      C++
Beads [21]             False     True       True       Java
JASS [30]              False     True       True       Java
jAudio [18]            True      False      False      Java
Jipes                  True      False      False      Java
jMusic [8]             False     True       False      Java
JSyn [10]              False     True       True       Java
Minim [22]             False     True       True       Java
TarsosDSP              True      True       True       Java

Table 1: A table with notable audio frameworks. Only a few frameworks offer real-time feature extraction and audio
synthesis capabilities. According to the research by the authors, in the Java ecosystem only TarsosDSP offers this
capability.
of audio processing algorithms. The clutter introduced
by memory management instructions and the platform-dependent #ifdefs typically found in C++ implementations is avoided. The Dalvik Java runtime makes it possible to
run TarsosDSP's algorithms unmodified on the Android
platform. Java or C++ libraries are often hard to use
due to external dependencies; TarsosDSP has no external dependencies, except for the standard Java Runtime.
Java does have a serious drawback: it struggles to offer
a low-latency audio pipeline. If real-time, low-latency audio is
needed, the environment in which TarsosDSP operates
needs to be optimized, e.g. by following the instructions
found in [1].
2.2. Processing pipeline
The processing pipeline is kept as simple as possible.
Currently, only single-channel audio is allowed, which
helps to make the processing chain extremely straightforward2.
2 Actually, multichannel audio is accepted as well, but it is automatically downmixed to one channel before it is sent through the processing pipeline.
A schematic representation can be found in
Figure 1. The source of the audio is a file, a microphone,
or an optionally empty stream. The AudioDispatcher
chops incoming audio into blocks of a requested number of samples, with a defined overlap. Subsequently,
the blocks of audio are scaled to floats in the range
[-1, 1]. The blocks are encapsulated in an
AudioEvent object which contains a pointer to the audio and the start time in seconds, and has some auxiliary
methods, e.g. to calculate the energy of the audio block.
The AudioDispatcher sends the AudioEvent through
a series of AudioProcessor objects, each of which executes an
operation on the audio. The core of the algorithms is contained in these AudioProcessor objects. They can, e.g.,
estimate pitch or detect onsets in a block of audio. Note
that the size of a block of audio can change during the
processing flow, as is the case when a block of audio is stretched in time. For more examples of available
AudioProcessor operations, see Section 3. Figure 2
shows a processing pipeline. It shows how the dispatcher
chops up audio and how the AudioProcessor objects
are linked. Also interesting to note is line 8, where an
anonymous inner class is declared to handle pitch estimation results. The example covers filtering, analysis,
effects and playback. The last statement on line 23 bootstraps the whole process.

Fig. 1: A schematic representation of the TarsosDSP processing pipeline. The incoming audio (left) is divided into
blocks which are encapsulated in Event objects by the Dispatcher. The event objects flow through one or more
Processor blocks, which may or may not alter the audio and can generate output (e.g. pitch estimations). Dotted lines
represent optional flows.

01: //Get an audio stream from the microphone, chop it in blocks of 1024 samples, no overlap (0 samples)
02: AudioDispatcher d = AudioDispatcher.fromDefaultMicrophone(1024, 0);
03: float sr = 44100;//The sample rate
04: //High pass filter, let everything pass above 110Hz
05: AudioProcessor highPass = new HighPass(110, sr);
06: d.addAudioProcessor(highPass);
07: //Pitch detection, print estimated pitches on standard out
08: PitchDetectionHandler printPitch = new PitchDetectionHandler() {
09:   @Override
10:   public void handlePitch(PitchDetectionResult pitchDetectionResult, AudioEvent audioEvent) {
11:     System.out.println(pitchDetectionResult.getPitch());
12:   }
13: };
14: PitchEstimationAlgorithm algo = PitchEstimationAlgorithm.YIN; //use YIN
15: AudioProcessor pitchEstimator = new PitchProcessor(algo, sr, 1024, printPitch);
16: d.addAudioProcessor(pitchEstimator);
17: //Add an audio effect (delay)
18: d.addAudioProcessor(new DelayEffect(0.5, 0.3, sr));
19: //Mix some noise with the audio (synthesis)
20: d.addAudioProcessor(new NoiseGenerator(0.3));
21: //Play the audio on the loudspeakers
22: d.addAudioProcessor(new AudioPlayer(new AudioFormat(sr, 16, 1, true, true)));
23: d.run();//starts the dispatching process

Fig. 2: A TarsosDSP processing pipeline. Here, pitch estimation on filtered audio from a microphone is done in
real-time. A delay audio effect is added and some noise is mixed in before the audio is played back. The
example covers filtering, analysis, audio effects, synthesis and playback.

Fig. 3: A visualization of some of the features that can be extracted using TarsosDSP: a waveform (top panel, black),
onsets (top panel, blue), beats (top panel, red), a Constant-Q spectrogram (bottom panel, gray), and pitch estimations
(bottom panel, red). The source code for the visualization is part of the TarsosDSP distribution as well.
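The dispatcher's two basic responsibilities - scaling incoming samples to floats in [-1, 1] and chopping the stream into blocks with a defined overlap - can be sketched in a few lines of plain Java. The class and method names below are illustrative only, not part of the TarsosDSP API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the AudioDispatcher idea: 16-bit PCM samples are scaled to
// floats in [-1, 1], and the stream is chopped into (possibly overlapping)
// blocks. Class and method names are illustrative, not the actual API.
public class DispatcherSketch {

    // Scale a 16-bit signed sample to a float in [-1, 1].
    public static float toFloat(short pcm) {
        return pcm / 32768f;
    }

    // Chop a signal into blocks of blockSize samples that overlap by
    // 'overlap' samples (assumes overlap < blockSize).
    public static List<float[]> chop(float[] signal, int blockSize, int overlap) {
        List<float[]> blocks = new ArrayList<>();
        int stepSize = blockSize - overlap;
        for (int start = 0; start + blockSize <= signal.length; start += stepSize) {
            float[] block = new float[blockSize];
            System.arraycopy(signal, start, block, 0, blockSize);
            blocks.add(block);
        }
        return blocks;
    }

    public static void main(String[] args) {
        float[] signal = new float[4096];
        // block size 1024, overlap 512 -> hop size 512 -> 7 full blocks
        List<float[]> blocks = chop(signal, 1024, 512);
        System.out.println(blocks.size());
    }
}
```

Each block would then be wrapped in an event object and handed to the chain of processors.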
2.3. Optimizations
TarsosDSP serves an educational goal; therefore, the implementations of the algorithms are kept as pure as possible, and no obfuscating optimizations are made. Readability of the source code is put before its execution
speed; if algorithms are not quick enough, users are invited to optimize the Java code themselves, or to look for
alternatives, perhaps in another programming language
like C++. This is a rather unique feature of the TarsosDSP framework; other libraries take a different approach.
jAudio [18] and YAAFE [17], for example, reuse calculations for feature extraction, which makes algorithms more
efficient, but also harder to grasp. Still other libraries,
like SoundTouch3, carry a burden by being highly optimized - with assembler code - and by having a long
history. These things tend to contribute to less readable
code, especially for people new to the field.
3. IMPLEMENTED FEATURES
In this section the main implemented features are highlighted. Next to the list below, there are boilerplate features, e.g. to adjust gain, write a WAV file, detect silence,
follow an envelope, or play back audio. Figure 3 shows a
3 SoundTouch, by Olli Parviainen, is an open-source audio processing library: http://www.surina.net/soundtouch/
visualization of several features computed with TarsosDSP.
• TarsosDSP was originally conceived as a library for
pitch estimation; therefore, it contains several pitch
estimators: YIN [12], MPM [20], AMDF [24]4, and
an estimator based on dynamic wavelets [15]. There
are two YIN implementations: one remains within
the comforts of the time domain, the other calculates the convolution in the frequency domain5.
• Two onset detectors are provided: the one described in
[4], and the one used by the BeatRoot system [13].
• The WSOLA time stretch algorithm [31], which allows altering the speed of an audio stream without
altering the pitch, is included. At moderate time
stretch factors - 80% to 120% of the original speed -
only limited audible artifacts are noticeable.
• A resampling algorithm based on [27] and the related open-source resample software package6.
4 Partial implementation provided by Eder Souza.
5 The YIN FFT implementation was kindly contributed by Matthias Mauch.
6 Resample 1.8.1 can be found on the digital audio resampling home page https://ccrma.stanford.edu/~jos/resample/, maintained by Julius O. Smith.
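The idea behind sample-rate conversion can be illustrated with a much simpler scheme than the bandlimited method of [27]: linear interpolation between neighbouring samples. The class below is a sketch for illustration only, not the TarsosDSP implementation; linear interpolation trades audio quality for brevity.

```java
// Illustrative sample-rate conversion by linear interpolation. The actual
// resampler follows the bandlimited method of [27]; linear interpolation
// is used here only to keep the sketch short.
public class LinearResampler {

    public static float[] resample(float[] in, float sourceRate, float targetRate) {
        int outLength = (int) Math.floor(in.length * targetRate / sourceRate);
        float[] out = new float[outLength];
        float step = sourceRate / targetRate; // input samples per output sample
        for (int i = 0; i < outLength; i++) {
            float pos = i * step;             // fractional position in the input
            int idx = (int) pos;
            float frac = pos - idx;
            float next = (idx + 1 < in.length) ? in[idx + 1] : in[idx];
            out[i] = (1 - frac) * in[idx] + frac * next; // interpolate
        }
        return out;
    }

    public static void main(String[] args) {
        float[] ramp = {0, 1, 2, 3};
        float[] up = resample(ramp, 4, 8); // double the sample rate
        System.out.println(up.length);     // 8 samples
    }
}
```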
• A pitch shifting algorithm, which allows changing
the pitch of audio without affecting its speed, is formed
by chaining the time stretch algorithm with the resampling algorithm.
• As examples of audio effects, TarsosDSP contains
a delay and a flanger effect. Both are implemented as
minimalistically as possible.
• Several IIR filters are included: a single-pass and a
four-stage low-pass filter, a high-pass filter, and a
band-pass filter.
• TarsosDSP also allows audio synthesis and includes
generators for sine waves and noise. Also included
is a Low Frequency Oscillator (LFO) to control the
amplitude of the resulting audio.
• A spectrum can be calculated with the inevitable
FFT or using the provided implementation of the
Constant-Q transform [9].
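The core steps of YIN [12] - the difference function, cumulative mean normalization, and an absolute threshold - fit in a short piece of plain Java. The sketch below is illustrative; the actual implementation adds refinements such as parabolic interpolation, and the class name is hypothetical.

```java
// A compact, illustrative version of the YIN pitch estimator [12]:
// difference function, cumulative mean normalized difference, and an
// absolute threshold followed by a descent to the local minimum.
public class YinSketch {

    public static float estimatePitch(float[] audio, float sampleRate) {
        int half = audio.length / 2;
        double[] d = new double[half];
        // Step 1: difference function d(tau)
        for (int tau = 1; tau < half; tau++) {
            for (int i = 0; i < half; i++) {
                double delta = audio[i] - audio[i + tau];
                d[tau] += delta * delta;
            }
        }
        // Step 2: cumulative mean normalized difference d'(tau)
        double[] cmnd = new double[half];
        cmnd[0] = 1;
        double runningSum = 0;
        for (int tau = 1; tau < half; tau++) {
            runningSum += d[tau];
            cmnd[tau] = d[tau] * tau / runningSum;
        }
        // Step 3: first tau below the threshold, then follow the dip down
        for (int tau = 2; tau < half; tau++) {
            if (cmnd[tau] < 0.1) {
                while (tau + 1 < half && cmnd[tau + 1] < cmnd[tau]) {
                    tau++;
                }
                return sampleRate / tau; // lag in samples -> frequency in Hz
            }
        }
        return -1; // no pitch found
    }

    public static void main(String[] args) {
        float sr = 44100;
        float[] tone = new float[2048];
        for (int i = 0; i < tone.length; i++) {
            // 441 Hz sine: the period is exactly 100 samples at 44.1 kHz
            tone[i] = (float) Math.sin(2 * Math.PI * 441 * i / sr);
        }
        System.out.println(estimatePitch(tone, sr)); // close to 441 Hz
    }
}
```

The frequency-domain YIN variant mentioned above computes the same difference function via an FFT, which is faster for large block sizes.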
3.1. Example Applications
To show the capabilities of the framework, seventeen examples were built. Most examples are small programs
with a simple user interface, showcasing one algorithm.
They not only show which functionality is present in
the framework, but also how to use it in other applications.
There are example applications for time stretching,
pitch shifting, pitch estimation, onset detection, and so on. Figure
4 shows an example application, featuring the pitch shifting algorithm.
TarsosDSP is used by Tarsos [26], a software tool to analyze and experiment with pitch organization in non-western music. It is an end-user application with a graphical user interface that leverages many of TarsosDSP's features. It can be seen as a showcase for the framework.
Fig. 4: A TarsosDSP example application. Most algorithms implemented in TarsosDSP have a demo application with a user interface. Here, the capabilities of a pitch
shifting algorithm are shown.
4. AVAILABILITY AND LICENSE
The source code is available under the GPL license terms
at GitHub:
https://github.com/JorenSix/TarsosDSP.
Contributions are more than welcome. TarsosDSP
releases, the manual, and documentation can be found at
the release directory http://0110.be/releases/TarsosDSP/.
Nightly builds can be found there as well. Other downloads, documentation on the example applications, and
background information are available on http://0110.be.
Providing the source code under the GPL license makes
sure that derivative works also need to provide the source
code, which enables reproducibility.
5. CONCLUSION
In this paper TarsosDSP was presented, an open-source
Java library for real-time audio processing without external dependencies. It allows real-time pitch and onset
extraction, a unique feature in the Java ecosystem. It also
contains algorithms for time stretching, pitch shifting,
filtering, resampling, effects, and synthesis. TarsosDSP
serves an educational goal; therefore, algorithms are implemented as simply and self-contained as possible using
a straightforward pipeline. The library can be used on the
Android platform, as a back-end for Java applications, or
standalone, by using one of the provided example applications. After two years of active development it has
become a valuable addition to the MIR community.
6. REFERENCES
[1] Real-Time, Low Latency Audio Processing in Java.
In Proceedings of the International Computer Music Conference (ICMC 2007), pages 99–102, 2007.
[2] X. Amatriain. An Object-Oriented Metamodel for
Digital Signal Processing with a focus on Audio
and Music. PhD thesis, 2005.
[3] X. Amatriain, P. Arumí, and M. Ramírez. CLAM,
Yet Another Library for Audio and Music Processing? In Proceedings of the 17th Annual ACM Conference on Object-Oriented Programming, Systems,
Languages and Applications (OOPSLA 2002),
2002.
[4] Dan Barry, Derry Fitzgerald, Eugene Coyle, and
Bob Lawlor. Drum Source Separation using Percussive Feature Detection and Spectral Modulation.
In Proceedings of the Irish Signals and Systems
Conference (ISSC 2005), 2005.
Santiago David Dávila Benavides. Raciocínio de
Agentes Musicais: Composição Algorítmica, Vida Artificial e Interatividade em Sistemas Multiagentes Musicais. Master's thesis, Instituto de Matemática e
Estatística, Universidade de São Paulo, 2012.
[6] D. Bogdanov, Nicolas Wack, Emilia Gómez,
Sankalp Gulati, P. Herrera, O. Mayor, G. Roma,
J. Salamon, J. Zapata, and Xavier Serra. ESSENTIA: an Audio Analysis Library for Music Information Retrieval. In Proceedings of the 14th International Symposium on Music Information Retrieval (ISMIR 2013), pages 493–498, Curitiba,
Brazil, 2013.
[7] Paul Brossier. Automatic Annotation of Musical
Audio for Interactive Applications. PhD thesis,
Queen Mary University of London, UK, August
2006.
[8] Andrew R. Brown and Andrew C. Sorensen. Introducing jMusic. In Andrew R. Brown and Richard
Wilding, editors, Australasian Computer Music
Conference, pages 68–76, Queensland University
of Technology, Brisbane, 2000. ACMA.
[9] Judith Brown and Miller S. Puckette. An efficient
algorithm for the calculation of a constant q transform. Journal of the Acoustical Society of America,
92(5):2698–2701, November 1992.
[10] P. Burk. JSyn - A Real-time Synthesis API for Java.
In Proceedings of the 1998 International Computer
Music Conference (ICMC 1998). Computer Music
Associaciation, 1998.
[11] C. Cannam, C. Landone, M. Sandler, and J. P. Bello.
The Sonic Visualiser: A Visualisation Platform for
Semantic Descriptors from Musical Signals. In
Proceedings of the 7th International Symposium on
Music Information Retrieval (ISMIR 2006), Victoria, Canada, 2006.
[12] Alain de Cheveigné and Hideki Kawahara. YIN, a
Fundamental Frequency Estimator for Speech and
Music. The Journal of the Acoustical Society of
America, 111(4):1917–1930, 2002.
[13] Simon Dixon. Automatic Extraction of Tempo and
Beat From Expressive Performances. Journal of
New Music Research (JNMR), 30(1):39–58, 2001.
[14] Emilia Gómez. Teaching MIR: Educational Resources
Related To Music Information Retrieval. In Proceedings of the 13th International Symposium on
Music Information Retrieval (ISMIR 2012). International Society for Music Information Retrieval,
2012.
[15] Eric Larson and Ross Maddox. Real-Time Time-Domain Pitch Tracking Using Wavelets. 2005.
[16] Victor Lazzarini.
Sound Processing with the
SndObj Library: An Overview. In Proceedings of
the 4th International Conference on Digital Audio
Effects (DAFX 2001), pages 6–8, 2001.
AES 53RD INTERNATIONAL CONFERENCE, London, UK, 2014 January 27–29
Page 6 of 7
Six
TarsosDSP, a Real-Time Audio Processing Framework in Java
[17] Benoit Mathieu, Slim Essid, Thomas Fillon, Jacques
Prado, and Gaël Richard. YAAFE, an Easy to Use and
Efficient Audio Feature Extraction Software. In
Proceedings of the 11th International Symposium
on Music Information Retrieval (ISMIR 2010),
pages 441–446. International Society for Music Information Retrieval, 2010.
[18] D. McEnnis, C. McKay, and I. Fujinaga. jAudio: A
Feature Extraction Library. In Proceedings of the
6th International Symposium on Music Information
Retrieval (ISMIR 2005), 2005.
[19] Philip McLeod. Fast, Accurate Pitch Detection
Tools for Music Analysis. PhD thesis, University
of Otago. Department of Computer Science, 2009.
[20] Phillip McLeod and Geoff Wyvill. A Smarter Way
to Find Pitch. In Proceedings of the International
Computer Music Conference (ICMC 2005), 2005.
[21] E.X. Merz. Sonifying Processing: The Beads Tutorial. CreateSpace, 2011.
[22] John Anderson Mills III, Damien Di Fede, and
Nicolas Brix. Music programming in minim. In
Proceedings of the New Interfaces for Musical Expression++ Conference (NIME++), Sydney, Australia, 2010.
[23] Stephen Travis Pope and Chandrasekhar Ramakrishnan. The Create Signal Library (Sizzle): Design,
Issues and Applications. In Proceedings of the 2003
International Computer Music Conference (ICMC
2003), 2003.
[24] M. J. Ross, H. L. Shaffer, A. Cohen, R. Freudberg,
and H. J. Manley. Average Magnitude Difference
Function Pitch Extractor. IEEE Transactions on Acoustics, Speech, and Signal Processing, 22(5):353–
362, October 1974.
[25] Gary P. Scavone and Perry R. Cook. RtMidi,
RtAudio, and a Synthesis ToolKit (STK) Update.
In Proceedings of the International Computer Music Conference (ICMC 2005), 2005.
[26] Joren Six, Olmo Cornelis, and Marc Leman. Tarsos, a Modular Platform for Precise Pitch Analysis of Western and Non-Western Music. Journal of
New Music Research, 42(2):113–129, 2013.
[27] Julius O. Smith and Phil Gossett. A Flexible
Sampling-Rate Conversion Method. In Proceedings of the International Conference on Acoustics,
Speech, and Signal Processing (ICASSP 1984), volume 2, 1984.
[28] Thomas Stubbe. Geautomatiseerde vorm- en structuuranalyse van muzikale audio (Automated form and structure analysis of musical audio). Master's thesis,
Universiteit Gent, 2013.
[29] George Tzanetakis and Perry Cook. MARSYAS: a
Framework for Audio Analysis. Organised Sound,
4(3):169–175, December 1999.
[30] K. van den Doel and D. K. Pai. JASS: A Java Audio
Synthesis System for Programmers. In J. Hiipakka,
N. Zacharov, and T. Takala, editors, Proceedings of
the 7th International Conference on Auditory Display (ICAD 2001), pages 150–154, 2001.
[31] Werner Verhelst and Marc Roelands. An Overlap-Add Technique Based on Waveform Similarity
(WSOLA) for High Quality Time-Scale Modification of Speech. In IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP 1993), pages 554–557, 1993.
[32] Michael Wager. Entwicklung eines Systems zur automatischen Notentranskription von monophonischem Audiomaterial (Development of a system for automatic note transcription of monophonic audio material). Master's thesis, Hochschule
Augsburg, 2011.