Perspectives on music, sonification and augmented
instruments
Abstract
This paper discusses two composition works in which real instruments have been augmented through a motion capture
system (Phasespace). While playing the instrument using
conventional techniques, the player also controls additional
sound processing by moving his hands: the instrument effectively becomes a new one, with increased expressive possibilities and
new compositional avenues opened to the author. The relation between sonification and music is also
investigated through these examples, in
order to outline how the different functions and purposes of
music and sonification can be preserved while operating in
combination within the creation of music.
Keywords
Performance, Sonification, Mapping, Electronic Music
1. INTRODUCTION
The expansion of parametric control has a centuries-long
tradition in musical composition. In Bach's time,
composers were mainly interested in defining pitches and
rhythm: compositional ideas hardly encompassed other parameters, or at least not in a structured way. The
development of music composition as an art form brought
along an increased attention to other parameters: articulation first, then dynamics, then timbre, and more recently
space (Ref. XXX). Composers are now developing an interest
in using performance gestures as a compositional device to
expand the expressive capabilities of musical instruments
(and performers) beyond what has commonly been expected to
date (Ref. YYY). As with all previous expansions, the inclusion of gesture as a compositional parameter has brought
with it a number of problems, both technical and aesthetic, that must be tackled for works to reach an artistic
maturity of some value. While the compositional use of
performance gestures is still developing, some case studies
carried out by the authors together with composers and performers already outline an early set of interesting results, along
with the issues that have accompanied them.
A recent (and accurate) definition of sonification (Walker
and Nees, 2011, p. 9) reads
Sonification . . . seeks to translate relationships in
data or information into sound(s) that exploit
the auditory perceptual abilities of human beings such that the data relationships are comprehensible.
Thus, sonification is a scientific activity related to
auditory display, drawing from that field all the research and analysis carried out on sound perception.
Music, instead, is one of the oldest and most pervasive
known artifacts of humankind. The questions concerning its origins and a precise definition of this activity have
gone through many stages, and many highs and lows in reputation among musicologists, throughout the twentieth century,
ending up confined to a very specialised branch of evolutionary musicology (Wallin et al., 1999).
While sonification is quite precisely defined, music's boundaries are simply too loose to be of any
help in trying to make sense of the relationship
between the two. Given these definitions, sonification can of course
be intended as music, just as anything else can.
The creation of music that can be intended as sonification is more difficult to achieve,
and indeed it is hard to find good reasons to do so (there are some,
as we will see, but they are the exception rather than the
rule).
Perhaps a better solution is to resort to the different purposes of sonification and music. True, music may have very
different purposes, but at least these can be confined to
three broad categories:
• rite,
• entertainment, and
• intellectual speculation.
We are confident that these three categories encompass most,
if not all, musical activity. Sonification, on the other hand,
has one very specific purpose: scientific analysis. A major difference thus emerges: music is an arbitrary activity
carried out in a generally playful way to stimulate our artistic inclinations (whatever those may be), while sonification
implies a thoroughness which can be constantly scrutinised,
amended and improved using all the scientific conceptual
tooling we have access to. That is to say, for example,
that "bad sonification" will easily be spotted by accurate scientific analysis, while "bad music" will always remain a
matter of personal judgement.
2. PERFORMANCE EXAMPLES
We consider two musical works in which sonification through gesture control is part of the performance tools
used creatively by the composers.
2.1 Technical implementation
Composition is not the only musical activity that has explored the possibilities of connecting data sonification with
music production. Most notably, composers and performers have used data coming from 3D tracking of gestures and
body postures during performance to contribute to the final musical output of a given piece. We will briefly
describe two cases in this area, emphasising their specific characteristics. Both works are described in XXXX and
both were created using a motion capture system (Impulse
Phasespace) to track the soloist's movements. This system
consists of a variable number of infrared cameras which
detect the movements of the LEDs placed on the
body part or object being tracked. Both works call for
the tracking of hand movements, which happen to be lateral or vertical at both sides of the instrument. In both
cases the performer wore a pair of gloves, each fitted with four
LEDs. The so-called rigid body tracking modality was
used: each hand was considered as a single rigid body defined by a matrix of positions of every LED relative to
the first one inserted in the LED chain. The system detects the center of gravity of that combination as a
data triplet (the xyz coordinates), and the accidental occlusion of one of the LEDs does not affect the continuity of the
tracking. The tracking is thus very robust and suitable for
live performance. The system can be used with a variable
number (> 2) of cameras. Generally speaking, the larger
the number of cameras (and hence of points of view), the more robust the system and the less
sensitive it is to the particular posture the performer may
assume. In live performance, however, it is necessary to find
the best compromise that provides this robustness without
being too invasive from a scenic point of view. In the particular case of the hyperbass flute, four cameras were used,
mounted on two stands placed symmetrically at each end of
the instrument: one at 2.30 m from the floor,
looking at the performer's hands from above, the other on
the ground, looking at them from below.
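As an illustration of the kind of computation involved in rigid-body tracking, the following Python/NumPy sketch estimates a hand position from whichever glove LEDs are currently visible. It is our own simplified illustration, not Phasespace code: the marker offsets are invented for the example, and the real solver also estimates the body's orientation.

```python
import numpy as np

# Calibrated offsets of the four glove LEDs relative to the first LED in the
# chain (illustrative values, in metres).
OFFSETS = np.array([[0.00, 0.00, 0.00],
                    [0.03, 0.00, 0.01],
                    [0.03, 0.04, 0.00],
                    [0.00, 0.04, 0.01]])

def body_position(frame):
    """Estimate the rigid body's reference position from the visible LEDs.

    `frame` is a list of (x, y, z) tuples, with None for occluded markers.
    Each visible LED "votes" for the body position by subtracting its own
    calibrated offset; averaging the votes keeps the output continuous even
    when a marker drops out.  (Orientation estimation is omitted here.)
    """
    votes = [np.asarray(p) - OFFSETS[i]
             for i, p in enumerate(frame) if p is not None]
    if not votes:
        return None                      # every LED occluded: no update
    return np.mean(votes, axis=0)        # xyz triplet passed on to the mapping

# One capture frame with the second LED momentarily occluded.
print(body_position([(0.12, 1.05, 0.40), None,
                     (0.15, 1.09, 0.40), (0.12, 1.09, 0.41)]))
```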
2.2 The psOSCd middleware daemon
When considering the usage of the Phasespace Impulse motion capture system, we thought that rather than writing
plugins hooking directly into the system server for every
live–electronics software under the sun, we would be better
off generalising the hook-up mechanism in order to
facilitate the use of the system by different musicians and
live–electronics crews.
Thus, in order to build the simplest, easiest to use and
most versatile connection between the Phasespace Impulse motion capture server
and the software typically used in live–electronics settings (such as Cycling74's Max/MSP or Pure
Data), we developed a small, self–contained piece of middleware whose basic (and only) function is to act as an interface
between owld (the name of the server handling the input coming from the motion capture cameras)
and the OSC protocol services found in most professional music software.
Figure 1: The placement of psOSCd in the live–electronics chain (the camera ring feeds owld; owld and psOSCd are connected over TCP/IP; psOSCd forwards the data to the music software as OSC messages).
Lacking imagination, this middleware has been called psOSCd,
and its position in the live–electronics chain is represented
in Fig. 1.
Technically speaking, psOSCd is a daemon (i.e. a program which typically runs in the background, shutting down any standard I/O with the external world and establishing contact with other applications solely through network sockets); it can be
compiled and run on any multi–tasking unix–like operating
system. It acts as a double client, requesting connections
to two servers at once: on one side it connects to the owld
Phasespace server using the latter's proprietary API library;
on the other it tries to connect to any existing OSC
server running within a Max/MSP, Pure Data, Csound
or SuperCollider application. To connect to these applications it uses the API of the Free Software library liblo (the
lightweight OSC library).
psOSCd operates by means of a configuration file which is
read upon startup and which defines:
• service addresses and port numbers
• Phasespace Impulse system configuration (n. of cameras, camera IDs, etc.)
• OSC tags
• single marker mappings
• rigid body mappings (rigid body constructs are a Phasespace abstraction defining a collection of four or more markers to be considered together, always maintaining the same relative positions among them; this allows for better tracking of given body parts such as hands, arms and legs)
psOSCd may be run on the same machine which runs the
owld daemon, on one of the computers which handle the
live–electronics, or on a third separate computer (even in a
remote location accessible through the internet, though the
bandwidth requirements may be quite heavy): the TCP/IP
connections give it complete location transparency.
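On the receiving side, any OSC-capable application can consume the data forwarded by psOSCd. As a minimal illustration, the following Python sketch (using the python-osc package instead of Max/MSP or Pure Data) listens for rigid-body triplets; the /rb/1 address and the port number are placeholders, since the actual OSC tags and ports are whatever the configuration file defines.

```python
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_rigid_body(address, x, y, z):
    # Each message carries the xyz triplet of one tracked hand; here we just
    # print it, but a real patch would map it to processing parameters.
    print(f"{address}: x={x:.3f} y={y:.3f} z={z:.3f}")

dispatcher = Dispatcher()
# "/rb/1" is a placeholder: the actual OSC tags are those defined for each
# rigid body in the psOSCd configuration file.
dispatcher.map("/rb/1", on_rigid_body)

# Listen on the port that psOSCd has been configured to send to.
BlockingOSCUDPServer(("0.0.0.0", 9000), dispatcher).serve_forever()
```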
2.3 Ogni Emozione dell'Aria, by Claudio Ambrosini
Ogni Emozione dell'Aria (2011) is a work for clarinet and
live–electronics by Italian composer Claudio Ambrosini. In
Ogni Emozione dell'Aria, both hands of the clarinet performer are tracked by a real–time motion capture system in
order to control the live–electronics processing. The score
calls for specific movements of the player (e.g. opening the arms),
and the movement data captured by the system are used
to control the position of sound in space and to add expressive intentions and new layers to the composition. In this
work, each hand is seen as a single independent body: the
left hand controls the location and movement of sound in
space, while the right one is connected to timbral effects (e.g.
harmonising, non–linear distortion, etc.). Performance gestures are thus available to the composer, who selects them
and notates them precisely in the score in order to replicate
performances in a deterministic way. At the same time,
these new compositional parameters (gestural movements)
preserve the natural inclination of musical expression to be
adapted to individual performance aesthetics (what is generally called musical interpretation).
Delving into technical details, the live–electronics processing has been implemented in Max/MSP, where two main
signal processing strategies have been developed: Dissolution A and Dissolution B. Dissolution A refers to the spectral processing of the clarinet sound through a threshold
FFT. Every spectral band is resynthesized when its amplitude falls inside a given range delimited by two threshold
values (upper and lower). The resynthesized bands
can have an altered amplitude envelope (through the application of an attack and a decay transient); their pitch can be
altered too, through transposition. A ring modulation with
a 3 kHz carrier can further be added to the altered sound,
filtered with the same cut-off frequency through a second-order low-pass filter. The sum of these two signals is then
filtered by a high-pass shelving filter which can enhance or
attenuate the high-frequency region.
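As a rough illustration of the band-thresholding idea at the core of Dissolution A, the following offline Python/NumPy sketch gates a single analysis frame between two amplitude thresholds. It is not the actual Max/MSP patch: transposition, the attack/decay envelopes, ring modulation and the shelving filter are omitted, and the threshold values are arbitrary.

```python
import numpy as np

def dissolution_a_frame(frame, lower, upper):
    """Spectrally gate one analysis frame (simplified Dissolution A idea).

    Only the bins whose magnitude falls between the two thresholds are kept
    and resynthesized; everything else is silenced.
    """
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    magnitude = np.abs(spectrum)
    keep = (magnitude >= lower) & (magnitude <= upper)
    return np.fft.irfft(np.where(keep, spectrum, 0), n=len(frame))

# A 1024-sample frame of a test tone, gated between two arbitrary thresholds.
t = np.arange(1024) / 44100.0
frame = 0.5 * np.sin(2 * np.pi * 440 * t)
out = dissolution_a_frame(frame, lower=1.0, upper=50.0)
```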
A particular example of Dissolution A is shown in Figure 2: in this case, the right hand is controlling while the left
one is playing. The X value of the right hand controls
the output level, the Y value controls the transposition, the Z value controls the left-right spatialisation
and the M value (its modulus) controls the front–rear spatialisation.
Figure 2: Ogni Emozione dell'Aria, score at p. 6.
Dissolution A is used here: "The right hand seems
to help the sound of the clarinet to come out and
then back in again".
In Dissolution B the clarinet sound is granularized through an FFT. The spectrum is first transposed
and then reduced to a sequence of sound grains realised
with a random selection of a few spectral bands, which is
renewed with a period of 72 ms (micro-Mel). An example
of Dissolution B is shown in Figure 3.
Figure 3: Ogni Emozione dell'Aria, score at p. 9.
Dissolution B is used in this case.
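The following offline Python/NumPy sketch gives a rough idea of the Dissolution B granularization: a handful of randomly chosen spectral bands is kept and the selection is renewed every 72 ms. Again, this is our own simplified illustration, not the real-time Max/MSP process: the spectral transposition is omitted, and the frame size and number of bands are arbitrary.

```python
import numpy as np

def dissolution_b(signal, sr=44100, frame_len=1024, bands_kept=8, period_s=0.072):
    """Granularize a signal by keeping a few randomly chosen FFT bands.

    The random band selection is renewed every `period_s` seconds
    (72 ms in the piece).
    """
    rng = np.random.default_rng()
    out = np.zeros_like(signal)
    frames_per_period = max(1, int(period_s * sr / frame_len))
    window = np.hanning(frame_len)
    selection = None
    for i, start in enumerate(range(0, len(signal) - frame_len, frame_len)):
        if i % frames_per_period == 0:        # renew the selection every 72 ms
            selection = rng.choice(frame_len // 2 + 1, size=bands_kept,
                                   replace=False)
        spectrum = np.fft.rfft(signal[start:start + frame_len] * window)
        mask = np.zeros_like(spectrum)
        mask[selection] = 1.0                 # keep only the selected bands
        out[start:start + frame_len] += np.fft.irfft(spectrum * mask, n=frame_len)
    return out
```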
The movement of sound in space is also an important part of the sound processing: the sound of the contrabass clarinet is placed in space as if it were a point on
a Cartesian plane whose axes are left-right and front-rear. The
right-left dimension is managed through a linear mapping
between gesture and result, while the front-rear control is
built around a so-called "rubber band" algorithm. The
gesture sends the sound away towards the rear position; the sound
comes back slowly to a rest position unless new
incoming sounds create a new tension, sending it again far
away.
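A hypothetical control-rate implementation of such a rubber-band behaviour could look like the following Python sketch; the relaxation rate and excitation values are invented for the example and do not reflect the actual patch.

```python
def rubber_band_step(position, rest=0.0, relaxation=0.02, excitation=None):
    """One control-rate step of a hypothetical 'rubber band' spatialiser.

    An excitation (a new incoming sound) throws the front-rear position
    towards the rear; between excitations the position relaxes back towards
    the `rest` position at a rate set by `relaxation`.
    """
    if excitation is not None:
        position = excitation                 # the gesture sends the sound away
    return position + relaxation * (rest - position)

pos = 0.0
pos = rubber_band_step(pos, excitation=1.0)   # a sound pushes it to the rear
for _ in range(4):                            # then it slowly comes back
    pos = rubber_band_step(pos)
    print(round(pos, 3))
```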
Ogni Emozione dell'Aria thus succeeds in transforming sonification into genuine musical processes (a complete video of the performance at the Sound and Music
Computing conference in Padova can be found in Ambrosini
(2011)).
2.4 Suono Sommerso, by Roberto Fabbriciani
The genesis of this work began when the noted Italian flutist
Roberto Fabbriciani set out to explore the expressive possibilities of the hyperbass flute. This instrument was
invented by Fabbriciani in the eighties following suggestions
by composer Luigi Nono. The peculiar property of the instrument is its ability to play very low frequencies, around
20-30 Hz. It is a very large instrument made of plastic pipes,
and it can be tuned to just one note at a time. That is the
main way it has been scored for in large orchestral works,
where it was used as a sort of pedal note or choir (cf. for
example La Pietra di Diaspro by Adriano Guarnieri). The
player only needs to hold the instrument with his hands;
all the sound he produces comes from the air of his
lungs and can hardly be modulated rapidly. Roberto Fabbriciani wanted to explore the possibilities of using such
an instrument in a solo role, expressive and intense. In order to do so, a motion capture system was
used to detect the positions of the hands, which control
live processing that adds several layers of spectral
expansion, distortion and pitch transposition. Other specific gestures are used to move sound in space through a
spatialisation system. Fabbriciani et al. (2011) is a short
excerpt of this work which illustrates these concepts. In
this case, data sonification represents a true instrumental
extension which augments the capabilities of a specific instrument, thus making it suitable for solo performances and
recitals. The hand movements were associated with
pitch, timbre and spatialisation controls: the right hand
with pitch and timbre, the left hand with spatialisation.
The mappings were the following:
• Right hand, X axis (low-high pitch): sound transposition over a two-octave range; the played note can be transposed one octave up (hand moved to the right) or one octave down (hand moved to the left).
• Right hand, Y axis (low-high): timbral brightness; the played note is unchanged when the hand is low and becomes brighter as the hand is raised.
• Right hand, Z axis (rear-front): sound inharmonicity; the played note is unchanged when the hand is behind and becomes more inharmonic as the hand moves forward (towards the audience).
• Left hand, X axis (right-left): left-right spatialisation (from the point of view of the listener).
• Left hand, Z axis (rear-front): front-rear spatialisation.
• Right foot: a pedal that activates a bank of delay lines which extend and multiply the sounds; the bank is made of five feedback delays with delay times of 3, 3.8, 4.7, 6.3 and 7 seconds.
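To give an idea of the kind of processing the pedal activates, the following offline Python/NumPy sketch implements a bank of five parallel feedback delay lines with the delay times listed above. The feedback gain (and everything else about the implementation) is our own assumption, since only the delay times are documented.

```python
import numpy as np

# Delay times (in seconds) of the five delay lines activated by the pedal.
DELAY_TIMES = [3.0, 3.8, 4.7, 6.3, 7.0]

def delay_bank(dry, sr, times=DELAY_TIMES, feedback=0.5):
    """Offline sketch of a bank of parallel feedback delay lines.

    Each line feeds its own delayed output back into itself; the echoes of
    all five lines are summed with the dry input, extending and multiplying
    the incoming sounds.
    """
    out = dry.copy()
    for t in times:
        d = int(round(t * sr))                 # delay length in samples
        y = np.zeros_like(dry)
        for n in range(len(dry)):
            if n >= d:
                y[n] = dry[n - d] + feedback * y[n - d]   # echo + feedback
        out += y                               # add this line's echoes
    return out

# A short percussive test sound through the bank (low sample rate for speed).
sr = 8000
dry = np.zeros(sr * 10, dtype=float)
dry[:200] = 1.0
wet = delay_bank(dry, sr)
```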
3. DISCUSSION
When it comes to performance, it should be noted that
"true" sonification of instrumental gesture is already a well–established technique used in several applications,
ranging from physiotherapy (cf. Rosati et al. (2012)) to instrumental pedagogy (cf. Ng and Nesi (2008); Bradshaw and
Ng (2009)). However, when sonification is used in performance its usage boils down to three fundamental schemes:
1. the sonification of non–instrumental gestures, which
augments the actual playing
2. the sonification of extra–instrumental gestures, added
by the composer to enhance the polyphony of the piece
3. the direct sonification of specific instrumental gestures
The hyperbass flute improvisations by Roberto Fabbriciani (cf. Sec. 2.4) clearly fall into case n. 1: the hyperbass flute
is an instrument that needs only the mouth to be played,
while the hands remain free from (direct) performance duties. Fabbriciani can then use his hands (tracked by motion
capture) to control the overall live–electronics processing of
the sound. The sonification establishes here a strong visual
(and causal) connection to the resulting sound, which is a far
better option, both for the performer and the public, than a
separate live–electronics performer idly sitting at a console
moving faders and pushing buttons.
Ogni Emozione dell'Aria by Claudio Ambrosini belongs
instead to category n. 2. The form of the piece is divided
into sections, and the instrumental writing is designed to let the performer alternate which hand is actually playing the keys of the instrument, while the other is
kept free to add a further contrapuntal voice to the performance. In the last section the performer does not need
his hands on the instrument at all, thus adding two more
voices to the music. Of course, in this case sonification enables the composer to add a metaphorical and
dramaturgical layer through these gestures; in the case of
Ogni Emozione dell'Aria, the sonified gestures build up to
represent the wings of a flying bird, while continuing to
serve musically through the sonified capture of the wrists'
movements.
The third case is more common in the music literature: it
can be found, for example, in pieces by Adriano Guarnieri
or in the trombone improvisations by Giancarlo Schiaffini. This latter instrument actually provides a good case
in point for case n. 3, because the gestural component of its
playing (i.e. the movement of the slide) is
particularly well suited for tracking and subsequent processing.
3.1 Overall remarks
The two works analysed in this paper make
use of sonification practices and tools, but their purposes
are strictly musical: they would not be used as a scientific
display of any sort, simply because they do not fit any particular scientific criterion in their construction. We could
probably repeat the experiment with sonification displays
which might have a "musical bent", but it is foreseeable that
the end result would not change (though opposed in sign),
because the purposes of music and sonification are substantially different.
It is interesting to clarify, then, what it takes to obtain a
successful result when combining sonification and music: the data need to have some "musical" qualities which
lend themselves to an easy mapping into a musical
work; while just about anything today can be transformed
into a sound event, the "archetypal" characteristics of music
(imitation, motivic development, counterpoint, etc.) and
their proper "musical timing" are actually the critical aspects that make it "palatable as music" of some sort. If the
mapping is not straightforward enough, it loses the possibility of being fed back into the sonification function.
4. CONCLUSIONS
This paper intends to be a contribution to the controversial
debate regarding the boundaries of two specific disciplines,
namely sonification and music composition. Our intention
was not to give a definitive answer to whether or not these
two disciplines actually have anything in common, but
rather to try to enumerate the conditions under which such
a communion can take place, replacing a naïve generalisation
with some sort of preliminary elaboration and observation.
References
Ambrosini, C. 2011. "Ogni Emozione dell'Aria." http://youtu.be/-1Ml9rPnbiY, retrieved May 24, 2013.
Bradshaw, D., and K. Ng. 2009. “Motion Capture, Analysis
and Feedback to Support Learning Conducting.” In Proceedings of the International Computer Music Conference
(ICMC). Montreal, Quebec, Canada.
Fabbriciani, R., A. Vidolin, and A. de Götzen. 2011. "Hyperbass flute improvisations." http://www.youtube.com/watch?v=Bty8KVf0Js8, retrieved May 24, 2013.
Ng, K., and P. Nesi. 2008. “i-Maestro Framework and
Interactive Multimedia Tools for Technology-Enhanced
Learning and Teaching for Music.” In Proceedings of
the International Conference on Automated Solutions
for Cross Media Content and Multi-channel Distribution
(AXMEDIS 2008). Florence, Italy: IEEE Computer Society Press, pp. 266–269.
Rosati, G., S. Spagnol, F. Avanzini, F. Oscari, and
S. Masiero. 2012. "Effect of task-related continuous auditory feedback during learning of tracking motion exercises." Journal of NeuroEngineering and Rehabilitation
9:79.
Walker, B. N., and M. A. Nees. 2011. “Theory of Sonification.” In T. Hermann, A. Hunt, and J. G. Neuhoff,
(editors) The Sonification Handbook. Berlin: Logos Verlag.
Wallin, N., B. Merker, and S. Brown (editors). 1999. The
Origins of Music. Bradford Books.