Content-Based Organization and Visualization of Music Archives∗
Conference Paper · January 2002 · DOI: 10.1145/641007.641121
Elias Pampalk
Austrian Research Institute for Artificial Intelligence (OeFAI)
Schottengasse 3, A-1010 Vienna, Austria
elias@oefai.at

Andreas Rauber, Dieter Merkl
Department of Software Technology, Vienna University of Technology
Favoritenstr. 9-11/188, A-1040 Vienna, Austria
{andi, dieter}@ifs.tuwien.ac.at
ABSTRACT
With Islands of Music we present a system which facilitates exploration of music libraries without requiring manual genre classification. Given pieces of music in raw audio format, we estimate their perceived sound similarities based on psychoacoustic models. Subsequently, the pieces are organized on a 2-dimensional map so that similar pieces are located close to each other. A visualization using a metaphor of geographic maps provides an intuitive interface where islands represent genres or styles of music. We demonstrate the approach using a collection of 359 pieces of music.

Keywords
Content-based Music Retrieval, Feature Extraction, Clustering, Self-Organizing Map, User Interface, Genre, Rhythm

∗Part of this work was done while the author was an ERCIM Research Fellow at IEI, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy.

1. INTRODUCTION
Large music archives, such as those of online music retailers, usually offer several ways to find a desired piece of music. A straightforward approach is to use text-based queries to search for the artist, the title or some phrase in the lyrics. Although such queries are very efficient, they do not offer any particular support for queries based on the perceived similarities of music. For example, a simple text query asking for pieces with characteristics similar to Für Elise by Beethoven would return pieces with either the same title or the same artist. Thus, pieces like Fremde Länder und Menschen by Schumann would be ignored.

The common solution is to organize music collections by a hierarchical structure of predefined genres and styles such as Classical, Jazz, and Rock. Hence, a customer seeking something similar to Für Elise can limit the search to all pieces in the same category. However, such organizations rely on manual categorizations and usually consist of several hundred categories and sub-categories, which involve high maintenance costs, in particular for dynamic collections. The difficulties of such taxonomies have been analyzed, for example, in [19].

Another approach, taken by online music stores, is to analyze the behavior of customers to give those showing similar interests recommendations on music which they might appreciate. For example, a simple approach is to give a customer looking for pieces similar to Für Elise recommendations on music which is usually bought by people who also purchased Für Elise. However, extensive and detailed customer profiles are rarely available.

The Islands of Music system we propose facilitates exploration of music archives without relying on further information such as customer profiles or predefined categories. Instead, we estimate the perceived sound similarities between two pieces of music and organize them in such a way that similar pieces of music are close to each other on a 2-dimensional map display. We visualize this organization using a metaphor of geographic maps where islands represent musical genres or styles and the arrangement of the islands reflects the inherent structure of the music collection.

The main challenge is to estimate the perceived similarity of two pieces of music. To achieve this, we use audio data as it is available from CD or decoded MP3 files. The raw audio signals are preprocessed in order to obtain a time-invariant representation of the perceived characteristics following psychoacoustic models. In particular, we extract features which characterize dynamic properties of the music, namely rhythm patterns.

To cluster and organize the pieces on a 2-dimensional map display we use the Self-Organizing Map [12], a prominent unsupervised neural network. This results in a map where similar pieces of music are grouped together. In addition, we visualize clusters using Smoothed Data Histograms [21] to simplify the identification of interesting regions on the map and to obtain the island visualization. We demonstrate the user interface using a collection of 359 popular pieces of music covering a wide spectrum of musical taste.
The remainder of this paper is organized as follows. Section 2 briefly reviews related work. The novel feature extraction process is presented in Section 3, followed by the organization and visualization of the music archives, which is presented in Section 4. We give a brief discussion of the user interface in Section 5 and present experiments in Section 6. Finally, in Section 7 some conclusions are drawn.
2. RELATED WORK
A vast amount of research has been conducted in the field of content-based music and audio retrieval. For example, methods have been developed to search for pieces of music with a particular melody. The queries can be formulated by humming and are usually transformed into a symbolic melody representation, which is matched against a database of scores usually given in MIDI format. Research in this direction is reported in, e.g., [1, 2, 10, 14, 26]. Besides melodic information, it is also possible to extract and search for style information using the MIDI format. For example, in [5] solo improvised trumpet performances are classified into one of four styles: lyrical, frantic, syncopated, or pointillistic.

The MIDI format offers a wealth of possibilities; however, only a small fraction of all electronically available pieces of music are available as MIDI. A more readily available format is the raw audio signal, to which all other audio formats can be decoded. One of the first audio retrieval approaches dealing with music was presented in [33], where attributes such as the pitch, loudness, brightness and bandwidth of speech and individual musical notes were analyzed. Several overviews of systems based on raw audio data have been presented, e.g. [9, 17]. However, most of these systems do not treat content-based music retrieval in detail, but mainly focus on speech or partly-speech audio data.

Furthermore, only a few approaches in the area of content-based music analysis have utilized the framework of psychoacoustics. Psychoacoustics deals with the relationship between physical sounds and the human brain's interpretation of them, cf. [34]. One of the first exceptions is [8], where psychoacoustic models are used to describe the similarity of instrumental sounds. The approach is demonstrated using a collection of about 100 instruments, which are organized using a Self-Organizing Map (SOM) in a similar way as presented in this paper. For each instrument a short sound (300 milliseconds) is analyzed and steady state sounds with a duration of 6 milliseconds are extracted. These steady state sounds are interpreted as the smallest possible building blocks of music. The dynamic properties of a sound are described through the sequence of building blocks. Although this approach yields promising results, its application to pieces of music with a length of several minutes is not straightforward.

A model of the human perceptual behavior of music using psychoacoustic findings was presented in [28] together with methods to compute the similarity of two pieces of music. A more practical approach to the topic was presented in [31], where music given as raw audio is automatically classified into genres based on musical surface and rhythm features. The rhythm features are similar to the rhythm patterns we extract, with the main difference that we analyze the rhythm in 20 frequency bands separately.

Our work is part of the SOM-enhanced Jukebox project (SOMeJB) and has its origin in [23], where a media player is used to decompose the acoustic waves into frequency bands. Subsequently, the activity in some of the bands is analyzed using a Fourier transformation. The resulting complex coefficients are used as feature vectors to train a SOM. In this paper we present a redesigned feature extraction process based on psychoacoustic models. Furthermore, we developed enhanced methods to interpret the trained SOMs in terms of the underlying structure and its musical meaning.

[Figure 1: Overview of the feature extraction process. A piece of music (e.g. an MP3 file) is decoded to PCM and cut into 6-second sequences, which then pass through (1) power spectrum, (2) critical-band rate scale (Bark), (3) spectral masking, (4) decibel (dB-SPL), (5) equal-loudness levels (Phon), (6) specific loudness sensation (Sone), (7) modulation amplitude, (8) fluctuation strength, (9) rhythm pattern, and (10) typical rhythm pattern.]

3. FEATURE EXTRACTION
Digitized music in good sound quality (44kHz, stereo) with a duration of one minute is represented by approximately 10MB of data in its raw format. These ones and zeros describe the physical properties of the acoustical waves we hear. From this huge amount of numbers we extract features enabling us to calculate the similarities of two pieces of music. Selecting the features to extract, and how to extract them, is the most critical decision in the process of creating a content-based organization of a music archive. We present features which are robust towards non-perceptive variations and, on the other hand, reflect characteristics which are critical to our hearing sensation, namely rhythm patterns in various frequency bands.

The process of extracting the patterns consists of 10 transformation steps and is divided into two main stages. In the first stage, the loudness sensation per frequency band in short time intervals is calculated from the raw music data. In the second stage, the loudness modulation in each frequency band over a time period of 6 seconds is analyzed with respect to recurring beats. Figure 1 gives an overview of the process. The various feature extraction steps are presented in more detail in the following subsections.
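The figure of roughly 10MB per minute can be checked with one line of arithmetic. This is a minimal sketch that assumes CD-style 16-bit samples at exactly 44,100Hz (the text only states 44kHz, stereo); the helper name is hypothetical.

```python
def raw_pcm_bytes(seconds, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Size in bytes of uncompressed PCM audio (assumes 16-bit samples by default)."""
    return seconds * sample_rate * channels * bytes_per_sample

one_minute = raw_pcm_bytes(60)
print(one_minute)                      # 10584000 bytes
print(f"{one_minute / 2**20:.1f} MiB")  # 10.1 MiB
```

Under these assumptions one minute of stereo audio occupies about 10.6 million bytes, in line with the approximate 10MB stated above.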
3.1 Raw Audio Data
The pieces of music we use are given as MP3 files, which we decode to the raw Pulse Code Modulation (PCM) audio format. As mentioned before, the raw audio format of music in good quality requires huge amounts of storage. However, humans can easily identify the genre of a piece of music even if its sound quality is rather poor. Thus, for our experiments we reduced the quality, and as a consequence the amount of data, to a level which is computationally feasible while ensuring that human listeners are still easily capable of identifying the genre of a piece. In particular, we reduced stereo sound to mono and down-sampled the music from 44kHz to 11kHz. Furthermore, we divided each piece into 6-second sequences and selected only every third of these after removing the first two and last two sequences to avoid lead-in and fade-out effects. The duration of 6 seconds (2^16 samples) was chosen because it is long enough for human listeners to get an impression of the style of a piece of music while being short enough to keep the computations efficient. All in all, we reduced the amount of data by a factor of over 24 without losing relevant information, i.e. a human listener is still able to identify the genre or style of a piece of music given the few 6-second sequences in lower quality.

[Figure 2: Loudness in dB-SPL (0-120) over frequency in kHz (0.1-10, logarithmic). The equal loudness contours for 3, 20, 40, 60, 80, and 100 Phon are represented by the dashed lines. The respective Sone values are 0, 0.15, 1, 4, 16, and 64 Sone. The dotted vertical lines mark the positions of the center frequencies of the 24 critical-bands. The dip around 2kHz to 5kHz corresponds to the frequency range we are most sensitive to.]
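The sequence selection and the resulting data reduction can be checked with a short sketch. It assumes a sampling rate of exactly 11,025Hz after down-sampling (the text says 11kHz) and reads "every third" as keeping the first, fourth, seventh, and so on, of the remaining sequences; that indexing convention, the example piece length, and the helper name are all assumptions.

```python
SEQ_LEN = 2 ** 16  # samples per sequence; about 5.94 s at 11,025 Hz

def select_sequences(num_samples, seq_len=SEQ_LEN, skip=2, step=3):
    """Return (start, end) sample ranges of the 6-second sequences that are kept.

    The signal is cut into non-overlapping sequences of seq_len samples, the first
    and last `skip` sequences are dropped (lead-in and fade-out), and every
    `step`-th of the remaining sequences is kept, starting with the first one.
    """
    n_seq = num_samples // seq_len
    middle = list(range(skip, max(skip, n_seq - skip)))
    return [(i * seq_len, (i + 1) * seq_len) for i in middle[::step]]

# A hypothetical 4-minute piece, down-sampled to mono at 11,025 Hz.
rate = 11_025
num_samples = 4 * 60 * rate
kept = select_sequences(num_samples)
print(f"{len(kept)} sequences of {SEQ_LEN / rate:.2f} s each")  # 12 sequences of 5.94 s each

# Reduction relative to the original stereo signal at 44,100 Hz.
original = 4 * 60 * 44_100 * 2    # stereo samples
reduced = len(kept) * SEQ_LEN     # mono samples actually analyzed
print(f"reduction factor: {original / reduced:.1f}")  # reduction factor: 26.9
```

Mono (factor 2), down-sampling (factor 4) and keeping every third sequence (factor 3) alone give 24; dropping the lead-in and fade-out sequences pushes the factor above 24, here to about 26.9.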
3.2 Specific Loudness Sensation
In the first stage of the feature extraction process, the specific loudness sensation (Sone) per critical-band (Bark) is calculated in 6 steps starting with the PCM data. (1) First the power spectrum of the audio signal is calculated using a Fast Fourier Transformation (FFT). We use a window size of 256 samples, which corresponds to about 23ms at 11kHz, and a Hanning window with 50% overlap. (2) The frequencies are bundled into 20 critical-bands according to the Bark scale [34]. These frequency bands reflect characteristics of the human auditory system, in particular of the cochlea in the inner ear. Below 500Hz the critical-bands are about 100Hz wide. Above 500Hz the width increases rapidly with the frequency. The 24th critical-band has a width of 3500Hz and is centered at 13500Hz. (3) Spectral masking effects are calculated based on [29]. Spectral masking is the occlusion of a quiet sound by a louder sound when both sounds are present simultaneously and have similar frequencies. (4) The loudness is first calculated in decibel relative to the threshold of hearing, also known as dB-SPL, where SPL is the abbreviation for sound pressure level. (5) From the dB-SPL values we calculate the equal loudness levels with their unit Phon. The Phon levels are defined through the loudness in dB-SPL of a tone with 1kHz frequency. A level of 40 Phon corresponds to the loudness level of a 40dB-SPL tone at 1kHz. The loudness level of an acoustical signal with a specific dB-SPL value depends on the frequency of the signal. For example, a tone with 65dB-SPL at 50Hz has about 40 Phon [34]. (6) Finally the loudness is calculated in Sone based on [4]. The loudness of the 1kHz tone at 40dB-SPL is defined to be 1 Sone. A tone twice as loud is defined to be 2 Sone, and so on. Figure 2 summarizes the main characteristics of the psychoacoustic model used to calculate the specific loudness sensation.

After the first preprocessing stage a piece of music is represented by several 6-second sequences. Each of these sequences contains information on how loud the piece is at a specific point in time in a specific frequency band.

[Figure 3: The relationship between the modulation frequency (0-24Hz) and the weighting factors of the fluctuation strength (relative fluctuation strength, 0-1).]

[Figure 4: The data before and after the first feature extraction stage: the PCM audio signal (amplitude over time, 0-6s) and the specific loudness sensation in Sone (Bark over time, 0-6s). The top row represents the transformation of a 6-second sequence from Beethoven, Für Elise and the bottom row a 6-second sequence from Korn, Freak on a Leash.]

3.3 Rhythm Patterns
In the second stage of the feature extraction process, we calculate a time-invariant representation for each piece, namely the rhythm pattern, in 3 further steps. The rhythm pattern contains information on how strong and fast beats are played within the respective frequency bands.

(7) First the amplitude modulation of the loudness sensation per critical-band for each 6-second sequence is calculated using an FFT. (8) The amplitude modulation coefficients are weighted based on the psychoacoustic model of the fluctuation strength [7]. The amplitude modulation of the loudness has different effects on our hearing sensation depending on the modulation frequency. The sensation of fluctuation strength is most intense around 4Hz and gradually decreases up to a modulation frequency of 15Hz (cf. Figure 3). In our experiments we investigate the rhythm patterns up to 600 beats per minute (bpm), which is equivalent to a modulation frequency of 10Hz.

For each of the 20 frequency bands we obtain 60 values for modulation frequencies between 0 and 10Hz. This results in 1200 values representing the fluctuation strength. (9) To distinguish certain rhythm patterns better and to reduce irrelevant information, gradient and Gaussian filters are applied. In particular, we use gradient filters to emphasize distinctive beats, which are characterized by a relatively high fluctuation strength at a specific modulation frequency compared to the values immediately below and above this frequency. We apply a Gaussian filter to increase the similarity between two characteristics in a rhythm pattern which differ only slightly, in the sense of either being in similar frequency bands or having similar modulation frequencies.
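Steps (2) and (6) of the first stage described in Section 3.2 are compact enough to sketch directly. The critical-band edges below are Zwicker's standard values, which agree with the figures quoted there (about 100Hz wide below 500Hz; the 24th band spans 12000Hz to 15500Hz, i.e. 3500Hz). The Phon-to-Sone rule (2^((Phon-40)/10) from 40 Phon upwards, (Phon/40)^2.642 below) is an assumption of this sketch, a common approximation that closely reproduces the Sone values in the Figure 2 caption (about 0.16 Sone at 20 Phon versus the 0.15 quoted). Spectral masking (3) and the equal-loudness contours needed for step (5) are not modeled, and the function names are hypothetical.

```python
# Zwicker's critical-band edges in Hz; band b (1-based) spans BARK_EDGES[b-1]..BARK_EDGES[b].
BARK_EDGES = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
              2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500]

def bark_band_powers(power_spectrum, sample_rate=11_025, n_bands=20):
    """Step (2): sum the FFT power bins of one frame into critical bands.

    power_spectrum holds the n_fft // 2 + 1 bins of an n_fft-point FFT
    (129 bins for the 256-sample window). Bins above the last band edge are ignored.
    """
    n_fft = 2 * (len(power_spectrum) - 1)
    bands = [0.0] * n_bands
    for k, power in enumerate(power_spectrum):
        freq = k * sample_rate / n_fft
        for b in range(n_bands):
            if BARK_EDGES[b] <= freq < BARK_EDGES[b + 1]:
                bands[b] += power
                break
    return bands

def phon_to_sone(phon):
    """Step (6): specific loudness in Sone from loudness level in Phon (assumed rule)."""
    if phon <= 0:
        return 0.0  # treat levels at or below 0 Phon as silence
    if phon >= 40:
        return 2.0 ** ((phon - 40) / 10)
    return (phon / 40) ** 2.642

print([round(phon_to_sone(p), 2) for p in (3, 20, 40, 60, 80, 100)])
# [0.0, 0.16, 1.0, 4.0, 16.0, 64.0]
```

At 11,025Hz the 256-point FFT reaches only about 5.5kHz, which falls inside the 20th band (5300Hz to 6400Hz); this is why 20 rather than 24 critical-bands suffice here.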
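Steps (7) and (8), together with the per-piece median of step (10), can be sketched as follows. The sketch assumes 512 loudness frames per 6-second sequence (2^16 samples with the 128-sample hop implied by 50% overlap), so one modulation-frequency bin is about 0.168Hz and bins 1 to 60 span roughly 0.17Hz to 10.1Hz, matching the 60 values per band stated in Section 3.3. The weighting 1/(f/4 + 4/f), rescaled to 1 at 4Hz, is an assumed closed-form stand-in for the fluctuation-strength curve of Figure 3, not necessarily the exact model of [7]. The filters of step (9) are omitted, a plain DFT replaces the FFT to keep the example dependency-free, and all names are hypothetical.

```python
import cmath
import math
import statistics

FRAME_RATE = 11_025 / 128        # assumed: 256-sample frames, 50% overlap -> 128-sample hop
N_FRAMES = 2 ** 16 // 128        # 512 loudness frames per 6-second sequence
MOD_RES = FRAME_RATE / N_FRAMES  # about 0.168 Hz per modulation-frequency bin
N_MOD = 60                       # bins k = 1..60 cover roughly 0.17 Hz to 10.1 Hz

def fluctuation_weight(f_mod):
    """Assumed relative fluctuation strength: 1.0 at 4 Hz, smaller above and below."""
    return 2.0 / (f_mod / 4.0 + 4.0 / f_mod)

def rhythm_pattern(sone_bands):
    """Steps (7) and (8) for one sequence.

    sone_bands: one loudness curve (N_FRAMES values, in Sone) per critical band.
    Returns one row of N_MOD weighted modulation amplitudes per band.
    """
    pattern = []
    for curve in sone_bands:
        row = []
        for k in range(1, N_MOD + 1):
            # Step (7): magnitude of DFT coefficient k of the loudness curve.
            coef = sum(x * cmath.exp(-2j * math.pi * k * n / N_FRAMES)
                       for n, x in enumerate(curve))
            # Step (8): weight by the fluctuation strength at this modulation frequency.
            row.append(abs(coef) * fluctuation_weight(k * MOD_RES))
        pattern.append(row)
    return pattern

def median_pattern(patterns):
    """Step (10): element-wise median over the rhythm patterns of all sequences."""
    return [[statistics.median(values) for values in zip(*rows)]
            for rows in zip(*patterns)]

# A synthetic loudness curve modulated at bin 24 (about 4.04 Hz, roughly 242 bpm).
beat = [1.0 + math.cos(2 * math.pi * 24 * n / N_FRAMES) for n in range(N_FRAMES)]
row = rhythm_pattern([beat])[0]
peak = max(range(N_MOD), key=row.__getitem__)
print(f"peak at {(peak + 1) * MOD_RES:.2f} Hz")  # peak at 4.04 Hz
```

A regular beat in one band shows up as a single strong column entry in that band's row; repeated across bands it becomes the kind of vertical line discussed for Freak on a Leash in Section 3.4.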
Finally, to obtain a single representation for each piece of music based on the rhythm patterns of its sequences, (10) the median of the rhythm patterns of the corresponding sequences is calculated. We have evaluated several alternatives using Gaussian mixture models, fuzzy c-means, and k-means, pursuing the assumption that a piece of music contains significantly different rhythm patterns (see [20] for details). However, the median, despite being by far the simplest technique, yielded results comparable to those of the more complex methods. Other simple alternatives such as the mean proved to be too sensitive to outliers.

At the end of the feature extraction process each piece of music is represented by a 20×60 matrix. In our experiments with 359 pieces we further reduced the dimensionality from 1200 to 80 using Principal Component Analysis without losing much of the variance in the data [20].

[Figure 5: The rhythm patterns of (a) Beethoven, Für Elise and (b) Korn, Freak on a Leash and their medians. The vertical axis represents the critical-bands from Bark 1-20, the horizontal axis the modulation frequencies from 0-10Hz, where Bark 1 and 0Hz are located in the lower left corner.]

3.4 Illustrations
Figure 4 illustrates the data before and after the first feature extraction stage using the first 6-second sequences extracted from Beethoven, Für Elise and from Korn, Freak on a Leash. The sequence of Für Elise contains the main theme starting shortly before the 2nd second. The specific loudness sensation depicts each piano key played, and the rhythm pattern has very low values with no distinctive vertical lines. This reflects that there are no strong beats recurring at exactly the same intervals. On the other hand, Freak on a Leash, which is classified as Heavy Metal/Death Metal, is quite aggressive. Melodic elements do not play a major role and the specific loudness sensation is a rather complex pattern.

The rhythm patterns of all 6-second sequences extracted from Für Elise and from Freak on a Leash as well as their medians are depicted in Figure 5. The first subplots correspond to the sequences depicted in Figure 4.

Generally, the different patterns within a piece of music have common properties. While Für Elise is characterized by a rather horizontal shape with low values, Freak on a Leash has a characteristic vertical line around 7Hz that reflects strong recurring rhythmic elements. It is also interesting to note that the values of the patterns of Freak on a Leash are up to 18 times higher than those of Für Elise.

To capture these common characteristics within a piece of music the median is a suitable approach. The median of Für Elise indicates that there are common but weak activities in the range of 3-10 Bark with a modulation frequency of up to 5Hz. The single sequences of Für Elise have many more details; for example, the first sequence has a minor peak around 5 Bark and 5Hz modulation frequency. That the median cannot represent all details becomes more apparent when analyzing Freak on a Leash. However, the main characteristics, namely the vertical line at 7Hz as well as the generic activity in the frequency bands, are preserved.

Further examples are depicted in Figure 6. The typical rhythm pattern of Robbie Williams, Rock DJ has a strong bass, which is represented by the white spot around Bark 1-2 at a modulation frequency of a little less than 2Hz (2Hz corresponds to 120bpm). The maximum values are about twice as high as those of Freak on a Leash because the beat plays a far more dominant role in this dance club song. The beats of Bomfunk MC's, In Stereo, which combines the styles of Hip Hop, Electro and House, are just as strong. However, the beats are also a lot faster: 5Hz (300bpm). The final example depicts the median of the rhythm patterns of the song Yesterday by The Beatles. There are no strong recurring beats. The activation in the rhythm pattern is similar to the one of Für Elise, except that the values are generally higher and that there are also activations in higher frequency bands.

[Figure 6: The median of the rhythm patterns of Robbie Williams, Rock DJ, Bomfunk MC's, In Stereo, and The Beatles, Yesterday. The axes represent the same scales as in Figure 5.]

4. ORGANIZATION AND VISUALIZATION
We use the typical rhythm patterns as input to the Self-Organizing Map (SOM) [12] algorithm to organize the pieces of music on a 2-dimensional map display in such a way that similar pieces are grouped close together. We then visualize the clusters with a metaphor of geographic maps to create a user interface where islands represent musical genres or styles and the way the islands are automatically arranged on the map represents the inherent structure of the music archive.

4.1 Self-Organizing Maps
The SOM is a powerful tool for explorative data analysis, and in particular for visualizing clusters in high-dimensional data. Methods with similar abilities include Principal Component Analysis [11], Multi-Dimensional Scaling [15], Sammon's mapping [27], and the Generative Topographic Mapping [3]. One of the main advantages of the SOM with regard to our application is that new pieces of music which are added to the archive can easily be placed on the map according to the existing organization. Furthermore, the SOM is a very efficient algorithm which has proven capable of handling huge amounts of data. It has a strong tradition in the organization of large text archives [13, 24, 18], which makes it an interesting choice for large music archives.

The SOM usually consists of units which are ordered on a rectangular 2-dimensional grid. A model vector in the high-dimensional data space is assigned to each of the units. During the training process the model vectors are fitted to the data in such a way that the distances between the data items and the corresponding closest model vectors are minimized, under the constraint that model vectors which belong to units close to each other on the 2-dimensional grid are also close to each other in the data space.

For our experiments we use the batch-SOM algorithm. The algorithm consists of two steps that are iteratively repeated until no more significant changes occur. First, the distances between all data items $\{x_i\}$ and the model vectors $\{m_j\}$ are computed and each data item $x_i$ is assigned to the unit $c_i$ that represents it best.

In the second step each model vector is adapted to better fit the data it represents. To ensure that each unit $j$ represents similar data items as its neighbors, the model vector $m_j$ is adapted not only according to the assigned data items but also in regard to those assigned to the units in the neighborhood. The neighborhood relationship between two units $j$ and $k$ is usually defined by a Gaussian-like function $h_{jk} = \exp(-d_{jk}^2 / r_t^2)$, where $d_{jk}$ denotes the distance between the units $j$ and $k$ on the map, and $r_t$ denotes the neighborhood radius, which is set to decrease with each iteration $t$.

Assuming a Euclidean vector space, the two steps of the batch-SOM algorithm can be formulated as

$$c_i = \arg\min_j \lVert x_i - m_j \rVert, \qquad \text{and} \qquad m_j^* = \frac{\sum_i h_{j c_i}\, x_i}{\sum_{i'} h_{j c_{i'}}},$$

where $m_j^*$ is the updated model vector.

Several variants of the SOM algorithm exist. A particularly interesting variant regarding the organization of large music archives is the adaptive GHSOM [6], which provides a hierarchical organization and representation of the data. Experiments using the GHSOM to organize a music archive are presented in [25].

4.2 Smoothed Data Histograms
Several methods to visualize clusters based on the SOM can be found in the literature. The most prominent method visualizes the distances between the model vectors of units which are immediate neighbors and is known as the U-matrix [32]. We use Smoothed Data Histograms (SDH) [21], where each data item votes for the map units which represent it best based on some function of the distance to the respective model vectors. All votes are accumulated for each map unit and the resulting distribution is visualized on the map. As voting function we use a robust ranking where the map unit closest to a data item gets n points, the second n-1, the third n-2 and so forth, for the n closest map units. All other map units are assigned 0 points. The parameter n can be adjusted interactively by the user. This visualization technique is essentially a density estimation; thus the results resemble the probability density of the whole data set on the 2-dimensional map (i.e. the latent space). The main advantage of this technique is that it is computationally no heavier than one iteration of the batch-SOM algorithm.

To create a metaphor of geographic maps, namely Islands of Music, we visualize the density using a specific color code that ranges from dark blue (deep sea) to light blue (shallow water) to yellow (beach) to dark green (forest) to light green (hills) to gray (rocks) and finally white (snow). Results of these color codings can be found in [20]. In this paper we use gray-shaded contour plots where dark gray represents deep sea, followed by shallow water, flat land, hills, and finally mountains represented by white.

4.3 Illustrations
Figure 7 illustrates characteristics of the SOM and the cluster visualization using a synthetic 2-dimensional data set.
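In the same spirit as Figure 7, a minimal pure-Python sketch of the two batch-SOM steps and of the rank-based SDH voting on a synthetic 2-dimensional data set is given below. The initialization from random data items, the geometric shrinking of the radius r_t, the fixed number of iterations (instead of a convergence test), the synthetic data and all function names are assumptions made for illustration; the SDH returns raw vote counts without the normalization and interpolation used for plotting.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def best_unit(x, models):
    """Step 1: index c_i of the model vector closest to data item x."""
    return min(range(len(models)), key=lambda j: sq_dist(x, models[j]))

def train_batch_som(data, rows, cols, iterations=20, r_start=3.0, r_end=0.5, seed=0):
    """Batch-SOM on a rows x cols rectangular grid; returns (grid positions, model vectors)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    models = [list(rng.choice(data)) for _ in grid]  # assumed: initialize from random items
    dim = len(data[0])
    for t in range(iterations):
        # Assumed schedule: r_t shrinks geometrically from r_start to r_end.
        r_t = r_start * (r_end / r_start) ** (t / max(1, iterations - 1))
        bmus = [best_unit(x, models) for x in data]
        new_models = []
        for j, (rj, cj) in enumerate(grid):
            num, den = [0.0] * dim, 0.0
            for x, c in zip(data, bmus):
                rc, cc = grid[c]
                # Neighborhood h_{j c_i} = exp(-d^2 / r_t^2), d measured on the map grid.
                h = math.exp(-((rj - rc) ** 2 + (cj - cc) ** 2) / r_t ** 2)
                den += h
                for k in range(dim):
                    num[k] += h * x[k]
            # Step 2: m_j* = sum_i h_{j c_i} x_i / sum_i' h_{j c_i'} (keep m_j if all weights underflow).
            new_models.append([v / den for v in num] if den > 0 else models[j])
        models = new_models
    return grid, models

def smoothed_data_histogram(data, models, n=3):
    """SDH votes: the n closest units of each item receive n, n-1, ..., 1 points."""
    votes = [0] * len(models)
    for x in data:
        ranked = sorted(range(len(models)), key=lambda j: sq_dist(x, models[j]))
        for rank, j in enumerate(ranked[:n]):
            votes[j] += n - rank
    return votes

# Hypothetical synthetic data: two Gaussian clusters in a 2-dimensional space.
rng = random.Random(1)
data = ([[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(100)] +
        [[rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)] for _ in range(100)])
grid, models = train_batch_som(data, rows=4, cols=6)
votes = smoothed_data_histogram(data, models, n=3)
for r in range(4):
    print(" ".join(f"{votes[r * 6 + c]:4d}" for c in range(6)))
```

Units representing the two cluster centers should collect most of the votes, while units between the clusters receive few; rendered as a smoothed contour plot, such vote counts give two islands separated by sea. Each SDH pass costs one distance computation per item and unit, the same order as one batch-SOM assignment step.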
[Figure 7: A simple demonstration of the SOM and SDH. From left to right, top to bottom the figures illustrate (a) the probability distribution in the 2-dimensional data space, (b) the sample drawn from this distribution, (c) the model vectors of the SOM in the data space, and (d) the map units of the SOM in the visualization space with the clusters visualized using the SDH (n = 3 with spline interpolation). The model vectors and the map units of the SOM are represented by the nodes of the rectangular grid.]

One important aspect of the SOM is neighborhood preservation: map units next to each other on the grid represent similar regions in the data space. Another important aspect is that the SOM defines a non-linear mapping from the data space to the 2-dimensional map. The distances between neighboring model vectors are not uniform; in particular, areas in the data space with a high density are represented in higher detail, i.e. by more model vectors than sparse areas. The SDH is a straightforward approach to visualize the cluster structure of the data set. Map units which are in the centers of clusters are represented by peaks, while map units located between clusters are represented as valleys or trenches.

5. USER INTERFACE
In the previous sections we presented the technical components of the Islands of Music system. In this section we briefly discuss how the maps are intended to support the user in navigating through an archive and exploring unknown but interesting pieces.

The geographic arrangement of the maps reflects the inherent hierarchical structure of genres and styles in an archive. On the highest level in the hierarchy, larger genres are represented by continents and islands. These might be connected through land passages or might be completely isolated by the sea. On lower levels the structure is represented by mountains and hills, which can be connected through a ridge or separated by valleys. For example, in the experiments presented in the next section, less aggressive music without strong bass beats is represented by a larger continent. On the south-east end of this continent there are two mountains, one representing Classical music and the other representing music such as Yesterday by the Beatles and film music using orchestras.

To describe what type of music can be found in specific regions of the map we offer two approaches. The first is to use pieces known to the user as landmarks. Map areas are then described based on their similarity to known pieces. For example, if the user seeks music like Für Elise by Beethoven and this piece is located on the peak of a mountain, then this mountain is a good starting point for an explorative search. The main limitation of this approach is that large parts of the map might not contain any music familiar to the user, and thus lack a description. On the other hand, unknown pieces can easily become familiar if the user listens to them.

The second approach is to use general labels to describe properties of the music. Similar techniques have been employed in the context of text-document archives [16, 22], where map areas are labeled with words summarizing the contents of the respective documents. Based on the rhythm patterns we extract attributes such as maximum fluctuation strength, strength of the bass, aggressiveness, how much low frequencies dominate the overall pattern, and the frequencies at which beats occur.

The maximum fluctuation strength is the highest value in the rhythm pattern. Pieces of music which are dominated by strong beats have very high values. Typical examples with high values include Electro and House music, whereas Classical music, for example, has very low values. The bass is calculated as the sum of the values in the two lowest frequency bands (Bark 1-2) with a modulation frequency higher than 1Hz. The aggressiveness is measured as the ratio of the sum of values within Bark 3-20 and modulation frequencies below 0.5Hz to the sum of all values. Generally, rhythm patterns which have strong vertical lines sound more aggressive. The domination of low frequencies is calculated as the ratio between the sum of the values in the highest and lowest 5 frequency bands.

Using these attributes, geographic landmarks such as mountains and hills can be labeled with descriptions which indicate what type of music can be found in the respective area. Details on the labeling of the Islands of Music can be found in [20]. Another alternative is to create a metaphor of weather charts. For example, areas with a strong bass are visualized as areas with high temperatures, while areas with low bass correspond to cooler regions. Hence, the user can easily understand, for example, that the pieces are organized in such a manner that those with a strong bass are in the west and those with less bass in the east.

6. EXPERIMENTS
In this section we briefly describe the results obtained from our experiments with a music collection consisting of 359 pieces with a total play length of 23 hours, representing a broad spectrum of musical taste. A full list of all titles in the collection can be found in [20].

Figure 8 gives an overview of the collection. The trained SOM consists of 14×10 map units and the clusters are visualized using the SDH (n = 3 with linear interpolation). Several clusters can be identified immediately. We will discuss the 6 labeled clusters in more detail.

Figure 9 shows simplified weather charts. With these it is
possible to obtain a first impression of the styles of music which can be found in specific areas. For example, music with strong bass can be found in the west, and in particular in the north-west. The bass is strongly correlated with the maximum fluctuation strength, i.e., pieces with very strong beats can also be found in the north-west, while pieces with neither strong beats nor strong bass are located in the south-east, together with non-aggressive pieces. Furthermore, the south-east is the main location of pieces where the lower frequencies are dominant. However, the north-west corner of the map also represents music where the low frequencies dominate. As we will see later, this is due to the strong bass contained in the pieces.

Figure 8: The visualization of the music collection consisting of 359 pieces of music trained on a SOM with 14×10 map units. The rectangular boxes mark the areas into which the subsequent figures zoom. The islands labeled with numbers from 1 to 6 are discussed in more detail in the text.

Figure 9: Simplified weather charts. White indicates areas with high values while dark gray indicates low values. The charts represent, from left to right and top to bottom, the maximum fluctuation strength, bass, non-aggressiveness, and domination of low frequencies.

Figure 10: Close-up of Clusters 1 and 2, depicting 3×4 map units.

A close-up of Cluster 1 in Figure 8 is depicted in the north of the map in Figure 10. This island represents music with very strong beats, in particular several songs of the group Bomfunk MCs (bfmc), but also songs with more moderate beats such as Blue by Eiffel 65 (eiffel65-blue) or Let’s get loud by Jennifer Lopez (letsgetloud). All but three songs of Bomfunk MCs in the collection are located on the west side of this island. One exception is the piece Freestyler (center-bottom of Figure 10), which has been the group’s biggest hit so far. Freestyler differs from the other pieces by Bomfunk MCs in that it is softer, with more moderate beats and more emphasis on the melody. Other songs which can be found towards the east of the island are Around the World by ATC (aroundtheworld) and Together again by Janet Jackson (togetheragain), which can both be categorized as Electronic/Dance. Around the island, other songs with stronger beats are located, for example towards the south-west Bongo Bong by Manu Chao (bongobong) and Under the mango tree by Tim Tim (themangotree), both with male vocals, an exotic flair, and similar instruments.

In Figure 10, Cluster 2 is depicted in the south-east. This island is dominated by pieces of the rock band Red Hot Chili Peppers (rhcp). All but a few of the band’s songs in the collection are located on this island. To the west of the island a piece is located which at first does not appear to be similar, namely Summertime by Sublime (sl-summertime). This song is a crossover of styles such as Rock and Reggae, but has a beat pattern similar to that of Freestyler. Summertime would therefore make a good transition in a play-list starting with Electro/House and moving towards the style of Red Hot Chili Peppers, which resembles a crossover of different styles such as Funk and Punk Rock; an example is the sequence In Stereo, Freestyler, Summertime, Californication. Not illustrated in the close-up, but also interesting, is that just to the south of Summertime another song by Sublime can be found, namely What I got.

A close-up of Cluster 3 is depicted in the south-west of Figure 11. This cluster is dominated by aggressive music such as the songs of the band Limp Bizkit (limp), which can be categorized as Rap-Rock. Other similar pieces are Freak on
a Leash by Korn (korn-freak), Dead Cell by Papa Roach (pr-deadcell), or Kryptonite by 3 Doors Down (d3-kryptonite). In the north of this cluster, for example, the Punk Rock Song by Bad Religion (br-punkrock) can be found. To the west of this cluster, just beyond the borders of this close-up, several other songs by Limp Bizkit are located together with songs by Papa Roach, and to the south-west Rock is dead by Marilyn Manson.

Figure 11: Close-up of Clusters 3 and 4, depicting 4×3 map units.

Figure 12: Close-up of Clusters 5 and 6, depicting 3×4 map units.

The pieces arranged around Cluster 4 are depicted in the east of Figure 11. Generally, the pieces in Cluster 4 sound less aggressive than those in Cluster 3. However, those in the south of this cluster are closely related to those of Cluster 3, including pieces such as Wandering by Limp Bizkit (limp-wandering), Binge by Papa Roach (pr-binge), and the two songs by Guano Apes (ga), which are a mixture of Punk Revival, Alternative Metal, and Alternative Pop/Rock. To the north of the cluster, the songs Addict by K’s Choice and Living in a Lie by Guano Apes are mapped next to each other. Living in a Lie deals with the end of a love story and is dominated by a mood which sounds very similar to the mood of Addict, a song which deals with addiction and includes lines such as “I am falling” and “I am cold, alone”. The other pieces in the north of the cluster are modern interpretations of classical pieces by Vanessa Mae (vm).

The final two clusters which we will describe in detail are depicted in Figure 12. Cluster 5 represents concert music and classical music used for films, including the well-known Star Wars theme (starwars), the theme of Indiana Jones (indy), and the end credits of Back to the Future III (future). However, there are also two pieces in this cluster which do not fit this style, namely Yesterday by the Beatles (yesterday) and Morning has broken by Cat Stevens (morningbroken).

Cluster 6 represents peaceful classical pieces such as Für Elise by Beethoven (elise), Eine kleine Nachtmusik by Mozart (nachtmusik), Fremde Länder und Menschen by Schumann (kidscene), Air from Orchestral Suite #3 by Bach (air), and the Trout Quintet by Schubert.

Although the results we obtained are generally very encouraging, we have come across some problems which point out the limitations of the approach. For example, the song Wild Wild West by Will Smith (wildwildwest) does not sound very similar to songs by Papa Roach or Limp Bizkit; however, they are located together in Cluster 3. Another problem in the same region is the song It’s the end of the world by REM (rem-endoftheworld), which is located next to songs such as Freak on a Leash by Korn. Problems in other regions include, for example, Between Angels and Insects by Papa Roach (pr-angels), which is located in the south of Cluster 5 and is definitely a poor match.

The main reason for these problems can be found in the feature extraction process. Although we analyze the dynamic behavior of the loudness in several frequency bands, we do not take the sound characteristics directly into account, as could be done, for example, by analyzing the cepstrum, which is a common technique in speech recognition. Another explanation is the simplified median approach. Many pieces consist of more than one typical rhythm pattern, and combining these using the median can lead to a pattern which might be less typical for a piece than the individual ones.

For detailed evaluations, the model vectors of the SOM can be visualized as depicted in Figure 13. As indicated by the weather charts, the lowest fluctuation strength values are located in the south-east of the map and can be found in map unit (14,1). It is interesting to note the similarity between the typical rhythm pattern of Für Elise (cf. Figure 5(a)) and this unit. On the other hand, the unit (6,2), which represents Freak on a Leash, is not a perfect match for its rhythm pattern, as a comparison with Figure 5(b) reveals. In particular, the vertical line at about 7 Hz is emphasized more strongly in Freak on a Leash than in its corresponding model vector. Note that the highest fluctuation strength values of Freak on a Leash are around 4.2, while the model vector only covers the range up to 3. Generally, the model vectors are a good representation of the rhythm patterns contained in the collection, as each model vector represents the average of all pieces mapped to it.
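The effect of the simplified median approach can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' implementation: it assumes that a piece's rhythm pattern is the element-wise median of its per-segment patterns, and the segment names, the tiny 2×3 pattern size (frequency bands × modulation frequencies), and all values are hypothetical stand-ins for the real Bark-band × 0–10 Hz patterns. When a piece alternates between three patterns whose peaks sit at different modulation frequencies, the median suppresses every peak.

```python
from statistics import median

# Hypothetical per-segment rhythm patterns of one piece:
# rows = frequency bands, columns = modulation frequencies.
segment_a = [[3.0, 0.2, 0.1],
             [2.5, 0.1, 0.1]]  # peak at the lowest modulation frequency
segment_b = [[0.1, 3.0, 0.2],
             [0.1, 2.5, 0.1]]  # peak at the middle modulation frequency
segment_c = [[0.2, 0.1, 3.0],
             [0.1, 0.1, 2.5]]  # peak at the highest modulation frequency


def median_pattern(patterns):
    """Combine segment patterns by taking the median of each cell."""
    rows = len(patterns[0])
    cols = len(patterns[0][0])
    return [[median(p[r][c] for p in patterns) for c in range(cols)]
            for r in range(rows)]


def peak(pattern):
    """Highest fluctuation strength in a pattern."""
    return max(max(row) for row in pattern)


combined = median_pattern([segment_a, segment_b, segment_c])
print(combined)  # [[0.2, 0.2, 0.2], [0.1, 0.1, 0.1]]
print("segment peaks:", [peak(p) for p in (segment_a, segment_b, segment_c)])
print("median peak:", peak(combined))
```

Every segment peaks at 3.0, yet the combined pattern is flat with a maximum of 0.2 and resembles none of the segments. Real pieces rarely alternate this evenly, so the effect is usually milder; the example only shows that the median of several typical patterns need not itself be typical.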
[Plot: 14×10 grid of rhythm-pattern subplots, labeled with map-unit coordinates from (1,1) at the bottom left to (14,10) at the top right]
Figure 13: The model vectors of the 14×10 music SOM. Each subplot represents the rhythm pattern of a specific model vector. The horizontal axis represents modulation frequencies from 0–10 Hz, and the vertical axis represents the frequency bands Bark 1–20. The range shown to the left of each subplot gives the lowest and highest fluctuation strength values within the respective rhythm pattern. The gray shadings are adjusted so that black corresponds to the lowest and white to the highest value in each pattern.
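The statement that each model vector represents the average of all pieces mapped to it, together with the per-pattern ranges and gray shadings of Figure 13, can be sketched as follows. The sketch assumes a hard assignment of each piece to a single map unit and computes a plain mean per unit, which ignores the neighborhood smoothing of actual SOM training. The piece names, the four-value patterns, and the helpers `unit_means` and `to_gray` are hypothetical; the coordinates (6,2) and (14,1) are reused from the text only for orientation. Averaging a piece that peaks at 4.2 with softer pieces yields a model vector with a lower maximum, and each pattern is rescaled to its own range so that its minimum is drawn black and its maximum white.

```python
# Sketch only: hypothetical pieces, each assumed to be mapped to exactly one
# map unit (x, y); rhythm patterns are flattened and shortened to four values.
pieces = {
    "loud_piece": ((6, 2), [4.2, 1.0, 0.5, 0.3]),
    "soft_piece_1": ((6, 2), [2.0, 1.2, 0.6, 0.2]),
    "soft_piece_2": ((6, 2), [2.3, 0.8, 0.4, 0.1]),
    "quiet_piece": ((14, 1), [0.6, 0.3, 0.1, 0.0]),
}


def unit_means(pieces):
    """Return {unit: mean pattern of all pieces mapped to that unit}."""
    sums, counts = {}, {}
    for unit, pattern in pieces.values():
        if unit not in sums:
            sums[unit] = [0.0] * len(pattern)
            counts[unit] = 0
        sums[unit] = [s + v for s, v in zip(sums[unit], pattern)]
        counts[unit] += 1
    return {unit: [s / counts[unit] for s in total] for unit, total in sums.items()}


def value_range(pattern):
    """Lowest and highest value of a pattern (the label beside each subplot)."""
    return min(pattern), max(pattern)


def to_gray(pattern, levels=256):
    """Rescale one pattern to its own range: minimum -> 0 (black), maximum -> levels - 1 (white)."""
    lo, hi = value_range(pattern)
    if hi == lo:  # flat pattern: avoid division by zero
        return [0] * len(pattern)
    return [round((v - lo) / (hi - lo) * (levels - 1)) for v in pattern]


models = unit_means(pieces)
for unit, vector in sorted(models.items()):
    lo, hi = value_range(vector)
    print(unit, f"{lo:.1f} - {hi:.1f}", to_gray(vector))
```

With these made-up values the model vector of unit (6,2) peaks at about 2.8, although one of its pieces peaks at 4.2. Because the gray scale is normalized per pattern, the printed range is what allows absolute fluctuation strengths to be compared across subplots.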
Experiments, as well as a Matlab® toolbox, are available from the project homepage.¹

¹ http://www.oefai.at/~elias/music

7. CONCLUSIONS
We have presented a system for content-based organization and visualization of music archives. Given pieces of music in raw audio format, a geographic map is created where islands represent musical genres or styles. The inherent structure of the music collection is reflected in the arrangement of the islands, mountains, and the sea. Islands of Music enables exploration of music archives based on sound similarities without relying on manual genre classification.

The most challenging part is to compute the perceived similarity of two pieces of music. We have presented a novel and straightforward approach focusing on rhythmic properties following psychoacoustic models. We evaluated our approach using a collection of 359 pieces of music and obtained encouraging results.

Future work will mainly deal with improving the feature extraction process. While low-level features seem to offer a simple but powerful way of describing the music, more abstract features are necessary to explain what the organization represents. Several alternatives to estimate the perceived similarity of music have been published recently (e.g. [30]), and a combination might yield superior results.

8. ACKNOWLEDGMENTS
Part of this research has been carried out in the project Y99-INF, sponsored by the Austrian Federal Ministry of Education, Science and Culture (BMBWK) in the form of a START Research Prize. The BMBWK also provides financial support to the Austrian Research Institute for Artificial Intelligence. The authors wish to thank Simon Dixon, Markus Frühwirth, and Werner Göbel for valuable discussions and contributions.

9. REFERENCES
[1] D. Bainbridge, C. Nevill-Manning, H. Witten, L. Smith, and R. McNab. Towards a digital library of popular music. In Proc. ACM Conf. on Digital Libraries, pages 161–169, Berkeley, CA, 1999. ACM.
[2] W. P. Birmingham, R. B. Dannenberg, G. H. Wakefield, M. Bartsch, D. Bykowski, D. Mazzoni, C. Meek, M. Mellody, and W. Rand. MUSART: Music retrieval via aural queries. In Int. Symposium on Music Information Retrieval (ISMIR), 2001.
[3] C. M. Bishop, M. Svensén, and C. K. I. Williams. GTM: The Generative Topographic Mapping. Neural Computation, 10(1):215–234, 1998.
[4] R. Bladon. Modeling the judgment of vowel quality differences. Journal of the Acoustical Society of America, 69:1414–1422, 1981.
[5] R. B. Dannenberg, B. Thom, and D. Watson. A machine learning approach to musical style recognition. In Proc. Int. Computer Music Conf. (ICMC), pages 344–347, Thessaloniki, GR, 1997.
[6] M. Dittenbach, D. Merkl, and A. Rauber. The Growing Hierarchical Self-Organizing Map. In Proc. Int. Joint Conf. on Neural Networks (IJCNN), volume VI, pages 15–19, Como, Italy, 2000. IEEE Computer Society.
[7] H. Fastl. Fluctuation strength and temporal masking patterns of amplitude-modulated broad-band noise. Hearing Research, 8:59–69, 1982.
[8] B. Feiten and S. Günzel. Automatic Indexing of a Sound Database Using Self-organizing Neural Nets. Computer Music Journal, 18(3):53–65, 1994.
[9] J. Foote. An overview of audio information retrieval. ACM Multimedia Systems, 7(1):2–10, 1999.
[10] A. Ghias, J. Logan, D. Camberlin, and B. C. Smith. Query by humming: Musical information retrieval in an audio database. In Proc. ACM Int. Conf. on Multimedia, pages 231–236, San Francisco, CA, 1995. ACM.
[11] H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417–441 and 498–520, 1933.
[12] T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, 3rd edition, 2001.
[13] T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero, and A. Saarela. Self-Organization of a Massive Text Document Collection. In Kohonen Maps, pages 171–182. Elsevier, Amsterdam, 1999.
[14] N. Kosugi, Y. Nishihara, T. Sakata, M. Yamamuro, and K. Kushima. A practical query-by-humming system for a large music database. In Proc. ACM Int. Conf. on Multimedia, pages 333–342, Los Angeles, CA, 2000.
[15] J. B. Kruskal and M. Wish. Multidimensional Scaling. Number 07-011 in Paper Series on Quantitative Applications in the Social Sciences. Sage Publications, Newbury Park, CA, 1978.
[16] K. Lagus and S. Kaski. Keyword selection method for characterizing text document maps. In Proc. Int. Conf. on Artificial Neural Networks (ICANN), volume 1, pages 371–376, London, 1999. IEE.
[17] M. Liu and C. Wan. A Study of Content-Based Classification and Retrieval of Audio Database. In Proc. Int. Database Engineering and Applications Symposium (IDEAS), Grenoble, France, 2001. IEEE.
[18] D. Merkl and A. Rauber. Document classification with unsupervised neural networks. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval, pages 102–121. Physica Verlag, 2000.
[19] F. Pachet and D. Cazaly. A taxonomy of musical genres. In Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000.
[20] E. Pampalk. Islands of Music: Analysis, Organization, and Visualization of Music Archives. Master’s thesis, Vienna University of Technology, 2001. http://www.oefai.at/~elias/music/thesis.html.
[21] E. Pampalk, A. Rauber, and D. Merkl. Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps. In Proc. Int. Conf. on Artificial Neural Networks (ICANN), 2002.
[22] A. Rauber. LabelSOM: On the Labeling of Self-Organizing Maps. In Proc. Int. Joint Conf. on Neural Networks (IJCNN), Washington, DC, 1999.
[23] A. Rauber and M. Frühwirth. Automatically analyzing and organizing music archives. In Proc. European Conf. on Research and Advanced Technology for Digital Libraries (ECDL), Springer Lecture Notes in Computer Science, Darmstadt, Germany, 2001. Springer.
[24] A. Rauber and D. Merkl. The SOMLib Digital Library System. In Proc. European Conf. on Research and Advanced Technology for Digital Libraries, Paris, France, 1999. Springer.
[25] A. Rauber, E. Pampalk, and D. Merkl. Using psycho-acoustic models and self-organizing maps to create a hierarchical structuring of music by sound similarities. In Proc. Int. Symposium on Music Information Retrieval (ISMIR), Paris, France, 2002.
[26] P. Y. Rolland, G. Raskinis, and J. G. Ganascia. Musical content-based retrieval: An overview of the Melodiscov approach and system. In Proc. ACM Int. Conf. on Multimedia, pages 81–84, Orlando, FL, 1999. ACM.
[27] J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18:401–409, 1969.
[28] E. D. Scheirer. Music-Listening Systems. PhD thesis, MIT Media Laboratory, 2000.
[29] M. R. Schröder, B. S. Atal, and J. L. Hall. Optimizing digital speech coders by exploiting masking properties of the human ear. Journal of the Acoustical Society of America, 66:1647–1652, 1979.
[30] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 2002. To appear.
[31] G. Tzanetakis, G. Essl, and P. Cook. Automatic musical genre classification of audio signals. In Proc. Int. Symposium on Music Information Retrieval (ISMIR), 2001.
[32] A. Ultsch and H. P. Siemon. Kohonen’s Self-Organizing Feature Maps for Exploratory Data Analysis. In Proc. Int. Neural Network Conf. (INNC), pages 305–308, Dordrecht, Netherlands, 1990. Kluwer.
[33] E. Wold, T. Blum, D. Kreislar, and J. Wheaton. Content-based classification, search, and retrieval of audio. IEEE Multimedia, 3(3):27–36, 1996.
[34] E. Zwicker and H. Fastl. Psychoacoustics, Facts and Models, volume 22 of Springer Series of Information Sciences. Springer, Berlin, 2nd updated edition, 1999.
