Content-Based Organization and Visualization of Music Archives
discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/221573225
All content following this page was uploaded by Dieter Merkl on 21 January 2014.
2. RELATED WORK
A vast amount of research has been conducted in the field of content-based music and audio retrieval. For example, methods have been developed to search for pieces of music with a particular melody. The queries can be formulated by humming and are usually transformed into a symbolic melody representation, which is matched against a database of scores usually given in MIDI format. Research in this direction is reported in, e.g., [1, 2, 10, 14, 26]. Besides melodic information, it is also possible to extract and search for style information using the MIDI format. For example, in [5] solo improvised trumpet performances are classified into one of four styles: lyrical, frantic, syncopated, or pointillistic.

[Figure residue, stage labels only: (1) Powerspectrum; (2) Critical-band rate scale (Bark); (3) Spectral masking; (4) Decibel (dB-SPL); (5) Equal-loudness levels (Phon); (7) Modulation amplitude; (8) Fluctuation strength; (9) Rhythm pattern; (10) Typical rhythm pattern.]
format. As mentioned before, the raw audio format of music in good quality requires huge amounts of storage. However, humans can easily identify the genre of a piece of music even if its sound quality is rather poor. Thus, for our experiments we reduced the quality, and as a consequence the amount of data, to a level which is computationally feasible while ensuring that human listeners are still easily capable of identifying the genre of a piece. In particular, we reduced stereo sound quality to mono and down-sampled the music from 44kHz to 11kHz. Furthermore, we divided each piece into 6-second sequences and selected only every third of these after removing the first two and last two sequences to avoid lead-in and fade-out effects. The duration of 6 seconds (2^16 samples) was chosen because it is long enough for human listeners to get an impression of the style of a piece of music while being short enough to optimize the computations. All in all, we reduced the amount of data by a factor of over 24 without losing relevant information, i.e. a human listener is still able to identify the genre or style of a piece of music given the few 6-second sequences in lower quality.

[Figure 2 plot: Loudness [dB-SPL] versus Frequency [kHz]]
Figure 2: The equal loudness contours for 3, 20, 40, 60, 80, and 100 Phon are represented by the dashed lines. The respective Sone values are 0, 0.15, 1, 4, 16, and 64 Sone. The dotted vertical lines mark the positions of the center frequencies of the 24 critical-bands. The dip around 2kHz to 5kHz corresponds to the frequency spectrum we are most sensitive to.
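The data reduction described above (mono conversion, down-sampling, and segment selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names are hypothetical, and the block-averaging down-sampler is a stand-in chosen only to keep the sketch free of dependencies, since the text does not name a resampling method.

```python
SEGMENT_LEN = 2 ** 16  # samples per segment, about 6 s at roughly 11 kHz


def to_mono(left, right):
    """Average the two stereo channels sample by sample."""
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def downsample(signal, factor=4):
    """Crude 44 kHz -> 11 kHz reduction by averaging blocks of `factor` samples.

    Assumption: block averaging replaces whatever resampler was actually used.
    """
    n_blocks = len(signal) // factor
    return [sum(signal[i * factor:(i + 1) * factor]) / factor
            for i in range(n_blocks)]


def select_segments(signal, seg_len=SEGMENT_LEN, skip=2, step=3):
    """Drop the first and last `skip` segments, then keep every `step`-th one."""
    n_segments = len(signal) // seg_len
    kept = range(skip, n_segments - skip, step)
    return [signal[i * seg_len:(i + 1) * seg_len] for i in kept]


# Toy usage with short segments so the example runs instantly.
left = [float(i) for i in range(480)]
right = [float(i) for i in range(480)]
mono = downsample(to_mono(left, right), factor=4)  # 120 samples remain
segments = select_segments(mono, seg_len=10)       # keeps segments 2, 5, and 8
```

The three reductions multiply to 2 × 4 × 3 = 24, and trimming the lead-in and fade-out segments pushes the total slightly higher, which is consistent with the stated factor of over 24. If the 11kHz rate is taken to be 11,025 Hz (an assumption), 2^16 = 65,536 samples last about 5.9 seconds.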
In the first stage of the feature extraction process, the specific loudness sensation (Sone) per critical-band (Bark) is calculated in 6 steps starting with the PCM data. (1) First the power spectrum of the audio signal is calculated using a
[Figure residue: panels a) and b); axis labels: Amplitude, Time [s], Bark; panel label: Median]
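The Phon and Sone values paired in the caption of Figure 2 can be roughly reproduced with a commonly used Phon-to-Sone approximation. The formula is not given in the text here, so both it and the function name `phon_to_sone` are assumptions of this sketch rather than a statement of the exact conversion used.

```python
def phon_to_sone(phon):
    """Approximate loudness sensation in Sone for a loudness level in Phon.

    Assumed rule: loudness doubles every 10 Phon above 40 Phon, and a power
    law with exponent 2.642 is applied below 40 Phon.
    """
    if phon >= 40:
        return 2.0 ** ((phon - 40.0) / 10.0)
    return (phon / 40.0) ** 2.642


# Loudness levels of the contours listed in the Figure 2 caption.
for level in (3, 20, 40, 60, 80, 100):
    print(level, "Phon ->", round(phon_to_sone(level), 2), "Sone")
```

Under this assumption, 40, 60, 80, and 100 Phon map to exactly 1, 4, 16, and 64 Sone, as in the caption. The value for 20 Phon comes out near 0.16, close to the 0.15 quoted there, and 3 Phon rounds to 0.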
The SDH is a straightforward approach to visualize the cluster structure of the data set. Map units which are in the centers of clusters are represented by peaks, while map units located between clusters are represented as valleys or trenches.

5. USER INTERFACE
In the previous sections we presented the technical components of the Islands of Music system. In this section we will briefly discuss how the maps are intended to help the user navigate through an archive and explore unknown but interesting pieces.

Using these attributes, geographic landmarks such as mountains and hills can be labeled with descriptions which indicate what type of music can be found in the respective area. Details on the labeling of the Islands of Music can be found in [20]. Another alternative is to create a metaphor of weather charts. For example, areas with a strong bass are visualized as areas with high temperatures, while areas with low bass correspond to cooler regions. Hence, for example, the user can easily understand that the pieces are organized in such a manner that those with a strong bass are in the west and those with less bass in the east.
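The weather-chart metaphor can be illustrated with a small sketch that turns a per-unit attribute into a "temperature" between 0 (coolest) and 1 (warmest). The definition of bass used here (mean fluctuation strength over the lowest Bark bands), the function names, and the dictionary layout of the map are all hypothetical and serve only to make the idea concrete.

```python
def bass_strength(model_vector, n_low_bands=2):
    """Mean value over the lowest Bark bands (hypothetical 'bass' attribute).

    `model_vector` is a list of Bark bands, each a list of fluctuation
    strength values over modulation frequencies.
    """
    values = [v for band in model_vector[:n_low_bands] for v in band]
    return sum(values) / len(values)


def weather_chart(som, n_low_bands=2):
    """Map each unit (x, y) to a temperature in [0, 1] across the whole map."""
    bass = {unit: bass_strength(mv, n_low_bands) for unit, mv in som.items()}
    lo, hi = min(bass.values()), max(bass.values())
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return {unit: (b - lo) / span for unit, b in bass.items()}


# Toy 2x1 map: the western unit (1,1) has a stronger bass than (2,1).
som = {
    (1, 1): [[4.0, 2.0], [1.0, 1.0], [0.0, 0.0]],
    (2, 1): [[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]],
}
chart = weather_chart(som)
```

In this toy map the western unit receives the highest temperature and the eastern unit the lowest, mirroring the west-to-east bass gradient described in the text.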
[Figure residue: map-unit labels (song identifiers such as rhcp-californication, vm-bach, elise, nachtmusik) and cluster numbers 3 to 6 from the cluster close-up figures; each label was extracted twice]
Figure 11: Close-up of Clusters 3 and 4 depicting 4×3 map units.
The final two clusters which we will describe in detail are depicted in Figure 12. Cluster 5 represents concert music and classical music used for films, including the well-known Star Wars theme (starwars), the theme of Indiana Jones (indy), and the end credits of Back to the Future III (future). However, there are also two pieces in this cluster which do not fit this style, namely Yesterday by the Beatles (yesterday) and Morning Has Broken by Cat Stevens (morningbroken).

Cluster 6 represents peaceful classical pieces such as Für Elise by Beethoven (elise), Eine kleine Nachtmusik by Mozart (nachtmusik), Fremde Länder und Menschen by Schumann (kidscene), Air from Orchestral Suite #3 by Bach (air), and Trout Quintet by Schubert.

Although the results we obtained are generally very encouraging, we have come across some problems which point out

For detailed evaluations the model vectors of the SOM can be visualized as depicted in Figure 13. As indicated by the weather charts, the lowest fluctuation strength values are located in the south-east of the map and can be found in map unit (14,1). It is interesting to note the similarity between the typical rhythm pattern of Für Elise (cf. Figure 5(a)) and this unit. On the other hand, the unit (6,2), which represents Freak on a Leash, is not a perfect match for its rhythm pattern, as a comparison to Figure 5(b) reveals. In particular, the vertical line at about 7Hz is emphasized more strongly in Freak on a Leash than in its corresponding model vector. Note that the highest fluctuation strength values of Freak on a Leash are around 4.2, while the model vector only covers the range up to 3. Generally, the model vectors are a good representation of the rhythm patterns contained in the collection, as each model vector represents the average of all pieces mapped to it.
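The relation between a piece and its model vector can be made concrete with a small sketch. It assumes rhythm patterns flattened into plain lists, Euclidean distance for finding the best-matching unit, and a plain mean of the pieces mapped to a unit as the model vector, following the description in the text. The function names and toy numbers are illustrative only and are not taken from the collection.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(pattern, model_vectors):
    """Return the map unit whose model vector is closest to `pattern`."""
    return min(model_vectors,
               key=lambda unit: euclidean(pattern, model_vectors[unit]))


def mean_pattern(patterns):
    """Element-wise average of several rhythm patterns."""
    n = len(patterns)
    return [sum(values) / n for values in zip(*patterns)]


# Toy data: two pieces share a unit; one has a much stronger peak.
strong_piece = [4.0, 1.0, 0.5]
weak_piece = [2.0, 1.0, 0.5]
model_vectors = {
    (6, 2): mean_pattern([strong_piece, weak_piece]),  # [3.0, 1.0, 0.5]
    (14, 1): [0.5, 0.1, 0.0],
}
unit = best_matching_unit(strong_piece, model_vectors)  # (6, 2)
```

Averaging pulls the peak of the model vector below that of the strongest piece mapped to it, which is one plausible reading of why a model vector can cover a smaller range than an individual piece it represents.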
[Figure 13 plot residue: grid of subplots labeled by map unit from (1,10) to (14,1), each annotated with a range of fluctuation strength values such as 0.5 − 5.1]
Figure 13: The model vectors of the 14×10 music SOM. Each subplot represents the rhythm pattern of a specific model vector. The horizontal axis represents modulation frequencies from 0 to 10Hz; the vertical axis represents the frequency bands Bark 1-20. The range shown to the left of each subplot gives the lowest and highest fluctuation strength value within the respective rhythm pattern. The gray shadings are adjusted so that black corresponds to the lowest and white to the highest value in each pattern.
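The per-pattern gray scaling described in the caption amounts to a min-max normalization applied to each model vector separately. The sketch below is a minimal illustration of that idea; the function name `to_gray` and the nested-list layout (Bark bands by modulation frequencies) are assumptions.

```python
def to_gray(pattern):
    """Scale one rhythm pattern to [0, 1]: 0 is black (lowest), 1 is white.

    Returns the scaled pattern together with its (lowest, highest) range,
    which is the information printed next to each subplot.
    """
    values = [v for row in pattern for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant pattern has no contrast; render it uniformly black.
        return [[0.0 for _ in row] for row in pattern], (lo, hi)
    return [[(v - lo) / span for v in row] for row in pattern], (lo, hi)


# Toy pattern with two Bark bands and two modulation frequencies.
gray, value_range = to_gray([[0.5, 5.1], [2.8, 0.5]])
```

Because each pattern is scaled independently, the same shade of gray can stand for different fluctuation strengths in different subplots, which is why the range of each pattern is printed beside it.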