A9 Sound and Music Description
Overview
This is a peer-assessed assignment in which you will learn how to describe sounds within a sound
collection using a set of temporal and spectral descriptors. You will learn to use the Freesound
API to download sounds and their descriptors. Then with the set of obtained descriptors and
by using a similarity measure, you will cluster the sounds and also classify them into predefined
classes. You will work with instrumental sounds, thus learning what audio features are useful for
their characterization in particular tasks.
Learning outcomes
By the end of the assignment, you will have learnt about:
• Several common audio descriptors, and how they can capture different aspects of a sound.
• Selecting relevant descriptors for characterizing a particular category of sound for a particular
application (e.g., clustering or classification).
• Using a simple sound similarity measure (Euclidean distance) based on different types of
descriptors.
• Using a simple classifier (k-NN) and a simple clustering algorithm (k-means).
• Using web APIs to access data (Freesound API).
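To make the similarity and classification outcomes concrete, here is a minimal sketch of Euclidean distance and a k-NN majority vote in plain Python. It is not the code from the provided scripts; the function names and the two-descriptor values are made up for illustration.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, labelled_points, k=3):
    """Return the majority label among the k points nearest to `query`.

    `labelled_points` is a list of (vector, label) pairs.
    """
    nearest = sorted(
        labelled_points, key=lambda item: euclidean_distance(query, item[0])
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-descriptor vectors, for illustration only.
training = [
    ((0.10, 0.20), "violin"),
    ((0.15, 0.25), "violin"),
    ((0.20, 0.15), "violin"),
    ((0.80, 0.90), "trumpet"),
    ((0.85, 0.80), "trumpet"),
    ((0.90, 0.95), "trumpet"),
]

print(knn_classify((0.12, 0.22), training, k=3))
```

In practice the descriptors have very different ranges, so some form of normalization is usually needed before distances between them are meaningful.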
There are four parts in this assignment. Each one uses the results and output of the previous
one, so they must be done in order:
• Task 1: Download several sounds and their descriptors from Freesound, of different acoustic
instruments playing single notes.
• Task 2: Choose two descriptors that result in a good clustering of the downloaded sounds.
• Task 3: Cluster the downloaded sounds using k-means to obtain clusters corresponding to
different instruments.
• Task 4: Classify a sound, using a nearest-neighbour classifier, into one of several instrumental
classes.
We provide the code for each task, so you do not need to do any programming. Read
the example document to see examples of using the code.
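The provided scripts handle the clustering for you. Purely to illustrate the idea behind k-means, here is a small self-contained sketch. It is an assumption-laden toy, not the implementation used in the assignment scripts, and the points are made-up two-descriptor values.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Toy k-means: returns (centroids, assignments) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Hypothetical two-descriptor values forming two well-separated groups.
points = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.15),
    (0.80, 0.90), (0.85, 0.80), (0.90, 0.95),
]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which sounds end up sharing a cluster.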
Relevant Concepts
Code
The assignment package includes two Python scripts and an example PDF. In addition to these
scripts you need another script, freesound.py, which you can download from
https://github.com/MTG/freesound-python. Put it in your working directory along with the
other two scripts. You don’t have to write any additional code or modify the scripts; you will
just have to make calls to the provided functions. If you are interested in knowing more about
the Freesound API, you can see examples of using it from Python at
https://github.com/MTG/freesound-python/blob/master/examples.py and you
can read the API documentation at http://www.freesound.org/docs/api/
These are the files that you should have in your working directory for doing this assignment:
• soundDownload.py: Script with functions to download sounds and descriptors from Freesound.
• soundAnalysis.py: Script with functions that do the analysis using the downloaded sound
descriptors.
• freesound.py: The Freesound API client needed for API queries.
Input parameters for downloading sounds and descriptors:
• queryText: A single word or a string of words without spaces (use hyphens), typically the
name of the instrument (e.g., "violin", "trumpet", "cello", "bassoon").
• tag: Tag to be used for filtering the searched sounds (e.g., "multisample", "single-note",
"velocity", "tenuto").
• duration: Minimum and maximum duration (in seconds) of the sounds to keep, e.g., (0.2,15).
• API Key: Your API key, which you can obtain from: www.freesound.org/apiv2/apply/
• outputDir: path to the directory where you want to store the sounds and their descriptors.
• topNResults: number of results (sounds) that you want to download.
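As a rough illustration of how these parameters could map onto a Freesound text search, the sketch below builds a request dictionary with a tag and duration filter. The helper name build_search_params is hypothetical and this is not necessarily how soundDownload.py does it; the filter string follows the Freesound APIv2 filter syntax as far as we know, so check the API documentation before relying on it.

```python
def build_search_params(query_text, tag=None, duration=None, top_n=5):
    """Build a dict of text-search parameters (hypothetical helper).

    duration is a (min_seconds, max_seconds) tuple, as in the list above.
    """
    filters = []
    if tag:
        filters.append("tag:" + tag)
    if duration:
        min_dur, max_dur = duration
        filters.append("duration:[{} TO {}]".format(min_dur, max_dur))

    params = {"query": query_text, "page_size": top_n}
    if filters:
        # Several filter terms are combined by separating them with a space.
        params["filter"] = " ".join(filters)
    return params

params = build_search_params(
    "violin", tag="single-note", duration=(0.2, 15), top_n=20
)
print(params)
```

With the freesound.py client, a dictionary like this could presumably be passed on as keyword arguments to its text search call; see the examples.py file in the client repository for the actual usage.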
Sound Descriptors
In this assignment, you will automatically download the following temporal and spectral
descriptors for every sound that the query finds. You will then select a subset of those for
clustering and classification. Most of these are spectral descriptors and are widely used in
various Music Information Retrieval tasks. For information about them you can read the
Essentia documentation:
http://essentia.upf.edu/documentation/algorithms_reference.html
Here is the list of descriptors that are downloaded:
Index  Descriptor
0      lowlevel.spectral_centroid.mean
1      lowlevel.dissonance.mean
2      lowlevel.hfc.mean
3      sfx.logattacktime.mean
4      sfx.inharmonicity.mean
5      lowlevel.spectral_contrast.mean.0
6      lowlevel.spectral_contrast.mean.1
7      lowlevel.spectral_contrast.mean.2
8      lowlevel.spectral_contrast.mean.3
9      lowlevel.spectral_contrast.mean.4
10     lowlevel.spectral_contrast.mean.5
11     lowlevel.mfcc.mean.0
12     lowlevel.mfcc.mean.1
13     lowlevel.mfcc.mean.2
14     lowlevel.mfcc.mean.3
15     lowlevel.mfcc.mean.4
16     lowlevel.mfcc.mean.5
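Since later tasks refer to descriptors by their index, it can help to keep the table above as a lookup list. The sketch below is only an illustration: the helper select_descriptors is hypothetical, and the names are written with underscores on the assumption that this matches the original Essentia identifiers.

```python
# Index -> descriptor name, in the same order as the table above.
DESCRIPTORS = (
    [
        "lowlevel.spectral_centroid.mean",
        "lowlevel.dissonance.mean",
        "lowlevel.hfc.mean",
        "sfx.logattacktime.mean",
        "sfx.inharmonicity.mean",
    ]
    + ["lowlevel.spectral_contrast.mean.%d" % i for i in range(6)]
    + ["lowlevel.mfcc.mean.%d" % i for i in range(6)]
)

def select_descriptors(feature_vector, indices):
    """Pick the values at the given descriptor indices (hypothetical helper)."""
    return [feature_vector[i] for i in indices]

# Example: choose the spectral centroid (0) and the first MFCC mean (11).
chosen = (0, 11)
print([DESCRIPTORS[i] for i in chosen])
```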
Write a short paragraph mentioning the query text, tag and duration used for each of the three
instruments you chose. Explain why you chose those instruments, and why you selected the
specific search query text, tag and duration.
Attach the <queryText>_soundList.txt file for each instrument that the script created in
each instrument folder. You should compress them into one zip file which you upload.
different instruments are shown with different colors. In addition, you can also plot the Freesound
ID of the sounds with the points. Only plot the sounds of the 3 instruments chosen. Make sure
that in targetDir you only have the 3 instruments chosen.
Choose a good pair of descriptors for the sounds of the 3 instruments you downloaded in
Task 1. A good pair of descriptors leads to a point distribution where all the sounds of an
instrument cluster together, with a good separation from the other instrument clusters. Try out
different combinations of descriptor pairs. Write a short paragraph on the descriptor pairs you
tried out, justifying your choices for selecting those particular descriptors. Based on the spectral
and temporal features of the instruments and sounds, give an explanation of why (or why not) a
good clustering is (or is not) achieved with the chosen pairs of descriptors.
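The assignment asks you to judge the descriptor pairs visually, but if you want a rough number to back up your impression, one possible approach is sketched below. It standardizes each descriptor and then divides the smallest distance between instrument centroids by the largest average spread within an instrument. The score, the function names and the data are all made up for illustration and are not part of the provided scripts.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def standardize(column):
    """Scale one descriptor column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]

def separation_score(points, labels):
    """Smallest centroid-to-centroid distance over largest within-class spread.

    Higher is better. Needs at least two different labels.
    """
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    centroids = {
        lab: tuple(mean(dim) for dim in zip(*pts)) for lab, pts in groups.items()
    }
    spread = max(
        mean(math.dist(p, centroids[lab]) for p in pts)
        for lab, pts in groups.items()
    )
    gap = min(
        math.dist(centroids[a], centroids[b]) for a, b in combinations(centroids, 2)
    )
    return gap / spread if spread else float("inf")

def score_pair(rows, labels, i, j):
    """Score the descriptor pair (i, j) for rows of descriptor values."""
    col_i = standardize([row[i] for row in rows])
    col_j = standardize([row[j] for row in rows])
    return separation_score(list(zip(col_i, col_j)), labels)

# Hypothetical data: descriptors 0 and 1 separate the classes, 2 is noise.
labels = ["violin"] * 3 + ["trumpet"] * 3
rows = [
    (1.0, 10.0, 5.0), (1.2, 11.0, 1.0), (0.9, 10.5, 3.0),
    (3.0, 20.0, 4.0), (3.2, 21.0, 2.0), (2.9, 19.5, 3.5),
]
for i, j in combinations(range(3), 2):
    print((i, j), round(score_pair(rows, labels, i, j), 2))
```

A score like this only complements the scatter plot; your written explanation should still be based on the spectral and temporal characteristics of the instruments.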
Attach the 2-D scatter plots for the best descriptor pairs. You can upload more than one
scatter plot, up to a maximum of three. You can save each plot as a .png file from the
plotting window. Compress them into one zip file and upload it.