Automatic Prediction of Hit Songs (2005)
Keywords: hit song detection, music classification

We explore the automatic analysis of music to identify likely hit songs. We extract both acoustic and lyric information from each song and separate hits from non-hits using standard classifiers, specifically Support Vector Machines and boosting classifiers. Our features are based on global sounds learnt in an unsupervised fashion from acoustic data or global topics learnt from a lyrics database. Experiments on a corpus of 1700 songs demonstrate performance that is much better than random. The lyric-based features are slightly more useful than the acoustic features in correctly identifying hit songs. Concatenating the two feature sets does not produce significant improvements. Analysis of the lyric-based features shows that the absence of certain semantic information indicates that a song is more likely to be a hit.
2 Methods
There is a large body of literature on song writing and indeed some books even claim to teach how to write a hit song (e.g. [3]). Clearly it isn't as easy as such books suggest, but melody, chords, lyrics and instrumentation certainly all play a role. In this study, we use a very simple approach. We extract very general acoustic and lyric-based features from songs, then use standard classifiers to separate hits from non-hits. Our methods are described below.
sounds are pre-learned from a corpus of widely varying music. Our method is similar to those which convert songs to vectors according to sounds learnt by supervised clustering (e.g. [5, 1]). We use unsupervised clustering since we don't know which sounds would be optimal for our classification task.
Similar to previous studies, we characterize sounds using MFCC features, thus focusing on timbral aspects of the music. Many other characterizations are possible, however, such as those based on rhythmic or melodic features. We first convert the songs in the training corpora to a set of MFCC features, then use K-means clustering to find the N most prominent clusters. We can then convert any song to an N-dimensional vector representation by computing the likelihood of the sound represented by each cluster occurring in that song.
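As a concrete illustration of this pipeline, the sketch below learns N sounds from a training corpus and converts a song to its N-dimensional vector. The use of librosa and scikit-learn, the parameter values, and the normalised hard-assignment counts (standing in for the per-cluster likelihoods described above) are all assumptions for illustration, not the exact tools or settings used in this work.

```python
# Illustrative sketch: cluster MFCC frames from a training corpus into N
# global "sounds" with K-means, then describe each song by how often each
# sound occurs. Normalised counts stand in for the per-cluster likelihoods
# described in the text; all parameters are assumptions.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def song_to_mfcc_frames(path, n_mfcc=13):
    """Return the MFCC frames of one song as a (num_frames, n_mfcc) array."""
    y, sr = librosa.load(path, sr=None, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def learn_sounds(training_paths, n_sounds=32):
    """K-means clustering of all training MFCC frames into N sounds."""
    frames = np.vstack([song_to_mfcc_frames(p) for p in training_paths])
    return KMeans(n_clusters=n_sounds, n_init=10, random_state=0).fit(frames)

def song_to_vector(path, kmeans):
    """N-dimensional song representation: occupancy of each learnt sound."""
    labels = kmeans.predict(song_to_mfcc_frames(path))
    counts = np.bincount(labels, minlength=kmeans.n_clusters)
    return counts / counts.sum()
```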
(even SVMs), although for computational reasons they are typically very simple. Each weak learner focuses its attention on those training vectors where the previous weak learners failed. We use a variant of boosting proposed in [10] in which the weak learners are simple linear classifiers operating on a single dimension of the feature vector. This offers the advantage of being less sensitive to spurious features: components of the feature vector that add no discriminative value are ignored in favour of more promising components. Additionally, we are able to analyze the relative importance of each feature in a principled way; a simple inspection of the weak learners highlights those features that contribute most to classification.
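For illustration, the sketch below approximates this scheme with depth-1 decision stumps, which threshold exactly one feature dimension; scikit-learn's AdaBoost is used here as a stand-in and is not the exact variant of [10].

```python
# Sketch of boosting with one-dimensional weak learners. Depth-1 decision
# stumps are an illustrative stand-in for the linear classifiers of [10].
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

def make_boosting_classifier(n_rounds=100):
    stump = DecisionTreeClassifier(max_depth=1)  # each round picks one feature and a threshold
    return AdaBoostClassifier(stump, n_estimators=n_rounds)

# After fitting, the feature chosen by each weak learner and its weight can
# be read off directly, mirroring the kind of inspection discussed later:
#   clf = make_boosting_classifier().fit(X_train, y_train)
#   for stump, w in zip(clf.estimators_, clf.estimator_weights_):
#       print(stump.tree_.feature[0], w)  # feature index used by this weak learner
```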
3 Databases
In this section we describe the databases used in our study.
Note that in many cases the lyrics from this site are not a verbatim transcript of each song. For example, repeats of the chorus are sometimes simply denoted by "chorus" or not transcribed at all. This may explain why our attempts to construct features based on the repetition properties of the lyrics were less than successful: the transcripts did not always faithfully reflect the songs as performed.
4 Experiments
To investigate the performance of our proposed hit song classifiers, we conduct the series of experiments described below.
Figure 1: Average ROC area for acoustic-based features with various numbers of sounds (0 to 140) for SVM and boosting classifiers, with the random baseline (ROC area 0.5) shown for reference.
4.2 Classification
We run experiments using 10-fold cross validation. This somewhat mitigates the effect of experimenting with such a small database: by averaging over 10 "cuts" of the data into training and testing sets, we reduce the impact of a particularly easy or particularly hard split. Our figure of merit for each classifier is the area under the Receiver Operating Characteristic (ROC) curve. An ROC curve plots sensitivity vs. (1 - specificity), essentially describing the trade-off between false negatives and false positives as the classifier's threshold is varied. Random classifiers have an ROC area of 0.5 and perfect classifiers an ROC area of 1.0.
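This evaluation protocol can be sketched as follows, again assuming scikit-learn purely for illustration; X holds the song feature vectors, y the hit/non-hit labels, and the SVM kernel shown is an assumption, since it is not specified here.

```python
# Sketch of the evaluation protocol: 10-fold cross validation with the area
# under the ROC curve as the figure of merit (0.5 = random, 1.0 = perfect).
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def mean_roc_area(X, y, clf=None):
    """Average ROC area over 10 cross-validation cuts."""
    clf = clf or SVC(kernel="rbf")  # kernel choice is an assumption for illustration
    scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
    return scores.mean()
```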
Figures 1 and 2 show the ROC area, averaged over the 10 cross validation cuts of the experimental database, for SVM and boosting classifiers trained on the acoustic-based and lyric-based features. We show results for varying numbers of audio sounds and topics. Also shown for reference is the 0.5 ROC area that would result from random classification.
From these plots we see that even with the simple features and classifiers proposed, we can achieve better than random performance. We also see that the results are comparable for the two classifiers. Lyric-based features appear to give slightly better performance overall than acoustic-based features. The best result obtained using lyric features is an average ROC area of 0.68 with 8-topic models. This is slightly better than 0.66, the best result for acoustic features, obtained with both 32- and 128-sound models.
We now consider combining acoustic and lyric-based features. We achieve this by concatenating the vectors of the two representations. Figure 3 shows results for this experiment. For simplicity, this plot only shows results for concatenating vectors of equal length; for example, the "Vector Size 16" result is obtained using feature vectors formed by concatenating a length-8 acoustic vector and a length-8 lyric vector. We also investigated unequal combinations of acoustic and lyric vectors and saw comparable results.
The best result for combining acoustic and lyric features is an average ROC area of 0.69, obtained by concatenating 32-sound acoustic features and 8-topic lyric features. This is only moderately better than 0.68, the best result obtained using lyrics alone. Given the relatively small size of the experiment, it is unclear whether the combined feature set improves performance.
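As a minimal illustration, the combined representation is simply the concatenation of the two vectors; the helper below assumes vectors produced by the earlier sketches.

```python
# Sketch of the combined representation: the acoustic and lyric vectors for
# a song are concatenated, e.g. length-8 acoustic + length-8 lyric gives the
# length-16 "Vector Size 16" setting in Figure 3.
import numpy as np

def combine_features(acoustic_vec, lyric_vec):
    """Concatenate an acoustic vector and a lyric (topic) vector."""
    return np.concatenate([acoustic_vec, lyric_vec])
```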
Figure 2: Average ROC area for lyric-based features with various numbers of topics (0 to 140) for SVM and boosting classifiers, with the random baseline shown for reference.
Figure 3: Average ROC area for combined acoustic and lyric features with varying vector sizes (0 to 300) for SVM and boosting classifiers, with the random baseline shown for reference.
5 Analysis of Results
As discussed earlier, a nice property of the boosting classifier used is that by analyzing the
weights of the weak learners we can identify which dimensions of the feature vectors are most
helpful for classification. We therefore performed this analysis for the 8-topic lyric vectors.
Table 1 shows the most frequent words characterizing each topic in the 8-topic case. By analyzing the boosting models, we found that by far the most important features for distinguishing hits were Topic 1 and Topic 6. These appear to describe "heavy metal" and "peaceful/new age" music. Interestingly, nearly all of the weak learners learnt negative boundaries. That is, the absence of Topic 1 or Topic 6 meant the song was more likely to be a hit. Topic 4, which describes generic love songs, was the main topic whose presence meant the song was likely to be a hit. However, its impact was much lower than that of Topics 1 and 6.
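For illustration, with the boosting stand-in sketched earlier (AdaBoost over depth-1 stumps) rather than the exact classifier of [10], this inspection might be performed as follows; feature_importance_from_stumps is a hypothetical helper, clf is a fitted classifier, and feature_names labels each dimension (e.g. "Topic 1" through "Topic 8").

```python
# Sketch of the weak-learner inspection: sum the weight of each
# one-dimensional weak learner per feature dimension, then rank features.
import numpy as np

def feature_importance_from_stumps(clf, feature_names):
    """Rank feature dimensions by the total weight of the stumps that use them."""
    totals = np.zeros(len(feature_names))
    for stump, weight in zip(clf.estimators_, clf.estimator_weights_):
        totals[stump.tree_.feature[0]] += weight  # feature index split on by this stump
    ranking = np.argsort(totals)[::-1]
    return [(feature_names[i], float(totals[i])) for i in ranking]
```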
7
tion within smaller groupings, such as music by decade, or music by style. Different kinds of
acoustic and lyric features should also be studied. In particular, rhythmic and melodic features
merit exploration. Finally, since popular music is continually evolving, time-varying classifiers
should be studied.
References
[1] A. Berenzweig, D. P. W. Ellis, and S. Lawrence. Anchor space for classification and
similarity measurement of music. In ICME 2003, pages 29–32, 2003.
[3] J. Blume. 6 Steps to Songwriting Success: The Comprehensive Guide to Writing and Marketing Hit Songs. Billboard Books, revised and expanded edition, 2004.
[4] C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining
and Knowledge Discovery, 2(2):121–167, 1998.
[5] J. T. Foote. Content-based retrieval of music and audio. In SPIE, pages 138–147, 1997.
[8] B. Logan, A. Kositsky, and P. Moreno. Semantic analysis of song lyrics. In ICME 2004,
2004.
[9] R. E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, 1990.
[10] K. Tieu and P. Viola. Boosting image retrieval. In IEEE International Conference on
Computer Vision, pages 228–235, 2000.
[11] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, July 2002.