Abstract—In this work, we propose a method to identify the ragas of an Indian Carnatic music signal. This has several interesting applications in digital music indexing, recommendation and retrieval. However, this problem is hard due to (i) the absence of a fixed frequency for a note, (ii) the relative scale of notes, (iii) oscillations around a note, and (iv) improvisations. In this work, we attempt the raga classification problem in a non-linear SVM framework using a combination of two kernels that represent the similarities of a music signal using two different features: the pitch-class profile and the n-gram distribution of notes. This differs from the previous pitch-class profile based approaches, where the temporal information of notes is ignored. We evaluated the proposed approach on our own raga dataset and the CompMusic dataset and show an improvement of 10.19% by combining the information from two features relevant to Indian Carnatic music.

*Equal contribution

I. INTRODUCTION

Raga, the melodic framework or formalization of melodies found in Indian Classical music (Carnatic and Hindustani), comprises a sequence of swaras depicting mood and sentiment. Indian music has seven basic swaras (notes), namely Sa, Ri, Ga, Ma, Pa, Dha and Ni. There are hundreds of ragas in Indian Carnatic music derived from 72 parent or Janaka ragas [20], formed by combinations of 12 swarasthanas. Identifying ragas is a central problem for appreciating, comparing and learning Indian music. Due to the overwhelming number, complicated structure and minor variations of ragas, even humans find it difficult to identify them without years of practice. In this work, we report our initial attempt to automatically identify ragas in Indian Carnatic music.

The identification of ragas is a highly cognitive task and comes only after an adequate amount of exposure. For automatic identification, some of the characteristics of ragas have to be converted into appropriate features. This becomes particularly challenging for Indian music due to the following reasons, which need to be addressed while converting a music piece into swara strings. (i) A music piece may be composed from multiple instruments during a performance. (ii) Unlike Western music, the notes in Indian music are not on an absolute scale but on a relative scale. (iii) There is no fixed starting swara in a raga. (iv) Notes in Indian music do not have a fixed frequency but rather a band of frequencies (oscillations) around a note. (v) The sequence of swaras in a raga is not fixed, and various improvisations are allowed [21] while reciting a raga as long as the characteristics of the raga remain intact. These factors pose a serious challenge for the automatic detection of ragas. They also make the problem distinct from applications such as genre recognition, comparison of vibrato music [22] and emotion recognition, which are usually solved by extracting well-defined features such as MFCC [25] or a melodic histogram [23] from the main melody or bass line [23] of the music and classifying using K-NN, naive Bayes [24] or SVM [25].

In spite of the above mentioned challenges, there is an underlying structure in a raga that can be captured. For example, one can identify a raga by finding its most prominent swara by counting the number of occurrences or the duration of each swara [9]. This may give insights into the set of notes and their frequencies in a raga, thereby helping in identification. Gamakas, the variations of pitch around a note, can be used to identify a raga, as only certain types of variations are allowed in each raga. Characteristic-motifs, similar to Pakads in Hindustani music, are the repetitive characteristic phrases of a raga that provide vital information in identifying a raga, as these phrases vary from one raga to another.

In this work, we attempt the raga classification problem using a non-linear SVM and a combination of two different kernels. We introduce kernels suited to Indian Carnatic music that represent the similarities of ragas based on the pitch-class profile and the n-gram distribution of notes. This differs from the previous pitch-class profile based approaches, where the temporal information of notes is ignored. This approach allows us to learn a decision boundary in the combined space of pitch-class profile and n-gram note distribution, in which different ragas are linearly separable. Given a music piece, we initially extract the predominant pitch values at every instant of time, convert them to the cents scale, map them to a single octave and identify the stable note regions, similar to [2]. These notes are then used to construct the pitch-class profile and the n-gram distribution. While the pitch-class profile represents the distribution of pitch values, the n-gram distribution provides information about the occurrence of short sequences of notes. Thus our approach incorporates the information from both of these features, unlike the previous approaches [2], [4] where one of these features was used but not both. We evaluate our approach on the extensive CompMusic dataset [2], consisting of 170 tunes from 10 ragas, and achieve an improvement of 10.19% in accuracy.

Related Works: There have been some attempts at identifying the raga of a piece of music. One method for raga classification is the transcription of the raga directly into swaras at regular intervals of time, followed by classification using a classifier such as K-NN or SVM. In [6], relative frequencies are used instead of absolute frequencies, as the notes have fixed ratios of frequencies. Though this approach addresses the issue with scale, it cannot handle multiple instruments. In [7], the authors try to rectify this issue by identifying and extracting the fundamental frequency of the singer. All the other frequencies in the scale are then marked down based on their respective ratios with the identified fundamental frequency of the singer. Simple string matching techniques are then used to identify the raga.
In [2], pitch-class profiles of notes are used as features, and a simple K-NN classifier with KL-divergence distance is used to classify the ragas. They conducted experiments on an extensive dataset of 10 ragas consisting of 170 tunes, with at least 10 tunes in each raga. Note that we conduct our experiments on this dataset and achieve superior performance. Also note that in the approaches based on pitch-class profiles, the temporal information of notes is ignored. In our approach, we partially capture the temporal information of the notes of a raga by computing the n-gram histogram. By combining the n-gram histogram with pitch-class profiles, performance is further improved.

Kernels have been used earlier to improve the performance of many audio and music related tasks such as classification and segmentation [16]-[18], by designing application-specific kernels for SVM.

II. KERNELS FOR INDIAN CARNATIC MUSIC

In this section, we describe some of the characteristic features of Indian music and explain how the information provided by them can be used to identify a raga.

A. Features of Raga

The authors in [1], [2] characterize a raga by the following features:

1) Arohana and avarohana: A raga has a fixed ascent (arohana) and descent (avarohana) of swaras, without any strictness of their sequence in recitation. There are certain rules that are essential while reciting a raga, though they are not strictly followed. Also, many ragas share the same set of swaras.
2) Prominent swaras: The most prominent swaras of a raga, found by counting the number of occurrences or the duration of each swara, give an insight into the set of notes and their frequencies in a raga [9].
3) Gamakas: the variations of pitch around a swara, which differ from one raga to another. For every raga, only a certain type of Gamakas (variations) are allowed around a swara, giving an important clue for identification.
4) Characteristic-motifs: the characteristic phrases of a raga in Carnatic music which help in identifying a raga [5]. These are similar to Pakads in Hindustani music.

B. Kernels for Pitch-Class Profiles:

The pitch-class profile distribution is used as a feature to classify ragas in [2], [10]. It provides discriminative information based on the distribution of notes in a raga. Even in different ragas with the same set of notes, the phrases often differ enough that one can see a recognizable difference in their pitch profiles. We thus define a kernel for pitch-class profiles that captures the similarity of ragas based on the pitch-class distribution.

We use the procedure employed in [2] to obtain the pitch-class profile of a music signal. Pitch values are detected at regular intervals of 10 ms from a given polyphonic audio recording using a predominant melody extraction algorithm such as [13]. The frequency at each interval of the signal is determined by applying a Discrete Fourier Transform to different blocks of the audio and considering only the most energetic frequencies present in the audio signal. Pitch values are then extracted from these frequencies and tracked based on how continuous the pitch values are in the time and frequency domains. The pitch values are then converted to the cents scale with a tuning frequency of 220 Hz. All the pitch values are then mapped to a single octave and stable note regions are identified. Finally, pitch values in these regions are quantized to the nearest available note to obtain the pitch-class profile.
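To make the above procedure concrete, here is a minimal sketch in Python of the pitch-to-profile conversion, assuming the predominant pitch contour (one value in Hz per 10 ms frame, 0 for unvoiced frames) has already been extracted by a melody tracker such as [13]. The stability tolerance, the minimum run length and the 12-bin resolution are our own illustrative choices, not values reported in the paper.

```python
import numpy as np

def pitch_class_profile(pitch_hz, tonic_hz=220.0, n_bins=12,
                        stable_tol_cents=30.0, min_stable_frames=5):
    """Duration-weighted pitch-class profile from a pitch contour.

    pitch_hz: predominant pitch per 10 ms frame (0 for unvoiced frames).
    Values are converted to cents w.r.t. a 220 Hz tuning, folded into a
    single octave, and only 'stable' runs close to a note are counted.
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    voiced = pitch_hz > 0
    cents = np.full_like(pitch_hz, np.nan)
    cents[voiced] = 1200.0 * np.log2(pitch_hz[voiced] / tonic_hz)
    cents = np.mod(cents, 1200.0)                        # fold into one octave

    bin_width = 1200.0 / n_bins
    nearest = np.round(cents / bin_width) % n_bins       # nearest note index
    deviation = np.abs(cents - np.round(cents / bin_width) * bin_width)

    profile = np.zeros(n_bins)
    run_start = None
    for i in range(len(pitch_hz) + 1):
        stable = (i < len(pitch_hz) and voiced[i]
                  and deviation[i] <= stable_tol_cents)
        if stable and run_start is None:
            run_start = i
        elif not stable and run_start is not None:
            if i - run_start >= min_stable_frames:       # long enough run
                note = int(nearest[run_start])
                profile[note] += (i - run_start)         # duration weighting
            run_start = None
    total = profile.sum()
    return profile / total if total > 0 else profile
```

The duration weighting of each bin mirrors the weighting adopted later in this section for the pitch-class profile kernel.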
(Figure panels: block diagram of audio signal → melody extraction system → melody pitch sequence; pitch values for two instances of a pakad plotted as Pitch vs. Time for Tune 1 and Tune 2; and 2-grams for the two instances.)

Fig. 2: Procedure to compute the n-gram. Predominant melody is extracted from a given polyphonic audio signal and a pitch-class profile is constructed. From the pitch-class profile, stable notes are identified and n-grams are computed.

Fig. 3: Two tunes belonging to the same raga may have different pitch-class profiles (left) but their n-gram histograms of notes (right) show high similarity. We have used n = 2.
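As a companion to Fig. 2, the following sketch shows one way to build the k-gram histograms from the sequence of stable note indices (0-11) produced by the previous step. Collapsing repeated frames into single note events and normalizing each histogram are our assumptions; the paper does not spell out these details.

```python
from collections import Counter
import numpy as np

def ngram_histograms(note_sequence, max_n=4, n_bins=12):
    """k-gram histograms (k = 1..max_n) over a sequence of note indices.

    Consecutive repetitions are collapsed so that a held note counts as a
    single event before the k-grams are formed.
    """
    # collapse runs: e.g. [5, 5, 5, 7, 7, 2] -> [5, 7, 2]
    notes = [n for i, n in enumerate(note_sequence)
             if i == 0 or n != note_sequence[i - 1]]

    histograms = []
    for k in range(1, max_n + 1):
        counts = Counter(tuple(notes[i:i + k])
                         for i in range(len(notes) - k + 1))
        hist = np.zeros(n_bins ** k)
        for gram, c in counts.items():
            # map the k-gram to a single index in base-n_bins
            idx = 0
            for note in gram:
                idx = idx * n_bins + note
            hist[idx] = c
        if hist.sum() > 0:
            hist /= hist.sum()
        histograms.append(hist)
    # phi(x) = [H1; H2; ...; Hn], as defined in Section II-C below
    return np.concatenate(histograms)
```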
The histogram bins can be weighted in multiple ways. In [2], two types of binning are considered: one is based on the number of instances of a note, and the other on the total duration of a note over all its instances in the music piece. We consider the duration weighting of a bin, which essentially captures the average time spent around each note. We define a kernel for pitch-class profiles that gives a measure of similarity between two music pieces. It is a well-known practice to use the Kullback-Leibler (KL) divergence for comparing histograms. However, as the KL-divergence is not symmetric, we symmetrize it as

\hat{D}_{KL}(\psi(x_i), \psi(x_j)) = \hat{d}_{KL}(\psi(x_i) \,\|\, \psi(x_j)) + \hat{d}_{KL}(\psi(x_j) \,\|\, \psi(x_i))

\hat{d}_{KL}(\psi(x_i) \,\|\, \psi(x_j)) = \sum_{k} \psi(x_i(k)) \log \frac{\psi(x_i(k))}{\psi(x_j(k))}    (1)

where \psi(x_i(k)) is the k-th bin of the pitch-class profile \psi(x_i) of the music sample x_i. Finally, we create a kernel for pitch-class profiles as follows,

K_1(i, j) = \exp\big(-\hat{D}_{KL}(\psi(x_i), \psi(x_j))\big)    (2)

C. Kernel for n-gram distribution:

Pitch-class profiles provide information about the distribution of notes; however, they miss the temporal information of notes. Ragas usually contain repetitive characteristic phrases or motifs which provide complementary information for identifying a raga. However, extracting these characteristic-motifs from a piece of music is itself a challenging problem. Even humans find it difficult to identify them without years of practice. This is mainly due to the complex structure of these characteristic-motifs and their occurrence: they are usually spread throughout the raga without any specific time of occurrence, and they may also contain insertions of other swaras (notes) in between, making them difficult to identify. We therefore capture this temporal information implicitly through n-gram histograms of notes and represent a music piece x_i as

\phi(x_i) = [H_1^T \, H_2^T \, \ldots \, H_n^T]^T = [H_k^T]^T, \quad k = 1, 2, \ldots, n

where H_k is the k-gram histogram of notes.

The use of all the k-grams (k = 1, 2, ..., n) is motivated by the fact that the occurrence of the characteristic phrases in a piece of music is usually noisy and may contain insertions of notes in between them. We limit ourselves to 4-gram histograms of notes, as it becomes computationally expensive to go beyond 4-grams. In Fig. 3, we show an example demonstrating how the n-gram histograms of two tunes are highly similar even though their pitch-class profiles show some variation.

We define a kernel to represent the similarity of ragas based on the n-gram histogram of notes. We found the radial basis function (RBF) kernel to be effective for capturing this similarity, defined as

K_2(i, j) = \exp\left(-\frac{\|\phi(x_i) - \phi(x_j)\|_2^2}{2\sigma^2}\right)    (3)

In the absence of the tonic note, we align the n-gram distributions of two tunes in terms of the locations of corresponding scale degrees. Given the n-gram distributions of two tunes, we initially align their 1-gram distributions through cyclic rotation, as explained in the previous section. Once the correspondence between the scale degrees of the two 1-grams is obtained, we use this correspondence to align the n-grams.

Note that both of the above defined kernels, based on pitch-class profiles and n-gram histograms, are valid, as the term exp(-a) is always positive for a >= 0.
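The two kernels of Eqs. (1)-(3) can be computed directly from the features above. The sketch below is a straightforward reading of those equations, assuming duration-weighted pitch-class profiles ψ(x) and concatenated k-gram histograms φ(x) are available per tune; the small epsilon guarding against empty bins is our own addition.

```python
import numpy as np

def symmetrized_kl(p, q, eps=1e-10):
    """Symmetrized KL-divergence between two histograms (Eq. 1)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p))

def kernel_pcp(profiles):
    """K1: exp(-D_KL) over pitch-class profiles (Eq. 2)."""
    n = len(profiles)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = np.exp(-symmetrized_kl(profiles[i], profiles[j]))
    return K

def kernel_ngram(features, sigma=0.01):
    """K2: RBF kernel over concatenated k-gram histograms phi(x) (Eq. 3)."""
    X = np.vstack(features)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

The default sigma of 0.01 follows the value reported later in the experiments; in practice it would be tuned by cross-validation, as described in Section IV.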
(Fig. 4 block diagram: Polyphonic audio → Melody extraction algorithm → Predominant melody → Pitch-class profile and N-gram histogram → Kernel for pitch-class profile K1() and kernel for n-gram histogram K2() → α1K1 + α2K2 → SVM.)
Fig. 4: Overview of our proposed approach. Predominant melody is extracted from a given polyphonic audio signal and pitch
values are identified. An SVM model is learnt using two different non-linear kernels that define the similarities of ragas based
on Pitch-class profiles and n-gram histogram of notes.
III. CLASSIFICATION

We identify a raga by combining the information from two different and relevant features, pitch-class profiles and the n-gram distribution. We incorporate this systematically into an SVM framework by defining a combined kernel over them. This is in contrast to previous pitch-class profile based approaches, where the temporal information of notes is ignored.

Given a set of training pairs (x_i, y_i) \in X \times Y, x_i \in R^d, y_i \in \{-1, 1\}, the traditional SVM tries to find the maximum-margin hyperplane, defined by the parameter w, that separates points with y_i = 1 from those with y_i = -1. This is achieved by solving the following optimization problem:

\arg\min_{w, b, \xi_i} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad y_i(w^t x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0, \; \forall i    (4)

where the \xi_i are slack variables denoting the violations made by the training points. During inference, a test sample x is predicted by finding the sign of (w^t x + b). Alternatively, the solution can be obtained by maximizing the dual formulation:

J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^t x_j
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; 0 \le \alpha_i \le C.    (5)

For this formulation, a test sample x is predicted by finding the sign of \sum_{i=1}^{m} \alpha_i y_i x_i^t x + b, where m denotes the number of support vectors. The dual formulation allows us to apply the kernel trick, computing the dot product in the feature space, \langle \phi(x_1), \phi(x_2) \rangle = K(x_1, x_2), K : R^n \times R^n \to R, without explicitly computing the features \phi(x_i). With the use of a kernel, the above formulation becomes

J(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad \sum_i \alpha_i y_i = 0, \; 0 \le \alpha_i \le C.    (6)

Any given test sample can then be labeled using the sign of \sum_{i=1}^{m} \alpha_i y_i K(x_i, x) + b.

For multi-class problems, one can build multiple binary classifiers and adopt strategies like one-vs-rest or one-vs-one to infer a decision, or extend the binary SVM to handle multiple classes through a single optimization.

The type of kernel and the parameters to select depend largely on the application at hand. Several kernel functions have been proposed in the literature, ranging from the generic RBF kernel to application-specific kernels [15]. Any clue for similarity can be captured in the form of a kernel, as long as the kernel is positive semi-definite, K ⪰ 0. One can also define a kernel as a linear combination of individual kernels, each representing a different kind of similarity, under the condition that each individual kernel is valid:

K = \sum_i \alpha_i K_i    (7)

The weights \alpha_i can be selected heuristically based on cross-validation errors, or learned in a multiple kernel learning framework [14]. For the raga classification problem, we define our kernel as the linear combination \alpha_1 K_1 + \alpha_2 K_2 of two different kernels representing the similarities of an audio piece based on pitch-class profiles and n-gram histograms of notes. This provides a mechanism to systematically combine the similarities from two heterogeneous features into a single max-margin framework. The weights \alpha_i are selected here based on the cross-validation error. Our entire approach is summarized in Fig. 4.

IV. RESULTS AND DISCUSSIONS

A. Datasets:

We evaluate the performance of our proposed approach on our dataset and the CompMusic dataset [2]. These two datasets are summarized in Table I. While our dataset is small, consisting of only 4 ragas with limited instruments, the CompMusic dataset is extensive, consisting of 10 ragas and a variety of musical instruments.
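Because both K1 and K2 are precomputed Gram matrices, the combined kernel of Eq. (7) can be fed to any off-the-shelf SVM solver. A minimal sketch using scikit-learn's precomputed-kernel interface is given below; the weights α1 = 5, α2 = 3 and C = 5 follow the values reported in Section IV-B, while the helper functions are the hypothetical ones sketched earlier, not code released by the authors.

```python
import numpy as np
from sklearn.svm import SVC

def combined_kernel(K1, K2, alpha1=5.0, alpha2=3.0):
    """K = alpha1*K1 + alpha2*K2, as in Eq. (7)."""
    return alpha1 * K1 + alpha2 * K2

# K1_all, K2_all: full Gram matrices over all tunes (e.g. from kernel_pcp /
# kernel_ngram above); y: raga labels; train/test: arrays of tune indices.
def train_and_evaluate(K1_all, K2_all, y, train, test, C=5.0):
    y = np.asarray(y)
    K = combined_kernel(K1_all, K2_all)
    clf = SVC(kernel="precomputed", C=C)        # one-vs-one multi-class internally
    clf.fit(K[np.ix_(train, train)], y[train])  # Gram matrix among training tunes
    pred = clf.predict(K[np.ix_(test, train)])  # test rows vs. training columns
    return np.mean(pred == y[test])
```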
TABLE I: Summary of our dataset and the CompMusic dataset.

Dataset            | Composition of tunes
Our dataset        | 60 tunes, 5 artists, 4 ragas, 2 instruments
CompMusic dataset  | 170 tunes, 31 artists, 10 ragas, 27 instruments

(i) Our Dataset: To evaluate our method, we created a dataset comprising 4 ragas, namely Kalyanavasantham, Nattakurinji, Ranjani and Bilhari. All audio files are flute instrumentals of approximately 20 minutes duration taken from CD recordings. We divided these full-length recordings into 1-minute audio clips to create our dataset. Each clip is sampled at 44.1 kHz, stereo-channel and m4a encoded.

(ii) CompMusic Dataset: We also test our method on the dataset from the authors of [2]. The CompMusic dataset is an extensive dataset that includes compositions from several artists spanning several decades, male and female, and all the popular instruments. The clips were extracted from live performances and CD recordings of 31 artists, both vocal (male and female) and instrumental (Veena, Violin, Mandolin and Saxophone). The dataset consists of 170 tunes from across 10 ragas, with at least 10 tunes in each raga (except Ananda Bhairavi with 9 tunes). The duration of each tune averages 1 minute. The tunes are converted to mono-channel, 22.05 kHz sampling rate, 16-bit PCM. The composition of the dataset is shown in Table II.

TABLE II: Composition of the CompMusic dataset.

Raga             | Total tunes | Average duration (sec) | Composition of tunes
Abheri           | 11          | 61.3                   | 6 vocal, 5 instrumental
Abhogi           | 10          | 62.0                   | 5 vocal, 5 instrumental
Ananda Bhairavi  | 09          | 64.7                   | 4 vocal, 5 instrumental
Arabhi           | 10          | 64.9                   | 8 vocal, 2 instrumental
Atana            | 21          | 56.7                   | 12 vocal, 9 instrumental
Begada           | 17          | 61.1                   | 9 vocal, 8 instrumental
Behag            | 14          | 59.7                   | 12 vocal, 2 instrumental
Bilhari          | 13          | 61.3                   | 10 vocal, 3 instrumental
Hamsadwani       | 41          | 57.0                   | 14 vocal, 27 instrumental
Hindolam         | 24          | 60.0                   | 15 vocal, 9 instrumental

B. Results

We initially conducted experiments on our dataset. We implemented the feature extraction procedure for ragas as described in [2]. Polyphonic audio signals are converted to the predominant melody using melody extraction software [11]. Pitch-class profiles and n-grams are extracted as explained in Section II. We randomly divide the dataset into training and testing sets, so that half is used for training and the other half for testing. We conducted 10 trials with random training and testing sets and report the mean accuracy. We compare our approach with the approach proposed in [2]. The authors in [2] calculated the pitch-class profile in multiple ways, namely P1, P2 and P3, based on whether only stable regions are considered and on the weighting of the bins. In P1 and P2, only stable regions of the pitch class are considered, while in P3 all the regions are considered. The difference between P1 and P2 lies in the type of weighting of the bins: in P1, a note bin is weighted by the number of instances of the note, and in P2 by the total duration over all instances of the note in the music piece. In [2], a k-NN classifier with KL-divergence distance is used as the classifier. For our approach, we report results using n = 2, 3 and 4 grams while calculating the n-gram kernel K2. We selected the values of α1 and α2 as 5 and 3 respectively based on cross-validation errors, and set the RBF parameter σ = 0.01 and C = 5 through a grid search using cross-validation errors. Results are shown in Table IV and the best results for both methods are shown in Table III. It is clear that our approach, which combines pitch-class profiles and n-gram histograms of notes, achieves superior performance compared to [2], where only the pitch-class profile is used.
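The cross-validation based selection of α1, α2, σ and C described above can be sketched roughly as follows. The candidate grids, the five-fold split and the reuse of the hypothetical kernel_ngram helper from the earlier sketch are our own assumptions rather than the authors' exact protocol.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def select_hyperparameters(K1, phi_list, y, train,
                           alphas=(1, 3, 5, 10), sigmas=(0.005, 0.01, 0.05),
                           Cs=(1, 5, 10), n_folds=5):
    """Pick (alpha1, alpha2, sigma, C) by cross-validated accuracy on the
    training split, mirroring the grid search described in Section IV-B.
    K1: precomputed pitch-class profile kernel over all tunes;
    phi_list: per-tune concatenated k-gram histograms; train: index array."""
    y = np.asarray(y)
    best, best_acc = None, -1.0
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for a1, a2, sigma, C in product(alphas, alphas, sigmas, Cs):
        K = a1 * K1 + a2 * kernel_ngram(phi_list, sigma=sigma)
        accs = []
        for tr, va in cv.split(train, y[train]):
            tr_idx, va_idx = train[tr], train[va]
            clf = SVC(kernel="precomputed", C=C)
            clf.fit(K[np.ix_(tr_idx, tr_idx)], y[tr_idx])
            pred = clf.predict(K[np.ix_(va_idx, tr_idx)])
            accs.append(np.mean(pred == y[va_idx]))
        if np.mean(accs) > best_acc:
            best, best_acc = (a1, a2, sigma, C), np.mean(accs)
    return best, best_acc
```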
TABLE III: Comparison of the best results of the approach of [2] with our approach on the two datasets (%).

Method             | Our dataset | CompMusic dataset
Koduri et al. [2]  | 96.3        | 73.2
Our approach       | 97.3        | 83.39

TABLE IV: Comparison of the performance of the approach of [2] for various pitch-class profiles with our approach on our dataset (%).

Method                  | 1-NN  | 3-NN | 5-NN  | 7-NN
P1 [2]                  | 96.1  | 95.0 | 93.67 | 90.17
P2 [2]                  | 96.3  | 96.0 | 95.5  | 94.0
P3 (12 bins) [2]        | 94.2  | 86.2 | 83.2  | 79.5
P3 (24 bins) [2]        | 94.0  | 85.0 | 79.6  | 71.5
P3 (36 bins) [2]        | 94.8  | 92.3 | 88.5  | 83.5
P3 (72 bins) [2]        | 94.8  | 90.3 | 81.8  | 75.8
P3 (240 bins) [2]       | 92.5  | 83.0 | 84.8  | 70.2
Our approach (2-gram)   | 96.0
Our approach (3-gram)   | 97.7
Our approach (4-gram)   | 97.3

In another experiment, we tested our approach on the CompMusic dataset, using the same experimental procedure as described above. Results are shown in Table V, with the best results shown in Table III. The results clearly demonstrate the superiority of our approach: the best accuracy obtained by our approach is 83.39%, which is higher than the best reported accuracy of 73.2%. This agrees with our intuition that including the temporal information of a raga along with pitch-class profiles improves the performance.

TABLE V: Comparison of the performance of the approach of [2] for various pitch-class profiles with our approach on the CompMusic dataset (%).

Method                  | 1-NN | 3-NN  | 5-NN | 7-NN
P1 [2]                  | 51.3 | 48.6  | 52.6 | 51.2
P2 [2]                  | 73.2 | 72.1  | 71.4 | 67.9
P3 (12 bins) [2]        | 72.0 | 68.1  | 69.2 | 68.9
P3 (24 bins) [2]        | 71.3 | 67.7  | 69.2 | 66.0
P3 (36 bins) [2]        | 68.8 | 65.91 | 66.5 | 64.3
P3 (72 bins) [2]        | 67.2 | 63.8  | 64.8 | 61.0
P3 (240 bins) [2]       | 63.6 | 57.9  | 58.5 | 57.2
Our approach (2-gram)   | 79.43
Our approach (3-gram)   | 81.32
Our approach (4-gram)   | 83.39

C. Effectiveness of n-grams and Pitch-class profiles:

In order to understand the effectiveness of n-grams and pitch-class profiles in identifying a raga, we performed an experiment considering each of these features individually and together. Table VI shows the results of this experiment. It is clear that both features provide a vital clue about a raga, and combining them improves the performance.

TABLE VI: Improvement in classification performance due to the various kernels (%).

Feature | SVM with kernel K1 | SVM with kernel K2 | SVM with kernel K = α1K1 + α2K2
2-gram  | 70.51              | 57.34              | 79.43
3-gram  | 70.51              | 60.25              | 81.32
4-gram  | 70.51              | 63.41              | 83.39

Our initial attempt has demonstrated the utility of combining two relevant features of music by defining kernels popular in machine learning. There is much more to achieve before we obtain a reliable raga recognition system.

V. CONCLUSION

In this paper, we looked into the problem of raga identification in Indian Carnatic music. Based on the observation that existing methods are based on either pitch-class profiles or n-gram histograms of notes but not both, we incorporated both of them in a multi-class SVM framework by linearly combining two kernels. Each of these kernels captures the similarity of ragas based on pitch-class profiles and n-gram histograms of notes, respectively. This is in contrast to previous pitch-class profile based approaches, where the temporal information of notes is ignored. We evaluated our proposed approach on the CompMusic dataset and our own dataset and showed that combining the clues from pitch-class profiles and the n-gram histogram indeed improves the performance.

ACKNOWLEDGMENT

We sincerely thank Shrey Dutta, IIT Madras, for providing many critical inputs and insightful comments regarding Indian Carnatic music and its characteristics. We also thank Gopala Koduri, Music Technology Group, Universitat Pompeu Fabra, for suggestions and for providing the CompMusic dataset and their code for comparisons. Vijay Kumar and Harit Pandya are supported by a TCS research fellowship.

REFERENCES

[1] Pranay Dighe, Parul Agrawal, Harish Karnick, Siddartha Thota and Bhiksha Raj, Scale independent raga identification using chromagram patterns and swara based features, IEEE International Conference on Multimedia and Expo Workshops, 2013.
[2] Gopala Krishna Koduri, Sankalp Gulati and Preeti Rao, A Survey of Raaga Recognition Techniques and Improvements to the State-of-the-Art, Sound and Music Computing, 2011.
[3] Honglak Lee, Peter Pham, Yan Largman and Andrew Y. Ng, Unsupervised feature learning for audio classification using convolutional deep belief networks, Neural Information Processing Systems, 2009.
[4] Gaurav Pandey, Chaitanya Mishra and Paul Ipe, Tansen: A System For Automatic Raga Identification, Indian International Conference on Artificial Intelligence, 2003.
[5] Vignesh Ishwar, Shrey Dutta, Ashwin Bellur and Hema A. Murthy, Motif Spotting in an Alapana in Carnatic Music, International Society for Music Information Retrieval, 2013.
[6] Preeti Rao and Anand Raju, Building a melody retrieval system, National Conference on Communications, 2002.
[7] Rajeswari Sridhar and T. V. Geetha, Raga Identification of Carnatic music for Music Information Retrieval, International Journal of Recent Trends in Engineering, 2009.
[8] S. Shetty and K. Achary, Raga Mining of Indian Music by Extracting Arohana-Avarohana Pattern, International Journal of Recent Trends in Engineering, 2009.
[9] S. Shetty and K. Achary, Raga Identification of Carnatic music for Music Information Retrieval, International Journal of Recent Trends in Engineering, 2009.
[10] P. Chordia and A. Rae, Raag recognition using pitch-class and pitch-class dyad distributions, International Society for Music Information Retrieval, 2007.
[11] http://essentia.upf.edu
[12] Gordon N. Swift, Ornamentation in South Indian Music and the Violin, Journal of the Society for Asian Music, 1990.
[13] J. Salamon and E. Gomez, Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics, IEEE Transactions on Audio, Speech and Language Processing, 2012.
[14] Mehmet Gonen and Ethem Alpaydin, Multiple Kernel Learning Algorithms, Journal of Machine Learning Research, 2011.
[15] Subhransu Maji, Alexander C. Berg and Jitendra Malik, Efficient Classification for Additive Kernel SVMs, Pattern Analysis and Machine Intelligence, 2012.
[16] Lie Lu, Hong-Jiang Zhang and Stan Z. Li, Content-based audio classification and segmentation by using support vector machines, Multimedia Systems, 2003.
[17] Na Yang, Rajani Muraleedharan, JoHannah Kohl, Ilker Demirkol, Wendi Heinzelman and Melissa Sturge-Apple, Speech-based Emotion Classification Using Multiclass SVM with Hybrid Kernel and Thresholding Fusion, IEEE Workshop on Spoken Language Technology, 2012.
[18] C. Joder, S. Essid and G. Richard, Alignment Kernels for Audio Classification with application to Music Instrument Recognition, European Signal Processing Conference, 2008.
[19] Pranay Dighe, Harish Karnick and Bhiksha Raj, Swara Histogram Based Structural Analysis And Identification Of Indian Classical Ragas, International Society for Music Information Retrieval, 2013.
[20] Mandayam Bharati Vedavalli, Sangita Sastra Sangraha: A Guide to the Theory of Indian Music, p. 25.
[21] Bruno Nettl and Melinda Russell, In the Course of Performance: Studies in the World of Musical Improvisation, Chapter 10, p. 219.
[22] Felix Weninger, Noam Amir, Ofer Amir, Irit Ronen, Florian Eyben and Björn Schuller, Robust feature extraction for automatic recognition of vibrato singing in recorded polyphonic music, International Conference on Acoustics, Speech and Signal Processing, 2012.
[23] Umut Simsekli, Automatic Music Genre Classification Using Bass Lines, International Conference on Pattern Recognition, 2010.
[24] Zhouyu Fu, Guojun Lu, Kai Ming Ting and Dengsheng Zhang, Learning Naive Bayes Classifiers for Music Classification and Retrieval, International Conference on Pattern Recognition, 2010.
[25] Kamelia Aryafar, Sina Jafarpour and Ali Shokoufandeh, Automatic musical genre classification using sparsity-eager support vector machines, International Conference on Pattern Recognition, 2012.