AI-Based Vocal Judging Application
¹ Tecxick Software Solutions, Pokunuvita, 12404, Sri Lanka — chpasilva@tecxick-soft.com
² NSBM Green University, Mahahenwatta, Pitipana, Homagama, 10200, Sri Lanka — ranaweera.r@nsbm.ac.lk
ANNUAL INTERNATIONAL CONFERENCE ON BUSINESS INNOVATION (ICOBI) 2022
singing quality of a singer. These techniques can be detected using a spectrum graph with frequencies. Vocal separation and technique detection will be possible with this generated frequency data.

The Fast Fourier Transform (FFT) algorithm can be used for audio processing requirements. When working with sound waves, the FFT algorithm helps to convert the input wave signal into a frequency graph that can be used to determine the pitch, harmony, notes, and other music-related data (Bharathi et al., 2011; Hastuti et al., 2019; Krause et al., 2021; Kumaraswamy et al., 2019; Pandiaraj et al., 2011; Yang et al., 2012). This transformed data will be helpful when identifying frequency peaks.

2. LITERATURE REVIEW

These values can be plotted in a spectrum for a better understanding. The data can be used to analyze the pitch and other vocal techniques to produce a better grading. The spectrums can also be used to identify the beat and the tempo of a song (Tiwari et al., 2010). These findings help the current project to determine a better grade.

The REPET-SIM method identifies the repeating background audio using REPET (Repeating Pattern Extraction Technique) and a similarity matrix. The identified data is then used to subtract the background from the foreground, i.e. the vocals (Drugman et al., 2018; Rafii et al., 2010). With that, a more accurate separation can be achieved, as shown in Figure 1. This is a better solution among open-source and free options: although it is not as accurate as the paid libraries, it lets the vocal judging system gather enough data for most recordings with a wide variety of vocals. This method is used in the librosa library.

Figure 1. Separated song spectrums: the first is the raw song, the second the separated instruments, and the third the separated vocals.

To detect the pitch of the instrumental track and the vocal track, a template-based recognition algorithm will be used. This template-based recognition system uses chord templates from audio, speech, instruments, and other pre-trained models. For the time being, the selected vocal pitch detection algorithm is the librosa algorithm, because of its template-based chord detection compatibility.

Template-Based Chord Recognition is based on the hypothesis that only the chord definition is necessary to extract chord labels from a musical piece. A chord template is a 12-dimensional vector representing the 12 semitones (or chroma) of the chromatic scale (Oudre et al., 2011). This recognition will be used to identify the corresponding pitch for the instrumental and vocal tracks.

Musical notation detection can be done by comparing the frequency to the A4 concert pitch, which is 440 Hz (f_n = 2^(n/12) × 440) (Cooper et al., 1996; Kumaraswamy et al., 2019; Scheerer et al., 2018; Yang et al., 2012). These data can be used to compare the notes on the two tracks. To find the correct octave, Equation 1 can be used.

O_n = log2(f2 / f1)    (1)

Equation 1 – Equation for calculating the octave of the MIDI notation, where O_n is the octave, f2 is the upper frequency, and f1 is the lower frequency.

To calculate the MIDI notation, Equation 2 can be used.
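The note and octave relations above can be written as small helper functions. This is a minimal sketch: the frequency-to-MIDI formula shown is the standard A4 = MIDI 69 convention, assumed here because the paper's Equation 2 is cut off in the source, and the function names are illustrative.

```python
import math

A4 = 440.0  # concert pitch reference (Hz)

def frequency_of_semitone(n):
    """Frequency of the nth semitone relative to A4: f_n = 2^(n/12) * 440."""
    return 2 ** (n / 12) * A4

def octave_difference(f_upper, f_lower):
    """Equation 1: O_n = log2(f2 / f1)."""
    return math.log2(f_upper / f_lower)

def midi_note(freq):
    """Standard frequency-to-MIDI conversion (A4 = 69); assumed form of the
    paper's missing Equation 2."""
    return round(69 + 12 * math.log2(freq / A4))

print(frequency_of_semitone(12))        # 880.0 (A5, one octave above A4)
print(octave_difference(880.0, 440.0))  # 1.0
print(midi_note(261.63))                # 60 (middle C)
```

Twelve semitones double the frequency, so the octave formula and the semitone formula agree at n = 12.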
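The template-based chord recognition described above (Oudre et al., 2011) can be sketched with plain 12-bin chroma templates: each chord is a binary 12-dimensional vector, and a chroma frame is labeled with the best-correlating template. This is a minimal illustration of the idea, not librosa's actual implementation; the major-triad templates and all names are assumptions.

```python
import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def major_chord_templates():
    """Binary 12-dimensional templates: root, major third, perfect fifth."""
    templates = {}
    for root in range(12):
        t = np.zeros(12)
        t[[root, (root + 4) % 12, (root + 7) % 12]] = 1.0
        templates[NOTE_NAMES[root] + 'maj'] = t
    return templates

def match_chord(chroma):
    """Label a 12-bin chroma vector with the highest cosine-similarity template."""
    templates = major_chord_templates()
    scores = {name: np.dot(chroma, t) / (np.linalg.norm(chroma) * np.linalg.norm(t))
              for name, t in templates.items()}
    return max(scores, key=scores.get)

# A chroma frame with energy on C, E, and G should be labeled C major.
chroma = np.zeros(12)
chroma[[0, 4, 7]] = [1.0, 0.8, 0.9]
print(match_chord(chroma))  # 'Cmaj'
```

Only the chord definition is needed, exactly as the hypothesis in the text states; richer template sets (minor, seventh chords) extend the dictionary without changing the matching step.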
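The FFT step discussed in the literature review can be sketched with NumPy: transform the waveform into the frequency domain and read off the strongest peak. A minimal sketch for a single pure tone; real vocal tracks would need frame-by-frame analysis and peak tracking, and the function name is illustrative.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest peak in the FFT spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))               # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]                    # highest-energy bin

# One second of a 440 Hz sine at a 22050 Hz sample rate.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
print(dominant_frequency(tone, sr))  # 440.0
```

The peak of this spectrum is exactly the "frequency peak" the text relies on for pitch and, later, for beat detection.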
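The REPET-SIM idea above — find repeating background frames via a similarity matrix and subtract them from the mixture to leave the vocals — can be illustrated on a toy magnitude spectrogram. This is a simplified NumPy sketch of the principle, not the Rafii & Pardo algorithm or librosa's implementation; the parameters and synthetic data are assumptions.

```python
import numpy as np

def repet_sim_masks(S, k=5):
    """Split magnitude spectrogram S (freq bins x frames) into background
    and foreground: each frame's background is the median of its k most
    similar frames (cosine similarity), REPET-SIM style."""
    unit = S / (np.linalg.norm(S, axis=0, keepdims=True) + 1e-12)
    sim = unit.T @ unit                       # frame-to-frame similarity matrix
    np.fill_diagonal(sim, -np.inf)            # a frame cannot vote for itself
    background = np.empty_like(S)
    for j in range(S.shape[1]):
        nearest = np.argsort(sim[:, j])[-k:]  # k most similar frames
        background[:, j] = np.median(S[:, nearest], axis=1)
    background = np.minimum(background, S)    # background never exceeds mixture
    return background, S - background         # residual approximates the vocals

# Synthetic mixture: a repeating "instrumental" pattern in every frame,
# plus a one-frame "vocal" burst in bin 6 of frame 5.
pattern = np.array([1.0, 0.2, 0.8, 0.1, 0.6, 0.1, 0.1, 0.4])
S = np.tile(pattern[:, None], (1, 12))
S[6, 5] += 5.0
bg, fg = repet_sim_masks(S, k=5)
```

Because the burst does not repeat, the median over similar frames reconstructs only the pattern, so the subtraction isolates the burst — the same mechanism that separates non-repeating vocals from a repeating accompaniment.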
Since singing technique detection is still at the research level (Krause, Müller and Weiß, 2021; D'Amario, Daffern and Bailes, 2018; Sheldon, 1998), using it in this application would reduce the total accuracy of the grading mechanism used in this implementation. To reduce that factor, the current implementation of the application uses an advanced analysis of the detected pitch of the vocal track: Perfect Frequency Match Occurrences, Musical Chords Comparison, Total Pitchouts, and Matched Musical Notations.

After the analysis mentioned above, the system calculates an average grading for the detected changes and the difference from the perfect match. This helps to get a more accurate grade for the song analysis. A sample analysis using the song Burn by Ellie Goulding¹ is shown in Figure 6.

Figure 6. Sample output of the advanced analysis of the frequency data.

As for the grading algorithm, the system generates an average grade based on the advanced analysis of pitch and tempo balance.

To detect the tempo of each track, it checks for frequency peaks and clicks on the song file.

Tempo Diff = tempo − vtempo    (3)

Equation 3 – Formula to calculate the tempo difference.

In Equation 3, tempo is the detected tempo value from the instrumental track and vtempo is the detected tempo value from the vocal track. It is possible to detect the tempo by using the frequency peaks as beat times and obtaining an average tempo from the average time between beats (Schuller et al., 2008; Yang et al., 2012). These values can be used to get a grade out of 100 as the first grading step. For the pitch comparison, an absolute comparison of the vocal and instrumental tracks alone is not the most accurate method; the algorithm must also look at the matching chords and overlapping musical notes.

Perfect Matches are calculated by comparing the instrumental frequency value and the filtered vocal frequency value: the vocal frequency value is subtracted from the instrumental frequency, flipped to a positive number if negative, and checked against a threshold of 0.5. This 0.5 works like the border of the note, since neighbouring note frequencies typically differ by about 0.5 (Faghih et al., 2022; Krause et al., 2021; Scheerer et al., 2018).

To calculate the Matched Notes, the vocal and instrumental frequencies are first converted into MIDI notes using the MIDI note conversion equation. After that conversion, the values are compared for equality; if they are equal, it is a Matched Note. This is the second stage of pitch grading.

Matching Chords are calculated from the detected musical notation above. This

¹ https://www.youtube.com/watch?v=CGyEd0aKWZE
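The Perfect Match and Matched Note checks described above can be sketched as follows. The 0.5 border and the subtract-and-flip comparison follow the text; the frequency-to-MIDI formula (standard A4 = 69 convention) and all names are assumptions, since the paper's MIDI equation is cut off in the source.

```python
import math

A4 = 440.0

def to_midi(freq):
    """Standard frequency-to-MIDI conversion (A4 = 69); assumed form of the
    paper's MIDI conversion equation."""
    return round(69 + 12 * math.log2(freq / A4))

def grade_frames(inst_freqs, vocal_freqs, border=0.5):
    """Count Perfect Matches (absolute difference under the 0.5 border) and
    Matched Notes (equal MIDI numbers) across paired analysis frames."""
    perfect = sum(1 for fi, fv in zip(inst_freqs, vocal_freqs)
                  if abs(fi - fv) < border)          # subtract, flip, threshold
    matched = sum(1 for fi, fv in zip(inst_freqs, vocal_freqs)
                  if to_midi(fi) == to_midi(fv))     # second grading stage
    return perfect, matched

inst  = [440.0, 523.25, 329.63, 392.00]   # A4, C5, E4, G4
vocal = [440.3, 523.20, 335.00, 370.00]
print(grade_frames(inst, vocal))  # (2, 3)
```

Note how the third pair misses the perfect-match border but still rounds to the same MIDI note — exactly why the text uses note matching as a second, coarser grading stage.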
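The tempo steps above — treating frequency peaks as beat times, averaging the interval between consecutive beats, and taking the difference of Equation 3 — can be sketched as a short helper. The function names and beat timestamps are illustrative.

```python
def average_tempo(beat_times):
    """Estimate BPM by averaging the time between consecutive beat times."""
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval

def tempo_difference(instrumental_beats, vocal_beats):
    """Equation 3: Tempo Diff = tempo - vtempo."""
    return average_tempo(instrumental_beats) - average_tempo(vocal_beats)

inst_beats  = [0.0, 0.5, 1.0, 1.5, 2.0]   # steady 0.5 s spacing -> 120 BPM
vocal_beats = [0.0, 0.6, 1.2, 1.8, 2.4]   # steady 0.6 s spacing -> 100 BPM
print(tempo_difference(inst_beats, vocal_beats))  # ~20.0
```

A tempo difference near zero would then feed into the grade out of 100 that the text describes as the first grading step.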
4. DISCUSSION
This new approach to analyzing songs and assigning a performance score was evaluated by manually comparing its output with FL Studio's individual plugins.
² https://www.youtube.com/watch?v=m7Bc3pLyij0
REFERENCES

Bharathi, V., Asaph Abraham, A., & Ramya, R. (2011). Vocal pitch detection for musical transcription. 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN-2011), 724–726. doi: 10.1109/ICSCCN.2011.6024645

Carlisle, M. (2020). Krisp. New York: Time Incorporated, 196, 0–100.

Cooper, D., & Ng, K. C. (1996). A monophonic pitch-tracking algorithm based on waveform periodicity determinations using landmark points. In Music Journal, Autumn (Vol. 20, Issue 3). Retrieved from https://about.jstor.org/terms

Drugman, T., Huybrechts, G., Klimkov, V., & Moinet, A. (2018). Traditional machine learning for pitch detection. IEEE Signal Processing Letters, 25(11), 1745–1749. doi: 10.1109/LSP.2018.2874155

Faghih, B., Chakraborty, S., Yaseen, A., & Timoney, J. (2022). A new method for detecting onset and offset for singing in real-time and offline environments. Applied Sciences, 12(15), 7391. doi: 10.3390/app12157391

Febrian, A., Rante, H., Sukaridhoto, S., & Alimudin, A. (2020). Music scoring for film using Fruity Loops Studio. E3S Web of Conferences, 188. doi: 10.1051/e3sconf/202018800004

Firth, M. (n.d.). Minimising latency of pitch detection algorithms for live vocals on low-cost hardware.

Folorunso, S. O., Afolabi, S. A., & Owodeyi, A. B. (2022). Dissecting the genre of Nigerian music with machine learning models. Journal of King Saud University - Computer and Information Sciences, 34(8), 6266–6279. doi: 10.1016/j.jksuci.2021.07.009

Hastuti, K., Syarif, A. M., Fanani, A. Z., & Mulyana, A. R. (2019). Natural automatic musical note player using time-frequency analysis on human play. Telkomnika (Telecommunication Computing Electronics and Control), 17(1), 235–245. doi: 10.12928/TELKOMNIKA.v17i1.11606

Jha, A., Gupta, S., Dubey, P., & Chhabria, A. (2022). Music feature extraction and recommendation using CNN algorithm. ITM Web of Conferences, 44, 03026. doi: 10.1051/itmconf/20224403026

Krause, M., Müller, M., & Weiß, C. (2021). Singing voice detection in opera recordings: A case study on robustness and generalization. Electronics (Switzerland), 10(10). doi: 10.3390/electronics10101214

Kumaraswamy, B., & Poonacha, P. G. (2019). Octave error reduction in pitch detection algorithms using Fourier series approximation method. IETE Technical Review (Institution of Electronics and Telecommunication Engineers, India), 36(3), 293–302. doi: 10.1080/02564602.2018.1465859

Oudre, L., Févotte, C., & Grenier, Y. (2011). Probabilistic template-based chord recognition. IEEE Transactions on Audio, Speech and Language Processing, 19(8), 2249–2259. doi: 10.1109/TASL.2010.2098870

Pandiaraj, S., Gloria, L., Keziah, N. R., Vynothini, S., & Kumar, K. R. S. (2011). A proficient vocal training system with pitch detection using SHR. ICECT 2011 - 3rd International Conference on Electronics Computer Technology, 3, 303–307. doi: 10.1109/ICECTECH.2011.5941760

Rafii, Z., & Pardo, B. (2010). REPET-SIM for singing voice separation.

Rovito, M. (2016). How-to: Make multitrack stems with NI Stem Creator. Electronic Musician, 32(1). Retrieved from https://www.proquest.com/magazines/how-make-multitrack-stems-with-ni-stem-

Scheerer, N. E., & Jones, J. A. (2018). Detecting our own vocal errors: An event-related study of the thresholds for perceiving and compensating for vocal pitch errors. Neuropsychologia, 114, 158–167. doi: 10.1016/j.neuropsychologia.2017.12.007

Schuller, B., Eyben, F., & Rigoll, G. (2008). Tango or Waltz?: Putting ballroom dance style into tempo detection. Eurasip Journal