
AI-Based Vocal Judging Application

Pasan A. Silva1, Rasika Ranaweera2

1 Tecxick Software Solutions, Pokunuvita, 12404, Sri Lanka
chpasilva@tecxick-soft.com
2 NSBM Green University, Mahahenwatta, Pitipana, Homagama, 10200, Sri Lanka
ranaweera.r@nsbm.ac.lk

ABSTRACT

Some contestants in musical contests and singing competitions lack a proper coach, and there may be misjudgments by the judges of the competitions. To overcome that, it is possible to apply AI and Deep Learning to get a computer to give a reasonable average score for a performance by analyzing its frequencies. In this research, a possible solution for this problem is described. The objective of this research is to help judges or individual users give a fair and more accurate grading for songs. The introduced solution uses the FFT algorithm as the main method to analyze the frequencies, and it produces a final average grade by analyzing the Pitch, Musical Notations, and Tempo of the performance. The accuracy of the analyzing system is about 97% at the current time. This may vary with the selection of the separation, frequency analysis, and MIDI note detection algorithms. The detections are evaluated by comparing them to FL Studio, a software package used for professional music development.

1. INTRODUCTION

When it comes to musical contests and singing contests, some contestants do not have a proper coach, and if there are multiple judges on a judging panel, they sometimes misjudge and grade based on the appearance of the contestant without caring about the actual singing performance (Forbes, 1994; Wilson, 1930). There are also incidents where contestants bribe the judges. If we can introduce an AI-based platform, running on a computer, that can analyze the voice and give a judgement equivalent to one given by a judge, it will select the genuinely best competitors without any personal interests. But since it runs on a computer, a machine without emotional intelligence, some judging factors must still be handled by a human judge.

Most singing competitions judge the competitors by Vocal Techniques, Rhythm and Tempo, Timing, and Harmony. In this application, the Tone/Pitch and Vocal Techniques will be detected and graded as a reference for the judge. Expressions are also a main part of a judging contest.

The AI-Based Music Analyzing Application is a tool that can be used by individuals and music judges to get a computer-generated score for a performance, giving a better understanding of a singer or competitor with respect to vocals and singing. These results are based on the following technologies and factors: Judging Factors, Vocal Techniques, Pitch Identification Algorithms, and FFT Algorithms.

Vocal Techniques are a huge variety of techniques used by singers, chiefly Breath Control, Humming, Diction, Pitch, and Tone. These items will be used to judge the
619
ANNUAL INTERNATIONAL CONFERENCE ON BUSINESS INNOVATION (ICOBI) 2022

singing quality of a singer. These Vocal Techniques can be detected using a spectrum graph with frequencies. Vocal separation and technique detection are possible with this generated frequency data.

The Fast Fourier Transform (FFT) algorithm can be used for audio processing requirements. When working with sound waves, the FFT helps to convert the input wave signal into a frequency graph that can be used to determine the Pitch, Harmony, Notes, and other music-related data (Bharathi et al., 2011; Hastuti et al., 2019; Krause et al., 2021; Kumaraswamy et al., 2019; Pandiaraj et al., 2011; Yang et al., 2012). This transformed data is helpful when identifying Frequency Peaks.
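As a concrete illustration of this step, the short sketch below (our own, not the paper's code) uses NumPy's FFT to find the dominant frequency peak of a mono signal:

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the strongest FFT peak in a mono signal."""
    spectrum = np.abs(np.fft.rfft(signal))            # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]                 # bin with the highest peak

# A pure 440 Hz sine (concert A) should peak at 440 Hz.
sr = 22050
t = np.arange(sr) / sr                                # one second of samples
a4 = np.sin(2 * np.pi * 440 * t)
print(round(dominant_frequency(a4, sr)))              # -> 440
```

On real recordings the spectrum of a short analysis window would be taken per frame, but the peak-picking idea is the same.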

2. LITERATURE REVIEW

These values can be plotted in a spectrum for a better understanding. The data can be used to analyze the pitch and other vocal techniques to obtain a better grading. These spectrums can also be used to identify the beat and the tempo of a song (Tiwari et al., 2010). These findings help the current project to determine a better grade.

Figure 1. Separated song spectrums: the first is the raw song, the second the separated instruments, and the third the separated vocals.

The REPET-SIM method identifies the repeating background audio using REPET (REpeating Pattern Extraction Technique) and a similarity matrix, and this identified data is used to subtract the background from the foreground (vocals) (Drugman et al., 2018; Rafii et al., 2010). With that, a more accurate separation can be done, as shown in Figure 1. This is one of the better solutions among open-source and free options. Although it is not as accurate as the paid libraries, it helps the vocal judging system gather enough data for most recordings with a wide variety of vocals. This method is available in the Librosa library.

To detect the pitch of the instrumental track and the vocal track, a template-based recognition algorithm will be used. This template-based recognition system uses chord templates from audio, speech, instruments, and other pre-trained models. For the time being, the selected pitch detection algorithm is the one in Librosa, for the reason that it is compatible with template-based chord detection.

Template-Based Chord Recognition is based on the hypothesis that only the chord definition is necessary to extract chord labels from a musical piece. A chord template is a 12-dimensional vector representing the 12 semitones (or chroma) of the chromatic scale (Oudre et al., 2011). This recognition will be used to identify the corresponding pitch for the instrumental and vocal tracks.

Musical notation detection can be done by comparing the frequency to the A4 concert pitch, which is 440 Hz (fn = 2^(n/12) * 440) (Cooper et al., 1996; Kumaraswamy et al., 2019; Scheerer et al., 2018; Yang et al., 2012). These data can be used to compare the notes on the two tracks. To find the correct octave, Equation 1 can be used.

On = log2(f2 / f1)    (1)

Equation 1 – Equation for calculating the octave of the MIDI notation. O is the octave, f2 is the upper frequency, and f1 is the lower frequency.
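Equation 1 reduces to a one-line computation; the helper below is an illustrative sketch (the function name is our own):

```python
import math

def octave_distance(f_lower, f_upper):
    """Octave distance between two frequencies (Equation 1): O = log2(f2 / f1)."""
    return math.log2(f_upper / f_lower)

print(octave_distance(440.0, 880.0))   # A4 -> A5 is exactly one octave: 1.0
print(octave_distance(440.0, 1760.0))  # A4 -> A6 is two octaves: 2.0
```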

621
ANNUAL INTERNATIONAL CONFERENCE ON BUSINESS INNOVATION (ICOBI) 2022

To calculate the MIDI notation, Equation 2 can be used.

m = 12 * log2(fm / 440 Hz) + 69    (2)

Equation 2 – Calculation of the MIDI notation using the concert pitch and a frequency. m is the MIDI note, fm is the frequency whose MIDI note is to be found.
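Equation 2 can be sketched as a small helper; the note-naming function is our own addition and assumes the common convention that MIDI note 60 is C4:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_midi(freq_hz):
    """Equation 2: m = 12 * log2(f / 440 Hz) + 69, rounded to the nearest MIDI note."""
    return round(12 * math.log2(freq_hz / 440.0) + 69)

def midi_to_name(midi):
    """Human-readable name, e.g. 69 -> 'A4' (convention: C4 = MIDI 60)."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

print(frequency_to_midi(440.0), midi_to_name(69))     # 69 A4
print(frequency_to_midi(261.63), midi_to_name(60))    # 60 C4
```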
In musical programs and competitions, the song will be recorded into multiple stems (Rovito, 2016). These stems are files that contain separate recordings such as individual instruments (drums, harmony, guitar, etc.). Usually, these stem files are in the original form in which they were recorded, so they take up more space on disk. These stem files can be used for enhanced detection and more accurate analysis.

3. METHODOLOGY

To detect the pitch of the instrumental and the vocal track, the targeted method is to apply Template-Based Chord Recognition to the track using the FFT algorithm. If the user does not have stem files for the vocal and instrumental data, the REPET-SIM method can be used to separate the vocal data from the raw audio file. With this method it is possible to get a chroma-based feature representation that can be used to apply template-based pattern matching (Bartsch and Wakefield, 2005). This data is useful to compare the tracks for overlapping pitch changes and chord mismatches, and to give a grade for the pitch difference. These comparisons can be converted to graphical formats such as charts to give the user a better understanding of the pitch comparison. The comparison is shown in Figures 2, 3, and 4.

Figure 2. Detected vocal pitch.

Figure 3. Detected instrumental pitch.

Figure 4. Pitch comparison of the detected values.

These generated pitch ranges can be used to identify the musical notation. This can be done by comparing the frequency to the concert A (440 Hz) (Firth, n.d.; Jha et al., 2022; Yang et al., 2012). These results can be compared with the song data, and the similar notations can be used to detect the frequency changes on the instrumental track and the vocal track.

To minimize the error generated by background noise, a noise gate can be added to the detected values array. It is also possible to use a noise suppressor such as Krisp to minimize the noise recorded in the audio (Carlisle, 2020).

Using the musical notation data detected by the above methods, it is possible to compare the vocal and instrumental tracks to see how well the singer performed in the pitch balance of the song. This grading is one of the main grading components of the system.

This data is detected by taking the difference of the matching musical notation between the tracks. The next stage is to determine whether the unmatched notations match a musical chord. If not, that is deducted from the total grade. It is also possible to get a grading from the matching musical chords (Figure 5).
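The chord-matching step described above can be sketched with the 12-dimensional chord templates of Oudre et al. (2011); the triad-only template set and the correlation scoring below are our simplifying assumptions, not the system's actual chord database:

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary 12-dimensional templates for the 24 major/minor triads."""
    templates = {}
    for root in range(12):
        for quality, third in (("maj", 4), ("min", 3)):
            vec = np.zeros(12)
            vec[[root, (root + third) % 12, (root + 7) % 12]] = 1.0  # root, third, fifth
            templates[f"{NOTE_NAMES[root]}:{quality}"] = vec
    return templates

def best_chord(chroma):
    """Label a chroma vector with the template of highest normalized correlation."""
    chroma = np.asarray(chroma, dtype=float)
    scores = {name: chroma @ t / (np.linalg.norm(chroma) * np.linalg.norm(t) + 1e-9)
              for name, t in chord_templates().items()}
    return max(scores, key=scores.get)

# Chroma energy on C, E, G only -> a C major triad.
c_major = np.zeros(12)
c_major[[0, 4, 7]] = 1.0
print(best_chord(c_major))  # -> C:maj
```

A frame whose unmatched notes still correlate strongly with some template could then be counted toward the chord-based part of the grade rather than penalized.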


Figure 5. Sample output of matching musical notation detected with the CREPE algorithm.

Since singing-technique detection is still at the research level (Krause, Müller and Weiß, 2021; D'Amario, Daffern and Bailes, 2018; Sheldon, 1998), using it in this application would reduce the total accuracy of the grading mechanism used in this implementation. To reduce that factor, the current implementation uses an advanced analysis of the detected pitch of the vocal track: Perfect Frequency Match Occurrences, Musical Chords Comparison, Total Pitchouts, and Matched Musical Notations.

After the analysis mentioned above, the system calculates an average grading for the detected changes and the difference from the perfect match. This helps to get a more accurate grade for the song analysis. A sample analysis is shown in Figure 6, using the song Burn by Ellie Goulding1.

Figure 6. Sample output of the advanced analysis of the frequency data.

As for the grading algorithm, the system generates an average grade based on the advanced analysis of pitch and tempo balance.

To detect the tempo of each track, the system checks for frequency peaks and clicks in the song file and takes the median of these peaks and clicks. This is also known as the Rhythm Histogram (Folorunso et al., 2022; Schuller et al., 2008). To calculate the grade for the tempo, Equation 3 is used.

Tempo Diff = tempo − vtempo    (3)

Equation 3 – Formula to calculate the tempo difference.

In Equation 3, tempo is the detected tempo value from the instrumental track and vtempo is the detected tempo value from the vocal track. It is possible to detect the tempo using the frequency peaks as beat times and to get an average tempo by calculating the average time between the beats (Schuller et al., 2008; Yang et al., 2012). These values can be used to get a grade out of 100 as the first grading step.

For the pitch comparison, an absolute comparison of the vocal and instrumental tracks alone is not the most accurate method. The algorithm must also look at the matching chords and overlapping musical notes.

Perfect Matches are calculated by comparing the instrumental frequency value and the filtered vocal frequency value. The system subtracts the vocal frequency value from the instrumental frequency, flips it to a positive number if it is negative, and checks whether the difference is less than 0.5. This 0.5 works like the border of a note, since frequencies typically differ by about 0.5 around each note (Faghih et al., 2022; Krause et al., 2021; Scheerer et al., 2018).

To calculate the Matched Notes, the vocal and instrumental frequencies are first converted into MIDI notes using the MIDI note conversion equation. After that conversion, the system compares them for equal values; if they are equal, it is a Matched Note. This is the second stage of pitch grading.

Matching Chords are calculated from the musical notation detected above. This

1 https://www.youtube.com/watch?v=CGyEd0aKWZE


will search through a database that contains musical chords with their notations and look for similar ones. If the singer sang according to the musical chord, this adds a point to the grade. This is the third step of the overall grading system.

Pitchout Comparison is the last step of the grading mechanism; it checks for the pitchouts and range-outs of the vocal track. For this value, instead of an average pitchout frequency, it checks how many pitchouts the singer made in the full song, and those marks are deducted from the overall grading. This is the last step of the Advanced Grading Mechanism.

After calculating all the steps mentioned above, the overall grading is calculated by taking an average of the above values. The system uses the formula below to get a score out of the analysed values:

Score = ((10 − tpm) if 0 < tpm ≤ 10 else 0)
      + ((10 − tmn) if 0 < tmn ≤ 10 else 0)
      + ((10 − tmc) if 0 < tmc ≤ 10 else 0)
      − (tpo if 0 < tpo < 10 else 10)    (4)

Equation 4 – Formula used to get the score out of the analysed values from the pitch detection.
tpm = Total Perfect Matches
tmn = Total Matched Notes
tmc = Total Matching Chords
tpo = Total Pitch Outs
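The comparison counts and the score can be sketched as follows; the function names and the frame-level frequency inputs are illustrative assumptions, and the conditional terms follow our reading of Equation 4 as printed:

```python
import math

def frequency_to_midi(freq_hz):
    # Equation 2: nearest MIDI note for a frequency.
    return round(12 * math.log2(freq_hz / 440.0) + 69)

def count_perfect_matches(instr_freqs, vocal_freqs):
    # A "Perfect Match": absolute frequency difference below the 0.5 note border.
    return sum(1 for fi, fv in zip(instr_freqs, vocal_freqs) if abs(fi - fv) < 0.5)

def count_matched_notes(instr_freqs, vocal_freqs):
    # A "Matched Note": both frequencies round to the same MIDI note.
    return sum(1 for fi, fv in zip(instr_freqs, vocal_freqs)
               if frequency_to_midi(fi) == frequency_to_midi(fv))

def score(tpm, tmn, tmc, tpo):
    # Equation 4: bonus terms for the match counts, a deduction for pitch-outs.
    s = (10 - tpm) if 0 < tpm <= 10 else 0
    s += (10 - tmn) if 0 < tmn <= 10 else 0
    s += (10 - tmc) if 0 < tmc <= 10 else 0
    s -= tpo if 0 < tpo < 10 else 10
    return s

print(count_perfect_matches([440.0, 523.25], [440.2, 600.0]))  # -> 1
print(count_matched_notes([440.0, 523.25], [440.2, 600.0]))    # -> 1
print(score(5, 5, 5, 2))                                       # -> 13
```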
After getting the score, the system calculates an average from the computed scores. This is the user's average grade for the pitch detection.

4. DISCUSSION

This new approach to analyzing songs and assigning a performance score is evaluated by manually comparing it with FL Studio's individual plugins.


The first comparison is of the generated spectrums, shown in Figures 7 and 8.

Figure 7. Spectrum of the song Burn from the 20 kHz, 8192-resolution monitor in FL Studio (first 125 frames).

Figure 8. Spectrum of the full song mentioned above from the algorithm, with an 8192 Hz, 558-resolution monitor.

The next testable values were the musical notations of the songs. As these form a larger dataset and FL Studio only supports real-time feedback (Febrian et al., 2020), the data were tested by checking whether the chord shown on the FL Studio monitor matched the musical notation detected by the algorithm at a specific time frame. Figures 9 and 10 show the comparison of both methods.

Figure 9. MIDI note of the 90th time frame detected by FL Studio.

Figure 10. MIDI note detected by the algorithm for the matching time frame.

For determining the accuracy of tempo detection, it was validated against the manual tempo tap of FL Studio. Samples are shown in Figures 11 and 12.

Figure 11. Tempo of the song Happier2 from the FL Studio Tempo Tapper.

Figure 12. Tempo detected by the algorithm.

From these factors we get an average accuracy of 97% on the pitch, tempo, and frequency detection.
5. CONCLUSION
An AI-based vocal judging system will help individual singers and judges get a better understanding of how they perform and give a reasonable score for the performance. The limitation of this system is emotional intelligence; for that judging factor, a human judge is still needed. The overall objective was successfully achieved and addressed. The general accuracy of this system will depend on the algorithms used to detect pitch, tempo, and frequencies.

2 https://www.youtube.com/watch?v=m7Bc3pLyij0


REFERENCES

Bharathi, V., Asaph Abraham, A., & Ramya, R. (2011). Vocal pitch detection for musical transcription. 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies (ICSCCN-2011), 724–726. doi: 10.1109/ICSCCN.2011.6024645
Carlisle, M. (2020). Krisp. New York: Time Incorporated, 196, 0–100.
Cooper, D., & Ng, K. C. (1996). A monophonic pitch-tracking algorithm based on waveform periodicity determinations using landmark points. Computer Music Journal, 20(3).
Drugman, T., Huybrechts, G., Klimkov, V., & Moinet, A. (2018). Traditional machine learning for pitch detection. IEEE Signal Processing Letters, 25(11), 1745–1749. doi: 10.1109/LSP.2018.2874155
Faghih, B., Chakraborty, S., Yaseen, A., & Timoney, J. (2022). A new method for detecting onset and offset for singing in real-time and offline environments. Applied Sciences, 12(15), 7391. doi: 10.3390/app12157391
Febrian, A., Rante, H., Sukaridhoto, S., & Alimudin, A. (2020). Music scoring for film using Fruity Loops Studio. E3S Web of Conferences, 188. doi: 10.1051/e3sconf/202018800004
Firth, M. (n.d.). Minimising latency of pitch detection algorithms for live vocals on low-cost hardware.
Folorunso, S. O., Afolabi, S. A., & Owodeyi, A. B. (2022). Dissecting the genre of Nigerian music with machine learning models. Journal of King Saud University - Computer and Information Sciences, 34(8), 6266–6279. doi: 10.1016/j.jksuci.2021.07.009
Hastuti, K., Syarif, A. M., Fanani, A. Z., & Mulyana, A. R. (2019). Natural automatic musical note player using time-frequency analysis on human play. Telkomnika (Telecommunication Computing Electronics and Control), 17(1), 235–245. doi: 10.12928/TELKOMNIKA.v17i1.11606
Jha, A., Gupta, S., Dubey, P., & Chhabria, A. (2022). Music feature extraction and recommendation using CNN algorithm. ITM Web of Conferences, 44, 03026. doi: 10.1051/itmconf/20224403026
Krause, M., Müller, M., & Weiß, C. (2021). Singing voice detection in opera recordings: A case study on robustness and generalization. Electronics (Switzerland), 10(10). doi: 10.3390/electronics10101214
Kumaraswamy, B., & Poonacha, P. G. (2019). Octave error reduction in pitch detection algorithms using Fourier series approximation method. IETE Technical Review, 36(3), 293–302. doi: 10.1080/02564602.2018.1465859
Oudre, L., Févotte, C., & Grenier, Y. (2011). Probabilistic template-based chord recognition. IEEE Transactions on Audio, Speech and Language Processing, 19(8), 2249–2259. doi: 10.1109/TASL.2010.2098870
Pandiaraj, S., Gloria, L., Keziah, N. R., Vynothini, S., & Kumar, K. R. S. (2011). A proficient vocal training system with pitch detection using SHR. ICECT 2011 - 3rd International Conference on Electronics Computer Technology, 3, 303–307. doi: 10.1109/ICECTECH.2011.5941760
Rafii, Z., & Pardo, B. (2010). REPET-SIM for singing voice separation.
Rovito, M. (2016). How-to: Make multitrack stems with NI Stem Creator. Electronic Musician, 32(1). Retrieved from https://www.proquest.com/magazines/how-make-multitrack-stems-with-ni-stem-
Scheerer, N. E., & Jones, J. A. (2018). Detecting our own vocal errors: An event-related study of the thresholds for perceiving and compensating for vocal pitch errors. Neuropsychologia, 114, 158–167. doi: 10.1016/j.neuropsychologia.2017.12.007
Schuller, B., Eyben, F., & Rigoll, G. (2008). Tango or Waltz?: Putting ballroom dance style into tempo detection. EURASIP Journal on Audio, Speech, and Music Processing, 2008. doi: 10.1155/2008/846135
Tiwari, M. D., Tripathi, R. C., Agrawal, A., & Association for Computing Machinery. (2010). Proceedings of the First International Conference on Intelligent Interactive Technologies & Multimedia (IITM 2010), December 27–30, 2010, Indian Institute of Information Technology, Allahabad, India. ACM IIIT Allahabad Chapter.
Yang, R., Bian, J., & Xiong, L. (2012). Frequency to MIDI converter for musical instrument microphone system. 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet 2012), 2597–2599. doi: 10.1109/CECNet.2012.6201463
