Performance Analysis and Scoring of The Singing Voice: JANUARY 2009
Oscar Mayor, Jordi Bonada, Alex Loscos (BMAT)
In this article we describe the approach we follow to analyze the performance of a singer singing a reference song. The idea is to rate the performance the same way a music tutor would, not only giving a score but also giving feedback on how the user has performed regarding expression, tuning and tempo/timing characteristics. We also discuss what visual feedback is relevant to the user. Segmentation at an intra-note level is done using an algorithm based on untrained HMMs with probabilistic models built out of a set of heuristic rules that determine regions and their probability of being expressive features. A real-time karaoke-like system is presented where a user can sing and simultaneously visualize feedback and results of the performance. The technology can be applied to a wide set of applications that range from pure entertainment to more serious, education-oriented uses.
INTRODUCTION
The singing voice is considered the most expressive
musical instrument. Singing and the expression of emotions
are strongly coupled, making it clearly distinguishable
when a singer performs in a sad, happy, tender, or
aggressive manner.
1 SYSTEM OVERVIEW
In our system, the analysis of the singing voice first
includes a note segmentation, which consists of aligning
the singing performance to a reference MIDI melody, and
then an expression segmentation, which is basically an
intra-note segmentation.
3 NOTE ALIGNMENT
We perform note segmentation with prior
knowledge of the reference MIDI melody the user is
singing.
[Figure: alignment of the performed note sequence to the reference MIDI note sequence (e.g. Gb, F, F, E)]
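As a rough illustration of this alignment, the sketch below maps a frame-wise pitch track onto the reference note sequence with a simple dynamic program (the cost function and interface are illustrative assumptions, not the paper's exact formulation):

```python
# Hypothetical sketch: align a frame-wise F0 track (in MIDI note numbers)
# to the reference MIDI note sequence with dynamic programming.
# Notes must be visited in order; cost is absolute pitch distance.

def align_notes(f0_frames, midi_notes):
    """Return, for each frame, the index of the reference note it maps to."""
    n, m = len(f0_frames), len(midi_notes)
    INF = float("inf")
    # cost[i][j]: best cost assigning frames 0..i with frame i on note j
    cost = [[INF] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = abs(f0_frames[0] - midi_notes[0])
    for i in range(1, n):
        for j in range(m):
            local = abs(f0_frames[i] - midi_notes[j])
            stay = cost[i - 1][j]                       # remain on note j
            advance = cost[i - 1][j - 1] if j > 0 else INF  # move to next note
            if stay <= advance:
                cost[i][j], back[i][j] = stay + local, j
            else:
                cost[i][j], back[i][j] = advance + local, j - 1
    # backtrack from the last reference note
    path = [m - 1]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

In the real system the per-frame cost would come from the HMM observation probabilities rather than raw pitch distance, but the order-constrained backtracking is the same idea.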
Once the best expression-type path has been chosen, the
most probable label for each expression segment must then
be estimated.
4 EXPRESSION CATEGORIZATION
Expression categorization and transcription of the
performance is also carried out using segmental
HMMs based on heuristic probabilistic models.
Expression paths are modeled as sequences of attack,
sustain, vibrato, release and transition states and their
possible connections. In addition, different labels can be
assigned to each state to distinguish between different
ways of performing. For instance, in the case of a transition,
some possible labels are scoop-up, portamento or
normal. These paths are considered by the expression
recognition module, and the path with the highest
probability among all candidates is chosen. The
probabilities are derived from heuristic rules applied to the
analysis descriptors; see [6] for details.
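A minimal sketch of such heuristic labeling, for the transition labels mentioned above (the descriptor names, weights and thresholds are illustrative assumptions, not the paper's actual rules):

```python
# Hypothetical sketch: assign the most probable label to a transition
# segment from heuristic scores on its analysis descriptors.

def label_transition(seg):
    """seg: dict with 'pitch_jump' (semitones between the two notes),
    'duration' (seconds) and 'overshoot' (semitones past the target)."""
    scores = {
        # scoop-up: rises into the note with noticeable overshoot
        "scoop-up": seg["pitch_jump"] * 0.5 + seg["overshoot"] * 2.0,
        # portamento: a long, smooth glide between the notes
        "portamento": seg["duration"] * 10.0 - abs(seg["overshoot"]),
        # normal: short transition, small jump, little overshoot
        "normal": 1.0 / (seg["duration"] + 0.05)
                  - seg["pitch_jump"] - seg["overshoot"] * 5.0,
    }
    return max(scores, key=scores.get)
```

In the paper's system these scores come from the segmental HMM, so the label competes over whole candidate paths rather than one segment in isolation.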
5 PERFORMANCE RATING/SCORING
After analyzing the user performance, we compute:
- Elemental ratings based on high-level descriptors
such as pitch and volume.
[Figure: state-transition diagram of expression paths against the reference MIDI notes, with states silence, attack, transition (labels: scoop-up, normal), sustain, vibrato and release]
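An elemental tuning rating of this kind can be sketched as follows (the 0-100 scale and tolerance are illustrative assumptions, not the paper's scoring formula):

```python
# Hypothetical sketch: rate tuning from the frame-wise pitch deviation
# against the reference MIDI note of the segment.

def tuning_rating(f0_frames, ref_note, max_dev=2.0):
    """Map mean absolute deviation (semitones) to a 0-100 score.
    max_dev semitones of mean deviation (or more) scores 0."""
    devs = [abs(f - ref_note) for f in f0_frames]
    mean_dev = sum(devs) / len(devs)
    return max(0.0, 100.0 * (1.0 - mean_dev / max_dev))
```

A volume rating would follow the same pattern with an energy descriptor in place of pitch.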
6 APPLICATIONS
The technology we have developed can be applied
in many fields, from entertainment and games to more
education-focused applications.
6.1 Singing Education
Classical singing education follows the master-apprentice
model, where the teacher gives the student instructions and
feedback on the performance regarding:
- Acoustic quality
- Physiological aspects of the performance (posture of the vocal apparatus)
- Tuning
- Timing
10 ACKNOWLEDGEMENTS
The work described in this paper has been supported
and funded by the Music Technology Group at
Pompeu Fabra University and by Yamaha Corp.
REFERENCES
[1] Viitaniemi, T., Klapuri, A. & Eronen, A. A probabilistic model for the transcription of single-voice melodies. Proceedings of the 2003 Finnish Signal Processing Symposium, FINSIG'03, Tampere, Finland, 2003.
[Figure 11: Singing tutor application]
8 EVALUATION
Five commercial pop songs have been used to evaluate
the system, and some amateur singers have been asked
to sing them. The recorded performances have been
analyzed, and note segmentation and expression
transcription have been performed. From these analysis
results, more than 1500 notes have been evaluated,
achieving more than 95% accuracy in note
segmentation, using manual segmentation
by a musician as the reference and allowing a tolerance of 30
milliseconds, so boundaries automatically segmented
within this margin are considered correct. For the
expression transcription evaluation, there is no simple
way to evaluate the results, as there are sometimes many
ways to correctly transcribe the same performance, for
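The 30 ms boundary-tolerance measure above can be sketched as follows (the simple one-to-one pairing of boundaries by order is an assumption for illustration):

```python
# Hypothetical sketch of the boundary-accuracy measure: an automatic
# boundary counts as correct if it lies within 30 ms of the reference
# manual boundary set by the musician.

TOLERANCE = 0.030  # seconds

def boundary_accuracy(auto, ref):
    """auto, ref: boundary times in seconds, paired in order."""
    hits = sum(1 for a, r in zip(auto, ref) if abs(a - r) <= TOLERANCE)
    return hits / len(ref)
```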
[11] Rossiter, D. & Howard, D.M. ALBERT: A real-time visual feedback computer tool for professional vocal development. Journal of Voice, 10(4):321-336, 1996.