OC2021 Paper 4
ABSTRACT

Finding which phonemes distinguish the different regions within the same dialect is of great significance to the improvement of recognition technology, national and information security, and dialect protection. To address this issue, this paper first investigates the refined recognition of different regions of the same dialect, based on the corpus designed and recorded by CASIA (CASIA Dialect Corpus, CASIA DC), and then finds the distinguishing phonemes by the probabilistic accumulation of phonemes method. Based on the i-vector model, the recognition results indicate that the recognition rates for different regions of the same dialect vary from one dialect to another. The recognition rate for the Mandarin dialects is lower than that for other dialects. Through the probabilistic accumulation of phonemes, we find that phonemes with significant differences can distinguish different regions of the same dialect, which will be of value for the synthesis and recognition of dialects in the future.

Index Terms: distinguishing phonemes, language recognition, regions of dialects, i-vector model, probabilistic accumulation of phonemes

1. INTRODUCTION

In order to find which phonemes distinguish the different regions within the same dialect, the first step is language recognition, that is, the recognition of different regions within the same dialect.

The main task of language recognition is to quickly and accurately identify the type of language from a given segment [1]. It has numerous applications in the fields of speech recognition, voiceprint recognition, machine translation, communication and information retrieval, which not only brings convenience to our lives but also builds a bridge for communication between different peoples and countries [2, 21]. Current language recognition mainly focuses on the recognition of different languages, as in the Oriental Language Recognition Competition (OLR Challenge) and the Language Recognition Evaluation (LRE) [3]. The OLR Challenge is held every year, while the LRE is held every two years. The main task of these competitions is to recognize several languages, such as Oriental languages (Cantonese, Indonesian, Japanese, Chinese…), Arabic, English, Slavic, Iberian, etc. Even when the task is specific to a certain language, such as Chinese, it is still to identify different dialects within Chinese. At the same time, the differences in phonetics, vocabulary, grammar, etc. between different regions of the same dialect are smaller than those between different dialects, which poses great challenges to the recognition of different regions within the same dialect [4].

Recognition of different types of languages / dialects is important, but the recognition of different areas within the same dialect is also of great significance. From the perspective of advancing technology, the fact that the differences between regions of the same dialect are smaller than those between different dialects poses a challenge to traditional language / dialect recognition technology. In order to identify different areas of the same dialect accurately, scholars have to propose more refined systems and techniques, which will effectively promote the continuous optimization of language recognition technology and systems.

Recognizing different regions within the same dialect and finding the distinguishing phonemes is also important for national security. It has been found that, through the recognition of different areas within a dialect, the dialect can be strongly associated with geographic location. The finer the dialect region, the more precise the location. Through the investigation of the distribution of Chinese dialects, it is
The phonemes listed in the table are those that have no significant differences between the corresponding two dialect regions, which indicates that those two dialect areas cannot be distinguished by that phoneme, but the other regions can be identified.

In Jin dialect, the phoneme "s" has a significant difference in all dialect regions, while in Wu dialect, the phoneme "ing" has a significant difference in all regions. "s" and "ing" would therefore distinguish all the regions in Jin and Wu dialects, respectively. Otherwise, a combination of phonemes is required to recognize these regions in the same dialect.

Comparing the two tables, we can conclude that there are more phonemes with no significant difference in Wu dialect than in Jin dialect, which indicates that Wu dialect is more complicated than Jin dialect.

6. CONCLUSIONS

The CASIA Dialect Corpus (CASIA DC) is a large-scale Chinese dialect corpus that was constructed in 2019. Its original purpose was to provide effective data resource support for dialect recognition and speaker recognition. Based on CASIA DC, this paper conducted refined dialect recognition. An i-vector model with a PLDA classifier was used in the experiments, and the accuracy for different regions of the same dialect differs between dialects. The accuracy for Wu and Jin dialects is about 85%, and in some cases reaches 90%, while that for Jilu and Jianghuai Mandarin is only about 80%. This indicates that the internal differences in Mandarin dialects are smaller than those in other dialects.

By accumulating the probabilities of phonemes of different dialect regions within each dialect, we can conclude that some phonemes can effectively distinguish different regions within the same dialect, but for any particular dialect, the contributing phonemes should be analyzed specifically. This will provide useful clues for future speech synthesis and speech recognition.

This work provides valuable clues for national and information security and dialect protection. In the future, we

7. ACKNOWLEDGEMENTS

8. REFERENCES

[1] E. Ambikairajah, H. Li, L. Wang, B. Yin, and V. Sethu, "Language Identification: A Tutorial," IEEE Circuits and Systems Magazine, pp. 82-108, 2011.
[2] Y. Muthusamy, E. Barnard, and R. Cole, "Reviewing automatic language identification," Signal Processing Magazine, IEEE, vol. 11, no. 4, pp. 33-41, 1994.
[3] https://www.nist.gov/itl/iad/mig/language-recognition.
[4] H. Qi and J. Xiao, "The Phonological Features of Yunnan Yongren Dialect," Science and Innovation, vol. 7, no. 1, pp. 26-30, 2019.
[5] W. Cai, Z. Cai, W. Liu, X. Wang, and M. Li, "Insights into end-to-end learning scheme for language identification," in ICASSP 2018 - 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, April 15-20, Calgary, Alberta, Canada, Proceedings, 2018.
[6] W. Cai, D. Cai, H. Shen, and M. Li, "Utterance-Level End-to-End Language Identification Using Attention-Based CNN-BLSTM," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, May 12-17, Brighton, UK, Proceedings, 2019.
[7] A. Lozano-Diez, R. Zazo-Candil, J. Gonzalez-Dominguez, D. T. Toledano, and J. Gonzalez-Rodriguez, "An End-to-end Approach to Language Identification in Short Utterances using Convolutional Neural Networks," in INTERSPEECH 2015 - 16th Annual Conference of the International Speech Communication Association, September 6-10, Dresden, Germany, Proceedings, 2015, pp. 403-407.
[8] P. A. Torres-Carrasquillo, E. Singer, M. A. Kohler, R. J. Greene, D. A. Reynolds, and J. R. Deller, "Approaches to language identification using Gaussian mixture models and Shifted Delta Cepstral features," in Proceedings of ICSLP, 2002, pp. 89-92.
[9] S. Fernando, V. Sethu, E. Ambikairajah, and J. Epps, "Bidirectional Modelling for Short Duration Language Identification," in INTERSPEECH 2017 - 18th Annual Conference of the International Speech Communication Association, August 20-24, Stockholm, Sweden, Proceedings, 2017, pp. 2809-2813.
[10] I. Lopez-Moreno, J. Gonzalez-Dominguez, O. Plchot, D. Martinez, J. Gonzalez-Rodriguez, and P. J. Moreno, "Automatic Language Identification using Deep Neural Networks," Acoustics, Speech, and Signal Processing, IEEE International Conference, 2014.
[11] J. Gonzalez-Dominguez, I. Lopez-Moreno, P. J. Moreno, and J. Gonzalez-Rodriguez, "Frame-by-frame language identification in short utterances using deep neural networks," Neural Networks, vol. 64, pp. 49-58, 2015.
[12] P. Shen, X. Lu, Sh. Li, and H. Kawai, "Feature Representation of Short Utterances based on Knowledge Distillation for Spoken Language Identification," in INTERSPEECH 2018 - 19th Annual Conference of the International Speech Communication Association, September 2-6, Hyderabad, India, Proceedings, 2018, pp. 1813-1817.
[13] W. Geng, W. Wang, Y. Zhao, X. Cai, and B. Xu, "End-to-end Language Identification using Attention-based Recurrent Neural