OC2021 Paper 4
ABSTRACT

Finding which phonemes distinguish different regions within the same
dialect is of great significance to the improvement of recognition
technology, to national and information security, and to dialect
protection. To address this issue, this paper first investigates the
refined recognition of different regions of the same dialect, based on
the corpus designed and recorded by CASIA (CASIA Dialect Corpus, CASIA
DC), and then finds the distinguishing phonemes by the probabilistic
accumulation of phonemes method. Based on the i-vector model, the
recognition results indicate that the recognition rates for different
regions of the same dialect vary from one dialect to another. The
recognition rate for the Mandarin dialects is lower than that for other
dialects. Through the probabilistic accumulation of phonemes, we find
that phonemes with significant differences can distinguish different
regions of the same dialect, which will be valuable for the synthesis
and recognition of dialects in the future.

Index Terms: distinguishing phonemes, language recognition, regions of
dialects, i-vector model, probabilistic accumulation of phonemes

1. INTRODUCTION

To find which phonemes distinguish the different regions within the same
dialect, the first step is language recognition, that is, the
recognition of different regions within the same dialect.

The main task of language recognition is to quickly and accurately
identify the type of language according to a given segment [1]. It has
numerous applications in the fields of speech recognition, voiceprint
recognition, machine translation, communication and information
retrieval, which not only bring convenience to our lives but also build
a bridge for communication between different peoples and countries
[2, 21]. Current language recognition mainly focuses on the recognition
of different languages, as in the Oriental Language Recognition
Competition (OLR Challenge) and the Language Recognition Evaluation
(LRE) [3]. The OLR Challenge is held every year, while the LRE is held
every two years. The main tasks of these competitions are to recognize
several languages, such as Oriental languages (Cantonese, Indonesian,
Japanese, Chinese…), Arabic, English, Slavic, Iberian, etc. Even when
the task is specific to a certain language, such as Chinese, it is still
to identify different dialects within Chinese. At the same time, the
differences in phonetics, vocabulary, grammar, etc. between regions of
the same dialect are smaller than those between different dialects,
which makes the recognition of different regions within the same dialect
very challenging [4].

Recognition of different types of languages / dialects is important, but
the recognition of different areas within the same dialect is also of
important significance. From the perspective of improving science and
technology, the fact that the differences between regions of the same
dialect are smaller than those between different dialects poses a
challenge to traditional language / dialect recognition technology. In
order to identify different areas of the same dialect accurately,
scholars have to propose more refined systems and techniques, which will
effectively promote the continuous optimization of language recognition
technology and systems.

Recognizing different regions within the same dialect and finding the
distinguishing phonemes is also important for national security. The
recognition of different areas within dialects has shown that a dialect
is strongly associated with geographic location. The more finely the
dialect is identified, the more precisely the location can be
determined. Through the investigation of the distribution of Chinese
dialects, it is
