Wellesley College
East Asian Languages & Cultures
In this study, we focus on particle errors and discuss an annotation scheme for Korean learner corpora that can be used to extract heuristic patterns of particle errors efficiently. We investigate different properties of particle errors...
Some examples of non-object TCs are given as follows.
(i) Lazarus_j-ka [ __j shyophingha-ki ]-ka/ey swipta (Locative)
    Lazarus-nom do shopping-nml-nom/for easy
    'Lazarus is easy to do shopping (in)'
(ii) yenphil_j-i [ __j kulssi-lul sseu-ki ]-ka/ey...
This paper argues for the necessity of zero pronoun annotations in Korean treebanks and provides an annotation scheme that can be used to develop a gold standard for testing different anaphor resolution algorithms. Relevant issues of...
This paper explores several important issues in developing syntactically annotated Korean corpora for higher-level language processing, including semantic-discourse parsing, question-answering, machine translation, information retrieval,...
- by Sun-Hee Lee
This paper presents preliminary work on a corpus-based study of Korean demonstratives. Through the development of an annotation scheme and the use of spoken and written corpora, we aim to determine different functions of demonstratives...
- by Sun-Hee Lee
We detect errors in Korean post-positional particle usage, focusing on optimizing omission detection, as omissions are the single biggest factor in particle errors for learners of Korean. We also develop a system for predicting the...
Post-positional particles are a significant source of errors for learners of Korean. Following methodology that has proven effective in handling English preposition errors, we are beginning the process of building a machine learner for...
We present a novel scheme for annotating the realization and ellipsis of Korean particles. Annotated data include 100,128 Ecel (a space-based word unit) in spoken and written corpora composed of four different genres in order to evaluate...
We aim to sufficiently define annotation for post-positional particle errors in L2 Korean writing, so that future work on automatic particle error detection can make progress. To achieve this goal, we outline the linguistic properties of...
We further work on detecting errors in postpositional particle usage by learners of Korean by improving the training data and developing a complete pipeline of particle selection. We improve the data by filtering non-Korean data and...