PACLIC-27
Myths in Korean Morphology and Their Computational
Implications
Hee-Rahk Chae
Department of Linguistics and Cognitive Science
Hankuk University of Foreign Studies
Yongin, Gyeonggi, 449-791, Korea
hrchae@hufs.ac.kr
Abstract
This paper examines some popular
misanalyses in Korean morphology. For
example, contrary to popular myth, the
verbal ha- and the element -(nu)n- cannot
be analyzed as a derivational affix and as
a present tense marker, respectively. We
will see that ha- is an independent word
and that -(nu)n- is part of a portmanteau
morph. In providing reasonable analyses
of them, we will consider some
computational implications of the
misanalyses. It is really mysterious that
such wrong analyses can become so
popular in linguistics, which is a scientific field.
1
Introduction
This paper aims at examining some popular
misanalyses in Korean morphology. Focusing on
the verbal ha- and what is called the present
tense marker -(nu)n-, we will see that, contrary to
popular myth, they cannot be analyzed as a
derivational affix and as a present tense marker,
respectively. In providing reasonable analyses of
them, we will consider some implications of the
misanalyses, especially from a computational
point of view.
Most Korean linguists assume that the ha- in
kongpwu-ha- (‘to study’), for example, is a
derivational affix and, hence, that kongpwu-ha- as a
whole is a verb.1 However, as we will see shortly,
ha- itself is an independent word and [kongpwu
ha-] is a phrase. Even more Korean linguists assume
that the element -(nu)n- is a present tense marker.
However, the Korean tense system becomes far
simpler if we assume that the present tense
marker is null (-ø-) rather than -(nu)n-.
2
The Morpho-syntactic Status of Some
Dependent Elements
As an agglutinative language, Korean has rather
complex structures of word-like expressions.
Hence, it is not always easy to determine the
1
Noticeable exceptions are Song (1967: 64-71), Suh
(1991: 486, 1994: 578, 1996: 346), Chae (1996) and
some others. They have shown, for example, that ha- in
kongpwu-ha- cannot be a derivational affix and that
[kongpwu ha-] and [kongpwu-lul ha-] are realizations of
the same syntactic structure.
The Japanese counterpart of the Korean ha- is suru.
The unit of verbal noun plus suru is also regarded as a
word by most Japanese linguists. However, this is very
dubious.
a) bengkyou-bakari/wa/... suru
   study   -only/Contr    do
b) ??bengkyou yoku/nagaku/... suru
              well/long time
c) [bengkyou-to undou]-bakari/wa/... suru
            -and exercise
Although it is not very natural for such independent
words as yoku and nagaku to come between the two
elements, as we can see in (b), delimiters like -bakari
and -wa are allowed as in (a). In addition, the verbal
noun before suru can be conjoined, as we can see in (c).
These facts show that bengkyou-suru is not a word but a
phrase (and, hence, such expressions should not be
registered as head words in dictionaries).
Copyright 2013 by Hee-Rahk Chae
27th Pacific Asia Conference on Language, Information and Computation
pages 505-511
not been duly appreciated in the tradition of
Korean linguistics (cf. Chae and No 1998: sec.
III, Chae 2007: sec. II). According to Chae
(2007), all the members of postpositions and
delimiters are clitics, and nouns, adjectives (or
descriptive verbs), adnominals and adverbs have
clitic members as well as regular members.
Based on these observations, he provides a new
classification system of parts of speech in Korean.
This system comprises not only regular words
but also clitics because both of them are words
syntactically.
Taking clitics into consideration, we can
distinguish three different types of dependent
elements: derivational affixes (DA), inflectional
affixes (IA) and clitics.
morpho-syntactic status of a dependent element,
whether it is a derivational affix, an inflectional
affix or something else.
When a root/stem and another element which
seems to be dependent on it stand next to each
other, the dependent element can usually be
analyzed either as a derivational affix or as an
inflectional affix. In Korean, however, many
such elements cannot be analyzed as either of
them. For example, postpositions are neither
derivational affixes nor inflectional affixes (Chae
and No 1998: 73).2
(1) [[nae-ka nol-te-n]      kos-eyse  chac-ass-ta]
      I-Nom  play-Retro-Rel place-at  find-Past-Decl
    ‘(I) found (it) in the place where I used to play.’
(2) [[Xroot-DA]stem-IA] - Clitics … (Words)
The former two constitute parts of words, while
the latter, i.e. clitics, are words themselves even
though they are dependent on neighboring
elements phonologically. Among the two word-internal
elements, derivational affixes are more
closely related to their roots than inflectional
affixes to their stems.
It is rather unfortunate that clitics have not
been seriously taken into account in analyzing
Korean sentences, which means that the very
building blocks of sentences, i.e. (regular and
clitic) words, have not been recognized properly.
Of course, the main reason for this unfortunate
tradition is that clitics are not
independent phonologically. That is, the very
nature of the language itself is partly responsible
for such a tradition.
It is harder to understand, however, why
many regular words are also treated as
dependent elements in Korean. Firstly, such
expressions as the following are assumed to be
compounds (Lee 2005: 44).
The postposition -eyse is not a derivational affix.
If it were, we would have to assume that the relative clause
[nae-ka nol-ten] in (1) modifies an adverb (i.e.
kos-eyse) rather than a noun (i.e. kos). It is very
clear that relative clauses cannot modify adverbs.
Postpositions, including -eyse, cannot be
analyzed as inflectional affixes, either. Firstly,
they make nominal expressions have adverbial
functions. Although it is true that some nouns
have adverbial functions (especially, those which
represent time or space), it would be very
unnatural to argue that the “inflected forms” of
pronouns and proper nouns can have all the
adverbial functions which are expressed by the
postpositions. Secondly, the whole range of
different postpositions is not likely to form an
inflectional paradigm. There are more than ten
atomic postpositions and dozens (or even
hundreds) of complex postpositions in
Korean.
Elements like postpositions can best be
analyzed as clitics, 3 i.e. those units which are
separate words syntactically but are not
independent phonologically. Korean has a
variety of clitics. However, their existence has
(3) a. nach-sel-ta,            pich-na-ta
       face-[]-Decl            light-[]-Decl
       ‘to be unfamiliar’      ‘to shine’
    b. nach-i (manhi) sel-ta,  pich-i na-nta
           -Nom                    -Nom
2
The abbreviations used for grammatical terms in this
paper are as follows. Nom: Nominative, Acc:
Accusative, Retro: Retrospective, Rel: Relativizer, Past:
Past Tense, Pres: Present Tense, Decl: Declarative,
Progr: Progressive.
3
Clitics are “grammatical units with some properties of
inflectional morphology and some of independent
words” (Zwicky and Pullum 1983, Zwicky 1985). They
have the former properties as far as phonological
phenomena are concerned and the latter properties when
syntactic phenomena are concerned.
It may be true that the predicates in such verbal
expressions as those in (a) have some degree of
idiomatic meanings. However, (the degree of)
idiomaticity has nothing to do with the morpho-syntactic
status of expressions (cf. Roh 2013: 37).
Please note that, as we can see in (b), the
nominative marker -i can be attached to the noun
before the predicate. In addition, such adverbs as
manhi ‘many/much’ can be inserted between the
noun and the predicate. These facts clearly
indicate that the expressions in (a) are all phrases
rather than compound words. Secondly, such
verbal elements as ha-, toy- and sikhi- are
assumed to be derivational affixes not only in
most Korean grammar books and dictionaries but
also in most research papers (cf. footnote 1).
3
The Verbal ha-
In this section, we will firstly examine the
morpho-syntactic status of the verbal ha-. Then,
we will consider what kinds of implications the
popular misanalysis has for automatic analyses.
3.1
The Morpho-syntactic Status
The agglutinative nature of Korean makes it
difficult to distinguish between word-internal
elements like (derivational and inflectional)
affixes and word-external elements like clitics.
What makes the belief that the verbal ha- is a
derivational affix so mysterious is that ha- is not
even a clitic but a wholly independent word. Let
us examine the following examples:
(4) a. phakoy-ha-ta
destruction-do-Decl ‘to destroy’
b. phakoy-toy-ta
-become ‘to be destroyed’
c. phakoy-sikhi-ta
-let … do ‘to (let …) destroy’
(5) a. phakoy-lul ha-ta
             -Acc
    b. phakoy-ka toy-ta
             -Nom
    c. phakoy-lul sikhi-ta
             -Acc
(6) cyon-i    kongpwu-ha-ko   iss-ta
    John-Nom  study-do-Progr  be-(Pres)-Decl
    ‘John is studying.’
(7) cyon-i kongpwu(-lul) cal/manhi/… ha-ko iss-ta
                  -Acc   well/much/…
    ‘John is studying well/much/…’
As ha-, toy- and sikhi- are analyzed as derivational affixes, all
the expressions in (4) are regarded as verbs
rather than verb phrases. However, they cannot
be verbs as we can see from the data in (5),
which show that accusative or nominative
markers, which can only come at the end of
object/subject phrases, can be inserted in
between.
Among the numerous examples of misanalyses
(cf. Chae 2010), the type in (4) is the least
expected one, because a regular word is analyzed
as a derivational affix. Regular words are
completely independent from the preceding
root/word and, hence, they do not belong to the
dependent elements listed in (2). They are more
independent units than clitics. Derivational
affixes, by contrast, are the least independent of their roots.
There is another unexpected type of misanalysis.
In this type, part of a word which cannot be a
separate morpheme is analyzed as one. Although
there are not many examples of this type, it is
also unusual in the sense that morphemes are not
difficult to factor out, especially in an
agglutinative language. In the remaining sections
of this paper, we will focus on only one example
from each of these two types of misanalyses: the
“light verb” ha- and the assumed present tense
marker -(nu)n-. We will not only elucidate their
morpho-syntactic status but also consider the
computational implications of the misanalyses.
Judging from the data in (7), which show that
external elements can be inserted between
kongpwu and ha-, it becomes clear that ha- is a
word and [kongpwu ha-] is a phrase. That is,
kongpwu and ha- are two independent words
(Song 1967, Suh 1991, Chae 1996, Chae and
Chong 2011, among others). Firstly, the
accusative marker -(l)ul can be inserted between
them. Secondly, such adverbs as cal and manhi
can also be inserted between them freely. We do
not need any more evidence to establish the
morpho-syntactic status of ha- as an independent
word.
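The insertion test applied to (7) can be phrased as a small procedure. What follows is only a minimal sketch, not a method taken from this paper: the function names `intervening` and `is_phrase`, the marker and adverb inventories, and the token segmentation are all illustrative assumptions. Only the forms themselves come from (6) and (7).

```python
# Hypothetical sketch of the insertion test: a noun + ha- sequence is
# treated as a phrase if attested data show a case marker or an adverb
# standing between the two elements.

CASE_MARKERS = {"-lul", "-ul", "-ka", "-i"}   # assumed marker inventory
ADVERBS = {"cal", "manhi"}                    # adverbs cited in (7)

def intervening(tokens, noun, verb):
    """Return the tokens found between noun and verb, or None if absent."""
    if noun not in tokens or verb not in tokens:
        return None
    start, end = tokens.index(noun), tokens.index(verb)
    if start >= end:
        return None
    return tokens[start + 1:end]

def is_phrase(attested, noun, verb):
    """True if any attested sequence separates noun and verb by a
    case marker or an adverb."""
    for tokens in attested:
        between = intervening(tokens, noun, verb)
        if between and any(t in CASE_MARKERS or t in ADVERBS for t in between):
            return True
    return False

# Token sequences modelled on (6) and (7); the segmentation is an assumption.
DATA = [
    ["kongpwu", "ha-"],
    ["kongpwu", "-lul", "ha-"],
    ["kongpwu", "cal", "ha-"],
    ["kongpwu", "-lul", "manhi", "ha-"],
]

print(is_phrase(DATA, "kongpwu", "ha-"))
```

On this toy data the test classifies [kongpwu ha-] as a phrase, because at least one attested sequence has -lul or an adverb between the two elements. With only the bare sequence available, the sketch returns no evidence of phrasehood.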
Those who take the wordhood of kongpwu-ha-
for granted argue that such expressions as
[kongpwu cal ha-] are derived from the phrase
[kongpwu-lul ha-], deleting the accusative
marker -lul and adding the adverb cal. Under this
kind of argumentation, it is assumed that
[kongpwu cal ha-] has nothing to do with the
“word” kongpwu-ha-. However, there are serious
problems with such an approach. First of all, it is
hard to see why kongpwu-ha- should
not have any (formal) relationship with
[kongpwu-lul ha-] or [kongpwu cal ha-]. These
latter expressions have no special meanings
different from that of the former expression,
except that they contain -lul and cal, respectively.
Secondly, the argument is not falsifiable, which
makes the research non-scientific. It is not
falsifiable because all units of [NP V] can be
argued to be words rather than phrases:
(8) a. pap-ul (cal) mek-ta
rice-Acc well eat-Decl
b. pap (cal) mek-ta
‘to eat boiled rice (well)’
(9) a. hakkyo-ey (cacu) ka-ta
school-to often go-Decl
b. hakkyo (cacu) ka-ta
‘to go to school (often)’
If kongpwu-ha- is argued to be a word despite
such expressions as [kongpwu-lul ha-] and
[kongpwu cal ha-], 4 it can also be argued that
[pap mek-] in (8b) and [hakkyo ka-] in (9b) are
words rather than phrases. Under this kind of
argumentation, we can say that [pap cal mek-]
and [hakkyo cacu ka-] are derived from [pap-ul
mek-] in (8a) and [hakkyo-ey ka-] in (9a),
respectively, rather than from the “words” [pap
mek-] and [hakkyo ka-]. However, even those
who assume that kongpwu-ha- is a word will not
accept that [pap mek-] and [hakkyo ka-] are
words.
the only difference between them is due to the
(non-)existence of the adverb cal, which is
impossible to capture under the popular approach.
Secondly, it is very difficult, though perhaps not
impossible, to capture the semantic relationship
between the two expressions. Thirdly, all the
lexical entries involved have to be registered
twice, leading to a significant amount of
redundancy (Chae 2010). Although kongpwu-ha-
is registered in the dictionary, kongpwu and ha-
have to be registered as well. Notice that these
words appear in the phrase [kongpwu cal ha-], in
which the adverb cal is in between the two words.
Lastly, the system will produce two different
analyses of kongpwu-ha-: as a lexical item and as
a syntactic construct. As we have kongpwu and
ha- as separate lexical items, there is no
reasonable way of preventing the combination of
them to produce [kongpwu ha-], which is the
same as the lexical item kongpwu-ha-.
We have seen problems with only one
example. From a computational point of view,
the sheer number of ha-expressions in Korean
makes the popular misanalysis more difficult to
maintain. Expressions containing ha- may well
account for more than half of all verbal
expressions in representative
Korean corpora.
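The redundancy and double-analysis problems can be made concrete with a toy segmenter. This is a hedged sketch under assumed names (`analyses`, `POPULAR`, `PROPOSED`) and an assumed design in which a lexicon is a set of token tuples. It is not a description of any existing Korean analyzer.

```python
# Toy illustration (assumed design): count the analyses a system assigns
# to the sequence kongpwu + ha- under two lexicon designs.

def analyses(tokens, lexicon):
    """Enumerate all segmentations of tokens into listed lexical items.
    Multi-token entries model forms listed as single 'words'."""
    if not tokens:
        return [[]]
    results = []
    for size in range(1, len(tokens) + 1):
        head = tuple(tokens[:size])
        if head in lexicon:
            for rest in analyses(tokens[size:], lexicon):
                results.append([head] + rest)
    return results

# Popular analysis: kongpwu-ha- is listed as a word, but kongpwu and ha-
# must be listed too, because of [kongpwu cal ha-].
POPULAR = {("kongpwu", "ha-"), ("kongpwu",), ("ha-",), ("cal",)}
# Analysis argued for here: only the independent words are listed.
PROPOSED = {("kongpwu",), ("ha-",), ("cal",)}

seq = ["kongpwu", "ha-"]
print(len(analyses(seq, POPULAR)))   # lexical item plus syntactic construct
print(len(analyses(seq, PROPOSED)))  # syntactic construct only
```

Under the assumed popular lexicon the same surface sequence receives two analyses and the lexicon carries one extra entry. Under the proposed lexicon there is a single analysis, built from the same items that are needed for [kongpwu cal ha-] anyway.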
4
3.2
Computational Implications
In this section, we will examine the behavior of
the verbal element -(nu)n-. Although it is usually
assumed to be a present tense marker, the
assumption is based on superficial observations.
A more careful observation will lead to the
conclusion that the present tense marker, more
accurately, the non-past tense marker is null (-ø-)
rather than -(nu)n-. Of course, some
previous works, such as Kang (1988) and Suh (1994),
argue for this position. However,
the argument has not been taken seriously in
Korean linguistics, just like that for the
wordhood of ha- in kongpwu-ha- (cf. footnote 1).
If we cannot factor out a regular word ha- from
expressions like kongpwu-ha-, we cannot provide
a systematic analysis of the expressions
containing it. In that case, kongpwu-ha- and
[kongpwu cal ha-], for example, can only be
analyzed with reference to two unrelated
mechanisms. The former should be listed in the
dictionary because it is assumed to be a word.
The latter, on the other hand, should be treated in
the syntactic component on the basis of the three
lexical items kongpwu, cal, and ha- and relevant
syntactic rules and/or principles.
The situation becomes more serious in
automatic analyses than in manual analyses. First
of all, it is impossible to capture any formal
relationships between kongpwu-ha- and
[kongpwu cal ha-], because they are outputs of
two different components and they do not even
share any lexical items. However, it is clear that
4
The Verbal Element -(nu)n-
4.1
The Morpho-syntactic Status
The popular belief that -(nu)n- is a present tense
marker is based on such data as the following:5
One might argue that the verbal ha- cannot be regarded
as an independent word because it does not have its own
meaning. However, semantic facts do not necessarily go
together with morpho-syntactic facts. That is, the
meaning of a unit cannot tell whether it is a word or not.
5
The verbal marker -(nu)n has two variants: -nun after a
verb ending in a consonant and -n after a verb ending in
a vowel.
Korean verbal endings have different forms
according to sentence type and speech level.
There are at least four different sentence types:
declaratives, interrogatives, directives and
propositives. There are six different speech levels,
from the least formal to the most formal. Among
the twenty-four possible combinations of the two
grammatical categories, only one combination
requires the element -n- or -nun-: that of the
declarative sentence6 and the (least formal) plain
level, as we can see in (10a) and (11a).
The element does not appear in the other
combinations. As we can see in (13), it cannot
combine with the interrogative, directive or
propositive ending, even when the speech level
concerned is the plain level. In addition, as we
can see in (14), it cannot combine with any of the
other speech level endings.
We can easily solve these problems if we
assume that the non-past tense marker is -ø-.
Under this assumption, the variants of -(nu)n-, i.e.
-n- and -nun-, are just parts of the (present)
declarative endings of verbs in the plain speech
level. That is, we can assume that -nta and -nunta
are “portmanteau” morphs, 7 i.e. those morphs
which can be analyzed into more than one
morpheme (Crystal 1980, Spencer 1991). 8 The
(10) a. cyon-i    cip-ey    ka-n-ta
        John-Nom  house-to  go-Pres-Decl
     b. cyon-i cip-ey ka-ass-ta
                        -Past
     ‘John goes/went home.’
(11) a. cyon-i    pap-ul    mek-nun-ta
        John-Nom  rice-Acc  eat-Pres-Decl
     b. cyon-i pap-ul mek-ess-ta
                         -Past
     ‘John eats/ate boiled rice.’
When we compare the two sentences in (10) and
those in (11), it seems obvious that -(nu)n- is
in a paradigmatic relation with the past tense
marker -ass/ess.
However, if we observe the behavior of the
element -(nu)n- more carefully, we will see that
there are many problems with the popular belief.
First of all, -(nu)n- is not actually in a
paradigmatic relation with the past tense marker.
(12) a. ka(*-n)-keyss-ta
go -Modality-Decl
ka(-ass)-keyss-ta
b. mek(*-nun)-keyss-ta
eat
mek(-ess)-keyss-ta
6
The past tense marker can occur before the
irrealis modality marker -keyss, but the assumed
present tense marker cannot.
Secondly, the distribution of -(nu)n- is very
limited:
What seems to be “exclamative endings,” among others,
also contain -nun-.
a) cip-ey ka-nunkwuna/nunkwun.
house-to go-Ending
‘(He/She) does go home!’
b) cal mek-nunkwuna/nunkwun.
well eat-Ending
‘How well (he/she) eats!’
c) san-i khu/cak-kwuna/kwun.
mountain-Nom be big/small-Ending
‘How big/small the mountain is!’
(13) a. ka(*-n)-(nu)nya, mek(*-nun)-(nu)nya
-Interrogative
b. ka(*-n)-kela, mek(*-nun)-ela
-Directive
c. ka(*-n)-ca, mek(*-nun)-ca
-Propositive
Compared with the endings after adjectives (or
descriptive verbs) in (c), those after verbs have the extra
element -nun- in (a-b). However, there is enough
evidence to show that Korean does not have a separate
sentence type of exclamative. What seems to be
exclamative sentences have the formal properties of
declarative sentences. Hence, the sentences above
belong to declaratives in Korean (Lee 2005: 170-171).
7
We are in line with Yongkyoon No’s assumption in “…
the selection from allomorphs -nunta/nta/ta …” (Chae
and No 1998: 91). He regards -nunta, -nta and -ta as
allomorphs of one and the same morpheme.
8
Portmanteau morphs are defined/described in the
literature as follows: “A term used in morphological
analysis referring to cases where a single morph can be
analysed into more than one morpheme, …” (Crystal
1980: 276); “… the term portmanteau, which in this
context means type of fusion of two morphemes into
one. … we have four morphemes all realized by a single
portmanteau morph … In a portmanteau morph, then,
(14) a. ka(*-n/ok-ass)-a/e, mek(*-nun/ok-ess)-e
b. ka(*-n/ok-ass)-ney, mek(*-nun/ok-ess)-ney
c. ka(*-n/ok-ass)-o, mek(*-nun/ok-ess)-uo/o
d. ka(*-n/ok-ass)-a/e-yo, mek(*-nun/ok-ess)-e-yo
e. ka(*-n/ok-ass)-pnita/supnita, mek(*-nun/ok-ess)-supnita
two portmanteau morphs indicate the present
tense of the plain level declarative sentence. The
former is used when the stem of the verb
concerned ends in a vowel, and the latter when it
ends in a consonant. The point here is that they
are indivisible morphs which contain not only
the information about the sentence type and the
sentence level but also the information about the
tense of the verb concerned.
Under the -ø- tense marker approach, -(nu)n-
is inseparable from the predicative ending -ta and,
hence, cannot take the position of tense markers.
In addition, the non-past and the past tense
markers take the same position:
(15) a. ka-ø-nta, mek-ø-nunta (cf. (10-11))
b. ka-ø-keyss-ta, mek-ø-keyss-ta (cf. (12))
c. ka-ø-(nu)nya, mek-ø-(nu)nya (cf. (13a))
d. ka-ø/ass-e, mek-ø/ess-e (cf. (14a))
As we can see from this reanalysis of the data in
(10-14), we can account for the ungrammatical
data in (12-14) very naturally. In (12), the
inseparable -(nu)n- and -ta are separated from
each other. In (13) and (14), -(nu)n- stands alone
without its inseparable “partner” -ta.
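The reanalysis can be sketched as a small form builder for plain-level declaratives. This is an illustrative sketch only: the function `plain_declarative`, its parameters, and the romanized vowel set used to tell vowel-final from consonant-final stems are assumptions made for the example. The forms it is meant to reproduce are those in (10)-(12) and (15), and its ending rule encodes the requirement, stated with (16), that -nta/-nunta be immediately preceded by the non-past marker.

```python
# Sketch under the null (-ø-) non-past analysis. Function and parameter
# names are hypothetical; the target forms are those of (10)-(12) and (15).

VOWELS = set("aeiou")  # assumed test for vowel-final romanized stems

def plain_declarative(stem, past=None, modality=False):
    """Build a plain-level declarative form.

    stem     -- romanized verb stem, e.g. "ka" or "mek"
    past     -- past-tense allomorph ("ass" or "ess"), or None for non-past
    modality -- whether the modality marker -keyss follows the tense slot
    """
    tense = past if past else "ø"
    parts = [stem, tense]
    if modality:
        parts.append("keyss")
    # -nta/-nunta must be immediately preceded by the non-past marker;
    # everywhere else the declarative ending is plain -ta.
    if past is None and not modality:
        ending = "nta" if stem[-1] in VOWELS else "nunta"
    else:
        ending = "ta"
    parts.append(ending)
    return "-".join(parts)

for stem, past in [("ka", "ass"), ("mek", "ess")]:
    print(plain_declarative(stem))
    print(plain_declarative(stem, past=past))
    print(plain_declarative(stem, modality=True))
```

Because -nta and -nunta are chosen as whole endings, the builder has no way to place -(nu)n- before -keyss or to combine it with a past marker, so the starred forms in (12) and (16) are never produced.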
Before leaving this section, we need to
introduce a constraint, with reference to the
following data:
4.2
Computational Implications
As we have seen with reference to the data in
(13) and (14), among dozens of possible
combinations of speech level and sentence type,
only one combination of the plain level and the
declarative sentence requires -n- or -nun-. None of the
other combinations allows the element.
Then, it would be very difficult to account for the
distribution of -(nu)n- computationally, if we
assume that it is a present tense marker. Please
notice that, as is shown in (14), the past tense
marker -ass/ess can occur in the position where
the element -(nu)n- is not allowed to occur.
When we deal with computational systems,
we have to consider the understanding process
and the production process separately, just as
the two areas of speech recognition and speech
synthesis show. From an understanding point of
view, the traditional approach fails to interpret
many present tense forms. For example, ka-a and
mek-e are correct present tense forms, although
they do not have -(nu)n- (cf. (14)). From a
production point of view, the approach produces
many ill-formed expressions, including all the
ill-formed ones in (12-14). It would not be easy
to filter out these expressions.
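The production-side problem can be illustrated with a toy generator for the stem ka- ‘go’. The setup is an assumption made for this sketch: the ending list is a simplified selection from (10) and (12)-(14) (for example, `nya` stands in for -(nu)nya and `keyss-ta` for modality plus ending), and the function names are hypothetical.

```python
# Toy comparison (assumed setup) of the two analyses for the stem ka- 'go'.
# Endings are simplified from (10) and (12)-(14).

ENDINGS = ["ta", "keyss-ta", "nya", "kela", "ca",
           "a", "ney", "o", "a-yo", "pnita"]

def traditional_present(stem):
    """Treat -n- as a present tense marker that fills the tense slot."""
    return [f"{stem}-n-{e}" for e in ENDINGS]

def null_tense_present(stem):
    """Treat non-past as -ø-; -nta is a single plain declarative ending."""
    forms = []
    for e in ENDINGS:
        ending = "nta" if e == "ta" else e
        forms.append(f"{stem}-ø-{ending}")
    return forms

# Forms with -n- that the data mark as acceptable: only (10a).
ACCEPTABLE_WITH_N = {"ka-n-ta"}

overgenerated = [f for f in traditional_present("ka")
                 if f not in ACCEPTABLE_WITH_N]
print(len(overgenerated))       # ill-formed outputs of the traditional rule
print(null_tense_present("ka"))
```

In this sketch the traditional rule yields one acceptable form and a starred form for every other ending, each of which would need a separate filter. The null-tense generator yields one form per ending without inserting -n- anywhere except inside the plain declarative ending.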
5
Conclusion
In this paper, we have surveyed some popular
misanalyses in Korean morphology, focusing on
two unexpected ones: the verbal ha- as a
derivational affix and the verbal element -(nu)n-
as a present tense marker. We have shown that
careful observations reveal that ha- is an
independent verb and that -nun- and -n- are parts
of portmanteau morphs rather than independent
morphemes themselves. It is really mysterious
that such wrong analyses can become so popular
in linguistics, which is a scientific field.
(16) a. *ka-ass-nunta, *mek-ess-nunta
b. *ka-ø-keyss-nunta, *mek-ø-keyss-nunta
In (a), although the past tense marker takes the
same position as that of the non-past tense
marker (cf. (15a)), the expressions concerned are
ungrammatical. They are ungrammatical just
because the portmanteau morph -nunta occurs
with the past tense marker. In (b), although the
morph -nunta occurs with the non-past marker,
the expressions are ungrammatical as well. We
need to postulate that the morph has to be
immediately preceded by the non-past tense
marker. Notice that this constraint accounts for
both types of data in (16).
Acknowledgments
We are thankful to the anonymous reviewers,
whose valuable comments have been very
helpful in improving the quality of this paper.
This work was supported by a 2013 research
grant from Hankuk University of Foreign Studies.
References
Chae, Hee-Rahk. 1996. Properties of ha- and
Light Predicate Constructions [written in
Korean]. Language Research, 32(3), 409-476.
several categories are realized by one surface formative,
an instance of a one-many correspondence between
form and function” (Spencer 1991: 50-51).
Chae, Hee-Rahk. 2007. Clitics and a
Classification of Parts of Speech in Korean
[written in Korean]. Korean Journal of
Linguistics, 32(4), 803-826.
Chae, Hee-Rahk. 2010. Basic Units of Lexicons
and Ontologies: Words, Senses and Concepts.
Proceedings of the 24th Pacific Asia
Conference on Language, Information and
Computation, 35-44, Tohoku University.
Chae, Hee-Rahk and Wuk-Jae Chong. 2011. A
Procedure for the Identification of Word
Units in Korean. Harvard Studies in Korean
Linguistics XIV, 67-76, Harvard-Yenching
Institute.
Chae, Hee-Rahk and Yongkyoon No. 1998. A
Survey of Morphological Issues in Korean:
Focusing on Syntactically Relevant
Phenomena. Korean Linguistics 9,
International Circle of Korean Linguistics.
Crystal, David. 1980. A First Dictionary of
Linguistics and Phonetics. Blackwell.
Kang, Beom-mo. 1988. Functional Inheritance,
Anaphora, and Semantic Interpretation. Ph.D.
dissertation, Brown University.
Lee, Iksop. 2005. A Korean Grammar [written in
Korean]. Seoul National University Press.
Roh, Chang-Hwa. 2013. A Study of Verb
Sequences in Korean: Focusing on [V-e V]
Expressions [written in Korean]. Ph.D.
dissertation, Hankuk University of Foreign
Studies.
Spencer, Andrew. 1991. Morphological Theory.
Blackwell.
Song, Seok-Choong. 1967. Some
Transformational Rules in Korean. Ph.D.
dissertation, Indiana University.
Suh, Cheong-Soo. 1991. On the Korean Verbs
Ha and TOY [written in Korean]. Language
Research 27(3), 481-505.
Suh, Cheong-Soo. 1994. Korean Grammar
[written in Korean]. The Deep-Rooted Tree
Publishing.
Suh, Cheong-Soo. 1996. Contemporary Korean
Grammar [written in Korean]. Hanyang
University Press.
Zwicky, Arnold M. 1985. Clitics and Particles.
Language, 61(2), 283-305.
Zwicky, Arnold M. and Geoffrey K. Pullum.
1983. Cliticization vs. Inflection: English n’t.
Language 59(3), 502-513.