Multilingual Generation of Grammatical Categories

Gabriele Scheler
Institut für Informatik
Technische Universität München
80290 München
scheler@informatik.tu-muenchen.de

March 31, 1994

Abstract

We present an interlingual semantic representation for the synthesis of morphological aspect in English and Russian by standard backpropagation. Grammatical meanings are represented symbolically and translated into a binary representation. Generalization is assessed by test sentences and by a translation of the training sentences of the other language. The results are relevant to machine translation in a hybrid-systems approach and to the study of linguistic category formation.

1 Introduction

In this paper, we propose a representation for grammatical meanings which are relevant to the morphological forms progressive/simple in English and imperfective/perfective in Russian. We show that these grammatical meanings are sufficient to predict the correct aspectual form in the generation of an English or Russian sentence by presenting a classifier which computes the necessary input-output function. The representation has the symbolic form of a number of distinct features and values for these features.

Descriptive work, exploring aspectual distinctions in many different contexts and collecting different uses and conveyed meanings, has sometimes led to a formulation of aspectual meanings in terms of binary feature sets [Bondarko, 1983], [Breu, 1980], [Scheler, 1984]. Being the outcome rather than the starting point of research, these feature sets were, however, not validated for their capability of producing and predicting the correct morphology for a given sentence. In model-theoretic approaches to aspectual semantics [Hinrichs, 1988], [Dowty, 1979], [Verkuyl, 1993], the main effort consists in constructing logical representations for aspectual meanings.
These representations are usually constructed for individual sentences, without paying attention to the functional dependency between overt morphological form and semantic representation, or to its computability. We use simple feature-value representations instead of model-theoretic constructs for two reasons:

- We want the morphological form to be effectively computable from the semantic representation.
- We apply Occam's razor in the cybernetic formulation of Braitenberg [Braitenberg, 1984]: attribute a complex behavior to simple internal representations; as long as the behavior is predicted correctly, stay with it.

Our work is similar in intention to work done by Munro et al. [Munro and Tabasko, 1991], [Munro et al., 1991] on the translation of spatial prepositions. Munro et al. use an interlingual approach to machine translation, designing a semantic representation, translating overt morphological forms (English prepositions) into it, and then translating new morphological forms (German prepositions) out of the representation, using backpropagation for the translations. However, there are major differences. Generalization of the learned function is not attempted by Munro et al., which is a serious flaw, for practical application as well as theoretically. The learned classifier can translate the given examples by essentially looking up a function table, but it cannot predict the correct translation in a new context. The failure of prediction is probably due to the somewhat arbitrary character of the semantic relations (e.g., "at" gets a context-independent meaning as always "N1 touching N2") and to the form of the input representation, which is limited to prepositions in the form "Noun-Prep-Noun", where the nouns are not tagged or categorized to increase the scope of the input forms. In contrast, we have a system for the categorization of English and Russian aspect, capable in principle of handling all uses. The semantic representation is learnable from free, albeit tagged, text.
We regard generalization as vital: it shows that a functional dependence between semantic representation and grammatical form exists, not just a learned function table of correspondences.

2 Feature-based Representation

The complete analysis for the translation of aspectual forms consists of two parts: (a) transforming a binary input representation into a semantic representation, which is a task for a method of function approximation, and (b) categorizing the semantic representation, consisting of a number of individual grammatical meanings, into the language-specific grammatical categories, which is a pattern classification task. Figure 1 shows the whole system for the translation of aspects.

[Figure 1: Learning grammatical category assignment. The pipeline runs: source text -> tagged syntactic representation -> (encoding) binary source text representation -> (functional approximation) set of grammatical meanings, encoded as a binary feature vector -> (pattern classification) language-specific grammatical category assignment -> target text.]

The analysis proceeds as follows:

- Set up a number of semantic features which determine grammatical aspect and which are derivable from input texts, e.g. "habituality", with possible overt expression in English text: "usually", "used to", "every year", "on Sundays", etc.
- Determine values for those features, e.g. "habitual", "non-habitual".
- Tag English or Russian input sentences with grammatical markers, which provide an initial classification, in particular of verb lexemes and temporal adverbials.
- Find a functional mapping between tagged input sentence and feature representation.
- Find a functional mapping (classification) of the feature representation to two categories, either English progressive/simple or Russian imperfective/perfective.

The first part of the task, namely the translation of a syntactic representation into a semantic representation, is described in a companion paper [Scheler, 1994a].
In the following, we concentrate on the pattern classification task and on the role of the set of grammatical meanings, which are important for translation as well as for more theoretical considerations of linguistic category formation.

The set of grammatical meanings used is given in Appendix A. It has been obtained by heuristic setting and fine-tuning in the following way: grammatical meanings which have been claimed in the literature ([Scheler, 1984] and further references therein) to be relevant for the selection of a morphological aspect have been set up as undefined semantic feature values. These were ordered into mutually exclusive groups, which are called "features". The resulting feature sets can be fine-tuned by adding features/values until learning is perfect for the training set. It is also possible to use feature selection techniques for the elimination of superfluous features/values (cf. [Duda and Hart, 1973]). Note that features have to be functionally implicated in the selection of a morphological aspect in order to improve learning for the training set. Elimination of features may be important for generalization to new patterns, to exclude generalizing on irrelevant information. Thus we have an empirical test for the validity of a feature set.

Pattern classification has been performed with standard backpropagation, which is a powerful method of statistical regression [White, 1993], [Hornik, 1993], [Hornik et al., 1989], [Rumelhart et al., 1986].

3 Experiments and Results

In this section we present the results of the experiments on categorization. Experiments concerning the transformation of tagged input to the semantic representation are reported in [Scheler, 1994a]. An example of a tagged input representation is given in Figure 2. Standard backpropagation provides a method of pattern classification for binary or numeric patterns. We have used the SNNS simulator [Zell and others, 1993].
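For illustration, the classification step can be re-sketched in a few lines of numpy. This is NOT the original SNNS experiment, only a toy stand-in under stated assumptions: a feed-forward net with 5 hidden units trained by standard backpropagation with squared error and learning rate 0.2 (the configuration reported below), a 34-bit input matching the "bincode" width, and random binary patterns in place of the paper's sentence encodings. All names are ours.

```python
import numpy as np

# Toy backpropagation sketch: 34 inputs, 5 hidden units, 2 output
# categories, learning rate 0.2.  Data is random stand-in, not the
# paper's corpus of encoded sentences.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 34, 5, 2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(W1, W2, x):
    h = sigmoid(x @ W1)          # hidden layer activations
    return h, sigmoid(h @ W2)    # output layer activations

def train(X, T, epochs=2000, lr=0.2):
    W1 = rng.normal(0.0, 0.5, (n_in, n_hid))
    W2 = rng.normal(0.0, 0.5, (n_hid, n_out))
    for _ in range(epochs):
        for x, t in zip(X, T):               # patterns in a fixed order
            h, y = forward(W1, W2, x)
            dy = (y - t) * y * (1.0 - y)     # output delta (squared error)
            dh = (dy @ W2.T) * h * (1.0 - h) # backpropagated hidden delta
            W2 -= lr * np.outer(h, dy)
            W1 -= lr * np.outer(x, dh)
    return W1, W2

X = rng.integers(0, 2, (20, n_in)).astype(float)  # 20 toy binary patterns
T = np.eye(n_out)[rng.integers(0, n_out, 20)]     # arbitrary 2-class targets
W1, W2 = train(X, T)
preds = [forward(W1, W2, x)[1].argmax() for x in X]
acc = sum(p == t.argmax() for p, t in zip(preds, T)) / len(X)
print(f"training accuracy: {acc:.0%}")
```

With so few patterns relative to the input width, such a net typically fits the training set completely, which is consistent with the 100% learning correctness reported below.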
For the input representation, several binary codings of the symbolic feature representation of individual input sentences have been created. An additional value '*' for each feature, meaning "neutral", has been provided. This is particularly useful in the generation of semantic representations from a source text, because there will usually not be enough overt grammatical indicators to set each feature to a definite value. An example of an English sentence with symbolic and binary feature representations is given in Figure 2.

[Figure 2: Symbolic and binary feature representations.
sentence: I go to church every Sunday.
tagged syntactic representation: * on periods sing def lexcat7 simple present to place.
symbolic feature representation: * * * non relational holistic * * all periods action general telic * * single habitual
symbolic output: simple
The figure also lists the corresponding bit strings in the bincode, lincode and fixedlength codings; these columns are not legibly reproducible here.]

The binary codings used were:

- binary code (bincode): for each of the 15 features, it was determined how many bits are necessary to code all possible values of this feature. For 2 and 3 values (+ '*'), 2 bits are necessary; for 4 and 5 values, 3 bits are necessary. The coding has a length of 11 * 2 + 4 * 3 = 34 bits.
- linear code (lincode): each value of each feature is coded in 1-of-n fashion, i.e. n + 1 bits are needed for n feature values (+1 for '*'). The result is 44 bits.
- binary code with fixed length (fixedlength): for each feature the same number of bits is reserved. The number of bits is determined by the binary coding of the feature which has the most values. In this case, 3 bits * 15 features = 45 bits.
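The stated code lengths can be checked against the value inventory of Appendix A. A small sketch (variable and function names are ours): note that the per-feature value counts sum to 44 exactly, so the 44-bit total for the linear code comes out if '*' is taken as the all-zero vector rather than as an extra bit per feature.

```python
import math

# Number of values per feature, transcribed from Appendix A
# (excluding the neutral '*').
VALUE_COUNTS = [3, 5, 2, 3, 3, 3, 4, 2, 2, 3, 4, 4, 2, 2, 2]

def bincode_bits(n):
    # Compact binary code: enough bits for n values plus '*'.
    return math.ceil(math.log2(n + 1))

def lincode_bits(n):
    # 1-of-n code, assuming '*' is realized as the all-zero vector.
    return n

def fixedlength_bits(counts):
    # Every feature gets as many bits as the largest feature needs.
    return bincode_bits(max(counts)) * len(counts)

print(sum(bincode_bits(n) for n in VALUE_COUNTS))  # -> 34
print(sum(lincode_bits(n) for n in VALUE_COUNTS))  # -> 44
print(fixedlength_bits(VALUE_COUNTS))              # -> 45
```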
In this way, exemplary sentences for each language, taken from [Thompson and Martinet, 1969] for English and [Chawronina and Sirocenskaja, 1976] for Russian, have been coded. They are given with their symbolic feature encoding in Appendix B. 20 sentences of each language were used as the training set, and 13 English and 10 Russian sentences provided the test set. For generalization, the test set was applied without learning, as well as the training set of the respective other language, to create a situation of translation.

The results of the classification experiment, i.e. learning and generalization of grammatical category assignment, are presented in Table 1 and Table 2. The method used was "Standard Backpropagation" from SNNS, presentation of patterns in topological order, 5 hidden units, 2 or 3 output units, input units as indicated, learning rate η = 0.2.

Table 1: Learning English morphological aspect

coding (input nodes)   learning correctness   test sentences   translation   generalization total
                       (abs=20)               (abs=23)         (abs=20)      (abs=43)
bincode (34)           100%                   17/74%           12/60%        29/68%
lincode (44)           100%                   21/91%           17/85%        38/88%
fixedlength (45)       100%                   18/79%           15/70%        33/77%

Table 2: Learning Russian morphological aspect

coding (input nodes)   learning correctness   test sentences   translation   generalization total
                       (abs=20)               (abs=23)         (abs=20)      (abs=43)
bincode (34)           100%                   19/82%           17/85%        36/84%
lincode (44)           100%                   22/96%           19/95%        41/95%
fixedlength (45)       100%                   18/78%           17/85%        35/81%

The results can be summarized as follows: (a) Learning of the training set was easy, fully possible and quick for each coding and both types of input. (b) Generalization was consistently better for Russian morphology. Various explanations are possible: the training sets were better for Russian; the feature set is more suitable for Russian aspect (e.g.
the English progressive requires more features for mood, which are not included); Russian aspect is used more consistently, while the English progressive is embedded in a system of tenses and special forms ('kept doing', 'used to do'), which were not generated and which may lead to incorrect choices of aspect. Further experiments are necessary to see whether this is a consistent phenomenon or an artefact of a specific training situation. (c) Generalization to new examples was present and can be considered successful. There was a fairly great difference according to the coding used, the linear code being most appropriate for backpropagation pattern classification. However, binary codes may be easily learnable with another method (cf. [Scheler, 1992], [Scheler, 1994b] for an alternative approach to symbolic pattern classification). We can also see that a bad matching of coding to learning method may lead to a (false) rejection of learnability. This system may not have reached peak performance yet, either because the features need further fine-tuning or because the learning method can be improved. (d) There is not much difference in generalization between additional test sentences and translations. This shows that category formation proceeds from arbitrary input patterns, as long as these are cognitively adequate and describe a possible state of affairs. The feature set is interlingual; no special language-specific feature sets are required.

How do the results on 63 sentences relate to the language faculty in general? There are 5.6 × 10^8 theoretically possible feature representations. However, many fewer will actually occur, because many feature-value combinations are impermissible or contradictory. We must also consider that many sentences will translate to identical semantic representations in terms of aspectually relevant grammatical meanings.
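The count of roughly 5.6 × 10^8 theoretically possible representations mentioned above corresponds to each of the 15 features contributing its number of values plus one for the neutral '*'. A quick check, assuming our transcription of the value counts from Appendix A:

```python
from math import prod

# Each feature of Appendix A contributes (number of values + 1 for '*')
# possible settings; the product over all 15 features gives the total
# space of feature representations.
VALUE_COUNTS = [3, 5, 2, 3, 3, 3, 4, 2, 2, 3, 4, 4, 2, 2, 2]
total = prod(n + 1 for n in VALUE_COUNTS)
print(total)  # -> 559872000, i.e. about 5.6 * 10**8
```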
Accordingly, generalization of between 88% and 95% from a limited set of 20 patterns to 43 arbitrary new patterns leads us to expect the learned classifier to scale up well to large corpora.

4 Conclusion

We have presented a feature representation for aspectual grammatical meanings and a learned classifier for English and Russian morphological aspect. The learned classifier can be incorporated into an existing machine translation system as a specialized module, which should be capable of improving rule-coded translation of grammatical aspect and of reducing the coding effort for new languages considerably (only tagging of the input and backpropagation learning are required).

We have seen that the quality of generalization depends on the suitability of the coding and learning method used. This is an important caveat on the empirical interpretation of a statistics-based approach. Until we have better ways of judging which classification and learning techniques are actually available in human symbolic processing, we can interpret our results simply as showing that a "flat" feature-value representation is in principle sufficient for the generation of morphological aspect, and by extension of grammatical categories in general.

The implication for the study of linguistic categorization is: rather than viewing the generation of grammatical categories as a single process relating a denotationally interpreted logical form to a morphological category, we look at it as a stepwise or continuous process leading from the perception of events, through cognitive constructs of events, objects and time, and grammatical meanings, to morphological form, using essentially the same techniques of categorization, feature representation and functional dependence for each step. Thus we may eventually link specialized linguistic investigation with a biologically inspired view of human cognition.

A Grammatical Meanings: Features and Values

1. event-time: past - present - future (3)

2.
proximity (past): recent - hodiernal - hesternal - less than a year - more than a year (5)

3. proximity (for futuric action): immediately - not immediate (2)

4. relational: relational to past - relational to present - non relational (3)

5. event-extension: punctual (instantaneous) - extended (processual, durative) - holistic (3)

6. reference point in time: (event:) occurs at - starts at - extends around (3)

7. reference period: (event:) occurs at point in - occurs at period in - occurs at starting point - occurs at end point (4)

8. reference times/periods: (event occurs at:) all of them (all/most) - some of them (2)

9. action-status: action - non action (2)

10. reference-type: existential (indefinite) - referential (definite) - general (mass noun reference) (3)

11. event-type: state - atelic event - telic event - cause state (4)

12. degree of completion: attempt - halfway through - completed - negated (4)

13. duration: long duration - limited duration (short duration) (2)

14. number of occurrences: single - repeated (2)

15. habituality: habitual - non habitual (2)

B Symbolic feature representations

B.1 English examples

1. At six o'clock I am bathing the baby.
   * * * non relational processual extends around * * action existential telic halfway through * single habitual

2. He is always doing homework.
   present * * non relational processual * * * action existential telic halfway through long duration single habitual

3. I taste salt in my porridge.
   present * * non relational punctual * * * non action referential state * * * non habitual

4. I've been hearing all about this accident from him.
   past recent * relational to present processual * * * non action existential state completed long duration single non habitual

5. Tom is thinking of emigrating. What do you think of it?
   present * * non relational processual * * * action existential state halfway through * repeated non habitual

6. The children are being very quiet.
   present * * non relational processual * * * non action referential state halfway through long duration single non habitual

7. I hear you well.
   present * * non relational processual extends around * * non action referential atelic halfway through * single non habitual

8. Their children are really very quiet.
   * * * non relational processual * * * non action general state * * * non habitual

9. I can't open the door, I am having a bath.
   present * * non relational processual extends around * * action referential telic halfway through * single non habitual

10. Are you liking this excursion? No I'm hating it.
   present * * non relational processual extends around * * non action referential state halfway through long duration single non habitual

11. I hate excursions.
   * * * non relational processual * * * non action general state * * * non habitual

12. I don't expect much of him.
   present * * non relational processual * * * non action general state * * * non habitual

13. I am expecting a letter today.
   present * * non relational processual * occurs at period in * non action referential state halfway through * single non habitual

14. I usually wear a coat but I am not wearing one today.
   * * non relational processual * * * non action existential state * * * habitual

15. My neighbor is practising the violin, she usually practices at about this time.
   present * * non relational processual extends around * * action referential atelic halfway through * single non habitual

16. I go to church on Sundays.
   * * * non relational holistic * * all periods action general atelic * * single habitual

17. She goes abroad every year.
   * * * non relational holistic * * all periods action existential telic completed * single habitual

18. Birds don't build nests in autumn.
   * * * non relational holistic * * all periods action general telic negated * single non habitual

19. We leave London at 10am next Tuesday.
   future * not immediate non relational punctual occurs at occurs at point in * action referential cause state completed * single non habitual

20. He worked in that bank for four years.
   past more than a year * non relational holistic * occurs at period in * action existential atelic completed * single habitual

21. The wind was rising.
   past * * non relational processual * * * non action referential cause state halfway through * single non habitual

22. At 8 he was having breakfast.
   past * * non relational processual extends around * * action referential telic halfway through * single non habitual

23. He was always having breakfast at 8 in the morning.
   past * * non relational processual extends around * * action existential telic halfway through * single habitual

24. When he saw me he put the receiver down.
   past * * non relational punctual occurs at * * action referential cause state completed * single non habitual

25. When he saw me he had just put the receiver down.
   past * * relational to past punctual occurs at * * action referential cause state completed * single non habitual

26. She heard voices and realized that there were three people in the next room.
   past * * non relational processual extends around * * non action referential state halfway through * * non habitual

27. She saw empty glasses and cups and realized that three people had been in the room.
   past * * relational to past holistic * * * non action existential state completed * * non habitual

28. He had been trying to get her on the phone.
   past * * relational to past punctual * * * action existential cause state completed * repeated non habitual

29. I was talking to Tom the other day.
   past longer * non relational processual * occurs at period in * action referential atelic halfway through long duration single non habitual

30. I talked to Tom several times.
   past * * non relational holistic * * some periods action existential atelic completed * repeated non habitual

31. I talked to Tom the other day.
   past longer * non relational processual * occurs at period in * action referential atelic completed * single non habitual

32. I have been talking to Tom several times.
   past * * relational to past holistic * * some periods action existential atelic halfway through * repeated non habitual

33. I am always tripping over this suitcase.
   present * * non relational punctual * * * action referential cause state completed * repeated non habitual

B.2 Russian examples

1. Ja napisal pis'mo svojemu drugu i otpravil ego.
   past * * non relational holistic * * * action referential telic completed * single non habitual

2. Erik dolgo ucil slova is teksta.
   past * * non relational processual * * * action referential telic halfway through long duration single non habitual

3. Kogda on vyucil ich, on nacal pisat' upraznenija.
   past * * non relational processual starts at * * action referential telic halfway through * single non habitual

4. Cto ty delala? Ja pisala pis'mo materi.
   past recent * non relational holistic * * * action referential telic halfway through * single non habitual

5. A upraznenija ty uze napisala?
   past recent * relational to present holistic * * * action existential telic completed * single non habitual

6. Rabocie postroili novuju skolu i osen'ju ona nacala rabotat'.
   past longer * non relational holistic * * * action existential telic completed * single non habitual

7. Oni stroili ee polgoda.
   past longer * non relational processual * occurs at period in * action referential telic halfway through long duration single non habitual

8. Ja zaplatila den'gi i vzjala gazety.
   past * * non relational punctual * * * action referential cause state completed * single non habitual

9. On vspomnil pravilo i choroso otvetil na vopros prepodavatelja.
   past recent * non relational punctual * * * action existential cause state completed * single non habitual

10. Obycno ja beru knigi v universitetskoj biblioteke.
   * * * non relational holistic * * some periods action referential cause state * * single habitual

11. Kazdyj mesjac on posylaet den'gi svoej materi.
   * * * non relational holistic * * all periods action existential cause state completed * single habitual

12. Eta studentka nikogda ne opazdyvaet.
   * * * non relational punctual * * all periods action existential cause state negated * single habitual

13. No segodnja ona oposdala na pjat' minut.
   past hodiernal * non relational punctual * occurs at point in * action referential cause state completed * single non habitual

14. Vcera ja ocen' ustala, poetomu segodnja ja vstala posdno.
   past hesternal * non relational holistic * occurs at period in * non action referential cause state completed * single non habitual

15. V detstve on choroso risoval i vse govorili cto on budet chudosnikom.
   past more than a year * non relational holistic * occurs at period in * action existential atelic * * * habitual

16. Kogda ja byla malen'koj, otec casto daril mne knigi.
   past more than a year * non relational punctual * occurs at point in some periods action existential cause state completed * single habitual

17. Zavtra ja pojdu v magazin i kuplju sebe pal'to.
   future * not immediate non relational punctual * occurs at point in * action referential cause state completed * single non habitual

18. Vcera vecerom ja smotrela televizor a on pisal pis'mo bratu.
   past hesternal * non relational processual * occurs at period in * action referential telic halfway through * single non habitual

19. Kogda on napisal eto pis'mo, on posel na poctu.
   past hesternal * non relational holistic starts at * * action referential telic completed * single non habitual

20. Ja poterala tvoj adres, poetomy ja ne otvecala na tvoje pis'mo.
   * * * relational to present punctual * * * action existential cause state negated * single non habitual

21. Kazdyj den' nasi zanjatija koncajutsja v 3 casa.
   * * * non relational punctual occurs at * * non action general cause state halfway through * single habitual

22. Vcera posle urokov moj drug Erik bystro poobedal, a potom celyj cas cital gazety.
   past hesternal * non relational processual * occurs at period in * action referential telic completed limited duration single non habitual

23. Vcera posle urokov moj drug Erik bystro poobedal, a potom celyj cas cital gazety.
   past hesternal * non relational processual * occurs at period in * action referential telic halfway through long duration single non habitual

24. Kogda Erik koncil delat' domasnee zadanie, on posel v klub.
   past * * non relational punctual occurs at * * action referential cause state completed * single non habitual

25. Segodnja utrom, kogda my pisali kontrol'nuju rabotu, on napisal ee lucse vsex.
   past hodiernal * non relational processual * occurs at period in * action referential telic halfway through * single non habitual

26. Segodnja utrom, kogda my pisali kontrol'nuju rabotu, on napisal ee lucse vsex.
   past hodiernal * non relational holistic * occurs at period in * action existential telic completed * single non habitual

27. Kogda ja byla malen'kaja, moja babuska casto rasskazyvala mne raznye skazki.
   past more than a year * non relational processual * occurs at period in some periods action general atelic completed * single habitual

28. Starsij brat podumal, cto u mladsego brata bol'saja sem'ja i resil emu pomoc'.
   past * * non relational punctual * * * non action referential state completed * single non habitual

29. Starsij brat podumal, cto u mladsego brata bol'saja sem'ja i resil emu pomoc'.
   past * * non relational punctual * * * action referential cause state completed * single non habitual

30. Kogda nastupila noc', on vzjal mesok risa i posel v derevnju, gde zil ego brat.
   past * * relational to past holistic * * * non action referential cause state completed * single non habitual

References

[Bondarko, 1983] A.V. Bondarko.
Principy funkcional'noj grammatiki i voprosy aspektologii. Nauka, 1983.

[Braitenberg, 1984] Valentin Braitenberg. Vehicles: Experiments in Synthetic Psychology. MIT Press, 1984.

[Breu, 1980] Walter Breu. Semantische Untersuchungen zum Verbalaspekt im Russischen. Slavistische Beiträge 137. Otto Sagner, 1980.

[Chawronina and Sirocenskaja, 1976] S. Chawronina and A. Sirocenskaja. Russkij jazyk v upraznenijach. Russkij jazyk, 1976.

[Dowty, 1979] David Dowty. Word Meaning and Montague Grammar. Synthese Language Library 7. Reidel, 1979.

[Duda and Hart, 1973] Richard O. Duda and Peter E. Hart. Pattern Classification and Scene Analysis. John Wiley, 1973.

[Hinrichs, 1988] Erhard W. Hinrichs. Tense, quantifiers, and contexts. In Special Issue on Tense and Aspect. Computational Linguistics, 1988.

[Hornik et al., 1989] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks 2, pages 359-366, 1989.

[Hornik, 1993] Kurt Hornik. Some new results on neural network approximation. Neural Networks 6, pages 1069-1072, 1993.

[Munro and Tabasko, 1991] P. Munro and M. Tabasko. Translating locative prepositions. In Proceedings of NIPS-91, volume 3, pages 598-604, 1991.

[Munro et al., 1991] P. Munro, C. Cosic, and M. Tabasko. A network for encoding, decoding, and translating locative prepositions. Connection Science, 3:225-240, 1991.

[Rumelhart et al., 1986] D. Rumelhart, G. Hinton, and R. Williams. Learning internal representations by error propagation. In J. McClelland and D. Rumelhart, editors, Parallel Distributed Processing, volume 1, chapter 8. MIT Press, 1986.

[Scheler, 1984] Gabriele Scheler. Zur Semantik von Tempus und Aspekt, insbesondere des Russischen. Master's thesis, LMU München, April 1984.

[Scheler, 1992] G. Scheler. The use of an adaptive distance measure in generalizing pattern learning. In I. Aleksander and J. Taylor, editors, Artificial Neural Networks, 2, volume 1, pages 131-135. North Holland, 1992.
[Scheler, 1994a] Gabriele Scheler. Extracting semantic features for aspectual meanings from a syntactic representation using neural networks. Technical report, Technische Universität München, 1994.

[Scheler, 1994b] Gabriele Scheler. Pattern classification with adaptive distance measures. Technical Report FKI-188-94, Technische Universität München, January 1994.

[Thompson and Martinet, 1969] A.J. Thompson and A.V. Martinet. A Practical English Grammar. Oxford University Press, 1969.

[Verkuyl, 1993] Henk J. Verkuyl. A Theory of Aspectuality: The Interaction between Temporal and Atemporal Structure. Cambridge Studies in Linguistics 64. Cambridge University Press, 1993.

[White, 1993] Halbert White. Artificial Neural Networks: Approximation and Learning Theory. Blackwell, 1993.

[Zell and others, 1993] Andreas Zell et al. SNNS User Manual, v. 3.1. Universität Stuttgart, Institute for Parallel and Distributed High-Performance Systems, 1993.