The Design of a System for the Automatic Extraction of a Lexical Database Analogous to WordNet from Raw Text
1. INTRODUCTION
WordNet [8] is a large lexical database of English in which nouns, verbs, adjectives and adverbs are grouped into sets of synonyms (synsets), each expressing a distinct concept (see Figure 1 for a sample entry relating to the word motion). Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet has become an invaluable resource for processing semantic aspects of language, and it has served as a model for a wide range of related resources.

(This research was supported by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme.)
Figure 1: WordNet entry for the word motion.

In recent years a number of models for associating ontologies with linguistic information have been developed, among them LexInfo [3], LingInfo [4], LexOnto [7], the Linguistic Information Repository (LIR) [20], and the Linguistic Watermark Suite (LWS) [25]. However, these resources are too recent to be able to claim a status similar to WordNet's. WordNet has been developed and optimized over decades, it has been scrutinized and used by a large community of scientists and, despite some criticism, the validity of its underlying principles is widely acknowledged. This is the main reason why the work described here is based on WordNet rather than on one of the above mentioned ontologies. Another reason is that WordNet appears to be a more direct representation of native speakers' intuitions about language than most ontologies. Therefore, when working with it, it should be somewhat more straightforward to draw conclusions concerning human cognition. As far as applicable, we nevertheless take experiences from ontology building into account (for an overview see [2]).

The focus of this work is to generate lexical databases as similar as possible to the existing manually created WordNets, in a way that is easily adaptable to other languages for which no WordNets exist yet. The evaluation is conducted in a direct way by taking the existing WordNet and experimental data on word synonymy as a gold standard, and by comparing the generated data to this gold standard. An alternative would be an indirect evaluation that compares the performance of the automatically generated WordNet to that of existing WordNets in some application areas, such as word sense disambiguation and information retrieval. However, applications are too numerous to consider one (or a few) of them as authoritative. For example, Rosenzweig et al. [34] list 868 papers of which many describe applications of WordNet. For such reasons we suggest here the automatic construction of a general purpose WordNet which is not optimized with respect to a specific application. However, looking at specific applications would be a logical next step following the work described here.

Let us mention that although there has been quite some work on specific aspects of what is described here, to our knowledge no previous attempt to automatically create a WordNet-like system using state-of-the-art modules for lexical acquisition has been fully completed, with the resulting lexical database being finalized and published.1 However,
variations of a completely different method for automatically creating WordNets have been described and put into practice many times. They are based on the idea that a raw version of a WordNet for a new language can be created by simply translating an existing WordNet of a closely related language. Recent examples are [40] for Thai and WOLF for French.2 However, this methodology is rather unrelated to what we investigate here. It does not have as much cognitive plausibility, cannot produce WordNets specific to the genre of a particular corpus, and can only be applied if a WordNet of a related language is available.
http://alpage.inria.fr/sagot/wolf-en.html
http://wordnet.cs.princeton.edu/downloads.html
Table 1: Performances for TOEFL synonym test

  Description                                              Score    Reference
  Random guessing (four alternatives of which one correct) 25.00%   Rapp [30]
  Average non-English US college applicant taking TOEFL    64.50%   Landauer & Dumais [14]
  Non-native speakers of English living in Australia       86.75%   Rapp [30]
  Native speakers of English living in Australia           97.75%   Rapp [30]
In the lexicon-based approach (e.g. Jarmasz & Szpakowicz [12]), a given word is looked up in a large lexicon (or lexical database) of synonyms, and it is determined whether there is a match between any of the retrieved synonyms and the four alternative words presented in the TOEFL question. If there is a match, the respective word is considered to be the solution to the question. Otherwise, the procedure can be extended to indirect matches, e.g. involving synonyms of synonyms (see the sketch below). This procedure works rather well if the lexicon has a good coverage of the respective vocabulary. In the literature, typically WordNet [8] has been used, and performances of up to 78.75% on the TOEFL task have been reported [12]. On the other hand, both the TOEFL questions and the lexicons are handcrafted and therefore reflect human intuitions. So it is not surprising that a high correspondence between these two closely related types of human intuitions can be observed. In our setting, as the purpose of our work is to generate a lexical database similar to WordNet, it would be contradictory to presuppose WordNet for the similarity computations.

Therefore we concentrate here on the second method. This is a corpus-based machine learning approach which appears to be more interesting from a cognitive perspective, as it potentially better captures the relevant aspects of human vocabulary acquisition. Table 2 (derived from [32] and the ACL Wiki) gives an overview of the current state of the art with regard to performance figures on the TOEFL synonym test. With 90.9% and 92.5% correct answers, the best performances were achieved by Pantel & Lin [23] and Rapp [32]. This is why we will concentrate here on combining these two rather different approaches, the first being syntax-based and the second using singular value decomposition for dimensionality reduction of the semantic space. The intention is to introduce some amount of syntax to the second approach by operating it on a part-of-speech-tagged rather than a raw text corpus. This should not only lead to better results, but is also necessary to obtain WordNet-like entries which distinguish between parts of speech.

The third approach is hybrid ([33] [13] [18] [12] [44]) and is basically a fall-back strategy for the first approach: by default the lexicon-based approach is used, as its results tend to be more reliable. However, if the relevant words cannot be found in the lexicon, it is of course better to use a corpus-based approach rather than to guess randomly. With a performance of up to 97.5% on the TOEFL synonym test [44], the results of the hybrid approach are the best. However, it is nevertheless inappropriate for our research because, like the lexicon-based approach, it also presupposes readily available lexical knowledge.

Although the scores from the 80-item TOEFL synonym test, which has been the standard so far, give some idea concerning the overall performance of an algorithm, it can be argued that this test set is rather small and therefore prone to statistical variation. Also, this test was not designed to measure the strengths and weaknesses of various algorithms concerning particular properties of the input words, e.g. their frequency, saliency, part of speech, or ambiguity. We will therefore base our future evaluation on a much larger data set, namely the 200,000 sense-specific human similarity judgments that were collected in the Princeton Evocation project.
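To make the lexicon-based matching procedure concrete, here is a minimal sketch; the synonym lexicon (a plain dictionary) and the example question are hypothetical and stand in for a resource such as WordNet or Roget's thesaurus.

```python
# Sketch of the lexicon-based TOEFL procedure: look up the stimulus word in a
# synonym lexicon and check for a (possibly indirect) match with an alternative.
# The lexicon and the question below are illustrative only.

def solve_toefl_item(stimulus, alternatives, lexicon, max_depth=2):
    """Return the alternative matching a synonym (or synonym of a synonym)
    of the stimulus, or None if no match is found."""
    frontier = {stimulus}
    seen = set(frontier)
    for _ in range(max_depth):                  # depth 1 = direct synonyms,
        frontier = {s for w in frontier         # depth 2 = synonyms of synonyms
                    for s in lexicon.get(w, ())} - seen
        match = frontier & set(alternatives)
        if match:
            return match.pop()
        seen |= frontier
    return None                                 # caller may fall back to a corpus-based method

# Hypothetical toy lexicon and TOEFL-style question
lexicon = {
    "essentially": ["basically", "fundamentally"],
    "basically": ["essentially", "mainly"],
}
print(solve_toefl_item("essentially",
                       ["occasionally", "basically", "possibly", "eagerly"],
                       lexicon))
```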
Table 2: Comparison of corpus-based approaches

  Characterization of algorithm            Score    Ref.
  Latent semantic analysis                 64.38%   [14]
  Raw co-occurrences and city-block        69.00%   [28]
  Dependency space                         73.00%   [22]
  Pointwise mutual information (MI)        73.75%   [41]
  PairClass                                76.25%   [43]
  Pointwise mutual information             81.25%   [39]
  Context window overlapping               82.55%   [36]
  Positive pointwise MI with cosine        85.00%   [5]
  Generalized latent semantic analysis     86.25%   [19]
  Similarities between parsed relations    90.90%   [30], [23]
  Modified latent semantic analysis        92.50%   [32]
Such a large-scale data set will allow a much more detailed analysis of the behavior of the algorithms, and we would like to see this as the future gold standard for such comparisons.
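As a minimal sketch of such an evaluation, assuming the human judgments and the system's similarity scores are already aligned as two lists (the numbers below are invented, not taken from the Evocation data), a rank correlation can be computed as follows:

```python
# Sketch: compare system similarity scores with human relatedness judgments.
# In practice the Princeton Evocation ratings would supply the human scores.
from scipy.stats import spearmanr

human_scores  = [3.2, 0.1, 1.7, 2.9, 0.4]        # averaged human judgments (hypothetical)
system_scores = [0.61, 0.05, 0.33, 0.48, 0.12]   # e.g. cosine similarities from the model

rho, p = spearmanr(human_scores, system_scores)
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")
```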
Table 3: Term/context matrix for the word palm, with rows arm, beach, coconut, finger, hand, shoulder and tree, and columns for the six contexts c1 to c6; dots indicate in which contexts each word occurs.
For word sense induction, one of the approaches we consider is the method outlined in [31], which looks at local rather than global co-occurrence vectors. As can be seen from human performance, in almost all cases the local context of an ambiguous word is sufficient to disambiguate its sense. This means that if we consider words within their local context, they are hardly ever ambiguous. The basic idea is therefore not to cluster the global co-occurrence vectors of the words (based on an entire corpus) but local ones which are derived from the various contexts of a single word. That is, the computations are based on the concordance of a word. Also, we do not consider a term/term but a term/context matrix. This means that for each word to be analyzed we get an entire matrix.

Let us illustrate this using the ambiguous word palm, which can refer to a tree or to a part of the hand. If we assume that our corpus contains six occurrences of palm, i.e. that there are six local contexts, then we can derive six local co-occurrence vectors for palm. Considering only strong associations to palm, these vectors could, for example, look as shown in Table 3. The dots in the matrix indicate whether the respective word occurs in a particular context or not. We use binary vectors since we assume short contexts where words usually occur only once. The matrix reveals that the contexts c1, c3, and c6 seem to relate to the hand sense of palm, whereas the contexts c2, c4, and c5 relate to its tree sense. These intuitions can be reproduced by using a measure of vector similarity such as the cosine coefficient. If we then apply an appropriate clustering algorithm to the context vectors, we should obtain the two expected clusters. Each of the two clusters corresponds to one of the senses of palm, and the words closest to the geometric centers of the clusters should be good descriptors of each sense.

However, as matrices of the above type can be extremely sparse, clustering is a difficult task, and common algorithms often produce sub-optimal results. Fortunately, the sparsity problem can be minimized by reducing the dimensionality of the matrix. An appropriate algebraic method which has the capability to reduce the dimensionality of a rectangular or square matrix in an optimal way is singular value decomposition (SVD). As shown by Schütze [38], by reducing the dimensionality a generalization effect can be achieved which often yields improved results. The approach that we suggest here involves reducing the number of columns (contexts) and then applying a clustering algorithm to the row vectors (words) of the resulting matrix. This should work well, as it is one of the strengths of SVD to reduce the effects of sampling errors and to close gaps in the data. Although SVD is computationally demanding, previous experience shows that it is feasible to deal with matrices of several hundred thousand dimensions [32].

In summary, we will compare two fundamental types of algorithms for word sense induction, one being based on global and the other on local clustering of words. If the results are similar, we will give preference to global clustering, as it matches the WordNet approach better. On the other hand, local clustering makes it easier to provide contexts for each sense, which will be used as replacements for the WordNet glosses. Empirical verification of these issues may give us important arguments to question some underlying principles of WordNet.
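The following sketch illustrates the local-clustering idea under the assumptions above; the binary term/context matrix is a toy version of the palm example (its cell values are illustrative, not taken from a corpus), and scikit-learn's k-means stands in for whatever clustering algorithm is eventually chosen.

```python
# Sketch: induce the senses of one target word from its local contexts.
# Rows are words co-occurring with "palm", columns are its contexts c1..c6.
import numpy as np
from sklearn.cluster import KMeans

words = ["arm", "beach", "coconut", "finger", "hand", "shoulder", "tree"]
M = np.array([
    [1, 0, 1, 0, 0, 1],   # arm
    [0, 1, 0, 1, 1, 0],   # beach
    [0, 1, 0, 1, 0, 0],   # coconut
    [1, 0, 1, 0, 0, 0],   # finger
    [1, 0, 0, 0, 0, 1],   # hand
    [0, 0, 1, 0, 0, 1],   # shoulder
    [0, 1, 0, 0, 1, 0],   # tree
], dtype=float)

# Reduce the number of columns (contexts) with SVD to smooth sampling errors ...
k = 2
U, S, Vt = np.linalg.svd(M, full_matrices=False)
word_vectors = U[:, :k] * S[:k]          # one reduced vector per word (row)

# ... then cluster the row vectors (words); each cluster should describe one sense.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(word_vectors)
for sense in range(2):
    print(f"sense {sense}:", [w for w, l in zip(words, labels) if l == sense])
```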
A complication is that WordNet's relations hold between synsets, i.e. between word senses rather than between words. Words represent mixes of senses, and looking at these mixes leads to blurred results. To avoid this, we must first perform a word sense disambiguation on the entire corpus, and then apply the procedure for relation detection. From the previous step (of word sense induction) we already have the possible word senses readily available, so that using available software (including some of our own) it is relatively straightforward to conduct a word sense disambiguation. It can be expected that in the disambiguated corpus the relations between word senses are more salient than they would be for words. Nevertheless it will be necessary to optimize the algorithm in a procedure of stepwise refinement by comparing its results to a representative subset of the relations found in WordNet.
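A minimal sketch of this disambiguation step, under the assumption that each induced sense is represented by a centroid vector and each occurrence of the word by a local context vector (all vectors below are hypothetical):

```python
# Sketch: tag each occurrence of a word with the closest induced sense.
# Sense centroids and context vectors are assumed to come from the induction step.
import numpy as np

def assign_sense(context_vec, sense_centroids):
    """Return the index of the sense whose centroid is most cosine-similar
    to the local context vector of this occurrence."""
    sims = []
    for c in sense_centroids:
        denom = np.linalg.norm(context_vec) * np.linalg.norm(c)
        sims.append(context_vec @ c / denom if denom else 0.0)
    return int(np.argmax(sims))

# Hypothetical example: two sense centroids and one new occurrence of "palm"
centroids = [np.array([1.0, 0.0, 0.8]), np.array([0.0, 1.0, 0.1])]
occurrence = np.array([0.9, 0.1, 0.7])
print(assign_sense(occurrence, centroids))   # -> 0 (the first sense)
```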
3. RESULTS
The focus of this paper is to give an overview of the AutoWordNet project. As the project is ongoing (and also due to space constraints), detailed results concerning the various aspects of the project cannot be presented here, but will be published separately. However, in order to give the reader an idea of the outcome, let us summarize here the results concerning one of the fundamental aspects of the project, namely the computation of thesauri of related words. Although this work has been completed for several languages (more information can be found in [32]), we will confine our description here to the English version. For the other languages, the procedure is essentially the same.

As our underlying textual basis we used the British National Corpus (BNC). While being considerably smaller than more recent corpora (e.g. the WaCky or the LDC Gigaword corpora, which were used for some of the other languages), our experience is that it leads to somewhat better results for this task as it is well balanced, whereas the other corpora have a stronger tendency to produce idiosyncrasies. In a pre-processing step, we lemmatized this corpus and removed the function words (for details concerning this step see [32]). Based on a window size of 2 words, we then computed a co-occurrence matrix comprising all of the approximately 375,000 lemmas occurring in the BNC. The raw co-occurrence counts were converted to association strengths using the entropy-based association measure described in [32]. Inspired by Latent Semantic Analysis [14], in a further step we applied a Singular Value Decomposition to the association matrix, thereby reducing the dimensionality of the semantic space to 300 dimensions. This dimensionality reduction has a generalization and smoothing effect which could be shown to improve the results of the subsequent similarity computations [30]. Given the resulting dimensionality-reduced matrix, word similarities were computed by comparing word association vectors using the standard cosine similarity measure (a sketch of this pipeline is given below). This led to results like the ones shown in Table 4 (the lists are ranked according to decreasing cosine values).

For a quantitative evaluation we used the system for solving the TOEFL synonym test (see Section 2.1) and compared the results to the correct answers as provided by the Educational Testing Service. Remember that in this test the subjects had to choose the word most similar to a given stimulus word from a list of four alternatives.
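The sketch below compresses this pipeline into a single function. The tokenized, lemmatized corpus is assumed as input, a simple log weighting stands in for the entropy-based association measure of [32], and a dense SVD is used for brevity (a sparse implementation would be needed at BNC scale).

```python
# Sketch of the similarity pipeline: window-based co-occurrence counts,
# association weighting, SVD-based dimensionality reduction, cosine similarity.
import numpy as np
from collections import Counter

def related_words(sentences, target, window=2, dims=300, top_n=10):
    """sentences: list of token lists (lemmatized, function words removed)."""
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for s in sentences:
        for i, w in enumerate(s):
            for v in s[max(0, i - window):i]:      # words within the window to the left
                counts[(idx[w], idx[v])] += 1
                counts[(idx[v], idx[w])] += 1
    A = np.zeros((len(vocab), len(vocab)))
    for (i, j), c in counts.items():
        A[i, j] = np.log(1 + c)                    # stand-in association weight
    dims = min(dims, len(vocab))
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    X = U[:, :dims] * S[:dims]                     # reduced word vectors
    X /= np.linalg.norm(X, axis=1, keepdims=True) + 1e-12
    sims = X @ X[idx[target]]                      # cosine similarities to the target
    order = [i for i in np.argsort(-sims) if i != idx[target]]
    return [(vocab[i], float(sims[i])) for i in order[:top_n]]
```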
Table 4: Sample lists of related words as computed

  enormously:  greatly (0.52), immensely (0.51), tremendously (0.48), considerably (0.48), substantially (0.44), vastly (0.38), hugely (0.38), dramatically (0.35), materially (0.34), appreciably (0.33)
  flaw:        shortcomings (0.43), defect (0.42), deficiencies (0.41), weakness (0.41), fault (0.36), drawback (0.36), anomaly (0.34), inconsistency (0.34), discrepancy (0.33), fallacy (0.31)
  issue:       question (0.51), matter (0.47), debate (0.38), concern (0.38), problem (0.37), topic (0.34), consideration (0.31), raise (0.30), dilemma (0.29), discussion (0.28)
  build:       building (0.55), construct (0.48), erect (0.39), design (0.37), create (0.37), develop (0.36), construction (0.34), rebuild (0.34), exist (0.29), brick (0.27)
  discrepancy: disparity (0.44), anomaly (0.43), inconsistency (0.43), inaccuracy (0.40), difference (0.36), shortcomings (0.35), variance (0.34), imbalance (0.34), flaw (0.33), variation (0.33)
  basically:   primarily (0.50), largely (0.49), purely (0.48), essentially (0.48), mainly (0.46), mostly (0.39), fundamentally (0.39), principally (0.39), solely (0.36), entirely (0.35)
In the simulation, we assumed that the system made the right decision if the correct answer was ranked best among the four alternatives. This was the case for 74 of the 80 test items, which gives us an accuracy of 92.5%. In comparison, the performance of human subjects had been 97.75% for native speakers and 86.75% for highly proficient non-native speakers (see Table 1). This means that our program's performance lies between these two levels, with about equal margins towards both sides.

An interesting observation is that in Table 4 most words listed are of the same part of speech as the stimulus word. This is surprising insofar as the simulation system never obtained any information concerning part of speech, but in the process of computing term relatedness it implicitly determines it. This observation is consistent with other work (e.g. [14]).

As mentioned above, the method has also been applied to other languages, namely French, German, Spanish and Russian [32]. Apart from corpus pre-processing (e.g. segmentation and lemmatization) the algorithm remained unchanged, but nevertheless delivered similarly good results. As an outcome, large thesauri of related words (analogous to the samples shown in Table 4), each comprising in the order of 50,000 entries, are available for these languages.
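The decision rule used in this simulation amounts to choosing the alternative whose vector is most cosine-similar to the stimulus vector. A sketch, assuming the vectors come from the SVD-reduced association matrix described above:

```python
# Sketch of the TOEFL decision rule: pick the alternative most cosine-similar
# to the stimulus; vectors are assumed to come from the reduced association matrix.
import numpy as np

def choose_alternative(stimulus_vec, alternative_vecs):
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    sims = [cos(stimulus_vec, v) for v in alternative_vecs]
    return int(np.argmax(sims))

def accuracy(items):
    """items: list of (stimulus_vec, [four alternative vecs], index_of_correct)."""
    hits = sum(choose_alternative(s, alts) == gold for s, alts, gold in items)
    return hits / len(items)
```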
As shown in the previous section, human intuitions have been successfully replicated via an automatic system building on previous studies such as [14], [23], and [32]. In word sense induction, current methods can make rough sense distinctions, but are far from reaching the sophistication of human judgements. Here our current work focuses on comparing methods based on local versus global co-occurrence vectors, as well as local versus global clustering. There are deep theoretical questions behind these choices which also correlate with some design principles of WordNet. We intend to compare three existing systems which can be seen as prototypical for different choices, namely the ones described by Pantel & Lin [23], Rapp [31], and Bordag [1]. By providing empirical evidence, this should enable us to at least partially answer these questions. By combining the best choices we hope to be able to come up with an improved algorithm.

Concerning the identification of conceptual relations holding between words, the field is still at an early stage, and it is unclear whether the aim of automatically replicating WordNet's relations through unsupervised learning from raw text is realistic. However, attempting to do so is certainly of interest. On the one hand, it is still rather unclear what the empirical basis for these relations is, and how they can be extracted from a corpus. On the other hand, WordNet provides such relations and can therefore be used as a gold standard for the iterative refinement of an algorithm. As a possible outcome, it may well turn out that the empirical support for WordNet's conceptual relations is not equally strong for all types. This would raise the question whether the choices underlying WordNet were sound, and what the most salient alternative relations would be. Also, there may be interesting findings within each category, as most categories are only applicable to certain subsets of words (e.g. holonymy cannot easily be applied to abstract terms).

Although the envisaged advances concerning the three steps are of a more evolutionary nature, their sum is supposed to lead to a time-saving and largely language-independent algorithm for the automatic extraction of a WordNet-like resource from a large text corpus. The work is also of interest from a cognitive perspective, as WordNet is a collection of different types of human intuitions, namely intuitions on word similarity, on word senses, and on word relations. The question is whether all of these intuitions find their counterpart in corpus evidence. Should this be the case, it would support the view that human language acquisition can be explained by unsupervised learning (i.e. low-level statistical mechanisms) on the basis of perceived spoken and/or written language. If not, other sources of information available for language learning would have to be identified, which may e.g. include knowledge derived from visual perception, world knowledge as postulated in Artificial Intelligence, or some inherited high-level mechanisms such as Pinker's language instinct or Chomsky's language acquisition device.

Although the suggested methodology is unlikely to completely replace current manual techniques of compiling lexical databases in the near future, it should at least be useful
to efficiently aggregate relevant information for subsequent human inspection, thereby making the manual work more efficient. This is of particular importance as the suggested methods should in principle be applicable to all languages, so that the potential savings multiply. Another aspect is that automatic methods will in principle allow generating WordNets for particular genres, domains or dialects by simply running the algorithm on a large text corpus of the respective type. This would not be easy to obtain manually, as human intuitions tend to be based on the sum of lifetime experience, so that it is difficult to concentrate on specific aspects.

Let us conclude by citing from Piasecki et al. [27]: "A language without a wordnet is at a severe disadvantage. ... Language technology is a signature area of ... the Internet, ... including increasingly clever search engines and more and more adequate machine translation. A wordnet, a rich repository of knowledge about words, is a key element of ... language processing."
5. REFERENCES
[1] S. Bordag. Word sense induction: triplet-based clustering and automatic evaluation. Proc. of EACL 2006.
[2] P. Buitelaar, P. Cimiano (eds.). Ontology Learning and Population: Bridging the Gap between Text and Knowledge. Selected Contributions to Ontology Learning and Population from Text. IOS Press, 2008.
[3] P. Buitelaar, P. Cimiano, P. Haase, M. Sintek. Towards linguistically grounded ontologies. Proc. of the 6th ESWC, Heraklion, Greece, 111-125, 2009.
[4] P. Buitelaar, T. Declerck, A. Frank, S. Racioppa, M. Kiesel, M. Sintek, R. Engel, M. Romanelli, D. Sonntag, B. Loos, V. Micelli, R. Porzel, P. Cimiano. LingInfo: Design and applications of a model for the integration of linguistic information in ontologies. Proc. of the OntoLex Workshop, Genoa, Italy, 28-34, 2006.
[5] J.A. Bullinaria, J.P. Levy. Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39, 510-526, 2007.
[6] S.A. Caraballo, E. Charniak. Determining the specificity of nouns from text. Proc. of EMNLP-VLC, 63-70, 1999.
[7] P. Cimiano, P. Haase, M. Herold, M. Mantel, P. Buitelaar. LexOnto: A model for ontology lexicons for ontology-based NLP. Proc. of the OntoLex07 Workshop at ISWC07, South Korea, 2007.
[8] C. Fellbaum (ed.). WordNet: An Electronic Lexical Database. Cambridge, MA: MIT Press, 1998.
[9] G. Grefenstette. Explorations in Automatic Thesaurus Discovery. Dordrecht: Kluwer, 1994.
[10] Z.S. Harris. Distributional structure. Word, 10(2-3), 146-162, 1954.
[11] G. Hirst, D. St-Onge. Lexical chains as representation of context for the detection and correction of malapropisms. In: C. Fellbaum (ed.): WordNet: An Electronic Lexical Database, Cambridge: MIT Press, 305-332, 1998.
[12] M. Jarmasz, S. Szpakowicz. Roget's thesaurus and semantic similarity. Proc. of RANLP, Borovets, Bulgaria, September, 212-219, 2003.
[13] J.J. Jiang, D.W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. Proc. of the International Conference on Research in Computational Linguistics, Taiwan, 1997.
[14] T.K. Landauer, S.T. Dumais. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211-240, 1997.
[15] T.K. Landauer, D.S. McNamara, S. Dennis, W. Kintsch (eds.). Handbook of Latent Semantic Analysis. Lawrence Erlbaum, 2007.
[16] C. Leacock, M. Chodorow. Combining local context and WordNet similarity for word sense identification. In: C. Fellbaum (ed.). WordNet: An Electronic Lexical Database. Cambridge: MIT Press, 265-283, 1998.
[17] D. Lin. Automatic retrieval and clustering of similar words. Proc. of COLING-ACL, Montreal, Vol. 2, 768-773, 1998.
[18] D. Lin. An information-theoretic definition of similarity. Proc. of the 15th International Conference on Machine Learning (ICML-98), Madison, WI, 296-304, 1998.
[19] I. Matveeva, G. Levow, A. Farahat, C. Royer. Generalized latent semantic analysis for term representation. Proc. of RANLP, Borovets, Bulgaria, 2005.
[20] E. Montiel-Ponsoda, W. Peters, G. Aguado de Cea, M. Espinoza, A. Gómez-Pérez, M. Sini. Multilingual and Localization Support for Ontologies. Technical report, D2.4.2 NeOn Project Deliverable, 2008.
[21] D.B. Neill. Fully Automatic Word Sense Induction by Semantic Clustering. Cambridge University, Master's Thesis, M.Phil. in Computer Speech, 2002.
[22] S. Pado, M. Lapata. Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161-199, 2007.
[23] P. Pantel, D. Lin. Discovering word senses from text. Proc. of ACM SIGKDD, Edmonton, 613-619, 2002.
[24] P. Pantel, M. Pennacchiotti. Automatically harvesting and ontologizing semantic relations. In: P. Buitelaar, P. Cimiano (eds.): Ontology Learning and Population: Bridging the Gap between Text and Knowledge. Selected Contributions to Ontology Learning and Population from Text, IOS Press, 2008.
[25] M.T. Pazienza, A. Stellato. Exploiting linguistic resources for building linguistically motivated ontologies in the Semantic Web. Proc. of the 2nd OntoLex Workshop, 2006.
[26] M. Pennacchiotti, P. Pantel. A bootstrapping algorithm for automatically harvesting semantic relations. Proc. of Inference in Computational Semantics (ICoS), Buxton, England, 87-96, 2006.
[27] M. Piasecki, S. Szpakowicz, B. Broda. A WordNet from the Ground Up. Oficyna Wydawnicza Politechniki Wroclawskiej, 2009.
[28] R. Rapp. The computation of word associations: comparing syntagmatic and paradigmatic approaches. Proc. of the 19th COLING, Taipei, ROC, Vol. 2, 821-827, 2002.
[29] R. Rapp. Word sense discovery based on sense descriptor dissimilarity. Proc. of the Ninth MT Summit, 315-322, 2003.
[30] R. Rapp. A freely available automatically generated thesaurus of related words. Proc. of the 4th LREC, Lisbon, Vol. II, 395-398, 2004.
[31] R. Rapp. A practical solution to the problem of automatic word sense induction. Proc. of the 42nd Meeting of the ACL, Companion Volume, 195-198, 2004.
[32] R. Rapp. The automatic generation of thesauri of related words for English, French, German, and Russian. International Journal of Speech Technology, 11(3), 147-156, 2009.
[33] P. Resnik. Using information content to evaluate semantic similarity. Proc. of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, 448-453, 1995.
[34] J. Rosenzweig, R. Mihalcea, A. Csomai. WordNet bibliography. Web page: a bibliography referring to research involving the WordNet lexical database. URL http://lit.csci.unt.edu/wordnet/, 2007.
[35] G. Ruge. Experiments on linguistically based term associations. Information Processing and Management, 28(3), 317-332, 1992.
[36] M. Ruiz-Casado, E. Alfonseca, P. Castells. Using context-window overlapping in synonym discovery and ontology extension. Proc. of RANLP, Borovets, Bulgaria, 2005.
[37] M. Sahlgren. Vector-based semantic analysis: representing word meanings based on random labels. In: A. Lenci, S. Montemagni, V. Pirrelli (eds.): Proc. of the ESSLLI Workshop on the Acquisition and Representation of Word Meaning, Helsinki, 2001.
[38] H. Schütze. Ambiguity Resolution in Language Learning: Computational and Cognitive Models. Stanford: CSLI Publications, 1997.
[39] E. Terra, C.L.A. Clarke. Frequency estimates for statistical word similarity measures. Proc. of HLT/NAACL, Edmonton, Alberta, 244-251, 2003.
[40] S. Thoongsup, K. Robkop, C. Mokarat, T. Sinthurahat, T. Charoenporn, V. Sornlertlamvanich, H. Isahara. Thai WordNet construction. Proc. of the 7th Workshop on Asian Language Resources at ACL-IJCNLP, Suntec, Singapore, 139-144, 2009.
[41] P.D. Turney. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proc. of the Twelfth European Conference on Machine Learning, Freiburg, Germany, 491-502, 2001.
[42] P.D. Turney. Similarity of semantic relations. Computational Linguistics, 32(3), 379-416, 2006.
[43] P.D. Turney. A uniform approach to analogies, synonyms, antonyms, and associations. Proc. of the 22nd COLING, Manchester, UK, 905-912, 2008.
[44] P.D. Turney, M.L. Littman, J. Bigham, V. Shnayder. Combining independent modules to solve multiple-choice synonym and analogy problems. Proc. of RANLP, Borovets, Bulgaria, 482-489, 2003.
[45] P.D. Turney, P. Pantel. From frequency to meaning: vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141-188, 2010.