Using WordNet for Text Categorization
Abstract: This paper explores a method that uses WordNet concepts to categorize text documents. The bag-of-words representation commonly used for text representation is unsatisfactory because it ignores possible relations between terms. The proposed method extracts generic concepts from WordNet for all the terms in the text, then combines them with the terms in different ways to form a new representative vector. The effects of this method are examined in several experiments using the multivariate chi-square statistic to reduce the dimensionality, the cosine distance for classification, and two benchmark corpora for evaluation: the Reuters-21578 newswire articles and the 20 Newsgroups data. The proposed method is especially effective in raising the macro-averaged F1 value, which increased from 0.649 to 0.714 for Reuters and from 0.667 to 0.719 for 20 Newsgroups.
Keywords: 20 Newsgroups, ontology, Reuters-21578, text categorization, WordNet, cosine distance.
board of directors) and to disambiguate these homonymic significances, "a board" will also belong to the synset {board, committee}. The definition of synsets varies from the very specific to the very general. The most specific synsets gather a restricted number of lexical significances, whereas the most general synsets cover a very broad range of significances.

The organization of WordNet around lexical significances instead of lexemes distinguishes it from traditional dictionaries and thesauri [11]. The other difference between WordNet and traditional dictionaries is the separation of the data into four databases associated with the categories of verbs, nouns, adjectives, and adverbs. This choice of organization is justified by psycholinguistic research on how humans associate words with syntactic categories. Each database is organized differently from the others: the nouns are organized in a hierarchy, the verbs by relations, and the adjectives and adverbs by N-dimensional hyperspaces [11].

The following list enumerates the semantic relations available in WordNet. These relations hold between concepts, but the examples we give are based on words.

• Synonymy: relation binding two equivalent or close concepts (frail/fragile). It is a symmetrical relation.
• Antonymy: relation binding two opposite concepts (small/large). This relation is symmetrical.
• Hyperonymy: relation binding a concept-1 to a more general concept-2 (tulip/flower).
• Hyponymy: relation binding a concept-1 to a more specific concept-2. It is the reciprocal of hyperonymy. This relation can be useful in information retrieval: if all the texts dealing with vehicles are sought, it can be interesting to also retrieve those which speak about cars or motorbikes.
• Meronymy: relation binding a concept-1 to a concept-2 which is one of its parts (flower/petal), one of its members (forest/tree), or a substance it is made of (pane/glass).
• Metonymy: relation binding a concept-1 to a concept-2 of which it is one of the parts. It is the opposite of the meronymy relation.
• Implication: relation binding a concept-1 to a concept-2 which results from it (to walk/to take a step).
• Causality: relation binding a concept-1 to its purpose (to kill/to die).
• Value: relation binding a concept-1 (adjective) which is a possible state of a concept-2 (poor/financial condition).
• Has the value: relation binding a concept-1 to its possible values (adjectives) (size/large). It is the opposite of the value relation.
• See also: relation between concepts having a certain affinity (cold/frozen).
• Similar to: relation gathering adjectival concepts whose meanings are close. A synset is designated as central to the grouping, and the 'Similar to' relation binds each peripheral synset to the central one (moist/wet).
• Derived from: indicates a morphological derivation between the target concept (adjective) and the origin concept (coldly/cold).

2.1. Synonymy in WordNet

A synonym is a word which can be substituted for another without an important change of meaning. Cruse [2] distinguishes three types of synonymy:

• Absolute synonyms.
• Cognitive synonyms.
• Plesionyms.

According to Cruse's definition of cognitive synonyms [2], X and Y are cognitive synonyms if they have the same syntactic function and all grammatical declarative sentences containing X have the same truth conditions as the identical sentences in which X is replaced by Y.

Example: car/automobile.

The relation of synonymy is at the base of the structure of WordNet. Lexemes are gathered into sets of synonyms ("synsets"); a synset thus contains all the terms used to denote a concept. The definition of synonymy used in WordNet [11] is as follows: "Two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not modify the truth value of the sentence in which the substitution is made."

Example of synset: {person, individual, someone, somebody, mortal, human, soul}.

2.2. Hyponyms/Hyperonyms in WordNet

X is a hyponym of Y (and Y is a hyperonym of X) if:

• F(X) is the minimal indefinite expression compatible with the sentence "A is F(X)", and
• "A is F(X)" implies "A is F(Y)".

In other words, hyponymy is the relation between a narrower term and a generic term, expressed by the phrase "is-a".

Example: It is a dog → It is an animal [2]. A dog is a hyponym of animal, and animal is a hyperonym of dog.
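These relations can be explored directly in software. As an illustration (ours, not part of the original paper), the following Python sketch uses NLTK's WordNet interface to query the synonymy and hyponymy/hyperonymy relations just defined:

```python
from nltk.corpus import wordnet as wn  # requires a one-time nltk.download('wordnet')

# Synonymy: every sense of "board" is a synset gathering its synonyms.
for synset in wn.synsets('board', pos=wn.NOUN):
    print(synset.name(), synset.lemma_names())

# Hyponymy / hyperonymy: "a dog is an animal".
dog = wn.synsets('dog', pos=wn.NOUN)[0]   # most common sense is listed first
print(dog.hypernyms())                    # more general concepts (hyperonyms)
print(dog.hyponyms())                     # more specific concepts (hyponyms)

# The relation is transitive, so a full "is-a" chain can be climbed:
for path in dog.hypernym_paths():
    print(' -> '.join(s.name() for s in path))
```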
In WordNet, hyponymy is a lexical relation between meanings of words, and more precisely between synsets (synonym sets). The relation is defined by: X is a hyponym of Y if "X is a kind of Y" is true. It is a transitive and asymmetrical relation, and it generates a downward inheritance hierarchy for the organization of the nouns and the verbs. Hyponymy is represented in WordNet by the symbol '@', which is interpreted as "is-a" or "is a kind of".

Example: It is a tree → It is a plant.

3. WordNet-Based Text Categorization

The suggested approach is composed of two stages, as indicated in Figure 1. The first stage is the learning phase. It consists of:

• Generating a new text representation based on merging terms with their associated concepts.
• Selecting the characteristic features for creating the categories' profiles.

The second stage is the classification phase. It consists of:

• Weighting the features in the categories' profiles.
• Calculating the distance between the categories' profiles and the profile of the document to be classified.

[Figure 1. Overview of the approach: the bag of words is generated from the document to be classified and from the categories using WordNet, and the cosine distances between the resulting profiles are then calculated.]

3.1. The Learning Phase

The first issue that needs to be addressed in text categorization is how to represent texts so as to facilitate machine manipulation while retaining as much information as needed. The commonly used text representation is the bag-of-words, which simply uses a set of words and the number of occurrences of the words to represent documents and categories [12]. Many efforts have been made to improve this simple and limited representation. For example, [6] uses phrases or word sequences to replace single words. In our approach, we use a method that merges terms with their associated concepts to represent texts. To generate a text representation using this method, four steps are required (a code sketch of the first three steps is given at the end of this subsection):

• Mapping terms into concepts and choosing a merging strategy.
• Applying a strategy for word sense disambiguation.
• Applying a strategy for considering hypernyms.
• Applying a strategy for feature selection.

3.1.1. Mapping Terms into Concepts

The process of mapping terms into concepts is illustrated with the example shown in Figure 2. For simplicity, suppose there is a text consisting of only 10 words: government (2), politics (1), economy (1), natural philosophy (2), life science (1), math (1), political economy (1), and science (1), where the number indicated is the number of occurrences.

[Figure 2 maps the key words to concepts: government (2) and politics (1) → government (3); economy (1) and political economy (1) → economics (2); natural philosophy (2) → physics (2); life science (1) → bioscience (1); math (1) → mathematics (1); science (1) → science (1).]
Figure 2. Example of mapping terms into concepts.

The words are then mapped into their corresponding concepts in the ontology. In the example, the two words government (2) and politics (1) are mapped to the concept government, and the term frequencies of these two words are added to the concept frequency. From this point, three strategies for adding or replacing terms by concepts can be distinguished, as proposed by [1]:

A. Add Concept
This strategy extends each term vector t_d with new entries for the WordNet concepts C appearing in the text set. Thus, the vector t_d is replaced by the concatenation of t_d and c_d, where c_d = (cf(d, c_1), ..., cf(d, c_l)) is the concept vector, l = |C|, and cf(d, c) denotes the frequency with which a concept c ∈ C appears in a text d. The terms which appear in WordNet as a concept are therefore accounted for at least twice in the new representation: once in the old term vector t_d and at least once in the concept vector c_d.

B. Replace Terms by Concepts
This strategy is similar to the first one; the only difference is that it avoids the duplication of terms in the new representation, i.e., the terms which appear in WordNet are taken into account only in the concept vector. The term vector thus contains only the terms which do not appear in WordNet.

C. Concept Vector Only
This strategy differs from the second one in that it excludes all the terms from the new representation, including the terms which do not appear in WordNet; the concept vector c_d alone is used to represent the category.

3.1.2. Strategies for Disambiguation

The assignment of terms to concepts is ambiguous: one word may have several meanings and may thus be mapped into several concepts. In this case, we need to determine which meaning is being used, which is the problem of sense disambiguation [8]. Since a sophisticated solution for sense disambiguation is often impractical [1], we have considered the two simple disambiguation strategies used in [7].

A. All Concepts
This strategy considers all the proposed concepts as appropriate for augmenting the text representation. It is based on the assumption that texts contain central themes, which in our case will be indicated by certain concepts having high weights. In this case, the concept frequencies are calculated as follows:

    cf(d, c) = tf(d, {t ∈ T | c ∈ ref_c(t)})    (1)

B. First Concept
This strategy considers only the most often used sense of a word as its most appropriate concept. It is based on the assumption that the ontology used returns an ordered list of concepts in which more common meanings are listed before less common ones [10]:

    cf(d, c) = tf(d, {t ∈ T | first(ref_c(t)) = c})    (2)

3.1.3. Adding Hypernyms

If concepts are used to represent texts, the relations between concepts play a key role in capturing the ideas in these texts. Recent research shows that simply changing terms to concepts without considering the relations does not bring a significant improvement, and sometimes even performs worse than terms alone [1]. For this purpose, we have considered the hypernym relation between concepts by adding to the concept frequency of each concept in a text the frequencies with which its hyponyms appear. The frequencies of the concept vector part are then updated in the following way:

    cf'(d, c) = Σ_{b ∈ H(c)} cf(d, b)    (3)

where H(c) denotes the concept c together with its hyponyms.
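To make the steps above concrete, here is a minimal sketch of the representation-building phase. It is our illustration rather than the authors' code: it assumes NLTK's WordNet interface, implements the "First concept" strategy of equation (2) (keeping the whole synset list gives the "All concepts" strategy of equation (1)), and realizes equation (3) by propagating each concept's frequency upward to its hypernyms; the function names are ours.

```python
from collections import Counter
from nltk.corpus import wordnet as wn

def concept_frequencies(term_freqs, first_concept=True):
    """Map term frequencies tf(d, t) to concept frequencies cf(d, c).

    first_concept=True keeps only the most common sense (equation 2);
    False counts every returned sense (equation 1).
    """
    cf = Counter()
    for term, tf in term_freqs.items():
        synsets = wn.synsets(term.replace(' ', '_'), pos=wn.NOUN)
        for s in (synsets[:1] if first_concept else synsets):
            cf[s.name()] += tf
    return cf

def add_hypernym_frequencies(cf, levels=1):
    """Equation 3, read upwards: each concept passes its frequency on to its
    hypernyms, so a hypernym accumulates the frequencies of the hyponyms
    that occur in the text."""
    updated = Counter(cf)
    for name, freq in cf.items():
        synset = wn.synset(name)
        for _ in range(levels):
            hypers = synset.hypernyms()
            if not hypers:
                break
            synset = hypers[0]
            updated[synset.name()] += freq
    return updated

# The running example of Figure 2 (numbers are term occurrences).
tf = {'government': 2, 'politics': 1, 'economy': 1, 'natural philosophy': 2,
      'life science': 1, 'math': 1, 'political economy': 1, 'science': 1}
cf = add_hypernym_frequencies(concept_frequencies(tf))

# "Add concept" strategy: concatenate the term and concept vectors.
# "Replace terms by concepts" would keep only terms unknown to WordNet,
# and "Concept vector only" would drop the term part entirely.
doc_vector = dict(Counter(tf) + cf)
```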
• tf(t_k, c_i) denotes the number of times feature t_k occurs in category c_i.
• df(t_k) denotes the number of categories in which feature t_k occurs.
• |C| denotes the number of categories.

3.2.2. Distance Calculation

The similarity measure is used to determine the degree of resemblance between two vectors. To achieve reasonable classification results, a similarity measure should generally respond with larger values to documents that belong to the same class and with smaller values otherwise. The dominant similarity measure in information retrieval and text classification is the cosine similarity.

Precision and recall are two standard measures widely used in the text categorization literature to evaluate an algorithm's effectiveness on a given category, where:

    precision = true positives / (true positives + false positives) × 100    (8)

    recall = true positives / (true positives + false negatives) × 100    (9)

We also use the macro-averaged F1 to evaluate the overall performance of our approach on the given datasets. The macro-averaged F1 computes the F1 value for each category and then takes the average over the per-category F1 scores. Given a training dataset with m categories, and assuming the F1 value for the i-th category is F1(i), the macro-averaged F1 is defined as:

    macro-averaged F1 = (Σ_{i=1}^{m} F1(i)) / m
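As an illustration (ours, not the paper's), the cosine similarity used for the distance calculation and the measures of equations (8) and (9) can be computed as follows; we assume the standard per-category F1 = 2 × precision × recall / (precision + recall), whose formula falls outside the extracted text:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors (dicts: feature -> weight)."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def macro_f1(per_category_counts):
    """Macro-averaged F1 from per-category (tp, fp, fn) counts."""
    f1_scores = []
    for tp, fp, fn in per_category_counts:
        precision = tp / (tp + fp) if tp + fp else 0.0   # equation (8), as a ratio
        recall = tp / (tp + fn) if tp + fn else 0.0      # equation (9), as a ratio
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)               # average over the m categories
```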
Table 2. Details of the 20Newsgroups categories.

Category                    # Train Docs   # Test Docs   Total # Docs
alt.atheism                      480            319            799
comp.graphics                    584            389            973
comp.os.ms-windows.misc          572            394            966
comp.sys.ibm.pc.hardware         590            392            982
comp.sys.mac.hardware            578            385            963
comp.windows.x                   593            392            985
misc.forsale                     585            390            975
rec.autos                        594            395            989
rec.motorcycles                  598            398            996
rec.sport.baseball               597            397            994
rec.sport.hockey                 600            399            999
sci.crypt                        595            396            991
sci.electronics                  591            393            984
sci.med                          594            396            990
sci.space                        593            394            987
soc.religion.christian           598            398            996
talk.politics.guns               545            364            909
talk.politics.mideast            564            376            940
talk.politics.misc               465            310            775
talk.religion.misc               377            251            628
Total                          11293           7528          18821

4.2. Results

Tables 3 and 4 summarize the results of our approach compared with the bag-of-words representation over the Reuters-21578 (10 largest categories) and the 20Newsgroups categories. The results obtained in the experiments suggest that the integration of conceptual features improves text classification results. On the Reuters categories (see Table 3), the best overall value is achieved by the combination of the "Add concept" merging strategy with the "First concept" disambiguation strategy and a profile size k = 200. Macro-averaged values then reached 71.7%, yielding a relative improvement of 6.8% compared to the bag-of-words representation.

The same remarks can be made on the 20Newsgroups categories (see Table 4). The best performance is obtained with the profile size k = 500. The relative improvement is about 5.2% compared to the bag-of-words representation.

5. Related Work

The importance of WordNet as a source of conceptual information for all kinds of linguistic processing has been recognized through many different experiences and specialized workshops. There are a number of interesting uses of WordNet in information retrieval and supervised learning. Green [4, 5] uses WordNet to construct chains of related synsets (which he calls 'lexical chains') from the occurrences of terms in a document, producing a WordNet-based document representation using a word sense disambiguation strategy and term weighting. Dave [13] has explored WordNet using synsets as features for document representation and subsequent clustering. He did not perform word sense disambiguation and found that WordNet synsets decreased clustering performance in all his experiments. Voorhees [15], as well as Moldovan and Mihalcea, have explored the possibility of using WordNet for retrieving documents by keyword search. It has already become clear from their work that particular care must be taken in order to improve precision and recall.
6. Conclusion and Future Work

In this paper, we have proposed a new approach for text categorization based on incorporating background knowledge (WordNet) into the text representation, together with the multivariate χ2 statistic, which consists of extracting the K features that best characterize a category compared to the others. The experimental results on both the Reuters-21578 and 20Newsgroups datasets show that incorporating background knowledge in order to capture relationships between words is especially effective in raising the macro-averaged F1 value.

The main difficulty is that a word usually has multiple synonyms with somewhat different meanings, and it is not easy to automatically find the correct synonyms to use. Our word sense disambiguation technique is not capable of determining the correct senses. Our future work includes a better disambiguation strategy for a more precise identification of the proper synonym and hyponym synsets. Some work has been done on creating WordNets for specialized domains and integrating them into MultiWordNet; we plan to make use of it to achieve further improvement.

References

[1] Bloehdorn S. and Hotho A., "Text Classification by Boosting Weak Learners Based on Terms and Concepts", in Proceedings of the Fourth IEEE International Conference on Data Mining, IEEE Computer Society Press, 2004.
[2] Cruse D., Lexical Semantics, Cambridge University Press, Cambridge, 1986.
[3] Dash M. and Liu H., "Feature Selection for Classification", Intelligent Data Analysis, Elsevier, vol. 1, no. 3, 1997.
[4] Green S., "Building Hypertext Links in Newspaper Articles Using Semantic Similarity", in Proceedings of the Third Workshop on Applications of Natural Language to Information Systems (NLDB'97), pp. 178-190, 1997.
[5] Green S., "Building Hypertext Links by Computing Semantic Similarity", IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 11, no. 5, pp. 713-730, 1999.
[6] Hofmann T., "ProbMap: A Probabilistic Approach for Mapping Large Document Collections", Intelligent Data Analysis, vol. 4, pp. 149-164, 2000.
[7] Hotho A., Staab S., and Stumme G., "Ontologies Improve Text Document Clustering", in Proceedings of the 2003 IEEE International Conference on Data Mining (ICDM'03), pp. 541-544, 2003.
[8] Ide N. and Véronis J., "Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art", Computational Linguistics, vol. 24, no. 1, pp. 1-40, 1998.
[9] Kehagias A., Petridis V., Kaburlasos V., and Fragkou P., "A Comparison of Word and Sense-Based Text Categorization Using Several Classification Algorithms", Journal of Intelligent Information Systems, vol. 21, no. 3, pp. 227-247, 2001.
[10] McCarthy D., Koeling R., Weeds J., and Carroll J., "Finding Predominant Senses in Untagged Text", in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, pp. 280-287, 2004.
[11] Miller G., "Nouns in WordNet: A Lexical Inheritance System", International Journal of Lexicography, vol. 3, no. 4, 1990.
[12] Peng X. and Choi B., "Document Classifications Based on Word Semantic Hierarchies", in Proceedings of the International Conference on Artificial Intelligence and Applications (IASTED), pp. 362-367, 2005.
[13] Pennock D., Dave K., and Lawrence S., "Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews", in Proceedings of the Twelfth International World Wide Web Conference (WWW'2003), ACM, 2003.
[14] Sebastiani F., "Machine Learning in Automated Text Categorization", ACM Computing Surveys, vol. 34, no. 1, pp. 1-47, 2002.
[15] Voorhees E., "Query Expansion Using Lexical-Semantic Relations", in Proceedings of ACM SIGIR, Dublin, Ireland, pp. 61-69, ACM/Springer, 1994.

Zakaria Elberrichi is a lecturer in computer science and a researcher at the Evolutionary Engineering and Distributed Information Systems Laboratory (EEDIS) at the University Djillali Liabes, Sidi Bel Abbes, Algeria. He holds a master's degree in computer science from California State University, in addition to a PGCert in higher education. He has more than 17 years of experience teaching computer science at both the BSc and MSc levels and in planning and leading data mining related projects, the latest of which is called "New Methodologies for Knowledge Acquisition". He supervises five master's students in e-learning, text mining, web services, and workflow.