Person Name Entity Recognition For Arabic
Khaled.shaalan@buid.ac.ae hafsa.raza@gmail.com
The extraction quality of the pipeline created for the person name extractor conforms to the initial target set. The required precision (80%) and recall (70%) for the person name extractor have been achieved with the hurricane evaluation. Some of the entries within the gazetteers were extracted from the same corpus that was also used for creating the reference corpus for evaluation. However, the results achieved are accurate since they indicated recognition of person entities not included in the

7 Conclusion and Future Work

The work done in this project is an attempt to broaden the coverage for entity extraction by incorporating the Arabic language, thereby paving the way towards enabling search solutions for the Arabian market.

Various data collection techniques were used for acquiring gazetteer name lists. The rule-based approach, employed with great linguistic expertise, provided a successful implementation of the PERA system. Rules are capable of recognizing inflected forms by breaking them down into stems and affixes. A filtration mechanism is employed in the form of a rejecter within the grammar configuration that helps in deciding where a name ends and the non-name context begins. We have evaluated our system performance using a reference corpus that is tagged in a semi-automated way. The average precision and recall achieved for recognizing person names were 85.5% and 89%, respectively. Suggestions for improving the system performance were provided.
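
As an illustration of how precision and recall figures of this kind are typically obtained, the following minimal sketch compares extracted person-name spans against a reference corpus. It assumes exact span matching, which may differ from the scoring scheme actually used for PERA; the function name, the span representation, and the sample data are hypothetical and serve only to show the arithmetic.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for two collections of entity spans.

    Assumption: a span counts as correct only if it matches a reference
    span exactly (same document, same start and end offsets).
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical spans: (document id, start offset, end offset).
reference = {
    ("d1", 0, 12), ("d1", 40, 51), ("d2", 5, 17),
    ("d2", 60, 70), ("d3", 8, 19),
}
extracted = {
    ("d1", 0, 12), ("d1", 40, 51), ("d2", 5, 17),
    ("d2", 60, 70), ("d3", 25, 33),  # last span is a false positive
}

precision, recall = precision_recall(extracted, reference)

# Targets stated in the text: precision of 80% and recall of 70%.
meets_targets = precision >= 0.80 and recall >= 0.70
```

In this toy example four of the five extracted spans are correct and four of the five reference spans are found, so both measures meet the stated targets.
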

This work is part of a new system for Arabic NER. It has several ongoing activities, all concerned with extending our research to recognize and categorize other Arabic named entities such as locations and organizations.

Acknowledgement

This work is funded by the "Named Entity Recognition for Arabic" joint project between The British University in Dubai, Dubai, UAE and FAST Search & Transfer Inc., Oslo, Norway. We thank the FAST team. In particular, we would like to thank Dr. Petra Maier and Dr. Jürgen Oesterle for their technical support.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors, and do not necessarily reflect those of the sponsor.