2018-12-03: Using Wikipedia to build a corpus, classify text, and more

Wikipedia is an online encyclopedia, available in 301 languages and constantly updated by volunteers. Beyond its role as an encyclopedia, it has also been used as an ontology and as a resource for building corpora, classifying entities, clustering documents, creating annotations, recommending documents to users, and more. Below, I review some of the significant publications in these areas.

Using Wikipedia as a corpus: Wikipedia has been used to create corpora for text classification and annotation. In “Named entity corpus construction using Wikipedia and DBpedia ontology” (LREC 2014), YoungGyum Hahm et al. describe a method that combines Wikipedia, DBpedia, and SPARQL queries to generate a named entity corpus. The method can be applied to any language. Fabian Suchanek et al. used Wikipedia, WordNet, and Geonames to create an ontology called YAGO, which contains over 1.7 million entities and 15 million facts. The paper “YAGO: A large ontology from Wikipedia ...