Creating a Dead Poets Society: Extracting a Social Network of Historical Persons from the Web

Geleijnse, Gijs; Korst, Jan

doi:10.1007/978-3-540-76298-0_12


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4825)


Abstract

We present a simple method for extracting information from search engine snippets. Although the techniques presented are domain independent, this work focuses on extracting biographical information about historical persons from multiple unstructured sources on the Web. We first find a list of persons and their periods of life by querying the periods and scanning the retrieved snippets for person names. Subsequently, we find biographical information for the persons extracted. To gain insight into the mutual relations among the persons identified, we create a social network using co-occurrences on the Web. Although we use uncontrolled and unstructured Web sources, the information extracted is reliable. Moreover, we show that Web Information Extraction can be used to create applications that are both informative and enjoyable.
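The two core steps named in the abstract can be illustrated with a small sketch. The first step scans snippets for person names paired with a period of life. The second links persons that co-occur. This sketch assumes the snippets have already been retrieved. It uses a hypothetical `Name (yyyy-yyyy)` regular expression (`NAME_PERIOD`) and hypothetical helpers `extract_persons` and `cooccurrence_network`. It also counts co-occurrence within snippets as a stand-in for Web hit counts. None of these is claimed to be the authors' actual pattern, scoring, or implementation, and the sample snippets are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical pattern (an assumption, not the authors' pattern): capitalized
# name tokens, optionally joined by the particles van/von/de, directly
# followed by a period of life such as "(1770-1827)".
NAME_PERIOD = re.compile(
    r"([A-Z][a-z]+(?:\s(?:[A-Z][a-z]+|van|von|de))*)\s*\((\d{3,4})\s*-\s*(\d{3,4})\)"
)


def extract_persons(snippets):
    """Map each name found in a 'Name (yyyy-yyyy)' pattern to its period."""
    persons = {}
    for text in snippets:
        for m in NAME_PERIOD.finditer(text):
            name, born, died = m.group(1), int(m.group(2)), int(m.group(3))
            if born < died:  # crude sanity check on the extracted period
                persons.setdefault(name, (born, died))
    return persons


def cooccurrence_network(snippets, names):
    """Weighted undirected edges: number of snippets that mention both names.

    Snippet-level counting is a stand-in (assumption) for Web hit counts.
    """
    edges = Counter()
    for text in snippets:
        present = sorted(n for n in names if n in text)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical snippets, invented for illustration only.
snippets = [
    "Ludwig van Beethoven (1770-1827) admired Johann Wolfgang von Goethe (1749-1832).",
    "Ludwig van Beethoven met Johann Wolfgang von Goethe at Teplitz in 1812.",
    "Friedrich Schiller (1759-1805) was a close friend of Johann Wolfgang von Goethe.",
]

persons = extract_persons(snippets)
edges = cooccurrence_network(snippets, persons)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

Counting co-occurrences inside already retrieved snippets keeps the sketch offline. A variant based on search engine hit counts would instead need one query per pair of names. That is n(n-1)/2 queries for n persons, so the cost grows quadratically as the extracted list grows.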


Similar content being viewed by others

Community Detection Algorithms for Cultural and Natural Heritage Data in Social Networks

Analysis of Online User Behaviour for Art and Culture Events

A semantically annotated corpus of tombstone inscriptions

Article Open access 25 October 2021



Author information

Authors and Affiliations

Philips Research, High Tech Campus 34, 5656 AE Eindhoven, The Netherlands
Gijs Geleijnse & Jan Korst


Editor information

Editors and Affiliations

Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015, Lausanne, Switzerland
Karl Aberer
Korea Advanced Institute of Science and Technology, 305-701, Daejeon, Korea
Key-Sun Choi
Stanford University, 94305, Stanford, CA, USA
Natasha Noy
TopQuadrant, 22314, VA, USA
Dean Allemang
Saltlix Inc., Korea
Kyung-Il Lee
Free University of Berlin, Germany
Lyndon Nixon
University of Maryland, 20742, College Park, MD, USA
Jennifer Golbeck
Yahoo! Research Barcelona, Spain
Peter Mika
University of Sheffield, S1 4DP, Sheffield, United Kingdom
Diana Maynard
Osaka University, 565-0047, Osaka, Japan
Riichiro Mizoguchi
Vrije Universiteit Amsterdam, The Netherlands
Guus Schreiber
École Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
Philippe Cudré-Mauroux


About this paper

Cite this paper

Geleijnse, G., Korst, J. (2007). Creating a Dead Poets Society: Extracting a Social Network of Historical Persons from the Web. In: Aberer, K., et al. (eds) The Semantic Web. ISWC/ASWC 2007. Lecture Notes in Computer Science, vol 4825. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76298-0_12


DOI: https://doi.org/10.1007/978-3-540-76298-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76297-3
Online ISBN: 978-3-540-76298-0
eBook Packages: Computer Science, Computer Science (R0)
