Grubber: Allowing End-Users to Develop XML-Based Wrappers for Web Data Sources

  • Conference paper
Advances in Data and Web Management (APWeb 2009, WAIM 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5446)

Abstract

Numerous online data sources exist on the Web. It is desirable to help end-users build XML-based wrappers for these data sources so that the data can be composed and reused. This paper describes Grubber, a tool that allows end-users to develop such XML-based wrappers with just a few mouse clicks and keystrokes. An active learning algorithm is proposed and implemented to reduce the effort required of end-users. Experimental results on real-world sites show that the algorithm achieves a high degree of effectiveness. Compared with similar tools, Grubber includes a number of usability improvements that lower the barrier to use, and we believe it is suitable for a broad population of end-users building situational applications.

Supported by the National Natural Science Foundation of China under Grant No. 60573117 and Grant No. 70673098, and the National Basic Research Program of China under Grant No. 2007CB310804.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yang, S., Wang, G., Han, Y. (2009). Grubber: Allowing End-Users to Develop XML-Based Wrappers for Web Data Sources. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb 2009, WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_65

  • DOI: https://doi.org/10.1007/978-3-642-00672-2_65

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00671-5

  • Online ISBN: 978-3-642-00672-2

  • eBook Packages: Computer Science, Computer Science (R0)
