Abstract
Numerous online data sources are available on the Web. It is desirable to enable end-users to build XML-based wrappers for these data sources so that the data can be composed and reused. This paper describes Grubber, a tool that allows end-users to develop XML-based wrappers for these data sources with just a few mouse clicks and keystrokes. An active learning algorithm is proposed and implemented to reduce end-users’ effort. Experimental results on real-world sites show that the algorithm achieves a high degree of effectiveness. Compared with similar tools, Grubber includes a number of usability improvements that lower the barrier to use, and we believe it is suitable for a broad population of end-users building situational applications.
Supported by the National Natural Science Foundation of China under Grant No.60573117 and Grant No.70673098, and the National Basic Research Program of China under Grant No.2007CB310804.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, S., Wang, G., Han, Y. (2009). Grubber: Allowing End-Users to Develop XML-Based Wrappers for Web Data Sources. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, QM. (eds) Advances in Data and Web Management. APWeb/WAIM 2009. Lecture Notes in Computer Science, vol 5446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00672-2_65
DOI: https://doi.org/10.1007/978-3-642-00672-2_65
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00671-5
Online ISBN: 978-3-642-00672-2
eBook Packages: Computer Science, Computer Science (R0)