Abstract
The flexibility of XML data model allows a more natural representation of uncertain data compared with the relational model. Matching twig pattern against XML data is a fundamental problem in querying information from XML documents. For a probabilistic XML document, each twig answer has a probabilistic value because of the uncertainty of data. The twig answers that have small probabilistic value are useless to the users, and usually users only want to get the answers with the k largest probabilistic values. To this end, existing algorithms for ordinary XML documents cannot be directly applicable due to the need for handling probability distributional nodes and efficient calculation of top-k probabilities of answers in probabilistic XML. In this paper, we address the problem of finding twig answers with top-k probabilistic values against probabilistic XML documents directly. We propose a new encoding scheme called PEDewey for probabilistic XML in this paper. Based on this encoding scheme, we then design two algorithms for finding answers of top-k probabilities for twig queries. One is called ProTJFast, to process probabilistic XML data based on element streams in document order, and the other is called PTopKTwig, based on the element streams ordered by the path probability values. Experiments have been conducted to study the performance of these algorithms.
Similar content being viewed by others
References
Abiteboul, S., Senellart, P.: Queryig and updating probabilistic information in XML. In: Prodeeding of EDBT, pp. 1059–1068 (2006)
Bruno, N., Srivastava, D., Koudas, N.: Holistic twig joins: optimal XML pattern matching. In: Proceedings of SIGMOD, pp. 310–321 (2002)
Chang, L., Yu, J.X., Qin, L.: Query ranking in probabilistic XML data. In: Proceeding of EDBT, pp. 156–167 (2009)
Diaz, A., Lovell, D.: XML Generator. http://www.alphaworks.ibm.com/tech/xmlgenerator/. Accessed Sept 1999
Grust, T.: Accelerating XPath location steps. In: Proceeding of SIGMOD, pp. 109–120 (2002)
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: Proceeding of SIGMOD, pp. 673–686 (2008)
Hua, M., Pei, J., Zhang, W., Lin, X.: Efficiently answering probabilistic threshold top-k queries on uncertain data. In: Proceeding of ICDE, pp. 1403–1405 (2008)
Hung, E., Getoor, L., Subrahmanian, V.S.: Probabilistic interval XML. In: Proceeding of ICDT, pp. 358–374 (2003)
Hung, E., Getoor, L., Subrahmanian, V.S.: PXML: a probabilistic semistructured data model and algebra. In: Proceeding of ICDE, pp. 467–478 (2003)
Kimelfeld, B., Kosharovsky, Y., Sagiv, Y.: Query efficiency in probabilistic XML models. In: Proceeding of SIGMOD, pp. 701–714 (2008)
Kimelfeld, B., Sagiv, Y.: Matching twigs in probabilistic XML. In: Proceeding of VLDB, pp. 27–38. (2007)
Li, J., Liu, C., Zhou, R., Wang, W.: Top-k keyword search over probabilistic xml data. In: Proceedings of ICDE, pp. 673–684 (2011)
Liu, C., Li, J., Yu, J.X., Zhou R.: Adaptive relaxation for querying heterogeneous XML data sources. Inf. Syst. 35(6), 688–707 (2010)
Liu, C., Vincent, M.W., Liu, J.: Constraint preserving transformation from relational schema to XML schema. World Wide Web 9(1), 93–110 (2006)
Lu, J., Ling, T.W., Chan, C.Y., Chen, T.: From region encoding to extended dewey. On efficient processing of XML twig pattern matching. In: Proceeding of VLDB, pp. 193–204 (2005)
Nierman, A., Jagasish, H.V.: ProTDB: probabilistic data in XML. In: Proceeding of VLDB, pp. 646–657 (2002)
Qin, L., Yu, J.X., Ding, B.: TwigList: make twig pattern matching fast. In: Proceeding of DASFAA, pp. 850–862 (2007)
Senellart, P., Abiteboul, S.: On the complexity of managing probabilistic XML data. In: roceeding of PODS, pp. 283–292 (2007)
University of Washington XML Repository. http://www.cs.washington.edu/research/xmldatasets/. Accessed Oct 2002
Wang, G., Ning, B., Yu, G.: Holistically stream-based processing Xtwig queries. World Wide Web 11(4), 407–425 (2008)
Wu, X., Theodoratos, D., Souldatos, S., Dalamagas, T., Sellis, T.: Evaluation techniques for generalized path pattern queries on XML data. World Wide Web 13(4), 441–474 (2010)
Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases. In: Proceeding of ICDE, pp. 1406–1408 (2008)
Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases with x-relations. TKDE 20(12), 1669–1682 (2008)
Zhang, C., Naughton, J., DeWitt, D., Luo, Q., Lohman, G.: On supporting containment queries in relational database management systems. In: Proceeding of SIGMOD, pp. 425–436 (2001)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Ning, B., Liu, C. & Yu, J.X. Efficient processing of top-k twig queries over probabilistic XML data. World Wide Web 16, 299–323 (2013). https://doi.org/10.1007/s11280-011-0144-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-011-0144-2