Abstract
Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i +, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: VLDB, pp 487–499
Aggarwal C, Ta N, Wang J, Feng J, Zaki MJ (2007) Xproj: a framework for projected structural clustering of xml documents. In: KDD
Asai T, Abe K, Kawasoe S, Arimura H, Sakamoto H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: SDM
Aumann Y, Feldman R, Liphstat O, Mannila H (1999) Borders: an efficient algorithm for association generation in dynamic databases. J Intell Inf Syst 12(1): 61–73
Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: KDD
Balmin A, Ozcan F, Beyer K, Cochrane R, Pirahesh H (2004) A framework for using materialized xpath views in xml query processing. In: VLDB, pp 60–71
Bettini C, Wang XS, Jajodia S (1998) Mining temporal relationships with multiple granularities in time sequences. IEEE Data Eng Bull 21(1): 32–38
Chen L, Rundensteiner EA, Wang S (2002) Xcache: a semantic caching system for xml queries. In: SIGMOD
Chen Y, Yang L, Wang YG (2004) Incremental mining of frequent xml query patterns. In: ICDM, pp 343–346
Chung C-W, Min J-K, Shim K (2002) Apex: an adaptive path index for xml data. In: SIGMOD Conference, pp 121–132
Dehaspe L, Toivonen H, King R (1998) Finding frequent substructures in chemical compounds. In: KDD, pp 30–36
Feng J, Qian Q, Wang J, Zhou L (2006) Exploit sequencing to accelerate hot xml query pattern mining. In: ACM SAC
Feng J, Ta N, Li G (2007) Exploit sequencing views in semantic cache to accelerate xpath query evaluation. In: WWW
Han J, Dong G, Yin Y (1999) Efficient mining of partial periodic patterns in time series database. In: ICDE, pp 106–115
Han J, Pei J, Mortazavi-Asl B, Chen Q, Dayal U, Hsu M (2000) Freespan: frequent pattern-projected sequential pattern mining. In: KDD, pp 355–359
Hristidis V, Petropoulos M (2002) Semantic caching of xml databases. In: WebDB
Kaushik R, Shenoy P, Bohannon P, Gudes E (2002) Exploiting local similarity for indexing paths in graph-structured data. In: ICDE, pp 129–140
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: ICDM, pp 313–320
Kwon J, Rao P, Moon B, Lee S (2005) Fist: scalable xml document filtering by sequencing twig patterns. In: VLDB, pp 217–228
Li G, Feng J, Ta N, Zhang Y, Zhou L (2006a) Scend: an efficient semantic cache to exploit xpath query/view answerability. In: WISE, pp 460–473
Li G, Feng J, Wang J, Zhang Y, Zhou L (2006b) Incremental mining of frequent query patterns from xml queries for caching. In: ICDM, pp 350–361
Luo Q, Krishnamurthy S, Mohan C, Pirahesh H, Woo H, Lindsay BG, Naughton JF (2002) Middle-tier database caching for e-business. In: SIGMOD, pp 600–611
Mandhani B, Suciu D (2005) Query caching and view selection for xml databases. In: VLDB
Masseglia F, Cathala F, Poncelet P (1998) The psp approach for mining sequential patterns. In: PKDD
Milo T, Suciu D (1999) Index structures for path expressions. In: ICDT, pp 277–295
Ozden B, Ramaswamy S, Silberschatz A (1998) Cyclic association rules. In: ICDE, pp 412–421
Pei J, Han J, Mortazavi-Asl B, Pinto H, Chen Q, Dayal U, Hsu M (2001) Prefixspan: Mining sequential patterns by prefix-projected growth. In: ICDE, pp 215–224
Prüfer H (1918) Neuer beweis eines satzes uber permutationen. Archiv fur Mathematik und Physik 27: 142–144
Qun C, Lim A, Ong KW (2003) D(k)-index: an adaptive structural summary for graph-structured data. In: SIGMOD, pp 134–144
Rao PR, Moon B (2004) Prix: indexing and querying xml using prufer sequences. In: ICDE, pp 288–299
Re C, Brinkley J, Hinshaw K, Suciu D (2004) Distributed xquery. In: Information integration on the web (IIWeb)
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp 3–17
Termier A, Rousset M-C, Sebag M (2002) Treefinder: a first step towards xml data mining. In: ICDM, pp 450–457
Wang J, Han J (2004) Bide: efficient mining of frequent closed sequences. In: ICDE, pp 79–90
Wang K, Liu H (2000) Discovering structural association of semistructured data. IEEE TKDE 12(2): 353–371
Wang H, Park S, Fan W, Yu PS (2003) Vist: a dynamic index method for querying xml data by tree structures. In: SIGMOD, pp 110–121
Xu W (2005) The framework of an xml semantic caching system. In: WebDB
Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large databases. In: SDM
Yang J, Wang W, Yu PS, Han J (2002) Mining long sequential patterns in a noisy environment. In: SIGMOD, pp 406–417
Yang LH, Lee M-L, Hsu W (2003a) Efficient mining of xml query patterns for caching. In: VLDB, pp 69–80
Yang LH, Lee M-L, Hsu W, Acharya S (2003b) Mining frequent query patterns from xml queries. In: DASFAA, pp 355–362
Yang LH, Lee ML, Hsu W, Guo X (2004) 2pxminer: an efficient two pass mining of frequent xml query patterns. In: KDD
Zaki MJ (2002) Efficiently mining frequent trees in a forest. In: SIGKDD, pp 71–80
Zaki MJ (2005) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE TKDE 17(8): 1021–1035
Author information
Authors and Affiliations
Corresponding author
Additional information
Responsible editor: M. J. Zaki.
Rights and permissions
About this article
Cite this article
Li, G., Feng, J., Wang, J. et al. Incremental sequence-based frequent query pattern mining from XML queries. Data Min Knowl Disc 18, 472–516 (2009). https://doi.org/10.1007/s10618-009-0126-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10618-009-0126-5