The VLDB Journal manuscript No.
(will be inserted by the editor)
Flavio Rizzolo · Alejandro A. Vaisman
Temporal XML: Modeling, Indexing, and Query Processing
Received: date / Accepted: date
Abstract In this paper we address the problem of modeling and implementing temporal data in XML. We propose a data model for tracking historical information in
an XML document and for recovering the state of the
document as of any given time. We study the temporal
constraints imposed by the data model, and present algorithms for validating a temporal XML document against
these constraints, along with methods for fixing inconsistent documents. In addition, we discuss different ways
of mapping the abstract representation into a temporal XML document, and introduce TXPath, a temporal
XML query language that extends XPath 2.0.
In the second part of the paper, we present our approach for summarizing and indexing temporal XML
documents. In particular we show that by indexing continuous paths, i.e., paths that are valid continuously during a certain interval in a temporal XML graph, we
can dramatically increase query performance. To achieve
this, we introduce a new class of summaries, denoted
TSummary, that adds the time dimension to the well-known path summarization schemes. Within this framework, we present two new summaries: LCP and Interval summaries. The indexing scheme, denoted TempIndex, integrates these summaries with additional data
structures. We give a query processing strategy based on
TempIndex and a type of ancestor-descendant encoding,
denoted temporal interval encoding. We present a persistent implementation of TempIndex, and a comparison
against a system based on a non-temporal path index,
and one based on DOM. Finally, we sketch a language
for updates, and show that the cost of updating the index is compatible with real-world requirements.
Keywords: XML, Temporal databases, Semistructured data, Structural summaries, XPath.
University of Toronto
40 St. George St. Bahen Center for Information Technology
University of Toronto,
Toronto, Ontario M5S 2E4 Canada
Tel.: +1-416-946-0398
Fax: +123-45-678910
E-mail: flavio@cs.toronto.edu
Universidad de Chile and Universidad de Buenos Aires
Ciudad Universitaria, Pabellon I, Buenos Aires, Argentina
Tel.: +5411-4576-3359
Fax: +5411-4576-3359
E-mail: avaisman@dc.uba.ar
1 Introduction
The topic of representing, querying and updating temporal information has received little attention in the XML
literature. Nevertheless, time is present in almost any
real-world application, especially in web and e-business
applications. In this paper we will show how temporal
database concepts [52, 19] can be applied to define, query
and manage temporal XML documents, i.e., XML documents that can be navigated across time.
The graph depicted in Figure 1 is an abstract representation of a temporal XML document for a portion
of the NBA (National Basketball Association, a professional basketball league) database. We will be using this example
throughout the paper. The league is composed of franchises that maintain teams, and each team has a set of
players that may change over time. Some franchises may
have players directly associated with them, not included
in teams. The database also records some statistics for
each player. Note some of the dynamics that this example graph models: players move from one franchise to
another, usually from year to year, while their statistics
change from match to match. For instance, in this database, node 14 represents player Williams. The dashed
line between nodes 2 and 14, labeled [0,22], indicates
that he played for the Orlando Magic between instants
‘0’ and ‘22’. After that, he moved to the Toronto Raptors
(a team corresponding to this franchise is represented by
node 5), where he is currently playing. This is represented by the solid line joining nodes 5 and 14, labeled
[23,Now]. Notice that in spite of the change of franchise,
there is only one node for each player, which contains
all the player’s information. Thus, regardless of the franchise he played for, the graph shows that Williams scored
twenty-two points throughout his career. As another example, node 24 represents player Garrity, who scored
fifteen points between instants ‘0’ and ‘10’, and twelve
points since then. In the next sections we will describe
in detail the components of Figure 1.
The information contained in the abstract representation of a temporal document presented in Figure 1 allows traversing the history of the NBA stored in this
single document. We can then (a) query the state of the
database at a certain point in time (technically, a snapshot of the document); or (b) pose temporal queries like
“players who played for the Toronto Raptors continuously since at least the year 2000” or “name of the players who were with the Orlando Magic when McGrady
joined the franchise for the first time”. For these kinds
of queries we provide in Section 7 an indexing scheme
and in Section 8 efficient query evaluation techniques.
Other approaches (based on versioning) store only
the information at some point in time, and use edit
scripts and diff algorithms to reconstruct the histories
of the document. In Section 2 we discuss the different
ways of tackling this problem, and show that our approach can do better than versioning, mainly when we
expect frequent updates that may cause a query to span
over many versions.
In the first part of this work we address the problem
of modeling and implementing temporal features in XML
documents. We begin by defining an abstract model for
temporal XML documents as a graph with annotated
edges of two kinds: containment edges, describing element nesting and attribute values, and reference edges
describing IDREF to ID references. Both kinds of edges
are annotated with temporal elements (actually, for the
sake of clarity, we will work with single intervals). Next,
we study consistency conditions for temporal XML documents based on our data model (although our approach
is general enough to be extended to other data models).
We present algorithms for checking consistency, study
their computational complexity, and discuss solutions
for fixing inconsistencies. To the best of our knowledge,
this is the first contribution on consistency of temporal
XML documents, although this has been studied for non-temporal XML in [20]. In addition, we discuss different
ways of mapping the abstract temporal model into a concrete XML document and, finally, introduce TXPath, a
query language that extends XPath for supporting temporal queries.
In the second part of the paper we present a framework for structural summaries of temporal XML documents, and study an indexing scheme, denoted TempIndex, introduced in previous work [37]. Several structural
summaries have been proposed in order to optimize path
query evaluation over non-temporal data graphs. Some
recent works on structural summaries in the XML context include [25, 38,31,33,45]. Most of these proposals
keep record of the paths in the XML data by summarizing path information in different ways. They construct
a concise representation of the XML nodes based on
their labels, usually a labeled graph. Although indexing label paths on temporal documents helps to reduce
the search space, our experiments show that computing
paths within a given time interval is quite expensive even
in the presence of traditional path indexes. One possible
solution is to integrate the temporal dimension into the
indexing scheme in order to obtain better performance.
TempIndex accomplishes this integration by summarizing label paths together with temporal intervals and continuous paths (paths that are valid continuously during
a certain interval). Finally, we sketch a language for updates in XML, and show how consistency checking affects
the definition of update operators.
This paper considerably updates and extends the work
presented in [37]. Section 2 has been expanded, providing a better comparison of our model with other proposals. Section 3 gives a more detailed discussion of the
data model. Section 4 is completely new, presenting an
in-depth study of temporal consistency issues. Section
5 is also new. Sections 7 and 8 have been substantially
modified: the notion of structural summary, and the introduction of the TSummary class of summaries give a
totally new framework for the indexing scheme. Last, but
not least, we now present a persistent implementation,
which allows managing larger documents, and makes the
results presented in Section 10 much more relevant and
significant.
The remainder of the paper is organized as follows: in
Section 2 we review previous efforts in temporal semistructured/XML data and non-temporal structural summaries
indexes. In Section 3 we introduce the temporal data
model. Section 4 presents an in-depth study of the consistency conditions required by the data model, algorithms
for checking these conditions, and methods for fixing (if
needed) inconsistent documents. Section 5 presents four
alternatives for mapping the abstract representation to a
concrete XML document. TXPath is introduced in Section 6. We present the TSummary framework and the
TempIndex scheme in Section 7, and Section 8 explains
how to use this scheme in query processing. Section 9
discusses updates in temporal XML documents, and update management in TempIndex. Implementation and
testing results are presented in Section 10. We conclude
in Section 11.
2 Related Work
Temporal Relational Databases. Temporal relational database management has been extensively studied in the
literature, including data models [52] and query languages [10] (like TSQL2 [51]). However, most proposals
in the relational database framework require complex extensions to SQL, and commercial databases provide only
limited built-in support for temporal information management.
Fig. 1 Temporal XML document for a portion of the NBA database
Temporal Semistructured Databases. A model for managing historical semistructured data was proposed by
Chawathe et al. [6]. They extend the Object Exchange
Model (OEM) [7] with the ability to represent updates
and to keep track of them by means of “deltas”. However, they do not apply this work to XML. Along the
same lines, Oliboni et al. [40] proposed a graphical data
model and query language for semistructured data supporting transaction time, by means of attaching an interval of validity to the objects of the model. Dyreson et
al. [18] went further, allowing annotations on the edges
of the database graph that can refer not only to valid or
transaction times, but other kinds of metadata as well.
Temporal XML. In the last few years, many proposals
have addressed the problem of maintaining versions of
XML documents. Grandi [26] provides a good index to
bibliography on temporal aspects in the Web. Updates to
XML in a non-temporal framework were first studied
by Tatarinov et al. [53]. They proposed a language for updating XML documents as an extension to XQuery [59].
A model for granting access to temporal XML documents
was introduced by De Capitani [14]; however, the focus
here is the authorization model, not the temporal features of the document. Grandi and Mandreoli [27] presented an infrastructure for managing temporal web documents. Amagasa et al. [2] introduced a temporal data
model based on XPath, but not a model for updates,
nor a query language taking advantage of the temporal model. Dyreson [17] proposed an extension to XPath
with support for transaction time by means of the addition of several temporal axes for specifying temporal
directions. Their focus is on document versioning over
the web in the absence of explicit timestamps. Manukyan
et al. [35] attempted formalizing temporal constituents
of XML documents. They do not address querying temporal XML documents, nor do they discuss implementation
issues. Chien et al. [8,9] proposed update and versioning
schemes for XML. First, they presented an edit-based
scheme [8] in which the most current version of the document is maintained, along with reverse edit scripts that allow
moving backward in version time. They later moved to a
scheme where version management is performed by keeping references to the maximal unchanged subtree in the
previous version [9], sharing unchanged elements among
versions. The main difference between their approach
and ours is that we maintain a single temporal document
from which versions can be extracted when needed. We
believe this is better for scenarios where changes are frequent and only affect a few elements of the document. In
this situation, creating a new physical version each time
an update occurs may lead to large overheads when processing temporal queries that span multiple versions. A
similar approach was followed by Marian et al. [36]. Their
goal was detecting, managing and notifying changes in
web data warehouses of XML data in the context of the
Xyleme project [1], a project aimed at building a dynamic World Wide Web data warehouse. The idea here
is that Xyleme periodically refreshes its data and computes the changes using a diff algorithm. All nodes are
assigned a Xyleme ID, which is independent of the ID attributes that the document may contain. It follows that
Xyleme’s goals and requirements differ from ours. Wang
and Zaniolo have also proposed solutions for the Web
Warehousing problem [55, 56]. In [55] they proposed a
valid time model that represents successive versions of a
document as an XML document (implementing a temporally grouped data model) which is then queried using
XQuery or any other XML query language. The latter
is the strongest point of this approach. They provide
versioning using the special attributes vstart and vend
(and, in a subsequent paper [56] tstart and tend, for
handling also transaction time). Their work shows some
examples of the kinds of queries that could be addressed
with this approach, but there is neither an in-depth study
of the model, nor experimental results supporting their
claims. Wang et al. [57, 58] used a similar concept for
managing and querying historical databases. They take
advantage of the fact that a temporally grouped data
model fits well into the XML data model. Thus, they
map historical databases to so-called H-documents. This
approach allows posing queries in XQuery and evaluating
them using a relational database. They provide preliminary experimental results over a very simple example.
Although this work overcomes the problem of having multiple versions of the same document, and allows querying
with any standard XML query language, it is not clear
how general this solution is (i.e. how it can be efficiently
applied to more involved situations), given the limited
semantics of the data model. It seems that the temporal grouping assumption may limit the model to handling
particular cases, in which relationships between complex objects do not change (unlike in our NBA example, where they do). Nothing
is said about query optimization using index structures
appropriate for temporal information, and updates are
only vaguely discussed in [56].
Gergatsoulis and Stavrakas [24] introduced a model
for representing changes using an extension to XML denoted MXML (Multidimensional XML), where dimensions are applied to elements and attributes. Queries are
not addressed in this work, but the authors claim that
queries can be posed after a reduction from MXML to
XML.
Our proposal has similarities with the work of Buneman et al. [5]. In this work, the authors study data structures specifically suited for keeping historical information
about scientific data. They provide a versioning scheme
allowing storing all the information in a single document
(i.e., the authors also acknowledge the need for avoiding edit scripts when changes are frequent). Although seminal in
many senses, the proposal is, we think, limited to relevant but very specific situations and data formats.
The authors also present timestamp trees, an efficient
indexing scheme that allows obtaining a version of the
document at some point in time. The proposal supports
documents where the changes consist of additions of information, and is not oriented to documents where the relationships between objects change, or when updates are
of a kind other than the insertion of elements. Moreover,
the scheme requires that each node in the graph representation of the temporal document must be uniquely
identified by the path in which it occurs and the values
of its subelements (following the concept of XML keys
discussed in [4]). The authors conclude that, if the document does not have a key system, the proposal requires a
diff algorithm, turning it into a conventional Source Code Control System (SCCS). Also, the work is oriented to
queries asking for document snapshots or histories of elements, which are only a portion of the queries a temporal database must support. The indexing scheme is also
oriented to these kinds of queries. Finally, the issue of
handling document order is not considered in the paper.
We believe the model proposed in the present paper, although having some features in common with the work
of Buneman et al., considerably extends and improves
their work, overcoming its many constraints.
Also close to our ideas, Gao et al. [22, 23] introduced
τXQuery, an extension to XQuery supporting valid time
while maintaining the data model unchanged. Queries
are translated into XQuery and evaluated by an XQuery
engine. Even for simple temporal queries, this approach
results in long XQuery programs. Moreover, translating
a temporal query into a non-temporal one makes it more
difficult to apply query optimization and indexing techniques particularly suited for temporal XML documents.
We would like to make it clear here that we do not compare the expressiveness of τXQuery against TXPath (the
language we propose), we only point out the different
approach. TXPath is not aimed at being a working temporal query language, but a tool for giving insight into
the problems that appear when querying temporal XML
data that is stored using different models.
It is worth noticing that none of the approaches discussed above provides an in-depth study of the problems
of working with inconsistent temporal XML documents.
Moreover, most of these proposals only define vague consistency conditions for the data models that support
them. This is a subject overlooked so far in temporal
XML, although in the last few years the topic has been
addressed in the non-temporal XML framework (see for
example [20]). An important contribution of our work
is the study of different ways of tackling consistency in
temporal XML documents.
Structural Summaries for XML. Structural summaries
for XML data have been proposed in recent years in order to optimize path query evaluation. Most of these
proposals keep record of the paths in the XML data by
summarizing path information in different ways. They
construct a concise representation of the XML nodes
based on their labels, usually a labeled graph. Examples
of those are region inclusion graphs (RIGs) [13], representative objects (ROs)[39], dataguides [25], reversed
dataguides [34], 1-index, 2-index and T-index [38], and
more recently, ToXin [46], A(k)-index [33], F&B-Index
and F+B-Index [31], and HOPI [49]. Dataguides and
ROs group nodes into sets according to the label paths
incoming to them (each node may appear more than once
in the dataguide if the document instance is not just a
tree). RIGs, 1-index, T-index, ToXin, F&B-Index, and
F+B-Index, on the other hand, partition the data nodes
into equivalence classes (called extents in the literature)
so that each node appears only once in the summary.
The partition is computed in different ways: according
to the node labels (RIGs), the label paths incoming to
the nodes (1-index, ToXin, A(k)-index), the label paths
going out from the nodes (reversed dataguides), or all
of the above (F&B-Index and F+B-Index). The length
of the paths in the summary also varies: ToXin, 1-index
and F&B-Index summarize paths of any length, whereas
A(k)-index and F+B-Index are synopses of paths of a
fixed length. HOPI is the only proposal designed specifically for graph data instances: it materializes the 2-hop
cover of the graph.
Other summaries are augmented with statistical information of the instance for selectivity estimation, including path/branching distribution (XSKETCH [41, 42],
fXSKETCH [15]) and value distribution (XCLUSTER
[43]). Another proposal contains statistical information
for approximate query processing (TREESKETCH [44]).
A few adaptive summaries like APEX [11], D(k)-index
[45], and M(k)-index [29] use dynamic query workloads
to determine the subset of incoming paths to be summarized. APEX is a synopsis of frequently used paths of any
length. D(k)-index and M(k)-index, in contrast, summarize variable-length paths based on both the workload
and local similarity (the length of each path depends on
its location in the XML instance). In addition, updates
to structural indexes have been studied in [32] and [61]. It
is important to note that although using a non-temporal
summary reduces the search space for TXPath queries it
does not help with the temporal semantics of the query
evaluation.
In previous work [37] we addressed the problem of indexing temporal XML documents and introduced TempIndex, an indexing scheme for continuous paths that improves temporal query performance. In the second part
of this paper we discuss TempIndex in detail.
3 Temporal XML Data Model
First we define a (fairly standard) graph model of an
XML document, and then we extend it to a temporal
model.
3.1 XML Documents
For our purposes, an XML document is a directed labeled
graph. We distinguish several classes of nodes:
– A distinguished node $r$, the root of the document,
such that $r$ has no incoming edges, and every node in
the graph is reachable from $r$.
– Value nodes: nodes representing values (text or numeric). They have no outgoing edges, and have exactly one incoming edge, from attribute or element
nodes (or from the root).
– Attribute nodes: labeled with the name of an attribute,
plus possibly one ‘ID’ or ‘REF’ annotation.
– Element nodes: labeled with an element tag, and containing outgoing links to attribute nodes, value nodes,
and other element nodes.
Each node is uniquely identified by an integer, the
node number, and is described by a string, the node label. Edges in the document graph are constrained to be
either containment edges or reference edges. A containment edge $e_c(n_i, n_j)$ joins two nodes $n_i$ and $n_j$ such that:
(a) $n_i$ is either $r$ or an element node, and $n_j$ is an attribute node, a value node or another element node; or
(b) $n_i$ is an attribute node, and $n_j$ is a value node containing the value for the attribute. Attribute nodes must
have exactly one outgoing containment edge (to the attribute’s value). A reference edge $e_r(n_i, n_j)$ links an attribute node $n_i$ of type REF with an element node $n_j$.
Finally, node and edge types in our model allow mixed
content, i.e., an element node may have different kinds of
child nodes, including more than one value node.
3.2 Temporal XML Documents
The mechanism we use for adding the time dimension
to document graphs consists of labeling edges with intervals. We consider time as a discrete, linearly ordered
domain. An ordered pair $[a, b]$ of time points, with $a \leq b$,
denotes the closed interval from $a$ to $b$. A set of such intervals is called a temporal element. In what follows we
will assume that edges are labeled with single intervals rather than temporal elements. Later in the paper
we will justify this decision. As is common in temporal databases, the current time point will be represented
with the distinguished word ‘Now’. The document creation instant will be denoted $t_0$.
3.2.1 Time Labels
We extend the document graph model with temporal
labels. A temporal label is an interval $T_{e_c}$ ($T_{e_r}$) labeling a containment edge $e_c$ (reference edge $e_r$). The
meaning of this label is that, given an edge $e_c$ between
nodes $n_i$ and $n_j$, $T_{e_c}$ represents the time period during which
the element represented by $n_j$ was contained in the element represented by $n_i$. In this paper we will work
with the transaction time of the containment relation.
Although we do not deal with valid time, it could be addressed in an analogous way. Moreover, we will show that
a slight modification to the updates we propose would
suffice for supporting a limited notion of valid time. For
a reference edge $e_r$, $T_{e_r}$ represents the transaction time
of the reference. Edges labeled with temporal labels will
be called temporal edges. In general, if an edge $e$ is labeled with a temporal label $T_e$, we will use $T_e.TO$ and
$T_e.FROM$ to refer to the endpoints of the interval $T_e$.
We say that two temporal labels $T_{e_i}$ and $T_{e_j}$ are consecutive if $T_{e_j}.FROM = T_{e_i}.TO + 1$. Note that working with
single intervals instead of temporal elements (i.e., sets of
intervals) imposes some constraints on the model, which
are discussed in Section 3.3.
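The notion of consecutive temporal labels can be illustrated with a minimal Python sketch. The `Interval` class, the `consecutive` function, and the choice of modeling ‘Now’ as positive infinity are assumptions made for illustration only; they are not part of the model as defined in the paper.

```python
from dataclasses import dataclass

# Assumption: 'Now' is modeled as +infinity so that it compares
# greater than every concrete time point.
NOW = float("inf")


@dataclass(frozen=True)
class Interval:
    """Closed interval [frm, to] over a discrete time domain."""
    frm: int
    to: float  # an integer time point, or NOW

    def __post_init__(self):
        if self.frm > self.to:
            raise ValueError("an interval requires frm <= to")


def consecutive(a: Interval, b: Interval) -> bool:
    """True when b starts at the instant right after a ends."""
    return b.frm == a.to + 1


# Labels of the two edges incoming to node 14 (Williams) in Figure 1.
orlando = Interval(0, 22)
toronto = Interval(23, NOW)
print(consecutive(orlando, toronto))
```

With the labels of the running example, the two intervals are consecutive, so the check prints `True`; an interval ending at ‘Now’ can never be followed by another one.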
Definition 1 (Current Nodes and Edges) A temporal containment (reference) edge such that $T_e.TO = Now$
is called a current containment (reference) edge.
A node is called current if one of its incoming containment edges is current. (As we will see below, at most one
incoming containment edge can be current.)
3.2.2 Attribute Nodes
In the XML data model, attributes must be unique.
This limitation influences the way a temporal data model
supports these kinds of nodes. We may either (a) prevent attributes from varying over time, or (b) treat them as elements
of a special kind. We chose the second option. From a
formal modeling point of view, we make no distinction
between an attribute and an element node (except that
attribute nodes cannot contain other elements). From a
practical point of view, we define a special element,
denoted <ATTRIBUTES>. Consider, for example, an element
<person> representing a woman, with an attribute called
last_name; the value of this attribute will change if she
marries. This is treated as follows (we explain
the syntax later in the paper). At instant $t_0$ the element
<person> looks like:
<person name="Maria">
  <ATTRIBUTES>
    <last_name Time:From="0" Time:To="Now">
      Perez
    </last_name>
  </ATTRIBUTES>
</person> ...
After marrying at time $t_1$, the element will contain:
<person name="Maria">
  <ATTRIBUTES>
    <last_name Time:From="0" Time:To="t1-1">
      Perez
    </last_name>
    <last_name Time:From="t1" Time:To="Now">
      Perez-Gomez
    </last_name>
  </ATTRIBUTES>
</person> ...
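To make the encoding above concrete, the following Python sketch looks up the value a time-varying attribute had at a given instant. It is only an illustration under stated assumptions: the namespace URI bound to the `Time` prefix is hypothetical, the marriage instant is fixed at 10 so that the labels are plain integers, and `attribute_at` is a helper introduced here, not part of TXPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the Time prefix (not given in the paper).
NS = "urn:example:time"

# The <person> element after the change, with t1 = 10 chosen for illustration.
DOC = f"""
<person name="Maria" xmlns:Time="{NS}">
  <ATTRIBUTES>
    <last_name Time:From="0" Time:To="9">Perez</last_name>
    <last_name Time:From="10" Time:To="Now">Perez-Gomez</last_name>
  </ATTRIBUTES>
</person>
"""


def attribute_at(person, name, t):
    """Return the value of temporal attribute `name` at instant t, or None."""
    for el in person.findall(f"ATTRIBUTES/{name}"):
        frm = int(el.get(f"{{{NS}}}From"))
        to = el.get(f"{{{NS}}}To")
        # 'Now' is an open upper bound; otherwise the interval is closed.
        if frm <= t and (to == "Now" or t <= int(to)):
            return el.text.strip()
    return None


person = ET.fromstring(DOC)
print(attribute_at(person, "last_name", 5))   # Perez
print(attribute_at(person, "last_name", 12))  # Perez-Gomez
```

Because every version of the attribute is an ordinary child element carrying its own interval, the lookup is a plain scan over the children of `<ATTRIBUTES>`.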
3.2.3 Temporal Data Model for XML
We are now ready to formally define a temporal XML
document. First, we introduce the notion of lifespan of
a node.
Definition 2 (Lifespan of a Node) The lifespan of a
node $n$, denoted $\mathit{lifespan}(n)$, is the union of the temporal labels of all the containment edges incoming to the
node. The lifespan of the root is the interval $[t_0, Now]$.
Example 1 Consider our running example, the NBA database of Figure 1. The fact that McGrady played for
the Orlando Magic between instant ‘21’ and the current time is represented by the current containment edge
(2, 16). The lifespan of node 16 is the union of the intervals [0,20] (the temporal label of the incoming containment edge between nodes 5 and 16) and [21,Now] (the
label of the current incoming containment edge), that is, [0,Now]. To simplify the figures, we omit all temporal labels of the form
$[t_0, Now]$.
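The lifespan computation of Example 1 can be sketched in a few lines of Python. The function name `lifespan`, the tuple representation of intervals, and the use of infinity for ‘Now’ are assumptions made for this illustration.

```python
# Assumption: 'Now' is modeled as +infinity.
NOW = float("inf")


def lifespan(incoming):
    """Union of closed intervals (frm, to) over a discrete time domain.

    Overlapping or adjacent intervals are coalesced, so a node whose
    incoming labels are consecutive ends up with a single interval.
    """
    merged = []
    for frm, to in sorted(incoming):
        if merged and frm <= merged[-1][1] + 1:
            # Extend the last interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], to))
        else:
            merged.append((frm, to))
    return merged


# Labels of the containment edges incoming to node 16 (McGrady).
node16 = [(21, NOW), (0, 20)]
print(lifespan(node16))  # [(0, inf)]
```

A result with more than one interval signals a gap in the incoming labels, which the model rules out for consistent documents.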
The definitions above imply some consistency conditions that a graph must satisfy in order to be a temporal XML document. The following definition spells these
conditions out.
Definition 3 (Temporal XML Document) A Temporal XML Document is a document graph augmented
with temporal labels that satisfies the following conditions:
1. The union of the temporal labels of the containment
edges outgoing from a node is contained in the lifespan of the node.
2. The temporal labels of the containment edges incoming to a node are consecutive.
3. For any time instant $t$, the sub-graph composed of
all containment edges $e_c$ such that $t \in T_{e_c}$ is a tree
with root $r$. We call this subgraph a snapshot of the
document at time $t$, denoted $D(t)$.
4. For any containment edge $e_c(n_i, n_j, T_{e_c})$, if $n_j$ is a
node of type ID, the time label of $e_c$ is the same as
the lifespan of $n_i$; moreover, if there are two elements
in the document with the same value for an ID attribute, both elements are the same. In other words,
the ID of a node remains constant for all the snapshots of the document.
5. For any containment edge $e_c(n_i, n_j, T_{e_c})$, if $n_j$ is an
attribute of type REF, such that there exists a reference edge $e_r(n_j, n_k, T_{e_r})$, then $T_{e_c} = T_{e_r}$ holds.
6. Let $e_r(n_i, n_j, T_{e_r})$ be a reference edge. Then $T_{e_r} \subseteq \mathit{lifespan}(n_j)$ holds.
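Condition 3 lends itself to a direct check. The sketch below extracts the snapshot $D(t)$ from a list of containment edges and tests whether it is a tree rooted at $r$. It is a naive illustration, not the validation algorithm studied in Section 4; the edge tuples, the function names, and the root identifier `"league"` are assumptions, and the toy graph flattens the franchise and team levels of Figure 1.

```python
# Assumption: 'Now' is modeled as +infinity.
NOW = float("inf")


def snapshot(edges, t):
    """Containment edges (parent, child) whose label contains instant t."""
    return [(p, c) for p, c, frm, to in edges if frm <= t <= to]


def is_tree(snap, root):
    """True if snap is a tree rooted at root: every node except the root
    has exactly one parent, and every node is reachable from the root."""
    parent, children = {}, {}
    for p, c in snap:
        if c in parent or c == root:
            return False  # two parents, or an edge entering the root
        parent[c] = p
        children.setdefault(p, []).append(c)
    seen, stack = {root}, [root]
    while stack:
        for c in children.get(stack.pop(), []):
            seen.add(c)
            stack.append(c)
    return seen == {root} | set(parent) | set(children)


# Toy fragment: node 14 (Williams) moves from node 2 to node 5.
edges = [("league", 2, 0, NOW), ("league", 5, 0, NOW),
         (2, 14, 0, 22), (5, 14, 23, NOW)]
print(is_tree(snapshot(edges, 10), "league"))
print(is_tree(snapshot(edges, 30), "league"))
```

Both snapshots are trees, so both checks print `True`. If the second edge into node 14 were labeled [20,Now] instead, the snapshot at instant 21 would give node 14 two parents and the check would fail.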
3.3 Discussion
We will discuss some characteristics of the model, and
some assumptions we have made.
The second condition in Definition 3 implies that we
will be working with plain intervals instead of temporal
elements (i.e., sets of intervals). This assumption simplifies the presentation and makes the implementations
more efficient. Our definitions and theorems can, however, be extended to the case of temporal elements. There
are, of course, semantic and practical consequences of
our decision. For example, suppose we want to represent
the fact that Michael Jordan played for the Chicago Bulls
between 1996 and 1998 (i.e., there is a node for the Bulls,
another one for Jordan, and an edge between them, labeled with the interval [1996,1998]); then he retired, and
after a year he resumed his career. As the model requires that the edges incoming to a node be consecutive, we cannot represent this situation by adding an
edge labeled [2000,Now]. A solution could be to create a
parent node for ‘retired’ players, with an edge to Jordan’s node (labeled [1999,1999]), and then, again, an edge
from the Chicago Bulls’ node to Jordan’s node, labeled
[2000,Now]. We can see that this solution does not generate a significant problem (we may even think, in this
case, that it could be a natural way of representing the
situation). Another solution, more syntactically oriented
(and more likely to be used if the non-consecutiveness
came from an inconsistency in the document), is to
duplicate the node with temporal gaps in the labels of its
incoming edges. An abstract example is shown in Figure
2.
Fig. 2 (a) A gap in the lifespan of node n1; (b) A possible solution
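The gap in the Jordan example can be located mechanically. The following Python sketch lists the gaps between the labels of the edges incoming to a node; the function name `gaps` and the tuple representation are assumptions for illustration, and the sketch presumes that the incoming intervals do not overlap.

```python
# Assumption: 'Now' is modeled as +infinity.
NOW = float("inf")


def gaps(incoming):
    """Closed intervals not covered between successive incoming labels.

    Assumes the incoming intervals (frm, to) do not overlap.
    """
    ordered = sorted(incoming)
    return [(a_to + 1, b_frm - 1)
            for (_, a_to), (b_frm, _) in zip(ordered, ordered[1:])
            if b_frm > a_to + 1]


jordan = [(1996, 1998), (2000, NOW)]
print(gaps(jordan))                         # [(1999, 1999)]
print(gaps(jordan + [(1999, 1999)]))        # []
```

The first call reports the missing year; adding the edge from the hypothetical ‘retired’ parent node, labeled [1999,1999], makes the incoming labels consecutive again.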
Remark 1 There is no condition preventing more than
one edge between the same two nodes. If they are consecutive, we assume they are coalesced into a single edge.
Note that the first constraint in Definition 3 implies
that, even though containment edges can only represent
containment relations of the same kind in a particular
instant, this containment relationship can be a different
one in another instant. For example, in the NBA document, a node for McGrady has incoming containment
edges from the “team” and “franchise” elements. For any
other relationship occurring at the same time we need to
use reference edges.
The constraint of having a unique ID throughout the
whole history of the document allows overcoming many
of the restrictions present in [5]. However, it introduces
other kinds of problems. Let us suppose that in our running example we would like to represent the same information in a different way, namely with the franchise elements “below” the player nodes (e.g., below the McGrady node we find the Raptors and Magic nodes, with
the corresponding temporal labels over the containment
edges). The problem here is that the ID attribute would
not identify a franchise node (two instances of the same
franchise, in the same snapshot, will have different IDs).
In this case, even though temporal queries could be answered, we are losing the desirable property of having all
the information for a franchise in the same node. Here,
the temporal key for a franchise should be the value node
containing its name. In the remainder of the paper we
will assume that all documents comply with the constraint that, in each snapshot, containment relationships
are many-to-one from child to parent nodes (like in Figure 1). In other words, all nodes in a path of containment
edges are relative keys (in a snapshot) in the sense of [4].
This, along with the ID constraint, allows identifying a
node throughout the document’s history.
Definition 4 (Current Subtree) We denote by D_c the subgraph of the temporal XML document D containing no reference edges. Given a temporal XML document D, and a current node n, the current subtree of n is the subtree of D_c(Now) with root n.
In the remainder of the paper, for the sake of simplicity, we will consider only containment edges, although all
the concepts can be extended to consider also reference
edges.
Definition 5 (Continuous Path and Maximal Continuous Path) A continuous path (cp) with interval T from node n1 to node nk in a temporal document graph is a sequence (n1, . . . , nk, T) of k nodes and an interval T such that there is a sequence of containment edges of the form $e_1(n_1, n_2, T_1), e_2(n_2, n_3, T_2), \ldots, e_{k-1}(n_{k-1}, n_k, T_{k-1})$, such that $T = \bigcap_{i=1}^{k-1} T_i$. We say there is a maximal continuous path (mcp) with interval T from node n1 to node nk if T is the union of a maximal set of consecutive intervals Ti such that there is a continuous path from n1 to nk with interval Ti.
Example 2 Consider Figure 3. There is only one mcp from node team(t1) to goals(g3), with interval [99, 02]. There are 2 mcp’s from node team(t1) to player(p1), with intervals [01, Now] and [95, 97]. There are 3 continuous paths from the root to player(p1), with intervals [95, 97], [98, 00], and [01, Now]; since these are consecutive, they produce a single mcp with interval [95, Now].
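The two operations behind Definition 5 and Example 2 can be sketched in a few lines of Python. This is only an illustrative sketch: intervals are modeled as `(FROM, TO)` tuples, `NOW` is represented by `math.inf`, four-digit years replace the two-digit labels of Figure 3, and the function names `cp_interval` and `coalesce` are hypothetical.

```python
import math

NOW = math.inf  # stand-in for the moving instant "Now" (assumption of this sketch)

def cp_interval(labels):
    """Interval of a continuous path: intersection of its edge labels, or None if empty."""
    frm = max(label[0] for label in labels)
    to = min(label[1] for label in labels)
    return (frm, to) if frm <= to else None

def coalesce(intervals):
    """Merge consecutive intervals ([a, b] followed by [b + 1, c]) into maximal ones."""
    merged = []
    for frm, to in sorted(intervals):
        if merged and merged[-1][1] + 1 == frm:
            merged[-1] = (merged[-1][0], to)   # extend the previous interval
        else:
            merged.append((frm, to))           # start a new maximal interval
    return merged

# Continuous paths from the root to player(p1) in Example 2.
root_to_p1 = [(1995, 1997), (1998, 2000), (2001, NOW)]
# Continuous paths from team(t1) to player(p1) in Example 2.
team_to_p1 = [(2001, NOW), (1995, 1997)]
print(coalesce(root_to_p1))
print(coalesce(team_to_p1))
```

The first call coalesces the three consecutive intervals into a single mcp, while the second keeps two separate mcp's because 1997 and 2001 are not consecutive.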
An interesting property of mcp’s is that they can be
computed visiting each node only once. We will take advantage of this property for query processing (see Section
8). Let us consider two nodes n1, nk. Let N be the set of nodes ni, with i ≠ 1 and i ≠ k, such that there is a continuous path from n1 to ni, with interval T_ni, and there is a containment edge from ni to nk, with label T_ei. Thus, each continuous path from n1 to nk will have interval Ti = T_ni ∩ T_ei.
The union of the intervals of these continuous paths will
be the interval of the mcp between n1 and nk , if the
intervals are consecutive. This means that all mcp’s in
a graph can be computed visiting each node only once,
starting from the root. For example, in Figure 3, if we
know the interval of the mcp between f1 and p1 we can
compute the mcp from f1 to g1 without visiting the ancestors of p1. In what follows, except when noted, we will assume that all mcp’s are computed from the root.

Fig. 3 Maximal Continuous Path

Document order

In a non-temporal XML document there is a total order between the nodes. A temporal document does not necessarily impose a total order among its nodes, but for any instant t there must be a total order, denoted <_t, among the nodes of each snapshot D(t) of document D at time t. In general, for any pair of nodes n1 and n2, we may have n1 <_t1 n2, and n2 <_t2 n1, in two different instants t1 and t2. However, we can show that there is an interval during which the relative order between n1 and n2 does not change. If T1 is the interval on a continuous path from the root to n1, and similarly T2 for n2, then the ordering between n1 and n2 is the same for any instant t in the interval T1 ∩ T2. This is formalized in the following proposition.

Proposition 1 Let D be a temporal XML document; n1 and n2 two nodes in D; p1 = (r, . . . , n1, T1) and p2 = (r, . . . , n2, T2) two continuous paths to n1 and n2 with intervals T1 and T2, respectively; then, either n1 <_t n2 for every t ∈ T1 ∩ T2, or n2 <_t n1 for every such t.

Proof By definition of cp (Definition 5) and the third condition of temporal XML document (Definition 3), we know that p1 is the only path of containment edges to n1 during interval T1. (If there were another path of containment edges p′1 to n1 during any instant t in T1, then the subgraph composed by all containment edges would not be a tree at instant t.) The same argument can be made about p2 and n2 during interval T2. In particular, p1 and p2 are the only paths of containment edges reaching n1 and n2 respectively during T1 ∩ T2. Thus, during the entire interval T1 ∩ T2, either n1 <_t n2 or n2 <_t n1.

4 Consistency of Temporal XML Documents

Temporal XML documents, as defined in Section 3.2, are subject to continuous updates, which will be studied later in the paper. Such updates must take as input (and return) a consistent XML document. More often than not we will need to check whether a temporal document is consistent, instead of working with documents built from scratch using update operations. Thus, a study of the cost of such an operation is required, together with efficient algorithms (not only for checking, but for fixing inconsistencies as well). We will first give consistency conditions for temporal XML documents based on the model presented in the previous section; then, we will propose algorithms for verifying them and give their complexity. In Section 9 we will see how these concepts interplay with the update operators that modify a temporal XML document. Definition 6 below states the possible inconsistencies in a Temporal XML document.

Definition 6 (Inconsistencies in a Temporal XML Document) The following are the inconsistencies that may violate the conditions stated in Definition 3.
i. There is an outgoing containment edge whose temporal label is outside the node’s lifespan.
ii. The temporal labels of the containment edges incoming to a node are not consecutive. Here, the inconsistency may be due to (a) a gap in the temporal labels of some incoming edges; or (b) an overlapping of the temporal labels of some incoming edges.
iii. There is a cycle in some document’s snapshot.
iv. There exists more than one node with the same value for the ID attribute.

In what follows we will refer to these types of inconsistencies as inconsistencies of type i, type ii, and so on. We will not study ID attributes (we will limit ourselves to temporal issues here). Thus, inconsistencies of type iv will not be addressed.

Definition 7 (Interval of Inconsistency) Let I be one of the inconsistencies of Definition 6; the Interval of Inconsistency of I, denoted I_I, is the closed interval where the inconsistency occurs. The notion of interval of inconsistency is local to I, meaning that there are as many I_I’s in a document as inconsistencies occur in it.
Example 3 Figures 4(a) to (c) show examples of intervals of inconsistency for types i to iii, respectively. In Figure 4(a) I_I = [T4, Now] (the temporal label of the edge outgoing from n1 lies outside the lifespan of the node); in Figure 4(b) I_I = [T2, T4] (a gap in node n2); in Figure 4(c) there is a cycle in every snapshot within the interval I_I = [T4, T6].
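For inconsistencies of type ii, the intervals of inconsistency are the gaps and overlaps among the labels of the edges incoming to a node. The following Python sketch illustrates this under stated assumptions: labels are `(FROM, TO)` tuples over integer instants, `NOW` is `math.inf`, instants T_k are given the illustrative value k, and `gaps_and_overlaps` is a hypothetical helper that expects at least one incoming label.

```python
import math

NOW = math.inf  # stand-in for "Now" (assumption of this sketch)

def gaps_and_overlaps(incoming):
    """Return (gaps, overlaps) among the labels of the edges incoming to one node."""
    labels = sorted(incoming)
    gaps, overlaps = [], []
    covered_to = labels[0][1]          # right end of the time covered so far
    for frm, to in labels[1:]:
        if frm > covered_to + 1:       # uncovered instants between two labels
            gaps.append((covered_to + 1, frm - 1))
        elif frm <= covered_to:        # two labels share some instants
            overlaps.append((frm, min(to, covered_to)))
        covered_to = max(covered_to, to)
    return gaps, overlaps

# Incoming labels [0, T1] and [T5, Now], assumed here for the node with a gap.
print(gaps_and_overlaps([(0, 1), (5, NOW)]))
```

With these assumed labels the only gap is [T2, T4], the interval of inconsistency reported for the gap case in Example 3.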
Given that computing I_I is, typically, an expensive operation, we have decided to divide the process in two parts: (a) check if the document presents an inconsistency; (b) fix the inconsistency (for this, computing the inconsistency interval is necessary). The user will decide if she wants to execute part (b).

Fig. 4 (a) Inconsistency of type i; (b) Inconsistency of type ii; (c) Inconsistency of type iii

4.1 Checking Consistency

In this section we will study the complexity of checking consistency in Temporal XML documents, and give an algorithm for such task.

Throughout this section we will use the following notion of order: given two intervals T1 and T2, if T1.TO > T2.TO we will say that T1 succeeds T2, denoted T1 ≻ T2. Analogously, if T1.TO < T2.TO, we say that T1 precedes T2, denoted T1 ≺ T2.

Our algorithm for checking consistency will use the following proposition.

Proposition 2 Let D be a Temporal XML document where every node has at most one incoming containment edge in every time instant t; if there is a cycle in some interval I_I in D, then there exists a node ni such that T_mcp(ni) ≠ lifespan(ni), where T_mcp(ni) is the temporal interval of the mcp between the root and node ni.

Proof Assume that there is a cycle in document D during an interval I_I. Let ni be a node belonging to such cycle. Thus, by definition, we know that lifespan(ni) ⊃ I_I; however, T_mcp(ni) ∩ I_I = ∅; if this were not the case, there would be some t such that a path between the root and the node exists, and there cannot exist a node with more than one parent at any instant t.

We will use this property to check consistency condition iii. If the property does not hold (assuming that there are no inconsistencies of other types), then there is a cycle in the document. Algorithm 1 computes the lifespan of a node.

Algorithm 1 (Computing the Lifespan of a Node)
INPUT: A node n
OUTPUT: I = [FROM, TO]; Time interval of the lifespan of the node, or null if I cannot be computed.
TimeInterval lifespan(node n) {
(1) Initialize a list of temporal labels L to null;
(2) I = null;
(3) for each edge e incident to n with label Te do
(4)     Append(Te) to L;
(5) Sort L; // using the order relation defined above
(6) I = L[1];
(7) for each i between 1 and length(L) − 1 do
(8)     if L[i].TO + 1 ≠ L[i + 1].FROM then
(9)         Return null;
(10)    I = I ∪ L[i + 1];
(11) end for;
(12) Return I }
It can be shown that the lifespan of a node can be computed with complexity

$$O(2\deg_{in}(n) + \deg_{in}(n) \log(\deg_{in}(n))) = O(\deg_{in}(n)(\log(\deg_{in}(n)) + 2)) \approx O(\deg_{in}(n) \log(\deg_{in}(n)))$$

where $\deg_{in}(n)$ is the number of edges incident to n. In the worst case (this considers the case in which all edges are incident to the node), the order of the algorithm is $O(|E| \log(|E|))$; in the average case (all nodes have the same number of incoming edges, i.e., $|E|/|V|$), this reduces to $O(\frac{|E|}{|V|} \log(\frac{|E|}{|V|}))$. In the best case (when each node has only one incoming edge) the lifespan is computed in constant time.
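As an illustration, the lifespan computation of Algorithm 1 could be written in Python roughly as follows. This is a sketch, not the authors' implementation: it takes the list of incoming labels directly instead of a node, models labels as `(FROM, TO)` tuples, represents `Now` by `math.inf`, and returns `None` where the algorithm returns null (including for a node without incoming edges).

```python
import math

NOW = math.inf  # stand-in for "Now" (assumption of this sketch)

def lifespan(incoming):
    """Union of the incoming labels, or None if they are not consecutive."""
    if not incoming:
        return None                       # e.g., the root: no incoming edges
    # Sort by the TO endpoint, the order relation of Section 4.1.
    labels = sorted(incoming, key=lambda label: label[1])
    frm, to = labels[0]
    for next_frm, next_to in labels[1:]:
        if to + 1 != next_frm:            # gap or overlap between labels
            return None
        to = next_to                      # extend the lifespan to the right
    return (frm, to)

print(lifespan([(2, 10), (0, 1)]))
print(lifespan([(0, 1), (5, NOW)]))
```

The sort dominates the cost, which matches the $O(\deg_{in}(n) \log(\deg_{in}(n)))$ bound stated above.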
The following algorithm checks inconsistencies of types
i and ii.
Algorithm 2 (Checks Inconsistencies of Types i and ii)
INPUT: A temporal XML document D
OUTPUT: True if D has no inconsistencies of types i and ii; False otherwise.
boolean checkNodeConsistency(document D) {
(1) for each node n in D do
(2)     I = lifespan(n);
(3)     if is_null(I) and n is not the root then
(4)         Return False;
(5)     for each edge e outgoing from n do
(6)         if Te is not in I then
(7)             Return False;
(8)     end for;
(9) end for;
(10) Return True; }
Lines 5 to 7 check inconsistencies of type i, line 3
checks the occurrence of inconsistencies of type ii (if the
intervals of the incoming edges are not consecutive Algorithm 1 returns null).
We can see that the main loop iterates at most |V| times (the number of nodes in the document). Lines 2 and 3 check inconsistencies of type ii, computing the lifespan of n, with order $O(\deg_{in}(n) \log(\deg_{in}(n)))$, as explained above. In total, $\sum_{n \in V} \deg_{in}(n) \log(\deg_{in}(n))$. For the average case, $\deg_{in}(n)$ equals $|E|/|V|$, yielding $|E| \log(\frac{|E|}{|V|})$. Lines 5 to 7 compose a loop that repeats for each edge outgoing from a visited node, performing operations of constant order. This, for the average case, results in complexity $O(|V| \cdot \frac{|E|}{|V|}) = O(|E|)$. The algorithm’s order is then $O(|E| (\log(\frac{|E|}{|V|}) + 1))$.
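The check performed by Algorithm 2 can be sketched in Python as shown below. Several details are assumptions made for the sake of a runnable example: the document is given as a list of `(parent, child, (FROM, TO))` containment edges, `Now` is `math.inf`, and the root is given the lifespan `[0, Now]` through the hypothetical parameter `root_lifespan`, since it has no incoming edges. The sample data uses the labels of Figure 7 with instants t_k set to k.

```python
import math
from collections import defaultdict

NOW = math.inf  # stand-in for "Now" (assumption of this sketch)

def lifespan(incoming):
    """Union of consecutive incoming labels, or None (sketch of Algorithm 1)."""
    if not incoming:
        return None
    labels = sorted(incoming, key=lambda label: label[1])
    frm, to = labels[0]
    for next_frm, next_to in labels[1:]:
        if to + 1 != next_frm:
            return None
        to = next_to
    return (frm, to)

def check_node_consistency(edges, root, root_lifespan=(0, NOW)):
    """True if the document has no inconsistencies of types i and ii."""
    incoming, outgoing = defaultdict(list), defaultdict(list)
    nodes = {root}
    for parent, child, label in edges:
        outgoing[parent].append(label)
        incoming[child].append(label)
        nodes.update((parent, child))
    for n in nodes:
        span = root_lifespan if n == root else lifespan(incoming[n])
        if span is None:
            return False                       # type ii: labels not consecutive
        for frm, to in outgoing[n]:
            if frm < span[0] or to > span[1]:
                return False                   # type i: label outside lifespan
    return True

# Left-hand graph of Figure 7 (inconsistent) and its fix by expansion.
before = [("root", "n2", (0, NOW)), ("root", "n1", (0, 1)), ("n2", "n1", (2, 10)),
          ("n1", "n3", (6, NOW)), ("n2", "n3", (0, 5))]
after = [e if e[:2] != ("n2", "n1") else ("n2", "n1", (2, NOW)) for e in before]
print(check_node_consistency(before, "root"), check_node_consistency(after, "root"))
```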
The next algorithm checks for cycles in a temporal
labeled graph.
Algorithm 3 (Finds Cycles in a Document)
INPUT: A temporal XML document D, such that checkNodeConsistency(D) = True
OUTPUT: True if D has no cycles, otherwise False
boolean checkCycles(D) {
(1) Queue nodes = getRoot(D) (a queue, initialized with the root of D)
(2) Queue nodesWait = [ ] (empty queue of nodes)
(3) Set traversed(e), usable(e), visited(n) and ended(n) to False, for all edges e and nodes n in D.
(4) while !empty(nodes) do
(5)     n = first(nodes)
(6)     if !ended(n) then
(7)         labelList = [Te where e is an edge incoming to n and !traversed(e)]
(8)         for each e outgoing from n do
(9)             if !traversed(e) then
(10)                if Te ∩ Te′ ≠ ∅ for some e′ in labelList then
(11)                    usable(e) = False
(12)                else
(13)                    usable(e) = True
(14)                end if;
(15)        end for;
(16)    end if;
(17)    for each edge e(n, nf) and !traversed(e) do
(18)        if (usable(e) || ended(n)) then
(19)            traversed(e) = True
(20)            if traversed(e) = True, ∀e incoming to nf then
(21)                Append nf to nodes
(22)                ended(nf) = True
(23)            else
(24)                if !visited(nf) then
(25)                    Append nf to nodesWait
(26)                    visited(nf) = True
(27)                end if;
(28)            end if;
(29)        end if;
(30)    end for;
(31)    if empty(nodes) and !empty(nodesWait) then
(32)        n = first(nodesWait)
(33)        Append n to nodes
(34)        visited(n) = False
(35)    end if;
(36) end while;
(37) for each node n in D do
(38)     if !ended(n)
(39)         Return False;
(40) end for
(41) Return True }
In Algorithm 3, functions traversed(e) and ended(n)
apply to edges and nodes, respectively. A node is ended
when all its incoming edges have been traversed. The
intuition behind this notion is that, since we are treating
isolated inconsistencies, all the edges outgoing from an
ended node are usable (i.e., can be traversed). There are
two queues: (a) nodes, which holds all nodes such that
all of their incoming edges have already been traversed,
and (b) nodesWait holding the nodes that have been
visited but have incoming edges not yet traversed (the
visited function is used to indicate this). Note that if
the document is a tree, the latter queue will always be
empty. If the two queues are empty, and all nodes are
ended, the document contains no cycle. Conversely, if
unended nodes remain, there is a cycle in the document.
Example 4 Consider a graph with nodes root, n1 and n2. The edges are e1(root, n1, [T1, T2]), e2(n1, n2, [T2, T4]), and e3(n2, n1, [T3, T4]). Clearly, there is a cycle in [T3, T4] between n1 and n2. When the algorithm reaches n1 in [T1, T2], the node is stored in nodesWait since n1 is not ended. Then, as the queue nodes becomes empty, n1 is removed from nodesWait and added to nodes. Also, visited(n1) is set to False. However, when the usability of the edge is checked in line (10) of the algorithm, there is a non-empty intersection between the temporal labels, and the edge e3 is not “usable”. Thus, there are no more edges to traverse, and unended nodes remain. Then, the document must contain a cycle.
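The cycle of Example 4 can also be exhibited with a much simpler, brute-force check. The Python sketch below is not Algorithm 3: it builds the snapshot D(t) at every instant t where some edge label starts and runs a depth-first search on it. This is sufficient because a cycle that exists at some instant exists throughout the intersection of its edge labels, and that intersection begins at the FROM endpoint of one of them. Edges are `(parent, child, (FROM, TO))` tuples, and T1 to T4 are given the illustrative values 1 to 4.

```python
def has_temporal_cycle(edges):
    """True if some snapshot D(t) contains a cycle of containment edges."""
    for t in sorted({label[0] for _, _, label in edges}):
        # Snapshot at instant t: only edges whose label contains t.
        succ = {}
        for parent, child, (frm, to) in edges:
            if frm <= t <= to:
                succ.setdefault(parent, []).append(child)
        state = {}  # node -> "open" while on the DFS stack, "done" afterwards

        def dfs(n):
            state[n] = "open"
            for c in succ.get(n, []):
                if state.get(c) == "open" or (c not in state and dfs(c)):
                    return True            # reached a node still on the stack
            state[n] = "done"
            return False

        if any(n not in state and dfs(n) for n in list(succ)):
            return True
    return False

example4 = [("root", "n1", (1, 2)), ("n1", "n2", (2, 4)), ("n2", "n1", (3, 4))]
print(has_temporal_cycle(example4))
```

The sketch trades efficiency for simplicity: it may rebuild as many snapshots as there are edges, whereas Algorithm 3 traverses the temporal graph once.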
Theorem 1 Algorithm 3 finds all temporal cycles in the
graph, does not loop indefinitely, and only returns False
if it finds a cycle.
Proof 1. The algorithm has a finite number of loops. In
each main loop the algorithm visits only edges (not
yet traversed) outgoing from a node, and adds to a
list the nodes such edges are incident to. When all
possible edges have been traversed, no node will be
added, and the algorithm will stop.
2. a) The algorithm finds all cycles in the graph. Let
us suppose there is a cycle in the graph and it is
not detected by the algorithm (i.e., T rue is returned). Let n1 . . . nk be the nodes in the cycle
and T the cycle’s interval. Since the algorithm returned T rue, all nodes were “ended”, including
n1 . . . nk , meaning that these nodes were visited
in T (because, in order to be ended, all incoming
edges must be traversed in the whole interval).
Let n1 be the first node visited in T ; it should
have been reached from a node n whose incoming edges in T were already traversed. As n1 was
the first one to be visited in such interval, it follows that n does not belong to the cycle; thus, n1
has a parent outside T and another one inside it,
which is a contradiction, because of the precondition stating that no inconsistencies of other kind
pre-existed in the document.
b) The algorithm returns F alse only if there is a cycle. Let us suppose there is at least one node n1
such that ended(n1 ) = F alse and no cycle was
found. As there cannot be inconsistencies of type
i, all edges incoming to n1 have a temporal label inside the lifespan of the starting node. Let
e1 (n1 , n2 , TT ) be an untraversed edge incoming to
n1 ; then, n2 must be not ended too (i.e., there is
at least one not visited incoming edge within T ).
Thus, there is a path of unended nodes in T ′ ∩ T ;
however, as there are no cycles, all nodes appear
only once in the path. Thus, when the algorithm
reaches the root (by definition there are no edges
incoming to the root), all of its outgoing edges
must have been traversed. This is a contradiction,
given that, either there are no “not ended” nodes,
or there is a cycle in the document.
We now study the complexity of Algorithm 3. Each node can be visited more than once, depending on the number of incoming edges. The best case is the one where no temporal cycles exist. In this case, lines 6 to 11 will never be executed. Lines 5 and 6 are of constant order, and the loop in line 17 is executed $\deg_{out}(n)$ times; all operations are of constant order, resulting in order $\deg_{out}(n)$. Lines 31 to 34 are also of constant order. The final loop is performed |V| times in the worst case, and the operations are of constant order. Finally, we have: $\sum_{n \in V} \deg_{out}(n) + |V| \approx O(|E| + |V|)$.
4.2 Fixing Inconsistent Documents
In the previous section we provided efficient algorithms
that allow the user to quickly check if the document
presents inconsistencies. In this section we will discuss
how we can correct these inconsistencies. For each kind
of inconsistency (of types i, ii and iii), we study possible
fixing procedures. Of course, there are semantic implications for each of the solutions proposed here that the user
must be aware of. If these implications are unacceptable for the user, she may simply choose to drop the document instead of fixing it. We will study isolated inconsistencies,
that is, we assume that everything happens as if the inconsistency under study is the only one in the document.
The following definitions will be used in the remainder
of this section.
Definition 8 (Deleting Edges in Temporal XML) Let D be a Temporal XML document, e be a containment edge of the form e(n, m, Te) and let I = [I.FROM, I.TO] be a temporal interval. The deletion of e in the interval I is defined as follows:
1. If I.FROM ≤ Te.FROM ≤ Te.TO ≤ I.TO, then physically delete e.
2. If Te.FROM < I.FROM ≤ Te.TO ≤ I.TO, then make Te.TO = I.FROM − 1.
3. If I.FROM ≤ Te.FROM ≤ I.TO < Te.TO, then make Te.FROM = I.TO + 1.
4. If Te.FROM < I.FROM ≤ I.TO < Te.TO, then
(a) Create a new node nc.
(b) Replace Te in e by [Te.FROM, I.FROM − 1] and create a new edge e′(nc, m, [I.TO + 1, Te.TO]).
(c) Remove every edge ej(n, nj, Tej) outgoing from n for which I.FROM ≤ Tej.FROM and create a new edge e′j(nc, nj, Tej).
(d) For every edge ej(n, nj, Tej) outgoing from n for which Tej.FROM < I.FROM ≤ Tej.TO replace Tej in ej by [Tej.FROM, I.FROM − 1] and create a new edge e′j(nc, nj, [I.FROM, Tej.TO]).
(e) Remove every edge ei(ni, n, Tei) incident to n for which I.FROM ≤ Tei.FROM and create a new edge e′i(ni, nc, Tei).
(f) For every edge ei(ni, n, Tei) incident to n such that Tei.FROM < I.FROM ≤ Tei.TO replace Tei in ei by [Tei.FROM, I.FROM − 1] and create a new edge e′i(ni, nc, [I.FROM, Tei.TO]).

Fig. 5 Deleting the edge (n, n2, [20, Now]) at t = 35, using node duplication
Note that step 4 in Definition 8 performs a duplication of the node from which the deleted edge outgoes.
The next example illustrates the situation.
Example 5 Figure 5 shows a deletion of edge e(n, n2, [20, Now]) at instant t = 35. Since Te.FROM < 35 < Te.TO, we performed node duplication (following step 4 in Definition 8) as follows: we created a copy of n, denoted nc, and the edge (nc, n2, [36, Now]); we also replaced (n, n2, [20, Now]) by (n, n2, [20, 34]) (step 4(b)). According to step 4(f), the edge (r, n, [0, Now]) has been replaced by (r, n, [0, 34]), and a new edge (r, nc, [35, Now]) was created.
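The four cases of Definition 8 can be traced with a small Python sketch that reproduces Example 5. It rests on a few assumptions that are not part of the definition: edges are `(source, target, (FROM, TO))` tuples in a plain list, `Now` is `math.inf`, the name of the duplicated node is passed in as the hypothetical parameter `copy_name`, an interval that does not intersect the label leaves the document unchanged, and steps (c) and (d) are applied to the edges outgoing from n other than e itself.

```python
import math

NOW = math.inf  # stand-in for "Now" (assumption of this sketch)

def delete_edge(edges, e, interval, copy_name):
    """Delete edge e = (n, m, (frm, to)) during `interval`; returns a new edge list."""
    n, m, (frm, to) = e
    i_frm, i_to = interval
    rest = [x for x in edges if x != e]
    if i_to < frm or to < i_frm:            # no intersection (assumed no-op)
        return list(edges)
    if i_frm <= frm and to <= i_to:         # case 1: physically delete e
        return rest
    if frm < i_frm and to <= i_to:          # case 2: cut the right end of Te
        return rest + [(n, m, (frm, i_frm - 1))]
    if i_frm <= frm and i_to < to:          # case 3: cut the left end of Te
        return rest + [(n, m, (i_to + 1, to))]
    # Case 4: I lies strictly inside Te, so node n is duplicated as nc.
    nc = copy_name
    out = [(n, m, (frm, i_frm - 1)), (nc, m, (i_to + 1, to))]   # step (b)
    for src, dst, (f, t) in rest:
        if src != n and dst != n:
            out.append((src, dst, (f, t)))  # edge does not touch n
            continue
        # Same edge with the endpoint n renamed to the copy nc.
        moved = (nc if src == n else src, nc if dst == n else dst)
        if i_frm <= f:                      # steps (c), (e): move the whole edge
            out.append((*moved, (f, t)))
        elif i_frm <= t:                    # steps (d), (f): split at I.FROM
            out.append((src, dst, (f, i_frm - 1)))
            out.append((*moved, (i_frm, t)))
        else:                               # edge ends before I.FROM: unchanged
            out.append((src, dst, (f, t)))
    return out

figure5 = [("r", "n", (0, NOW)), ("n", "n1", (0, 30)), ("n", "n2", (20, NOW))]
result = delete_edge(figure5, ("n", "n2", (20, NOW)), (35, 35), "nc")
for edge in sorted(result):
    print(edge)
```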
Definition 9 (Temporal Label Expansion and Reduction) Given a containment edge e(ni, nj, Te), an expansion of Te to an instant t is performed making Te.TO = t, if t > Te.TO, and Te.FROM = t, if t < Te.FROM. Reducing the temporal label Te to an interval T′ ⊂ Te implies deleting e in the intervals [Te.FROM, T′.FROM − 1] and [T′.TO + 1, Te.TO].
Fig. 6 Original graph, and graph after expansion at t10

Figure 6 shows an example of expansion for the temporal label of the edge e(n2, n1, [t2, t5]) to instant t10. In this case, t10 became the rightmost boundary of Te of the edge between n2 and n1.

Definition 10 (Youngest (Oldest) Incoming Edge) We will denote youngest edge incoming to a node n, y_e(n), an edge whose temporal label is the largest (according to the notation above) among all the temporal labels of the edges incoming to n. Analogously, the oldest edge incoming to a node n, o_e(n), is an edge whose temporal label precedes all the labels of the edges incoming to n.

Inconsistencies of Type i

In this case, the temporal label of an edge outgoing from a node is outside the lifespan of the node. We will say that an edge e is inconsistent if its temporal label is outside the lifespan of the inconsistent node (i.e., the origin node of e). For inconsistencies of type i, the interval of inconsistency I_I is the maximum interval within the temporal label of e that is not included in the lifespan of the inconsistent node. Note that I_I could actually be a set of intervals (for instance, if the lifespan of the inconsistent node is properly included in the temporal label of the inconsistent edge). In this section we will study the problems introduced by an inconsistent edge. We study two ways of fixing the problem: (a) Correction by expansion (expanding the lifespan of the inconsistent node); (b) Correction by reduction (reduces the temporal label of the inconsistent edge, closing I_I).

(a) Correction by expansion. In this solution, we expand the inconsistent node’s lifespan until it covers the violating interval. We may take the youngest or oldest incoming edge, and modify its temporal label in a way such that it covers the whole label of the inconsistent edge. If I_I ≻ lifespan(n) then we must consider y_e(n); if lifespan(n) ≻ I_I, we must consider o_e(n).

Example 6 Figure 7 shows that node n1 presents an inconsistency of type i (the youngest edge incoming to n3 has temporal label [t6, Now], and the lifespan of n1 is [0, t10]). Then, I_I = [t11, Now]. The right hand side of the picture shows the solution, expanding the youngest edge incoming to n1, i.e., e(n2, n1, T).

Fig. 7 Example of inconsistency of type i and solution by expansion

Note that even though in Example 6 we only expanded one temporal label, this may not be the usual case: the modified interval may fall outside the lifespan of the origin node of the inconsistent edge. Thus, the inconsistency may recursively propagate upward in the path, until a consistent state is reached. To make these concepts more formal, we define the concepts of path of youngest parents and path of oldest parents. We will generically denote these paths expansion paths.
Definition 11 (Expansion Paths) We call youngest parent of a node n the origin node of y_e(n). Analogously, we denote oldest parent of a node the origin node of o_e(n). A path of youngest (oldest) parents between two nodes ni, nj is a path where each node is the youngest (oldest) parent of the next node in the path. We denote these paths expansion paths.
It can be shown that all sub-paths of a path of youngest
(oldest) parents are also paths of youngest (oldest) parents.
Example 7 For the graph in the right hand side of Figure
7, the path of youngest parents for node n3 is (n3 , n1 ,
n2 , root). The path of oldest parents for the same node
is (n3 , n2 , root).
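The expansion paths of Example 7 can be reproduced with a short Python sketch. It assumes an acyclic document given as `(parent, child, (FROM, TO))` edges, compares labels by their TO endpoint (the order relation of Section 4.1), represents `Now` by `math.inf`, and assigns the instants t_k of Figure 7 the illustrative values k; `expansion_path` is a hypothetical helper name.

```python
import math

NOW = math.inf  # stand-in for "Now" (assumption of this sketch)

def expansion_path(edges, node, youngest=True):
    """Follow youngest (or oldest) incoming edges from `node` up to a parentless node."""
    path = [node]
    while True:
        # Candidate parents of the last node, keyed by the TO endpoint of the label.
        incoming = [(label[1], parent) for parent, child, label in edges
                    if child == path[-1]]
        if not incoming:
            return path                    # reached the root
        chosen = max(incoming) if youngest else min(incoming)
        path.append(chosen[1])

# Right-hand graph of Figure 7, after the expansion of e(n2, n1, T).
figure7 = [("root", "n2", (0, NOW)), ("root", "n1", (0, 1)), ("n2", "n1", (2, NOW)),
           ("n1", "n3", (6, NOW)), ("n2", "n3", (0, 5))]
print(expansion_path(figure7, "n3", youngest=True))
print(expansion_path(figure7, "n3", youngest=False))
```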
The problem with the solution by expansion is twofold:
on the one hand, we do not really know if the containment relation actually existed in the new interval. An
expert user (or curator) will be needed to define this.
On the other hand, the expansion may introduce a cycle
(i.e., an inconsistency of type iii). In this case, expansion
will not be a possible solution. We characterize the latter
situation defining the Instant of Maximal Path Expansion (IMPE). The idea is that if we expand the interval
beyond the IMPE, a cycle will be produced.
Fig. 8 Instant of Maximal Path Expansion

Definition 12 (Instant of Maximal Path Expansion (IMPE)) Let P = (n1, n2, . . . , nf) be an expansion path between n1 and nf, and let ei(ni, ni+1, Ti), with 1 ≤ i ≤ f − 1, be the edges in this path. Let m = min{T1.TO, . . . , Tf−1.TO} and M = max{T1.FROM, . . . , Tf−1.FROM}. Let L = [(nf, . . . , n1, T′1), . . . , (nf, . . . , nf−1, T′f−1)] be a list of mcps such that (nf, . . . , nk, T′k) is an mcp from node nf to node nk, with 1 ≤ k ≤ f − 1.
We define the Instant of Maximal Path Expansion of P, denoted IMPE(P), as:

$$
\mathit{IMPE}(P) = \begin{cases}
\text{the maximum instant } t \text{ such that } t \ge m \text{ and } [m, t] \cap T'_j = \emptyset \;\; \forall j \in 1..f-1, & \text{if } P \text{ is a path of youngest parents,} \\
\text{the minimum instant } t \text{ such that } t \le M \text{ and } [t, M] \cap T'_j = \emptyset \;\; \forall j \in 1..f-1, & \text{if } P \text{ is a path of oldest parents.}
\end{cases}
$$

The intuition behind this definition is that, in the case of a path of youngest parents for instance, the IMPE of a path P = (n1, . . . , nf) is an instant greater than the minimum ending point of the intervals in an expansion path, and less than the starting point of the intervals of all mcps starting from a node reachable from nf, and ending at node n1.

Example 8 Figure 8 shows a graph with the expansion path (actually a path of youngest parents) (n2, n4, n5). Node I violates consistency condition of type i. A solution for this could be to expand the lifespan of I. In this case, IMPE(n2, n4, n5) = 24, because t = 24 is greater than the minimum ending time of the intervals in the expansion path, and less than the interval of the mcp between I and n2. Thus, expanding to t = 25 would introduce a cycle. Then, the inconsistency cannot be solved by means of lifespan expansion.

Theorem 2 Let D be a document with an inconsistency of type i in a node n. Then, the IMPE is the maximum instant to which an edge interval in an expansion path can be expanded without introducing a cycle in the document.

Proof We will study the case of a path of youngest parents. (The case of a path of oldest parents is analogous.) Let us assume that we expand an interval until the IMPE, and a cycle is generated. Then, this implies that there is a path in some instant t ∈ [min(Ti.TO), IMPE] (see Definition 12) between either (a) nf and ni for some ni in the path of youngest parents; or (b) nj and ni for some ni, nj in the path of youngest parents. In case (a), this would imply t ∈ T′k for some mcp (nf, . . . , nk, T′k), contradicting the definition of IMPE. In case (b), ni and nj were consistent before the expansion; thus, t ∈ lifespan(ni) ∧ t ∈ lifespan(nj) holds, implying that the cycle was pre-existent.
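For a path of youngest parents, Definition 12 reduces to a simple scan over the mcp intervals. The Python sketch below illustrates this under stated assumptions: instants are integers, `Now` is `math.inf`, `impe_youngest` is a hypothetical helper, and the path labels used in the example are illustrative values chosen so that m = 21, since the exact labels of Figure 8 are not restated in the text; only the mcp interval [25, 30] and the result 24 follow Example 8.

```python
import math

NOW = math.inf  # stand-in for "Now" (assumption of this sketch)

def impe_youngest(path_labels, mcp_intervals):
    """IMPE of a path of youngest parents, or None if no instant t >= m qualifies."""
    m = min(to for _, to in path_labels)       # minimum ending point on the path
    limit = NOW
    for frm, to in mcp_intervals:
        if frm <= m <= to:
            return None                        # even [m, m] meets an mcp interval
        if frm > m:
            limit = min(limit, frm - 1)        # stop right before this mcp starts
    return limit

# Illustrative path labels (assumed) and the mcp interval [25, 30] of Example 8.
print(impe_youngest([(20, 21), (20, 21)], [(25, 30)]))
```

Intervals of mcps that end before m cannot intersect [m, t] and are therefore ignored by the scan.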
(b) Correction by reduction. The main idea of this solution is to modify the temporal label of the inconsistent edge so that it lies within the lifespan of the starting node of such edge. It may even be necessary to delete this edge if its temporal label does not intersect the lifespan of the inconsistent node. Although no cycles can be introduced by this solution, it may introduce new inconsistencies of type i in the ending node of the modified edge if this node has outgoing edges that cover the interval that has to be reduced; moreover, inconsistencies of type ii may also be introduced if the deleted interval was not in one of the lifespan’s extremes.
The algorithm for this solution proceeds as follows: it first deletes the edge in the interval of inconsistency. Then, it visits the node at the end of this edge and repeats the process until a consistent document is obtained. The number of iterations required by this solution is given by $\sum_{n \in V} \deg_{out}(n) \approx O(|E|)$. Finally, in the worst case, inconsistencies of type ii must be fixed (with order $|E|^2$, see below), yielding an order of $O(|E| + |E|^2) \approx O(|E|^2)$.
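The first step of a correction by reduction, namely computing the interval(s) of inconsistency of an edge and the label that remains after the reduction, can be sketched in Python as follows. The sketch covers a single edge only, not the cascading procedure; labels and lifespans are `(FROM, TO)` tuples over integer instants, `Now` is `math.inf`, and both function names are hypothetical.

```python
import math

NOW = math.inf  # stand-in for "Now" (assumption of this sketch)

def inconsistency_intervals(label, lifespan):
    """Parts of an outgoing edge's label that fall outside the origin node's lifespan."""
    (frm, to), (l_frm, l_to) = label, lifespan
    if to < l_frm or frm > l_to:
        return [(frm, to)]                 # the label misses the lifespan entirely
    parts = []
    if frm < l_frm:
        parts.append((frm, l_frm - 1))     # part before the lifespan starts
    if to > l_to:
        parts.append((l_to + 1, to))       # part after the lifespan ends
    return parts

def reduced_label(label, lifespan):
    """Label left after the reduction, or None if the edge must be deleted."""
    frm, to = max(label[0], lifespan[0]), min(label[1], lifespan[1])
    return (frm, to) if frm <= to else None

# Edge labeled [6, Now] leaving a node whose lifespan is [0, 10].
print(inconsistency_intervals((6, NOW), (0, 10)), reduced_label((6, NOW), (0, 10)))
```

When the lifespan is properly included in the label, `inconsistency_intervals` returns two intervals, which corresponds to the case where the interval of inconsistency is a set of intervals.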
Example 9 Figures 9 (a) and (b) show a graph where
the correction by reduction approach generates new inconsistencies of type i and ii. In Figure 9 (a), reducing
to [20,50] the interval of the edge (n2 , n3 ) introduces a
gap in node n3 . In Figure 9 (b), the same correction will
make the temporal label of the edge (n3 , n4 ) lie outside
the lifespan of node n3 .
(c) Expansion vs. Reduction The discussion above showed that both fixing procedures, i.e., correction by expansion and correction by reduction, may propagate in cascade, upward and downward, respectively. For example, in the case of Figure 7, assume that the label of the edge (root, n2) is [0, t10] instead of [0, Now]. Fixing by expansion the inconsistency over n1 as explained in Example 6 would propagate the inconsistency to node n2.
On the other hand, a correction by reduction may propagate downward and also introduce gaps (inconsistencies of type ii), as Example 9 shows. In order to compare options (a) and (b), a simple metric could be used, namely the number of changes needed to fix the problem, where a change could be: (a) the expansion of an interval; (b) the reduction of an interval; (c) the duplication of a node; (d) the deletion of an edge.

Fig. 9 Correction by reduction

Inconsistencies of Type ii

As we already explained, these kinds of inconsistencies occur when some edges incoming to a node are not consecutive. This may be caused by: (a) overlapping of temporal labels, involving two or more of them; (b) the union of the temporal labels of the edges incoming to a node presents a temporal gap.
For fixing overlapping it suffices just to delete one of the violating edges in the interval of inconsistency. Fixing the gaps has more than one possible solution: (a) physically delete all incoming edges until the gap is closed; (b) expand the temporal labels of the edges, in order to close the gap (this could be performed expanding the temporal labels of one or more of the edges involved); (c) treat the inconsistency from a syntactic point of view, duplicating the violating node in a way such that the resulting incoming and outgoing edges have consistent temporal labels. This duplication is based on the same concepts underlying the fourth step of Definition 8.
The first two options have the following problem: they may introduce new inconsistencies of type i (for example, if the violating node is n, and there is an edge e(ni, n, Te), and Te is expanded to Te′, the latter label may be outside the lifespan of ni). Thus, we think the third option is the best one, if the node created is semantically equivalent and syntactically consistent. Figure 10 shows a gap inconsistency in node n fixed by node duplication at time instant 9. Note that in this case, duplication eliminates the gap between the time labels of the edges incoming to n.

Fig. 10 Node duplication for fixing a gap inconsistency.

The algorithm for fixing inconsistencies of type ii visits all the document’s nodes looking for gaps or overlapping. If an overlapping is found, one of the edges involved is deleted in the interval where the overlapping is produced. If a gap is found, the algorithm performing node duplication is called. Each time a node n is visited, the call to the node duplication algorithm is performed $\deg_{in}(n)$ times. This gives the algorithm an order of $O(|E|^2)$.

Inconsistencies of Type iii
Inconsistencies of type iii involve cycles occurring in some interval(s) of the document’s lifespan. In this case, again, we have more than one possible way of fixing the inconsistency, basically consisting of deleting (according to Definition 8) edges within the cycle. We may:
– delete all containment edges involved in a cycle during the inconsistency interval I_I (i.e., in this case, the interval when the cycle occurs). This can be performed (a) by deleting (within I_I) all the subgraphs with root in each of the nodes in the cycle; or (b) by expanding the expansion path (see Definition 11) for each node belonging to the cycle.
– delete (within the interval of inconsistency) one of the edges in the cycle. Given that this would introduce an inconsistency of type i, this solution is only possible if there is at least one node n in the cycle with more than one incoming containment edge ec(ni, n, Te) such that Te lies outside I_I. Thus, besides deleting the edge, Te must be expanded in order to prevent introducing a new inconsistency.
Example 10 Figure 11 shows the two alternatives for fixing an inconsistency of type iii. In Figure 11(b) all edges involved in the cycle are deleted during I_I = [0, 15]. In Figure 11(c), the cycle was eliminated by only deleting the edge incoming to n1 in the interval [0, 15], and expanding the temporal label of the remaining edge incoming to n1 (i.e., the label is now [0, 35]), in order to avoid an inconsistency of type i.
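The first alternative can be illustrated with a simplified sketch that follows the traversal of Algorithm 4: starting from a node in the cycle, it subtracts the inconsistency interval from every outgoing edge whose label intersects it. The sketch makes several assumptions that are not in the paper: edges are plain triples with integer closed intervals, subtree deletion is reduced to dropping edges whose label becomes empty, and the final repair of type ii gaps is left out. The names `subtract` and `fix_cycle` are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str, Tuple[int, int]]  # (source, target, closed interval)


def subtract(label, cut):
    """Return the parts of closed interval `label` that lie outside `cut`."""
    (s, e), (cs, ce) = label, cut
    if ce < s or cs > e:          # no intersection: the label is unchanged
        return [label]
    parts = []
    if s < cs:
        parts.append((s, cs - 1))
    if e > ce:
        parts.append((ce + 1, e))
    return parts


def fix_cycle(edges: List[Edge], start: str, cut: Tuple[int, int]) -> List[Edge]:
    """Delete, within `cut`, every edge reachable from `start` whose label
    intersects `cut`. Gaps that this may create are not repaired here."""
    remaining = list(edges)
    stack, visited = [start], set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        next_edges = []
        for src, dst, label in remaining:
            if src != node:
                next_edges.append((src, dst, label))
                continue
            pieces = subtract(label, cut)
            if pieces != [label]:      # the edge intersected the interval
                stack.append(dst)
            next_edges.extend((src, dst, p) for p in pieces)
        remaining = next_edges
    return remaining
```

On an edge list loosely modeled on Example 10 (the exact edges of Figure 11 are an assumption here), calling `fix_cycle` with the interval (0, 15) removes the edge that closes the cycle and trims the other two labels so that they start at 16.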
Fig. 11 Fixing an inconsistency of type iii (cycle).
Algorithm 4 performs cycle elimination by deleting all the edges within the interval of inconsistency.
Algorithm 4 (Fixing an Inconsistency of Type iii)
INPUT: A document D, with a cycle C in an interval Tc.
OUTPUT: A legal temporal XML document.
Fixcycle(D, C, Tc) {
(1) nc = a node in C
(2) node_stack = [nc]
(3) while node_stack is not empty do
(4)     n = node_stack.pop()
(5)     visited(n) = True
(6)     for each edge e = (n, nd, Te) outgoing from n, Tc ∩ Te ≠ ∅ do
(7)         delete e in Tc
(8)         if nd has no other incoming edges and Te ⊆ Tc then
(9)             delete the subtree with root nd
(10)        else
(11)            if !visited(nd)
(12)                node_stack.push(nd);
(13)        end if
(14)    end for
(15) end while
(16) Fix possible inconsistencies of type ii.
(17) return D }
Line (16) fixes all possible inconsistencies (basically gaps) that could have been introduced by a sequence of edge eliminations. Successive deletions of edges incoming to the same node may cause more than one gap when the labels of these edges were not at the beginning or the end of the node’s lifespan. This would result in many node duplications. Thus, we decided to postpone node duplication to the end of the algorithm, because if the edges that are deleted have consecutive intervals the gaps could be solved in one single step (i.e., with just one node duplication). Figure 12 shows an example of this: the regular procedure for deleting edges (n6, n5), (n2, n5) and (n3, n5) implies three node duplications, in that order. Instead, if we just delete the edges and leave the action of fixing the gaps to be performed at the end of the whole process, we would have to perform just one node duplication and obtain the same end result (i.e., node n5 and a copy of it, with intervals [0, 1] and [20, Now] respectively). The algorithm has an order O(|E|^2) due to this last step.
The algorithm for eliminating a single edge in the cycle essentially picks a node n in the cycle such that n has at least another incoming edge with temporal label not in the cycle’s interval Tc. The algorithm then deletes the edge incoming to n in Tc and expands (if possible, i.e., using the notion of IMPE introduced above) the lifespan of n to include Tc, to avoid inconsistencies of type i.
Fig. 12 Commutativity of gap elimination
5 Model Implementation
The abstract temporal model introduced in Section 3.2
can be encoded into a concrete XML document in many
ways. We distinguish between non-replicated representations, where each node of the graph is represented by
a single XML element or attribute, and replicated representations, where a node is represented by multiple
elements or attributes. In the non-replicated representations, the nesting relationship of the resulting document is used to encode the “oldest” containment edges,
while the remaining containment edges are represented
by references. In the top-down version, the references
go from parent to child, while in the bottom-up version
they go from child to parent. Experiments we performed
showed that the non-replicated representation outperforms the other ones in terms of space. Moreover, the
replicated representations have some semantic issues we
will briefly discuss. Thus, we will focus on the top-down
non-replicated representation, which we will describe in
detail in this section. For completeness of analysis we
will give a quick idea of the other ones.
<NBAdb>
  <franchise ID="1" [0,Now]>
    <name [0,Now]> Raptors </name>
    <team [0,Now]>
      <player [0,20]>
        <name [0,20]> Oakley </name>
      </player>
      ...
      <player [0,20] ID="16">
        <name [0,Now]> McGrady </name>
        <stats [0,Now]>
          <goals [0,Now]>11</goals>
        </stats>
      </player>
      ....
  <franchise ID="2" [0,Now]>
    <name [0,Now]> Magic </name>
    <player [21,Now] IN="16"/>
    ...
5.1 Non-Replicated Representations
The non-replicated representation comes in two flavors:
(a) Top down, and (b) Bottom-up.
(a) Top-down. The root of the graph maps to the root
element of the document. For each element node there
will be an element in the document, tagged with the
label of the node. If the element node has a containment
edge to a value node, the corresponding value is included
in the element. For each attribute node there will be
an attribute in the document, and its value will be the
unique value node associated to the attribute node. If
the attribute is of type REF, the value of the attribute
will be the ID of the node being referenced.
Let e(ni , nj , Te ) be one of the containment edges incoming to a node nj . The element elemni representing
ni in the XML document will physically include the element elemnj , tagged with the interval Te . Thus, there
will be only one element representing nj in the document.
For each node nk in the remaining edges e(nk , nj , Tek )
incoming to nj , a distinguished reference attribute denoted IN with the value of the ID in elemnj and label
Tek will be placed in the element elemnk .
The containment edges to be physically encoded in the XML document can be selected in many different ways. In general, we can choose a time instant t and, for each containment edge e(ni, nj, Te) such that t ∈ Te, physically include nj in ni (this is equivalent to taking a snapshot of the graph at time t and generating the XML document representing this snapshot); other containment edges incoming to nj (if they exist) will be referenced as explained above. All nodes nj such that t is not included in Te must be added afterward. As another alternative, we could take a different time instant tj for each node nj and physically include nj in ni if there is a containment edge e(ni, nj, Te) and tj ∈ Te. Following this approach, in the work presented here we physically encoded the “oldest” containment edges.
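The top-down mapping with the “oldest” containment edge can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is given as a dictionary of node labels plus a list of containment edges with integer start points, every element receives an ID attribute, value and attribute nodes are ignored, and the attributes FROM, TO and IN are written without a namespace. The function name `encode_top_down` is hypothetical.

```python
import xml.etree.ElementTree as ET


def encode_top_down(labels, edges, root):
    """labels: node id -> element name.
    edges: list of containment edges (parent, child, start, end)."""
    # The "oldest" containment edge of a node is the one that starts first.
    oldest = {}
    for edge in edges:
        child = edge[1]
        if child not in oldest or edge[2] < oldest[child][2]:
            oldest[child] = edge

    elements = {n: ET.Element(name, {"ID": str(n)}) for n, name in labels.items()}
    for edge in edges:
        parent, child, start, end = edge
        interval = {"FROM": str(start), "TO": str(end)}
        if oldest[child] is edge:
            # Physical inclusion: the child element is nested in the parent.
            elements[child].attrib.update(interval)
            elements[parent].append(elements[child])
        else:
            # Any other incoming edge becomes an IN reference in the parent.
            ET.SubElement(elements[parent], labels[child], {"IN": str(child), **interval})
    return elements[root]
```

With the edges (team, player 16, [0,20]) and (franchise 2, player 16, [21,Now]), the player element is nested under the team and franchise 2 only holds a reference with IN="16", mirroring the structure of Figure 13.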
Fig. 13 Top-down non-replicated representation
Fig. 14 Portion of the NBA database with duplicated nodes
Example 11 For the sake of clarity, in the following examples we will use a simplified syntax for the XML documents resulting from the various mappings. For instance, we use the notation <franchise ID=‘1’ [0,Now]> to mean that the time interval associated with this element is [0,Now]. (Note that we use integers to represent time points instead of actual date/time values, also for simplicity.) In an actual implementation, we define a namespace and create three new attributes: ‘FROM’ (the starting point of the interval), ‘TO’ (the ending point of the interval), and ‘IN’ (the reference to a contained element). In Section 5.4 we describe the implementation of temporal features in more detail.
Figure 13 depicts a portion of the document resulting from mapping the graph in Figure 1. Here, attributes of ID type have no temporal tag. Let us consider the second player element, with temporal interval [0,20]. The “oldest” containment edge approach has been chosen, resulting in the inclusion of this player in the <team> element corresponding to the Toronto Raptors. The construct <player [21,Now] IN=‘16’/> represents a current containment edge going from node ‘2’ to node ‘16’ with time label [21,Now]. This means that the information about this player is physically encoded in the element with ID = ‘16’.
Bottom-up. In the top-down representation we picked
the oldest containment edge to be represented by physical inclusion, while the others were represented by references from parent to child. We could instead have the
references going from child to parent. For example, instead of placing a reference to node ‘16’ between instants
‘21’ and ‘Now’, we place a reference from the player to his
current franchise. The resulting document is analogous
to the one obtained adopting the top-down alternative.
5.2 Node-Replicating Representation
A third alternative to the representation described above
avoids using the ‘IN’ reference. This implementation requires transforming the original graph into a tree of containment edges. This is performed, in short, recursively
creating k copies of each node n with k (k > 1) incoming containment edges. This process is similar to the one
described in Definition 8 and transforms the temporal
XML document into a tree of containment edges. Figure
14 shows a portion of the graph for the NBA database
with node replication, where the node for player with ID=‘14’ has been duplicated, denoting ‘14′’ the new ID. As a convention, for each node with node number n that is duplicated, we denote its new version n′. Note that the lifespan of node 12 (the interval [0, Now]) has been split into intervals [0, 22] and [23, Now]. This shows the biggest weakness of this approach: node replication is not appropriate when the value nodes contain data that aggregates over time. In Figure 14, we have assigned values 12 and 10 to node 12 and its replica, respectively, assuming that the value associated to the original node (22 goals in this case) is partitioned proportionally to the lifespan of the nodes involved in the replication.
5.3 Node-Edge Representation
A fourth way of implementing a temporal XML document is to store the edges and nodes of the graph in a
way similar to the edge XML-to-relational mapping [21].
The idea is to list the nodes and the edges in the graph,
using attributes for their validity intervals and other features like references or attributes. For instance, there are
two elements, NODE and EDGE, that define the nodes and
the edges respectively. Additionally, there are two attributes, Origin and End (defined within a namespace),
that represent the node numbers that are the endpoints
of each edge. Finally, a Type attribute defines the type
of the node being represented (i.e. element, attribute or
value nodes).
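A node-edge listing of this kind could be produced as in the sketch below. The element names NODE and EDGE and the attributes Origin, End and Type come from the description above. The wrapper element GRAPH, the attributes Number and Label, the use of FROM and TO without a namespace, and the function name `encode_node_edge` are assumptions made only for this illustration.

```python
import xml.etree.ElementTree as ET


def encode_node_edge(nodes, edges):
    """nodes: list of (number, node_type, label).
    edges: list of (origin, end, start, stop), where [start, stop] is the
    validity interval of the edge."""
    doc = ET.Element("GRAPH")  # hypothetical wrapper element
    for number, node_type, label in nodes:
        ET.SubElement(doc, "NODE",
                      {"Number": str(number), "Type": node_type, "Label": label})
    for origin, end, start, stop in edges:
        ET.SubElement(doc, "EDGE",
                      {"Origin": str(origin), "End": str(end),
                       "FROM": str(start), "TO": str(stop)})
    return doc
```

Because nodes and edges are listed flatly, a node with several incoming containment edges is written once and needs neither nesting nor replication.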
5.4 Implementation of Temporal Attributes
As we commented above, the syntax for intervals and distinguished references introduced in the temporal document was simplified for the sake of the paper’s clarity. In a real implementation, we need to define a namespace and create three new attributes: ‘FROM’ (the starting point of the interval), ‘TO’ (the ending point of the interval), and ‘IN’ (the reference to a contained element). We denote this namespace ‘Time’, and its associated URI is defined as ‘http://www.cs.toronto.edu/db/time’. Attributes ‘Time:FROM’ and ‘Time:TO’ introduce potential attribute duplication in a tag. Thus, we adopted the solution explained in Section 3. Figure 15 shows an example. Note that for references of type IN, we defined a special attribute ‘Time:IN’, included in the tags of the types of the element being referenced (see for example, tags <team> and <player>).
<NBAdb xmlns:Time="http://www.cs.toronto.edu/db/time">
<franchise ID="1" Time:FROM="1999-01-01"
    Time:TO="Now">
...
<team Time:FROM="1999-01-01" Time:TO="Now">
<player Time:FROM="1999-01-01"
    Time:TO="2001-06-01" Time:IN="7">
...
<player Time:FROM="1999-01-01"
    Time:TO="2000-12-31" Time:IN="3">
</player>
<ATTRIBUTES>
<day-of-birth Time:FROM="2002-01-01"
    Time:TO="Now"> "5-24-79" </day-of-birth>
</ATTRIBUTES>
<assists Time:FROM="1999-01-01"
    Time:TO="2000-05-31">6.5</assists>
<assists Time:FROM="1999-01-01"
    Time:TO="2000-05-31">6.5</assists>
...
Fig. 15 Implementation of temporal attributes
5.5 Snapshots
In temporal relational databases it is often relevant to
compute snapshots of the data. In temporal XML we
would like to be able to reconstruct a document as of a
given time instant. We call this a document snapshot. In
Section 6 we will distinguish this concept from the notion
of snapshot query. In this section we briefly show how to
compute a document snapshot (at time t) of a temporal document implemented as described in Sections 5.1
(using the top-down alternative) and 5.2.
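For the node-replicating encoding of Section 5.2, where the document is a plain tree, a document snapshot reduces to a single recursive scan that keeps the elements whose interval contains t. The sketch below illustrates this under several assumptions that are not part of the paper: intervals are stored in plain FROM and TO attributes holding integers or the string Now, elements without these attributes are treated as always valid, and Now is compared as a large integer. The names `parse_point` and `snapshot` are hypothetical.

```python
import xml.etree.ElementTree as ET

NOW = 10**9  # assumed stand-in for the current-time marker


def parse_point(text):
    """Turn a stored time point into an integer."""
    return NOW if text == "Now" else int(text)


def snapshot(element, t):
    """Return a copy of `element` as of time t without temporal labels,
    or None when the element is not valid at t."""
    start = parse_point(element.get("FROM", "0"))
    end = parse_point(element.get("TO", "Now"))
    if not (start <= t <= end):
        return None
    attrs = {k: v for k, v in element.attrib.items() if k not in ("FROM", "TO")}
    copy = ET.Element(element.tag, attrs)
    copy.text = element.text
    for child in element:
        child_copy = snapshot(child, t)
        if child_copy is not None:
            copy.append(child_copy)
    return copy
```

The non-replicated encoding needs more work than this, because IN references have to be resolved before the elements can be placed.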
Snapshots in a Non-Replicated Implementation The following procedure computes a document snapshot as of
a time instant t, when the temporal document is implemented using the top-down non-replicated implementation.
– There is a non-annotated tag T in D(t) for every tag
T in D annotated with a temporal element Te where:
(a) t ∈ Te ; (b) T is not contained in any tag T1 such
that there exists a distinguished reference of type IN
to T1 at time t (see Example 12).
– For every reference r annotated with a temporal element Te such that t ∈ Te included in an element satisfying condition (b) above, there is a non-annotated
reference r in D(t).
– For every attribute a=v (where v is the value associated to a) annotated with a temporal element Te such
that t ∈ Te included in an element satisfying condition (b) above, there is a non-annotated attribute a
in D(t).
– If within a tag T in D there is a tag T1 with a reference R of the form IN=v to an element with ID=v (also with tag T1) such that for the temporal element labeling R, call it Tr, t ∈ Tr holds, include in D(t) the complete element being referenced (i.e., T1) as the last subelement within tag T, excluding the sub-elements with temporal labels Ti where t ∉ Ti. Finally, replace IN=v with ID=v in T1.
– The former are the only transformation rules applying to the document.
Example 12 We will give an example of the procedure above, using the document in Figure 13. When taking a snapshot at time ‘24’, the tag <player[0,20] ID=‘16’...> neither verifies condition (a), nor condition (b). The tag <name[0,Now]> McGrady </name> verifies condition (a) but not condition (b), which prevents its inclusion in the snapshot (inside the player tag). However, notice that there is a tag <player[21,Now] IN=‘16’...> (T1 in the fourth step above), included in the franchise tag with ID=‘2’ (tag T above). Thus, because of the fourth step of the algorithm, the snapshot will have an element <player ID=‘16’..>. Also, all the sub-elements of <player[0,20] ID=‘16’..> will be included in the <franchise> tag (note that all the sub-elements are labeled [0,Now] in the document, and that a snapshot does not have temporal labels). A portion of the resulting snapshot is shown in Figure 16.
<NBAdb>
  <franchise ID="1">
    <name> Raptors </name>
    <team>
    ...
  <franchise ID="2">
    <name> Magic </name>
    <player ID="16">
      <name>McGrady</name>
      <stats>
        <goals>11</goals>
      </stats>
    </player>
    ...
Fig. 16 Snapshot of the document of Figure 13 at t=‘24’
Snapshots in a Node-Replicating Implementation Taking a snapshot at time t of a temporal document implemented as described in Section 5.2 just requires scanning the document and placing a tag for every tag in D annotated with a temporal element Te such that t ∈ Te. Analogously, an attribute and/or reference must be created for each attribute and/or reference associated with a temporal element including t.
The node-replicating implementation requires only one pass through the document for computing a document snapshot, while the non-replicated implementation requires at least two, the first one for finding the references, and the second one for generating the snapshot. This is compensated by the size of the produced document, which, due to node duplication, is two to three times larger than an equivalent document with no duplicates. As a result, computing a document snapshot in both representations takes, on the average, approximately the same time. In Section 10 we provide experimental results on snapshot computation.
6 TXPath: a Temporal Extension to XPath
One of the motivations for proposing a temporal data model is to support query languages that make complex queries easy to express. For example, consider the query “players who played for the Toronto Raptors continuously since at least the year 2000.” In this section we introduce TXPath, a temporal query language that extends XPath 2.0 [60] with temporal operators in order to enable this kind of query. As we only intend to show the main ideas of this extension, we will not discuss details or standard temporal database issues like temporal comparisons and granularity, which are treated in the usual way.
6.1 Syntax and Semantics
In non-temporal XPath 2.0, the meaning of a path expression is the sequence of nodes, at the end of each path, that matches the expression. In TXPath, the meaning is a sequence of (node, interval) pairs such that the node has been at the end of a matching path continuously during that interval (i.e., at the end of a continuous path).
We stay as close as possible to the XPath syntax, extending it with temporal operators. We specify the TXPath semantics adapting the formal XPath semantics introduced by Wadler [54]. The meaning of an XPath expression is specified with respect to a context node; we extend this to a context pair of a node and a time interval. We define three semantic functions: S, Q and QT such that S[[p]]x denotes the sequence of pairs (node, interval) (or values, as we will see below) selected by pattern p when x is the context pair. The boolean expression Q[[q]]x denotes whether or not the qualifier q is satisfied when the context pair (node, interval) is x. Finally, another boolean expression QT[[qT]]x denotes whether or not a temporal condition qT is satisfied. For the sake of brevity, in Figure 17 we only show the most common TXPath constructs.
Example 13 The expression //player, applied to the document in Figure 3, will return the sequence ((p1, [95, Now]), (p2, [99, Now]), (p3, [98, Now])).
With respect to our running example, the query “players who have played for the Toronto Raptors continuously since the year 2000” reads in TXPath:
//franchise[name=‘Raptors’]//player[@from ≤ 2000 and @to=‘Now’]
We use the XPath construct @from to refer to the starting point of the interval associated with each node in the answer, and similarly for @to.
\[
\begin{array}{lcl}
S[\![/p]\!]x & = & S[\![p]\!]\,\mathit{root}(x); \\
S[\![//p]\!]x & = & \{x_2 \mid x_1 \in \mathit{subnodes}(\mathit{root}(x)),\ x_2 \in S[\![p]\!]x_1\}; \\
S[\![p_1/p_2]\!]x & = & \{(v_2, I_1 \cap I_2) \mid (v_1, I_1) \in S[\![p_1]\!]x,\ (v_2, I_2) \in S[\![p_2]\!](v_1, I_1)\}; \\
S[\![p_1//p_2]\!]x & = & \{x_2 \mid x_1 \in \mathit{subnodes}(x),\ x_2 \in S[\![p]\!]x_1\}; \\
S[\![p[q]]\!]x & = & \{(v, I) \mid (v, I) \in S[\![p]\!]x,\ Q[\![q]\!](v, I)\}; \\
S[\![n]\!]x & = & \{(v, I) \mid \mathit{isElement}(v),\ \mathit{child}(x) = (v, I),\ \mathit{name}(v) = n\}; \\
S[\![@n]\!]x & = & \{(v, I) \mid \mathit{isAttribute}(v),\ \mathit{child}(x) = (v, I),\ \mathit{name}(v) = n\}; \\
S[\![@\mathit{from}]\!]x & = & \{f \mid (v, I) \in S[\![p]\!]x,\ I = [f, t]\}; \\
S[\![@\mathit{to}]\!]x & = & \{t \mid (v, I) \in S[\![p]\!]x,\ I = [f, t]\}; \\
S[\![p[q_T]]\!]x & = & \{(v, I) \mid (v, I) \in S[\![p]\!]x,\ Q_T[\![q_T]\!](v, I)\}; \\
S[\![\mathit{ancestor}::p]\!]x & = & \{x_2 \mid x_1 \in \mathit{prenodes}(x),\ x_2 \in S[\![p]\!]x_1\}; \\
Q[\![p = s]\!]x & = & \{(v, I) \mid (v, I) \in S[\![p]\!]x,\ \mathit{value}(v) = s\} \neq \emptyset; \\
Q[\![p]\!]x & = & \{x_1 \mid x_1 \in S[\![p]\!]x\} \neq \emptyset; \\[4pt]
Q_T[\![d\ \mathrm{IN}\ (@\mathit{from}, @\mathit{to})]\!]x & = & \{x \mid x = (v, [@\mathit{from}, @\mathit{to}]),\ d \geq @\mathit{from},\ d \leq @\mathit{to}\} \neq \emptyset; \\
Q_T[\![(s, e)\ \mathrm{CONTAINS}\ (@\mathit{from}, @\mathit{to})]\!]x & = & \{x \mid x = (v, [@\mathit{from}, @\mathit{to}]),\ s \leq @\mathit{from},\ e \geq @\mathit{to}\} \neq \emptyset; \\
Q_T[\![(s, e)\ \mathrm{MEETS}\ (@\mathit{from}, @\mathit{to})]\!]x & = & \{x \mid x = (v, [@\mathit{from}, @\mathit{to}]),\ [@\mathit{from}, @\mathit{to}] \cap [s, e] \neq \emptyset\} \neq \emptyset; \\
Q_T[\![@\mathit{from}\ op\ d]\!]x & = & \{x \mid r \in S[\![@\mathit{from}]\!]x,\ r\ op\ d\} \neq \emptyset; \\
Q_T[\![@\mathit{to}\ op\ d]\!]x & = & \{x \mid r \in S[\![@\mathit{to}]\!]x,\ r\ op\ d\} \neq \emptyset;
\end{array}
\]
Where subnodes(y) = {(v, I) | there exists a maximal continuous path (mcp) from y to v with interval I}; prenodes(y) = {(v, I) | there exists an mcp from v to y with interval I}; root(x) is the (root, interval) pair of the tree in which x is a (node, interval) pair; child(x) = {(v, I) | there exists an mcp of length 1 from x to v with interval I}.
Fig. 17 Formal semantics of TXPath
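The composition rule for p1/p2 in Figure 17, where each step intersects the interval reached so far with the interval of the next edge, can be sketched for simple child steps as below. This is a simplified illustration rather than a TXPath evaluator: the graph is assumed to be a dictionary from a node to its outgoing containment edges, Now is modeled as infinity, only element-name steps are handled, and maximal continuous paths are approximated by plain interval intersection. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

NOW = float("inf")  # assumed stand-in for the current-time marker
Interval = Tuple[float, float]


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two closed intervals, or None when it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def child_step(graph, labels, context, name):
    """Children named `name` of every (node, interval) pair in `context`,
    paired with the interval during which the whole path is continuous."""
    result = []
    for node, interval in context:
        for child, edge_interval in graph.get(node, []):
            if labels[child] != name:
                continue
            common = intersect(interval, edge_interval)
            if common is not None:
                result.append((child, common))
    return result


def eval_path(graph, labels, root, names):
    """Evaluate a path such as /franchise/team/player given as ['franchise', ...]."""
    context = [(root, (0, NOW))]
    for name in names:
        context = child_step(graph, labels, context, name)
    return context
```

Each result pair carries the interval during which its whole path from the root was valid, which is what distinguishes this from a non-temporal XPath answer.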
6.2 TXPath by Example
We will briefly present and discuss the main features of
TXPath, in order to give the idea of the kinds of queries
that can be supported.
Coalescing Sequences In temporal queries it is often useful to coalesce sets of overlapping intervals. We define the
coalesce operation over a sequence of pairs (value(node),
interval), where value(node) stands for the value associated to a value node, to generate a new sequence where
all maximal sets of overlapping intervals are coalesced
into single intervals when the values are the same. For example, given a sequence S=((2,[1,5]), (2,[3,8]), (4,[12,16]),
(4,[14,18])), coalesce(S) returns the sequence ((2,[1,8]),
(4,[12,18])). Given an arbitrary sequence of pairs, we
extend the XPath distinct-values operator to group
all pairs that have the same node component and coalesce the resulting sub-sequence. For example, the query
“goals scored by Carter whenever a change in his scoring
occurred” is expressed as
distinct-values(//player[name=’Carter’]//goals)
This query only returns one pair (goal, interval) for
each sequence of k consecutive or overlapping seasons
where Carter scored the same number of goals, instead
of the k pairs that would be returned without using the
distinct-values statement.
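The coalesce operation described above can be sketched as follows. The sketch assumes that pairs are (value, (start, end)) tuples with comparable time points, and it merges only intervals that actually overlap. Treating consecutive intervals as mergeable would require relaxing the comparison, which is not done here. The function name `coalesce` mirrors the text but the implementation is only illustrative.

```python
def coalesce(pairs):
    """Merge overlapping intervals that carry the same value.
    pairs: iterable of (value, (start, end))."""
    by_value = {}
    for value, interval in pairs:
        by_value.setdefault(value, []).append(interval)

    result = []
    for value, intervals in by_value.items():
        intervals.sort()
        start, end = intervals[0]
        for s, e in intervals[1:]:
            if s <= end:              # overlaps the interval being built
                end = max(end, e)
            else:
                result.append((value, (start, end)))
                start, end = s, e
        result.append((value, (start, end)))
    return result
```

Applied to the sequence S of the example above, this yields ((2,[1,8]), (4,[12,18])).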
Aggregation XPath 2.0 has aggregate operators that can
be applied to a sequence of nodes to compute its sum,
average, etc. In addition, we can take advantage of these
operators by applying them to sequences of time points,
as the next example shows. The query “name of the players who were with the Orlando Magic when McGrady
joined the franchise for the first time” is expressed in
TXPath as:
let $m= min(//franchise[name=‘Magic’]//
player[name=‘McGrady’]/@from)
return
//franchise[name=’Magic’]//player[$m ≥ @from
and $m ≤ @to]/name
In the query above, min returns the minimum time
instant in the result set, and this value is used to qualify
the results in the next part of the query.
Snapshots In Section 5 we discussed document snapshots based on the different implementations of the abstract temporal data model. Obtaining a document snapshot means reconstructing a temporal XML document as of a given time instant. In order to express document snapshots in TXPath we would need to introduce user-defined functions like the ones supported in XQuery. On the other hand, snapshot queries can be expressed in TXPath within the framework given by the syntax and formal semantics introduced in Section 6.1. A snapshot query at a time instant t is simply a query that retrieves a portion of a document as of t. The query returns a sequence of pairs (node, interval) such that the interval contains the instant t.
An example of a snapshot query is “Give me the player nodes for players with the Toronto Raptors on October 10th, 2001”. This query reads in TXPath:
NBAdb/franchise[name=‘Raptors’]//player
[@from ≤ ‘10/10/01’ and @to ≥ ‘10/10/01’]
Assuming that the date October 10th, 2001 is represented by instant 15, the result of this query over the database of Figure 1 is the sequence ((6, [0, 20]), (10, [0, Now]), (16, [0, 20])).
6.3 The Notion of “Now”
It is a well-known fact in temporal databases, that using
a current time variable has several implications which
require the definition of a precise semantics [12]. Since
our model follows the transaction time approach, the
problems arising from the use of Now are considerably
reduced compared with a valid time data model. The
main reason for this is that in valid time databases timestamps are provided by the user, while in transaction time
databases these values are usually built-in, i.e., provided
by the database management system (DBMS).
The semantics adopted for Now in this work is the one proposed in [12]. Therefore, the meaning of the current time variable is that, if the ending point of a temporal label is T.TO = Now, the edge is valid from T.FROM (the starting point of the label) until the timestamped element is updated, yielding the so-called until changed semantics. This will become evident in Section 9, where we discuss updates. A direct consequence of this decision is that T.FROM can never be stamped with Now, as it could be the case in valid time databases.
From the language point of view, we have decided
that the syntax for representing the current time variable uses the distinguished constant ‘Now’. At the implementation level, for simplicity we have defined the
maxint value for representing the end of time (some systems use ‘999-12-31’ for representing this value). Also,
a current-date function is applied when needed, that is,
when ‘Now’ is found in a query, or maxint is found in
the database.
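The handling of ‘Now’ at the implementation level can be illustrated with the sketch below. It assumes, purely for illustration, that time points are stored as integer day numbers counted from a hypothetical origin date, that the platform's largest integer plays the role of maxint, and that the current date can be injected for testing. None of these names come from the paper.

```python
import sys
from datetime import date

MAXINT = sys.maxsize        # stored end-of-time marker (assumption of this sketch)
EPOCH = date(1970, 1, 1)    # hypothetical origin for integer day numbers


def to_stored(point):
    """Map the constant 'Now' used in the language to the stored marker."""
    return MAXINT if point == "Now" else int(point)


def effective_end(stored_to, today=None):
    """Resolve a stored ending point for comparison at query time by
    applying the current date when the marker is found."""
    if stored_to == MAXINT:
        current = today or date.today()
        return (current - EPOCH).days
    return stored_to
```

Only ending points are ever resolved in this way, since T.FROM is never stamped with Now.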
7 Structural Summaries For Temporal XML
As we mentioned in Section 1, efficiently querying temporal XML documents requires the ability to find the paths
in the graph that were valid at a given time (i.e. the continuous paths in the document). This ability is not provided by traditional path summaries. Our proposal for a
new class of summaries, which we call TSummary, adds
the time dimension to the usual path summarization by
considering continuous paths to element or value nodes.
TSummary is the theoretical framework behind TempIndex, our indexing scheme for temporal XML data [37].
7.1 Summarizing Continuous Paths
Structural summaries are data structures used to locate specific fragments of the XML data, such as nodes,
paths and subtrees. By accessing relevant data directly
they help to avoid sequential scans of entire documents
during query evaluation. Since our goal is to optimize
TXPath query evaluation, the (temporal) XML fragments we want to summarize are continuous paths. A
TSummary includes a graph that describes the continuous paths in the temporal document in a concise way.
Nodes in the temporal document are partitioned into
equivalence classes. Each node in a TSummary graph will
have associated to it one such equivalence class, which
we call the temporal mapping (or tmap) of the summary
node.
Like in non-temporal XML, a concise representation
of the nodes based on their labels is a useful summarization of the temporal XML graph structure. The first
TSummary we will present in this work is the LCP summary which summarizes labels of continuous paths from
the root. Traditional path indices [25, 38, 31] often define
equivalence classes of nodes that belong to paths with
the same label. In contrast, the LCP Summary defines
equivalence classes of nodes that belong to cp’s from the
root with the same label. Since we also need to summarize temporal intervals, we will define a summary that
describes cp’s regardless of their labels. One way of doing that is to cluster together nodes that belong to cp’s
from the root with the same length. This is in fact the
definition of another TSummary, the interval summary.
We introduce next our formalization of temporal summaries we will use in the remainder of the paper.
Definition 13 (Temporal Summary) Consider a temporal XML document D and the set TNode of pairs ⟨n, I⟩ such that n is a node in D, I is an interval, and there is a continuous path p from the root of D to n with interval I. A temporal summary of D, SD = (TSum, tmap, edge, Label, λ), is a structure where
– TSum is a set of summary nodes defined as follows: TSum = {s | ⟨s, n, I⟩ ∈ tmap};
– tmap is a relation defined as follows:
  – Each pair ⟨n, I⟩ is associated to only one summary node. That is, ⟨s, n, I⟩ ∈ tmap ⇒ ¬∃s′ ≠ s | ⟨s′, n, I⟩ ∈ tmap;
  – Every document node is associated to some summary node. That is, ∀⟨n, I⟩ ∈ TNode : ∃s ∈ TSum | ⟨s, n, I⟩ ∈ tmap;
  We say that a pair ⟨n, I⟩ ∈ TNode is in the temporal map of a node s ∈ TSum iff ⟨s, n, I⟩ ∈ tmap.
– edge is a relation in TSum × TSum that represents the edges in SD;
– λ is a labelling function that assigns names to nodes in TSum by mapping TSum → Label.
Definition 14 (Temporal Summary Graph) Consider a temporal summary SD = (TSum, tmap, edge, Label, λ). The tuple GS = (TSum, edge, Label, λ) is the summary graph of SD.
Since we need the edge structure of the summary to somehow describe the structure of the temporal document, there has to be a relationship between the summary edges and the temporal XML graph edges. This relationship is given by the following property.
Property 1 (Edge Property) A temporal summary SD has the edge property iff its edges are defined by edge as follows: edge := {⟨s, s′⟩ | ∃⟨s, n, I⟩ ∈ tmap ∧ ∃⟨s′, n′, I′⟩ ∈ tmap ∧ ∃ec(n, n′, Ie) ∈ D}
That is, there is an edge between two nodes in the summary iff there is a containment edge between any two nodes in their temporal mappings.
In order to define our first TSummary, we will need the notion of label of a continuous path. The standard notion of label paths can be easily extended to continuous paths as follows. Let p = (n1, . . . , nk, T) be a continuous path with interval T. The label path of p, or continuous label path λ(p), is the concatenation of the labels of the ni in p.
We are now able to introduce the LCP summary, a TSummary that summarizes labels of continuous paths from the root.
Definition 15 (LCP Summary) Consider a temporal XML document D and the set TNode of pairs ⟨n, I⟩ such that n is a node in D, I is an interval, and there is a continuous path p from the root of D to n with interval I. A temporal summary of D, SD = (TSum, tmap, edge, Label, λ), is an LCP summary iff
– SD has the edge property;
– Two document nodes belong to the tmap of the same summary node iff they are at the end of continuous paths from the root with the same label: ∀⟨n, I⟩, ⟨n′, I′⟩ ∈ TNode, s ∈ TSum : ⟨s, n, I⟩, ⟨s, n′, I′⟩ ∈ tmap ⇔ {λ(p) | p = (r, . . . , n, I)} = {λ(p′) | p′ = (r, . . . , n′, I′)};
– Label is the set of node labels in the temporal document;
– ∀⟨s, n, I⟩ ∈ tmap : λ(s) := λ(n).
Note that tmap defines a partition of TNode where two pairs belong to the same equivalence class iff they have incoming continuous paths with the same labels. Also note that there is a one-to-one mapping between the equivalence classes defined by tmap and the summary nodes in TSum.
The following example shows the LCP summary for the NBA database fragment of Figure 1.
Fig. 18 Temporal Summary Graphs
Example 14 The LCP summary graph G_S = (TSum, edge, Label, λ) for the NBA database is shown on the left side of Figure 18, where TSum = {s1, ..., s13}, edge is defined by the edges in the figure, Label = {NBA, franchise, name, last, team, player, stats, goals}, and λ is defined by the label assigned to each node in the figure. In addition, the tmap relation of the LCP summary S_D is given by the following table:

summary node   tmap (node, interval)
s1             ⟨0, [0, Now]⟩
s2             ⟨1, [0, Now]⟩, ⟨2, [0, Now]⟩, ⟨3, [0, Now]⟩
s3             ⟨4, [0, Now]⟩, ⟨15, [0, Now]⟩, ⟨25, [0, Now]⟩
s4             ⟨5, [0, Now]⟩
s5             ⟨6, [0, 20]⟩, ⟨10, [0, Now]⟩, ⟨14, [23, Now]⟩, ⟨16, [0, 20]⟩
s6             ⟨7, [0, 20]⟩, ⟨8, [0, Now]⟩, ⟨17, [0, 20]⟩, ⟨30, [23, Now]⟩
s7             ⟨9, [0, Now]⟩
s8             ⟨13, [23, Now]⟩, ⟨18, [0, 20]⟩, ⟨31, [0, Now]⟩
s9             ⟨11, [0, Now]⟩, ⟨12, [23, Now]⟩, ⟨19, [0, 20]⟩
s10            ⟨14, [0, 22]⟩, ⟨16, [21, Now]⟩, ⟨24, [0, Now]⟩
s11            ⟨17, [21, Now]⟩, ⟨23, [0, 20]⟩, ⟨30, [0, 22]⟩
s12            ⟨13, [0, 22]⟩, ⟨18, [21, Now]⟩, ⟨22, [0, Now]⟩
s13            ⟨12, [0, 15]⟩, ⟨12, [16, 22]⟩, ⟨19, [21, 30]⟩, ⟨19, [31, Now]⟩, ⟨20, [0, 10]⟩, ⟨21, [16, Now]⟩
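The clustering behind the LCP summary can be illustrated with a small sketch. It is only an illustration under stated assumptions: the input shape (one `(label_path, node, interval)` triple per continuous path from the root), the function name `lcp_summary`, the `NOW` sentinel, and the naming of summary nodes in sorted label-path order are all hypothetical choices, and the edge relation is left out.

```python
from collections import defaultdict

NOW = "Now"  # assumption: sentinel standing in for the moving instant Now


def lcp_summary(cpaths):
    """Group <node, interval> pairs by the label path of their continuous path.

    cpaths: iterable of (label_path, node, interval), one entry per
    continuous path from the root (input shape assumed for this sketch).
    Returns (tmap, label), where tmap maps a summary node name to its list
    of (node, interval) pairs and label maps it to the last path label.
    """
    classes = defaultdict(list)  # label path -> [(node, interval)]
    for label_path, node, interval in cpaths:
        classes[tuple(label_path)].append((node, interval))

    # One summary node per equivalence class, named s1, s2, ... in
    # sorted label-path order (a naming convention of this sketch only).
    names = {p: f"s{i}" for i, p in enumerate(sorted(classes), start=1)}
    tmap = {names[p]: pairs for p, pairs in classes.items()}
    label = {names[p]: p[-1] for p in classes}  # lambda(s) := lambda(n)
    return tmap, label


# A fragment of Example 14: the root, three franchises and their names.
cpaths = [
    (("NBA",), 0, (0, NOW)),
    (("NBA", "franchise"), 1, (0, NOW)),
    (("NBA", "franchise"), 2, (0, NOW)),
    (("NBA", "franchise"), 3, (0, NOW)),
    (("NBA", "franchise", "name"), 4, (0, NOW)),
    (("NBA", "franchise", "name"), 15, (0, NOW)),
    (("NBA", "franchise", "name"), 25, (0, NOW)),
]
tmap, label = lcp_summary(cpaths)
```

On this fragment the three classes correspond to the rows s1, s2 and s3 of the tmap table above.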
Since cp’s in the LCP summary are clustered by label,
we need additional summaries to describe the intervals
and to capture the node ordering <t at any given instant
(as defined in Proposition 1). In order to do that, we
will introduce next a TSummary based on the notion of
temporal depth.
Definition 16 (Temporal Depth) Consider a temporal XML document D. For each node n in D such
that there exists a continuous path p = (r, . . . , n, I) in
D, δ(n, I) = length(p) is a function called the temporal
depth of n during the interval I. (Note that there is at
most one continuous path with interval I from the root
to each node n).
For each temporal depth k, we define the nodes that
are valid at that depth during an interval I as follows.
Definition 17 (Node Validity) A node n is valid at
temporal depth k in an interval I iff there exists an interval I ′ such that δ(n, I ′ ) = k and I ⊆ I ′ .
Based on the notions of temporal depth and node validity we introduce next the interval summary, a TSummary that clusters together nodes that belong to cp’s
from the root with the same length.
Definition 18 (Interval Summary) Consider a temporal XML document D and the set TNode of pairs ⟨n, I⟩ such that n is a node in D, I is an interval, and there is a continuous path p from the root of D to n with interval I. A temporal summary of D, S_D = (TSum, tmap, edge, Label, λ), is an interval summary iff
– S_D has the edge property;
– Two document nodes belong to the tmap of the same summary node iff they have the same temporal depth: ∀⟨n, I⟩, ⟨n′, I′⟩ ∈ TNode, s ∈ TSum : ⟨s, n, I⟩, ⟨s, n′, I′⟩ ∈ tmap ⇔ δ(n, I) = δ(n′, I′);
– Label = {0, ..., m}, where m is the length of the longest cp in the document;
– ∀⟨s, n, I⟩ ∈ tmap : λ(s) := δ(n, I).

Note that δ(n, I) defines an equivalence relation between the nodes in the temporal XML graph where for each pair ⟨n, I⟩ in a class the length of the continuous path from the root to n is the same.

Example 15 The interval summary graph G_S′ = (TSum′, edge′, Label′, λ′) for the NBA database is shown on the right side of Figure 18. Since the difference in labels does not matter here, several LCP summary nodes may "collapse" into one in the interval summary. For instance, nodes s3, s4 and s10 of the LCP summary are represented by node s′2 in the interval summary. This also impacts the definition of tmap′ for the interval summary S′_D: all pairs ⟨n, I⟩ that belong to nodes s3, s4 and s10 in S_D (see tmap definition in Example 14) belong to node s′2 in S′_D.

Whereas the LCP summary provides a combined label path + temporal clustering, the interval summary is in fact a pure temporal clustering. This kind of clustering does not consider node labels or label paths and therefore can be used for efficiently selecting nodes based solely on their intervals. This functionality is useful for computing document snapshots and for some stages in the evaluation of TXPath queries (see Section 8.2).

There are many proposals in the literature for indexing temporal intervals. Some of them are based on the methods proposed by Bozkaya et al. [3] and Salzberg et al. [47], where a B+ tree indexes the FROM value in the intervals being indexed, and each internal node is augmented with the information of the maximum TO value in an interval of the corresponding subtree. These proposals are "indexing" schemes rather than summaries. That is, they provide low level index structures and access methods for optimization. In contrast, TSummaries are high level descriptions of the temporal data which are in turn implemented by indexing schemes. In the next section we provide a description of our own indexing scheme, TempIndex, but TSummaries could also be implemented by combining other well-known interval indexes and access methods like the ones mentioned above.

7.2 TempIndex: an Indexing Scheme for Temporal XML

In order to optimize TXPath query evaluation, we need to integrate LCP and interval summaries in an effective way. In addition to the summaries themselves, we need indexes, access methods and additional data structures with information about the hierarchical relationships between nodes in a temporal XML document. We present here TempIndex (introduced in previous work [37]), an indexing scheme that integrates LCP and interval summaries with additional indexes for efficient navigation.

For representing the structural relationship between nodes in different equivalence classes we define what we call CP tables. Each CP table is associated to a summary edge and stores the parent-child relationship between document nodes in the two equivalence classes of the end-points of such edge. In addition, CP tables contain the interval of the continuous paths ending at the child equivalence class. The information contained in the CP tables is used during query evaluation to traverse continuous paths with a given label and interval (see Section 8 for more details).

Definition 19 (CP Tables) Consider the summary S_D of a temporal XML document D. For each edge e = (s1, s2) in the temporal summary graph G_S there is a CP table in which each tuple t has attributes parent, node, from and to such that there is a continuous path from the root of D to t.node with interval [t.from, t.to] via t.parent. When t.node has a value v associated to it, the CP table has an extra attribute named value, where t.value = v. Tuples in the CP tables are sorted by node.

Example 16 Consider the summary graphs shown in Figure 18, which correspond to the summaries of Examples 14 and 15. The CP tables of edges (s10, s11) and (s5, s6) are the following:

Edge (s10, s11) CP table
parent   node   from   to    value
16       17     21     Now   "McGrady"
24       23     0      Now   "Garrity"
14       30     0      22    "Williams"
Edge (s5, s6) CP table
parent   node   from   to    value
6        7      0      20    "Oakley"
10       8      0      Now   –
16       17     0      20    "McGrady"
14       30     23     Now   "Williams"

Note that nodes 17 ("McGrady") and 30 ("Williams") appear in both tables but with different intervals. This happens because we are indexing cp's rather than nodes, and both nodes have two cp's ending at them.

For each temporal depth k, we will define a table called δk table, listing the nodes that are valid at certain intervals and their relative order. These intervals are obtained by taking all the intervals that label some continuous path of length k and partitioning them as needed to obtain a set of pairwise-disjoint intervals. This is formalized with the notion of interval partition.

Definition 20 (Interval Partition) The interval partition P of a set of intervals I1 ... In is the smallest set of intervals P = P1 ... Pm such that all the Pi's in P are pairwise disjoint and P contains a partition of every interval Ij.

Definition 21 (δk Tables) Consider a temporal XML document D. For each temporal depth k in D there is a table called δk table. Each tuple t in a δk table has two temporal attributes, from, to, and a list-valued attribute valid. Let I1 ... In be all the intervals such that there is a cp of length k labeled by one of the Ij's, and P1 ... Pm be the interval partition of I1 ... In. Each Pi is represented by a tuple t in δk. The t.valid attribute contains the list of all nodes at temporal depth k that are valid in the interval [t.from, t.to]. The nodes in t.valid are ordered by the order relation defined in the interval [t.from, t.to]. (Note that, according to Proposition 1, this order relation is always defined for all nodes in [t.from, t.to]). Tuples in the δk tables are indexed by from and to.

Algorithm 5 (Construction of δk Tables) The δk table construction algorithm starts by creating a temporary event table with two attributes, node and instant. For each tuple ⟨sk, n, I⟩ in the tmap relation, where n (and sk) are at depth k, two tuples t′ and t′′ are created in the event table as follows:

t′.node = n
t′.instant = I.FROM
t′′.node = n
t′′.instant = I.TO

The event table is then sorted by the instant attribute (and the instant order <t when it is defined, i.e. when two or more tuples have the same instant attribute value t). Next the algorithm traverses the event table in ascending order adding and removing nodes from the valid node list. Nodes in the valid list are kept in the order defined in their intersection interval. Each entry in the event table represents a change in some node's state (from valid to not valid and vice versa). Therefore, for each tuple in the event table the algorithm checks whether the node is already in the valid list or not. If the node is in the list it means that the entry in the table corresponds to the end of its interval and therefore the node has to be removed from the valid list. If the node is not yet in the list, then the entry corresponds to the beginning of its interval and thus the node has to be added to the list. In addition, for each tuple in the event table the algorithm also checks if it is the last one in the table or if its instant attribute is different from the next. In both cases a tuple ⟨old-instant, instant − 1, valid list⟩ is added to the δk table, and no tuple is added otherwise.

Example 17 We will apply the δ table construction algorithm to the s′5 node of the interval summary in Figure 18. The tmap relation for s′5 is the following:

summary node   tmap (node, interval)
s′5            ⟨9, [0, Now]⟩
s′5            ⟨11, [0, Now]⟩
s′5            ⟨12, [23, Now]⟩
s′5            ⟨19, [0, 20]⟩

From the tmap relation for s′5 the following event table is constructed:

δ5 event table
node   instant
19     0
11     0
9      0
19     20
12     23
12     Now
11     Now
9      Now

Note that the order in which the nodes appear in the event table does not necessarily represent the instant order <t. (For example, node '11' appears before node '19' in the temporal document at instant '0', rather than after it.) The instant order <t will be taken into account when the nodes are inserted in the valid list. Let us now begin to traverse the event table. The first node we find is '19' with instant '0'. We check if '19' is in the valid list. Since it is not (in fact the valid list is still empty at this point), we conclude that '0' corresponds to the beginning of the interval of '19' and we add it to the valid list. Likewise we add nodes '11' and '9'. Since the next tuple has a different instant value, we can now add the entry ⟨0, {9, 11, 19}⟩ to the δ5 table. Next we find node '19' with instant '20' and when we look it up in the valid list we find it. Thus we conclude that '20' corresponds to the end of the interval of '19' and hence we remove '19' from the list. We process the rest of the event table in a similar fashion and we get the following δ5 table:

δ5 table
from   to    valid
0      19    {9, 11, 19}
20     22    {9, 11}
23     Now   {9, 11, 12}
The δ5 table contains the interval partition of intervals [23, Now], [0, 20], [0, Now], which are the intervals
of node s′5 according to tmap.
The δk tables can be used for computing snapshots
efficiently. When creating a snapshot at time i we simply have to find the tuple t in the δk tables such that i is
contained in t’s interval. In addition, the δk tables support efficient retrieval of all nodes that are valid during a
given interval. In the next section we will explain query
processing using the CP and δk tables in detail.
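The event-table sweep of Algorithm 5 and the snapshot lookup can be sketched as follows. This is a minimal illustration, not the persistent implementation: the names `build_delta_table`, `valid_at` and the `NOW` sentinel are hypothetical, the valid list is sorted by node number as a stand-in for the instant order <t, and the interval boundaries follow the convention of the δ5 table in Example 17 (a node leaves the list at its TO instant). Intervals with FROM equal to TO are not handled.

```python
from collections import defaultdict

NOW = float("inf")  # assumption: sentinel standing in for the moving instant Now


def build_delta_table(pairs):
    """Sweep the FROM/TO events of one temporal depth.

    pairs: iterable of (node, from, to) taken from the tmap of that depth.
    Returns a list of (from, to, valid) tuples with pairwise-disjoint intervals.
    """
    toggles = defaultdict(list)  # instant -> nodes whose state changes there
    for node, start, end in pairs:
        toggles[start].append(node)
        toggles[end].append(node)

    instants = sorted(toggles)
    valid, table = set(), []
    for pos, instant in enumerate(instants):
        for node in toggles[instant]:
            valid ^= {node}  # add if absent, remove if present
        if not valid:
            continue
        nxt = instants[pos + 1] if pos + 1 < len(instants) else NOW
        end = NOW if nxt == NOW else nxt - 1
        # sorted() by node number stands in for the order relation <t
        table.append((instant, end, sorted(valid)))
    return table


def valid_at(table, i):
    """Nodes listed in the tuple whose interval contains instant i."""
    for start, end, valid in table:
        if start <= i <= end:
            return valid
    return []


# tmap of s'5 from Example 17
delta5 = build_delta_table([(9, 0, NOW), (11, 0, NOW), (12, 23, NOW), (19, 0, 20)])
```

Run on the tmap of s′5, the sketch yields the three tuples of the δ5 table in Example 17, and `valid_at(delta5, 21)` returns the nodes of the second tuple.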
7.3 Space Requirements
The size of the index is proportional to the number of
cp’s. Our experiments in Section 10 show that, for the
NBA database, the number of cp’s is about three times
the number of nodes in the temporal graph. We support
three types of updates, insertion, deletion and modification. When the XML graph is a tree, i.e. before any
update is performed, for each edge in the temporal graph
there is one tuple in the CP tables. Furthermore, since
there is only one interval of relevance, [0, Now], there is
only one tuple t in each δk and the list of its valid nodes
contains all nodes at temporal depth k. As updates are
performed, the number of cp’s in the document – and
consequently the number of tuples in the tables – increases. The tables affected by an update are those that
index descendants of a node at the update point, so the
closer the update is to the root, the larger the increase in
the index size. Occasionally, an update may also create a
new partition in a δk table, in which case the nodes from
the last partition that are still valid in the new partition
have to be replicated.
There are several ways to reduce the space requirements for the index. In many applications, we expect
most updates to occur close to the leaves, so that the
size of the index will grow linearly in the size of the
document. Our experiments so far confirm that expectation: the main-memory representation of TempIndex has
a size comparable to that of the DOM representation (see
Section 10) for all document sizes tested.
Another typical property of temporal applications is
that there is a great deal of skew in the distribution
of queries, with recent instants being accessed more frequently than older ones. In a space-constrained situation
we could exploit this property by limiting how far the
temporal window extends back in time, and periodically
reindexing to take this into account.
8 Evaluating TXPath Queries Using TempIndex
In this section we will introduce the query evaluation
algorithms which are based on our ancestor-descendant
encoding.
8.1 Ancestor-Descendant Encoding for Temporal XML
So far we have used node numbers for identifying nodes
in the XML graph. However, we will show that we can
encode nodes in a more efficient way in order to improve
the performance of some TXPath queries. We devised
the temporal interval encoding, which is an ancestor-descendant encoding inspired by the interval scheme first
presented by Santoro and Khatib [48]. In this scheme, the
leaves of a tree are numbered from left to right and each
internal node is labeled with a pair of numbers corresponding to its smallest and largest leaf descendants.
All known ancestor-descendant encoding schemes (see
[30] for a recent survey) are variations of Santoro and
Khatib's interval scheme. The average label length of
this class of schemes has an upper bound of 2 log n, n
being the number of nodes in the XML graph. In our
index, the integration of the encoding with other index
structures allows us to encode the ancestor-descendant
relationship using only one number instead of two (the
end of each interval is implicitly stored in the order of
the δk tables).
The main idea for the temporal interval encoding is
based on taking advantage of three facts: (a) again, we
are indexing continuous paths, not just nodes; (b) the
intervals of all the continuous paths in which a node
n participates are disjoint; (c) the graph representing a
snapshot of a temporal XML document is acyclic. Thus,
we can encode the nodes in a way such that each node
has as many encodings as continuous paths it is part of.
In order to formally define the temporal interval encoding, we need to define first a total order relation
among nodes at different intervals.
Definition 22 Let p1 = (root, ..., v, T1) and p2 = (root, ..., w, T2) be continuous paths in D. The partial order relation ≺T is defined as follows:
1. If T1 ∩ T2 = ∅ then v ≺[0,Now] w iff T1.FROM < T2.FROM.
2. If T1 ∩ T2 ≠ ∅ then v ≺T w iff v <t w for every t ∈ T, where <t is the order relation at instant t.
Definition 23 (Temporal Interval Encoding) Let ≺T be the order relation from Definition 22, and let succ≺(n, T′) be the successor function in ≺T of node n at interval T′ ⊆ T (a node may have different successors at different intervals). In addition, let gap(n, T) be a function assigning an arbitrary integer to each node n in a given interval T (the gap function represents the "integer gap" between two consecutive encodings). The temporal interval encoding function τ is defined over pairs ⟨n, T⟩ such that there is a cp p = (root, ..., n, T), as follows: τ(⟨n′, T′′⟩) = τ(⟨n, T′⟩) + gap(n, T) where T is an interval such that succ≺(n, T) = n′, and τ(⟨n′, T′′⟩) = 0 otherwise.
The gap function is designed to specify how much "room" we want to leave between encodings for future updates. For instance, in the temporal encoding of Figure 19, there is only one gap assignment to the node with encoding 6, which is the following: gap(6, [0, 20]) = 1 (the next encoding is 7 for the [0, 20] interval). In contrast, the node 16 has three different gap assignments: gap(16, [0, 20]) = 4 (the next encoding for the [0, 20] interval is 20), gap(16, [21, 22]) = 44 (the next encoding for the [21, 22] interval is 60), and gap(16, [23, Now]) = 14 (the next encoding for the [23, Now] interval is 30).

Fig. 19 Indexing intervals with temporal interval encoding
For representing the structural relationship between
nodes in different equivalence classes using the temporal
interval encoding we define what we call TCP tables.
TCP tables are CP tables where nodes are represented by
the temporal interval encoding. Since temporal encoding
does not require explicitly representing nodes’ parents,
in the TCP tables the parent attribute is dropped. In
contrast to CP tables, a TCP table is associated to a
summary node rather than an edge.
Definition 24 (TCP Tables) Consider the summary S_D. For each node s in the temporal summary graph G_S there is a TCP table in which each tuple t has attributes node, from and to such that there is a continuous path from the root of D to t.node with interval [t.from, t.to]. When t.node has a value v associated to it, the TCP table has an extra attribute named value, where t.value = v. Tuples in the TCP tables are sorted by node.
Example 18 Consider for instance Figure 19. The player
node corresponding to ‘Williams’ has initially been encoded as 61. This number encodes the node in the interval [0, 22], when it was a descendant of node 60. For the
interval [23, N ow], the node’s number is 30, because it
became a descendant of 1.
In other words, there are two continuous paths (with
disjoint intervals) from the root to the node, and for each
one of them we use a different encoding. Note that these
different node numbers do not imply a larger number of
tuples in the TCP tables with respect to the CP tables,
because there is always one tuple for each continuous
path, as in the encoding used before. For instance, consider the LCP summary graph shown in Figure 18 and
the NBA example with the temporal encoding of Figure
19. The TCP tables of nodes s11 and s6 are the following:
Node s11 TCP table
node   from   to    value
71     21     Now   "McGrady"
83     0      Now   "Garrity"
62     0      22    "Williams"

Node s6 TCP table
node   from   to    value
8      0      20    "Oakley"
13     0      Now   –
21     0      20    "McGrady"
31     23     Now   "Williams"
These tables are the CP tables from Example 16
but with temporal encoding and without the parent attribute. Note that nodes with “McGrady” and “Williams”
values appear in both tables but with different intervals
and temporal encodings.
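The way a single encoding per continuous path supports the ancestor-descendant test can be sketched as a range check. This is an illustration under assumptions: the helper names `next_in_valid` and `descendants` are hypothetical, tuples are plain Python tuples `(node, from, to, value)`, the valid list [1, 60, 90] is the one from the δ1 table used in Section 8.2, and the temporal intervals of the tuples are ignored for brevity.

```python
NOW = "Now"  # assumption: sentinel standing in for the moving instant Now

# TCP tables of summary nodes s6 and s11, as (node, from, to, value)
S6 = [(8, 0, 20, "Oakley"), (13, 0, NOW, None),
      (21, 0, 20, "McGrady"), (31, 23, NOW, "Williams")]
S11 = [(71, 21, NOW, "McGrady"), (83, 0, NOW, "Garrity"),
       (62, 0, 22, "Williams")]

# Ordered valid list of the delta-1 tuple covering the interval of interest
DELTA1_VALID = [1, 60, 90]


def next_in_valid(valid, enc):
    """Encoding that follows enc in an ordered valid list (infinity if last)."""
    pos = valid.index(enc)
    return valid[pos + 1] if pos + 1 < len(valid) else float("inf")


def descendants(tcp_table, enc, upper):
    """Tuples whose encoding lies strictly between enc and upper."""
    return [row for row in tcp_table if enc < row[0] < upper]


upper = next_in_valid(DELTA1_VALID, 1)   # right end of the range of node 1
under_1 = descendants(S6, 1, upper) + descendants(S11, 1, upper)
```

Here every tuple of the s6 table falls in the range of node 1, while the tuples of the s11 table fall in the range of node 60 instead.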
8.2 Query Evaluation
The evaluation of a TXPath query is divided into stages
based on its filter sections. The filter sections of a TXPath query (also called filters) are the expressions that
appear between brackets in the query. A filter is a predicate which is applied to the pairs ⟨node, interval⟩ that
are at the end of the cp’s that match the path expression
before it. For simplicity, we consider in this section TXPath expressions without nested filters. After each filter
section, the evaluation of the rest of the query continues only for those pairs ⟨node, interval⟩ that satisfy the filter.
We decompose each TXPath query into a sequence of calls to six evaluation functions: getParent(Label), getDescendants(Label), getChildren(Label), getAncestors(Label), valFilter(valPred) and tempFilter(tempPred), where Label is a node label, valPred is a value predicate, and tempPred is a temporal predicate. Each function is evaluated on a list of tmap tuples ⟨s, n, I⟩ and returns another list of tmap tuples. In order to return a TXPath answer, the summary node s is dropped from the ⟨s, n, I⟩ tuples so that the list returned contains only pairs (n, I), just as the TXPath semantics requires.
Example 19 Consider the query “name of players who
have played for the Toronto Raptors continuously since
instant 20” which is expressed in TXPath as
//franchise[name=‘Raptors’]//player/name[@from≥20]
This query can be evaluated top-down with the evaluation functions as follows:
list.add(root);
list = list.getDescendants("franchise");
list = list.getChildren("name");
list = list.valFilter('Raptors');
list = list.getParent("franchise");
list = list.getDescendants("player");
list = list.getChildren("name");
list = list.tempFilter("from ≥ 20");
If the number of nodes satisfying the last predicate
(“from ≥ 20”) is smaller than those satisfying the first
predicate (‘Raptors’), it might be better in terms of performance to choose a bottom-up query plan like the following:
list.add(leaves);
list1 = list.getAncestors("name");
list1 = list1.tempFilter("from ≥ 20");
list1 = list1.getParent("player");
list1 = list1.getAncestors("franchise");
list2 = list.getAncestors("name");
list2 = list2.valFilter('Raptors');
list2 = list2.getParent();
list = list1.intersect(list2);
list = list.getDescendants("player");
list = list.getChildren("name");
Query plans that are a blend of top-down and bottom-up evaluations are also possible.

We present next algorithms for computing functions getDescendants(Label) and tempFilter(tempPred) on a TempIndex using the temporal interval encoding.

Algorithm 6 (getDescendants)
INPUT: inList, Label.
OUTPUT: outList.
1. For each s such that (s, n, I) is in inList
1.1. Get the descendants of s in the summary graph with label Label and add them to sNodes.
2. For each s in sNodes
2.1. For each n such that there is a tuple (s, n, I) in inList
2.1.1. Get (n′, I′) = successor(n, I)
2.1.2. Assign to outList all tuples (s, t.node, [t.from, t.to]) such that t is a tuple in the TCP table of s, and τ(n, I) < t.node < τ(n′, I′).
3. Return outList.

Algorithm 7 (tempFilter)
INPUT: inList, tempPred.
OUTPUT: outList.
1. For each n such that (s, n, I) is in inList
1.1. Get δx where x is the temporal depth of n during the interval I.
1.2. For each interval i such that (i, valid) is in δx and i satisfies tempPred
1.2.1. Assign to outList all tuples (s, n, I) such that n is in valid and there is a tuple t in the TCP table of s such that t.node = n and I = [t.from, t.to].
2. Return outList.

We illustrate next through an example how a TXPath query is evaluated using the evaluation functions and a TempIndex.

Example 20 Consider again the query of Example 19, expressed in TXPath as
//franchise[name=‘Raptors’]//player/name[@from≥20]
We will follow the top-down evaluation presented in Example 19 on the LCP and interval summary graphs of Figure 18 and the NBA example with the temporal encoding of Figure 19. The evaluation begins by adding the tuple ⟨s1, 0, [0, Now]⟩ to list, which contains the root elements of the XML document and the LCP summary graph. Then the evaluation continues by searching for the descendants of the summary root with label "franchise", which is s2, and then its children with label "name", which is s3. Since we have not filtered out anything yet, list contains at this point all tuples in the TCP table of node s3 (without the values):

Node s3 TCP table
node   from   to    value
2      0      Now   "Raptors"
80     0      Now   "Magic"
91     0      Now   "San Antonio"

The next step is selecting the node that has the "Raptors" value (node 2), so that list is reduced now to the tuple (2, [0, Now]). The evaluation continues by going back to summary node s2 in order to obtain the "franchise" node that corresponds to the name "Raptors". For this we will need the TCP table of s2:

Node s2 TCP table
node   from   to
1      0      Now
60     0      Now
90     0      Now
The parent node of 2 is the node with the biggest temporal encoding smaller than 2 in the s2 TCP table, which is 1. Then, the tuple ⟨s2, 1, [0, Now]⟩ is now assigned to list (the previous tuples are removed).

Since we have filtered out tuples from the TCP tables involved, we will need the entire encoding interval of node 1 to continue evaluating the descendants. The encoding interval will be used to determine exactly what nodes of all descendant TCP tables are in fact descendants of node 1. The right end of the interval is obtained from the δ1 table

δ1 table
from   to    valid
0      Now   {1, 60, 90}

by taking 60, the node next to 1 in the valid list of the appropriate interval (the only one in this case). The next step consists in obtaining all nodes with temporal encodings between 1 and 60 from the corresponding TCP tables. For that we first find the descendant "player" nodes in the LCP summary graph (s5 and s10), and then their "name" children (s6 and s11). The TCP tables of s6 and s11 are the following:

Node s11 TCP table
node   from   to    value
71     21     Now   "McGrady"
83     0      Now   "Garrity"
62     0      22    "Williams"

Node s6 TCP table
node   from   to    value
8      0      20    "Oakley"
13     0      Now   –
21     0      20    "McGrady"
31     23     Now   "Williams"

From these tables we select all nodes between 1 and 60 and add them to list, which now contains ⟨s6, 8, [0, 20]⟩, ⟨s6, 13, [0, Now]⟩, ⟨s6, 21, [0, 20]⟩, ⟨s6, 31, [23, Now]⟩. The last step consists of filtering list by selecting only those tuples t that have t.FROM ≥ 20, and this ends the evaluation.

9 Temporal Updates

In this section we describe the updates allowed over a temporal XML document. We will admit three kinds of changes over the document: insertion of a new node, deletion (in the sense of temporal databases) of a node, and update of containment edges. As usual, we will represent a labelled edge as a tuple e(ni, nf, [ti, tf]), where ni and nf are the initial and final nodes, and [ti, tf] represents the interval of validity of the edge. Alternatively, as a shorthand, we will use Te for denoting this interval. We will also describe how these updates are propagated to our temporal indexing scheme (assuming the temporal interval encoding). Finally, we will discuss how the concepts explained in Section 4 interplay with the updating process.

Since our model deals only with transaction time, all updates occur at the current time instant, denoted tc. However, the update operators may be extended, allowing, if needed, some limited form of retroactive updating, without changing other characteristics of the model. We will discuss this issue below.

9.1 Insertion

The insertion of a new node in a temporal XML document requires specifying the new node n′ to be inserted, and a current node n (i.e., a node with an incoming containment edge where Tec.TO = Now). The new node n′, and a containment edge from n to n′ with temporal label [tc, Now] are added to the graph. The DDL (Data Definition Language) syntax for insertion, along the lines of Tatarinov et al. [53], is:

FOR variable IN PathExpression
INSERT ChildExpression
[VALUE value]

PathExpression returns pairs ⟨node, interval⟩. For each pair such that interval.TO = Now, a new node is added as a child of node, with path label given by ChildExpression. If the new node is a value node, the VALUE keyword allows indicating the corresponding value. This keyword is omitted when inserting an element node.

Algorithm 8 (InsertNode)
INPUT: Document D, Summary S, insert statement.
OUTPUT: Updated Document D and Summary S.
1. Evaluate PathExpression in insert statement with evaluation functions from Section 8.2.
2. For each current node v in the output of the previous step, let w′ be the new child to be inserted with interval [tc, Now]
2.1. Get w, the last child of v at current time tc which has a current interval [tw, Now]. Let w′′ be a node in D such that succ≺(w, T) = w′′ for some interval T.
2.2. Insert w′ into D and S by updating the order relation and the successor function as follows: succ≺(w, T1) = w′, succ≺(w′, T2) = w′′, and succ≺(w, T3) = w′′. (Note that intervals T1, T2, and T3 are a partition of the former interval T and that succ≺(w, T) = w′′ is no longer pertinent.)
2.3. Update the gap function for w and w′ as follows
2.3.1. Set gap(w, T1) = gap(w, T) DIV 2, where DIV is the integer division
2.3.2. Set gap(w′, T2) = gap(w, T) − gap(w, T1)
2.3.3. Set gap(w, T3) = gap(w, T)
2.3.4. Delete gap(w, T)
2.4. Since succ≺(w, T1) = w′, then at this point the encoding of w′ at interval [tc, Now] is assigned as follows: τ(⟨w′, [tc, Now]⟩) = τ(⟨w, [tw, Now]⟩) + gap(w, T1)
3. Update the corresponding TCP tables, creating a new table if required (i.e., if there is no table associated to the label that appears in ChildExpression).
4. Update the corresponding δi tables as follows
4.1. Get l, the last tuple in δi.
4.2. Set l.to = tc − 1.
4.3. Add a new tuple r s.t. r.from = tc, r.to = Now, r.valid = l.valid ∪ {w′′}.

... is [0, Now], instead of giving the new node the lifespan [tc, Now], we could specify any temporal label included in [0, Now]. We only need to add a statement to the operator's syntax, indicating the lifespan of the new node. Given that we only deal with transaction time, a complete discussion of this topic is beyond the scope of this paper.

Example 21 Let us consider the following expression evaluated on the database of Figure 19:

FOR $p IN //player[name/last=‘Carter’]/stats
INSERT $p/minutes
VALUE ‘33.2’
This update is processed as follows (assume for simplicity that tc is the instant 120 and that the new node is w′ with interval [120, Now]). We begin by evaluating the path expression in the first line, which returns the node with encoding 15, label stats and interval [0, Now]. (In this discussion, we will use encodings to identify nodes when possible.) Then, we locate the last (and only) child of 15 at current time, which is 16. At this point (before the insertion) the successor of 16 at interval [23, Now] is 30. The next step is to insert w′ by updating the order relation as follows: change the interval in which the successor of 16 is 30 from [23, Now] to [23, 119], set w′ as successor of 16 during [120, Now], and set 30 as successor of w′ during [120, Now].
The update continues with the gap function. We have that gap(16, [23, Now]) = 14 before the update. Then, we set gap(16, [120, Now]) = 7, gap(16, [23, 119]) = 14, and then delete gap(16, [23, Now]) = 14 because it is no longer valid. Next, we assign an encoding to w′ with τ as follows: τ(⟨w′, [120, Now]⟩) = τ(⟨16, [0, Now]⟩) + gap(16, [120, Now]). Thus, τ(⟨w′, [120, Now]⟩) = 16 + 7 = 23.
The final step is to update the TCP and δi tables. Since there is no node in the summary graph for w′, we add a new node labeled minutes to the summary graph and create a new TCP table. Next, we insert the tuple ⟨11, 120, Now, 33.2⟩ into the new TCP table. Finally, we update the δ5 table by adding a tuple r with r.from = 120 at the end and the other attributes of r are set as follows. Let us denote l the last tuple in δ5 immediately before the insertion of r. Thus, r becomes the last tuple in δ5, with r.from = 120, r.to = Now, and r.valid = l.valid ∪ {23}. Finally, we set l.to = 119.
We are assuming that updates are performed over a
consistent document, and must leave this document in
a consistent state. In the case of insertion of a node n,
the new node has only one incoming edge, meaning that
inconsistencies of type ii cannot occur. Also, the inserted
node has no outgoing edges. Therefore, consistency conditions of types i (temporal label outside the lifespan of
the node) and iii (cycles) cannot be introduced.
We commented above that the data model allows a
limited form of retroactive update. We will briefly clarify this notion. Suppose we want to insert a new player
node to the Orlando Magic franchise (i.e., an insertion
below node 60 in Figure 19). Since the lifespan of node 60
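The order, gap and encoding bookkeeping of steps 2.2 to 2.4 can be sketched as follows. This is only an illustration under stated assumptions, not the TempIndex implementation: the dictionaries succ, gap and enc, the NOW marker and the function insert_after are hypothetical names; intervals are (from, to) pairs; the encoding is keyed by node alone rather than by (node, interval); and, as in Example 21, we assume T1 = T2 = [tc, Now] and T3 ends at tc − 1.

```python
NOW = float("inf")  # hypothetical stand-in for the moving instant 'Now'


def insert_after(succ, gap, enc, w, t, new, tc):
    """Insert node `new` right after `w` in document order at instant tc.

    succ and gap map (node, interval) to the successor node and to the gap;
    enc maps a node to its encoding. `t` is the interval T for which
    succ[(w, t)] is defined. Returns the encoding assigned to `new`.
    """
    start, end = t
    assert end == NOW and start < tc, "only current objects can be updated"
    t1 = t2 = (tc, NOW)   # assumption: T1 and T2 both start at tc
    t3 = (start, tc - 1)  # the part of T that precedes the insertion

    # Step 2.2: rewire the successor function.
    old_succ = succ.pop((w, t))
    succ[(w, t1)] = new
    succ[(new, t2)] = old_succ
    succ[(w, t3)] = old_succ

    # Step 2.3: split the gap that followed w between w and the new node.
    old_gap = gap.pop((w, t))                 # step 2.3.4
    gap[(w, t1)] = old_gap // 2               # step 2.3.1 (integer division)
    gap[(new, t2)] = old_gap - gap[(w, t1)]   # step 2.3.2
    gap[(w, t3)] = old_gap                    # step 2.3.3

    # Step 2.4: the new encoding falls inside the gap that followed w.
    enc[new] = enc[w] + gap[(w, t1)]
    return enc[new]


# Values taken from Example 21.
succ = {(16, (23, NOW)): 30}
gap = {(16, (23, NOW)): 14}
enc = {16: 16, 30: 30}
new_code = insert_after(succ, gap, enc, 16, (23, NOW), "w'", 120)
```

With the values of Example 21 the new node receives encoding 23. Note that a gap of 1 would leave no room for a new encoding; the sketch does not handle that case.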
9.2 Deletion
We can delete (in the temporal database sense) attribute
nodes (except attributes of type ID), element nodes, and
reference edges from a temporal XML document. Again,
we only allow current objects to be deleted. Informally,
when deleting a node n at time td, ‘Now’ is replaced by
td in Tec.TO. The same occurs with all the containment
edges in the current subtree of n (the subtree with root
n where all the edges ec have Tec.TO = Now). Reference
edges are deleted by setting Ter.TO = td in the temporal
label of the edge. Notice that no consistency checking
is required. Thus, this operation will always leave the
document in a consistent state.
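The informal rule above can be illustrated with a short sketch. We assume, purely for illustration, that the document is held as a list of edge records with the hypothetical fields parent, child, kind ('c' for containment), frm and to; delete_node is likewise a hypothetical name. A reference edge would be closed in the same way, on that single edge.

```python
NOW = float("inf")  # hypothetical stand-in for 'Now'


def delete_node(edges, n, td):
    """Close at instant td every current containment edge affected by
    deleting node n: the edge entering n and the edges of its current
    subtree. Historical edges (to != NOW) are left untouched.
    Returns the number of edges that were closed.
    """
    closed = 0
    # Close the current containment edge entering n.
    for e in edges:
        if e["child"] == n and e["kind"] == "c" and e["to"] == NOW:
            e["to"] = td
            closed += 1
    # Walk the current subtree of n, closing edges as they are found.
    stack = [n]
    while stack:
        node = stack.pop()
        for e in edges:
            if e["parent"] == node and e["kind"] == "c" and e["to"] == NOW:
                e["to"] = td
                closed += 1
                stack.append(e["child"])
    return closed


# Hypothetical four-edge document; node 4 was already removed at instant 9.
edges = [
    {"parent": 1, "child": 2, "kind": "c", "frm": 0, "to": NOW},
    {"parent": 2, "child": 3, "kind": "c", "frm": 5, "to": NOW},
    {"parent": 2, "child": 4, "kind": "c", "frm": 0, "to": 9},
    {"parent": 1, "child": 5, "kind": "c", "frm": 0, "to": NOW},
]
closed = delete_node(edges, 2, 20)
```

Because only the upper bound of current labels changes, no edge ever ends after its parent edge, which is why no consistency check is needed.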
As in the case of insertion discussed above, a retroactive deletion could be implemented provided that the consistency
conditions of the model are satisfied. For example, in
Figure 19 we could delete node 86 at any instant between 17 and Now.
Example 22 Suppose we want to ‘delete’ all statistics for
Williams (node 30 in Figure 19). The DDL for this update will be:
FOR $p IN /NBAdb//player[name=‘Williams’]
DELETE node $p//stats
The deletion is processed as follows. Again, we assume that deletion can only occur at the present time.
We begin by processing the path expression in the
first line using the summaries, returning node 30 along
with its interval. The next step is the delete operation,
which involves ‘deleting’ the subtree with root 32 at time
‘120’. In order to do that, we first replace the tuple
⟨32, 23, Now⟩ in the TCP table of s8 with ⟨32, 23, 120⟩.
Next, we replace ⟨33, 23, Now⟩ in the TCP table of s9
with ⟨33, 23, 120⟩. Finally, we update the δ4 and δ5 tables
by inserting a new tuple r (recall that the table is ordered
according to the attribute instant), with r.from = 120,
r.to = Now, and l.to = 119, where l is the last tuple of each
table. In addition, we set r.valid = l.valid − {32} for δ4
and r.valid = l.valid − {33} for δ5.
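Insertions and deletions maintain the δ tables in the same way: the last tuple is closed at tc − 1 and a new last tuple starting at tc is appended with the adjusted set of valid nodes. A minimal sketch of this pattern follows, assuming a δ table is an ordered list of records with the hypothetical fields frm, to and valid; update_delta and the initial table contents are made up for illustration.

```python
NOW = float("inf")  # hypothetical stand-in for 'Now'


def update_delta(delta, tc, added=(), removed=()):
    """Close the last tuple l of a delta table at tc - 1 and append a new
    tuple r with r.frm = tc, r.to = NOW and
    r.valid = (l.valid | added) - removed. Returns r.
    """
    last = delta[-1]
    assert last["to"] == NOW and last["frm"] < tc, "table must be current"
    new_valid = (last["valid"] | set(added)) - set(removed)
    last["to"] = tc - 1
    r = {"frm": tc, "to": NOW, "valid": new_valid}
    delta.append(r)
    return r


# Hypothetical contents of delta_4 before the deletion of Example 22.
delta4 = [
    {"frm": 0, "to": 22, "valid": {16}},
    {"frm": 23, "to": NOW, "valid": {16, 32}},
]
r = update_delta(delta4, 120, removed={32})
```

The same call with added={23} instead of removed reproduces the δ5 update of Example 21.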
9.3 Edge updates
We will finish our discussion of temporal updates with
updates of containment edges. Let D be a temporal XML
document, and let n and ni be two current nodes in D such that
there exists a current containment edge from ni to n.
Let us consider another current node nj, not in a current
subtree of n; intuitively, a temporal update at instant tc
says that from tc on, the parent of node n will be nj.
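The preconditions of this operation (both nodes are current, and nj is not in the current subtree of n, since otherwise a cycle would appear) can be checked with a simple traversal. The sketch below is an assumption-laden illustration: edges are records with the hypothetical fields parent, child, frm and to; set_parent, is_current and in_current_subtree are hypothetical names; and we choose to close the old edge at tc − 1 so that the new edge holds from tc on.

```python
NOW = float("inf")  # hypothetical stand-in for 'Now'


def edge(parent, child, frm=0, to=NOW):
    return {"parent": parent, "child": child, "frm": frm, "to": to}


def is_current(edges, node, root):
    """A node is current if it is the root or has a current incoming edge."""
    return node == root or any(
        e["child"] == node and e["to"] == NOW for e in edges)


def in_current_subtree(edges, top, target):
    """True if `target` is reachable from `top` through current edges."""
    stack = [top]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        stack.extend(e["child"] for e in edges
                     if e["parent"] == node and e["to"] == NOW)
    return False


def set_parent(edges, n, nj, tc, root):
    """From tc on, make nj the parent of n, rejecting inconsistent updates."""
    old = next((e for e in edges if e["child"] == n and e["to"] == NOW), None)
    if old is None or not is_current(edges, nj, root):
        raise ValueError("both nodes must be current")
    if in_current_subtree(edges, n, nj):
        raise ValueError("update would introduce a cycle")
    old["to"] = tc - 1
    edges.append(edge(nj, n, frm=tc))


# Hypothetical document: root 1, two children 60 and 70, and 60 -> 82 -> 83.
edges = [edge(1, 60), edge(1, 70), edge(60, 82), edge(82, 83)]
set_parent(edges, 82, 70, 120, root=1)       # accepted
try:
    set_parent(edges, 70, 83, 130, root=1)   # 83 is now below 70
    rejected = False
except ValueError:
    rejected = True
```

The second call is rejected because node 83 lies in the current subtree of node 70 after the first update.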
Example 23 Suppose player “Garrity” starts playing for
the Toronto Raptors at the present time (instant ‘120’):
FOR /NBAdb//player[name=‘Garrity’]
SET PARENT
/NBAdb/franchise[name=‘Raptors’]/team
We begin by processing the path expression in the
first line using the summaries, returning node 82 along
with its interval. As we are using temporal interval encoding, all the nodes in the subtree with root 82 (including node 82 itself) must be given a new node number. In
this example, let us assume the following number assignments: 82 → 42, 83 → 43, 84 → 44, 85 → 45, 86 → 46.
Next, we insert the tuple ⟨42, 121, Now⟩ in the TCP table of s5, and we replace the tuple ⟨82, 0, Now⟩ with
⟨82, 0, 120⟩ in the TCP table of s10. In addition, we insert the tuple ⟨43, 121, Now, Garrity⟩ in the TCP table
of s6, and replace the tuple ⟨83, 0, Now, Garrity⟩ with
⟨83, 0, 120, Garrity⟩ in the TCP table of s11. For the remaining elements, we perform an analogous procedure
and update the affected δk tables.
Consistency checking gets a little more involved in
this case. Besides verifying that the new parent node
is current, we need to check consistency condition iii,
i.e., that no cycle is introduced by the update (in the
case of updates performed over current nodes, this reduces
to checking that no cycles are introduced at the current
instant).
10 Experiments
In this section we will show how indexing temporal intervals and continuous paths improves TXPath query evaluation. We compare TempIndex with two other systems:
a traditional, non-temporal structural summary and a
DOM-based structure. We have picked ToXin [46] as a
representative of the systems that are based on non-temporal path summaries. We chose this particular system for convenience, since it is easily available to us; but
we believe the results would not be substantially different
using any of the other structural summary proposals discussed
in Section 2. The second comparison will be against a
DOM representation of the base data without any kind
of summary.

Doc     # of XML      # of cp’s    # of TempIndex
(MB)    data nodes                 summary nodes
20      540300        1694010      94
40      1080600       3388020      94
60      1620900       5082030      94
80      2161200       6776040      94
100     2701500       8470050      94
Fig. 20 Benchmark data sets and index parameters

Doc     TempIndex   ToXin   DOM
(MB)    (MB)        (MB)    (MB)
20      95          201     165
40      190         402     330
60      285         604     495
80      380         805     660
100     475         906     825
Fig. 21 Main-memory data structures sizes

Query   TXPath template
Q1      //Player/Name
Q2      //APG
Q3      //Div[Name=‘X’]/Player[Interval=‘I’]
Q4      //SEQUENCE[APG≥‘n’ and Interval=‘I’]/ancestor::Player/Name
Q5      for $p in //Player[Name=‘X’]
        INSERT newNode $p//APG VALUE ‘V’
Q6      for $p in //Player[Name=‘X’]
        DELETE node $p//stats
SN      Snapshot
Fig. 22 Benchmark TXPath query templates

Although using a non-temporal summary reduces the
search space for TXPath queries compared to the DOM
approach, it does not help with the temporal semantics of the query evaluation. ToXin has data structures
that summarize specific fragments of the XML data,
such as nodes, paths and subtrees, and thus avoid sequential scans of entire documents during query evaluation. However, like all traditional structural summaries,
it materializes paths rather than continuous paths; therefore, both ToXin and DOM have to compute all continuous paths on the fly at query evaluation time. Our
experiments show how important indexing the temporal structure of the database is for evaluating TXPath
queries.
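To make the on-the-fly cost concrete, a system that materializes only label paths must combine, for every matching path, the temporal labels of its edges. The sketch below is a simplification: it assumes that a path is continuous exactly during the intersection of the intervals of its edges, and the function name and the (from, to) interval representation are hypothetical.

```python
NOW = float("inf")  # hypothetical stand-in for 'Now'


def continuous_interval(edge_intervals):
    """Return the interval during which every edge of a path is valid,
    or None if the edges are never valid at the same time."""
    frm = max(i[0] for i in edge_intervals)
    to = min(i[1] for i in edge_intervals)
    return (frm, to) if frm <= to else None


# A path whose last edge only exists from instant 23 on.
cp = continuous_interval([(0, NOW), (0, NOW), (23, NOW)])
# A path whose two edges never coexist.
broken = continuous_interval([(0, 10), (20, NOW)])
```

A non-temporal summary has to repeat this computation for every data path under a matching label path, whereas an index over continuous paths stores the result once.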
TempIndex is implemented in Java 2 and uses Berkeley DB Java Edition [50] as persistent storage. This is a
substantial difference with respect to the implementation
presented in [37], which was a pure main-memory system. In the current TempIndex implementation all data
structures are loaded into main memory during query
evaluation and update processing, and are saved to disk
afterwards. This allows us to run queries on databases
much larger than the available main memory by loading
and saving different index fragments at evaluation
time. We have also optimized the internal representation
of time intervals and attributes, with a consequent reduction in the index size with respect to [37].
For all our experiments we use query processing time
as the performance metric. We evaluate the performance
of the three systems on a set of seven query templates, as
shown in Figure 22. Query templates that contain value
and interval selections (Q3 through Q6) were tested with
ten different actual queries with various combinations of
values and intervals. For those four queries we report the
average results. The results for the SN template are also
an average of computing ten document snapshots at ten
different instants.

Query   Doc. size
        20 MB    40 MB    60 MB    80 MB    100 MB
Q1      2006     4012     6018     8024     10030
Q2      16776    33552    50328    67104    83880
Q3      504      1008     1512     2016     2520
Q4      1174     2348     3522     4696     5870
Q5      104      208      312      416      520
Q6      8        16       24       32       40
Fig. 23 Answer sizes of retrieval queries (Q1 to Q4) and
number of update points of update queries (Q5 and Q6)

[Plot: series TempIndex, ToXin, DOM; x-axis ticks 20 to 100; log-scale y-axis ticks 0.01 to 1000]
Fig. 24 Query Q1 – log scale

[Plot: series TempIndex, ToXin, DOM; x-axis ticks 20 to 100; log-scale y-axis ticks 0.01 to 1000]
Fig. 25 Query Q2 – log scale

Templates Q1 and Q2 are TXPath retrieval queries
without interval constraints. Even though the intervals
are not specified in the expression, these are still TXPath
queries and thus the answers are pairs ⟨node, interval⟩.
Templates Q3 and Q4 have interval constraints. Half of
the queries tested for Q3 and Q4 were snapshot queries,
i.e., queries where the interval ‘I’ was actually a time
instant. Recall that in Section 6 we distinguished
snapshot queries from the document reconstruction as of
a given time instant, which we denoted document snapshot (reported in the SN template). A document snapshot
represents the state of the database at a given point in
time. In other words, it is a query of the form “Give
the state of the NBA database as of October 10, 1995”.
Therefore, while the answers to queries Q1 to Q4 are
⟨node, interval⟩ pairs, the answer to a document snapshot is an XML document. Finally, Q5 and Q6 are TXPath update queries, as described in Section 9.
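The difference between the two kinds of answers can be seen in a naive reconstruction of a document snapshot, which keeps only the edges whose temporal label contains the requested instant. This sketch works directly on the data graph, much as a system without temporal summaries would; it is not the TempIndex strategy, and the edge records, the labels dictionary and the function snapshot are hypothetical.

```python
NOW = float("inf")  # hypothetical stand-in for 'Now'


def snapshot(edges, labels, root, t):
    """Return the document rooted at `root` as of instant t, as nested
    (label, children) tuples. Only edges valid at t are followed."""
    children = [e["child"] for e in edges
                if e["parent"] == root and e["frm"] <= t <= e["to"]]
    return (labels[root], [snapshot(edges, labels, c, t) for c in children])


# Hypothetical fragment: the player has a name until 22 and stats from 23 on.
labels = {1: "NBAdb", 2: "player", 3: "name", 4: "stats"}
edges = [
    {"parent": 1, "child": 2, "frm": 0, "to": NOW},
    {"parent": 2, "child": 3, "frm": 0, "to": 22},
    {"parent": 2, "child": 4, "frm": 23, "to": NOW},
]
at_10 = snapshot(edges, labels, 1, 10)
at_30 = snapshot(edges, labels, 1, 30)
```

The result is a whole tree for one instant, in contrast to the ⟨node, interval⟩ pairs returned by Q1 to Q4.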
We ran our benchmark queries over the NBA database, which we consider to be a representative example
of temporal data. We loaded the data from the NBA web
site (www.nba.com) into a relational database (Microsoft
SQL Server 2000). From this database we produced five
documents of 20, 40, 60, 80, and 100 Megabytes. We ran
all queries over the five documents and the results are reported in Figures 24 to 30. For the experiments we used
a Pentium 4 PC at 2 GHz with 1 GB of RAM and
a 60 GB hard drive. We report the number of nodes and
continuous paths in the temporal documents, as well as
the number of summary nodes in TempIndex in Figure
20. The size of the query answer for each query is shown
in Figure 23.

[Plot: series TempIndex, ToXin, DOM; x-axis ticks 20 to 100; log-scale y-axis ticks 0.01 to 1000]
Fig. 26 Query Q3 – log scale

In all retrieval queries TempIndex performed faster
than ToXin. The TempIndex speed-up against ToXin
ranged from a minimum of nine times (document snapshot,
20 MB) to a maximum of 220 times (Q2, 100 MB). Since
both systems summarize label paths and values, the difference in performance can be mostly attributed to the
summarization of continuous paths.
Q2 is one of the fastest in TempIndex but one of the
slowest in ToXin. The reason is that the answer
to Q2 is a whole class of continuous paths in the temporal
index, which is very easy to find and retrieve using the
TempIndex summary graph. Although in ToXin we can
narrow the search by following only those label paths
that match the regular expression in the query, we still
have to compute all continuous paths over them.

[Plot: series TempIndex, ToXin, DOM; x-axis ticks 20 to 100; log-scale y-axis]
Fig. 27 Query Q4 – log scale

[Plot: series TempIndex, ToXin, DOM; x-axis ticks 20 to 100; log-scale y-axis]
Fig. 28 Query Q5: Insert – log scale

[Plot: series TempIndex, ToXin, DOM; x-axis ticks 20 to 100; log-scale y-axis]
Fig. 29 Query Q6: Delete – log scale

[Plot: series TempIndex, ToXin, DOM; x-axis ticks 20 to 100; log-scale y-axis]
Fig. 30 Snapshot – log scale

The document snapshots, in contrast, require heavy
computation even for TempIndex. We can still narrow
the search considerably by using the interval index to
locate the classes corresponding to the instant in time
we are looking for. However, once these classes are found
we have to reconstruct an entire document navigating
back and forth over them. That being said, TempIndex
is still almost one order of magnitude faster than ToXin
and DOM. Since a non-temporal path summary is not
very efficient for temporal document reconstruction, the
snapshot computation performance of ToXin and DOM
is quite similar.
Queries Q1 and Q2 do not contain either interval or
value selection predicates and have relatively large answer sets. However, keep in mind that not having interval
predicates does not mean that the temporal semantics is
not present: the continuous paths always have to be computed in order to return TXPath query answers. This is
the reason behind the two orders of magnitude difference in performance between ToXin and TempIndex for
queries Q1 and Q2.
The answer sets of Q1 are closer to the root and
smaller than those of Q2. This affects the query processing time in ToXin because the continuous paths to
be computed are fewer and much shorter in Q1 than in
Q2, with the consequent impact on query evaluation (Q2
queries take almost twice as long as Q1 ones). In contrast, since the DOM implementation is not aware of the
label path structure of the data graph, it requires the
traversal of the whole temporal graph in order to match
the regular expression on both Q1 and Q2. Consequently,
the difference in query processing time between Q1 and
Q2 is minimal in DOM.
Queries Q3 and Q4 require the additional computation of value and interval selection, which is reflected in
the TempIndex results. In contrast, the size of the answer set and the length of the continuous paths seem to
have a bigger impact on ToXin performance than the selection operations, and almost no impact at all on DOM.
The reason seems to be that ToXin spends most
of the query processing time on continuous path computations, while DOM spends it on data graph traversal.
Update queries Q5 (insert) and Q6 (delete) require
label path traversal in order to locate the update point.
Since no continuous path computation is involved, the
difference between ToXin and TempIndex is minimal.
In contrast, the DOM implementation has to traverse
the whole temporal graph in order to locate the update
point, with the consequent time difference against both
ToXin and TempIndex.
11 Conclusion
In this paper we studied the problem of modeling and
querying temporal data in XML. We first proposed an
abstract data model for temporal XML, and compared
this model against other proposals, pointing out benefits
and limitations. We discussed four different alternatives
for implementing the abstract data model as temporal
XML documents. Based on our data model we studied the problem of validating temporal XML documents
against the temporal constraints that the data model
imposes. This problem has been overlooked in other proposals of temporal data models for XML. We gave algorithms for checking the presence of temporal inconsistencies in a document and fixing them, and studied the
algorithms’ complexity.
We also studied the problem of indexing temporal
XML documents. For this, we first introduced a temporal XML query language, denoted TXPath, that extends
the semantics of XPath 2.0 to return sequences of (node,
interval) pairs instead of just sequences of nodes. The
indexing scheme we proposed is based on the materialization of continuous paths instead of paths. A new class
of summaries, denoted TSummaries, that adds the time
dimension to the usual path summarization schemes,
serves as the framework for our indexing scheme. We presented two new kinds of summaries: LCP and Interval
summaries. The indexing scheme, denoted TempIndex,
integrates these summaries with other data
structures.
We compared the performance of a persistent implementation of TempIndex against a traditional non-temporal structural summary (ToXin) and a DOM-based
structure. This comparison highlights the benefits of materializing continuous paths. TempIndex ran one order of
magnitude faster than ToXin and DOM for snapshots.
For retrieval queries, TempIndex ran, on average,
from 10 to 210 times faster than the other schemes. In addition, we sketched a language for updates, and showed
that the cost of updating the index is compatible with
real-world requirements.
Future work includes extending the study of new classes
of Temporal Summaries, for their application to different
settings. We also believe that our work on consistency
issues can be a good starting point for studying and reasoning about constraints with indeterminate dates, of
the types presented in [16, 28]. The problem of reasoning about temporal constraints in XML is still underexplored.
Acknowledgements. The work presented in this
paper is the continuation of a research project started
jointly with our beloved friend and mentor Alberto O.
Mendelzon, who sadly passed away in June, 2004.
We are grateful to the reviewers for their hard work
and invaluable insights which helped to greatly improve
the paper. We would also like to thank Mariana Zerega,
who collaborated in the implementation of many of the
algorithms presented in this work, and Marcela Campo,
for her help with the algorithms presented in Section 4.
Alejandro Vaisman was partially supported by the
Millennium Nucleus Center for Web Research, Grant
P04-67-F, Mideplan, Chile.
References
1. Serge Abiteboul, Sophie Cluet, Guy Ferran, and Marie-Christine Rousset. The Xyleme project. Computer Networks 39(3), pages 225–238, 2002.
2. T. Amagasa, M. Yoshikawa, and S. Uemura. A temporal
data model for XML documents. In Proceedings of DEXA
Conference, pages 334–344, 2000.
3. T. Bozkaya and M. Ozsoyoglu. Indexing valid time intervals. In Proceedings of DEXA Conference, pages 541–550,
1998.
4. P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan.
Keys for XML. Computer Networks 39(5), pages 473–
487, 2002.
5. P. Buneman, S. Khanna, K Tajima, and W. Tan. Archiving scientific data. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data,
pages 1–12, Madison, USA, 2002.
6. S. Chawathe, S. Abiteboul, and J. Widom. Managing
historical semistructured data. In Theory and Practice
of Object Systems, Vol 5(3), pages 143–162, 1999.
7. S. Chawathe, H. Garcia-Molina, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS project:
Integration of heterogeneous information sources. In
Proceedings of 100th Anniversary Meeting of the Information Processing Society of Japan, pages 7–18, 1994.
8. S. Chien, V. Tsotras, and C. Zaniolo. Version management of XML documents. In Proceedings of the Third International Workshop on the Web and Databases, pages
75–80, Dallas, TX, 2000.
9. S. Chien, V. Tsotras, and C. Zaniolo. Efficient management of multiversion documents by object referencing. In
Proceedings of the 27th International Conference on Very
Large Data Bases, pages 291–300, Rome, Italy, 2001.
10. J. Chomicki. Temporal query languages: a survey. In
Proceedings of the 1st International Conference on Temporal Logic, LNAI 827, pages 506–534, 1994.
11. Chin-Wan Chung, Jun-Ki Min, and Kyuseok Shim.
APEX: An adaptive path index for XML data. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 121–132, 2002.
12. J. Clifford, C. E. Dyreson, T. Isakowitz, C. S. Jensen, and
R. T. Snodgrass. On the semantics of “now” in databases.
ACM Trans. Database Syst., 22(2):171–214, 1997.
13. Mariano P. Consens and Tova Milo. Optimizing queries
on files. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, pages
301–312, 1994.
14. S. De Capitani. An authorization model for temporal
XML documents. In Proceedings of SAC’02, pages 1088–
1093, Madrid, Spain, 2002.
15. Natasha Drukh, Neoklis Polyzotis, Minos N. Garofalakis,
and Yossi Matias. Fractional XSKETCH synopses for
XML databases. In Second International XML Database
Symposium, XSym 2004, pages 189–203, 2004.
16. C. Dyreson and R. Snodgrass. Supporting valid-time indeterminacy. ACM Transactions on Database Systems,
23(1):1–57, 1998.
17. C.E. Dyreson. Observing transaction-time semantics
with TTXPath. In Proceedings of WISE 2001, pages 193–
202, 2001.
18. C.E. Dyreson, M.H. Bolen, and C.S. Jensen. Capturing
and querying multiple aspects of semistructured data. In
Proceedings of the 25th VLDB Conference, pages 290–
301, 1999.
19. O. Etzion, S. Jajodia, and S. Sripada (eds.). Temporal Databases: Research and Practice. Springer-Verlag,
LNCS 1399, 1998.
20. W. Fan and Jérôme Siméon. Integrity constraints for
XML. Journal of Computer and Systems Sciences, 66(1),
pages 254–291, 2003.
21. D. Florescu and D. Kossmann. Storing and querying
XML data using a RDBMS. IEEE Data Engineering
Bulletin, 22(3), pages 27–34, 1999.
22. C. Gao and R. Snodgrass. Syntax, semantics and query
evaluation in the τ XQuery temporal XML query language. Time Center Technical Report TR-72, 2003.
23. C. Gao and R. Snodgrass. Temporal slicing in the evaluation of XML queries. In Proceedings of the 29th International Conference on Very Large Data Bases, pages
632–643, Berlin, Germany, 2003.
24. M. Gergatsoulis and Y. Stavrakas. Representing changes
in XML documents using dimensions. In Proceedings of
the First Symposium on XML databases (XSym 2003),
pages 208–222, Berlin, Germany, 2003.
25. Roy Goldman and Jennifer Widom. Dataguides: Enabling query formulation and optimization in semistructured databases. In Proceedings of the 23rd International
Conference on Very Large Data Bases, pages 436–445,
1997.
26. F. Grandi. Introducing an annotated bibliography on
temporal and evolution aspects in the world wide web.
SIGMOD Record 33(2), pages 4–86, 2004.
27. F Grandi and F. Mandreoli. The valid web: an XML/XSL
infrastructure for temporal management of web documents. In Proceedings of the International Conference on
Advances in Information Systems, pages 294–303, 2000.
28. F. Grandi and F. Mandreoli. Effective representation
and efficient management of indeterminate dates. In
TIME’01, pages 164–169, 2001.
29. Hao He and Jun Yang. Multiresolution indexing of XML
for frequent queries. In Proceedings of the 20th International Conference on Data Engineering, pages 683–694,
2004.
30. H. Kaplan, T. Milo, and R. Shabo. A comparison of labeling schemes for ancestor queries. In Proceedings of
the thirteenth annual ACM-SIAM Symposium on Discrete Algorithms, pages 954–963, 2002.
31. Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton,
and Henry F. Korth. Covering indexes for branching
path queries. In Proceedings of the 2002 ACM SIGMOD
International Conference on Management of Data, pages
133–144, 2002.
32. Raghav Kaushik, Philip Bohannon, Jeffrey F. Naughton,
and Pradeep Shenoy. Updates for structure indexes. In
Proceedings of the 28th International Conference on Very
Large Data Bases, pages 239–250, 2002.
33. Raghav Kaushik, Pradeep Shenoy, Philip Bohannon, and
Ehud Gudes. Exploiting local similarity for indexing
paths in graph-structured data. In Proceedings of the 18th
International Conference on Data Engineering, pages
129–140, 2002.
34. Hartmut Liefke and Dan Suciu. XMILL: An efficient
compressor for XML data. In Proceedings of the 2000
ACM SIGMOD International Conference on Management of Data, pages 153–164, 2000.
35. M.G. Manukyan and L.A. Kalinichenko. Temporal XML.
In Proceedings of ADBIS, pages 581–590, Vilnius, Lithuania, 2001.
36. A. Marian, S. Abiteboul, G. Cobena, and L. Mignet.
Change-centric management of versions in an XML warehouse. In Proceedings of the 27th VLDB Conference,
pages 581–590, Rome, Italy, 2001.
37. Alberto O. Mendelzon, Flavio Rizzolo, and Alejandro
Vaisman. Indexing temporal XML documents. In Proceedings of the 30th International Conference on Very
Large Databases, pages 216–227, Toronto, Canada, 2004.
38. Tova Milo and Dan Suciu. Index structures for path expressions. In Proceedings of the 7th International Conference on Database Theory, pages 277–295, 1999.
39. Svetlozar Nestorov, Jeffrey D. Ullman, Janet L. Wiener,
and Sudarshan S. Chawathe. Representative objects:
Concise representations of semistructured, hierarchical
data. In Proceedings of the 13th International Conference on Data Engineering, pages 79–90, 1997.
40. B. Oliboni, E. Quintarelli, and L. Tanca. Temporal aspects of semistructured data. In Proceedings of the Eighth International Symposium of Temporal Representation and
Reasoning, pages 119–127, 2001.
41. Neoklis Polyzotis and Minos N. Garofalakis. Statistical
synopses for graph-structured XML databases. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pages 358–369, 2002.
42. Neoklis Polyzotis and Minos N. Garofalakis. Structure
and value synopses for XML data graphs. In Proceedings
of the 28th International Conference on Very Large Data
Bases, pages 466–477, 2002.
43. Neoklis Polyzotis and Minos N. Garofalakis. XCLUSTER
synopses for structured XML content. In Proceedings of
the 22nd International Conference on Data Engineering,
2006.
44. Neoklis Polyzotis, Minos N. Garofalakis, and Yannis E.
Ioannidis. Approximate XML query answers. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pages 263–274, 2004.
45. Chen Qun, Andrew Lim, and Kian Win Ong. D(k)-index:
An adaptive structural summary for graph-structured
data. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, pages
134–144, 2003.
46. Flavio Rizzolo and Alberto O. Mendelzon. Indexing XML
data with ToXin. In Proceedings of 4th International
Workshop on the Web and Databases, pages 49–54, 2001.
47. B. Salzberg and V. Tsotras. Comparison of access methods for time-evolving data. ACM Computing Surveys,
31(2):158–221, 1999.
48. N. Santoro and R. Khatib. Labelling and implicit routing in networks. The Computer Journal (28), pages 5–8,
1985.
49. Ralf Schenkel, Anja Theobald, and Gerhard Weikum.
HOPI: An efficient connection index for complex XML
document collections. In Proceedings of the 9th Conference on Extending Database Technology, pages 237–255,
2004.
50. Sleepycat Software. Berkeley DB Java Edition, 2006.
http://www.sleepycat.com/products/bdbje.html.
51. Richard Snodgrass. The TSQL2 Temporal Query Language. Kluwer Academic Publishers, 1995.
52. A. Tansel, J. Clifford, and S. Gadia (eds.). Temporal
Databases: Theory, Design and Implementation. Benjamin/Cummings, 1993.
53. I. Tatarinov, G. Ives, A. Halevy, and D. Weld. Updating XML. In Proceedings of ACM SIGMOD Conference,
pages 413–424, Santa Barbara, California, 2001.
54. P. Wadler. A formal semantics of patterns in XSLT. In
Markup Technologies, pages 183–202, Philadelphia, 1999.
55. F. Wang and C. Zaniolo. Temporal queries in XML document archives and web warehouses. In Proceedings of the
10th International Symposium on Temporal Representation and Reasoning (TIME’03), pages 47–55, Cairns,
Australia, 2003.
56. F. Wang and C. Zaniolo. XBiT: An XML-based bitemporal data model. In Proceedings of the 23rd International Conference on Conceptual Modeling, pages 810–
824, Shanghai, China, 2004.
57. F. Wang, X. Zhou, and C. Zaniolo. Efficient XML-based
techniques for archiving, querying and publishing the histories of relational databases. In Time Center Technical
Report, 2005.
58. F. Wang, X. Zhou, and C. Zaniolo. Temporal XML? SQL
strikes back! In Proceedings of the 12th International
Symposium on Temporal Representation and Reasoning
(TIME’05), pages 47–55, Burlington, USA, 2005.
59. World Wide Web Consortium. XQuery 1.0: An XML Query Language, 2002.
http://www.w3.org/TR/2002/WD-xquery-20021115.
60. World Wide Web Consortium. XML Path Language
XPath 2.0, 2003. http://www.w3.org/TR/2003/WD-xpath20-20030502.
61. Ke Yi, Hao He, Ioana Stanoi, and Jun Yang. Incremental
maintenance of XML structural indexes. In Proceedings
of the 2004 ACM SIGMOD International Conference on
Management of Data, pages 491–502, 2004.