0% found this document useful (0 votes)
5 views

Query Languages for Graph Databases

Uploaded by

cesarefinmatica
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views

Query Languages for Graph Databases

Uploaded by

cesarefinmatica
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

Query Languages for Graph Databases

Peter T. Wood
Department of Computer Science and Information Systems, Birkbeck, University of London

ptw@dcs.bbk.ac.uk

ABSTRACT case that edges are labelled in some way, sometimes


Query languages for graph databases started to be inves- with sets of attribute-value pairs. Similarly, in gen-
tigated some 25 years ago. With much current data, such eral nodes may be labelled with sets of attribute-
as linked data on the Web and social network data, be- value pairs. However, we will mostly limit our dis-
ing graph-structured, there has been a recent resurgence cussion to graphs in which each node is identified by
in interest in graph query languages. We provide a brief a distinct label (identifier) and each directed edge
survey of many of the graph query languages that have is labelled with a symbol drawn from some finite
been proposed, focussing on the core functionality pro- alphabet Σ; hence E ⊆ V × Σ × V .
vided in these languages. We also consider issues such Some application areas require graph structures
as expressive power and the computational complexity that are more elaborate than the simple model de-
of query evaluation. scribed above. For example, hypergraphs have been
used to model hypertext [63], while the hypernode
model [58] allows for nodes that can themselves
1. INTRODUCTION comprise graphs. In contrast, the so-called blobs
Graphs are widely used for representing data, with of the Hy+ system [19] comprise sets of nodes and
the result that a number of query languages for share some similarities with the blobs of higraphs
graphs have been proposed over the past few decades. [35]. In addition, a number of graph data models
In the 1980s motivating applications came from ar- require that each graph conforms to a schema. How-
eas such as hypertext systems [17, 63]. When semi- ever, for the purposes of this paper, we will assume
structured data [2, 12] and object databases [34] be- only the simple model described above; for details
came prominent in the 1990s, these provided fruitful on more elaborate graph data models, we refer the
areas for the study of graph models and query lan- reader to the survey by Angles and Gutiérrez [7].
guages. In the last decade, the semantic web [9, 57] Figure 1 shows an example of a graph G con-
and social networks [5, 24, 60, 61] have taken over forming to our simple definition, inspired by an
as key areas amenable to graph-based approaches. example from NAGA [43, 64]. Node labels in G
Further application areas for graph querying include denote names of authors, literary prizes and loca-
transportation networks [11], semantic associations tions. Edge labels denote the hasW on relation-
as part of criminal investigations [62] (also called ship between authors and prizes (abbreviated w),
link analysis), biological networks [46, 47, 48], pro- the bornIn relationship between authors and places
gram analysis [50], workflow and data provenance (abbreviated b), the livesIn relationship between
[6, 39]. authors and places (abbreviated i), and the located-
Each of the above application areas has its own In relationship between places. Note that there can
requirements in terms of an appropriate graph data be paths comprising locatedIn edges: for example,
model. In its simplest form a graph G is a pair Bacchus Marsh is a town locatedIn the state of Vic-
(V, E), where V is a finite set of nodes and E is toria which is locatedIn the country Australia.
a finite set of edges connecting pairs of nodes. Of A typical query Q on graph G might ask to find
course, edges can be directed or undirected, although authors who have won both the Booker and Nobel
prizes—a simple conjunctive query (CQ) returning
we will consider only the directed case here (which a set of nodes as answer. Query Q might be ex-
is more general). In most applications it is also the pressed using the following syntax
Database Principles Column. Column editor: ans(x) ← (x, hasW on, N obel), (x, hasW on, Booker)
Pablo Barceló, Department of Computer Science, Uni-
versity of Chile. E-mail: pbarcelo@dcc.uchile.cl. where x is interpreted as a node variable, while

50 SIGMOD Record, March 2012 (Vol. 41, No. 1)


b hasW on hasW on
Neruda Chile Coetzee Nobel x Booker

w i citizenOf |
w ans
w b ((bornIn | livesIn) · locatedIn∗ )

Nobel SouthAfrica Booker Australia y

b w
w Figure 2: Query to find places related to
w l
b l
authors who have won both the Nobel and
Gordimer Carey Bacchus Marsh Victoria Booker prizes.

Figure 1: A graph of authors, prizes they [2], StruQL [26] and UnQL [13], as well as in terms
have won, and places where they were born. of query containment [16, 28], query rewriting [15]
and so on. CRPQs reappeared more recently in
NAGA [43], for example, in a form similar to that
hasW on, N obel and Booker are constants. This
in Figure 2. The ability to query paths using regu-
syntax is reminiscent of Datalog, except that atoms
lar expressions, and hence provide the functionality
in the body do not have predicate names since only
of CRPQs, has only very recently been introduced
a single graph is being queried; indeed, the atoms
in SPARQL 1.1 [36].
are similar to the triple patterns used in SPARQL
However, for a number of problems arising in
[36], the W3C query language for RDF [44].
graph querying CRPQs are insufficiently powerful
It is common in querying graphs that users may
[10]. These problems include comparing semantic
want to find pairs (x, y) of nodes such that there is
associations in RDF graphs [8], comparing biologi-
a path from x to y whose sequence of edge labels
cal sequences [31], and so on. In such settings, we
matches some pattern. One way of specifying such
want to be able to express relations among paths.
a pattern is by using a regular expression defined
For example, the following query Q finds entities x
over the alphabet of edge labels [52]. Such a query is
and y such that the same sequence of edge labels π
called a regular path query (RPQ). So, using the ex-
connects x and y as connects Coetzee and y:
ample of Figure 1, an RPQ using the regular expres-
sion citizenOf | ((bornIn | livesIn) · locatedIn∗ ) (x, y) ← (Coetzee, π, y), (x, π, y), Σ∗ (π) (1)
asks for pairs of author x and place y such that
x is a citizen of y or was born in or lives in y, or Here, π is a path variable, and Σ∗ denotes any se-
where x was born in or lives in some place which quence of edge labels. Q is an example of an ex-
is connected to y by a sequence of any number of tended conjunctive regular path query (ECRPQ),
locatedIn relationships. as proposed in [10].
CQs and RPQs can be combined to form con- We might also want to include paths themselves
junctive regular path queries (CRPQs). For exam- in the output of a query. This has been proposed,
ple, the following query Q adds the conjuncts from for example, as an extension to SPARQL [45], and
the example CQ to those of the RPQ as follows: is also provided by ECRPQs; to return the paths
ans(x, y) as part of the answer to query Q above, one simply
← (x, hasW on, N obel), (x, hasW on, Booker) includes the path variable π in the head. Of course,
(x, (citizenOf | ((bornIn | livesIn) · locatedIn∗ )), y) if there are cycles in the input graph, the answer to
a query may be infinite. In such cases a compact
CRPQs formed the basis of the languages G [21] representation of the set of answers to an ECRPQ
and GraphLog [17], although those languages used can be returned in the form of an automaton [10].
a syntax of graph patterns. Figure 2 shows how the ECRPQs are described in more detail in Section 3.3.
CRPQ Q above would be expressed in GraphLog, Requirements arising from querying and analysing
where there is an obvious mapping between the social networks bring the need for further capabili-
edges in the graph pattern and the atoms in the ties to be provided by graph query languages [5, 24,
body of Q. The thick edge in Figure 2 is called the 60, 61]. In particular, aggregation functions play an
distinguished edge, representing edges which occur essential role in network analysis, while the ability
in the answer of the query; hence, it corresponds to to transform networks by creating new nodes based
the head of Q. on (aggregations of) sets of existing nodes is also
CRPQs were much studied with respect to query- crucial. We discuss these requirements further, and
ing semistructured data in languages such as Lorel provide examples, in Sections 3.4 and 3.5.

SIGMOD Record, March 2012 (Vol. 41, No. 1) 51


With the proliferation of data on the Web (e.g., quences of node and edge types. GraphDB includes
in the form of linked data), it is less likely that users object-oriented features such as classes for nodes
will be familiar with the terms and structure used, and edges, as well as paths. The intended area of
which in any case will also be more heterogeneous. application for GRAM was Hypertext, while that
In such situations, queries that permit flexible [42, for GraphDB was spatial networks such as trans-
51] or approximate matching [30, 40] of data may portation systems. As a result, GraphDB provides
be helpful. We consider this further in Section 3.6. built-in operators such as one for shortest path.
After a brief survey of many graph query lan- GOOD is another graph query language based on
guages in the next section, we focus on the core an object-oriented model [33, 34]. GOOD’s query-
functionality provided by such languages in Section ing mechanism is via graph transformations: node
3. This section covers subgraph matching (Sec- addition/deletion and edge addition/deletion. Also
tion 3.1), finding nodes connected by paths (Sec- provided is an abstraction mechanism to group ob-
tion 3.2), comparing and returning paths (Section jects by means of their properties, as well as meth-
3.3), aggregation (Section 3.4), node creation (Sec- ods for defining sequences of operations. GOOD
tion 3.5), and approximate matching and ranking gave rise to a number of successor languages, such
(Section 3.6). Section 4 covers the expressive power as G-Log [56] and the update language GUL [38].
of languages and the computational complexity of A number of query languages were developed to
query evaluation. We conclude the paper in Sec- query graphs represented in the Object Exchange
tion 5. We do not cover a number of other topics Model (OEM) [55] or one of its variants/derivatives.
of interest such as query containment or query op- OEM was developed to model semistructured data
timisation or evaluation in general. which had no predefined schema and could be het-
erogeneous. The Lore (Lightweight Object Repos-
itory) graph data model and its associated query
2. A BRIEF SURVEY
language Lorel [2] distinguish two types of nodes:
In this section we give a brief overview of some of complex objects and atomic objects (values) which
the graph query languages developed over the past have no outgoing edges. A graph must have a num-
25 years or so. In particular, we highlight the dif- ber of named nodes (or entry points), and every
ferent syntax used by various languages, as well as node must be reachable from a named node.
their proposed area of application. Section 3 dis- In common with many graph query languages,
cusses the functionality underlying these languages Lorel uses a syntax based on OQL, allowing regular
in more detail, while the expressive power and com- expressions over edge labels. A distinctive feature
plexity of evaluating queries in some of the lan- of Lorel is the availability of path variables.
guages is presented in Section 4. Say we wish to formulate a Lorel query Q equiva-
We have already mentioned the query languages lent to the ECRPQ shown in (1). In Lore, each node
G [21] and GraphLog [17]. The data model used by has an oid rather than a label as in Figure 1, so au-
these languages is that of a labelled, directed graph. thors’ names, for example, would be represented by
In G, a query is a set of pairs of graphs, each pair separate atomic objects connected to author nodes
comprising a pattern graph and a summary graph. by an edge labelled name, say. We also assume that
This pair of graphs essentially represents a CRPQ, Winners is a named entry point, with edges labelled
with a set of such pairs being interpreted as dis- author to author nodes. Then Q can be written as
junction. GraphLog replaced the summary graph
with a distinguished edge, as shown in Figure 2. select X, Y
GraphLog also added edge inversion, negation and from Winners.author A, Winners.author X
aggregation functions (Section 3.4), while defining A.#@P.Y , X.#@Q.Y
a semantics different from that of G. The semantics where A.name = ‘Coetzee’
of G was defined in terms of matching simple paths and path-of (P ) = path-of (Q)
in the graph being queried (Section 3.2), whereas where # denotes a path of any length, so is equiv-
the meaning of a GraphLog query was given by the alent to the regular expression Σ∗ used earlier. @P
meaning of the stratified Datalog program to which binds the path (of oids and labels) to variable P ,
it was translated (examples are given in Sections 3.2 and path-of returns a sequence of edge labels.
and 3.4). Strudel is another system whose data model is
Other early graph query languages include GRAM based on the OEM data model [26]. Its intented
[4] and GraphDB [32]. The data models of both area of application is the implementation of data-
require the presence of a graph schema. Both pro- intensive web sites, whose content and structure is
vide regular expressions defined over alternating se- specified using the query language StruQL. The

52 SIGMOD Record, March 2012 (Vol. 41, No. 1)


language is compositional; the result of a query on gation operators applied to them. BiQL [24] uses an
a site graph is another site graph. Once again, reg- SQL-like syntax to query and transform networks.
ular path expressions are used, but because there New networks are formed using a CREATE clause.
is a need to create graphs corresponding to web Aggregation functions can be used in the WHERE
sites, new constructs are needed. These include the clause to restrict results, as well as in the SELECT
link clause which creates a new graph from existing clause to define new attribute values. SNQL [61]
graphs, using Skolem terms for new nodes. uses a data model comprising actors, relationships
UnQL is a functional query language for semistruc- and attribute values (all represented as nodes), with
tured data based on structural recursion [13]. The edges associating attribute values with actors or re-
data model used somewhat different to that of OEM, lationships and associating actors with the relation-
being value-based rather than object-based. UNQL ships in which they participate. The query language
queries are translated to an internal algebra, Un- is based on GraphLog, adding Skolem functions in
CAL, which allows for optimisation. Once again, order to create new nodes in the output. These new
regular path patterns are provided, but the func- nodes are based on grouping or aggregating nodes
tional nature of UnQL means that graphs can be or attribute values in the input (see Section 3.5).
constructed, using data constructors, rather than
only queried. UnQL’s data model includes special 3. QUERY LANGUAGE FUNCTIONAL-
symbols called markers, which are related to object ITY
identifiers in models such as OEM, except that not
every node in a graph has to have a marker. Each In this section, we focus on the functionality pro-
graph also has certain nodes designated as inputs vided by typical graph query languages. The follow-
and certain nodes designated as outputs. ing subsections will consider functionality in terms
YAGO/NAGA combines database and informa- of the following broad categories: subgraph match-
tion retrieval techniques to provide a semantic search ing, finding nodes connected by paths, comparing
engine for web derived knowledge [64]. The NAGA and returning paths, aggregation, node creation,
data model is a directed, weighted multigraph in and approximate matching and ranking. Of course,
which nodes represent entities, edges represent re- many languages offer operations such as union (dis-
lationships, and weights represent confidence of ex- junction), composition and negation of queries, but
tracted facts. A query is a connected, directed we will not cover these separately.
graph in which each edge is labelled with a regu- We start with some general notation and defini-
lar expression over edge labels or a variable or the tions. Let G = (V, E) be a graph as defined in
connect keyword (similar to Figure 2). A query Section 1. Given a query expression Q and a graph
using connect returns the paths connecting the cor- G, the evaluation of Q on G is denoted Q(G).
responding nodes. Answers to queries are ranked in Let us call the following question the query eval-
terms of informativeness, confidence and compact- uation problem (QEP). Given a query expression Q
ness (e.g., short paths rank higher than long paths). and a graph G, is Q(G) non-empty? As usual, one
SocialScope [5] aims to provide information dis- can consider the complexity of this problem by pos-
covery and presentation from social content sites sibly fixing one of the two inputs. Combined com-
such as Yahoo! Travel. To do this, it proposes a plexity corresponds to when both Q and G are part
uniform algebraic framework operating on the so- of the input. Query complexity is when the input
cial content graph, a graph in which both nodes is Q, with G being fixed, while data complexity is
and edges have attributes. The algebra provides when the input is G, with Q being fixed. We often
operations of node selection, edge selection, graph consider data complexity to be the most relevant
union, intersection and difference, graph composi- measure since graphs are assumed to be large and
tion and semi-join, and aggregation functions for query expressions short.
node aggregation and edge aggregation.
3.1 Subgraph matching
Other query languages for social networks include
SoSQL [60], BiQL [24] and SNQL [61]. The two ba- In some sense, the simplest form of graph query
sic query structures in SoSQL are paths and groups supported by all languages is one which finds sub-
(sets of nodes). Paths can have predicates and ag- graphs within a graph. This corresponds to a con-
gregate functions applied to them. Path predicates junctive query (CQ). Let us fix a countable set of
can include path operators such as all or at most n, node variables (typically denoted by x, y, z, . . .). A
specifying that all or at most n nodes or edges sat- We will from now on not usually distinguish between
isfy a given predicate. Groups can also have aggre- an expression in a query language and the query (func-
tion) it denotes, simply using the term “query” for both.

SIGMOD Record, March 2012 (Vol. 41, No. 1) 53


conjunctive query (CQ) Q over a finite alphabet Σ of such a path ρ, denoted by λ(ρ), is the string
is an expression of the form: a0 · · · am−1 ∈ Σ∗ . The length of ρ is m. We also
� define the empty path as (v, ε, v) for each v ∈ V ;
ans(z1 , . . . , zn ) ← (xi , ai , yi ) (2) the label of such a path is the empty string ε.
1≤i≤m
Regular path queries Determining reachability
such that m > 0, each xi and yi is a node variable
between nodes in a graph is a querying mechanism
or constant (1 ≤ i ≤ m), each ai ∈ Σ (1 ≤ i ≤ m),
found in most graph query languages. The class of
and each zi is some xj or yj (1 ≤ i ≤ n, 1 ≤ j ≤ m).
regular path queries [15, 21, 49, 52] provides queries
The atom ans(z1 , . . . , zn ) is the head of the query,
which return all pairs of nodes in a graph connected
while the expression on the right of the arrow is its
by a path conforming to some regular expression. A
body. The query Q is Boolean if its head is of the
regular path query (RPQ) Q is an expression of the
form ans(), i.e. n = 0.
form
Let x̄ = (x1 , . . . , xm ), ȳ = (y1 , . . . , ym ) and z̄ =
(z1 , . . . , zn ). The semantics of CQs Q of the form ans(x, y) ← (x, r, y) (3)
(2) are defined as follows. Let σ be a mapping from
where x and y are node variables, and r is a reg-
x̄, ȳ to the set of nodes of a graph G = (V, E) which
ular expression over Σ. Here we use | for alter-
is the identity on constants. We define a relation
nation (disjunction) and · for concatenation. We
(G, σ) |= Q which holds iff (σ(xi ), ai , σ(yi )) ∈ E,
also allow the shorthands of r+ for (r · r∗ ), r? for
for 1 ≤ i ≤ m. Then Q(G) is the set of tuples σ(z̄)
(r|�), and Σ for (a1 | · · · |an ). We may also use a−
such that (G, σ, ) |= Q. If Q is Boolean, we let Q(G)
to match an edge labelled a in the reverse direc-
be true iff (G, σ) |= Q for some σ.
tion, i.e., from head to tail rather than from tail
Using the example graph G from Figure 1, the
to head [16]. For example, the regular expression
following CQ finds authors born in South Africa
citizenOf | ((bornIn | livesIn)·locatedIn∗ ) is used
who have won both the Nobel and Booker prizes:
in the example query in Figure 2.
ans(x) ← (x, hasW on, N obel), Let G be a graph, r be a regular expression, and
(x, hasW on, Booker), ρ be a path in G. Path ρ satisfies r if λ(ρ) ∈ L(r),
(x, bornIn, SouthAf rica) the language denoted by r. Given an RPQ Q of the
form given in (3), the answer of Q on G, denoted
Each CQ of the form shown in (2) is formulated
Q(G), is the set of all pairs of nodes (x, y) in G such
with respect to a single graph and returns a set of
there is a path from x to y which satisfies r.
bindings for each node variable mentioned in the
The Regular Path Problem is, given a query
head of the query. However, in some application
Q of the form given in (3), a graph G and a pair of
areas, the database to be queried comprises a set of
nodes x and y, is (x, y) ∈ Q(G)? It is well known
graphs, and the answer to a query is the subset of
that this problem can be solved efficiently [52]. One
graphs in which a match is found (e.g., in biolog-
algorithm is as follows: (i) construct a nondetermin-
ical applications). Other languages allow single or
istic finite automaton (NFA) Mr , with initial state
multiple graphs to be queried and return the set of
s0 and final state sf , accepting L(r); (ii) consider
matching subgraphs [37, 43].
G as an NFA with initial state x and final state y;
Although CQs are in some sense the simplest
(iii) form the product automaton Mr ×G of Mr and
form of graph queries, they have been the subject of
G; and (iv) determine whether there is a path from
much study, particularly in terms of finding efficient
(s0 , x) to (sf , y) in Mr × G. Each step of this algo-
ways of evaluating them on large graphs. This is be-
rithm can be performed in Ptime, so the Regular
cause the combined complexity of the QEP for CQs
Path Problem has Ptime combined complexity.
is the same as the problem of subgraph ismorphism,
Alternatively, we can translate Q into a set of
which is well-known to be NP-complete. Because of
Datalog rules. So if Q uses the regular expression
this, Fan et al. have been investigating an alterna-
citizenOf | ((bornIn | livesIn) · locatedIn∗ ) from
tive semantics for graph pattern matching based on
Figure 2, the translation might be as follows:
graph simulation [25].
ans(x, y) ← citizenOf (x, y)
3.2 Finding nodes connected by paths ans(x, y) ← assoc(x, y)
Let G = (V, E) be a graph over alphabet Σ, with ans(x, y) ← assoc(x, z), partOf (z, y)
v0 , vm ∈ V . A path ρ between nodes v0 and vm in assoc(x, y) ← bornIn(x, y)
G is a sequence v0 a0 v1 a1 v2 · · · vm−1 am−1 vm , where assoc(x, y) ← livesIn(x, y)
m ≥ 0, vi ∈ V (1 ≤ i ≤ m), ai ∈ Σ (1 ≤ i < partOf (x, x) ← locatedIn(x, x)
m), and (vi , ai , vi+1 ) ∈ E (1 ≤ i < m). The label partOf (x, y) ← locatedIn(x, z), partOf (z, y)

54 SIGMOD Record, March 2012 (Vol. 41, No. 1)


As an alternative to the semantics defined above, sk , if the length of sk is at least i, or ⊥ otherwise. In
we might instead want to match only simple paths other words, we pad shorter strings with the sym-
in G that satisfy the regular expression r. A path bol ⊥, and then view the n strings as one string
ρ is simple if no node is repeated on ρ. The Regu- over the alphabet of n-tuples of letters. An n-ary
lar Simple Path Problem can then be stated as, relation S on Σ∗ is regular if the set {[s̄] | s̄ ∈ S}
given a graph G, a pair of nodes x and y in G and of strings over alphabet (Σ⊥ )n is regular (i.e., ac-
a regular expression r, is there a simple path from cepted by an automaton over (Σ⊥ )n , or given by a
x to y satisfying r? It turns out, however, that the regular expression over (Σ⊥ )n ). We shall often use
Regular Simple Path Problem is NP-complete, the same letter for both a regular expression over
even for fixed regular expressions [52]. (Σ⊥ )n and the relation over Σ∗ it denotes, as doing
so will not lead to ambiguity.
Conjunctive regular path queries Extend-
In additional to the set of node variables defined
ing regular path queries by allowing conjunctions of
for CRPQs, we now also fix a countable set of path
atoms yields the class of conjunctive regular path
variables (denoted by π, ω, χ, . . .). An extended con-
queries [16, 28]. A conjunctive regular path query
junctive regular path query (ECRPQ) Q over Σ is
(CRPQ) Q over Σ is an expression of the form
an expression of the form:
� � �
ans(z1 , . . . , zn ) ← (xi , ri , yi ) (4) ans(z̄, χ̄) ← (xi , πi , yi ), Rj (ω̄j ), (5)
1≤i≤m
1≤i≤m 1≤j≤t
in which all variables are as for a CQ, except that such that
each ri is now a regular expression over Σ. The
query of Figure 2 corresponds to a CRPQ. (i) m > 0, t ≥ 0,
Let x̄, ȳ and z̄ be defined as for CQs, and G = (ii) each Rj is a regular expression that defines a
(V, E) be a graph. The semantics of CRPQs Q regular relation over Σ,
of the form (4) are defined analogously to that for
(iii) x̄ = (x1 , . . . , xm ) and ȳ = (y1 , . . . , ym ) are tu-
CQs, with σ being a mapping from x̄, ȳ to V . The
ples of node variables,
relation (G, σ) |= Q holds iff, for 1 ≤ i ≤ m, there
exists a path ρi in G from σ(xi ) to σ(yi ) such that (iv) π̄ = (π1 , . . . , πm ) is a tuple of distinct path
λ(ρi ) ∈ L(ri ). Now Q(G) is defined as for CQs. variables,
(v) {ω̄1 , . . . , ω̄t } are distinct tuples of path vari-
3.3 Comparing and returning paths ables, such that each ω̄j is a tuple of variables
As suggested in Section 1, there are a number in from π̄, of the same arity as Rj ,
situations in which we may want to specify relations
(vi) z̄ is a tuple of node variables from x̄, ȳ, and
between paths as well as to have the actual path(s)
connecting two nodes returned as query answers, (vii) χ̄ is a tuple of path variables from those in π̄.
whether to find connections in linked data on the The semantics of ECRPQs is defined by a natu-
web (DBPedia, Freebase), for analysis in social or ral extension of the semantics of CRPQs. For an
other networks, or to determine data provenance. ECRPQ Q of the form (5), a graph G = (V, E) and
Providing both of these capabilities gives rise to mappings σ from node variables to V and µ from
the class of extended CRQPs (ECRPQs), introduced path variables to paths, we write (G, σ, µ) |= Q if
in [10] from where most of the material in this sub-
section is derived. ECRPQs extend the class of CR- • µ(πi ) is a path in G from σ(xi ) to σ(yi ), for
PQs in two ways. Firstly, they allow free path vari- 1 ≤ i ≤ m, and
ables in the heads of queries. Secondly, they permit • for each ω̄j = (πj1 , . . . , πjk ), the tuple of strings
checking relations on sets of paths in the bodies of consisting of labels of µ(πj1 ), . . . , µ(πjk ) be-
queries, rather than simply conformance of individ- longs to the relation Rj .
ual paths to regular languages.
We first define the notion of regular relations over The answer of Q on G is defined as
Σ, which are used in ECRPQs. Let ⊥ be a sym-
Q(G) = {(σ(z̄), µ(χ̄)) | (G, σ, µ) |= Q}.
bol not in Σ. We denote the extended alphabet
(Σ ∪ {⊥}) by Σ⊥ . Let s̄ = (s1 , . . . , sn ) be an n- We now present some examples, taken from [10].
tuple of strings over alphabet Σ. We construct a In a query language for RDF/S introduced in [8],
string [s̄] over alphabet (Σ⊥ )n , whose length is the paths can be compared based on specific semantic
maximum of the sj ’s, and whose i-th symbol is a tu- associations. Edges correspond to RDF properties
ple (c1 , . . . , cn ), where each ck is the i-th symbol of and paths to property sequences. A property a can

SIGMOD Record, March 2012 (Vol. 41, No. 1) 55


be declared to be a subproperty of property b, which hasW on ans
y x count(y)
we denote by a ≺ b. Two property sequences u
and v are called ρ-isomorphic if and only if u =
u1 , . . . , un and v = v1 , . . . , vn , for some n, and ui ≺ Figure 3: GraphLog query to count the num-
vi or vi ≺ ui for every i ≤ n [8]. Nodes x and y are ber of prizes for each author.
called ρ-isoAssociated iff x and y are the origins of
two ρ-isomorphic property sequences. sp(min(sum(d))
Finding ρ-isoAssociated nodes cannot be expressed
using a CRPQ, not least because doing so requires
x y
checking that two paths are of equal length. How-
dist( )+ (d)
ever, pairs of ρ-isomorphic sequences can be ex-
pressed using � �the regular relation �R given by the

expression Figure 4: GraphLog shortest path query
a,b∈Σ: (a≺b∨b≺a) (a, b) . An ECRPQ
returning pairs of nodes x and y that are ρ-iso-
Associated can then be written as follows:
complex ones for computing the eccentricity of a
ans(x, y) ← (x, π1 , z1 ), (y, π2 , z2 ), R(π1 , π2 ) node, the distance between pairs of nodes, or the
diameter of a graph. To formulate queries which
Path variables in an ECRPQ can also be used to
return the values of such properties, we need aggre-
return the actual paths found by the query, a mech-
gation operators such as count, sum, min and max.
anism found in the query languages proposed in [2,
Aggregation was available in early graph query
8, 39, 45]. For instance, SPARQLeR [45] intro-
languages such as G+ [22] and GraphLog [20]. It is
duces path variables and regular expressions into
also a feature of query languages for social networks,
the SPARQL query language, allowing paths to be
such as [24, 60, 61]. GraphLog and SNQL [61] have
output. As an example, the SPARQLeR query that
semantics that are based on Datalog with aggrega-
returns every path between the RDF resources r
tion. For consistency with the rest of the paper,
and s, provided the path includes the resource e,
we will use the language of CRPQs extended with
can be expressed by the ECRPQ
aggregation functions, denoted CRPQagg , without
ans(π1 , π2 ) ← (r, π1 , e), (e, π2 , s) giving a formal definition.
where π1 and π2 are the actual paths matched. In GraphLog [20], aggregate terms are allowed
Regular expressions with backreferencing (REBRs) in the label of a distinguished edge or distinguished
[3], as provided by egrep and Perl, for example, ex- node. The simple query in Figure 3 counts, for each
tend regular expressions by including expressions author x, the number of prizes y they have won.
of the form (r)%X, where r is a regular expres- The equivalent CRPQagg is:
sion and X is a variable, which binds a string w ∈ ans(x, count(y)) ← (x, hasW on, y)
L(r) to X. Subsequent uses of X in the expression
then match w. REBRs can denote non-regular lan- In general, GraphLog and SNQL queries are trans-
guages [3]. Although ECRPQs can mimick some of lated into recursive Datalog programs, and com-
this functionality over paths in a graph, it was re- bining recursion with aggregation can lead to non-
cently shown that ECRPQs cannot express all RE- termination. Hence, care is taken to ensure that the
BRs [29]. On the other hand, ECRPQs can match recursive rules that are produced perform transi-
patterns, such as an bn cn , where a, b, c ∈ Σ and tive closure computations over closed semirings [18],
n ∈ N, that cannot be denoted by REBRs, by using such as formed by the operators min and sum in
an equal-length (el) predicate as follows: computing shortest paths, as illustrated below.
Finding the length of the shortest path between
ans(x, y) ← (x, π1 , z1 ), (z1 , π2 , z2 ), (z2 , π3 , y),
each pair of nodes requires that we summarise val-
a∗ (π1 ), b∗ (π2 ), c∗ (π3 ),
ues (i.e., distances) along a path (by summing them)
el(π1 , π2 ), el(π2 , π3 ),
� and then aggregate these summarised values (by
where el(π, π � ) is shorthand for ( a,b∈Σ (a, b))∗ (π, π � ). finding the minimum). This is what the GraphLog
query Q in Figure 4 does. The variable d in Q is
3.4 Aggregation a collecting variable, sum is a summarising func-
Determining various properties of graphs requires tion, and min is used to aggregate the summarised
computation that goes beyond matching and path distances. Query evaluation is in Ptime if sum-
finding. Such properties range from simple compu- marisation and aggregation operators form a closed
tations to determine the degrees of nodes to more semiring [18]. Query Q might be translated to the

56 SIGMOD Record, March 2012 (Vol. 41, No. 1)


following Datalog program [20]: population attributes. Although SNQL provides a
graph-based syntax, the following Datalog-like pro-
len(x, x, x, 0) ← dist(x, y, l)
gram shows a simplification of how such a query
len(x, x, x, 0) ← dist(y, x, l)
might be translated
len(x, z, y, d) ← sp(x, z, s), dist(z, y, l), d = s + l
sp(x, y, min(d)) ← len(x, z, y, d) ans(f (c), isa, city) ← temp(p, c)
Predicate len(x, z, y, d) specifies that there is a path ans(f (c), name, c) ← temp(p, c)
of length d = s + l from x to y via z, where the ans(f (c), population, count(p)) ← temp(p, c)
length of the shortest path from x to z is s and the temp(p, c) ← (p, isa, person), (p, livesIn, c)
distance from z to y is l. Here, f is a Skolem function and count is an ag-
As mentioned in Section 2, SocialScope [5] pro- gregate function. In general, more than one Skolem
vides node aggregation and edge aggregation. The function may be needed in an SNQL query.
result of aggregation can be represented as a new
attribute of a node or edge in the graph. With the 3.6 Approximate matching and ranking
help of a user-defined function for defining similar- In many applications, users may not be famil-
ity between users, [5] shows how aggregate func- iar with the graph structure being queried, its con-
tions can be used to express collaborative filtering straints or edge labels. As a result, they may for-
on travel recommendations. mulate queries which return no answers or fewer
3.5 Node creation answers than they expected. Early work to address
such problems when querying semistructured data
In languages such as GraphLog the answer to a was done by Kanza and Sagiv [42], who proposed a
query is a graph, where the edges and their labels form of flexible querying based on a notion of home-
can be new, but the nodes are drawn from those omorphism between a query and a graph.
of the graph being queried. However, in a number In this section, we will consider a more general
of applications, there is a need for the output of a notion of approximate matching of paths [30, 40],
query to contain nodes that were not part of the where the results can be ranked in terms of their
input. This requirement appears in web site man- “closeness” to the original query. Consider a regular
agement (and hence in StruQL [26]) and in social path query Q of the form given in (3), using regu-
network analysis and transformation (and hence in lar expression r. Approximate matching is achieved
BiQL [24] and SNQL [61]), for example. More gen- by applying edit operations to L(r). Possible edit
eral graph creation is also a feature of languages operations include insertion, deletion, substitution,
that adopt a functional or algebraic approach such transposition and inversion of symbols. For sim-
as UnQL [13], GraphQL [37], and GOOD [34]. plicity, we will only consider the first three of these
One approach to supporting node creation, fol- here.
lowed in a number of languages, is to adopt a mech- Let x, y ∈ Σ∗ . Applying an edit operation to
anism equivalent to that of Skolem functions, first x yielding y can be modelled as a binary relation
used for semistructured data in the Mediator Spec- ❀ over Σ∗ such that x ❀ y holds iff there exist
ification Language MSL [54]. In a query language u, v ∈ Σ∗ , a, b ∈ Σ, with a �= b, such that one of the
supporting recursion, it is important to ensure that, following is satisfied:
where possible, node creation and recursion do not
combine to yield non-termination. x = uav, y = ubv (substitution)
Recall that BiQL [24] can create new graphs from x = uav, y = uv (deletion)
existing graphs. To do this new object identifiers x = uv, y = ubv (insertion)
are needed. They define semantics of their basic k
language in terms of a translation to Datalog, and Let ❀ stand for the composition of ❀ with itself k
then extend this by means of a single Skolem func- times. The edit distance de (x, y) between x and y
tion to represent new object identifiers. is the minimum number k of edit operations such
k
As mentioned in Section 2, SNQL [61] also uses that x ❀ y.
Skolem functions to represent new nodes in the out- Each operation may have a different cost. In gen-
put of a query. Assume that we have a network eral, different instances of the same operation may
representing people and the cities in which they have different costs. For example, a user may be
live. City names are represented as values of the prepared to substitute taxi by train at a cost of
livesIn attribute associated with people. We want one, but taxi by bus at a cost of two.
to transform this network into one in which cities More generally, Grahne and Thomo study ap-
become actors (rather than values), with name and proximate matching of RPQs in [30], where they

SIGMOD Record, March 2012 (Vol. 41, No. 1) 57


assume that approximations are specified by means ible by conjunctive queries, regular path queries,
of a weighted regular transducer. Such a transducer conjunctive regular path queries and extended con-
can be represented by a regular expression defined junctive regular path queries by CQ, RPQ, CRPQ
over triples of the form (a, k, b), which specify that and ECRPQ, respectively. Furthermore, let FO de-
a can be replaced by b with cost k. note the class of queries expressible in first-order
A weighted transducer T = (ST , Σ, σT , S0T , FT ) logic (relational calculus or algebra). Then we have
comprises a finite set of states ST , an input/output
CQ ⊂ FO
alphabet Σ, a set of initial states S0T , a set of final
states FT , and a transition relation σT ⊆ ST × Σ × and
N × Σ × ST . A transition (s, a, k, b, t) specifies that RPQ ⊂ CRPQ ⊂ ECRPQ.
if the transducer is in state s and reads symbol a,
it outputs symbol b at cost k and moves to state t. For relating these latter classes with FO and with
As stated in Section 3.2, we can construct an NFA Datalog, we need some further definitions.
Mr from the regular expression r in query Q and The language of first-order logic with transitive
can consider graph G as an NFA as well. Now we closure, denoted FO + TC (introduced by Immer-
can form the product automaton Mr × T × G (see man in [41]) extends first-order logic with formulas
[30] for details). Then (a, b, k) ∈ ansT (Q, G) iff of the form T C(λx̄, ȳ.φ(x̄, ȳ)), where x̄ are ȳ are k-
the shortest path from an initial state ( , , a) to a tuples, and φ(x̄, ȳ) is a formula in FO+TC denoting
final state ( , , b) in Mr × T × G has cost k. As a binary relation on k-tuples. Then T C(λx̄, ȳ.φ(x̄, ȳ))
noted in [30], if we are interested in nodes reachable denotes the transitive closure of φ.
from a limited number of source nodes, we can use A linear Datalog program is one in which each
Dijkstra’s shortest path algorithm to return answers rule has at most one recursive subgoal. A strati-
in increasing order of cost. Furthermore, we can fied Datalog program is one in which use of negated
construct the product automaton incrementally. predicates is stratified. Let SL-Datalog and Graph-
The simpler setting in which approximation is log denote the sets of queries expressible as strat-
captured by edit operations, all with cost one, can ified linear Datalog programs and in the language
be captured by a transducer T with a single state s GraphLog (without aggregation), respectively. Then
and transitions from s to s labelled: we have the following result [18]:

• (a, 1, ε), for each a ∈ Σ (deletion), FO + TC = Graphlog = SL-Datalog


• (ε, 1, a), for each a ∈ Σ (insertion), and Similar expressive power is exhibited by StruQL
• (a, 1, b), for a, b ∈ Σ, a �= b (substitution). and UnQL. A theorem from [26] states that the clo-
sure of StruQL under composition expresses pre-
Alternatively, in [40] ideas from approximate string
cisely the Boolean queries expressible in FO + TC,
matching [65] can be used to construct an approx-
while a theorem from [13] shows that all UnCAL
imate NFA MQ from Q, from which the product
queries can be expressed in FO+TC (where UnCAL
of MQ and MG can be traversed. Extending ap-
is the algebra associated with UnQL). Now Immer-
proximate matching from RPQs to CRPQs, as well
man’s result tells us that all GraphLog, StruQL
as adding an inversion edit operation for reversing
and UnQL queries can be computed in NLogspace.
the traversal of a graph edge, was studied in [40,
Adding aggregation operators to GraphLog in the
59]. Both the approximate NFA and the product
form of closed semirings leaves query evaluation in
automaton can be constructed incrementally, while
NLogspace, as long as the summarisation opera-
any rank-join algorithm can be used on the con-
tors are in {min, max, +} and aggregation opera-
juncts in order to return answers in increasing over-
tors are in {min, max}. However, when summari-
all distance from the original CRPQ. This results in
sation can include × and aggregation includes +,
an algorithm with Ptime combined complexity if
as needed to express the parts explosion query for
the conjuncts are acyclic and there is a fixed num-
example, then query evaluation is in NC2 [20]
ber of head variables [40].
Further study of the expressiveness of stratified
aggregation in Datalog was undertaken in [53]. They
4. EXPRESSIVE POWER AND COMPU- show that (1) Datalog extended with stratified nega-
TATIONAL COMPLEXITY tion cannot express a query to count the number
We now consider some results on the expressive of paths between every pair of nodes in an acyclic
power of graph query languages and the complexity graph, (2) Datalog extended with stratified nega-
of the QEP for such languages. For simplicity of no- tion and arithmetic on integers (the + operator) can
tation, let us denote the classes of queries express- express all computable queries on ordered domains,

58 SIGMOD Record, March 2012 (Vol. 41, No. 1)


and (3) Datalog extended with stratified negation some 25 years ago. I would also like to thank all
and generic function symbols can express all com- the colleagues with whom I have collaborated.
putable queries (on ordered and unordered domains).
Recently, Fletcher et al. [27] considered the rel- 6. REFERENCES
ative expressive power of navigational graph query [1] S. Abiteboul and P. C. Kanellakis. Object identity as
languages built from the operators identity, union, a query language primitive. J. ACM, 45(5):798–842,
composition, intersection, set difference, projection, 1998.
[2] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and
co-projection (values x such that there does not ex-
J. L. Wiener. The LOREL query language for
ist a y such that (x, y) is in the result of some ex- semistructured data. Int. J. Digit. Libr., 1(1):68–88,
pression), converse (i.e., inverse), transitive closure, April 1997.
and the diversity relation (all pairs of unequal con- [3] A. V. Aho. Algorithms for finding patterns in strings.
In Handbook of Theoretical Computer Science,
stants in the active domain). They provide a com- Volume A: Algorithms and Complexity (A), pages
plete comparison in terms of expressive power, both 255–300. Elsevier and MIT Press, 1990.
for path queries and Boolean queries. [4] B. Amann and M. Scholl. GRAM: A graph data model
and query language. In ECHT, pages 201–211, 1992.
The GOOD query language provides for greater [5] S. Amer-Yahia, L. V. S. Lakshmanan, and C. Yu.
expressive power than the other languages consid- SocialScope: Enabling information discovery on social
ered above. In [33, 34], it is shown that GOOD content sites. In CIDR, 2009.
restricted to node addition/deletion and edge addi- [6] M. K. Anand, S. Bowers, and B. Ludäscher.
Techniques for efficiently querying scientific workflow
tion/deletion is relationally complete. Adding ab- provenance graphs. In EDBT, pages 287–298, 2010.
straction gives the expressive power of the nested [7] R. Angles and C. Gutiérrez. Survey of graph database
relational algebra, while the full language including models. ACM Comput. Surv., 40(1), 2008.
[8] K. Anyanwu and A. P. Sheth. ρ-queries: enabling
methods is Turing-complete. querying for semantic associations on the semantic
The invention of values or object identifiers (oids) web. In WWW, pages 690–699, 2003.
also adds power, as shown in database query lan- [9] M. Arenas, C. Gutiérrez, and J. Pérez. An extension
guages such as IQL [1] and ILOG [14]. By relying of SPARQL for RDFS. In SWDB-ODBIS, pages 1–20,
2007.
on rules that use both recursion and oid invention, [10] P. Barceló, C. A. Hurtado, L. Libkin, and P. T. Wood.
it can be shown that such languages can express all Expressive languages for path queries over
computable database queries [14]. However, when graph-structured data. In PODS, pages 3–14, 2010.
[11] C. L. Barrett, R. Jacob, and M. V. Marathe.
recursion and invention are not allowed to interact, Formal-language-constrained path problems. SIAM J.
queries can be evaluated in Ptime [1]. Comput., 30(3):809–837, 2000.
[12] P. Buneman. Semistructured data. In PODS, pages
5. CONCLUSION 117–121, 1997.
[13] P. Buneman, M. F. Fernandez, and D. Suciu. UnQL:
We have provided a survey of query languages A query language and algebra for semistructured data
based on structural recursion. VLDB J., 9(1):76–110,
for graph databases, focussing on a number of im- 2000.
portant aspects of functionality. While many lan- [14] L. Cabibbo. The expressive power of stratified logic
guage features/constructs have been the subject of programs with value invention. Inf. Comput.,
147(1):22–56, 1998.
formal study, work remains to be done in terms of [15] D. Calvanese, G. de Giacomo, M. Lenzerini, and
an integrated and consistent framework in which M. Y. Vardi. Rewriting of regular expressions and
to study graph query languages. In particular, the regular path queries. J. Comput. Syst. Sci.,
areas of approximate querying and graph transfor- 64(3):443–465, 2002.
[16] D. Calvanese, G. D. Giacomo, M. Lenzerini, and
mations merit greater study. M. Y. Vardi. Containment of conjunctive regular path
The requirements of social network modelling and queries with inverse. In KR, pages 176–185, 2000.
analysis provide further opportunities to extend the [17] M. P. Consens and A. O. Mendelzon. Expressing
structural hypertext queries in GraphLog. In ACM
capabilities of graph languages. For example, many Hypertext, pages 269–292, 1989.
aspects of social network analysis rely on some prob- [18] M. P. Consens and A. O. Mendelzon. GraphLog: a
abilistic interpretation of graphs, so query languages visual formalism for real life recursion. In PODS,
need to be adapted and studied accordingly. Work pages 404–416, 1990.
[19] M. P. Consens and A. O. Mendelzon. Hy+ : a
in the area of expressive query languages for prob- hygraph-based query and visualization system. In
abilitic databases has recently been initiated [23]. SIGMOD, pages 511–516, 1993.
[20] M. P. Consens and A. O. Mendelzon. Low complexity
aggregation in graphlog and datalog. Theor. Comput.
Acknowledgements Sci., 116(1&2):95–116, 1993.
I would like to dedicate this survey to the mem- [21] I. F. Cruz, A. O. Mendelzon, and P. T. Wood. A
graphical query language supporting recursion. In
ory of Alberto Mendelzon, whose insight and vision SIGMOD, pages 323–330, May 1987.
inspired me to investigate graph query languages [22] I. F. Cruz, A. O. Mendelzon, and P. T. Wood. G+ :

SIGMOD Record, March 2012 (Vol. 41, No. 1) 59


Recursive queries without recursion. In EDS, pages DILS, pages 203–211, 2004.
355–368, Redwood City, 1988. Benjamin/Cummings. [47] W.-J. Lee, L. Raschid, P. Srinivasan, N. Shah, D. L.
[23] D. Deutch, C. Koch, and T. Milo. On probabilistic Rubin, and N. F. Noy. Using annotations from
fixpoint and markov chain query languages. In PODS, controlled vocabularies to find meaningful
pages 215–226, 2010. associations. In DILS, pages 247–263, 2007.
[24] A. Dries, S. Nijssen, and L. De Raedt. A query [48] U. Leser. A query language for biological networks.
language for analyzing networks. In CIKM, pages Bioinformatics, 21(2):33–39, 2005.
485–494, 2009. [49] L. Libkin and Vrgoc̆. Regular path queries on graphs
[25] W. Fan, J. Li, S. Ma, N. Tang, and Y. Wu. Adding with data. In ICDT (to appear), 2012.
regular expressions to graph reachability and pattern [50] Y. A. Liu and S. D. Stoller. Querying complex graphs.
queries. In ICDE, pages 39–50, 2011. In PADL, pages 199–214, 2006.
[26] M. F. Fernández, D. Florescu, A. Y. Levy, and [51] F. Mandreoli, R. Martoglia, G. Villani, and W. Penzo.
D. Suciu. Declarative specification of web sites with Flexible query answering on graph-modeled data. In
strudel. VLDB J., 9(1):38–55, 2000. EDBT, pages 216–227, 2009.
[27] G. H. L. Fletcher, M. Gyssens, D. Leinders, J. V. den [52] A. O. Mendelzon and P. T. Wood. Finding regular
Bussche, D. V. Gucht, S. Vansummeren, and Y. Wu. simple paths in graph databases. SIAM J. Comput.,
Relative expressive power of navigational querying on 24(6):1235–1258, December 1995.
graphs. In ICDT, pages 197–207, 2011. [53] I. S. Mumick and O. Shmueli. How expressive is
[28] D. Florescu, A. Y. Levy, and D. Suciu. Query stratified aggregation? Ann. Math. Artif. Intell.,
containment for conjunctive queries with regular 15(3-4):407–434, 1995.
expressions. In PODS, pages 139–148, 1998. [54] Y. Papakonstantinou, S. Abiteboul, and
[29] D. D. Freydenberger and N. Schweikardt. H. Garcia-Molina. Object fusion in mediator systems.
Expressiveness and static analysis of extended In VLDB, pages 413–424, 1996.
conjunctive regular path queries. In AMW, 2011. [55] Y. Papakonstantinou, H. Garcia-Molina, and
[30] G. Grahne and A. Thomo. Regular path queries under J. Widom. Object exchange across heterogeneous
approximate semantics. Ann. Math. Artif. Intell., information sources. In ICDE, pages 251–260, 1995.
46(1-2):165–190, 2006. [56] J. Paredaens, P. Peelman, and L. Tanca. G-Log: A
[31] D. Gusfield. Algorithms on Strings, Trees and graph-based query language. IEEE TKDE,
Sequences: Computer Science and Computational 7(3):436–453, 1995.
Biology. Cambridge University Press, 1997. [57] J. Pérez, M. Arenas, and C. Gutiérrez. nSPARQL: A
[32] R. H. Güting. GraphDB: Modeling and querying navigational language for RDF. In ISWC, pages
graphs in databases. In VLDB, pages 297–308, 1994. 66–81, 2008.
[33] M. Gyssens, J. Paradaens, and D. V. Gucht. A [58] A. Poulovassilis and M. Levene. A nested-graph model
graph-oriented object database model. In PODS, for the representation and manipulation of complex
pages 417–424, 1990. objects. ACM TOIS, 12(1):35–68, January 1994.
[34] M. Gyssens, J. Paredaens, J. V. den Bussche, and [59] A. Poulovassilis and P. T. Wood. Combining
D. V. Gucht. A graph-oriented object database model. approximation and relaxation in semantic web path
IEEE TKDE, 6(4):572–586, August 1994. queries. In ISWC, pages 631–646, 2010.
[35] D. Harel. On visual formalisms. C. ACM, [60] R. Ronen and O. Shmueli. SoQL: A language for
31(5):514–530, May 1988. querying and creating data in social networks. In
[36] S. Harris and A. Seaborne, editors. SPARQL 1.1 ICDE, pages 1595–1602, 2009.
Query Language, W3C Working Draft, 12 May 2011. [61] M. San Martı́n, C. Gutiérrez, and P. T. Wood. SNQL:
[37] H. He and A. K. Singh. Graphs-at-a-time: Query A social networks query and transformation language.
language and access methods for graph databases. In In AMW, 2011.
SIGMOD, pages 405–418, 2008. [62] A. Sheth, B. Aleman-Meza1, I. B. Arpinar,
[38] J. Hidders. Typing graph-manipulation operations. In C. Bertram, Y. Warke, C. Ramakrishnan,
ICDT, pages 391–406, 2003. C. Halaschek, K. Anyanwu, D. Avant, F. S. Arpinar,
[39] D. A. Holland, U. Braun, D. Maclean, K.-K. and K. Kochut. Semantic association identification
Muniswamy-Reddy, and M. I. Seltzer. Choosing a data and knowledge discovery for national security
model and query language for provenance. In Proc. applications. J. Database Management, 16(1):33–53,
Int. Provenance and Annotation Workshop, 2008. 2005.
[40] C. A. Hurtado, A. Poulovassilis, and P. T. Wood. [63] F. W. Tompa. A data model for flexible hypertext
Ranking approximate answers to semantic web database systems. ACM TODS, 7(1):85–100, January
queries. In ESWC, pages 263–277, 2009. 1989.
[41] N. Immerman. Languages that capture complexity [64] G. Weikum, G. Kasneci, M. Ramanath, and F. M.
classes. SIAM J. Comput., 16(4):760–778, 1987. Suchanek. Database and information-retrieval
[42] Y. Kanza and Y. Sagiv. Flexible queries over methods for knowledge discovery. C. ACM,
semistructured data. In PODS, pages 40–51, 2001. 52(4):56–64, 2009.
[43] G. Kasneci, F. M. Suchanek, G. Ifrim, M. Ramanath, [65] S. Wu and U. Manber. Fast text searching allowing
and G. Weikum. NAGA: Searching and ranking errors. C. ACM, 35(10):83–91, Oct. 1992.
knowledge. In ICDE, pages 953–962, 2008.
[44] G. Klyne and J. J. Carroll, editors. Resource
Description Framework (RDF): Concepts and
Abstract Syntax, W3C Recommendation, 10 February
2004.
[45] K. Kochut and M. Janik. SPARQLeR: Extended
SPARQL for semantic association discovery. In
ESWC, pages 145–159, 2007.
[46] Z. Lacroix, H. Murthy, F. Naumann, and L. Raschid.
Links and paths through life sciences data sources. In

60 SIGMOD Record, March 2012 (Vol. 41, No. 1)

You might also like