NOSQL, Graph Databases & Cypher
Cypher
Advances in Data Management, 2018
Engineer at Neo4j
Work on the Cypher Features Team
Agenda
Preamble
The wider landscape: 2012
A brief tour of NOSQL
NOSQL: non-relational
The name is not a good one: some of these systems support SQL, and
SQL is orthogonal to their capabilities.
However, it is tricky to find a more suitable name.
A good way to think of these is as “the rest of the databases that
solve the rest of our problems”
Scalability:
Horizontal (scale out): the addition of more nodes (commodity servers) to a system (cluster)
- simple NOSQL stores
Vertical (scale up): the addition of more resources – CPU, memory – to a single machine
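A minimal sketch of how scaling out can work for a simple store, assuming keys are spread over the cluster by hashing. The node names and keys are hypothetical, and real systems often use consistent hashing so that adding a node moves fewer keys.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick the server responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

cluster = ["node-1", "node-2", "node-3"]  # hypothetical commodity servers
keys = ["user:1", "user:2", "user:3", "user:4"]  # hypothetical keys
placement = {key: node_for_key(key, cluster) for key in keys}
```

Scaling up, by contrast, keeps a single node and adds CPU or memory to it, so no placement function is needed.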
Non-relational vs. relational
Non-relational
Relational vs. Aggregate Data Model
Relational
Data are divided into rows (tuples) with pre-defined columns (attributes)
There is no nesting of tuples
There is no list of values
Aggregate
Think of this as a collection of related objects, which should be treated as a unit
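To make the contrast concrete, here is a small sketch with made-up order data. The table layouts, field names and the helper `assemble_order` are illustrative assumptions, not taken from any particular product.

```python
# Relational: flat rows with pre-defined columns; no nesting, no lists.
customers = [(1234, "Ann")]                      # (customer_id, name)
orders = [(99, 1234)]                            # (order_id, customer_id)
order_lines = [(99, "book", 2), (99, "pen", 5)]  # (order_id, item, qty)

# Aggregate: the same information nested into one unit.
order_aggregate = {
    "order_id": 99,
    "customer": {"id": 1234, "name": "Ann"},
    "lines": [{"item": "book", "qty": 2}, {"item": "pen", "qty": 5}],
}

def assemble_order(order_id):
    """Rebuild the aggregate from the flat rows, as a join would."""
    customer_id = next(c for (o, c) in orders if o == order_id)
    name = next(n for (c, n) in customers if c == customer_id)
    lines = [{"item": i, "qty": q} for (o, i, q) in order_lines if o == order_id]
    return {
        "order_id": order_id,
        "customer": {"id": customer_id, "name": name},
        "lines": lines,
    }
```

The aggregate can be stored and fetched as one unit, while the relational form needs the rows to be joined back together.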
Non-relational families
| Store | Key/Value | Column | Document | Graph |
| Design | Key/Value pairs; indexed by key | Columns and Column Families. Directly accesses the column values | Multiple Key/Value pairs form a document. Values may be nested documents or lists as well as scalar values | Focus on the connections between data and fast navigation through these connections |
| Complexity | + | ++ | ++ | +++ |
| Inspiration/Relation | Berkeley DB, Memcached, Distributed Hashmaps | SAP Sybase IQ, Google BigTable | Lotus Notes | Graph theory |
| Products | Voldemort, Redis, Riak(?) | HBase, Cassandra, Hypertable | MongoDB, Couchbase | Neo4j, DataStax Enterprise Graph |
Key/Value stores
A key-value store is a simple hash table
Generally used when all access to the data is via a primary key
The value is a BLOB: the data store does not care, or necessarily know, what is ‘inside’
Use cases
Storing Session Information
Strengths
Scalable
Available
Weaknesses
Simplistic data model – moves a lot of the complexity of the application into the application layer itself
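A toy sketch of the idea, assuming an in-memory hash table. The class `KeyValueStore` and the session data are hypothetical. Note how the store only sees bytes, so the application has to encode and decode the value itself.

```python
import json

class KeyValueStore:
    """Toy in-memory key-value store: a hash table from key to opaque bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
session = {"user": "alice", "cart": ["book", "pen"]}  # made-up session
# The application serialises the value; the store never looks inside it.
store.put("session:42", json.dumps(session).encode("utf-8"))
restored = json.loads(store.get("session:42").decode("utf-8"))
```

All access goes through the key; there is no way to ask the store for "sessions whose cart contains a book" without reading and decoding every value in the application.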
Column stores
A big table with column families. Column families are groups of related data, often accessed together
Each column family has columns (e.g. name and payment) and supercolumns (which have a name and an arbitrary number of associated columns)
Source: NOSQL Distilled
Each column family may be treated as a separate table in terms of sharding: the profile for Customer 1234 may be on Node 1, while the orders for Customer 1234 may be on Node 2
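The layout described above can be sketched as follows, assuming each column family is pinned to one node. The class `ColumnFamilyStore`, the node names and the stored values are hypothetical, and supercolumns are left out to keep the sketch short.

```python
class ColumnFamilyStore:
    """Toy column-family store: row key -> column family -> column -> value."""

    def __init__(self, family_to_node):
        # Each column family is placed on one node, like a separate table.
        self.family_to_node = family_to_node
        self.nodes = {node: {} for node in set(family_to_node.values())}

    def put(self, row_key, family, column, value):
        node = self.nodes[self.family_to_node[family]]
        node.setdefault((family, row_key), {})[column] = value

    def get_family(self, row_key, family):
        """Read all columns of one family for one row from a single node."""
        node = self.nodes[self.family_to_node[family]]
        return dict(node.get((family, row_key), {}))

store = ColumnFamilyStore({"profile": "node-1", "orders": "node-2"})
store.put("1234", "profile", "name", "Martin")
store.put("1234", "profile", "payment", "card")
store.put("1234", "orders", "order-1", "2 books")
```

Reading the profile of customer 1234 touches only `node-1`; the orders for the same customer live on `node-2`.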
Column stores
Use cases
Logging and customer analytics
Event Logging
Counters
Smart meters and monitoring
Sensor data
Column stores
Strengths
Weaknesses
Document stores
Collections of documents
A document is a key-value collection
Stores and retrieves documents, which can be XML, JSON, BSON, ...
Documents are self-describing, hierarchical tree data structures which can consist of maps, collections and scalar values, as well as nested documents
Documents stored are similar to each other but do not have to be exactly the same
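A short sketch of these properties, using plain Python dictionaries in place of JSON documents. The sample documents and the helpers `get_path` and `find` are hypothetical and do not reflect the query API of any specific document store.

```python
# Two documents in one collection: similar, but not identical in shape.
documents = [
    {"_id": 1, "name": "Ann", "address": {"city": "London"}, "likes": ["sushi", "tea"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Paris", "zip": "75001"}},
]

def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city'; return None if absent."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(docs, dotted_path, expected):
    """Return the documents whose value at dotted_path equals expected."""
    return [d for d in docs if get_path(d, dotted_path) == expected]

paris_ids = [d["_id"] for d in find(documents, "address.city", "Paris")]
```

Unlike a key-value store, the store can look inside the document, so queries can reach into nested structures.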
Document stores
Use cases
Portfolio Management
Quantitative Analysis
Automated Trading
Document stores
Strengths
Simple but powerful data model – able to express nested structures
Good scaling (especially if sharding supported)
No database maintenance required to add / remove ‘columns’
Powerful query expressivity (especially with nested structures) – able to pose fairly sophisticated queries
Weaknesses
Unsuited for interconnected data
Graph stores
“Odd man out” in the non-relational group: not aggregate-oriented
Nodes
A query on the graph is also known as traversing the graph: traversing the relationships is very fast
Graph theory:
People talk about Codd’s relational model being mature because it was proposed in 1969: 49 years old.
Strengths
Fast: for connected data, can be many orders of magnitude faster than an RDBMS
Schema-optional model
Weaknesses
If the data has no / few connections, there is not much benefit in using a graph database
Graph stores: use cases
Connected data
Hierarchical data
Genealogy
Financial services – finance chain, dependencies, risk management, fraud detection etc. For example, if you want to find out how vulnerable a company is to a bit of "bad news" for another company, the directness of the relationship can be a critical calculation. Expressing this as several SQL statements takes a lot of code and tends to be slow, whereas a graph store excels at this task.
Neo4j: a property graph database
Verticals
Graph stores: Neo4j
(Thanks to Stefan Plantikow, Tobias Lindaaker & Mark Needham for some of the following slides/images)
Graph stores: Neo4j
Nodes
Represent objects in the graph
Can be labelled
Relationships
Relate nodes by type and direction
Properties
Name-value pairs that can go on nodes and relationships
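The three building blocks can be sketched as plain data structures. This is only an illustration of the property graph model, not how Neo4j stores data; the classes, the second person and the `since` property are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """An object in the graph, with optional labels and properties."""
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """Relates two nodes by type and direction (start -> end)."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)

philip = Node({"Person"}, {"name": "Philip"})
emil = Node({"Person"}, {"name": "Emil"})
# Properties can go on relationships as well as on nodes.
friendship = Relationship(philip, "IS_FRIEND_OF", emil, {"since": 2010})
```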
Nodes
Labels
Relationships
Properties
Language drivers
Graph stores
Less about the volume of data or availability
Path finding**
Deep joins**
Use in any case where the relationship between the data is just as important as the data itself.
**where a real time response is needed
Cypher
Introducing Cypher
Cypher: matching patterns
Cypher: nodes
() or (n)
Surround with parentheses
(n:Label)
Specify a Label, starting with a colon :
Cypher: relationships
--> or -[r:TYPE]->
Wrapped in hyphens and square brackets
<>
Specify the direction of the relationships
Cypher: patterns
Cypher: restaurant recommendations
Friends, restaurants in cities, their cuisines, and restaurants liked by people
Find Sushi restaurants in New York liked by Philip’s friends
Four connected facts:
MATCH (philip:Person {name: 'Philip'}),
      (philip)-[:IS_FRIEND_OF]-(friend),
      (restaurant:Restaurant)-[:LOCATED_IN]->(:City {name: 'New York'}),
      (restaurant)-[:SERVES]->(:Cuisine {name: 'Sushi'}),
      (friend)-[:LIKES]->(restaurant)
RETURN restaurant.name, collect(friend.name) AS likers, count(*) AS occurrence
ORDER BY occurrence DESC
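To show what the pattern asks for, here is a rough in-memory evaluation of the same question in Python. It is a sketch under simplifying assumptions: nodes are identified by name only, labels are ignored, each friend is counted once, and all sample data (people, restaurants, cuisines) is made up.

```python
from collections import defaultdict

# (start, TYPE, end) triples; all names below are made-up sample data.
edges = [
    ("Philip", "IS_FRIEND_OF", "Emil"),
    ("Andreas", "IS_FRIEND_OF", "Philip"),
    ("Philip", "IS_FRIEND_OF", "Zoe"),
    ("iSushi", "LOCATED_IN", "New York"),
    ("iSushi", "SERVES", "Sushi"),
    ("Zushi Zam", "LOCATED_IN", "New York"),
    ("Zushi Zam", "SERVES", "Sushi"),
    ("Pasta Place", "LOCATED_IN", "New York"),
    ("Pasta Place", "SERVES", "Italian"),
    ("Emil", "LIKES", "iSushi"),
    ("Andreas", "LIKES", "iSushi"),
    ("Zoe", "LIKES", "Zushi Zam"),
    ("Zoe", "LIKES", "Pasta Place"),
]

def has_edge(start, rel_type, end):
    return (start, rel_type, end) in edges

def recommend(person, city, cuisine):
    # (philip)-[:IS_FRIEND_OF]-(friend) has no arrow, so match both directions.
    friends = {e for s, t, e in edges if t == "IS_FRIEND_OF" and s == person}
    friends |= {s for s, t, e in edges if t == "IS_FRIEND_OF" and e == person}
    likers = defaultdict(list)
    for friend, rel_type, restaurant in edges:
        if (rel_type == "LIKES" and friend in friends
                and has_edge(restaurant, "LOCATED_IN", city)
                and has_edge(restaurant, "SERVES", cuisine)):
            likers[restaurant].append(friend)
    # One row per restaurant: name, likers, occurrence; most liked first.
    rows = [(r, sorted(names), len(names)) for r, names in likers.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

result = recommend("Philip", "New York", "Sushi")
# result == [("iSushi", ["Andreas", "Emil"], 2), ("Zushi Zam", ["Zoe"], 1)]
```

The Cypher query states the shape of the answer declaratively; the loops and lookups above are what the application would otherwise have to spell out by hand.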
Cypher in a nutshell
// Pattern matching
MATCH (me:Person)-[:FRIEND]->(friend)
// Filtering with predicates
WHERE me.name = "Frank Black"
AND friend.age > me.age
// Projection of expressions
RETURN toUpper(friend.name) AS name, friend.title AS title
// Path binding
MATCH p=(a)-[:ONE]-()-[:TWO]-()-[:THREE]-()
Designing a query language: what is involved?
Syntax
Semantics
Academic research
Compare and contrast with SQL, SPARQL, ...
Designing a query language: considerations
(node1)-[:RELATIONSHIP]->(node2)
Keywords
Suitability e.g. CREATE or ADD
Symmetry e.g. ADD and DROP
Delimiters
Do not reuse “(”, “[”...
Consistent behaviour with existing implementation
Complexity
Ensure the constructs are future-proof
openCypher...
openCypher implementations
openCypher
opencypher.org
Consensus-based system
openCypher website
Blog
New Features
Upcoming Meetings
Artifacts
Language Artifacts
github.com/openCypher
Cypher 9 reference
Style Guide
SIGMOD 2018
http://homepages.inf.ed.ac.uk/pguaglia/papers/sigmod18.pdf
TCK
Scenario: Optionally matching named paths
  Given an empty graph
  And having executed:
    """
    CREATE (a {name: 'A'}), (b {name: 'B'}), (c {name: 'C'})
    CREATE (a)-[:X]->(b)
    """
  When executing query:
    """
    MATCH (a {name: 'A'}), (x)
    WHERE x.name IN ['B', 'C']
    OPTIONAL MATCH p = (a)-->(x)
    RETURN x, p
    """
  Then the result should be:
    | x             | p                                   |
    | ({name: 'B'}) | <({name: 'A'})-[:X]->({name: 'B'})> |
    | ({name: 'C'}) | null                                |
  And no side effects

Background:
  Given any graph

Scenario: Creating a node
  When executing query:
    """
    CREATE ()
    """
  Then the result should be empty
  And the side effects should be:
    | +nodes | 1 |

based on slide by M. Rydberg
Language specification and improvements
Cypher 9 reference
Query composition
X, (likes.hates)*(eats|drinks)+, Y
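One way to read the expression above is as a regular path query: find pairs (X, Y) joined by a path whose sequence of relationship labels matches the regular expression. The sketch below assumes that "." means concatenation, uses a made-up graph, and simply enumerates bounded-length paths, which is a brute-force approach rather than what a query engine would do.

```python
import re

# (source, label, target) triples; made-up sample graph.
edges = [
    ("ann", "likes", "bob"),
    ("bob", "hates", "cat"),
    ("cat", "eats", "fish"),
    ("ann", "drinks", "tea"),
    ("bob", "eats", "cake"),
]

# One character per relationship label, so a path becomes a string.
codes = {"likes": "l", "hates": "h", "eats": "e", "drinks": "d"}
# (likes.hates)*(eats|drinks)+ written over the one-character codes.
pattern = re.compile(r"(lh)*(e|d)+")

def matching_pairs(max_length=6):
    """Return (X, Y) pairs joined by a path whose labels match the pattern."""
    found = set()

    def walk(start, current, word):
        if pattern.fullmatch(word):
            found.add((start, current))
        if len(word) == max_length:
            return  # bound the search so cyclic graphs terminate
        for source, label, target in edges:
            if source == current:
                walk(start, target, word + codes[label])

    for node in {source for source, _, _ in edges}:
        walk(node, node, "")
    return found

pairs = matching_pairs()
```

For this sample graph, `("ann", "fish")` matches via likes, hates, eats, while `("ann", "cake")` does not because likes is not followed by hates.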
Path patterns
PATH PATTERN
older_friends = (a)-[:FRIEND]-(b) WHERE b.age > a.age
MATCH p=(me)-/~older_friends+/-(you)
WHERE me.name = $myName AND you.name = $yourName
RETURN p AS friendship
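The intent of the query above can be sketched in Python: a named one-step pattern (`older_friends`) is repeated one or more times to connect two people. The ages and friendships are made-up sample data, and the function names are hypothetical; this illustrates the idea only, not the proposed Cypher semantics in full.

```python
people = {"me": 25, "ann": 30, "bob": 28, "you": 40}  # name -> age (made up)
friend_edges = [("me", "ann"), ("bob", "me"), ("ann", "you"), ("bob", "you")]

def older_friends(a):
    """One step of the named pattern: FRIEND in either direction, b older than a."""
    for x, y in friend_edges:
        for p, q in ((x, y), (y, x)):
            if p == a and people[q] > people[p]:
                yield q

def friendship_paths(start, end):
    """All paths from start to end made of one or more older_friends steps."""
    paths = []

    def extend(path):
        # Ages strictly increase along a path, so no node can repeat.
        for nxt in older_friends(path[-1]):
            if nxt == end:
                paths.append(path + [nxt])
            else:
                extend(path + [nxt])

    extend([start])
    return paths

found = friendship_paths("me", "you")
```

Naming the step once and then repeating it keeps the predicate (`b.age > a.age`) attached to every hop, which a plain variable-length relationship pattern cannot express as directly.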
Thank you!