NOSQL, Graph Databases & Cypher

Dr. Petra Selmer will give a presentation about NOSQL databases, graph databases, and the Cypher query language. She is an engineer at Neo4j who works on the Cypher query language and manages the openCypher project. The presentation will provide an overview of the database landscape, introduce property graph and Neo4j databases, and discuss the evolution of the Cypher query language.

NOSQL, graph databases & Cypher
Advances in Data Management, 2018

Dr. Petra Selmer


Engineer at Neo4j and member of the openCypher Language Group
1
About me
Member of the Cypher Language Group
Design new features for Cypher

Manage the openCypher project

Engineer at Neo4j
Work on the Cypher Features Team

Maintainer of the Cypher chapter in the Neo4j Developer Manual

PhD in flexible querying of graph-structured data (Birkbeck, University of London)

2
Agenda

The wider landscape


NOSQL in brief
Introduction to property graph databases (in particular Neo4j)
The Cypher query language
Evolving Cypher

3
Preamble

The area is HUGE


The area is ever-changing!

4
The wider landscape: 2012

Matthew Aslett, The 451 Group


5
The wider landscape: 2016

Matthew Aslett, The 451 Group


6
The wider landscape

Several dimensions in one picture:


Relational vs. Non-relational

Analytic (batch, offline) vs. Operational (transactional, real-time)

Increasingly difficult to categorise these data stores:


Everyone is now trying fiercely to integrate features from databases found in other spaces.

The emergence of “multi-model” data stores:


One may start with one data model and add other models as new requirements emerge.

7
A brief tour of NOSQL

8
NOSQL: non-relational

NOSQL: “Not Only SQL”, not “No SQL”


Basically means “not relational” – however this also doesn't quite
apply, because graph data stores are very relational; they just track
different forms of relationships than a traditional RDBMS.
A more precise definition would be the union of different data
management systems differing from Codd’s classic relational model

9
NOSQL: non-relational

The name is not a particularly good one, because some of these stores support
SQL, and SQL is really orthogonal to the capabilities of these systems.
However, it is tricky to find a suitable name.
A good way to think of these is as “the rest of the databases that
solve the rest of our problems”
Scalability:
Horizontal (scale out): the addition of more nodes (commodity servers) to a system (cluster) - simple NOSQL stores

Vertical (scale up): the addition of more resources – CPU, memory – to a single machine
10
Non-relational vs. relational

What’s wrong with relational DBs? They’re great!


ACID
Enforcement of referential integrity and constraints
SQL
Excellent support by many languages and technology stacks
Excellent tooling
Well-understood operational processes (DBAs): backups, recovery, tuning etc
Good security management (user access, groups etc)
11
Problems with relational

Scaling with large and high-velocity data


‘Big Data’
Expensive / difficult / impossible to scale reads and writes vertically and
horizontally
Complexity of data
Impedance mismatch
Performance issues (joins)
Difficult to develop and maintain
12
Problems with relational

Schema flexibility and evolution


Not trivial
Application downtime

13
Non-relational

Not intended as a replacement for RDBMS


One size doesn’t fit all
Use the right tool for the job

14
Non-relational

Today's data problems are getting complicated: the scalability,
performance (low latency), and volume needs are greater.
In order to solve these problems, we're going to have to use an
alternative data store or use more than one database technology.

15
Relational vs. Aggregate Data Model

Relational
Data are divided into rows (tuples) with pre-defined columns
(attributes)
There is no nesting of tuples
There is no list of values
Aggregate
Think of this as a collection of related objects, which should be
treated as a unit
16
Relational vs. Aggregate Data Model

17
Non-relational families

18
Non-relational families
| Store | Key/Value | Column | Document | Graph |
| Design | Key/Value pairs; indexed by key | Columns and Column Families. Directly accesses the column values | Multiple Key/Value pairs form a document. Values may be nested documents or lists as well as scalar values | Focus on the connections between data and fast navigation through these connections |
| Scalability | +++ | +++ | ++ | ++ |
| Aggregate-oriented | Yes | Yes | Yes | No |
| Complexity | + | ++ | ++ | +++ |
| Inspiration/Relation | Berkeley DB, Memcached, Distributed Hashmaps | SAP Sybase IQ, Google BigTable | Lotus Notes | Graph theory |
| Products | Voldemort, Redis, Riak(?) | HBase, Cassandra, Hypertable | MongoDB, Couchbase | Neo4j, DataStax Enterprise Graph |
19
Non-relational families

20
Key/Value stores
A key-value store is a simple hash table
Generally used when all access to the data is via a primary key

Simplest non-relational data store

Value is a BLOB: the data store does not care or necessarily know what is ‘inside’

Use cases
Storing Session Information

User Profiles, Preferences

Shopping Cart Data

Sensor data, log data, serving ads


21
Key/Value stores
Strengths

Simple data model

Great at scaling out horizontally for reads and writes

Scalable

Available

No database maintenance required when adding / removing columns

Weaknesses

Simplistic data model – moves a lot of the complexity of the application into the application layer itself

Poor for complex data

Querying is simply by a given key: more complex querying not supported


22
Column stores
Rows are split across multiple nodes through sharding on the primary key

A big table, with column families. Column families are groups of related
data, often accessed together

Example (see diagram):


One row for Customer 1234

Customer table partitioned into 2 column families: profile and orders

Each column family has columns (e.g. name and payment) and supercolumns
(have a name and an arbitrary number of associated columns)
Source: NOSQL Distilled
Each column family may be treated as a separate table in terms of sharding:

Profile for Customer 1234 may be on Node 1, orders for Customer 1234 may be on
Node 2
23
Column stores

Use cases
Logging and customer analytics
Event Logging
Counters
Smart meters and monitoring
Sensor data

24
Column stores
Strengths

Data model supports (sparse) semi-structured data

Naturally indexed (columns)

Good at scaling out horizontally

Can see results of queries in real time

Weaknesses

Unsuited for interconnected data

Unsuited for complex data reads and querying

Require maintenance – when adding / removing columns and grouping them

Queries need to be pre-written; no ad-hoc queries defined “on the fly”


25
Document stores

Collections of documents
A document is a key-value collection
Stores and retrieves documents, which can be XML, JSON, BSON, ...
Documents are self-describing, hierarchical tree data structures
which can consist of maps, collections and scalar values, as well as
nested documents
Documents stored are similar to each other but do not have to be
exactly the same
26
Document stores
Use cases

High Volume Data Feeds

Tick Data capture

Risk Analytics & Reporting

Product Catalogs & Trade Capture

Portfolio and Position Reporting

Reference Data Management

Portfolio Management

Quantitative Analysis

Automated Trading
27
Document stores

Strengths
Simple but powerful data model – able to express nested structures
Good scaling (especially if sharding supported)
No database maintenance required to add / remove ‘columns’
Powerful query expressivity (especially with nested structures) – able
to pose fairly sophisticated queries
Weaknesses
Unsuited for interconnected data
28
29
30
Graph stores
“Odd man out” in the non-relational group: not aggregate-oriented

Designed for COMPLEX data – richer data, a lot of expressive power

Data model – nodes and edges:

Nodes

Edges are named relationships between nodes

A query on the graph is also known as traversing the graph: traversing the relationships is very fast

Graph theory:

People talk about Codd’s relational model being mature because it was proposed in 1969: 49 years old.

Euler’s graph theory was proposed in 1736: 282 years old!

Semantic Web technologies: RDF, ontologies, triple stores and SPARQL


31
Graph stores
Strengths

complexity = f(size, variable structure, connectedness)

Powerful data model

Fast

For connected data, can be many orders of magnitude faster than RDBMS

Good, well-established querying models: Cypher, SPARQL and Gremlin

Schema-optional model

Weaknesses

If the data has no / few connections, there is not much benefit in using a graph database

32
Graph stores: use cases
Connected data

Hierarchical data

Recommendation engines, Business intelligence

Network impact analysis, Social computing, Geospatial

Systems management, web of things / Internet of things

Genealogy

Product catalogue, Access Control

Life Sciences and scientific computing (especially bioinformatics)

Routing, Dispatch, Logistics and Location-Based Services

Financial services – finance chain, dependencies, risk management, fraud detection etc. For example, if you want to find out how vulnerable a
company is to a bit of "bad news" for another company, the directness of the relationship can be a critical calculation. Querying this in several SQL
statements takes a lot of code and won't be fast, but a graph store excels at this task.
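
As a rough illustration of the "bad news" example, the directness of the relationship between two companies can be read off the length of the shortest path between them. The sketch below uses Cypher, the query language covered later in this deck. The Company label, the DEPENDS_ON relationship type, the $affected and $source parameters and the 6-hop cap are all assumptions made up for this example; they do not come from the original slides.

```cypher
// Hypothetical schema: (:Company {name})-[:DEPENDS_ON]->(:Company)
MATCH (a:Company {name: $affected}), (b:Company {name: $source})
// Shortest chain of dependencies in either direction, capped at 6 hops
MATCH p = shortestPath((a)-[:DEPENDS_ON*..6]-(b))
RETURN length(p) AS degreesOfSeparation
```

A small result suggests a direct exposure; no row at all means no chain was found within the cap.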
33
Neo4j: a property graph database

34
Verticals

35
Graph stores: Neo4j

Labelled property graph database


https://github.com/opencypher/openCypher/blob/master/docs/property-graph-model.adoc
Four building blocks:
Nodes
Relationships
Properties
Labels

(Thanks to Stefan Plantikow, Tobias Lindaaker & Mark Needham for some of the following slides/images)
36
Graph stores: Neo4j
Nodes
Represent objects in the graph
Can be labelled
Relationships
Relate nodes by type and direction
Properties
Name-value pairs that can go on nodes and relationships
39
Nodes

Used to represent entities and complex value types in
your domain
Can contain properties
Nodes of the same type can have different properties

40
Labels

Every node can have zero or more labels


Used to represent roles (e.g. user, product, company)
Group nodes
Allows us to associate indexes and constraints with
groups of nodes
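
A minimal sketch of attaching an index and a uniqueness constraint to a label. These schema commands are Neo4j-specific, and the syntax shown is the Neo4j 3.x form from around the time of this talk; later Neo4j releases changed it, so the manual for the version in use should be checked. The Person label and the name and email properties are hypothetical.

```cypher
// Neo4j 3.x-era schema commands; Person, name and email are hypothetical.
// Index: speeds up lookups of Person nodes by name.
CREATE INDEX ON :Person(name);
// Constraint: no two Person nodes may share the same email.
CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE;
```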

41
Relationships

Every relationship has a type and a direction


Adds structure to the graph
Provides semantic context for nodes

Can contain properties


Every relationship must have a start node and end node
No dangling relationships

42
Relationships

43
Properties

Each node and relationship may have zero or more
properties
Represent the data: name, age, weight, createdAt etc…
Key-value pairs (a map):
String key: “name”
Typed value: string, number, boolean, lists
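
The four building blocks can be seen together in a short Cypher sketch. The labels, relationship type, property names and values below are invented for illustration.

```cypher
// Two nodes: alice carries two labels, bob only one.
// Nodes sharing a label may carry different properties.
CREATE (alice:Person:Employee {name: 'Alice', age: 34, languages: ['English', 'German']})
CREATE (bob:Person {name: 'Bob'})
// A relationship has exactly one type, a direction, and optional properties.
CREATE (alice)-[:KNOWS {since: 2010}]->(bob)
```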
44
Relational vs. graph models

45
Language drivers

46
Graph stores
Less about the volume of data or availability

More about how your data is related

Densely-connected, variably structured domains**

Lots of join tables? Connectedness**

Lots of sparse tables? Variable structure**

Path finding**

Deep joins**

Use in any case where the relationship between the data is just as important as the data itself.

Don’t use if your data is simple or tabular.

More use cases for graphs at http://neo4j.com/customers/

** where a real-time response is needed


47
Neo4j: Resources
Neo4j Manual: https://neo4j.com/docs/developer-manual/current/
Graph Databases (book available online at www.graphdatabases.com)
Getting started: http://neo4j.com/developer/get-started/
Online training: http://neo4j.com/graphacademy/
Meetups (last Wed of the month) at http://www.meetup.com/graphdb-london (free talks
and training sessions)

48
Cypher

49
Introducing Cypher

Declarative graph pattern matching language


SQL-like syntax
ASCII art based
Able to read and mutate the data, as well as perform
various aggregate functions such as count and so on

50
Cypher: matching patterns

51
Cypher: nodes

() or (n)
Surround with parentheses

Use an alias n to refer to our node later in the query

(n:Label)
Specify a Label, starting with a colon :

Used to group nodes by roles or types (similar to tags)

(n:Label {prop: 'value'})


Nodes can have properties
52
Cypher: relationships

--> or -[r:TYPE]->
Wrapped in hyphens and square brackets

A relationship type starts with a colon :

<>
Specify the direction of the relationships

-[:KNOWS {since: 2010}]->


Relationships can have properties
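
A small sketch of a relationship pattern used in a query, assuming hypothetical Person nodes connected by KNOWS relationships that carry a since property:

```cypher
// Bind the relationship to r so that its properties can be tested and returned.
MATCH (a:Person)-[r:KNOWS]->(b:Person)
WHERE r.since < 2015
RETURN a.name AS person, b.name AS knows, r.since AS since
```

Dropping the arrowhead, as in (a)-[r:KNOWS]-(b), matches the relationship in either direction.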

53
Cypher: patterns

Used to query data


(n:Label {prop: 'value'})-[:TYPE]->(m:Label)

54
Cypher: patterns

Find Alice who knows Bob


In other words:
find Person with the name ‘Alice’
who KNOWS
a Person with the name ‘Bob’

(p1:Person {name: 'Alice'})-[:KNOWS]->(p2:Person {name: 'Bob'})
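
A pattern on its own is not yet a query. As a sketch, the Alice-knows-Bob pattern could be placed in a MATCH clause and the bound nodes returned; the second statement is an equivalent formulation with the property tests moved into WHERE.

```cypher
// Property tests written inline in the pattern
MATCH (p1:Person {name: 'Alice'})-[:KNOWS]->(p2:Person {name: 'Bob'})
RETURN p1, p2;

// Equivalent form with the property tests moved into WHERE
MATCH (p1:Person)-[:KNOWS]->(p2:Person)
WHERE p1.name = 'Alice' AND p2.name = 'Bob'
RETURN p1, p2;
```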

55
Cypher: restaurant recommendations
Friends, restaurants in cities, their cuisines, and restaurants liked by people

56
Cypher: restaurant recommendations
Find Sushi restaurants in New York liked by Philip’s friends
Four connected facts:

1. People who are friends of Philip


2. Restaurants located in New York
3. Restaurants serving Sushi
4. Restaurants liked by Philip’s Friends

57
Cypher: restaurant recommendations
MATCH (philip:Person {name: 'Philip'}),
      (philip)-[:IS_FRIEND_OF]-(friend),
      (restaurant:Restaurant)-[:LOCATED_IN]->(:City {name: 'New York'}),
      (restaurant)-[:SERVES]->(:Cuisine {name: 'Sushi'}),
      (friend)-[:LIKES]->(restaurant)
RETURN restaurant.name, collect(friend.name) AS likers, count(*) AS occurrence
ORDER BY occurrence DESC

| restaurant.name | likers | occurrence |
| iSushi | [Michael, Andreas] | 2 |
| Zushi Zam | [Andreas] | 1 |

58
Cypher in a nutshell
// Pattern matching
MATCH (me:Person)-[:FRIEND]->(friend)
// Filtering with predicates
WHERE me.name = "Frank Black"
AND friend.age > me.age
// Projection of expressions
RETURN toUpper(friend.name) AS name, friend.title AS title

// Data creation and manipulation


CREATE (you:Person)
SET you.name = "Aaron Fletcher"
CREATE (you)-[:FRIEND]->(me)

// Sequential query composition and aggregation


MATCH (me:Person {name: $name})-[:FRIEND]-(friend)
WITH me, count(friend) AS friends
MATCH (me)-[:ENEMY]-(enemy)
RETURN friends, count(enemy) AS enemies

based on slide by T. Lindaaker
59
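
One detail of the composition example: for a person who has friends but no enemies, the second MATCH finds nothing and the query returns no row at all. If a zero count is wanted instead (an assumption about the intent, not something the slide states), a possible variation uses OPTIONAL MATCH:

```cypher
MATCH (me:Person {name: $name})-[:FRIEND]-(friend)
WITH me, count(friend) AS friends
// OPTIONAL MATCH binds enemy to null when there is no match,
// and count() skips nulls, so enemies is 0 rather than the row disappearing.
OPTIONAL MATCH (me)-[:ENEMY]-(enemy)
RETURN friends, count(enemy) AS enemies
```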


Cypher patterns in a nutshell
// Node patterns
MATCH (), (node), (node:Node), (:Node), (node {type:"NODE"})

// Rigid relationship patterns


MATCH ()-->(), ()-[edge]->(),
()-[edge:RELATES]->(),
()-[:RELATES]->(),
()-[edge {score:5}]->(),
(a)-[edge]->(b),
(a)<-[edge]-(b), (a)-[edge]-(b)

// Variable length relationship patterns


MATCH (me)-[:FRIEND*]-(foaf)
MATCH (me)-[:FRIEND*1..3]-(foaf)

// Path binding
MATCH p=(a)-[:ONE]-()-[:TWO]-()-[:THREE]-()

based on slide by T. Lindaaker
60
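
A sketch that combines a variable length pattern with path binding in one complete query. The Person label, the name property and the $name parameter are assumptions for the example.

```cypher
// Friends up to three FRIEND hops away, with the fewest hops needed to reach each name.
MATCH p = (me:Person {name: $name})-[:FRIEND*1..3]-(foaf:Person)
WHERE foaf <> me
RETURN foaf.name AS name, min(length(p)) AS hops
ORDER BY hops, name
```

Here length(p) is the number of relationships in the bound path, and min() keeps the shortest of the possibly many paths to the same person.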


Evolving Cypher

61
Designing a query language: what is involved?

Syntax
Semantics
Academic research
Compare and contrast with SQL, SPARQL, ...

62
Designing a query language: considerations
(node1)-[:RELATIONSHIP]->(node2)
Keywords
Suitability e.g. CREATE or ADD
Symmetry e.g. ADD and DROP
Delimiters
Do not reuse “(”, “[”...
Consistent behaviour with existing implementation
Complexity
Ensure the constructs are future-proof
63
openCypher...

...is a community effort to evolve Cypher, and make
it the de-facto language for querying property
graphs

openCypher implementations

SAP, Redis, Agens Graph, Cypher.PL, Neo4j, CAPS, CoG, ...

64
openCypher
opencypher.org

openCypher Implementers Group (oCIG)

Evolve Cypher through an open process

Comprises vendors, researchers, implementers, interested parties

Regular meetings to discuss and agree upon new features

Consensus-based system

65
openCypher website

Blog

New Features

Upcoming Meetings

Recordings and Slides

References (Links, Papers)

Artifacts
66
Language Artifacts
github.com/openCypher
Cypher 9 reference

ANTLR and EBNF Grammars

Formal Semantics (SIGMOD, to be published here)

Technology Compatibility Kit (TCK): Cucumber test suite

Style Guide

Implementations & Code

openCypher for Apache Spark

openCypher for Gremlin

open source frontend (part of Neo4j, to be published here)
67
Formal Semantics

SIGMOD 2018

http://homepages.inf.ed.ac.uk/pguaglia/papers/sigmod18.pdf

68
TCK
Scenario: Optionally matching named paths
Given an empty graph
And having executed:
"""
CREATE (a {name: 'A'}), (b {name: 'B'}), (c {name: 'C'})
CREATE (a)-[:X]->(b)
"""
When executing query:
"""
MATCH (a {name: 'A'}), (x)
WHERE x.name IN ['B', 'C']
OPTIONAL MATCH p = (a)-->(x)
RETURN x, p
"""
Then the result should be:
| x | p |
| ({name: 'B'}) | <({name: 'A'})-[:X]->({name: 'B'})> |
| ({name: 'C'}) | null |
And no side effects

Background:
Given any graph

Scenario: Creating a node
When executing query:
"""
CREATE ()
"""
Then the result should be empty
And the side effects should be:
| +nodes | 1 |

based on slide by M. Rydberg
69
Language specification and improvements

Cypher 9 reference

Cypher Improvement Request (CIR)

Cypher Improvement Proposal (CIP)

Next version: Cypher 10


70
Upcoming Cypher features

71
Query composition

"Meaning of the whole is determined by the meanings of its constituents


and the rules used to combine them"

Organize a query into multiple parts

Extract parts of a query to a view for re-use

Replace parts of a query without affecting other parts

Build complex workflows programmatically

based on slide by S. Plantikow
72


Implications for Cypher
Pass both multiple graphs and tabular data into a query

Return both multiple graphs and tabular data from a query

Select which graph to query

Construct new graphs from existing graphs

based on slide by S. Plantikow
73


Cypher query pipeline composition

based on slide by S. Plantikow
74


Complex path patterns

Regular path queries

X, (likes.hates)*(eats|drinks)+, Y

Inclusion of node and relationship tests
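
For comparison, a sketch of how much of the regular path query above the existing variable length syntax can cover. The (eats|drinks)+ part maps onto a variable length relationship pattern with alternative types. The (likes.hates)* part repeats a two-step sequence, which, as far as the Cypher 9 syntax shown earlier goes, has no direct equivalent because only a single relationship pattern can be repeated; that gap is the motivation for complex path patterns. The upper-case relationship types are assumed names.

```cypher
// (eats|drinks)+ : one or more EATS or DRINKS relationships in a row.
MATCH (x)-[:EATS|DRINKS*1..]->(y)
RETURN DISTINCT x, y
```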

75
Path patterns
PATH PATTERN older_friends = (a)-[:FRIEND]-(b) WHERE b.age > a.age

MATCH p=(me)-/~older_friends+/-(you)
WHERE me.name = $myName AND you.name = $yourName
RETURN p AS friendship

based on slide by T. Lindaaker
76


Getting involved
Please follow news at opencypher.org and @opencypher on twitter

There's a great slack channel for implementers

Next openCypher Implementer Group call on Wednesday, 14 March

Language change request issues (CIRs) and full proposals (CIPs)

Own ideas? Talk to us! Or create a Pull Request at
https://github.com/opencypher/openCypher

77
Thank you!

78
