DATA MODELING WITH GRAPH DATABASES
Ross McNeely
Principal Consultant, Practice Manager, Business Intelligence
“Data Junkie”
-- Speaker contact details, stored in a session-scoped temp table
CREATE TABLE #Information
(Info_Type VARCHAR(25)
,Info_Value VARCHAR(50));
INSERT INTO #Information VALUES
('Name','Ross McNeely')
,('Email','rmcneely@tailwindbi.com')
,('Company','Tail Wind Informatics')
,('CompanySite','www.tailwindtech.com')
,('LinkedIn','www.linkedin.com/in/rossmcneely')
,('Blog','www.mcneelydwbi.wordpress.com');
SELECT Info_Type, Info_Value FROM #Information;
SPEAKER BIO
Ross McNeely is the Principal Consultant and BI Practice Manager at Tail Wind Informatics.
Ross has been working with the MS SQL Server BI stack for over a decade.
Enterprise Information Management and Business Intelligence are Ross's primary focus.
Business Intelligence Solutions
“Go Farther, Faster”
HTTP://TAILWINDTECH.COM
Agenda
• Introduction to the Graph Model (15 min)
• Data Modeling with Graph Databases (15 min)
• Relational and Graph Models (10 min)
• Healthcare Use Case (20 min)
• Deeper Dive into Graph Databases (20 min)
• Logistics Use Case (15 min)
• Security Use Case (15 min)
• Summary (5 min)
Introduction to the Graph Model
• Defining the Graph Database
• Overview of the Graph Market
• Benefits of the Graph Data Model
DEFINING THE GRAPH DATABASE
INTRODUCTION TO THE GRAPH MODEL
NoSQL Primary Groupings
• Key Value
• Column Store
• Document
• Graph
Graph Defined:
“Formally, a graph is just a collection of vertices and edges - or, in less intimidating language, a set of nodes and the relationships that connect them.” [1]
Less Formally Defined:
- A graph is a set of nodes, relationships, and properties.
- A network of connected objects.
Property Graph
• Nodes (“vertices”)
• Relationships (“edges”)
• Properties
Nodes
• Nodes represent entities.
Nodes contain properties. Think of nodes as documents that store properties in the form of arbitrary key-value pairs.
name: bode miller
Relationships
• Relationships are the lines between nodes.
Relationships connect and structure nodes.
Olympic_Address
Properties
• Properties are values about the node or relationship.
Properties can be added to nodes and relationships.
This lets you add extra semantics to relationships.
name: bode miller
Address
Type: Olympic
Address: 123 Fake Street
Basic Graph
Legend: Node, Property, Relationship
Ross knows Jack
Ross knows Megan
Jack knows Megan
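The basic graph above can be written down as plain data. This is only a minimal Python sketch: the direction of each knows relationship is an assumption (the slide shows lines, not arrows), and the names `nodes`, `relationships` and `neighbors` are made up for illustration.

```python
# Nodes keyed by an id; each node carries arbitrary key-value properties.
nodes = {
    "ross": {"name": "Ross"},
    "jack": {"name": "Jack"},
    "megan": {"name": "Megan"},
}

# Relationships as (start, type, end, properties) tuples.
# Directions are an assumption; the slide only shows "knows" lines.
relationships = [
    ("ross", "knows", "jack", {}),
    ("ross", "knows", "megan", {}),
    ("jack", "knows", "megan", {}),
]


def neighbors(node_id, rel_type):
    """Return names of nodes reached from node_id over rel_type."""
    return sorted(
        nodes[end]["name"]
        for start, kind, end, _props in relationships
        if start == node_id and kind == rel_type
    )


print(neighbors("ross", "knows"))
```

Adding a person or a new kind of relationship means adding an entry to one of the two collections; nothing else changes shape.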
OVERVIEW OF THE GRAPH MARKET
Graph products (chart areas: Graph Processing, Graph Storage):
ArangoDB, Trinity, Neo4j, BigData, Bitsy, BrightstarDB, DEX/Sparksee, Filament, GraphBase,
Horton, HyperGraphDB, FlockDB, Allegro, OpenLink, R2DF, Titan, VelocityGraph, VertexDB
Property Graph: Neo4j
Triples*: AllegroGraph
Hypergraph: HyperGraphDB
*Triple stores come from the Semantic Web movement. A triple is a subject-predicate-object data structure.
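To make the subject-predicate-object idea concrete, here is a small Python sketch. It is not how any particular triple store works; the facts and the `match` helper are hypothetical and only show the shape of the data.

```python
# Each triple is a (subject, predicate, object) tuple.
# The facts below are illustrative, not taken from a real triple store.
triples = [
    ("Ross", "knows", "Jack"),
    ("Ross", "knows", "Megan"),
    ("Jack", "knows", "Megan"),
]


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


print(match(obj="Megan"))
```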
BENEFITS OF THE GRAPH MODEL
What does the graph database offer?
• It is an agile modeling approach
• No pre-defined schema
• General-purpose graph data schema
• Ease of use with the business
Performance
• Performance increases when dealing with connected data.
Flexibility
• We can add nodes and relationships as the business domain dictates.
Agility
• Fits agile and test-driven software development practices.
Data Modeling with Graph Databases
• Why Data Model with a Graph Database?
• Graph Modeling
WHY DATA MODEL WITH A GRAPH DATABASE
Q: Why did I want to use a graph database?
A: Here is the simplified version of my requirements.
• Requirement #1: It is all about the relationships.
• Requirement #2: First learn requirement #1.
Graph Structure
Nodes
• Name: Ross, Age: 34, Label: Member
• Name: Jack, Age: 7, Label: Member
• Name: Martial Arts, Type: Activity
Relationships
• Label: isMember, Since: 1/20/2014
• Label: isMember, Since: 6/15/2013
• Label: Knows, Since: 5/20/2006
• Label: Knows, Since: 5/20/2008
GRAPH MODELING
The Modeling Half: Graph
The Database Half: CRUD
CRUD Matrix
Function\Entity   Appointment
Enter             C
Confirm           RU
Cancel            D
Graph Database:
• “A graph database management system (G-DBMS) is an online database management system with Create, Read, Update, and Delete (CRUD) methods that expose a graph data model.” [1]
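As a rough illustration of "CRUD methods that expose a graph data model", the Python sketch below keeps a toy graph in memory. The class `TinyGraph` and its method names are hypothetical; a real graph database adds persistence, transactions and a query language on top of this idea.

```python
class TinyGraph:
    """In-memory toy graph with CRUD methods; not a real G-DBMS."""

    def __init__(self):
        self.nodes = {}          # node_id -> properties dict
        self.relationships = []  # (start_id, rel_type, end_id)

    def create_node(self, node_id, **properties):
        self.nodes[node_id] = dict(properties)

    def create_relationship(self, start_id, rel_type, end_id):
        if start_id not in self.nodes or end_id not in self.nodes:
            raise KeyError("both end nodes must exist")
        self.relationships.append((start_id, rel_type, end_id))

    def read_node(self, node_id):
        return self.nodes[node_id]

    def update_node(self, node_id, **properties):
        self.nodes[node_id].update(properties)

    def delete_node(self, node_id):
        # Deleting a node also removes relationships that touch it.
        del self.nodes[node_id]
        self.relationships = [
            r for r in self.relationships if node_id not in (r[0], r[2])
        ]


g = TinyGraph()
g.create_node("ross", name="Ross", age=34)
g.create_node("jack", name="Jack")
g.create_relationship("ross", "knows", "jack")
g.update_node("jack", age=7)
print(g.read_node("jack"))
```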
Graph Modeling Rules: “By the book” [1]
• Nodes for Things, Relationships for Structure
• Use nodes to represent entities, that is, the things that are of interest
• Use relationships to build structure:
  • Express connections between entities
  • Establish semantic context for each entity
• Use node properties to represent entity attributes, plus metadata
• Use relationship properties to express the strength, weight, or quality of a relationship, plus metadata.
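The rules above can be tried on the member data from the Graph Structure slide. In this Python sketch the node properties hold entity attributes and the `since` relationship property holds the quality of the connection. Which member each "Since" date belongs to is an assumption, and `members_by_seniority` is a hypothetical helper.

```python
from datetime import date

# Nodes are things; node properties hold entity attributes.
nodes = {
    "ross": {"label": "Member", "name": "Ross", "age": 34},
    "jack": {"label": "Member", "name": "Jack", "age": 7},
    "martial_arts": {"label": "Activity", "name": "Martial Arts"},
}

# Relationships give structure; their properties qualify the connection.
# Which member each "since" date belongs to is an assumption.
relationships = [
    {"start": "ross", "type": "isMember", "end": "martial_arts",
     "since": date(2014, 1, 20)},
    {"start": "jack", "type": "isMember", "end": "martial_arts",
     "since": date(2013, 6, 15)},
]


def members_by_seniority(activity_id):
    """Names of members of an activity, longest-standing first."""
    rels = [r for r in relationships
            if r["type"] == "isMember" and r["end"] == activity_id]
    rels.sort(key=lambda r: r["since"])
    return [nodes[r["start"]]["name"] for r in rels]


print(members_by_seniority("martial_arts"))
```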
How do you use a graph database?
• Traversal of the database.
Query
• Follow the relationships from node to node
Result Options
• A set
• A path
• A pattern
Set Path Pattern
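The three result options can be sketched with a small traversal in Python. The data and the function names (`reachable`, `shortest_path`, `friend_of_friend`) are illustrative assumptions; a graph database would express these as queries rather than hand-written loops.

```python
from collections import deque

# Adjacency list for a small directed "knows" graph (illustrative data).
knows = {
    "Ross": ["Jack", "Megan"],
    "Jack": ["Megan"],
    "Megan": [],
}


def reachable(start):
    """Set result: every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in knows[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def shortest_path(start, goal):
    """Path result: shortest list of nodes from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in knows[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def friend_of_friend():
    """Pattern result: all (a, b, c) where a knows b and b knows c."""
    return [(a, b, c) for a in knows for b in knows[a] for c in knows[b]]
```

Each function follows relationships from node to node; only the shape of what is returned differs.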
Relational and Graph Models
• The Similarities
• The Differences
RELATIONAL AND GRAPH MODELS
Similarities
• Define and agree upon the domain entities
• Define the interactions and governing rules
• Whiteboard stage is the same
Differences
• Few changes from conceptual to logical to physical
• The graph storage model matches the logical model
• After the initial domain definition we enhance the graph instead of defining the tables.
Relational
Graph
Healthcare Use Case
• Patient Matching
HEALTHCARE USE CASE
Accountable Care Organizations (ACOs)
• Patient Protection and Affordable Care Act of 2010
• Transform health providers into ACOs
What does this boil down to?
• Patient Matching
Patient Matching
• Two specific objectives [6]
  • Identify common attributes
  • Define processes and best practices
Scope of Problem
• Up to 14 percent of medical records contain erroneous data [6]
Master Data Lookup
• FirstName
• LastName
• DOB
• Gender
• SSN
• Address1
PatientMaster
  PK       PatientMasterID
  FK1,FK2  FirstName
  FK1,FK2  LastName
  FK1,FK2  DOB
  FK1,FK2  Gender
  FK1      SSN
  FK1      Address1
PatientSourceB
  PK  FirstName
  PK  LastName
  PK  DOB
  PK  Gender
  PK  SSN
  PK  Address1
PatientSourceC
  PK  FirstName
  PK  LastName
  PK  DOB
  PK  Gender
Normalization
PatientExternal
  PK  PatientExternalID
      FirstNameOriginal
      LastNameOriginal
      DOBOriginal
      GenderOriginal
      SSNOriginal
      Address1Original
PatientSourceRef
  PK   PatientSourceRefID
  FK1  PatientExternalID
  FK2  PatientMasterID
       IsActiveRecord
PatientMaster
  PK       PatientMasterID
  FK1,FK2  FirstName
  FK1,FK2  LastName
  FK1,FK2  DOB
  FK1,FK2  Gender
  FK1      SSN
  FK1      Address1
I created a matching site based on a social graph database example.
Figure: patient-matching graph.
Nodes: Source (A, B, C); Patient (William, Bill, Pat, Pat, Joe); Address; State; DOB; Gender
Relationships: Came_From, Lives_At, Lives_In
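One way to read the patient-matching graph: two patient records from different sources are match candidates when they point at the same attribute nodes. The Python sketch below illustrates that idea only. The attribute values, the key names and the threshold `min_shared` are invented for the example and are not the matching rules used in the talk.

```python
from itertools import combinations

# Patient records as nodes; attribute values are made up for illustration.
patients = {
    "A:William": {"source": "A", "dob": "1970-01-01", "gender": "M",
                  "address": "1 Main St"},
    "B:Bill": {"source": "B", "dob": "1970-01-01", "gender": "M",
               "address": "1 Main St"},
    "C:Joe": {"source": "C", "dob": "1985-05-05", "gender": "M",
              "address": "9 Oak Ave"},
}

MATCH_KEYS = ("dob", "gender", "address")


def shared_attributes(a, b):
    """Attribute nodes that both patient records would point at."""
    return [k for k in MATCH_KEYS if patients[a][k] == patients[b][k]]


def candidate_matches(min_shared=3):
    """Pairs from different sources sharing at least min_shared attributes."""
    return [
        (a, b)
        for a, b in combinations(sorted(patients), 2)
        if patients[a]["source"] != patients[b]["source"]
        and len(shared_attributes(a, b)) >= min_shared
    ]


print(candidate_matches())
```

Lowering `min_shared` widens the candidate list, which is the usual trade-off between missed matches and false matches.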
Deeper Dive into Graph Databases
• Graph Modeling Continued
• Graph Modeling Mistakes
• Patterns
• Misc.
GRAPH MODELING CONTINUED
DEEPER DIVE INTO GRAPH DATABASES
Graph Modeling Guidelines:
• The query patterns drive the data model
• Normalization is a natural trend in graph modeling
• In general normalization has a low cost
• Complexity with normalization will drive traversal speeds up
• The SIP Methodology [2]
• Use in-graph indices for range queries*
• Node and relationship redundancy is not bad.
• Schema development over time
• Database extensions*
*Have not used myself
Graph Modeling Dilemmas:
• Q: Should I create a Relationship or a Property?
• Q: Should every node with the same key/value
(property) be connected?
• A: It depends.
GRAPH MODELING MISTAKES
What was I thinking?
CHAOS
I started without a plan
This is easy!
DESIGN PATTERNS [3]
Linked List
Multiple Relationships
Tags and Categories
Multi Level Tree
R-Tree (spatial)
Activity Stream
Anti-pattern: Unconnected graph
PATTERNS
Pattern: Linked List
Anti-pattern: a single Olympian node with properties
name, country, sport1_name, sport1_rank, sport2_name, sport2_rank, sport3_name, sport3_rank
Pattern: sport nodes chained by Sport_order relationships
• country: usa
• Competes_for
• name: bode miller
• Sport_order
• Name: downhill, Rank: 12
• Name: super-g, Rank: 3
• Name: super combined downhill, Rank: 12
• Name: super combined slalom, Rank: 7
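A small Python sketch of the linked-list pattern follows. Each sport is its own node, and a `next` pointer stands in for the Sport_order relationship. The order of the sports and the helper names are assumptions; the point is that adding another sport does not require a new sportN_name / sportN_rank property on the Olympian.

```python
# Each sport node stores its own properties plus the id of the next node,
# standing in for a Sport_order relationship. The ordering is assumed.
sports = {
    "s1": {"name": "downhill", "rank": 12, "next": "s2"},
    "s2": {"name": "super-g", "rank": 3, "next": "s3"},
    "s3": {"name": "super combined slalom", "rank": 7, "next": None},
}
olympian = {"name": "bode miller", "country": "usa", "first_sport": "s1"}


def sports_in_order(person):
    """Walk the Sport_order chain from the person's first sport."""
    result, current = [], person["first_sport"]
    while current is not None:
        node = sports[current]
        result.append((node["name"], node["rank"]))
        current = node["next"]
    return result


def append_sport(sport_id, name, rank):
    """Add another sport at the end without changing the Olympian node."""
    last = olympian["first_sport"]
    while sports[last]["next"] is not None:
        last = sports[last]["next"]
    sports[sport_id] = {"name": name, "rank": rank, "next": None}
    sports[last]["next"] = sport_id
```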
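Pattern: Multiple Relationships
Anti-pattern: a single Olympian node with properties
name, country, sport1_name, sport1_rank, sport2_name, sport2_rank, sport3_name, sport3_rank
Pattern: several typed relationships between the same nodes
• name: bode miller, Competes_for, country: usa
• name: bode miller, Competes_in (Order: 1), Downhill
• name: bode miller, Placed (Rank: 8), Downhill
• name: bode miller, Competes_in (Order: 2), Super Combined
• name: bode miller, Placed (Rank: 12), Super Combined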
Pattern: Tags and Categories [1]
Anti-pattern: Data Center, Database_server, Application, Virtual Machine, Server, Rack
Pattern:
• Id: App 1, Status: Up/Down
• Id: App 2, Status: Up/Down
• Runs_on
• Id: Vir Machine 15, Status: Up/Down
• Id: Vir Machine 16, Status: Up/Down
• Id: Vir Machine 17, Status: Up/Down
• Hosted_by
• Id: Server 1, Status: Up/Down
• Id: Vir Machine 2, Status: Up/Down
• In
• Id: Rack 1, Status: Up/Down
Pattern: Multi-Level Tree [1]
timeline
  Year 2013
    Month december
      Day 15, on Event A
      Day 25, on Event B
  Year 2014
    Month january
      Day 1, on Event C
      Day 2, on Event D
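The timeline tree can be sketched in Python with nested dictionaries standing in for Year, Month and Day nodes. This is an illustration of the pattern under that assumption, not the book's implementation; `add_event` and `events_in_month` are hypothetical helpers.

```python
# Nested dicts stand in for Year -> Month -> Day nodes; the list at each
# day holds events attached by an "on" relationship.
timeline = {}


def add_event(year, month, day, event):
    """Create missing Year/Month/Day nodes on the way down, then attach."""
    days = timeline.setdefault(year, {}).setdefault(month, {})
    days.setdefault(day, []).append(event)


def events_in_month(year, month):
    """All events under one Month node, in day order."""
    days = timeline.get(year, {}).get(month, {})
    return [e for day in sorted(days) for e in days[day]]


add_event(2013, "december", 15, "Event A")
add_event(2013, "december", 25, "Event B")
add_event(2014, "january", 1, "Event C")
add_event(2014, "january", 2, "Event D")
print(events_in_month(2013, "december"))
```

Finding everything in a month means starting at one Month node and walking down, rather than scanning every event.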
Pattern: Stream Analysis [5]
http://blog.bruggen.com/2013/11/clickstreams-are-so-much-nicer-in-neo4j.html
MISCELLANEOUS
Fine-Grained Relationships
• name: bode miller
• Olympic_Address
• Address: 123 Fake Street
Generic Relationships
• name: bode miller
• Address (Type: Olympic)
• Address: 123 Fake Street
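The difference between the two styles shows up in how a lookup is written. In the Python sketch below the fine-grained version filters on the relationship type alone, while the generic version filters on a `Type` property of the relationship. The tuples and function names are illustrative assumptions.

```python
# Fine-grained: the relationship type itself says what kind of address.
fine_grained = [("bode miller", "Olympic_Address", "123 Fake Street")]

# Generic: one Address type, qualified by a Type property.
generic = [("bode miller", "Address", "123 Fake Street", {"Type": "Olympic"})]


def olympic_addresses_fine(person):
    """Needs only the relationship type."""
    return [end for start, kind, end in fine_grained
            if start == person and kind == "Olympic_Address"]


def addresses_generic(person, address_type=None):
    """All addresses, or only those whose Type property matches."""
    return [end for start, kind, end, props in generic
            if start == person and kind == "Address"
            and address_type in (None, props["Type"])]
```

The generic form makes "give me every address" a single lookup; the fine-grained form makes "give me the Olympic address" a lookup with no property check.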
OLTP
• Graph Databases
• Native Graph Storage
• Index-free adjacency processing
OLAP
• Graph Compute Engines
• Identify clusters in data
• Optimized for scanning and processing large sets
Enterprise Ready
• Monitoring
• Live backups
• High performance caches
• HA clustering
Physical Model
• Joins have a low cost
• Index-free “adjacency of entities”
• Performance is related to the result size
• CONS
  • Tabular Data Items
  • Blobs
Common Use Cases
• Social
• Recommendations
• Geo
• Master Data Management
• Network & Data Center Mgmt
• Authorization and Access Control
Logistics Use Case
• Multiple Picks
• Multiple Drops
LOGISTICS USE CASE
Multiple Picks/Drops
• Carriers need to optimize
• Make multiple pickups
• Make multiple drop-offs
Numerous Examples
• MIT Supply Chain Management
Carrier
  PK  CarrierID
      Name
      SomeAttribute
PickDropRef
  PK   PickDropRefID
  FK1  CarrierID
  FK2  SiteID
       SiteType
       Sequence
Site
  PK  SiteID
      Name
      SomeAttribute
Figure: carrier route graph.
Nodes: Carrier; Package 1, Package 2, Package 3; Pickup A, Drop B, Pickup B, Drop C
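A carrier's multiple picks and drops can be sketched as an ordered set of stops. In this Python example the stop names come from the figure, but the sequence numbers are an assumption (the slide does not give the visiting order), and `route` and `sites_with_both` are hypothetical helpers.

```python
# Stops for one carrier. The sequence numbers are assumed; the slide shows
# the stops but not their order.
stops = [
    {"site": "B", "type": "Drop", "sequence": 2},
    {"site": "A", "type": "Pickup", "sequence": 1},
    {"site": "C", "type": "Drop", "sequence": 4},
    {"site": "B", "type": "Pickup", "sequence": 3},
]


def route(stop_list):
    """Return the stops as 'Type Site' strings in visiting order."""
    ordered = sorted(stop_list, key=lambda s: s["sequence"])
    return [f'{s["type"]} {s["site"]}' for s in ordered]


def sites_with_both(stop_list):
    """Sites where the carrier both drops off and picks up."""
    drops = {s["site"] for s in stop_list if s["type"] == "Drop"}
    picks = {s["site"] for s in stop_list if s["type"] == "Pickup"}
    return sorted(drops & picks)


print(route(stops))
```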
Security Use Case
• Users
SECURITY USE CASE [1]
Authorization & Access Control
• “Ensure that users and administrators see and change only those parts of the organization and the products and services they are entitled to manage.” [1]
“This model comprises two hierarchies. In the first hierarchy, admins within each customer organization are assigned to groups; these groups are then granted various permissions against that organization’s structure.”
“Graph Databases” [1]
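The two hierarchies in the quote can be sketched as a traversal: admin to group, group to the organization units it has been granted, then down the organization structure. The Python below is a simplified illustration. All names and data are hypothetical, and it assumes every permission is inherited by child units, which a real model would make configurable.

```python
# Admins are assigned to groups; groups are granted permissions on parts
# of the organization; org units form a parent -> children hierarchy.
# Assumption: a permission on a unit is inherited by all of its children.
member_of = {"alice": ["group1"], "bob": ["group2"]}
allowed = {"group1": ["acme"], "group2": ["acme_sales"]}
children = {
    "acme": ["acme_sales", "acme_support"],
    "acme_sales": [],
    "acme_support": [],
}


def manageable_units(admin):
    """Org units reachable via admin -> group -> unit -> child units."""
    result = set()
    stack = [u for g in member_of.get(admin, []) for u in allowed.get(g, [])]
    while stack:
        unit = stack.pop()
        if unit not in result:
            result.add(unit)
            stack.extend(children.get(unit, []))
    return result


def can_manage(admin, unit):
    """True when the unit lies under something the admin's groups allow."""
    return unit in manageable_units(admin)


print(sorted(manageable_units("alice")))
```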
Summary
• Graph Modeling
• Graph Databases
• Tail Wind Informatics
• Ross McNeely
REFERENCES
[1] Ian Robinson, Jim Webber, and Emil Eifrem, “Graph Databases” (O’Reilly). Copyright 2013 Neo Technology, Inc., 978-1-449-35626-2.
[2] Roger Sessions, “Controlling Complexity in Enterprise Architectures: The SIP Methodology” (ObjectWatch).
[3] http://www.neo4j.org/develop/modeling (Michael Hunger)
[4] http://en.wikipedia.org/wiki/R-tree
[5] http://blog.bruggen.com/2013/11/clickstreams-are-so-much-nicer-in-neo4j.html
[6] http://www.himss.org/News/NewsDetail.aspx?ItemNumber=22312
General References
https://www.gartner.com/doc/2081316
http://www.neo4j.org/learn/neo4j
http://franz.com/agraph/allegrograph/
http://www.hypergraphdb.org/index
http://scm.mit.edu/research