09 - Introduction to Graph Data Model
09 - Introduction to Graph Data Model
Introduction to the
Graph Data Model
Mark Fontenot, PhD
Northeastern University
Material referenced from Graph Algorithms - Practical Examples in Apache Spark and Neo4j by Needham and Hodler (O’Reilly Press, 2019)
What is a Graph Database
- Data model based on the graph data structure
- Composed of nodes and edges
- edges connect nodes
- each is uniquely identified
- each can contain properties (e.g. name, occupation, etc)
- supports queries based on graph-oriented operations
- traversals
- shortest path
- lots of others
2
Where do Graphs Show up?
- Social Networks
- yes… things like Instagram,
- but also… modeling social interactions in fields like
psychology and sociology
- The Web
- it is just a big graph of “pages” (nodes) connected by
hyperlinks (edges)
- Chemical and biological data
- systems biology, genetics, etc.
- interaction relationships in chemistry
3
Basics of Graphs and Graph Theory
4
What is a graph?
Labeled Property Graph
- Composed of a set of node (vertex) objects and
relationship (edge) objects
- Labels are used to mark a node as part of a group
- Properties are attributes (think KV pairs) and can
exist on nodes and relationships
- Nodes with no associated relationships are OK.
Edges not connected to nodes are not permitted.
5
Example
2 Labels:
- person
- car
4 relationship types:
- Drives
- Owns
- Lives_with
- Married_to
Properties
6
Paths
A path is an ordered sequence of nodes connected by
edges in which no nodes or edges are repeated.
1 3
2
Ex: 1 → 2 → 6 → 5
6 5
4
Not a path:
1→2→6→2→3
7
Flavors of Graphs
Connected (vs. Disconnected) – there is a path between
any two nodes in the graph
Weighted (vs. Unweighted) – edge has a weight property
(important for some algorithms)
Directed (vs. Undirected) – relationships (edges) define a
start and end node
Acyclic (vs. Cyclic) – Graph contains no cycles
8
Connected vs. Disconnected
9
Weighted vs. Unweighted
10
Directed vs. Undirected
11
Cyclic vs Acyclic
12
Sparse vs. Dense
13
Trees
14
Types of Graph Algorithms - Pathfinding
- Pathfinding
- finding the shortest path between two nodes, if one exists, is
probably the most common operation
- “shortest” means fewest edges or lowest weight
- Average Shortest Path can be used to monitor efficiency and
resiliency of networks.
- Minimum spanning tree, cycle detection, max/min flow… are
other types of pathfinding
15
BFS vs DFS
16
Shortest Path
17
Types of Graph Algorithms - Centrality &
Community Detection
- Centrality
- determining which nodes are “more important” in a network
compared to other nodes
- EX: Social Network Influencers?
- Community Detection
- evaluate clustering or partitioning of nodes of a graph and
tendency to strengthen or break apart
18
Centrality
19
Some Famous Graph Algorithms
- Dijkstra’s Algorithm - single-source shortest path
algo for positively weighted graphs
- A* Algorithm - Similar to Dijkstra’s with added
feature of using a heuristic to guide traversal
- PageRank - measures the importance of each node
within a graph based on the number of incoming
relationships and the importance of the nodes from
those incoming relationships
20
Neo4j
- A Graph Database System that supports both
transactional and analytical processing of
graph-based data
- Relatively new class of no-sql DBs
- Considered schema optional (one can be imposed)
- Supports various types of indexing
- ACID compliant
- Supports distributed computing
- Similar: Microsoft CosmoDB, Amazon Neptune
21
??
22