InfoAdvisors MDM Neo4j Graph
InfoAdvisors 1
Your Master Data Is a Graph: Are You Ready?
to a single location. Even with a CRM system in place, we typically end up with customer information
maintained in several systems. The same is true for product and accounting data.
The most successful programs will not strive to find a single physical location for all data of one concept, but will
provide the standards, tools and services necessary to deliver a consistent vision of enterprise data.
There will be data we can store in one place, using the technologies that best fit its data story. Data will
also likely be found in multiple physical systems, due both to the increasing use of packaged applications
and to performance and geographically distributed processing needs. Once we understand our
environment, we can architect solutions that build upon those needs.
The future of master data management will derive value from data and its relationships to other data.
MDM will be about supplying consistent, meaningful views of master data. In many cases, we will be
able to unify data into one location, especially to optimize for query performance and data fit. Graph
databases offer exactly that type of data/performance fit, as we will see below. In this paper, we discuss
why your master data is a graph and how graph databases like Neo4j are the best technologies for
master data.
Networks
A network comprises a set of nodes, with relationships between them.
Figure 1 - Network
You might think of a network of friends, servers, databases or customers. In a graph database, the nodes
in a graph don’t have to be all the same type of thing, and neither do the relationships. Instead of
defining an entity that imposes a standard on all instances, we build nodes and relationships that can
each have their own set of properties. The nodes in Figure 1 - Network might be customers, orders,
products and promotions, and each of these could have different types of properties. The properties can
vary not just by type of node, but also from one instance of a node to the next. For the customer node, we
might have given name and family name, birthdate and acquisition date as properties, but we might not
have all of those properties for all customers. In a relational database, we would define common
properties and set some of them to NULL when an instance has no value for them. The relationships
between nodes get the same power: we can store properties on each relationship without imposing those
properties on all of them. This is a key difference from relational databases, where we would need to
convert a relationship to a table to store metadata about that relationship.
We don’t have to have the same structure for all instances in a graph database. As we will see
below, this flexibility pays off for real-world data stories.
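To make the idea concrete, here is a minimal sketch in plain Python (not Neo4j’s API or Cypher) of a property graph in which two customer nodes carry different properties and each PURCHASED relationship carries its own properties. The class names `Node` and `Relationship`, the ids and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field

# Plain-Python stand-ins for graph elements (hypothetical, not Neo4j classes).
@dataclass
class Node:
    node_id: str
    labels: set
    props: dict = field(default_factory=dict)

@dataclass
class Relationship:
    rel_type: str
    start: str
    end: str
    props: dict = field(default_factory=dict)

nodes = {
    # Two customers with different property sets; no NULL placeholders needed.
    "c1": Node("c1", {"Customer"}, {"given_name": "Ann", "family_name": "Lee",
                                    "birthdate": "1980-02-14"}),
    "c2": Node("c2", {"Customer"}, {"given_name": "Raj",
                                    "acquisition_date": "2014-06-01"}),
    "p1": Node("p1", {"Product"}, {"sku": "TV-55"}),
}

relationships = [
    # Properties live on the relationship itself, not on a separate join table.
    Relationship("PURCHASED", "c1", "p1", {"date": "2015-03-02", "channel": "store"}),
    Relationship("PURCHASED", "c2", "p1", {"date": "2015-03-09"}),
]

def purchases_by_channel(channel):
    """Customer ids whose PURCHASED relationship has the given channel property."""
    return [r.start for r in relationships
            if r.rel_type == "PURCHASED" and r.props.get("channel") == channel]

print(purchases_by_channel("store"))  # ['c1']
```

Note that `c2` simply has no `birthdate` key rather than a NULL column, and that the `channel` property exists on only one of the two relationships.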
Figure 2 - Employee Hierarchy (nodes: George, Tilly, Bert, Janie, Ernie, Pete, Tommy, Zuzu)
We can easily implement this in a relational database with a recursive relationship on an EMPLOYEE
table. In a small hierarchy such as the one in Figure 2 - Employee Hierarchy, maintaining those reporting
relationships is easy. As soon as we model a much larger set, though, maintenance gets more
expensive. We often need workarounds such as special hierarchy datatypes or calculated
columns that keep track of levels and pointers. Then what happens when an employee is promoted? All
the affected relationships must be reset and recalculated. If that employee participates in many types of
hierarchies, many more relationships may need to be reset and recalculated. This data story is so
complex that there are specialized books and tutorials on how to design and maintain these workarounds.
Even with all those tricks, we still struggle to maintain the data and application performance levels.
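The sketch below, in plain Python with hypothetical employee ids and reporting lines (they are not the ones in Figure 2), contrasts this with a graph-style approach: each employee stores only a REPORTS_TO edge, levels are derived by walking edges at query time, and a promotion rewrites a single edge.

```python
# Hypothetical reporting lines (not the ones in Figure 2): employee -> manager.
reports_to = {
    "vp_sales": "ceo",
    "vp_ops": "ceo",
    "rep_1": "vp_sales",
    "rep_2": "vp_sales",
    "analyst": "rep_1",
}

def chain_of_command(emp):
    """Walk REPORTS_TO edges upward; depth is derived, never stored."""
    chain = []
    while emp in reports_to:
        emp = reports_to[emp]
        chain.append(emp)
    return chain

def promote(emp, new_manager):
    """A promotion rewrites exactly one edge (no cycle check in this sketch)."""
    reports_to[emp] = new_manager

print(chain_of_command("analyst"))  # ['rep_1', 'vp_sales', 'ceo']
promote("rep_1", "ceo")
print(chain_of_command("analyst"))  # ['rep_1', 'ceo']
```

Because no level or path column is stored, the analyst’s chain of command changes without any update to the analyst’s own record. The sketch does not guard against cycles, which a real implementation would need to check.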
Even though we think of them as hierarchies, most business hierarchies are actually networks. As soon as
you add one real-life complication, such as an employee with multiple reporting relationships of different
types, your vision of a beautiful, perfect hierarchy is destroyed. This happens with organizational charts,
product “hierarchies”, locations and documents. Once you see that your world does not fit a true hierarchy,
you can see all the graphs around you.
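As an illustration, the following plain-Python sketch uses a hypothetical matrix organization with typed edges. It checks a necessary condition for a tree, at most one parent per node, for a chosen set of relationship types: the solid-line REPORTS_TO edges alone pass, but adding the DOTTED_LINE_TO edge breaks the hierarchy.

```python
from collections import defaultdict

# Hypothetical matrix organization as typed edges: (employee, manager, relationship type).
edges = [
    ("dana", "lee", "REPORTS_TO"),      # solid-line manager
    ("fay", "lee", "REPORTS_TO"),
    ("dana", "kim", "DOTTED_LINE_TO"),  # project manager for a cross-team initiative
]

def managers(emp, rel_types=None):
    """Managers of emp, optionally restricted to some relationship types."""
    return sorted(m for e, m, t in edges
                  if e == emp and (rel_types is None or t in rel_types))

def is_strict_hierarchy(rel_types=None):
    """Necessary (not sufficient) tree condition: at most one parent per node."""
    parents = defaultdict(set)
    for e, m, t in edges:
        if rel_types is None or t in rel_types:
            parents[e].add(m)
    return all(len(p) <= 1 for p in parents.values())

print(is_strict_hierarchy({"REPORTS_TO"}))  # True
print(is_strict_hierarchy())                # False: dana has two managers
```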
In a graph database, the logical model is the physical model. You can even think about the graph model
as a service model. We can do whiteboard-level modeling of nodes and relationships, then add
properties and labels. At that point we have completed all the data modeling we need to do to
implement in a graph database. If we already have traditional logical data models, we can even pull the
properties we know about into our graph model. Because the logical model is the only model, going from
data to database takes significantly less time and fewer resources (modelers, architects, DBAs and
developers) than building relational master data solutions.
We can also label each instance so that we can query our data based on the roles it fills. For instance,
certain queries might need to return only organizational customers.
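A minimal sketch of label-based filtering, assuming nodes are plain Python dicts that each carry a set of labels; the customers and label names are hypothetical. In Cypher, a pattern such as `MATCH (n:Customer:Organization)` expresses the same idea of requiring both labels.

```python
# Hypothetical customer nodes; each can carry several labels at once.
customers = [
    {"name": "Acme Ltd", "labels": {"Customer", "Organization"}},
    {"name": "Karen", "labels": {"Customer", "Person"}},
    {"name": "Bolt Inc", "labels": {"Customer", "Organization", "Supplier"}},
]

def with_labels(nodes, *required):
    """Names of nodes that carry every required label."""
    need = set(required)
    return [n["name"] for n in nodes if need <= n["labels"]]

print(with_labels(customers, "Customer", "Organization"))  # ['Acme Ltd', 'Bolt Inc']
```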
In Figure 4 - Employee Roles, Activities, Skills, Degrees, Teams Graph, we can see that both nodes and
relationships have varied properties. This is our data model, just as we might draw it on a whiteboard.
Today they use Neo4j to allow buyers to pre-package groups of items so that sales associates can quickly
provide bundles for customers. Leveraging the flexibility and the focus on product and promotion
relationships in Neo4j, they have been able to reduce the time required to put together product bundles
from one month to 10 days. In store, associates can help customers choose the items they want and offer
them other services or items that bundle well with those items. Using the many different types of data
relationships in their product and customer master data is key to improving customer service,
completing orders and decreasing cost of sales, as well as substantially reducing queue times at their
physical stores.
This retailer hosts nearly one million nodes and millions of relationships between those nodes and
continues to scale up as volume and number of users increases. They are already working on adding
Neo4j to cable and digital TV services and products to bring the same competitive advantages to their
other lines of business.
Polyvore allows shoppers to put together sets of clothing and products in a highly visual, interactive
format. A shopper (a stylist in Polyvore terms) can pull products, their metadata and prices from any
website using a clipper tool, then enhance these sets with graphics, text and personal photos. The stylist
can then put together sets of sets, called collections, to build even more relationships among items.
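The “sets of sets” structure is a containment graph. The sketch below, in plain Python with hypothetical collection, set and item ids (no real Polyvore data or API), walks CONTAINS edges from a collection down to its items and goes the other way to find which sets share an item.

```python
# Hypothetical CONTAINS edges: collection -> sets -> items.
contains = {
    "collection:spring": ["set:beach", "set:office"],
    "set:beach": ["item:sandals", "item:sunhat"],
    "set:office": ["item:blazer", "item:sandals"],
}

def items_in(node):
    """Depth-first walk over CONTAINS edges, returning the distinct items reached."""
    found, seen, stack = set(), set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current.startswith("item:"):
            found.add(current)
        stack.extend(contains.get(current, []))
    return sorted(found)

def sets_containing(item):
    """Reverse direction: which sets directly contain the item."""
    return sorted(s for s, children in contains.items()
                  if s.startswith("set:") and item in children)

print(items_in("collection:spring"))    # ['item:blazer', 'item:sandals', 'item:sunhat']
print(sets_containing("item:sandals"))  # ['set:beach', 'set:office']
```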
This use case of product data, metadata and stylist-assigned tags is all about the relationships between
these concepts. A traditional product catalog would focus on the products and their properties, yet in
the Polyvore case, the power is in these shopper-generated data relationships. It’s not just about the
data; it’s about the relationships. By using the power of the graph in Neo4j, Polyvore is able to
identify not just trends, but also the factors influencing those trends, spanning prices, styles, colors,
sizes and anything else the community tags items with. Brands gain insight into which of their products
are coveted by the most influential purchasers in e-commerce.
In that data, though, there will be valuable data relationships that a competitive retailer will want to
uncover. While it is possible to manage and query all of these in a relational database, it would be very
time-consuming and expensive to do so. In addition, as the volume of that data increases, it becomes more
expensive (in processing time) for a relational database to return results. The more data we collect, the
less feasible real-time results become. Think about the following questions we might
want to ask of this data:
What sorts of items tend to be purchased together by a category of customer? What about over
time?
Which members of a household tend to make which types of purchases on which days? How
does this vary by distance from a store?
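The first question can be sketched as counting item pairs that co-occur in orders, grouped by customer category. This plain-Python version uses hypothetical orders; in a graph database the same question would be a traversal from customers through orders to products, but the code here does not use Neo4j.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders: (customer category, items in one order).
orders = [
    ("student", {"laptop", "backpack"}),
    ("student", {"laptop", "backpack", "mouse"}),
    ("family", {"tv", "soundbar"}),
    ("student", {"mouse", "laptop"}),
]

def bought_together(category):
    """Count unordered item pairs appearing in the same order, for one category."""
    pairs = Counter()
    for cat, items in orders:
        if cat == category:
            # Sorting makes ('backpack', 'laptop') and ('laptop', 'backpack') the same key.
            pairs.update(combinations(sorted(items), 2))
    return pairs.most_common()

for pair, count in bought_together("student"):
    print(pair, count)
```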
Deduplication of Data
Duplication of master data happens due to the acquisition of third-party data and mergers of organizations.
The use of vendor packages can also lead to duplication of master data. Graph databases can provide
faster and easier methods for identifying whether customer “Karen Lopez” in Toronto, ON is the same
customer as “Karena Lopez” in Scarborough, ON. We can do this by leveraging graph queries that
analyze all the data relationships across several instances to produce a confidence score that the
records refer to the same person. Knowing that two Karen Lopez records with different addresses have
the same vehicle serviced, or visit a location at the same time, allows us to understand the patterns and
the relationships in the data. We can use those data relationships to identify master data where we
might not already have a common customer identifier.
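One way to turn shared relationships into a confidence score is sketched below. It is an assumption-laden illustration, not Neo4j’s method: it blends a string-similarity ratio on names (Python’s `difflib.SequenceMatcher`) with the Jaccard similarity of each record’s related nodes, using hypothetical records and hypothetical weights of 0.4 and 0.6.

```python
from difflib import SequenceMatcher

# Hypothetical records and the nodes each one is related to
# (vehicles serviced, store visits, phone numbers).
records = {
    "r1": {"name": "Karen Lopez",
           "related": {"vehicle:VIN123", "store:Yonge", "phone:555-0100"}},
    "r2": {"name": "Karena Lopez",
           "related": {"vehicle:VIN123", "store:Yonge"}},
    "r3": {"name": "Karen Lopes",
           "related": {"vehicle:VIN999"}},
}

def jaccard(a, b):
    """Shared related nodes divided by all related nodes (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_confidence(id_a, id_b, name_weight=0.4, rel_weight=0.6):
    """Weighted blend of name similarity and relationship overlap (weights are assumptions)."""
    a, b = records[id_a], records[id_b]
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    rel_sim = jaccard(a["related"], b["related"])
    return name_weight * name_sim + rel_weight * rel_sim

for other in ("r2", "r3"):
    print(other, round(match_confidence("r1", other), 2))
```

Here “Karen Lopes” has a nearly identical name but no shared relationships, so it scores lower than “Karena Lopez”, who shares a vehicle and a store visit with the first record.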
If I change the rules about use of this promotion on these products, what would the impact be?
If I change the bundle for this set of product and services, how would that impact our average
order totals?
If we retired this bundle, how many customers would be impacted and what bundles should we
offer them as a replacement?
Given the customer demographics of a new sales territory, which products and promotions
might be the best to advertise?
All of these questions focus on the data relationships across sets of master data.
As we’ve seen in the data stories above, having master data in a graph makes asking these questions
easier, more flexible, and faster to process.
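As a sketch of the bundle-retirement question, the plain-Python example below uses a hypothetical graph of bundles, products and customers. It returns the customers related to a bundle and ranks other bundles by how many products they share with it, which is one possible (assumed) heuristic for suggesting replacements.

```python
# Hypothetical graph: bundle -> products it contains, customer -> bundles they hold.
bundle_products = {
    "B1": {"tv", "soundbar", "hdmi"},
    "B2": {"tv", "soundbar", "wall_mount"},
    "B3": {"laptop", "mouse"},
}
customer_bundles = {
    "alice": {"B1"},
    "bob": {"B1", "B3"},
    "cara": {"B2"},
}

def retirement_impact(bundle):
    """Impacted customers, plus other bundles ranked by number of shared products."""
    impacted = sorted(c for c, held in customer_bundles.items() if bundle in held)
    products = bundle_products[bundle]
    ranked = sorted(
        ((len(products & other), name) for name, other in bundle_products.items()
         if name != bundle and products & other),
        reverse=True,
    )
    return impacted, [name for _, name in ranked]

print(retirement_impact("B1"))  # (['alice', 'bob'], ['B2'])
```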
The rise of polyglot persistence requires CIOs and IM leaders to evaluate new
approaches to data consistency across multiple persistence types, optimized data
access to multiple persistence types, and integration of data across multiple
persistence types. (i)
Organizations want not only to collect data, but also to use it to support business decisions in real time.
That means leveraging technologies that provide the best value for the types of questions we ask of our
data. Relational databases aren’t going away; we still have strong use cases for highly structured,
business-rule driven datastores. These stories demand the types of constraints and “sameness” that
relational databases were designed to provide. We need those technologies to enforce integrity in our
data, to ensure we are collecting all the data we need to complete a transaction and to provide a highly
predictable, stable dataset for use in our business processes. However, those strengths of relational
systems can get in the way of other use cases for managing and analyzing data. That’s why some data
stores demand less structure, fewer constraints and less consistency between instances of data. In fact,
that’s one of the driving reasons for implementing a NoSQL solution.
Performance
With scalability built into the underlying design, Neo4j allows master data solutions to grow as fast as
the business does. Because Neo4j is built specifically for storing graphs and processing graph queries, it
can unlock the value in data relationships significantly faster than SQL or other NoSQL solutions.
Flexibility
A graph database allows us to maintain data without a prescriptive set of attributes or properties that
are the same for each node – or even for a relationship between nodes. This works to our advantage
when we are bringing together data from multiple sources. For example, we might have customer data
from our online retail presence, our bricks and mortar stores, plus customer data we’ve acquired
through an opt-in mailing list. Each of those data sources likely has overlapping and differing
attributes, with varying completeness. If we tried to import this data into a relational database,
we’d be forced to either turn off data quality constraints or to cleanse and reject some data in order to
meet integrity rules. But we don’t want to do that with this data; we want to bring it all into a
database so that we can work with the data as it is.
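A minimal sketch of loading such sources as-is, assuming records are Python dicts and that an email address, when present, identifies a customer (a hypothetical matching rule). Every record becomes or enriches a node, no record is rejected for missing fields, and the `sources` set stands in for relationships to the source systems.

```python
# Hypothetical extracts with overlapping, incomplete attributes.
online = [{"email": "ann@example.com", "name": "Ann Lee", "last_login": "2015-04-01"}]
store = [{"email": "ann@example.com", "loyalty_id": "L-77"},
         {"phone": "555-0199", "name": "Raj Patel"}]
mailing = [{"email": "zoe@example.com"}]

def load(sources):
    """One node per customer, keyed by email when present; keep every property."""
    nodes = {}
    for source_name, records in sources.items():
        for i, rec in enumerate(records):
            # Records without an email still get a node instead of being rejected.
            key = rec.get("email") or f"{source_name}:{i}"
            node = nodes.setdefault(key, {"sources": set()})
            node["sources"].add(source_name)  # stand-in for a FROM_SOURCE relationship
            for k, v in rec.items():
                node.setdefault(k, v)  # first value seen wins in this sketch
    return nodes

nodes = load({"online": online, "store": store, "mailing": mailing})
print(len(nodes))  # 3
```

When two sources disagree on a value, this sketch keeps the first one it sees; a real design might instead keep both values, for example as properties on separate source relationships.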
We are also free to add new data on the fly, because we aren’t forcing all nodes or relationships
to have the same metadata. We can bring in the data we want, relate it to existing data (or
not) and continue on asking questions. Since the logical model is the physical model, we don’t have to
re-architect our solution every time we want to gain new insights about our master data.
This real-world, flexible management of data and data relationships gives organizations the power to ask
more questions and get more answers than if we had to determine them all at data acquisition time. The
flexibility to add more data relationships, in real time, allows organizations to continue to enhance their
data stories.
Summary
We’ve seen that the future of master data management for competitive organizations is a hybrid,
polyglot architecture. We know that the value in master data management isn’t just in the data about
things, but also in the powerful relationships in the data. And we have seen how difficult it is to develop
fast and flexible systems for traversing all those relationships. Traditional relational systems have their
benefits, but using them to traverse a network of master data is expensive, slow and less agile for the business.
Graph databases such as Neo4j are built from the ground up to support graph data stories, the same
stories that your data wants to tell. The fact that the logical data model is the physical model means
that data professionals can deliver answers to these data relationship questions faster and with more
flexibility than ever before.
What’s Next?
You’ve learned that graph databases have a place in modern enterprise data architectures; what should
you do next?
1. Read Graph Databases (http://graphdatabases.com) for a deeper dive into the theory and
practical application of graph technologies.
2. Take the graph database online course http://neo4j.com/graphacademy/online-course
3. Talk to key data-driven business users in your organization about the types of questions they’d
love to ask their data, but have not been able to do so due to technological limitations or costs.
4. Learn about RDBMS to Graph Databases concepts and tools: http://neo4j.com/developer/graph-db-vs-rdbms
5. Help those business users understand that there are data relationships that may not be
apparent in the existing data structures.
6. Download Neo4j (http://neo4j.com) or use an Amazon Machine Image (AMI)
(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) to build an environment in
the cloud.
7. Discover new insights into your own master data that will give your organization a competitive
advantage based on data relationships in your existing data.
Find us at datamodel.com
Neo Technology is headquartered in San Mateo, CA, with offices in Sweden, UK, Germany, and Malaysia.
For more information, please visit Neo4j.com.
i
Randall, L., Heudecker, N., Feinberg, D. (January 2015) “The Rise of Polyglot Persistence Demands Your Consideration”, Gartner Research,
https://www.gartner.com/doc/2954719/rise-polyglot-persistence-demands-consideration