Introduction to NoSQL Databases
AIDS – B.E – BDA
Dr. Pooja K Revankar
Assistant Professor,
Dept. of Computer Science and Engg.,
SIES Graduate School of Technology
Dr. Pooja K R
Agenda
• Introduction to NoSQL
• Limitations of Relational Database
• What is NoSQL
• Business Drivers of NoSQL
• NoSQL Data Architecture Patterns
• NoSQL solution for big data
• Choosing distribution models
Introduction to NoSQL Databases
•A database management system (DBMS) provides the mechanism to store
and retrieve data.
•There are different kinds of database management systems:
1. RDBMS (Relational Database Management Systems)
2. OLAP (Online Analytical Processing)
3. NoSQL (Not only SQL)
Different SQL Databases
What is NoSQL?
NoSQL is a set of concepts that allows the rapid and
efficient processing of data sets with a focus on
performance, reliability, and agility.
Limitations of Relational Databases
•The structure and schema of the data must be defined first; only
then can the data be processed.
•Consistency and integrity of data are provided by enforcing ACID
properties, which adds overhead and makes it harder to scale across
many machines.
•Many applications store their data in JSON format.
•Traditional RDBMSs do not offer a convenient way of performing
operations such as create, insert, update and delete on this
data.
Advantages of NoSQL
•High scalability
•High Availability
RDBMS Vs NoSQL
• RDBMS: Stores structured data; offers more functionality but
lower performance.
• NoSQL: Stores structured or semi-structured data; offers less
functionality but higher performance.
NOSQL DATABASES
What is NoSQL?
• More than rows in tables
• Free of joins
• Schema-free
• Works on many processors
• Uses shared-nothing commodity computers
• Supports linear scalability
• Innovative
NoSQL Database Categories
•Document Database
•Key value stores
•Graph store
•Wide column stores
NoSQL Data Architecture Patterns
NOSQL BUSINESS DRIVERS
VOLUME
VELOCITY
VARIABILITY
AGILITY
What is the CAP Theorem?
The CAP theorem is also called Brewer's theorem. It states that
it is impossible for a distributed data store to provide more than
two of the following three guarantees at the same time:
1. Consistency
2. Availability
3. Partition Tolerance
BASE Properties
NoSQL relies upon a softer model known as the BASE model (instead of
ACID properties).
Basically Available: Guarantees the availability of the data. There
will be a response to any request, although that response may be a failure.
Soft state: The state of the system could change over time, even
without new input.
Eventual consistency: The system will eventually become
consistent once it stops receiving input.
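The BASE behaviour above can be illustrated with a minimal sketch. It assumes a toy in-memory store in which a write is acknowledged after reaching one replica and is copied to the others later; the class and method names (`Replica`, `EventuallyConsistentStore`, `propagate`) are hypothetical and do not correspond to any particular NoSQL product.

```python
from collections import deque


class Replica:
    """One copy of the data (hypothetical name, for illustration only)."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # updates not yet applied to every replica

    def write(self, key, value):
        # The write is acknowledged once the first replica has it.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        # Basically available: every read is answered, but the answer
        # may be stale (soft state).
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Background replication: apply all queued updates.
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read(2, "cart:42")   # replica 2 has not caught up yet
store.propagate()                  # no new input; replicas converge
fresh = store.read(2, "cart:42")
```

Before `propagate()` runs, different replicas can give different answers for the same key; once input stops and the queue drains, all replicas agree.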
Data Models
NoSQL databases are classified into four major data
models: key-value, document, column-oriented (wide-column) and graph.
Key-value
The simplest NoSQL databases
The main idea is the use of a hash table
Data (values) is accessed by strings called keys
Data has no required format
Data model: (key, value) pairs
A key maps to a BLOB (Binary Large Object)
Examples of key-value stores: Redis,
DynamoDB, Riak, Memcached, etc.
Operations using KEY VALUE STORE
• Get(key)
• Put(key, value)
• Multi-get(key1, key2, …, keyN)
• Delete(key)
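The four operations listed above can be sketched in a few lines of Python. This is a minimal illustration that assumes an in-memory hash table (a `dict`) as the backing store; the class name `KeyValueStore` and its method names are hypothetical and are not the API of Redis, DynamoDB or any other product.

```python
class KeyValueStore:
    """Toy key-value store backed by a hash table (Python dict)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # The value is opaque to the store: any object or BLOB is accepted.
        self._table[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._table.get(key)

    def multi_get(self, *keys):
        # One result per requested key, in the same order.
        return [self._table.get(key) for key in keys]

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._table.pop(key, None)


kv = KeyValueStore()
kv.put("user:1", b'{"name": "Asha"}')      # value stored as raw bytes
kv.put("user:2", b'{"name": "Ravi"}')
both = kv.multi_get("user:1", "user:2", "user:3")
kv.delete("user:2")
```

Note that the store can only look data up by key. It has no way to ask "which users are named Asha", because it never inspects the values.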
KEY VALUE STORE PROS
Any data type in value field
Consistent
Values returned by queries can be converted into lists,
data frames, etc.
Scalable
Reliable
Key can be synthetic or auto generated
KEY VALUE STORE CONS
No indexes are made on values.
Does not provide traditional DBMS capabilities, such as ACID
guarantees when multiple transactions are executed
simultaneously.
No queries on values.
Maintaining unique keys is a problem if volume is large.
Key Value Stores
Document-Based Store NoSQL
•In this type of database, the record and its associated data are stored
in a single document.
•This model is therefore not completely unstructured; the data is
semi-structured.
•The difference between a document and a key-value pair is that a
document store applies some kind of encoding when storing the data
in documents.
• It can be XML encoding or JSON encoding.
•The example below shows a document that can be stored in a
document database, but with a different encoding.
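To make the idea of encoding concrete, here is a small sketch that writes one record as a JSON document and as an XML document using only the Python standard library. The record, its field names and the `student` root tag are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# A single record with its associated data (hypothetical fields).
record = {"id": "101", "name": "Asha", "city": "Mumbai"}

# JSON encoding of the document.
json_doc = json.dumps(record)

# XML encoding of the same document: one child element per field.
root = ET.Element("student")
for field, value in record.items():
    ET.SubElement(root, field).text = value
xml_doc = ET.tostring(root, encoding="unicode")

print(json_doc)
print(xml_doc)
```

Both strings carry the same fields; only the encoding differs. Because the store understands the encoding, it can index and query individual fields, which a plain key-value store cannot do.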
DOCUMENT STORES
The central concept of a document-oriented database is the notion
of a document.
Documents in a document store are roughly equivalent to the
programming concept of an object.
They are not required to adhere to a standard schema, nor will
they have all the same sections, slots, parts or keys.
Generally, programs using objects have many different types of
objects, and those objects often have many optional fields.
Every object, even those of the same class, can look very different.
Document stores are similar in that they allow different types of
documents in a single store, allow the fields within them to be
optional, and often allow them to be encoded using different
encoding systems.
DOCUMENT STORES
[Figure: a JSON document and an XML document]
Document-Based Store NoSQL
•The document type is mostly used for CMS systems, blogging
platforms, real-time analytics and e-commerce applications. It should not
be used for complex transactions which require multiple operations or
queries against varying aggregate structures.
•Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes
are popular document-oriented DBMS systems.
Example:
•The difference between conventional databases and document-based
databases is that data is stored in documents rather than in tables.
•Examples of databases using this data model are MongoDB
and Couchbase.
•These databases are used extensively, especially in big data
analysis.
COLUMN ORIENTED DATABASES
Column-oriented databases work primarily on columns, and every column is treated
individually.
Values of a single column are stored contiguously.
A column store keeps data in column-specific files.
In column stores, query processors work on columns too.
All data within a column data file has the same type, which makes it ideal for
compression.
Column stores can improve query performance because a query reads only the
columns it needs.
High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
Well suited to data warehouses, business intelligence, customer relationship
management (CRM), library card catalogs, etc.
Examples of column-oriented databases: Bigtable, Cassandra, SimpleDB, etc.
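The contrast between row and column layouts can be sketched as follows. This is an illustrative in-memory example with made-up data; real column stores keep each column in its own compressed file, which this sketch does not attempt to model.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "name": "Asha", "marks": 82},
    {"id": 2, "name": "Ravi", "marks": 74},
    {"id": 3, "name": "Meera", "marks": 90},
]

# Column-oriented layout: the values of each column are stored contiguously.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# An aggregation query touches only the one column it needs.
marks = columns["marks"]
total = sum(marks)
average = total / len(marks)
highest = max(marks)

print(columns)
print(total, average, highest)
```

Every value in `columns["marks"]` has the same type, which is what makes per-column compression effective, and computing SUM, AVG or MAX never reads the `name` column at all.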
COLUMN-ORIENTED DATABASE
GRAPH DATABASES
A graph database stores data in a graph.
It is capable of elegantly representing any kind of data in a highly
accessible way.
A graph database is a collection of nodes and edges.
Each node represents an entity (such as a student or business)
and each edge represents a connection or relationship between
two nodes.
Every node and edge is defined by a unique identifier.
Each node knows its adjacent nodes.
As the number of nodes increases, the cost of a local step (or hop)
remains the same.
Indexes are used for lookups.
Examples of graph databases: OrientDB, Neo4j, Titan, etc.
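The node-and-edge model described above can be sketched with an adjacency structure. This is a minimal illustration assuming directed edges and an in-memory store; `GraphStore` and its methods are hypothetical names and are not the API of Neo4j, OrientDB or Titan.

```python
import itertools


class GraphStore:
    """Toy in-memory graph (all names are hypothetical)."""

    def __init__(self):
        self.nodes = {}      # node id -> properties of the entity
        self.edges = {}      # edge id -> (source id, relationship, target id)
        self.adjacent = {}   # node id -> ids of the edges leaving that node
        self._edge_ids = itertools.count(1)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.adjacent.setdefault(node_id, [])

    def add_edge(self, source, relationship, target):
        # Every edge gets its own unique identifier.
        edge_id = next(self._edge_ids)
        self.edges[edge_id] = (source, relationship, target)
        self.adjacent[source].append(edge_id)
        return edge_id

    def neighbours(self, node_id, relationship=None):
        # One local hop: only the edges stored with this node are visited,
        # so the cost does not grow with the total number of nodes.
        result = []
        for edge_id in self.adjacent[node_id]:
            _, rel, target = self.edges[edge_id]
            if relationship is None or rel == relationship:
                result.append(target)
        return result


g = GraphStore()
g.add_node("s1", kind="student", name="Asha")
g.add_node("s2", kind="student", name="Ravi")
g.add_node("c1", kind="course", name="Big Data Analytics")
g.add_edge("s1", "ENROLLED_IN", "c1")
g.add_edge("s1", "FRIEND_OF", "s2")
courses = g.neighbours("s1", "ENROLLED_IN")
```

Because each node keeps a list of its own edges, finding adjacent nodes needs no join and no scan over the whole data set.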
GRAPH STORES
Analyzing big data with a shared-nothing architecture
•A shared nothing architecture (SN) is a distributed computing
architecture in which each node is independent and self-sufficient, and
there is no single point of contention across the system.
•More specifically, none of the nodes share memory or disk storage.
•People typically contrast SN with systems that keep a large amount of
centrally-stored state information, whether in a database, an application
server, or any other similar single point of contention.
•The advantages of an SN architecture over a central entity that controls
the network (a controller-based architecture) include eliminating any
single point of failure, allowing self-healing capabilities and
supporting non-disruptive upgrades.
•Shared nothing is popular for web development because of its
scalability.
•An SN system can scale almost without limit simply by adding nodes in the
form of inexpensive computers, since there is no single bottleneck to
slow the system down.
•An SN system typically partitions its data among many nodes on
different databases (assigning different computers to deal with different
users or queries). This is often referred to as database sharding.
• Alternatively, it may require every node to maintain its own copy of the
application's data, using some kind of coordination protocol.
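Partitioning data across independent nodes can be sketched with simple hash-based routing. The node names and the `shard_for`, `put` and `get` helpers below are hypothetical, and hashing the key modulo the node count is only one possible partitioning scheme (assumed here for brevity); production systems often use consistent hashing or range partitioning instead.

```python
import hashlib

# Hypothetical cluster of three independent nodes, each with its own storage.
NODES = ["node-a", "node-b", "node-c"]
storage = {node: {} for node in NODES}


def shard_for(key, nodes=NODES):
    """Pick the node responsible for a key from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def put(key, value):
    # Only the owning node is touched; nodes share no memory or disk.
    storage[shard_for(key)][key] = value


def get(key):
    return storage[shard_for(key)].get(key)


for i in range(100):
    put(f"user:{i}", {"visits": i})

owner = shard_for("user:7")
value = get("user:7")
```

Each key lives on exactly one node, so requests for different keys can be served by different machines without any coordination between them.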
Choosing distribution models: master-slave versus peer-to-peer
Master-slave versus peer-to-peer
• In a master-slave configuration, all incoming database requests
(reads or writes) are sent to a single master node and redistributed
from there.
•The master node is called the NameNode in Hadoop.
• This node keeps a database of all the other nodes in the cluster and
the rules for distributing requests to each node.
• The peer-to-peer model stores all the information about the cluster
on each node in the cluster.
•If any node crashes, the other nodes can take over and processing
can continue.
• Peer-to-peer systems distribute the responsibility of the master to
each node in the cluster.
• In this situation, testing is much easier since you can remove any
node in the cluster and the other nodes will continue to function.
•The disadvantage of peer-to-peer networks is the increased
complexity and communication overhead needed to keep
all nodes up to date with the cluster status.
Master Slave Distribution Model
•With a master-slave distribution model, the role of managing the
cluster is done on a single master node.
•This node can run on specialized hardware such as RAID drives to
lower the probability that it crashes.
•The cluster can also be configured with a standby master that’s
continually updated from the master node.
•The challenge with this option is that it’s difficult to test the standby
master without jeopardizing the health of the cluster.
•Failure of the standby master to take over from the master node is a
real concern for high-availability operations.
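The standby-master arrangement above can be sketched as follows. This is a simplified illustration that assumes the standby receives a copy of the cluster map on every change; `MasterNode`, `Cluster` and `active_master` are hypothetical names and do not describe how Hadoop or any specific system implements failover.

```python
class MasterNode:
    """Holds the cluster metadata (hypothetical name, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.cluster_map = {}  # worker name -> status


class Cluster:
    def __init__(self):
        self.master = MasterNode("master")
        self.standby = MasterNode("standby")

    def register_worker(self, worker):
        self.master.cluster_map[worker] = "up"
        # The standby is continually updated from the master.
        self.standby.cluster_map = dict(self.master.cluster_map)

    def active_master(self):
        # Requests go to the master; the standby takes over if it has failed.
        if self.master.alive:
            return self.master
        if self.standby.alive:
            return self.standby
        raise RuntimeError("no master available")


cluster = Cluster()
cluster.register_worker("worker-1")
cluster.register_worker("worker-2")
before = cluster.active_master().name

cluster.master.alive = False            # simulate a master crash
after = cluster.active_master()
```

The failover path in `active_master` only runs when the master is actually down, which is why it is hard to exercise on a live cluster without risking an outage.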
NoSQL systems to handle big data problems
Case Study:
• Google Maps stores GIS data in Bigtable
• Storing analytical information in Bigtable
References:
• https://dzone.com/articles/what-nosql
Thank You!
(poojakr@sies.edu.in)