Module 3

Uploaded by

nehal1103sharma
Copyright
© All Rights Reserved
NoSQL

Module 3
SQL Databases
RDBMS Database
Era of Distributed Computing
But...
❑ Relational databases were not built
for distributed applications.
Because...
❑ Joins are expensive
❑ Hard to scale horizontally
❑ Impedance mismatch occurs
❑ Expensive (product cost, hardware, maintenance)
And....
It’s weak in:
❑ Speed (performance)
❑ High availability
❑ Partition tolerance
New Trends…
Characteristics Required – Use Cases
• Massive write performance.
• Fast key-value lookups.
• Flexible schema and data types.
• No single point of failure.
• Fast prototyping and development.
• Out-of-the-box scalability.
• Easy maintenance.
Performance of
RDBMS
What went wrong with RDBMS?
• Nothing. One size fits all? Not really.
• Impedance mismatch.
• Object-relational mapping doesn't work quite well.
• Rigid schema design.
• Harder to scale.
• Replication.
• Joins across multiple nodes? Hard.
• How does an RDBMS handle data growth? Hard.
• Need for a DBA.
Introduction to NoSQL

NoSQL stands for Not Only SQL
What is NoSQL?
It’s more than rows in tables
It’s free of joins
It’s schema-free
It works on many processors
It uses shared-nothing commodity computers
It supports linear scalability
It’s innovative
What NoSQL is NOT?
It’s not about the SQL language
It’s not only open source
It’s not only big data
It’s not about cloud computing
It’s not about a clever use of RAM and SSD
It’s not an elite group of products
NoSQL Business Drivers
Volume
Velocity
Variability
Agility
• The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database. If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer.
ACID Semantics
• Atomicity: All or nothing.
• Consistency: Consistent state of data and transactions.
• Isolation: Transactions are isolated from each other.
• Durability: When the transaction is committed, state will be durable.

Any data store can achieve Atomicity, Isolation and Durability, but do you always need consistency? No.

By giving up ACID properties, one can achieve higher performance and scalability.
Brewer’s CAP Theorem
A distributed system can guarantee at most two of the following three characteristics at the same time:
• Consistency
• Availability
• Partition tolerance
• Formally proven by Seth Gilbert and Nancy Lynch (MIT).
• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
Consistency
• Consistency: Clients should read the same data. There are many levels of consistency.
– Strict Consistency – RDBMS.
– Tunable Consistency – Cassandra.
– Eventual Consistency – Amazon Dynamo.
• Client perceives that a set of operations has occurred all at once – Pritchett
• More like Atomic in ACID transaction properties
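
One common way to reason about tunable consistency is the quorum overlap rule: with n replicas, a write acknowledged by w replicas and a read that consults r replicas are guaranteed to share at least one replica when r + w > n. The sketch below only checks that inequality. The class and parameter names are illustrative assumptions, not part of any database API.

```java
class QuorumCheck {
    // n: number of replicas, w: replicas that must acknowledge a write,
    // r: replicas consulted on a read. All names are illustrative.
    static boolean quorumsOverlap(int n, int w, int r) {
        if (n < 1 || w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("need 1 <= w, r <= n");
        }
        // Read and write sets must intersect in at least one replica.
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(quorumsOverlap(3, 2, 2)); // true: quorum reads and writes
        System.out.println(quorumsOverlap(3, 1, 1)); // false: eventual consistency only
        System.out.println(quorumsOverlap(3, 3, 1)); // true: write to all, read from one
    }
}
```

Lower values of w and r favor latency and availability; higher values favor consistency, which is the trade-off the slide calls "tunable".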

14 August 2024 15
Availability
• Availability: Data should remain available.
• Node failures do not prevent survivors from continuing to operate – Wikipedia
• Every operation must terminate in an intended response – Pritchett

Partition Tolerance
• Partition Tolerance: The system keeps working when network failures split the nodes into separate network segments.
• The system continues to operate despite arbitrary message loss – Wikipedia
• Operations will complete, even if individual components are unavailable – Pritchett

A Clash of Cultures
➢ ACID:
• Strong consistency.
• Less availability.
• Pessimistic concurrency.
• Complex.
➢ BASE:
• Availability is the most important thing. Willing to sacrifice for this (CAP).
• Weaker consistency (Eventual).
• Best effort.
• Simple and fast.
• Optimistic.
Why NoSQL?

NoSQL Databases
▪ A new class of databases emerged, which mainly follow the BASE properties
▪ These were dubbed NoSQL databases
▪ E.g., Amazon’s Dynamo and Google’s Bigtable
▪ Main characteristics of NoSQL databases include:
▪ No strict schema requirements
▪ No strict adherence to ACID properties
▪ Consistency is traded in favor of Availability
NoSQL Data Architecture Patterns
• Key-Value Store – Stores data as values in a hash table of keys
• Column Store – Each storage block contains data from only one column
• Document Store – Stores documents made up of tagged elements
• Graph Databases – Store data as nodes and relationships that can be traversed

Key-Value Stores
▪ Keys are mapped to (possibly) more complex values (e.g., lists)
▪ Keys can be stored in a hash table and can be distributed easily
▪ Such stores typically support regular CRUD (create, read, update, and delete) operations
▪ That is, no joins and no aggregate functions
▪ E.g., Amazon DynamoDB and Apache Cassandra


Storing RDBMS data as Key-Value pair

Employee Table (Name – employees)

Format for Key-value representation:
$table_name:$primary_key_value:$attribute_name = $value
employee:$employee_id:$attribute_name = $value

Key-Value form representation of Employee Table:
employee:1:first_name = "John"
employee:1:last_name = "Doe"
employee:1:address = "New York"
employee:2:first_name = "Benjamin"
employee:2:last_name = "Button"
employee:2:address = "Chicago"
Retrieving data from Key-value store
• Consider the SQL query:
SELECT employee_id FROM employees WHERE address = 'New York';
• In a key-value store, the equivalent is a method call: getEmployeeIDList("address", "New York");
• The store has no query language for this, so you have to implement the Java function yourself:
// Needs java.util.ArrayList and java.util.List. bytes() and asString() are assumed to be
// the static helpers shipped with the LevelDB Java bindings; levelDBStore is an open DB handle.
public List<Integer> getEmployeeIDList(String attribute, String value) {
    List<Integer> employeeIDs = new ArrayList<>();

    DBIterator keyIterator = levelDBStore.iterator();
    keyIterator.seek(bytes("employee")); // move the iterator to the keys starting with "employee"
    try {
        while (keyIterator.hasNext()) {
            // key arrangement: employee:$employee_id:$attribute_name = $value
            String key = asString(keyIterator.peekNext().getKey());
            String[] keySplit = key.split(":"); // split the key
            // breaking condition: prefix is not "employee" (checked before parsing the ID)
            if (!keySplit[0].equals("employee")) break;
            int employeeID = Integer.parseInt(keySplit[1]);
            if (keySplit[keySplit.length - 1].equals(attribute)) { // check the attribute
                String storedValue = asString(levelDBStore.get(bytes(key)));
                if (storedValue.equals(value)) { // check the value
                    employeeIDs.add(employeeID); // both checks passed, so keep the employee id
                }
            }
            keyIterator.next();
        }
    } finally {
        keyIterator.close();
    }
    return employeeIDs; // return the resulting employee ids
}
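
The function above depends on an open LevelDB handle. As a self-contained sketch of the same prefix-scan idea, the example below uses a java.util.TreeMap as a stand-in for an ordered key-value store. The class name KeyValueScanDemo and its methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyValueScanDemo {
    // A sorted map stands in for an ordered key-value store (an assumption of this sketch).
    private final TreeMap<String, String> store = new TreeMap<>();

    // Stores one attribute under the key $table:$id:$attribute.
    void put(String table, int id, String attribute, String value) {
        store.put(table + ":" + id + ":" + attribute, value);
    }

    // Emulates SELECT id FROM table WHERE attribute = value with a prefix scan.
    List<Integer> findIds(String table, String attribute, String value) {
        List<Integer> ids = new ArrayList<>();
        String prefix = table + ":";
        // tailMap starts at the first key >= prefix, much like seek() on an iterator.
        for (Map.Entry<String, String> entry : store.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // left the block of keys that belong to this table
            }
            String[] parts = entry.getKey().split(":");
            if (parts.length == 3 && parts[2].equals(attribute)
                    && entry.getValue().equals(value)) {
                ids.add(Integer.parseInt(parts[1]));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        KeyValueScanDemo db = new KeyValueScanDemo();
        db.put("employee", 1, "first_name", "John");
        db.put("employee", 1, "last_name", "Doe");
        db.put("employee", 1, "address", "New York");
        db.put("employee", 2, "first_name", "Benjamin");
        db.put("employee", 2, "last_name", "Button");
        db.put("employee", 2, "address", "Chicago");
        System.out.println(db.findIds("employee", "address", "New York")); // prints [1]
    }
}
```

Note that a lookup by a non-key attribute visits every key under the table prefix, which is why key-value stores are best suited to access by key.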
Columnar Databases
▪ Columnar databases are a hybrid of RDBMSs and Key-Value stores
▪ Values are stored in groups of zero or more columns, but in Column-Order (as opposed to Row-Order)
▪ Values are queried by matching keys
▪ E.g., HBase and Vertica

Figure: the same three records, (Alice, 3, 25), (Bob, 4, 19) and (Carol, 0, 45), in three storage layouts:
Row-Order: Alice 3 25 | Bob 4 19 | Carol 0 45
Columnar (or Column-Order): Alice Bob Carol | 3 4 0 | 25 19 45
Columnar with Locality Groups: Column A = Group A: Alice Bob Carol | Column Family {B, C}: 3 25 4 19 0 45
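
The three layouts in the figure (row-order, column-order, and column-order with locality groups) differ only in which columns are stored together. A minimal sketch, assuming each record is an array of strings and each group is a list of column indexes; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class ColumnLayoutDemo {
    // Lays out records group by group. Within a group, the group's columns
    // are written record by record.
    static List<String> layout(String[][] records, int[][] groups) {
        List<String> out = new ArrayList<>();
        for (int[] group : groups) {
            for (String[] record : records) {
                for (int col : group) {
                    out.add(record[col]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] records = {
            {"Alice", "3", "25"},
            {"Bob", "4", "19"},
            {"Carol", "0", "45"}
        };
        // One group holding all columns gives row-order.
        System.out.println(layout(records, new int[][] {{0, 1, 2}}));
        // prints [Alice, 3, 25, Bob, 4, 19, Carol, 0, 45]

        // One group per column gives column-order.
        System.out.println(layout(records, new int[][] {{0}, {1}, {2}}));
        // prints [Alice, Bob, Carol, 3, 4, 0, 25, 19, 45]

        // Column A alone plus column family {B, C} gives locality groups.
        System.out.println(layout(records, new int[][] {{0}, {1, 2}}));
        // prints [Alice, Bob, Carol, 3, 25, 4, 19, 0, 45]
    }
}
```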
Row versus Columnar Databases
• Representing RDBMS data in HBase or Cassandra
Column Family
Query on Columnar DB – Example HBase
• To create a new table: specify the table name and the ColumnFamily name
• create 'test', 'cf'
• list and describe are used to obtain information about a table and its description, respectively
• list 'test'
• describe 'test'
• To insert data into a table: use the put command
• put 'test', 'row1', 'cf:a', 'value1'
• put 'test', 'row2', 'cf:b', 'value2'
• Retrieval of data using the get command
• get 'test', 'row1'
• Output: Column Cell
cf:a timestamp=6782168192, value=value1
• Other shell commands: disable, enable, drop,
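
To make the data model behind put and get concrete, here is a minimal in-memory sketch: a table is a sorted map from row key to a sorted map of family:qualifier columns. This is not the HBase client API; MiniColumnFamilyTable is a hypothetical class, and timestamps and cell versions are left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class MiniColumnFamilyTable {
    private final String family;
    // row key -> (column "family:qualifier" -> value), both kept sorted
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    MiniColumnFamilyTable(String family) {
        this.family = family;
    }

    // Mirrors: put 'test', 'row1', 'cf:a', 'value1'
    void put(String rowKey, String column, String value) {
        if (!column.startsWith(family + ":")) {
            throw new IllegalArgumentException("unknown column family in " + column);
        }
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Mirrors: get 'test', 'row1' (returns every column stored for the row)
    Map<String, String> get(String rowKey) {
        TreeMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return Collections.emptyMap();
        }
        return Collections.unmodifiableMap(row);
    }

    public static void main(String[] args) {
        MiniColumnFamilyTable test = new MiniColumnFamilyTable("cf");
        test.put("row1", "cf:a", "value1");
        test.put("row2", "cf:b", "value2");
        System.out.println(test.get("row1")); // prints {cf:a=value1}
    }
}
```

Rows do not need the same set of columns: row1 has only cf:a and row2 has only cf:b, which is the flexibility a column family provides.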
Document Stores
▪ Documents are stored in some standard format or encoding (e.g., XML, JSON, PDF or Office Documents)
▪ These are typically referred to as Binary Large Objects (BLOBs)
▪ Documents can be indexed
▪ This allows document stores to outperform traditional file systems
▪ E.g., MongoDB and CouchDB (both can be queried using MapReduce)
Sample JSON
document
• Here you can see that the
JSON document holds
primitive types as values as
well as other JSON objects
and array types.
• JSON documents allow you to
create a hierarchy of
embedded JSON objects to an
unlimited level.
• It's completely up to the user
what shape he or she wants
to give to the data stored in a
NoSQL document database.
Document Database as a Collection
• Data is structured in the form of documents and collections.
• A document can be a PDF, Microsoft Word document, XML or JSON file.
• A document contains key-value pairs. Each document does not have to be in the same structure as other documents.
• Simply add more documents without having to change the structure of the entire database.
• Documents are grouped into collections, which serve a similar purpose to a relational table.
• Separation of collections by entity (orders and customer profiles).
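
The points above can be sketched with plain Java collections: a document is a map of key-value pairs, a collection is a list of documents, and two documents in the same collection may have different shapes. This is an illustration only, not a MongoDB or CouchDB API. The class name is hypothetical, the "skills" and "manager" fields are invented for the example, and Map.of and List.of need Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DocumentCollectionDemo {
    // Returns every document whose top-level field equals the given value.
    static List<Map<String, Object>> findByField(
            List<Map<String, Object>> collection, String field, Object value) {
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> doc : collection) {
            // Documents that lack the field simply do not match.
            if (value.equals(doc.get(field))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> employees = new ArrayList<>();
        // Two documents in one collection with different structures.
        employees.add(Map.of(
                "first_name", "John",
                "last_name", "Doe",
                "address", "New York"));
        employees.add(Map.of(
                "first_name", "Benjamin",
                "last_name", "Button",
                "address", "Chicago",
                "skills", List.of("Java", "SQL"),           // hypothetical array field
                "manager", Map.of("first_name", "John")));  // hypothetical embedded object

        List<Map<String, Object>> hits = findByField(employees, "address", "Chicago");
        System.out.println(hits.size());                   // prints 1
        System.out.println(hits.get(0).get("last_name"));  // prints Button
    }
}
```

Adding the second, richer document required no schema change, which is the point of the "simply add more documents" bullet.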
Query with Document Store – Example MongoDB
Graph Databases
▪ Data are represented as vertices and edges
▪ Some don’t consider it under NoSQL
▪ E.g., Neo4j and VertexDB
▪ The Resource Description Framework (RDF) of the WWW uses a triple form, subject-predicate-object (SPO), which is a type of graph store.
▪ SPARQL is a semantic query language for retrieving/manipulating data in RDF format.
▪ Graph databases are powerful for graph-like queries (e.g., find the shortest path between two elements)
Graph Model Components

Figure: example nodes with their properties:
Id: 1, Name: Alice, Age: 27
Id: 2, Name: Bob, Age: 34
Id: 3, Name: Chess, Type: Group
Graph Store - Example
RDF Triple and SPARQL query - Example
Figure: RDF graph for the film "Slumdog Millionaire" (2008). Node id1 is the film, id7 is the director (Danny Boyle), id2 is a casting entry (role Latika) and id11 is the actor (Freida Pinto). Edge labels: hasTitle, releaseYear, directedBy, hasName, hasCasting, roleName, actor.
➢ RDF Triples
(id1, hasTitle, "Slumdog Millionaire"),
(id1, releaseYear, "2008"),
(id1, directedBy, id7),
(id7, hasName, "Danny Boyle"),
(id1, hasCasting, id2),
(id2, roleName, "Latika"),
(id2, actor, id11),
(id11, hasName, "Freida Pinto"),…….
➢ SPARQL query
SELECT ?title WHERE { ?p <hasTitle> ?title .
?p <hasCasting> ?s . ?s <actor> ?c .
?c <hasName> "Freida Pinto" }
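
To show what the SPARQL query does, the sketch below evaluates the same four triple patterns by hand over an in-memory list of triples. It is not a SPARQL engine; the class and method names are hypothetical, and each triple is simply a three-element string array {subject, predicate, object}.

```java
import java.util.ArrayList;
import java.util.List;

class TriplePatternDemo {
    // Subjects ?x such that (?x, predicate, object) is in the store.
    static List<String> subjectsOf(List<String[]> triples, String predicate, String object) {
        List<String> out = new ArrayList<>();
        for (String[] t : triples) {
            if (t[1].equals(predicate) && t[2].equals(object)) out.add(t[0]);
        }
        return out;
    }

    // Objects ?x such that (subject, predicate, ?x) is in the store.
    static List<String> objectsOf(List<String[]> triples, String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) out.add(t[2]);
        }
        return out;
    }

    // Joins the four patterns of the query, starting from the actor's name.
    static List<String> titlesForActor(List<String[]> triples, String actorName) {
        List<String> titles = new ArrayList<>();
        for (String c : subjectsOf(triples, "hasName", actorName)) {   // ?c <hasName> name
            for (String s : subjectsOf(triples, "actor", c)) {         // ?s <actor> ?c
                for (String p : subjectsOf(triples, "hasCasting", s)) { // ?p <hasCasting> ?s
                    titles.addAll(objectsOf(triples, p, "hasTitle"));   // ?p <hasTitle> ?title
                }
            }
        }
        return titles;
    }

    // The triples listed on the slide.
    static List<String[]> sampleTriples() {
        return List.of(
            new String[] {"id1", "hasTitle", "Slumdog Millionaire"},
            new String[] {"id1", "releaseYear", "2008"},
            new String[] {"id1", "directedBy", "id7"},
            new String[] {"id7", "hasName", "Danny Boyle"},
            new String[] {"id1", "hasCasting", "id2"},
            new String[] {"id2", "roleName", "Latika"},
            new String[] {"id2", "actor", "id11"},
            new String[] {"id11", "hasName", "Freida Pinto"});
    }

    public static void main(String[] args) {
        System.out.println(titlesForActor(sampleTriples(), "Freida Pinto"));
        // prints [Slumdog Millionaire]
    }
}
```

Each nested loop corresponds to one triple pattern, and the shared variables (?c, ?s, ?p) are what join the patterns together.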
Primary and Simplest Version to Structured Version
Application Areas
of NoSQL
