Module 3
Module 3
Module 3
SQL Databases
RDMS Database
Era of Distributed Computing
But...
❑ Relational databases were not built
for distributed applications.
Because...
❑ Joins are expensive
❑ Hard to scale horizontally
❑ Impedance mismatch occurs
❑ Expensive (product cost, hardware,
Maintenance)
And....
It’s weak in:
❑ Speed (performance)
❑ High availability
❑ Partition tolerance
New Trends…
• Massive write performance.
• Fast key value look ups.
Characteristics • Flexible schema and data types.
Required – Use • No single point of failure.
Cases • Fast prototyping and development.
• Out of the box scalability.
• Easy maintenance.
Performance of
RDBMS
• Nothing. One size fits all? Not really.
• Impedance mismatch.
• Object Relational Mapping doesn't work
quite well.
What went • Rigid schema design.
wrong with • Harder to scale.
RDBMS? • Replication.
• Joins across multiple nodes? Hard.
• How does RDMS handle data growth? Hard.
• Need for a DBA.
Introduction to NoSQL
It’s schema-free
What is It works on many processors
NoSQL? It uses shared-nothing commodity computers
It’s innovative
It’s not about the SQL language
Velocity
Variability
NoSQL Business
Drivers Agility
14
• Consistency: Clients should read the
same data. There are many levels of
consistency.
– Strict Consistency – RDBMS.
– Tunable Consistency – Cassandra.
– Eventual Consistency – Amazon
Consistency Dynamo.
• Client perceives that a set of
operations has occurred all at once –
Pritchett
• More like Atomic in ACID transaction
properties
14 August 2024 15
• Availability: Data to be available.
• Node failures do not prevent
survivors from continuing to operate
Availability – Wikipedia
• Every operation must terminate in an
intended response – Pritchett
14 August 2024 16
• Partial Tolerance: Data to be
partitioned across network segments
due to network failures.
• the system continues to operate
despite arbitrary message loss –
Partition Tolerance Wikipedia
• Operations will complete, even if
individual components are
unavailable – Pritchett
14 August 2024 17
➢ ACID:
• Strong consistency.
• Less availability.
• Pessimistic concurrency.
• Complex.
A Clash of ➢ BASE:
cultures • Availability is the most important thing.
Willing to sacrifice for this (CAP).
• Weaker consistency (Eventual).
• Best effort.
• Simple and fast.
• Optimistic.
Why NoSQL?
14 August 2024 21
▪ Keys are mapped to (possibly) more complex value
(e.g., lists)
Employee Table
(Name – employees)
employee:$employee_id:$attribute_name = $value
Key-Value form representation employee:1:first_name = "John"
of Employee Table employee:1:last_name = "Doe"
employee:1:address = "New York“
employee:2:first_name = "Benjamin"
employee:2:last_name = "Button"
employee:2:address = "Chicago"
Retrieving data from Key-value store
• Consider SQL query:
SELECT employee_id FROM employees WHERE address = “New York”;
• In Key-value, method call: getEmployeeIDList(attribute:"address", value:"New York");
• You should implement the above Java function to achieve this
functionality.
1. public List<Integer> getEmployeeIDList(String attribute, String value) {
2. List<Integer> employeeIDs = new ArrayList();
3.
4. DBIterator keyIterator = levelDBStore.iterator();
5. keyIterator.seek(bytes("employee")); // moves the iterator to the keys starting with "employee"
Retrieving data from Key-value store- cont…
6. try { while (keyIterator.hasNext()) {
7. String key = asString(keyIterator.peekNext().getKey()); // key arrangement : employee:$employee_id:$attribute_name = $value
8. String[] keySplit = key.split(":"); // split the key
9. int employeeID = Integer.parseInt(keySplit[1]);
10. if (keySplit[keySplit.length - 1].equals(attribute)) { // check the attribute
11. String storedValue = asString(levelDBStore.get(bytes(key)));
12. if(storedValue.equals(value)){ // check the value
13. employeeIDs.add(employeeID); } } // if both checks are valid, employee id is added
14. if (!keySplit[0].equals("employee")) break; // breaking condition : prefix is not "employee"
15. keyIterator.next(); } }
16. finally { keyIterator.close(); }
17. return employeeIDs; // return resulted employee ids
▪ Columnar databases are a hybrid of RDBMSs and
Key-Value stores
▪ Values are stored in groups of zero or more columns,
but in Column-Order (as opposed to Row-Order)
▪ Values are queried by matching keys
▪ E.g., HBase and Vertica
Columnar Record 1 Column A
Id: 2
Name: Bob
Age: 34
Id: 1
Name: Alice
Age: 27
Id: 3
Name:
Chess
Type: Group
Graph Store - Example
RDF Triple and SPARQL query - Example
Slumdog Millionaire
2008
Danny Boyle
➢RDF Triples
(id1, hasTitle, " Slumdog Millionaire "),
releaseY
hasTitle ear
(id1, releaseYear, "2009"),
hasName
(id1, directedBy,id7)
id1
id7 (id7,hasName,“Danny Boyle"),
directedBy (id1, hasCasting, id2),
hasCasting
(id2, roleName, “Latika"),
roleName (id2, actor, id11),
Latika
(id11, hasName, " Freida Pinto"),…….
id2
➢SPARQL query
actor Select ?title Where { ?p <hasTitle> ?title.
hasName ?p <hasCasting> ?s. ?s <actor> ?c.
id11 Freida Pinto ?c <hasName> “Freida Pinto“ }
Primary and Simplest Version to Structured Version
Application Areas
of NoSQL