
NoSQL Databases Principles
Ecole d’Ingénierie Digitale et d’Intelligence Artificielle (EIDIA)
Preparatory Cycle, Engineering Program
Khawla TADIST
Academic Year 2023-2024
Outline
Different aspects of data distribution
 Scaling
 Vertical vs. horizontal
 Distribution models
 Sharding
 Replication
 Master-slave vs. peer-to-peer architectures
 CAP properties
 Consistency, Availability and Partition tolerance
 ACID vs. BASE
Scalability

What is scalability?
 Capability of a system to handle growing amounts of data and/or queries without
losing performance,
 Or its potential to be enlarged in order to accommodate such growth
 Two general approaches
 Vertical scaling
 Horizontal scaling
Scalability - Vertical Scalability

Vertical scaling (scaling up/down)


 Adding resources to a single node in a system
 Increasing the number of CPUs,
 Extending system memory,
 Using larger disk arrays,
 …
 i.e. larger and more powerful machines are involved
Scalability - Vertical Scalability

Vertical scaling (scaling up/down)


 Traditional choice
 In favor of strong consistency
 Easy to implement and deploy
 No issues caused by data distribution
 …
 Works well in many cases but …
Scalability - Vertical Scalability Drawbacks

Performance limits
 Even the most powerful machine has a limit
 Moreover, everything works well… unless we start approaching such limits
Higher costs
 The cost of expansion increases exponentially
 In particular, it is higher than the sum of costs of equivalent commodity
hardware
Scalability - Vertical Scalability Drawbacks

Proactive provisioning
 New projects/applications might evolve rapidly
 Upfront budget is needed when deploying new machines
 And so flexibility is seriously limited
Scalability - Vertical Scalability Drawbacks

Vendor lock-in
 There are only a few manufacturers of large machines
 Customer is made dependent on a single vendor
 Their products, services, but also implementation details, proprietary formats,
interfaces, …
 i.e. it is difficult or impossible to switch to another vendor
Deployment downtime
 Inevitable downtime is often required when scaling up
Scalability - Horizontal Scalability

Horizontal scaling (scaling out/in)


 Adding more nodes to a system
 i.e. the system is distributed across multiple nodes in a cluster
 Choice of many NoSQL systems
Scalability - Horizontal Scalability

Horizontal scaling (scaling out/in)


 Advantages
 Commodity hardware, cost effective
 Flexible deployment and maintenance
 Often outperforms vertical scaling
 …
 Unfortunately, there are also plenty of false assumptions…
Scalability - Horizontal Scalability Drawbacks
False assumptions
 Network is reliable
 Network is secure
 Latency is zero
 Bandwidth is infinite
 Topology does not change
 There is one administrator
 Transport cost is zero
Scalability - Horizontal Scalability Consequences
Significantly increases complexity
 Complexity of management,
 Programming model, …
 Introduces new issues and problems
Synchronization of nodes
 Data distribution
 Data consistency
 Recovery from failures
 …
Scalability - Horizontal Scalability

A standalone node might still be a better option in certain cases


 e.g. for graph databases
 Simply because it is difficult to split and distribute graphs
 In other words
 It can make sense to run even a NoSQL database system on a single node
 No distribution at all is the simplest (and often preferred) scenario
But in general, horizontal scaling does open new possibilities
Scalability - Horizontal Scalability Architecture
What is a cluster?
 A collection of mutually interconnected commodity nodes
 Based on the shared-nothing architecture
 Nodes do not share their CPUs, memory, hard drives,…
 Each node runs its own operating system instance
 Nodes send messages to interact with each other
 Nodes of a cluster can be heterogeneous
 Data, queries, computation, workload, …
 This is all distributed among the nodes within a cluster
Distribution Models

Generic techniques of data distribution


 Sharding
 Different data on different nodes
 Motivation: increasing volume of data, increasing performance
 Replication
 Copies of the same data on different nodes
 Motivation: increasing performance, increasing fault tolerance
Distribution Models

The two techniques are orthogonal to each other


 i.e. we can use either of them, or combine them both
NoSQL systems often offer automatic sharding and replication
Distribution Models - Sharding

Sharding (horizontal partitioning)


 Placement of different data on different nodes
 What does different data mean? Different aggregates
– E.g. key-value pairs, documents, …
Distribution Models - Sharding

Sharding (horizontal partitioning)


 Placement of different data on different nodes
 Related pieces of data that are accessed together should also be kept together
– Specifically, operations involving data on multiple shards should be
avoided
Distribution Models - Sharding

Sharding (horizontal partitioning)


 The questions are…
 How to design aggregate structures?
 How to actually distribute these aggregates?
Distribution Models - Sharding
Sharding (horizontal partitioning)
Distribution Models - Sharding

Objectives
 Uniformly distributed data (volume of data)
 Balanced workload (read and write requests)
 Respecting physical locations
 e.g. different data centers for users around the world
 …
Unfortunately, these objectives…
 May contradict each other
 May change in time
Distribution Models - Sharding
Sharding (horizontal partitioning)

Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Distribution Models - Sharding

How to actually determine shards for aggregates?


 We not only need to be able to place new data when handling write requests,
 But also to find the data when handling read requests,
 i.e. when a given search criterion is provided (e.g. key, id, …)
Distribution Models - Sharding

How to actually determine shards for aggregates?


 We must be able to determine the shard corresponding to the given key
 So that the requested data can be accessed and returned,
 Or failure can be correctly detected when the data is missing
Distribution Models - Sharding

Sharding strategies
 Based on mapping structures
 Placing data on shards in a random fashion (e.g. round-robin) is not suitable
 Based on general rules:
 Hash partitioning,
 Range partitioning
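To make the two rule-based strategies concrete, here is a minimal Python sketch; the shard count, the hash function and the range boundaries are illustrative assumptions, not a prescribed design.

import hashlib

NUM_SHARDS = 4
# Range partitioning: hypothetical upper bounds of the key range owned by each shard
RANGE_BOUNDS = ["g", "n", "t", "~"]

def hash_shard(key: str) -> int:
    # Hash partitioning: hash the key and take the result modulo the number of shards
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def range_shard(key: str) -> int:
    # Range partitioning: the first range whose upper bound is above the key wins
    for shard, upper in enumerate(RANGE_BOUNDS):
        if key < upper:
            return shard
    return len(RANGE_BOUNDS) - 1

# The same function is used both to place data (writes) and to locate it (reads)
print(hash_shard("user:42"), range_shard("martin"))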
Distribution Models - Replication

Replication
 Placement of multiple copies – replicas – of the same data on different nodes
 Replication factor = the number of copies
 Two approaches:
 Master-slave architecture
 Peer-to-peer architecture
Distribution Models - Replication - Master-Slave
Master-Slave Architecture

Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Distribution Models - Replication - Master-Slave

Architecture
 One node is primary (master), all the others are secondary (slaves)
 The master node bears all the management responsibility
 All the nodes contain identical data
Distribution Models - Replication - Master-Slave

Architecture
 Read requests can be handled by both the master and the slaves
 Suitable for read-intensive applications
 More read requests to deal with → more slaves to deploy
 When the master fails, read operations can still be handled
Distribution Models - Replication - Master-Slave

Write requests can only be handled by the master


Newly written data is propagated to all the slaves
Consistency issue
 Luckily enough, at most one write request is handled at a time
 But the propagation still takes some time during which obsolete reads might happen
 Hence certain synchronization is required to avoid conflicts
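A minimal, illustrative Python sketch of the routing this implies (the in-memory Node and MasterSlaveCluster classes are hypothetical): writes are accepted only by the master and then propagated to the slaves, while reads may be served by any node.

import random

class Node:
    # A toy replica holding its data in an in-memory dictionary
    def __init__(self, name):
        self.name = name
        self.data = {}

class MasterSlaveCluster:
    def __init__(self, num_slaves=2):
        self.master = Node("master")
        self.slaves = [Node(f"slave-{i}") for i in range(num_slaves)]

    def write(self, key, value):
        # Only the master accepts writes ...
        self.master.data[key] = value
        # ... and propagates them to the slaves (synchronously here for simplicity;
        # asynchronous propagation is what opens the window for obsolete reads)
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        # Reads may be served by the master or by any slave
        node = random.choice([self.master] + self.slaves)
        return node.data.get(key)

cluster = MasterSlaveCluster()
cluster.write("room:101", "booked")
print(cluster.read("room:101"))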
Distribution Models - Replication - Master-Slave

In case of master failure, a new one needs to be appointed


 Manually (user-defined)
 Automatically (cluster-elected)
 Since the nodes are identical, appointment can be fast
The master might therefore represent a bottleneck (because of performance limits or failures)
Distribution Models - Replication - Peer-to-Peer
Peer-to-Peer Architecture

Source: Sadalage, Pramod J. - Fowler, Martin: NoSQL Distilled. Pearson Education, Inc., 2013.
Distribution Models - Replication

Architecture
 All the nodes have equal roles and responsibilities
 All the nodes again contain identical data
Distribution Models - Replication

Both read and write requests can be handled by any node


 No bottleneck, no single point of failure
 More requests to deal with → more nodes to deploy
Distribution Models - Replication

Both read and write requests can be handled by any node


 Consistency issues
 Unfortunately, multiple write requests can be initiated independently and handled at
the same time
 Hence synchronization is required to avoid conflicts
Distribution Models - Sharding and Replication

Observations with respect to replication:


 Does the replication factor really need to correspond to the number of nodes?
 No, a replication factor of 3 is often the right choice
 Consequences
– Nodes will no longer contain identical data
– A replica placement strategy will be needed (see the sketch after this list)
 Sharding and replication can be combined… but how?
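As one possible illustration of combining sharding with a replication factor of 3, the sketch below assumes a simple ring-style placement strategy (the node list, the hash function and the strategy itself are assumptions, not the only option): each aggregate is hashed to a home node and also stored on the next two nodes.

import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3", "node-4"]
REPLICATION_FACTOR = 3  # each aggregate is stored on 3 of the 5 nodes

def replica_nodes(key: str) -> list:
    # Hash the key to pick a 'home' node, then also use the next two nodes on the ring
    home = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(NODES)
    return [NODES[(home + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

# Three consecutive nodes, starting at the key's home node
print(replica_nodes("user:42"))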
Distribution Models - Sharding and Replication

Combinations of sharding and replication


 Sharding + master-slave replication
 Multiple masters, each for different data
 Roles of the nodes can overlap
– Each node can be master for some data and/or slave for other data
Distribution Models - Sharding and Replication

Combinations of sharding and replication


 Sharding + peer-to-peer replication
 Placement of anything anywhere
CAP Theorem

Assumptions
 System with sharding and replication
 Read and write operations on a single aggregate
CAP properties = properties of a distributed system
 Consistency
 Availability
 Partition tolerance
CAP Theorem

CAP theorem
 It is not possible to have a distributed system that would guarantee consistency,
availability, and partition tolerance at the same time.
 Only 2 of these 3 properties can be enforced.
 But, what do these properties actually mean?
CAP Theorem - Properties

Consistency
 Read and write operations must be executed atomically
 There must exist a total order on all operations such that each operation looks as if it
was completed at a single instant,
 i.e. as if all the operations were executed one by one on a single standalone node
CAP Theorem - Properties

Consistency
 Practical consequence: After a write operation, all readers see the same data
 Since any node can be used for handling read requests, atomicity of write operations means that changes must be propagated to all the replicas
CAP Theorem - Properties

Availability
 If a node is working, it must respond to user requests
 Every read or write request received by a non-failing node in the system must result in a
response
CAP Theorem - Properties

Partition tolerance
 System continues to operate even when two or more sets of nodes get isolated
 i.e. a connection failure MUST NOT shut the whole system down
CAP Theorem - Consequences

At most two properties can be guaranteed


 CA = Consistency + Availability
 CP = Consistency + Partition tolerance
 AP = Availability + Partition tolerance
CAP Theorem - Consequences

If at most two properties can be guaranteed…


 CA = Consistency + Availability
 Traditional ACID properties are easy to achieve
 Examples: RDBMS, Google BigTable
 Any single-node system
 However, should a network partition happen, all the nodes must be forced to stop accepting user requests
CAP Theorem - Consequences

If at most two properties can be guaranteed…


 CP = Consistency + Partition tolerance
 Examples: MongoDB, HBase
CAP Theorem - Consequences

If at most two properties can be guaranteed…


 AP = Availability + Partition tolerance
 New concept of BASE properties
 Examples: Apache Cassandra, Apache CouchDB
 Other examples: web caching, DNS
CAP Theorem - Consequences

Partition tolerance is necessary in clusters


 Why?
 Because network failures in a cluster are unavoidable and difficult even to detect
 Does it mean that only purely CP and AP systems are possible?
 No…
CAP Theorem - Consequences

The real meaning of the CAP theorem:


 Partition tolerance is a MUST,
 But we can trade off consistency versus availability
 Relaxing consistency just a little can bring a lot of availability
 Such trade-offs are not only possible, but often work very well in practice
ACID Properties
Traditional ACID properties
 Atomicity
 Partial execution of transactions is not allowed (all or nothing)
 Consistency
 Transactions bring the database from one consistent (valid) state to another
 Isolation
 Although multiple transactions may execute in parallel, each transaction must appear to execute in isolation, without interfering with the others
 Durability
 Effects of committed transactions must remain durable
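To make atomicity and durability concrete, here is a minimal Python sketch using the standard sqlite3 module (the accounts table and the transferred amount are hypothetical): either both updates of the transfer are committed together, or the rollback undoes them.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    # Transfer 30 from alice to bob as a single transaction: all or nothing
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 'alice'")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 'bob'")
    conn.commit()    # durability: once committed, the effects remain
except sqlite3.Error:
    conn.rollback()  # atomicity: partial execution is undone

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())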
BASE Properties

New concept of BASE properties


 Basically Available
 The system works basically all the time
 Partial failures can occur, but without total system failure
 Soft State
 The state of the system is in flux (it may change even without new input)
 Changes occur all the time
 Eventual Consistency
 Sooner or later, the system will reach a consistent state
ACID and BASE
ACID
 Choose consistency over availability
 Pessimistic approach
 Implemented by traditional relational databases
BASE
 Choose availability over consistency
 Optimistic approach
 Common in NoSQL databases
 Allows levels of scalability that cannot be achieved with ACID
Current trend in NoSQL:
 Strong consistency → eventual consistency
Consistency

Consistency in general…
 Consistency means the absence of contradictions in the database
 Strong consistency is achievable even in clusters, but eventual consistency might
often be sufficient
 Even when an already booked (unavailable) hotel room is booked again, the situation can be resolved in the real world
 …
Consistency

Write consistency (update consistency)


 Problem: write-write conflict
 Two or more write requests on the same aggregate are initiated concurrently
 Issue: lost update
 Question: Do we need to solve the problem in the first place?
Consistency

Write consistency (update consistency)


 Question: Do we need to solve the problem in the first place?
 If yes, then there are two general solutions
 Pessimistic approaches
 Preventing conflicts from occurring
 Techniques: write locks, …
 Optimistic approaches
 Conflicts may occur, but are detected and resolved later on
 Techniques: version stamps, …
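A minimal sketch of the optimistic approach based on version stamps, using a hypothetical in-memory store: a write succeeds only if the version it originally read is still the current one; otherwise the write-write conflict is detected and can be resolved later (e.g. by retrying).

class VersionedStore:
    # Toy key-value store in which every value carries a version stamp
    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        return self._data.get(key, (None, 0))

    def write(self, key, value, expected_version):
        _, current_version = self._data.get(key, (None, 0))
        if current_version != expected_version:
            return False  # conflict detected: the lost update is avoided
        self._data[key] = (value, current_version + 1)
        return True

store = VersionedStore()
_, version = store.read("room:101")
print(store.write("room:101", "booked by A", version))  # True: first writer wins
print(store.write("room:101", "booked by B", version))  # False: stale version, conflict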
Consistency

Read consistency (replication consistency)


 Problem: read-write conflict
 Write and read requests on the same aggregate are initiated concurrently
 Issue: inconsistent read
 When not treated, an inconsistency window will exist
 Propagation of changes to all the replicas takes some time
 Until this process is finished, inconsistent reads may happen; even the initiator of the write request may read stale data!
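An illustrative toy simulation of the inconsistency window (propagation is modelled by an explicit sync() call; all names are hypothetical): until the replica has been synchronized, readers see stale data; once propagation finishes, the replicas converge.

class Replica:
    def __init__(self):
        self.data = {}

primary, replica = Replica(), Replica()

def write(key, value):
    primary.data[key] = value          # the write is acknowledged immediately ...

def sync():
    replica.data.update(primary.data)  # ... but reaches the replica only later

write("room:101", "booked")
print(replica.data.get("room:101"))    # None: inconsistent (stale) read
sync()
print(replica.data.get("room:101"))    # 'booked': the replicas have converged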
Conclusion
There is a wide range of options influencing…
 Scalability
– how well does the system scale (in data volume and requests)?
 Availability
– when may nodes refuse to handle user requests?
 Consistency
– what level of consistency is required?
 Latency
– how complicated is it to handle user requests?
 Durability
– is committed data written reliably?
 Resilience
– can the data be recovered in case of failures?
It’s good to know these properties and choose the right trade-off
