NoSQL - Unit2
Distribution Models: Single Server, Sharding, Master-Slave Replication, Peer-to-Peer Replication, Combining Sharding
and Replication, The CAP Theorem.
Key-Value Databases: What Is a Key-Value Store, Key-Value Store Features, Consistency, Transactions, Query
Features, Suitable Use Cases, When Not to Use.
❖ Distribution Models
➢ NoSQL's primary driver of interest has been its ability to run databases on a large cluster. As data
volumes increase, it becomes more difficult and expensive to scale up—buy a bigger server to run
the database on. A more appealing option is to scale out—run the database on a cluster of servers.
Aggregate orientation fits well with scaling out because the aggregate is a natural unit to use for
distribution.
➢ There are two paths to data distribution: replication and sharding. Replication takes the same data
and copies it over multiple nodes. Sharding puts different data on different nodes.
❖ Single Server
➢ This model doesn’t use any distribution; the database is on a single machine - it handles all the
reads and writes. It is easy for operations people to manage and application developers to reason
about.
➢ Graph databases are the obvious category here—these work best in a single-server configuration.
❖ Sharding
➢ A busy data store is busy because different people are accessing different parts of the dataset. In these
circumstances, we can support horizontal scalability by putting different parts of the data onto
different servers, a technique that's called sharding.
➢ In the ideal case, we have different users all talking to different server nodes. Each user only has to
talk to one server, so gets rapid responses from that server.
➢ Of course, the ideal case is a pretty rare beast. In order to get close to it, we have to ensure that data
that's accessed together is clumped together on the same node and that these clumps are arranged
on the nodes to provide the best data access.
➢ When it comes to arranging the data on the nodes, there are several factors that can help improve
performance. If you know that most accesses of certain aggregates are based on a physical location,
you can place the data close to where it’s being accessed. If you have orders for someone who lives
in Boston, you can place that data in your eastern US data center.
➢ Many NoSQL databases offer auto-sharding, where the database takes on the responsibility of
allocating data to shards and ensuring that data access goes to the right shard. This can make it
much easier to use sharding in an application.
➢ Sharding is particularly valuable for performance because it can improve both read and write
performance. Using replication, particularly with caching, can greatly improve read performance but
does little for applications that have a lot of writes. Sharding provides a way to horizontally scale
writes.
➢ Although the data is on different nodes, a node failure makes that shard’s data unavailable just as
surely as it does for a single-server solution. The resilience benefit it does provide is that only the
users of the data on that shard will suffer; however, it’s not good to have a database with part of its
data missing. With a single server it’s easier to pay the effort and cost to keep that server up and
running; clusters usually try to use less reliable machines, and you’re more likely to get a node
failure. So in practice, sharding alone is likely to decrease resilience.
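The core of any sharding scheme is a deterministic mapping from key to node. As a rough, hypothetical sketch (not any particular database's API), a client-side router might hash the key:

```java
import java.util.List;

// Hypothetical sketch of key-to-shard routing. Real auto-sharding layers
// use more sophisticated schemes (consistent hashing, range partitioning,
// rebalancing) so that adding a node doesn't remap most keys.
public class ShardRouter {
    private final List<String> nodes;

    public ShardRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    // The same key always hashes to the same node; Math.floorMod keeps the
    // index non-negative even when hashCode() is negative.
    public String nodeFor(String key) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}
```

Data that is accessed together should share a shard key (for example, an order keyed by its customer id), so that related aggregates land on the same node.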
❖ Master Slave Replication
➢ With master-slave distribution, you replicate data across multiple nodes. One node is designated as
the master, or primary. This master is the authoritative source for the data and is usually responsible
for processing any updates to that data. The other nodes are slaves, or secondaries. A replication
process synchronizes the slaves with the master.
➢ Master-slave replication is most helpful for scaling when you have a read-intensive dataset.
➢ It isn’t such a good scheme for datasets with heavy write traffic, although offloading the read traffic
will help a bit with handling the write load.
➢ Another advantage of master-slave replication is read resilience: Should the master fail, the slaves
can still handle read requests.
➢ The failure of the master does eliminate the ability to handle writes until either the master is restored
or a new master is appointed. However, having slaves as replicas of the master does speed up
recovery after a failure of the master, since a slave can be appointed as the new master very quickly.
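The routing rule behind master-slave replication can be sketched with in-memory maps standing in for real nodes (hypothetical code; the class and method names are illustrative): writes go only to the master, while reads are spread across the slaves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of master-slave routing. Replication here is
// synchronous for simplicity; real systems usually replicate
// asynchronously, so slaves can briefly serve stale reads.
public class MasterSlaveStore {
    private final Map<String, String> master = new HashMap<>();
    private final List<Map<String, String>> slaves = new ArrayList<>();

    public MasterSlaveStore(int slaveCount) {
        for (int i = 0; i < slaveCount; i++) slaves.add(new HashMap<>());
    }

    // All writes go through the master, the authoritative source.
    public void put(String key, String value) {
        master.put(key, value);
        for (Map<String, String> slave : slaves) slave.put(key, value); // replication
    }

    // Reads can be served by any slave, which is what scales read traffic.
    public String get(String key) {
        return slaves.get(ThreadLocalRandom.current().nextInt(slaves.size())).get(key);
    }
}
```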
❖ Peer-to-Peer Replication
➢ Master-slave replication helps with read scalability but doesn’t help with scalability of writes. It
provides resilience against failure of a slave, but not of a master. Essentially, the master is still a
bottleneck and a single point of failure.
➢ Peer-to-peer replication attacks these problems by not having a master. All the replicas have equal
weight, they can all accept writes, and the loss of any of them doesn’t prevent access to the data
store.
➢ With a peer-to-peer replication cluster, you can ride over node failures without losing access to data.
Furthermore, you can easily add nodes to improve your performance.
➢ The biggest complication is, again, consistency. When you can write to two different places, you run
the risk that two people will attempt to update the same record at the same time—a write-write
conflict. Inconsistencies on read lead to problems but at least they are relatively transient.
Inconsistent writes are forever.
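The two resolution strategies for a write-write conflict can be sketched as follows (hypothetical helper code; plain timestamps stand in for the version metadata a real store would use):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of two ways to resolve a write-write conflict.
public class ConflictResolution {
    // A value plus the (logical) time it was written.
    record VersionedValue(String value, long timestamp) {}

    // Last-write-wins: the newest write survives, older writes are silently lost.
    static VersionedValue lastWriteWins(List<VersionedValue> siblings) {
        VersionedValue newest = siblings.get(0);
        for (VersionedValue v : siblings) {
            if (v.timestamp() > newest.timestamp()) newest = v;
        }
        return newest;
    }

    // Sibling return: hand every conflicting value back to the client,
    // which must merge them using application knowledge.
    static List<VersionedValue> returnSiblings(List<VersionedValue> siblings) {
        return new ArrayList<>(siblings);
    }
}
```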
❖ Combining Sharding and Replication
➢ We can combine master-slave replication and sharding: this means that we have multiple
masters, but each data item only has a single master. Depending on your configuration, you may
choose a node to be a master for some data and a slave for others, or you may dedicate nodes to
master or slave duties.
➢ Using peer-to-peer replication and sharding is a common strategy for column-family databases. In a
scenario like this you might have tens or hundreds of nodes in a cluster with data sharded over
them. A good starting point for peer-to-peer replication is to have a replication factor of 3, so each
shard is present on three nodes. Should a node fail, then the shards on that node will be rebuilt on the
other nodes.
❖ What Is a Key-Value Store
➢ A key-value store is a simple hash table, primarily used when all access to the database is via a
primary key: the client can get the value for a key, put a value for a key, or delete a key from the
data store.
➢ Some of the popular key-value databases are Riak [Riak], Redis (often referred to as a data structure
server) [Redis], Memcached DB and its flavors [Memcached], Berkeley DB [Berkeley DB],
HamsterDB (especially suited for embedded use) [HamsterDB], Amazon DynamoDB [Amazon's
Dynamo] (not open-source), and Project Voldemort [Project Voldemort] (an open-source
implementation of Amazon's Dynamo).
➢ For example, all the user information could be stored as objects in a single bucket, but keys from
different types of objects may then conflict; one solution is to split the data across smaller,
type-specific buckets.
❖ Key-Value Store Features
➢ Some of the features we will discuss for all the NoSQL data stores are consistency, transactions,
query features.
❖ Consistency
➢ Consistency is a feature applicable only to operations on a single key in a key-value store.
➢ In distributed key-value store implementations like Riak, the eventually consistent model of
consistency is used. Since the value may have already been replicated to other nodes, Riak
has two ways of resolving update conflicts: either the newest write wins and older writes are lost, or
all conflicting values are returned, allowing the client to resolve the conflict.
➢ Sample code to create a bucket in Riak
Bucket bucket = connection
.createBucket(bucketName)
.withRetrier(attempts(3))
.allowSiblings(siblingsAllowed)
.nVal(numberOfReplicasOfTheData)
.w(numberOfNodesToRespondToWrite)
.r(numberOfNodesToRespondToRead)
.execute();
➢ If we need data in every node to be consistent, we can increase the
numberOfNodesToRespondToWrite set by w to be the same as nVal. Of course doing that will
decrease the write performance of the cluster.
❖ Transactions
➢ Key-value stores generally offer no guarantees on writes that span multiple keys, and different data
stores implement transactions in different ways. Riak, for example, uses the concept of a quorum,
implemented using the W value together with the replication factor.
➢ Assume we have a Riak cluster with a replication factor of 5 and we supply the W value of 3. When
writing, the write is reported as successful only when it is written and reported as a success on at
least three of the nodes. This allows Riak to have write tolerance; in our example, with N equal to 5
and with a W value of 3, the cluster can tolerate N - W = 2 nodes being down for write operations,
though we would still have lost some data on those nodes for read.
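The quorum arithmetic in this example can be written out directly (an illustrative sketch, not Riak code):

```java
// Illustrative sketch of write-quorum arithmetic: with N replicas, a write
// succeeds once at least W of them acknowledge it, so up to N - W replicas
// can be down without blocking writes.
public class WriteQuorum {
    static boolean writeSucceeds(int acks, int w) {
        return acks >= w;
    }

    static int tolerableFailures(int n, int w) {
        return n - w;
    }

    public static void main(String[] args) {
        int n = 5, w = 3;
        System.out.println(writeSucceeds(3, w));      // three acks with W = 3: success
        System.out.println(tolerableFailures(n, w));  // 2 nodes may be down
    }
}
```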
❖ Query Features
➢ Key-value stores can be queried by key, and that's about it. If we have requirements to query
by some attribute of the value, the database cannot do this for us; our application needs to read
the value to determine whether the attribute meets the condition.
➢ Some key-value databases get around this by providing the ability to search inside the value, such
as Riak Search, which allows you to query the data just as you would using Lucene indexes.
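Without such search support, querying by an attribute means fetching the values and filtering on the client side, roughly like this (hypothetical sketch; a map stands in for the store, and one lookup per key models one GET):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the store can only look up by key, so filtering by an
// attribute of the value has to happen in the application after fetching.
public class ClientSideQuery {
    static List<Map<String, String>> findByCity(
            Map<String, Map<String, String>> store, String city) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (String key : store.keySet()) {
            Map<String, String> value = store.get(key);   // one GET per key
            if (city.equals(value.get("city"))) matches.add(value);
        }
        return matches;
    }
}
```

Note that this scans every key, which is why attribute queries perform poorly against plain key-value stores.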
❖ Suitable Use Cases
➢ Storing Session Information
■ Generally, every web session is unique and is assigned a unique sessionid value. Applications
that store the sessionid on disk or in an RDBMS will greatly benefit from moving to a key-value
store, since everything about the session can be stored by a single PUT request or retrieved
using GET. This single-request operation makes it very fast, as everything about the session is
stored in a single object. Solutions such as Memcached are used by many web applications,
and Riak can be used when availability is important.
➢ User Profiles, Preferences
■ Almost every user has a unique userId, username, or some other attribute, as well as
preferences such as language, color, timezone, which products the user has access to, and so
on. This can all be put into an object, so getting preferences of a user takes a single GET
operation. Similarly, product profiles can be stored.
➢ Shopping Cart Data
■ E-commerce websites have shopping carts tied to the user. As we want the shopping carts to
be available all the time, across browsers, machines, and sessions, all the shopping
information can be put into the value where the key is the userid. A Riak cluster would be best
suited for these kinds of applications.
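The pattern for carts (and equally for sessions and user profiles) can be sketched with a map standing in for the key-value database; the `cart:` key prefix is an illustrative convention, not a Riak requirement:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the entire cart is serialized into one value keyed by
// the user id, so saving or loading it is a single PUT or GET.
public class CartStore {
    private final Map<String, String> kv = new HashMap<>(); // stand-in for the key-value DB

    public void saveCart(String userId, String serializedCart) {
        kv.put("cart:" + userId, serializedCart);            // one PUT saves everything
    }

    public String loadCart(String userId) {
        return kv.get("cart:" + userId);                     // one GET retrieves it
    }
}
```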
❖ When not to use
➢ Relationships among Data
■ If you need to have relationships between different sets of data, or correlate the data between
different sets of keys, key-value stores are not the best solution to use, even though some
key-value stores provide link-walking features.
➢ Multioperation Transactions
■ If you’re saving multiple keys and there is a failure to save any one of them, and you want to
revert or roll back the rest of the operations, key-value stores are not the best solution to be
used.
➢ Query by Data
■ If you need to search the keys based on something found in the value part of the key-value
pairs, then key-value stores are not going to perform well for you. There is no way to inspect
the value on the database side, with the exception of some products like Riak Search or
indexing engines like Lucene [Lucene] or Solr [Solr].
➢ Operations by Sets
■ Since operations are limited to one key at a time, there is no way to operate upon multiple
keys at the same time. If you need to operate upon multiple keys, you have to handle this from
the client side.