
NOSQL Databases

The document discusses different types of NoSQL data stores including key-value stores, document stores, and column family stores. It provides examples like DynamoDB, MongoDB, Cassandra and HBase. The document outlines reasons for using NoSQL databases like scalability, flexibility and ability to handle large volumes of data across multiple servers. It also covers challenges in scaling relational databases and how NoSQL databases address issues like replication, partitioning, and eventual consistency.

Uploaded by

paresh

NOSQL Data Stores

Not Only SQL

Tuesday, September 21, 2010


Data Store

Superset of:
Relational Databases
Key Value Stores
Document Stores
Column Family Stores



Design This Schema
[Diagram: Student, Address, Course, and Score entities]



Scalable huh??

Use Case: This schema has to serve the whole student community in this world
One big server? How big?
More than one server? How will that work?



WHY NOSQL ?

Scalability : Horizontal
Relational databases do not scale well when distributed
NOSQL : Distributed, Flexible Schema, Relaxing Consistency



Issues with Relational DB

Scalability
Replication : Scaling by duplication
Partitioning(Sharding) : Scaling by division



Replication
Master - Slave
1 write = N writes (N is the number of slaves)
Faster reads (can read from N nodes)
Critical reads go to the master (application aware)
Does not help with high volumes of data (every node holds a full copy)



Replication
Multi - Master
Adding more masters
Conflict resolution cost grows as O(N^2) or O(N^3) with the number of masters



Partitioning(Sharding)
Scales reads as well as writes
Application needs to be partition aware
Broken relationships: Cartesian products (joins) across shards?
Referential integrity can no longer be enforced
Rebalancing



Consistent Hashing
Hash Ring (Or Clock Face)

Balanced Distribution After Adding a new Node
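
A minimal sketch of the hash ring idea, not taken from the original slides. The class name `HashRing`, the use of MD5, and the `points_per_node` setting (several ring positions per node, to even out the distribution) are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Each node owns several points; a key belongs to the next point clockwise."""

    def __init__(self, nodes=(), points_per_node=100):
        self.points_per_node = points_per_node
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.points_per_node):
            pos = ring_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._points, pos)

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._points, ring_position(key))
        # Wrap around to the first point when the key hashes past the last one.
        return self._owner[self._points[idx % len(self._points)]]


keys = [f"student:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch the only keys that change owner after `add_node` are the ones the new node takes over; keys never move between the existing nodes.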


Common Sharding Schemes
Vertical Partitioning
Range Based Partitioning
Hash Based Partitioning
Directory Based Partitioning
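
The routing functions below are a rough sketch of three of these schemes (range, hash, and directory based). The shard names, the range boundaries, and the directory contents are invented for illustration; vertical partitioning splits tables or columns rather than keys, so it is not shown.

```python
import bisect
import zlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def hash_shard(key: str) -> str:
    """Hash based: a stable hash of the key, modulo the number of shards."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


# Range based: first letter below "h" -> shard-0, "h" up to "p" -> shard-1,
# "q" and above -> shard-2. The boundaries are arbitrary.
RANGE_BOUNDARIES = ["h", "q"]


def range_shard(key: str) -> str:
    return SHARDS[bisect.bisect_right(RANGE_BOUNDARIES, key[:1].lower())]


# Directory based: an explicit lookup table, with a fallback for unlisted keys.
DIRECTORY = {"alice": "shard-2"}


def directory_shard(key: str) -> str:
    return DIRECTORY.get(key) or hash_shard(key)
```

Note that `hash_shard` uses a plain modulo, so changing the number of shards remaps most keys; that is the rebalancing problem consistent hashing is meant to reduce.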



Can live without !!
UPDATE and DELETE
Loss of Information
Can be modeled as INSERT with versioning
Filter out inactive records
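
One way to picture "INSERT with versioning" is an append-only log where an update adds a new version and a delete adds an inactive marker. The `AppendOnlyStore` class and its field names are hypothetical, a sketch rather than how any particular store does it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    key: str
    version: int
    value: Optional[Any]
    active: bool  # False marks a logical delete


class AppendOnlyStore:
    def __init__(self):
        self._log = []  # rows are only ever appended, never changed

    def _next_version(self, key: str) -> int:
        return sum(1 for v in self._log if v.key == key) + 1

    def put(self, key: str, value: Any) -> None:
        """An 'update' is just another insert with a higher version."""
        self._log.append(Version(key, self._next_version(key), value, True))

    def delete(self, key: str) -> None:
        """A 'delete' inserts an inactive version; history is kept."""
        self._log.append(Version(key, self._next_version(key), None, False))

    def get(self, key: str) -> Optional[Any]:
        """Return the latest value, filtering out inactive records."""
        for v in reversed(self._log):
            if v.key == key:
                return v.value if v.active else None
        return None

    def history(self, key: str) -> list:
        return [v for v in self._log if v.key == key]


store = AppendOnlyStore()
store.put("student:1", {"city": "Pune"})
store.put("student:1", {"city": "Mumbai"})
```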



Avoid JOINS
Expensive; break down across partitions
How to avoid?
Denormalize
Storage is cheap now
Burden of Consistency shifts to application
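
A small sketch of what denormalizing looks like, reusing the Student/Course/Score entities from the schema slide. The field names and the `rename_course` helper are made up for illustration; the point is that the application, not the database, must keep the copied data in step.

```python
courses = {"c1": {"title": "Databases"}}

# Denormalized: the course title is copied into each enrollment record,
# so reading a student's scores needs no join against `courses`.
enrollments = [
    {"student_id": "s1", "course_id": "c1", "course_title": "Databases", "score": 91},
    {"student_id": "s2", "course_id": "c1", "course_title": "Databases", "score": 78},
]


def scores_for(student_id: str) -> list:
    """A single lookup, no join needed."""
    return [(e["course_title"], e["score"])
            for e in enrollments if e["student_id"] == student_id]


def rename_course(course_id: str, new_title: str) -> None:
    """The application has to update every copy of the title itself."""
    courses[course_id]["title"] = new_title
    for e in enrollments:
        if e["course_id"] == course_id:
            e["course_title"] = new_title
```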



Still need ACID ??
Atomicity: single-key atomicity is enough
Consistency: CAP theorem
Can only get any two of Consistency, Availability, Partition Tolerance
Isolation: no more than Read Committed (single key)
Durability: survive node failures through peer replication



Fixed Schema
Schema comes before Data
Modifying Schema is essential
Adding new features
Modifying Schema is hard
Locking of rows(Add/Modify a column)
Locking of table(Add/Remove index)



Model this!!
Hierarchical Data
Graphs



Desired Characteristics
High Scalability
Add nodes incrementally
No Diminishing Returns
High Availability
No single point of failure
Agnostic to node failures



Desired Characteristics
High Performance
Fast operations
Non - Blocking Writes
Consistency
No need for strong consistency
Eventual Consistency, Read - Your - Write Consistency



Desired Characteristics
Deployment Flexibility
Add/Remove node automatically
NO DFS or shared storage
Should work with commodity, heterogeneous hardware
Modeling Flexibility
Key-Value pairs, hierarchical and graph data



Desired Characteristics
Query Flexibility
Multi Gets
Range Queries
Upserts



Inspiration
Memcached
In-memory Key Value
Blazing Fast
Infinite Horizontal Scalability



Key Value Stores
Simple Data Model
Amazon Dynamo
Amazon S3
Project Voldemort
Redis
Scalaris and many others



Amazon Dynamo
Internal to Amazon
Distributed K-V store
Opaque Values
Partitioning
A variant of consistent hashing
Hash Ring division



Amazon Dynamo
Partitioning
Partition mappings communicated via a gossip protocol
Eventually consistent view of mappings
Replication
Each key is replicated on N nodes
Preference List



Amazon Dynamo
Replication
Read/Write through Coordinator nodes
Configurations
N = number of replicas
W = min. nodes that must ACK the receipt of a WRITE
R = min. nodes contacted for a READ
R + W > N ensures that read and write quorums overlap

Amazon Dynamo
Tuning (N,R,W)
Increased W means more replicas must acknowledge each write (more durable, slower writes)
Increased R means higher consistency but slower reads
Typical values for Amazon apps: (N,R,W) = (3,2,2)
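
The R + W > N rule can be checked by brute force for small clusters. The sketch below is illustrative only; the function names are made up. It enumerates every possible write set and read set and confirms they always share at least one replica exactly when R + W > N.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The rule from the slide: R + W > N."""
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute force: does every W-sized write set share a replica
    with every R-sized read set?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )
```

With (N,R,W) = (3,2,2) any two replicas read will include at least one of the two that acknowledged the latest write; with (3,1,1) a read can miss it.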



Amazon Dynamo
Consistency
Eventually consistent
Uses Object versioning via Vector Clocks
Consistency Protocol
Return all versions
Reconcile divergent versions
The reconciled version, superseding the current ones, is written back
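
A minimal sketch of vector clock comparison, assuming clocks are plain dictionaries from node name to counter. The node names (Sx, Sy, Sz) and helper names are illustrative and not Dynamo's actual API.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every event that clock `b` has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # divergent versions: both must be returned


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum, used when reconciling divergent versions."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def increment(clock: dict, node: str) -> dict:
    """The coordinating node bumps its own counter on each write."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


d2 = {"Sx": 2}
d3 = increment(d2, "Sy")  # written via node Sy
d4 = increment(d2, "Sz")  # written concurrently via node Sz
d5 = increment(merge(d3, d4), "Sx")  # reconciled, then written via Sx
```

Here `d3` and `d4` are concurrent, so a read would return both; `d5` descends from both and supersedes them once written.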

Amazon Dynamo
Handling Temporary Failures
Hinted Handoff
Handling Permanent Failures
Replica synchronization (anti-entropy with Merkle trees)



Amazon Dynamo
Ring membership
Adding or removing a node requires rebalancing
Failure Detection
Gossip about failures
Periodically check availability and gossip the result



Other K-V Stores
Check out the others too; they are worth a read and a try.
S3, Voldemort, Redis, Scalaris.



Document Stores
A step further from K-V stores
Value is a full-blown record (document)
Document is not opaque (exposes a structure to perform operations on)
Each document can have a different schema, e.g. JSON
Relations are possible
One to Many and Many to Many
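
A sketch of these ideas using plain Python dictionaries as stand-ins for JSON documents. The field names and the `find` and `course_titles` helpers are hypothetical; the two student documents deliberately have different shapes, one-to-many data is embedded, and many-to-many is modeled with references.

```python
import json

courses = {"c1": {"title": "Databases"}, "c2": {"title": "Networks"}}

students = [
    # Embedded address (one to one) and course references (many to many).
    {"_id": "s1", "name": "Asha", "address": {"city": "Pune"}, "course_ids": ["c1", "c2"]},
    # A different schema: no address, but embedded scores (one to many).
    {"_id": "s2", "name": "Ravi", "scores": [{"course_id": "c1", "score": 88}]},
]


def find(docs: list, field: str, value) -> list:
    """Query on document structure; documents lacking the field are skipped."""
    return [d for d in docs if d.get(field) == value]


def course_titles(student: dict) -> list:
    """Follow references by hand; a missing field means no courses."""
    return [courses[cid]["title"] for cid in student.get("course_ids", [])]


serialized = json.dumps(students)  # documents round-trip as JSON
```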

Document Stores
Mostly similar to relational DBs (except no upfront schema)
Amazon Simple DB
Apache CouchDB
Riak
Mongo DB



Mongo DB
We use MongoDB in a large automated translation application
Data Model
Key-Value, the value being binary serialized JSON (BSON)
4 MB limit on a BSON document
For larger objects use GridFS
Collections: roughly like tables
B-trees used for indexes

Mongo DB
Storage
Uses Memory Mapped Files(Cache controlled by OS VMM)
Writes
In place updates
partial updates
Single Document Atomic updates



Mongo DB
Queries
JSON-style query syntax (powered by a JavaScript engine)
Support for conditional operators, regex, etc.
Cursor support
Query optimizer
Map-Reduce over a collection



Mongo DB
Replication
Master Slave
Replica Pairs
Master - Master



Mongo DB
Partitioning
Auto-sharding done through chunks (50 MB max)
Easy node addition
Auto balancing
No single point of failure
Automatic Failover



Column Family Stores
Sparse, Distributed, Persistent, Multi-Dimensional sorted Map
Column Keys are grouped into sets called column-families
BigTable
HBase
Cassandra



Big Table Column Family



Cassandra
Combines the distributed architecture of Dynamo with the column-family data model of Bigtable



Cassandra
Data Model : Multi Dimensional Map indexed by a key
Each app has its own key-space
Key can be any long string; indexed by Cassandra
Column: an attribute of a record, timestamped
Column Family: grouping of columns, similar to a relational table
Super Column: a list of columns

Cassandra
Data Model
A column family contains either columns or super columns
KeySpace.ColumnFamily.Key.[SuperColumn].Column
Sorting
Data is sorted at write time
Columns are sorted within their row by column name (pluggable sorting providers)
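
The KeySpace.ColumnFamily.Key.Column path can be pictured as nested maps. The sketch below is a simplification with invented names: super columns are left out, and columns are sorted when read rather than at write time. It assumes that on a conflicting write the column with the newer timestamp wins.

```python
# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {
    "Students": {
        "student:1": {
            "name": ("Asha", 1285027200),
            "city": ("Pune", 1285027200),
        }
    }
}


def insert(column_family: str, key: str, column: str, value, timestamp: int) -> None:
    """Store a timestamped column; an older timestamp never overwrites a newer one."""
    row = keyspace.setdefault(column_family, {}).setdefault(key, {})
    current = row.get(column)
    if current is None or timestamp >= current[1]:
        row[column] = (value, timestamp)


def get_slice(column_family: str, key: str) -> list:
    """Return (column name, value) pairs ordered by column name."""
    row = keyspace.get(column_family, {}).get(key, {})
    return [(name, row[name][0]) for name in sorted(row)]
```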

Cassandra
Partitioning : Mostly Like Dynamo
Consistent hashing with an order-preserving hash function
Uses the Chord approach to load balancing (Dynamo used virtual nodes)



Cassandra
Replication
Coordinator nodes and preference list as in Dynamo
Datacenter aware, rack aware, rack unaware
Rack aware uses ZooKeeper
Membership based on Scuttlebutt, an anti-entropy gossip protocol



Cassandra
Failure Detection
Modified version of the accrual failure detector
Failure Handling
Same as hinted handoff in Dynamo
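
An accrual failure detector outputs a suspicion level (often called phi) instead of a yes/no answer. The sketch below is a simplified illustration, not Cassandra's code: it assumes heartbeat gaps follow an exponential distribution, so phi = elapsed / (mean gap × ln 10). The class name and the choice to average all observed gaps are assumptions.

```python
import math


class AccrualFailureDetector:
    def __init__(self):
        self._last = None      # arrival time of the latest heartbeat
        self._intervals = []   # observed gaps between heartbeats

    def heartbeat(self, now: float) -> None:
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now: float) -> float:
        """Suspicion level: grows the longer the node stays silent."""
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last
        # Exponential model: P(gap > elapsed) = exp(-elapsed / mean),
        # and phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = AccrualFailureDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
```

The application picks a threshold on phi; a higher threshold means fewer false suspicions but slower detection.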



Cassandra
Write
Write to the commit log, followed by an update to the memtable
Dedicated disk for the commit log (makes writes sequential)
No seeks, always sequential, so blazing fast
Atomic within a column family
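
A toy sketch of the write path described above: append to a commit log first, then update an in-memory table. `TinyNode`, the JSON-lines log format, and `replay` are invented for illustration and leave out flushing the memtable to disk.

```python
import json
import os
import tempfile


class TinyNode:
    def __init__(self, commit_log_path: str):
        # Append mode: every write lands at the end of the file (sequential I/O).
        self._log = open(commit_log_path, "a", encoding="utf-8")
        self.memtable = {}

    def write(self, key: str, column: str, value) -> None:
        record = json.dumps({"key": key, "column": column, "value": value})
        self._log.write(record + "\n")   # 1. durable append to the commit log
        self._log.flush()
        os.fsync(self._log.fileno())
        self.memtable.setdefault(key, {})[column] = value  # 2. in-memory update

    def close(self) -> None:
        self._log.close()


def replay(commit_log_path: str) -> dict:
    """Rebuild the memtable from the commit log, e.g. after a crash."""
    memtable = {}
    with open(commit_log_path, encoding="utf-8") as log:
        for line in log:
            rec = json.loads(line)
            memtable.setdefault(rec["key"], {})[rec["column"]] = rec["value"]
    return memtable


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
node = TinyNode(log_path)
node.write("student:1", "city", "Pune")
node.write("student:1", "city", "Mumbai")
node.close()
```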



Cassandra
Read
Similar to Dynamo for deciding which nodes serve the read
Similar to Bigtable at the storage level



Thanks!!!
Due regards to Reddy Raja for this invite.
